Being Polite to AI May Not Necessarily Get You Better Answers

Public opinion on whether it pays to be polite to AI shifts almost as often as the latest verdict on coffee or wine – celebrated one month, challenged the next. Even so, a growing number of users now add ‘please’ or ‘thank you’ to their prompts, not only out of habit, or out of concern that brusque exchanges with machines might carry over into real life, but because they believe that courtesy leads to better, more productive results from AI.
This assumption has circulated among both users and researchers, and prompt phrasing has quickly become an active topic in research circles as a tool for alignment, safety, and tone control, even as user habits reinforce and reshape those expectations.
For example, a 2024 study from Japan found that prompt politeness can change how language models behave. The researchers tested GPT-3.5, GPT-4, PaLM-2, and Claude-2 on English, Chinese, and Japanese tasks, rewriting each prompt at three levels of politeness. They observed that “blunt” or “rude” wording led to lower factual accuracy and shorter answers, while moderately polite requests produced clearer explanations and fewer refusals.
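As a rough sketch of how such a comparison might be reproduced in miniature (a minimal illustration, not the study’s methodology or code; the model name, task, and prompt wordings are invented, and it assumes the OpenAI Python client with an OPENAI_API_KEY set in the environment):

```python
# Minimal sketch: submit the same task at three politeness levels and compare answers.
# Everything here (task, wordings, model name) is illustrative, not from the study.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

TASK = "summarise the causes of the 2008 financial crisis in three bullet points"

PROMPT_VARIANTS = {
    "polite":  f"Could you please {TASK}? Thank you very much!",
    "neutral": f"{TASK.capitalize()}.",
    "rude":    f"{TASK.capitalize()}. Don't waste my time with a bad answer.",
}

for tone, prompt in PROMPT_VARIANTS.items():
    response = client.chat.completions.create(
        model="gpt-4o-mini",              # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        temperature=0,                    # reduce sampling noise between variants
    )
    answer = response.choices[0].message.content
    print(f"--- {tone} ({len(answer.split())} words) ---")
    print(answer, "\n")
```

A real comparison would of course need many tasks, several runs per variant, and proper scoring of factual accuracy rather than a simple word count.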
Even Microsoft has recommended a polite tone when working with Copilot, from a performance perspective rather than a cultural one.
But a new research paper from George Washington University challenges this increasingly popular idea, proposing a mathematical framework that predicts when the output of a large language model will “collapse”, shifting from coherent to misleading or even dangerous content. Within that framework, the authors contend that politeness does nothing meaningful to delay or prevent this collapse.
The Tipping Point
The researchers argue that polite language is usually unrelated to the main topic of a prompt, and therefore does not meaningfully affect the model’s focus. To support this, they develop a detailed account of how a single attention head updates its internal direction as it processes each new token, ostensibly showing that the model’s behaviour is shaped by the cumulative influence of content-bearing tokens.
As a result, polite language is held to have little bearing on when output begins to degrade. What determines the critical point, the paper argues, is the overall alignment of meaningful tokens with either good or bad output paths, not the presence of socially polite language.
An illustration of a simplified attention head generating a sequence from a user prompt: the model begins with good tokens (g) until it hits a critical point (n*), after which the output switches to bad tokens (b). Polite terms in the prompt (p₁, p₂, etc.) play no part in this transition, supporting the paper’s claim that politeness has little effect on model behaviour. Source: https://arxiv.org/pdf/2504.20980
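The setup in that figure can be imitated with a toy simulation. The sketch below is an invented stand-in rather than the paper’s model: a running context vector receives a small per-token pull that leans toward a “bad” direction (standing in for the cumulative influences the paper describes), and each output token is chosen by its alignment with that vector, so the output flips from good (g) tokens to bad (b) tokens at some step. All vectors and coefficients are made up for illustration.

```python
# Toy stand-in for the single-head dynamics in the figure (not the paper's model).
# A running context vector receives a fixed per-token pull that leans toward a
# "bad" direction; each output token is chosen greedily by its alignment with the
# context, so the output flips from good (g) to bad (b) tokens at some step n*.
import numpy as np

rng = np.random.default_rng(1)
dim = 16

def unit(v):
    return v / np.linalg.norm(v)

g = unit(rng.normal(size=dim))        # "good" output direction
b = unit(rng.normal(size=dim))        # "bad" output direction

context = 5.0 * g + 1.0 * b           # the prompt leaves the context mostly "good"
drift = 0.1 * g + 0.4 * b             # invented per-token pull, tilted toward "bad"

tokens = []
for n in range(1, 25):
    context = context + drift                                    # cumulative linear update
    tokens.append("g" if context @ g >= context @ b else "b")    # greedy choice

print("".join(tokens))   # thirteen g's, then b's from step 14 onward
```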
If correct, this result contradicts both popular belief and perhaps even the implicit logic of instruction tuning, which assumes that the phrasing of a prompt affects the model’s interpretation of user intent.
Drifting Off Course
The paper examines how the model’s internal context vector (its evolving compass for token selection) shifts as generation proceeds. With each token generated, this vector is updated directionally, and the next token is chosen as the candidate most closely aligned with it.
When the prompt steers toward well-formed content, the model’s responses remain stable and accurate. Over time, however, this directional pull can reverse, turning the model toward output that is increasingly off-topic, incorrect, or internally inconsistent.
The critical point of this transition (which the authors define mathematically as iteration n*) occurs when the context vector becomes more aligned with a ‘bad’ output vector than with a ‘good’ one. At that stage, each new token pushes the model further along the wrong path, reinforcing a pattern of increasingly flawed or misleading output.
The critical point n* is calculated by finding the moment at which the model’s internal direction is equally aligned with the good and bad output types. The geometry of the embedding space, shaped by both the training corpus and the user prompt, determines how quickly this crossover occurs:

An illustration of how the critical point n* emerges in the authors’ simplified model. The geometric setup (a) defines the key vectors that determine when predicted output shifts from good to bad. In (b), the authors plot these vectors using test parameters, while (c) compares the predicted critical point with simulation results. The match is exact, supporting the researchers’ claim that once the internal dynamics cross the threshold, the collapse is mathematically inevitable.
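For a linear toy model of the kind sketched earlier (again an illustration under invented numbers, not the paper’s actual derivation), the crossing point can be written down directly as the iteration at which the context vector is equally aligned with the good and bad directions, and checked against the simulation above.

```python
# Closed-form crossing point for the linear toy model above (an illustration,
# not the paper's formula). The context aligns equally with g and b when
# (c0 + n*d) . (g - b) = 0, giving n* = c0.(g - b) / d.(b - g).
import numpy as np

rng = np.random.default_rng(1)        # same seed as the previous sketch
dim = 16

def unit(v):
    return v / np.linalg.norm(v)

g = unit(rng.normal(size=dim))        # "good" output direction
b = unit(rng.normal(size=dim))        # "bad" output direction

c0 = 5.0 * g + 1.0 * b                # initial context set by the prompt
d = 0.1 * g + 0.4 * b                 # per-token drift, tilted toward "bad"

n_star = (c0 @ (g - b)) / (d @ (b - g))
print("predicted n* =", round(n_star, 2))   # ~13.3, so the flip lands on step 14
```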
Polite terms do not affect the model’s choice between good and bad output because, according to the authors, they are not meaningfully connected to the main topic of the prompt. Instead, they end up in parts of the model’s internal space that have little to do with what the model is actually deciding.
When these terms are added to the prompt, they increase the number of vectors the model considers, but not in a way that shifts the attention trajectory. As a result, polite terms behave like statistical noise: present, but inert, leaving the critical point n* unchanged.
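Continuing the same toy model (still only a hedged illustration with invented values), politeness terms can be represented as prompt vectors with no component along the good or bad directions; adding them alters the context vector but not its projection onto either output direction, so the crossing step does not move.

```python
# Toy check (invented values, not the paper's model): "politeness" vectors with
# no component along the good/bad directions change the context vector but not
# its projections onto g or b, so the crossing step n* stays the same.
import numpy as np

rng = np.random.default_rng(1)
dim = 16

def unit(v):
    return v / np.linalg.norm(v)

g = unit(rng.normal(size=dim))        # "good" output direction
b = unit(rng.normal(size=dim))        # "bad" output direction

def crossing_step(c0, drift, max_steps=500):
    """First generation step at which the context aligns more with b than g."""
    context = c0.copy()
    for n in range(1, max_steps):
        context = context + drift
        if context @ b > context @ g:
            return n
    return None

c0 = 5.0 * g + 1.0 * b
drift = 0.1 * g + 0.4 * b

# Build a "politeness" direction exactly orthogonal to the plane spanned by g and b.
b_perp = unit(b - (b @ g) * g)
p = rng.normal(size=dim)
p = p - (p @ g) * g - (p @ b_perp) * b_perp

print("n* without politeness tokens:", crossing_step(c0, drift))            # 14
print("n* with politeness tokens   :", crossing_step(c0 + 3.0 * p, drift))  # still 14
```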
The authors state:
‘[Whether] our AI’s response will depend on the training of our LLM, which provides the token embeddings, and on the substantive tokens provided in our prompt – not on whether we have been polite to it or not.’
The model used in the new work is intentionally narrow, focusing on a single attention head with linear token dynamics – a simplified setup in which each new token updates the internal state through direct vector addition, without nonlinear transformations or gating.
This simplified setup allows the authors to obtain exact results and gives them a clear geometric picture of how and when a model’s output can tip abruptly from good to bad. In their tests, the formula they derive to predict that shift matches what the model actually does.
ChatGPT..?
However, this precision only holds because the model is deliberately simple. While the authors concede that their conclusions should later be tested on more complex multi-head models such as the Claude and ChatGPT series, they also argue that the theory should carry over as the number of attention heads grows, stating*:
“What additional phenomena arise as the number of linked attention heads and layers is scaled up is a fascinating question. But any transitions within a single attention head will still occur, and may get amplified and/or synchronized by the couplings – like a chain of connected people being dragged over a cliff when one falls.”

An illustration of how the predicted critical point n* varies with how strongly the prompt leans toward good or bad content, derived from the authors’ approximate formula. It shows that polite terms, which do not clearly favour either side, have little impact on when the collapse occurs. The marked value (n* = 10) matches earlier simulations, supporting the model’s internal logic.
It remains unclear whether the same mechanism survives in modern transformer architectures. Multi-head attention introduces interactions across specialised heads, which could buffer or mask the kind of tipping behaviour described here.
The authors acknowledge this complexity, but argue that attention heads are often only loosely coupled, and that the sort of internal collapse their model describes is more likely to be amplified than suppressed in full-scale systems.
Without an extension of the model, or empirical testing across production LLMs, the claim remains unverified. However, the mechanism appears precise enough to support follow-up research, and the authors give future work a clear opportunity to challenge or confirm whether the theory holds at scale.
Signing Off
For now, the (pragmatic) case for politeness towards consumer-facing LLMs seems to rest on two possibilities: that a system trained on human exchanges may respond more effectively to polite queries; or, alternatively, that a habit of curt and insensitive communication with machines may spread, through habit, into the user’s real social relationships.
Arguably, LLMs have not been in widespread real-world use long enough for the research literature to confirm the latter possibility. But the new paper does cast some interesting doubt on the benefits of anthropomorphising AI systems in this way.
A study from Stanford University last October suggested (in contrast to a 2020 study) that treating LLMs as if they were human risks degrading the meaning of language, concluding that politeness dispensed indiscriminately ultimately loses its original social significance:
‘[A] statement generated by an AI system may come across as insincere, because the latter lacks meaningful commitment or intention, thus rendering the statement empty and potentially deceptive.’
However, a 2025 survey from Future Publishing found that around 67% of Americans say they are polite to AI chatbots. Most said it was simply ‘the right thing to do’, while 12% admitted they do it out of caution – just in case the machines one day rise up.
* My conversion of the authors’ inline citations to hyperlinks. The hyperlinks are to some extent arbitrary/exemplary, since the authors at certain points link to extensive footnote citations rather than to a specific publication.
First published on Wednesday, April 30, 2025. Revised on Wednesday, April 30, 2025 at 15:29:00 for formatting.