Physics breakthrough reveals why AI models hallucinate and show bias

Researchers have unlocked the mathematical secrets behind AI’s most puzzling behavior, which may pave the way for safer and more reliable AI systems. A physics team at the George Washington University has developed the first comprehensive theory explaining why models like ChatGPT sometimes repeat themselves endlessly, fabricate information, or produce harmful content in response to innocent prompts.
The study, led by Neil Johnson and Frank Yingjie Huo, provides a rare glimpse into the “black box” of large language models by analyzing their core mechanism, the attention process, through the lens of physics.
“We derive a first-principles physics theory of the AI engine at the heart of LLMs’ ‘magic’,” the researchers wrote in their preprint paper.
Their work reveals something surprising: the attention process behaves much like two spinning tops. This “two-body Hamiltonian” system explains why AI systems can exhibit strange behavior despite their impressive capabilities.
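The attention mechanism the paper analyzes can be sketched in a few lines. The snippet below is a generic single-head scaled dot-product attention computation, not the authors’ code; it illustrates that each output token is assembled entirely from pairwise query–key interactions between tokens, which is the “two-body” structure the physicists formalize.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Single-head scaled dot-product attention.

    scores[i, j] depends only on the PAIR of tokens (i, j) --
    the two-body interaction the GWU analysis focuses on.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)       # (n, n) pairwise interaction matrix
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V                  # mix values by pairwise weights

# Toy example: 3 tokens, 4-dimensional embeddings.
rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 4))
K = rng.standard_normal((3, 4))
V = rng.standard_normal((3, 4))
out = attention(Q, K, V)
print(out.shape)  # (3, 4)
```

Note that nothing in the score matrix couples three or more tokens at once; every entry involves exactly two.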
Most users have encountered AI models that occasionally produce repetitive text or fabricate information, but the underlying reason has remained mysterious. The team found that these problems stem from basic properties of how AI processes information, not just from flaws in the training data.
According to the researchers, the way AI models predict the next word in a sequence is similar to how physicists calculate probabilities in statistical ensembles of interacting particles. This conceptual breakthrough helps explain why harmful content can surface seemingly at random: specific “bad” words buried deep in the training vocabulary can occasionally come to dominate the system’s output.
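The analogy to statistical ensembles can be made concrete. An LLM converts its raw scores (logits) into next-token probabilities with a softmax, which is mathematically identical to a Boltzmann distribution over “energies” E_i = −logit_i at temperature T. The sketch below is our own illustration with made-up logit values, not a calculation from the paper.

```python
import numpy as np

def next_token_probs(logits, temperature=1.0):
    """Softmax over logits = Boltzmann weights exp(-E_i / T) with E_i = -logit_i."""
    z = np.asarray(logits) / temperature
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

# Hypothetical logits for four candidate next tokens.
logits = [2.0, 1.0, 0.5, -1.0]
print(next_token_probs(logits))                    # peaked distribution
print(next_token_probs(logits, temperature=5.0))   # flatter, more random
```

Raising the temperature flattens the distribution, which is exactly why high-temperature sampling makes a model’s output more varied and less predictable.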
The study shows how relatively small deviations in model training can produce large changes in output, which explains why even a heavily safeguarded model can still generate problematic content.
Dr. Elizabeth Morgan, an AI ethics researcher who was not involved in the study, sees important implications. “Understanding the physics of AI attention could give us new tools to prevent harmful outputs without compromising performance,” she said. “This is exactly the kind of basic research the field needs.”
The George Washington team’s analysis goes beyond current approaches to AI interpretability, which often involve complex analyses of entire neural architectures. Instead, they start from first principles to build a mathematical framework that predicts when and why AI output becomes problematic.
Their work shows that current AI systems rely heavily on two-body interactions between tokens (words or word fragments), much as complex physical systems are often approximated by simpler pairwise descriptions. Even more interestingly, they speculate that adding three-body interactions could make AI systems work better, potentially leading to a more powerful next generation of models.
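The difference between two-body and three-body token interactions can be illustrated with array shapes. The sketch below is purely illustrative and is not the paper’s proposed construction: the coupling tensor `W` is made up, and the point is only that pairwise scores form an n×n matrix while triplet scores would form an n×n×n tensor.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 4, 8                       # 4 tokens, 8-dimensional embeddings
X = rng.standard_normal((n, d))

# Two-body: standard attention scores depend on token PAIRS -> (n, n) matrix.
pair_scores = X @ X.T / np.sqrt(d)

# Hypothetical three-body extension (illustrative only): scores depending on
# token TRIPLETS -> (n, n, n) tensor, via a made-up coupling tensor W.
W = rng.standard_normal((d, d, d)) / d
triple_scores = np.einsum('id,je,kf,def->ijk', X, X, X, W)

print(pair_scores.shape)    # (4, 4)    -- cost grows like n^2
print(triple_scores.shape)  # (4, 4, 4) -- cost grows like n^3
```

The cubic growth in cost is one reason such higher-order interactions are not used in today’s transformers, and why the speculation is about future model generations.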
“The similarity to spin baths means that existing physics expertise can be used immediately to help society ensure that AI is trustworthy and resilient,” the researchers concluded in their abstract.
As governments worldwide grapple with AI regulation and companies race to build increasingly powerful models, such theoretical breakthroughs could become key tools for ensuring these systems remain beneficial rather than harmful.
The findings also highlight how interdisciplinary approaches that bring physics to computational problems may help solve some of the most pressing challenges in advanced technology. For a field frequently criticized for moving faster than its own understanding, this deeper theoretical foundation could not come at a better moment.