
Why can’t GPT think like we do?

Artificial intelligence (AI) systems, especially large language models such as GPT-4, show impressive performance on reasoning tasks. But does AI really understand abstract concepts, or is it just imitating patterns? A new study from the University of Amsterdam and the Santa Fe Institute shows that although GPT models perform well on standard analogy tasks, they falter when the problems are modified, exposing key weaknesses in AI’s reasoning abilities.

Analogical reasoning is the ability to compare two different things based on their similarity in certain respects. It is one of the most common ways humans try to make sense of the world and make decisions. An example of an analogy: cup is to coffee as soup is to ??? (The answer: bowl.)

Large language models such as GPT-4 perform well on a wide variety of tests, including those requiring analogical reasoning. But can these models genuinely reason in a general way, or do they over-rely on patterns from their training data? This study by language and AI experts Martha Lewis (Institute for Logic, Language and Computation, University of Amsterdam) and Melanie Mitchell (Santa Fe Institute) examines whether GPT models are as flexible and robust as humans in making analogies. “This is crucial, because AI is increasingly used in real-world decision-making and problem-solving,” Lewis explained.

Comparing AI models with human performance

Lewis and Mitchell compared the performance of humans and GPT models on three different types of analogy problems (sketched in code below):

  1. Letter-string analogies – identify patterns in sequences of letters and complete them correctly.
  2. Digit matrices – analyze patterns of digits and determine the missing number.
  3. Story analogies – determine which of two candidate stories best corresponds to a given example story.
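
To make the three task types concrete, here is a minimal sketch in Python with one hypothetical item per type. The specific problems, answers, and field names are illustrative assumptions in the spirit of the study, not items taken from the paper’s actual test sets.

```python
# Hypothetical example items for each of the three analogy problem types.
# These are illustrative, not drawn from the paper's test sets.

letter_string_problem = {
    # "If abcd changes to abce, what does ijkl change to?"
    "source_pair": ("abcd", "abce"),
    "target": "ijkl",
    "answer": "ijkm",  # the last letter advances to its successor
}

digit_matrix_problem = {
    # A 3x3 matrix with one entry missing; the solver must infer the
    # row pattern and fill in the blank.
    "matrix": [
        [1, 1, 1],
        [2, 2, 2],
        [3, 3, None],
    ],
    "answer": 3,
}

story_analogy_problem = {
    # Given an example story, choose which candidate story is the
    # better analogue, i.e. shares the same causal structure.
    "example": "A student skips studying all term, crams one night, and fails.",
    "candidates": [
        "A runner skips training for months, sprints the race, and collapses.",
        "A runner trains steadily for months and wins the race.",
    ],
    "answer_index": 0,  # the first candidate shares the causal structure
}
```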

Systems that truly understand analogies should maintain high performance even on variations of those problems

In addition to testing whether the GPT models could solve the original problems, the study also examined their performance when the problems were subtly modified. The authors point out in their article that “systems that truly understand analogies should maintain high performance” on these variations as well.
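
At a high level, such a robustness check amounts to scoring a model on each original problem and on its modified counterpart, then comparing the two accuracies. The sketch below assumes a hypothetical `ask_model` callable standing in for whatever LLM interface is used; it mirrors the idea of the evaluation, not the study’s exact protocol.

```python
def evaluate_robustness(problems, ask_model):
    """Compare accuracy on original problems vs. their modified variants.

    `problems` is a list of dicts holding an original prompt, a variant
    prompt, and the correct answer to each; `ask_model` is a hypothetical
    callable that sends a prompt to an LLM and returns its answer text.
    """
    correct_original = 0
    correct_variant = 0
    for p in problems:
        if ask_model(p["original_prompt"]).strip() == p["original_answer"]:
            correct_original += 1
        if ask_model(p["variant_prompt"]).strip() == p["variant_answer"]:
            correct_variant += 1
    n = len(problems)
    return correct_original / n, correct_variant / n

# A system that truly understands the analogy should score about the same
# on both sets; a large gap (original much higher than variant) suggests
# pattern matching on familiar surface forms rather than abstract reasoning.
```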

GPT models struggle with robustness

Humans maintained high performance on most modified versions of the problems, but the GPT models, while performing well on the standard analogy problems, struggled with the variations. “This suggests that AI models are often less flexible than humans, and that their reasoning is less about true abstract understanding and more about pattern matching,” Lewis explained.

On the digit-matrix problems, the GPT models showed a significant drop in performance when the position of the missing number changed; humans had no difficulty with this. On the story analogies, GPT-4 tended to choose the first answer offered, whereas humans were not affected by answer order. Furthermore, when key elements of the stories were reworded, GPT-4 struggled more than humans, suggesting a reliance on surface-level similarity rather than deeper causal reasoning.

On the simpler analogy tasks, the GPT models showed a decline in performance when tested on modified versions, while humans remained consistent. On the more complex analogy tasks, however, both humans and AI struggled.

Weaker generalization than human cognition

This study challenges the widespread assumption that AI models such as GPT-4 can reason the way humans do. “While AI models show impressive capabilities, this does not mean they truly understand what they are doing,” Lewis and Mitchell conclude. “Their ability to generalize across variations is still significantly weaker than human cognition. GPT models often rely on surface patterns rather than deep understanding.”

This is an important warning against over-reliance on AI in high-stakes decision-making areas such as education, law, and health care. Artificial intelligence can be a powerful tool, but it cannot yet replace human thinking and reasoning.

Article details

Martha Lewis and Melanie Mitchell, 2025, “Evaluating the Robustness of Analogical Reasoning in Large Language Models”, in: Transactions on Machine Learning Research.

