
Artificial intelligence fails to read human social cues

Despite rapid advances in artificial intelligence, humans still hold a significant advantage in understanding social interactions, according to new research from Johns Hopkins University that reveals fundamental limitations in AI’s ability to interpret human behavior.

The study, presented at the International Conference on Learning Representations, found that even the most sophisticated AI models cannot grasp the nuances of social dynamics that humans read effortlessly – a key skill for technologies designed to interact with people in the real world.

“AI for a self-driving car, for example, needs to recognize the intentions, goals, and actions of human drivers and pedestrians. You would want it to know which way a pedestrian is about to start walking, or whether two people are in conversation or about to step into the street,” said Leyla Isik, a cognitive scientist at Johns Hopkins University and the study’s lead author.

The research team, which included doctoral student Kathy Garcia, ran experiments comparing human perception with AI performance. Participants watched short video clips of people interacting with one another, acting side by side, or acting independently, and rated various features of these social interactions on a scale of one to five.

When more than 350 AI models, spanning language, video, and image systems, were tasked with predicting how humans would judge the same videos, the results revealed a stark disconnect: although the human participants largely agreed with one another in their ratings, none of the AI models matched human perceptions.
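The article does not describe the study’s analysis code, but the kind of comparison it reports – checking how closely each model’s predicted ratings track the average human ratings for the same clips – can be sketched in a few lines. Everything in the snippet below (the synthetic ratings, the stand-in model names, and the choice of Spearman rank correlation as the agreement measure) is an illustrative assumption, not the authors’ actual pipeline.

```python
# Illustrative sketch only: compares hypothetical model ratings to
# averaged human ratings using a rank correlation. The data and the
# choice of Spearman correlation are assumptions, not the study's code.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_clips = 20  # number of short video clips (hypothetical)

# Mean human rating per clip on the 1-to-5 scale described in the article.
human_ratings = rng.uniform(1, 5, size=n_clips)

# Hypothetical predictions from two stand-in models: one that loosely
# tracks human judgments, and one that is essentially noise.
model_predictions = {
    "tracks_humans": human_ratings + rng.normal(0, 0.5, size=n_clips),
    "noise_model": rng.uniform(1, 5, size=n_clips),
}

for name, preds in model_predictions.items():
    rho, p = spearmanr(human_ratings, preds)
    print(f"{name}: Spearman rho = {rho:.2f} (p = {p:.3f})")
```

Under this toy setup, a model whose ratings track human judgments yields a high rank correlation, while a noise model hovers near zero – the sort of across-the-board gap the study reports between humans and the AI models tested.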

“It’s not enough to just see an image and recognize objects and faces. That was the first step, which took us a long way in AI. But real life isn’t static. We need AI to understand the story that is unfolding in a scene,” Garcia explained.

These findings highlight significant gaps in AI capabilities that could affect the development of autonomous vehicles, assistive robots, and other systems designed to navigate human social environments. Although AI has made remarkable progress at recognizing objects and faces in static images, understanding dynamic social interactions presents an altogether different challenge.

Video models in particular struggled to accurately describe what people were doing in the clips. Even when image models were given a series of still frames to analyze, they could not reliably determine whether people were communicating. Language models fared better at predicting human behavior, while video models showed stronger correlations with patterns of neural activity in the human brain.

The researchers believe these limitations may stem from a basic architectural problem: most AI neural networks are modeled on the brain regions that process static images, rather than on the distinct regions responsible for interpreting dynamic social scenes.

“There are a lot of nuances, but the big takeaway is that none of the AI models can match human brain and behavioral responses to scenes across the board, the way they do for static scenes,” Isik noted. “I think there is something fundamental about the way humans process scenes that these models are missing.”

The research suggests that for AI to truly integrate into human society, development may need to move beyond simply scaling up existing models toward architectures that better reflect how humans process social information.

With billions of dollars being invested in self-driving cars and social robots, this research is a reminder that teaching machines to navigate human interaction remains one of AI’s most important unsolved challenges.

