Human-AI teams make better medical diagnoses

A hybrid collective of humans and artificial intelligence can make medical diagnoses more accurately than either medical professionals or AI systems alone. New research analyzing more than 40,000 diagnoses shows that combining human expertise with AI models creates a diagnostic partnership that outperforms traditional approaches.
The study, published in the Proceedings of the National Academy of Sciences, examines how physicians and five leading AI language models diagnosed more than 2,100 clinical cases. When working together, these hybrid teams achieved diagnostic accuracy surpassing that of individual physicians and AI-only systems.
Complementary strengths and weaknesses
The key to this success lies in error complementarity: humans and AI systematically make different mistakes. When AI models fail to recognize the correct diagnosis, human physicians often provide the right answer, and vice versa.
“Our results show that collaboration between humans and AI models has great potential to improve patient safety,” explains Nikolas Zöller, a postdoctoral researcher at the Max Planck Institute for Human Development.
The team found that the AI models, taken collectively, performed better than 85% of human diagnosticians. Yet in many cases where the AI failed completely, humans knew the correct diagnosis, often ranking it first on their differential diagnosis lists.
Dramatic performance improvements
Adding just one AI model to a group of human diagnosticians, or one human diagnostician to a group of AI models, improved performance across multiple metrics (a rough sketch of one possible way to pool ranked diagnoses follows the list):
- Top-5 accuracy improves when the best AI model is combined with groups of physicians
- Even the worst-performing AI models improve human diagnostic teams
- Multiple AI models working together often outperform any single system
- Hybrid teams show the most reliable results across all tested medical specialties
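As an illustration only, here is a minimal Python sketch of how ranked differential diagnoses from physicians and AI models might be pooled and scored. It uses a simple Borda-style aggregation and invented case data; it is not the aggregation method used in the study.

```python
# Illustrative sketch, not the study's actual method: pool ranked differential
# diagnoses from several team members into one collective ranking, then score
# top-k accuracy. All diagnoses and rankings below are invented.
from collections import defaultdict

def aggregate_rankings(ranked_lists, list_length=10):
    """Borda-style aggregation: diagnoses ranked higher by more team members
    accumulate a larger score."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for position, diagnosis in enumerate(ranking):
            scores[diagnosis] += list_length - position  # higher rank -> more points
    return sorted(scores, key=scores.get, reverse=True)

def top_k_accuracy(collective_rankings, correct_diagnoses, k=5):
    """Fraction of cases whose correct diagnosis appears in the collective top k."""
    hits = sum(truth in ranking[:k]
               for ranking, truth in zip(collective_rankings, correct_diagnoses))
    return hits / len(correct_diagnoses)

# Hypothetical single case: two physicians plus one AI model.
case_rankings = [
    ["pulmonary embolism", "pneumonia", "heart failure"],  # physician 1
    ["pneumonia", "pulmonary embolism", "bronchitis"],     # physician 2
    ["pulmonary embolism", "bronchitis", "pneumonia"],     # AI model
]
collective = aggregate_rankings(case_rankings)
print(collective[:5])
print(top_k_accuracy([collective], ["pulmonary embolism"], k=5))  # 1.0
```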
Real-world clinical potential
The researchers used clinical vignettes from the Human Diagnosis Project, which provide descriptions similar to the real-life cases physicians encounter in practice. Each case includes patient symptoms, medical history, and test results, posing a realistic diagnostic challenge.
“This is not about replacing humans with machines. Instead, we should think of artificial intelligence as a complementary tool that delivers its full potential in collective decision-making,” noted Stefan Herzog, a senior research scientist at the Max Planck Institute for Human Development.
The study used natural language processing techniques to standardize diagnoses against the SNOMED CT medical terminology, allowing accurate comparisons between human and AI responses. This standardization let the researchers analyze diagnostic accuracy at different ranking positions and across medical specialties.
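As a rough illustration of what such standardization involves, the sketch below maps free-text diagnoses to SNOMED CT concept IDs before comparing them. The lookup table is a hypothetical stand-in for the study's NLP pipeline, although the two concept IDs shown are genuine SNOMED CT codes.

```python
# Minimal sketch, assuming a hypothetical synonym-to-concept lookup table in
# place of the study's actual NLP pipeline. The concept IDs are real SNOMED CT
# codes; everything else is illustrative.
SNOMED_LOOKUP = {
    "heart attack": "22298006",            # Myocardial infarction
    "myocardial infarction": "22298006",
    "pneumonia": "233604007",              # Pneumonia
}

def to_concept(diagnosis_text):
    """Normalize a free-text diagnosis to a SNOMED CT concept ID (or None)."""
    return SNOMED_LOOKUP.get(diagnosis_text.strip().lower())

def same_diagnosis(answer_a, answer_b):
    """Two answers match only if they map to the same concept."""
    a, b = to_concept(answer_a), to_concept(answer_b)
    return a is not None and a == b

print(same_diagnosis("Heart attack", "myocardial infarction"))  # True
print(same_diagnosis("Heart attack", "pneumonia"))              # False
```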
Error patterns reveal opportunity
When the AI models completely missed the correct diagnosis (which, depending on the model, happened in 34% to 54% of cases), humans compensated by providing the correct answer in 30% to 38% of those cases. Conversely, when humans failed completely, the AI models compensated in 31% to 51% of cases.
The analysis showed that humans and AI disagreed on their top diagnosis in a large share of cases, but this disagreement proved beneficial rather than problematic. The diversity of errors means that in collective decision-making the correct diagnosis comes up more often than any single wrong one.
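To make the complementarity figures concrete, here is a hedged Python sketch that computes, from invented per-case correctness flags, how often each side covers the other's complete failures; it is not the study's analysis code.

```python
# Illustrative only: given per-case flags for whether the human collective and
# the AI collective included the correct diagnosis, measure how often one side
# rescues the other's complete failures. The data below is invented.
def complementarity(human_correct, ai_correct):
    """Return (share of AI failures rescued by humans,
               share of human failures rescued by AI)."""
    ai_failures = [h for h, a in zip(human_correct, ai_correct) if not a]
    human_failures = [a for h, a in zip(human_correct, ai_correct) if not h]
    rescued_by_humans = sum(ai_failures) / len(ai_failures) if ai_failures else 0.0
    rescued_by_ai = sum(human_failures) / len(human_failures) if human_failures else 0.0
    return rescued_by_humans, rescued_by_ai

# Hypothetical flags for six cases (True = correct diagnosis was on the list).
human_correct = [True, False, True, True, False, True]
ai_correct    = [False, True, True, False, False, True]
print(complementarity(human_correct, ai_correct))  # (approx. 0.67, 0.5)
```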
Broader applications and limitations
Research coordinator Vito Trianni sees applications beyond medicine: “The approach could also be transferred to other critical areas, such as legal systems, disaster response, or climate policy, where complex, high-stakes decisions must be made.”
However, the researchers acknowledge important limitations. The study analyzed text-based case vignettes rather than actual patients in clinical settings. Whether the results translate directly to real medical practice requires further investigation.
The study also focused only on diagnosis, not treatment decisions. A correct diagnosis does not automatically guarantee optimal patient care, and the study did not examine how medical staff and patients would accept AI-based support systems.
What comes next
The findings hold particular promise for regions with limited access to health care, where hybrid human-AI systems could contribute to more equitable care. The approach may help bridge gaps in medical expertise while preserving essential human oversight.
With diagnostic errors estimated to cause roughly 795,000 deaths or permanent disabilities in the United States each year, these results suggest significant potential to improve patient safety through thoughtful human-AI collaboration rather than wholesale replacement of human judgment.