ETH and Stanford researchers present MIRIAD: a dataset of 5.8 million pairs to improve LLM accuracy in medical AI

by admin · June 25, 2025

Challenges of LLMs in medical decision-making: Addressing hallucinations through knowledge retrieval

LLMs are set to revolutionize healthcare through intelligent decision support and chat-based assistants. However, one major challenge is that they tend to generate factually incorrect medical information. A common solution is retrieval-augmented generation (RAG), where external medical knowledge is divided into smaller text chunks that an LLM can retrieve and use during generation. Current RAG methods, while promising, rely on unstructured medical content, which is often noisy, unfiltered, and difficult for models to interpret effectively. Medical knowledge needs to be better organized and presented so that LLMs can use it more reliably and accurately.
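As a rough illustration of the chunk-and-retrieve step described above, the following minimal Python sketch splits a text into fixed-size word chunks and ranks them against a query using bag-of-words cosine similarity. The helper names (`chunk_text`, `retrieve`), the chunk size, and the toy text are assumptions made for illustration only; real RAG systems typically use learned embeddings and a vector database rather than word counts.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk_text(text, max_words=9):
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=1):
    """Return the k chunks most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(chunks, key=lambda c: cosine(q, Counter(tokenize(c))), reverse=True)
    return ranked[:k]


# Toy two-sentence "document" (illustrative text, not taken from the dataset).
document = (
    "Metformin is a first-line therapy for type 2 diabetes. "
    "Aspirin reduces the risk of recurrent myocardial infarction."
)
chunks = chunk_text(document, max_words=9)
top = retrieve("What treats type 2 diabetes?", chunks, k=1)
print(top[0])
```

The retrieved chunk would then be placed in the LLM prompt as context. With noisy, unfiltered source text, chunks like these can cut across sentence boundaries or mix unrelated content, which is the weakness the article points to.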

Limitations of RAG methods in current healthcare AI

Although LLMs perform impressively across language tasks, they often fall short in domains that require up-to-date and precise knowledge, such as medicine. RAG offers a cost-effective alternative to expensive fine-tuning by grounding models in external literature. However, many current RAG systems rely on general-purpose text embeddings and standard vector databases that are not optimized for medical content. Unlike general domains, the medical field lacks large, high-quality datasets that pair medical questions with relevant answers. Existing datasets, such as PubMedQA or MedQA, are either too small, overly structured (e.g., multiple choice), or lack the open-ended, real-world responses needed to build a strong medical retrieval system.

MIRIAD dataset: Structuring medical QA with peer-reviewed grounding

Researchers from ETH Zurich, Stanford University, Mayo Clinic, and other institutions have developed MIRIAD, a large-scale dataset that includes over 5.8 million high-quality medical instruction-response pairs. Each pair is carefully rephrased and grounded in peer-reviewed literature through a semi-automated process involving LLMs, filters, and expert review. Unlike previous unstructured datasets, MIRIAD provides structured, retrievable medical knowledge, improving LLM accuracy on complex medical QA tasks by up to 6.7% and improving hallucination detection by 22.5% to 37%. The team also launched MIRIAD-Atlas, a visual tool covering 56 medical fields that enables users to explore and interact with this resource, supporting trustworthy AI in healthcare.
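To make the idea of structured, retrievable question-answer pairs more concrete, here is a small Python sketch. The `QAPair` schema, its field names, the `source_id` values, and the Jaccard word-overlap scoring are all hypothetical choices for illustration and are not taken from the dataset's actual format; the two example pairs are toy text.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class QAPair:
    question: str
    answer: str
    source_id: str  # hypothetical pointer back to the source passage


def tokens(text):
    """Return the set of lowercase alphanumeric tokens in text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard overlap between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_match(query, pairs):
    """Pick the pair whose question overlaps most with the query."""
    q = tokens(query)
    return max(pairs, key=lambda p: jaccard(q, tokens(p.question)))


# Toy pairs (illustrative only).
pairs = [
    QAPair(
        "What is the first-line drug therapy for type 2 diabetes?",
        "Metformin is commonly used as first-line therapy.",
        "paper-001",
    ),
    QAPair(
        "How does aspirin help after a heart attack?",
        "It lowers the risk of a recurrent event by inhibiting platelet aggregation.",
        "paper-002",
    ),
]

hit = best_match("first-line therapy for type 2 diabetes", pairs)
print(hit.answer, "| source:", hit.source_id)
```

Matching a user query against a question field, and returning a self-contained answer with a pointer to its source, is one plausible way a structured pair could be easier to retrieve and verify than an arbitrary text chunk.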

Data pipeline: Filtering and structuring medical literature using LLMs and classifiers

To build MIRIAD, the researchers filtered 894,000 medical articles from the S2ORC corpus and broke them into clean, sentence-based passages, excluding overly long or noisy content. They generated over 10 million Q&A pairs using LLMs with structured prompts, which were then reduced to 5.8 million through rule-based filtering. A custom classifier trained on GPT-4 labels narrowed this further to 4.4 million high-quality pairs. Human medical experts also verified the accuracy, relevance, and grounding of sampled pairs. Finally, the team created an interactive 2D map of the dataset, organized by topic and discipline, using embeddings and dimensionality reduction.
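The filtering funnel above can be checked with quick arithmetic. The sketch below uses only the counts reported in the article; because the text says "over 10 million" pairs were generated, the first figure is treated as a lower bound, so the first retention rate is an upper bound. The stage labels are chosen here for illustration.

```python
# Counts as reported in the article; stage labels are illustrative.
stages = [
    ("generated Q&A pairs", 10_000_000),  # "over 10 million": a lower bound
    ("after rule-based filtering", 5_800_000),
    ("after classifier filtering", 4_400_000),
]


def retention(stages):
    """Return (name, count, fraction kept relative to the previous stage)."""
    rows = []
    prev = None
    for name, count in stages:
        frac = 1.0 if prev is None else count / prev
        rows.append((name, count, frac))
        prev = count
    return rows


rows = retention(stages)
for name, count, frac in rows:
    print(f"{name}: {count:,} ({frac:.0%} of previous stage)")
```

Under these figures, rule-based filtering keeps at most about 58% of the generated pairs, and the classifier keeps roughly 76% of what remains.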

Performance improvement: Better QA accuracy and hallucination detection with MIRIAD

The MIRIAD dataset significantly improves the performance of large language models on medical tasks. When used for RAG, it raises model accuracy by up to 6.7%, even with the same amount of retrieved content. MIRIAD also improves models' ability to detect medical hallucinations, with F1 score gains ranging from 22.5% to 37%. In addition, training retriever models on MIRIAD improves retrieval quality. Because the dataset is structured and grounded in verified literature, it allows more precise and reliable access to information, supporting a wide range of downstream medical applications.
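For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall. The sketch below computes it for a hypothetical binary hallucination detector, where label 1 means "hallucinated". The labels and predictions are made-up toy values, not results from the study.

```python
def f1_score(y_true, y_pred, positive=1):
    """Compute F1 for the positive class from two equal-length label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data: 1 = hallucinated statement, 0 = supported statement.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

score = f1_score(y_true, y_pred)
print(f"F1 = {score:.2f}")
```

Here the detector finds three of the four hallucinations and raises one false alarm, so precision and recall are both 0.75 and F1 is 0.75.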

MIRIAD-Atlas: Visual exploration across 56 medical fields

In summary, MIRIAD is a large, structured dataset of 5.8 million medical Q&A pairs, grounded in peer-reviewed literature and designed to support a range of medical AI applications. It includes an interactive atlas for easy exploration and applies rigorous quality control through automated filters, LLM evaluation, and expert review. Unlike previous unstructured corpora, MIRIAD improves retrieval accuracy in medical Q&A and can help identify hallucinations in language models. Although not yet exhaustive, it provides a solid foundation for future datasets. Continued improvements could make retrieval more accurate, involve users more directly in it, and integrate better with clinical tools and medical AI systems.

Check out the paper, the GitHub page, and the dataset on Hugging Face. All credit for this research goes to the researchers on the project.

Sana Hassan, a consulting intern at Marktechpost and a dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. He is very interested in solving practical problems, and he brings a new perspective to the intersection of AI and real-life solutions.
