IIT Patna Releases Multimodal Hindi-English Medical Dataset


Researchers from IIT Patna have introduced MedSumm, a multimodal approach that combines Hindi-English code-mixed medical queries with visual aids, providing a more comprehensive view of a patient’s medical condition.

Click here to read the paper, which is also set to be published at ECIR 2024.

The researchers announce their intention to make the dataset, code, and pre-trained models publicly accessible.

The primary contributions of this research are the introduction of the MMCQS task, the creation of the MMCQS dataset, and the proposal of the MedSumm framework. The MMCQS dataset comprises 3,015 multimodal medical queries in Hindi-English code-mixed language, each accompanied by a golden summary in English that merges visual and textual data.
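To make the format concrete, a single MMCQS entry might look roughly like the sketch below; the field names and the example query are hypothetical illustrations, not the dataset's released schema.

```python
# Hypothetical sketch of one MMCQS record; field names and values are
# illustrative, not the dataset's actual schema.
from dataclasses import dataclass

@dataclass
class MMCQSRecord:
    query_text: str        # Hindi-English code-mixed patient question
    image_path: str        # accompanying visual aid (e.g. a photo of the affected area)
    symptom_category: str  # one of ENT, EYE, LIMB, SKIN
    gold_summary: str      # English summary merging textual and visual cues

example = MMCQSRecord(
    query_text=("Mere haath par laal chakatte ho gaye hain aur bahut khujli "
                "hoti hai. What should I do?"),
    image_path="images/skin_rash_0001.jpg",
    symptom_category="SKIN",
    gold_summary=("Patient reports an itchy red rash on the hand, visible in the "
                  "attached image, and asks for treatment advice."),
)
```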

The proposed framework harnesses the capabilities of LLMs and Vision Language Models (VLMs), namely CLIP, to facilitate multimodal medical question summarisation. The researchers showcase the tangible value of integrating visual information from images, demonstrating its potential to not only improve healthcare decision-making but also deepen the understanding of patient queries.
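The article does not detail the fusion mechanism, but a minimal sketch of the general idea, encoding the image with CLIP and conditioning one of the evaluated LLMs on both the query and the visual features, could look like this. The model checkpoints, the linear projection, and the prepend-one-visual-token strategy are assumptions for illustration, not the exact MedSumm implementation.

```python
import torch
from PIL import Image
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          CLIPModel, CLIPProcessor)

# Vision encoder (CLIP) and one of the LLMs evaluated in the paper (Mistral 7B here);
# checkpoints and the fusion strategy below are illustrative assumptions.
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
llm_name = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(llm_name)
llm = AutoModelForCausalLM.from_pretrained(llm_name)

# Map CLIP's image embedding into the LLM's hidden size (assumed fusion step).
visual_proj = torch.nn.Linear(clip.config.projection_dim, llm.config.hidden_size)

def summarise(query: str, image: Image.Image) -> str:
    """Generate an English summary conditioned on the code-mixed query and the image."""
    img_inputs = clip_proc(images=image, return_tensors="pt")
    img_emb = clip.get_image_features(**img_inputs)             # (1, projection_dim)
    visual_token = visual_proj(img_emb).unsqueeze(1)            # (1, 1, hidden_size)

    prompt = f"Summarise the patient's medical query in English:\n{query}\nSummary:"
    text_ids = tokenizer(prompt, return_tensors="pt").input_ids
    text_emb = llm.get_input_embeddings()(text_ids)             # (1, T, hidden_size)

    # Prepend the visual "token" to the text embeddings and generate the summary.
    inputs_embeds = torch.cat([visual_token, text_emb], dim=1)
    output_ids = llm.generate(inputs_embeds=inputs_embeds, max_new_tokens=80)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```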

The authors of the paper are Akash Ghosh, Arkadeep Acharya, Prince Jha, Aniket Gaudgaul, Rajdeep Majumdar, Sriparna Saha, and Raghav Jain from IIT Patna, along with Setu Sinha and Shivani Agarwal from the Indira Gandhi Institute of Medical Sciences, and Aman Chadha from Amazon Generative AI and Stanford University.

The researchers used Llama 2, Mistral 7B, Vicuna, FLAN-T5, and Zephyr-7B for the final summary generation. 

Leveraging the HealthCareMagic dataset derived from MedDialog data, which comprises 226,395 samples, the researchers removed 523 duplicates. Guided by medical doctors who are co-authors of the paper, they identified 18 medical symptoms that are difficult to convey through text and categorised them into four groups: ENT, EYE, LIMB, and SKIN.
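As a rough illustration of that curation step, the deduplication and symptom grouping could be sketched as follows; the column names and keyword lists are assumptions rather than the authors' actual pipeline, and only a few of the 18 symptoms are shown.

```python
from typing import Optional

import pandas as pd

# Hypothetical sketch of the curation step: drop duplicate queries and bucket the
# doctor-identified symptoms into the four visual categories. Column names and the
# keyword map are illustrative assumptions.
SYMPTOM_GROUPS = {
    "ENT":  ["ear discharge", "swollen tonsils", "nasal polyp"],
    "EYE":  ["red eye", "stye", "drooping eyelid"],
    "LIMB": ["swollen ankle", "bent finger", "knee swelling"],
    "SKIN": ["skin rash", "mole", "skin patch"],
}

def categorise(query: str) -> Optional[str]:
    """Return the first visual symptom group whose keywords appear in the query."""
    text = query.lower()
    for group, keywords in SYMPTOM_GROUPS.items():
        if any(keyword in text for keyword in keywords):
            return group
    return None  # symptom can be conveyed through text alone

df = pd.read_csv("healthcaremagic.csv")           # 226,395 samples from MedDialog
df = df.drop_duplicates(subset="question")        # removes the 523 duplicate queries
df["symptom_group"] = df["question"].map(categorise)
multimodal_subset = df.dropna(subset=["symptom_group"])
```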

The framework is intended to enhance doctor-patient interactions and medical decision-making by summarising the medical questions patients pose. Despite the increasing complexity and quantity of medical data, existing research has predominantly centred on text-based methods, sidelining the integration of visual cues.

Furthermore, prior work in medical question summarisation has been confined to the English language; this dataset expands it to Hindi-English code-mixed queries. The strategic integration of visual information from images aims to enhance the creation of medically detailed summaries.



Mohit Pandey
