Microsoft Research says GPT-4 is good enough for medical tasks


Companies like Google are developing language models optimized for medical purposes. Microsoft believes that GPT-4 is sufficient.

According to Microsoft, large language models can help speed up medical processes by, for example, structuring “large unstructured data” that currently requires time-consuming manual processing.

As an example, Microsoft cites cancer drug development, where many clinical trials have to be abandoned because too few patients are recruited. Billions of dollars are wasted in lengthy processes.

This pie chart shows the reasons for discontinuation of cancer clinical trials. 38.7% of these discontinuations are due to insufficient enrollment. | Image: Microsoft

Large language models such as GPT-4 could significantly accelerate such processes by efficiently abstracting patient information from large clinical texts. Their impact here would be as transformative as it has been in programming or productivity applications.


GPT-4 achieves SOTA results without special medical training

Although GPT-4 was trained only on generic Internet data and not on specific medical data, it was able to structure complex clinical studies according to specified criteria. In this respect, it outperforms current systems such as Criteria2Query, even though they were developed specifically for this task.

Comparison of test results for structuring clinical trial acceptance criteria. GPT-4 outperformed the state-of-the-art method without requiring any special fine-tuning. | Image: Microsoft

OpenAI’s large language model could achieve expert-level performance on medical question-answer datasets such as MedQA (USMLE exam) without requiring “costly task-specific fine-tuning or intricate self-refinement”, according to the report.

Microsoft has also introduced language models such as BioGPT specifically for medical tasks, but is now making it clear that it will rely primarily on GPT-4 in the future.

For Microsoft Research, GPT-4 is also the dominant language model in the medical field. | Image: Microsoft

GPT-4 could also structure patient data sets on a large scale, for example in cancer research. The model could act as a kind of super-organizer, enabling the use of real-world data on an unprecedented scale.

Although pretrained on general web content, GPT-4 has demonstrated impressive competence in biomedical tasks straightaway and has the potential to perform previously unseen natural language processing (NLP) tasks in the biomedical domain with exceptional accuracy.


Toward evidence-based precision medicine

LLMs could also serve as universal annotators, supporting the training of other models by generating labeled examples from unstructured data or finding cause-and-effect relationships.


Microsoft recently introduced LLaVA-Med, a sort of chatbot for biomedical imaging data that is available to medical professionals. Google also recently unveiled Med-PaLM M, a multimodal medical model that can solve medical tasks in many domains and offers a chat mode.

The ultimate goal, according to Microsoft’s research team, is “precision health copilots” that can assist anyone involved in biomedical processes. They would provide a real-time view of large amounts of health data, accelerate care and new discoveries, and ensure a closer connection between clinical research and care.

Any clinical observation could be used immediately to update the patient’s health status. This would enable physicians and caregivers to make decisions based on the latest and most comprehensive evidence.

“This vision embodies the dream of evidence-based precision health. Generative AI, including large language models, will play a pivotal role in propelling us towards this exciting and transformative future.”
