Eric Topol, Nature Medicine, and Nicholas Zaorsky, MD, highlight both the limitations of medical AI and new workflows built on it, including evidence that humans assisted by an LLM can underperform the LLM alone, and calls for model evaluation that goes beyond benchmarks.
performance of humans when assisted by an #LLM was inferior to the LLM alone
The current science of evaluating AI models, such as primarily relying on benchmarks, is far from optimal.
A Practical Workflow: Using AI to Help Publish Academic Medical Research
The need for embodiment for large language models