Source: Huang, et al JAMA Ophthalmology February 22, 2024 LINK
We’re all watching eagerly to see how artificial intelligence can improve medical care, and some of the most interesting research has been published recently in JAMA Ophthalmology. One study, pictured above, showed that ChatGPT4 (a large language model chatbot) outperformed fellowship-trained ophthalmologists on questions regarding glaucoma cases seen at Mount Sinai in New York. Score one for artificial intelligence.
A second study showed that artificial intelligence was excellent at diagnosing retinopathy of prematurity from photographs, with a sensitivity of 80% (meaning it missed 1 out of 5 cases) and a specificity of 100% (meaning no false positives). This AI was used in conjunction with ophthalmologic evaluation, and it decreased the workload of the ophthalmologists by 80%. This demonstrates that AI can be used in a “hybrid” fashion to allow fewer clinicians to effectively treat more patients.
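For readers who want to see how sensitivity and specificity are calculated, here is a minimal Python sketch. The patient counts are hypothetical, chosen only to reproduce the 80% and 100% figures; they are not the actual counts from the study, and the function names are ours.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of patients WITH the disease that the test correctly flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of patients WITHOUT the disease that the test correctly clears."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts (assumed for illustration, not taken from the study):
# 100 infants with the disease, 100 without.
tp, fn = 80, 20   # 80 cases caught, 20 missed
tn, fp = 100, 0   # every healthy infant correctly cleared

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
miss_rate = fn / (tp + fn)  # the "1 out of 5" missed cases

print(f"Sensitivity: {sens:.0%}")
print(f"Specificity: {spec:.0%}")
print(f"Missed cases: {miss_rate:.0%}")
```

A test with high specificity but imperfect sensitivity rarely raises false alarms but still misses some true cases, which is why the study paired the AI with ophthalmologic evaluation.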
Source: Mihalache, A JAMA Ophthalmology February 29, 2024 LINK
A third study (above) showed that AI did poorly at answering ophthalmology multiple-choice questions based on a standard set of images. This study didn’t compare the chatbot to actual ophthalmologists, although we would all be nervous if a pediatric ophthalmologist were right only a bit more than half the time. This suggests that we will continue to need human oversight of AI in diagnostic ophthalmology.
Source: Hua, et al JAMA Ophthalmology July 27, 2023 LINK
Here is one more publication from JAMA Ophthalmology last summer. Researchers asked ChatGPT3.5 and ChatGPT4 to create scientific abstracts with references for 7 ophthalmology topics. Pictured above is a list of references created by ChatGPT4, which illustrates the problem of “hallucination.” These large language models “learn” what the most likely next word is, so they can produce references that look plausible but do not exist. The chatbot on average hallucinated 30% of references; in this example, the 7 references in the red box are false.
Implications for employers:
Artificial intelligence will continue to penetrate clinical medicine. It’s likely to be enormously helpful in documentation, reviewing images and suggesting potential diagnoses that a clinician has missed.
We are not yet at a point where we can trust large language models to practice without close human supervision, and we’ll need to carefully check references.
AI could lower the resource cost of care delivery, but it could also increase billed amounts. We’ll see if health care purchasers benefit from the increased productivity that AI can bring to health care delivery.