Doctors Perform Better Than Internet Or App-Based Symptom Checkers, Says Study (sciencedaily.com)
An anonymous reader quotes a report from Science Daily: Increasingly powerful computers using ever-more sophisticated programs are challenging human supremacy in areas as diverse as playing chess and making emotionally compelling music. But can digital diagnosticians match, or even outperform, human physicians? The answer, according to a new study led by researchers at Harvard Medical School, is "not quite." The findings, published Oct. 10 in JAMA Internal Medicine, show that physicians' performance is vastly superior and that doctors make a correct diagnosis more than twice as often as 23 commonly used symptom-checker apps. The analysis is believed to provide the first direct comparison between human-made and computer-based diagnoses.
Diagnostic errors stem from failure to recognize a disease or to do so in a timely manner. Physicians make such errors roughly 10 to 15 percent of the time, researchers say. In the study, 234 internal medicine physicians were asked to evaluate 45 clinical cases, involving both common and uncommon conditions with varying degrees of severity. For each scenario, physicians had to identify the most likely diagnosis along with two additional possible diagnoses. Each clinical vignette was solved by at least 20 physicians.
The physicians outperformed the symptom-checker apps, listing the correct diagnosis first 72 percent of the time, compared with 34 percent of the time for the digital platforms. Eighty-four percent of clinicians listed the correct diagnosis in the top three possibilities, compared with 51 percent for the digital symptom-checkers. The difference between physician and computer performance was most dramatic in more severe and less common conditions. It was smaller for less acute and more common illnesses.
The listed authors are someone with a Bachelor of Arts, someone else with a Master of Arts, and a couple of medical doctors. The first MD appears to have completed a research fellowship (probably six months to a year). The senior author appears to be the most scientifically qualified, with an MSc in epidemiology. An MSc doesn't make someone highly trained in science, although it is pretty good for an MD.
I have to write my own abstract this morning, but a quick scan of this thing brings up some concerns.
First, it's a "research letter," which is basically an abstract. There's very little detail about what they actually did.
Second, and perhaps most important, the responses from the humans were free text, which the study authors evaluated (non-blinded) to decide whether or not each respondent had listed the correct diagnosis. There's no discussion of what the evaluation criteria were, what they did if the top three couldn't be established, how partial answers were handled, or what they did if more than three diagnoses were listed or the diagnoses weren't ranked.
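To make the scoring concern concrete, here is a hypothetical sketch of what explicit scoring rules for free-text answers could look like. Every rule in it (the synonym table, dropping blank entries, ignoring anything after the first three entries, exact matching with no partial credit) is an assumption for illustration, not what the authors did; the letter doesn't say.

```python
# Hypothetical scoring rules for free-text differential diagnoses.
# None of these rules come from the research letter; they illustrate the
# kinds of decisions that would need to be reported.

def normalize(dx: str, synonyms: dict[str, str]) -> str:
    """Lower-case and trim a diagnosis, then map known synonyms to one label."""
    key = dx.strip().lower()
    return synonyms.get(key, key)


def score_response(ranked: list[str], correct: str,
                   synonyms: dict[str, str], k: int = 3) -> tuple[bool, bool]:
    """Return (correct_first, correct_in_top_k) for one ranked list.

    Assumptions: entries are taken in the order written, blank entries are
    dropped, anything past the first k entries is ignored, and a match must be
    exact after normalization (no partial credit).
    """
    target = normalize(correct, synonyms)
    cleaned = [normalize(d, synonyms) for d in ranked if d.strip()]
    correct_first = bool(cleaned) and cleaned[0] == target
    in_top_k = target in cleaned[:k]
    return correct_first, in_top_k


def accuracy(responses: list[list[str]], correct: str,
             synonyms: dict[str, str], k: int = 3) -> tuple[float, float]:
    """Top-1 and top-k accuracy over all responses to one vignette."""
    scores = [score_response(r, correct, synonyms, k) for r in responses]
    n = len(scores)
    top1 = sum(first for first, _ in scores) / n
    topk = sum(hit for _, hit in scores) / n
    return top1, topk


if __name__ == "__main__":
    # Made-up responses to a made-up vignette whose answer is myocardial infarction.
    synonyms = {"mi": "myocardial infarction",
                "heart attack": "myocardial infarction"}
    responses = [
        ["Heart attack", "PE", "aortic dissection"],   # right, listed first
        ["pulmonary embolism", "pericarditis", "MI"],  # right, listed third
        ["GERD", "costochondritis", "anxiety", "MI"],  # right, but fourth
    ]
    top1, top3 = accuracy(responses, "myocardial infarction", synonyms)
    print(f"top-1: {top1:.2f}, top-3: {top3:.2f}")  # top-1: 0.33, top-3: 0.67
```

Change any one rule (count the fourth entry, reject the "heart attack" synonym, give partial credit) and the accuracy figures move, which is why the criteria need to be reported.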
Third, they have repeated responses from some physicians and not others, so the observations aren't independent, but their simple chi-squared test of proportions doesn't take that into account.
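A rough sketch of why clustering matters, using only the standard library. The counts (72/100 vs 34/100), the cluster size of about 5 responses per physician, and the intraclass correlation (ICC) of 0.1 are all hypothetical; the letter doesn't report them. Dividing the Pearson statistic by an assumed Kish design effect is a crude first-order correction, not what the authors did and not a substitute for a proper clustered analysis.

```python
import math


def chi2_two_proportions(x1: int, n1: int, x2: int, n2: int) -> float:
    """Pearson chi-squared statistic (no continuity correction) for the 2x2
    table comparing the proportions x1/n1 and x2/n2."""
    observed = [[x1, n1 - x1], [x2, n2 - x2]]
    row_totals = [n1, n2]
    col_totals = [x1 + x2, (n1 - x1) + (n2 - x2)]
    total = n1 + n2
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat


def p_value_1df(stat: float) -> float:
    """Upper-tail p-value for chi-squared with 1 degree of freedom,
    using P(chi2_1 > s) = P(|Z| > sqrt(s)) = erfc(sqrt(s / 2))."""
    return math.erfc(math.sqrt(stat / 2))


def design_effect(mean_cluster_size: float, icc: float) -> float:
    """Kish design effect for clusters of roughly equal size: 1 + (m - 1) * ICC."""
    return 1 + (mean_cluster_size - 1) * icc


if __name__ == "__main__":
    # Hypothetical counts: 72/100 correct vs 34/100 correct.
    naive = chi2_two_proportions(72, 100, 34, 100)
    # Hypothetical clustering: about 5 responses per physician, ICC of 0.1.
    deff = design_effect(5, 0.1)
    adjusted = naive / deff
    print(f"naive chi2 = {naive:.2f}, p = {p_value_1df(naive):.2g}")
    print(f"adjusted chi2 = {adjusted:.2f}, p = {p_value_1df(adjusted):.2g}")
```

Here the naive statistic is about 29.0 and the adjusted one about 20.7. In this made-up example the difference stays large after the correction; the point is that treating clustered responses as independent overstates the evidence. A more careful analysis would model the clustering directly, for example with a mixed-effects logistic regression or GEE with physician as the cluster.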
Fourth, there's no discussion of how the online programs were used: how did they input the case histories? What did they do if a question couldn't be answered? Was all the information in the case histories used by each of the programs?
Lastly, they list several limitations themselves: the vignettes they used are very simplified; the human respondents weren't controlled and may not be a representative sample (they were doctors who routinely use a volunteer diagnosis website); and online symptom checkers are not the only type of diagnostic system, so others may perform better.