Doctors Perform Better Than Internet Or App-Based Symptoms Checkers, Says Study (sciencedaily.com)
An anonymous reader quotes a report from Science Daily: Increasingly powerful computers using ever-more sophisticated programs are challenging human supremacy in areas as diverse as playing chess and making emotionally compelling music. But can digital diagnosticians match, or even outperform, human physicians? The answer, according to a new study led by researchers at Harvard Medical School, is "not quite." The findings, published Oct. 10 in JAMA Internal Medicine, show that physicians' performance is vastly superior and that doctors make a correct diagnosis more than twice as often as 23 commonly used symptom-checker apps. The analysis is believed to provide the first direct comparison between human-made and computer-based diagnoses. Diagnostic errors stem from failure to recognize a disease or to do so in a timely manner. Physicians make such errors roughly 10 to 15 percent of the time, researchers say. In the study, 234 internal medicine physicians were asked to evaluate 45 clinical cases, involving both common and uncommon conditions with varying degrees of severity. For each scenario, physicians had to identify the most likely diagnosis along with two additional possible diagnoses. Each clinical vignette was solved by at least 20 physicians. The physicians outperformed the symptom-checker apps, listing the correct diagnosis first 72 percent of the time, compared with 34 percent of the time for the digital platforms. Eighty-four percent of clinicians listed the correct diagnosis in the top three possibilities, compared with 51 percent for the digital symptom-checkers. The difference between physician and computer performance was most dramatic in more severe and less common conditions. It was smaller for less acute and more common illnesses.
"Eighty-four percent of clinicians listed the correct diagnosis in the top three possibilities, compared with 51 percent for the digital symptom-checkers. The difference between physician and computer performance was most dramatic in more severe and less common conditions. It was smaller for less acute and more common illnesses."
I'm surprised that digital diagnosis is that good already. The era of an "iDoc" app being as good as a gateway practitioner is probably not far off.
In the study, the doctors knew they had to perform well. In the real world you're lucky if they even listen to you for two minutes before prescribing whatever the pharma rep recommended at the free lunch yesterday.
There is a hell of a lot more to observe with a patient than a simple checklist of yes/no values to see if someone has a particular diagnosis. For example, years back when I had a severe sore throat, I went in to the doc. She took one look at me, mentioned there is a unique smell associated with strep throat, did the test for it, and handed me a prescription for the antibiotics, all within a few short minutes. WebMD, as we all know, diagnoses cancer when you stub your toe!
That's called a reputable peer-reviewed journal, which is the highest standard, and an experiment conducted by rigorously trained experimenters. If you can find an actual flaw, don't just post it here; send it in and they will retract the study. Otherwise, try again.
I'm a doctor, though not a diagnostician. Diagnosis is rarely hard - there are some hard cases, but they really mostly aren't. Do you have a persistently elevated blood glucose level? You have diabetes. Do you have consistently high blood pressure? You have hypertension. Etc. It's hardly surprising that computers are just as good as humans at diagnosing diseases that are mostly defined by strict, objective criteria.
What is harder is management - finding the right collection of drugs that will effectively treat a patient's diseases without introducing too many side effects. And what's even harder is anything procedural - we have no computers that can actually do procedures at all. Those aren't what most people think of as "going to the doctor", but it's what most doctors do - either manage disease, or do procedures, both of which are either mostly or severely beyond the ken of computers. Show me a computer that can do something as simple as put in an IV, and I'll be greatly impressed. So many subtleties boil down to "well, I saw something once that looked just like this, and the solution was X..." that it's worth trying X before going on to Y and Z.
My wife is a diagnostician - a neurologist. She sees stuff on a daily basis that would flummox any non-neurologist (really, I barely know what she's talking about half the time, and my peers would be much, much worse at that), let alone a computer. As the old joke goes, it's like being a car mechanic - who has to work on the car while it's doing 70 miles per hour down the highway, with zero downtime acceptable.
That's called a reputable peer-reviewed journal
... and all the peers are also doctors.
If you can find an actual flaw ...
Here is a flaw: The entire study was done with contrived "vignettes" rather than actual cases. The vignettes were written by human doctors, so just because other human doctors were better than apps at reading between the lines and figuring out the intended diagnosis, does not mean that they would be better at diagnosing actual patients.
I think there is only one clear conclusion from this study: Doctors really don't like these apps.
Do you have consistently high blood pressure? You have hypertension.
That's not really a diagnosis. That's just a different name for the symptoms. Bonus points for diagnosing "Primary Hypertension", which of course means "yeah dunno".
The listed authors are someone with a Bachelor of Arts, someone else with a Master of Arts, and a couple of medical doctors. The first MD appears to have completed a research fellowship (probably six months to a year). The senior author appears to be the most scientifically qualified, with an MSc in epidemiology. An MSc isn't exactly a high level of scientific training, although it is pretty good for an MD.
I have to write my own abstract this morning, but a quick scan of this thing brings up some concerns.
First, it's a "research letter" which is basically an abstract. There's very little detail about what they actually did.
Second, and perhaps most important, the responses from the humans were free text, which was evaluated (non-blinded) by the study authors to decide whether or not the respondents had listed the correct diagnosis; there's no discussion of what the evaluation criteria were, what they did if the top three couldn't be established, how partial answers were handled, or what they did if more than three diagnoses were listed or not ranked.
Third, they have repeated responses from some physicians and not others, but their simple chi-squared test of proportions doesn't take that into account.
Fourth, there's no discussion of how the online programs were used: how did they input the case histories? What did they do if a question couldn't be answered? Was all the information in the case histories used by each of the programs?
Lastly, they list several limitations themselves: the vignettes they used are very simplified, the human respondents weren't controlled and may not be a representative sample (they were doctors who routinely use a volunteer diagnosis web site), and online symptom checkers are not the only type of diagnostic system and others may have superior performance.
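To put some shape on the third point about repeated responses, here is a rough sketch of why ignoring them matters. The counts below are hypothetical, picked only to match the 72 percent and 34 percent headline figures; the research letter's actual denominators are not used. The mean cluster size and intraclass correlation are also made-up assumptions, and dividing the statistic by a design effect is a simple first-order correction of my choosing, not what the authors did.

```python
def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic (no continuity correction)
    for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def design_effect(mean_cluster_size, icc):
    """Kish design effect: variance inflation when responses are
    clustered (here, several vignettes answered by the same physician)."""
    return 1 + (mean_cluster_size - 1) * icc


# HYPOTHETICAL counts, not taken from the study.
physician_correct, physician_wrong = 720, 280   # 72% correct
app_correct, app_wrong = 340, 660               # 34% correct

naive = chi2_2x2(physician_correct, physician_wrong, app_correct, app_wrong)

# HYPOTHETICAL clustering: about 5 responses per physician,
# intraclass correlation of 0.2.
deff = design_effect(mean_cluster_size=5, icc=0.2)
adjusted = naive / deff

print(f"naive chi-squared:    {naive:.1f}")
print(f"design effect:        {deff:.2f}")
print(f"adjusted chi-squared: {adjusted:.1f}")
```

With a gap this large the conclusion probably survives any reasonable correction, but the naive test treats every response as independent and so overstates how much evidence there is. A mixed-effects model with a per-physician term would be the more usual way to handle it.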