Doctors Perform Better Than Internet Or App-Based Symptoms Checkers, Says Study (sciencedaily.com)
An anonymous reader quotes a report from Science Daily: Increasingly powerful computers using ever-more sophisticated programs are challenging human supremacy in areas as diverse as playing chess and making emotionally compelling music. But can digital diagnosticians match, or even outperform, human physicians? The answer, according to a new study led by researchers at Harvard Medical School, is "not quite." The findings, published Oct. 10 in JAMA Internal Medicine, show that physicians' performance is vastly superior and that doctors make a correct diagnosis more than twice as often as 23 commonly used symptom-checker apps. The analysis is believed to provide the first direct comparison between human-made and computer-based diagnoses. Diagnostic errors stem from failure to recognize a disease or to do so in a timely manner. Physicians make such errors roughly 10 to 15 percent of the time, researchers say. In the study, 234 internal medicine physicians were asked to evaluate 45 clinical cases, involving both common and uncommon conditions with varying degrees of severity. For each scenario, physicians had to identify the most likely diagnosis along with two additional possible diagnoses. Each clinical vignette was solved by at least 20 physicians. The physicians outperformed the symptom-checker apps, listing the correct diagnosis first 72 percent of the time, compared with 34 percent of the time for the digital platforms. Eighty-four percent of clinicians listed the correct diagnosis in the top three possibilities, compared with 51 percent for the digital symptom-checkers. The difference between physician and computer performance was most dramatic in more severe and less common conditions. It was smaller for less acute and more common illnesses.
"Eighty-four percent of clinicians listed the correct diagnosis in the top three possibilities, compared with 51 percent for the digital symptom-checkers. The difference between physician and computer performance was most dramatic in more severe and less common conditions. It was smaller for less acute and more common illnesses."
I'm surprised that digital diagnosis is that good already. The era of an "iDoc" app being as good as a gateway practitioner is probably not far off.
In the study, the doctors knew they had to perform well. In the real world, you're lucky if they even listen to you for two minutes before prescribing whatever the pharma rep recommended at the free lunch yesterday.
There is a hell of a lot more to observe with a patient than simply a checklist of yes/no values to see if someone has a particular diagnosis. For example, years back when I had a severe sore throat, I went in to the doc. She took one look at me, mentioned there is a unique smell associated with strep throat, did the test for it, and handed me a prescription for the antibiotics, all within a few short minutes. WebMD, as we all know, diagnoses cancer when you stub your toe!
ONLY apps can app apps, NOT LUDDITE doctors!
Apps!
That's called a reputable peer-reviewed journal, which is the highest standard, and an experiment conducted by rigorously trained experimenters. If you can find an actual flaw, don't just post it here; send it in and they will retract the study. Otherwise, try again.
You might want to reread all of that. "More than twice as often" means roughly 1/3 (~33%) accuracy for the entire group of 23 symptom-checking computer programs vs. roughly 2/3 (~66%) accuracy for the doctors. Machines have the advantage of pure data processing, so this result shows that instant recall and effectively infinite knowledge bases still don't measure up to the cognitive processes that trained medical doctors perform during diagnosis.
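For anyone who wants to check the "more than twice as often" claim against the exact figures in the summary (72 vs. 34 percent listed first, 84 vs. 51 percent in the top three), here is a quick sketch. The variable and function names are made up for illustration; only the four percentages come from the summary.

```python
# Percentages quoted in the story summary; names are illustrative only.
physician_first = 72  # physicians listing the correct diagnosis first (%)
app_first = 34        # symptom checkers listing it first (%)
physician_top3 = 84   # physicians with the correct diagnosis in their top three (%)
app_top3 = 51         # symptom checkers with it in their top three (%)


def ratio(physician_pct, app_pct):
    """Return how many times more often physicians were correct than apps."""
    return physician_pct / app_pct


first_ratio = ratio(physician_first, app_first)
top3_ratio = ratio(physician_top3, app_top3)

print(f"listed first: {first_ratio:.2f}x")
print(f"in top three: {top3_ratio:.2f}x")
```

So "more than twice as often" holds for the first-listed diagnosis, but the gap is narrower for the top-three figures.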
I managed to track down the actual text of the cases. TFA only added the human doctors to an analysis already done with the apps. The apps paper is http://www.bmj.com/content/351... and the cases are in the supplementary material ('data supplement') http://www.bmj.com/highwire/fi...
A 48-year-old woman with a history of migraine headaches presents to the emergency room with altered mental
status over the last several hours. She was found by her husband, earlier in the day, to be acutely disoriented and
increasingly somnolent. On physical examination, she has scleral icterus, mild right upper quadrant tenderness, and
asterixis. Preliminary laboratory studies are notable for a serum ALT of 6498 units/L, total bilirubin of 5.6 mg/dL, and
INR of 6.8. Her husband reports that she has consistently been taking pain medications and started taking additional
500 mg acetaminophen pills several days ago for lower back pain. Further history reveals a medication list with
multiple acetaminophen-containing preparations.
(This one is acute liver failure requiring emergency care).
An 18-month-old toddler presents with 1 week of rhinorrhea, cough, and congestion. Her parents report she is
irritable, sleeping restlessly, and not eating well. Overnight she developed a fever. She attends day care and both
parents smoke. On examination signs are found consistent with a viral respiratory infection including rhinorrhea and
congestion. The toddler appears irritable and apprehensive and has a fever. Otoscopy reveals a bulging,
erythematous tympanic membrane and absent landmarks.
(Acute otitis media - requires 'non-emergent care', i.e. needs professional medical care but is not an emergency)
A 34-year-old woman with no known underlying lung disease has a 12-day history of cough. She initially had nasal
congestion and a mild sore throat, but now her symptoms are all related to a productive cough without paroxysms.
She denies any sick contacts. On physical examination she is not in respiratory distress and is afebrile with normal
vital signs. No signs of URI are noted. Scattered wheezes are present diffusely on lung auscultation.
(Acute bronchitis, self-care appropriate.)
Four things in this world are sacred: books, children, liberty, and generosity.
If you judge medical expertise based on statistical outliers with undisclosed behavior and on handwriting quality, then you need to spend time in an asylum.
I'm a doctor, though not a diagnostician. Diagnosis is rarely hard - there are some hard cases, but they really mostly aren't. Do you have a persistently elevated blood glucose level? You have diabetes. Do you have consistently high blood pressure? You have hypertension. Etc. It's hardly surprising that computers are just as good as humans at diagnosing diseases that are mostly defined by strict, objective criteria.
What is harder is management - finding the right collection of drugs that will effectively treat a patient's diseases without introducing too many side effects. And what's even harder is anything procedural - we have no computers that can actually do procedures at all. Those aren't what most people think of as "going to the doctor", but it's what most doctors do - either manage disease, or do procedures, both of which are either mostly or severely beyond the ken of computers. Show me a computer that can do something as simple as put in an IV, and I'll be greatly impressed. So many subtleties boil down to "well, I saw something once that looked just like this, and the solution was X..." that it's worth trying X before going on to Y and Z.
My wife is a diagnostician - a neurologist. She sees stuff on a daily basis that would flummox any non-neurologist (really, I barely know what she's talking about half the time, and my peers would be much, much worse at that), let alone a computer. As the old joke goes, it's like being a car mechanic - who has to work on the car while it's doing 70 miles per hour down the highway, with zero downtime acceptable.
That's called a reputable peer-reviewed journal
... and all the peers are also doctors.
If you can find an actual flaw ...
Here is a flaw: The entire study was done with contrived "vignettes" rather than actual cases. The vignettes were written by human doctors, so the fact that other human doctors were better than apps at reading between the lines and figuring out the intended diagnosis does not mean that they would be better at diagnosing actual patients.
I think there is only one clear conclusion from this study: Doctors really don't like these apps.
Do you have consistently high blood pressure? You have hypertension.
That's not really a diagnosis. That's just a different name for the symptoms. Bonus points for diagnosing "Primary Hypertension", which of course means "yeah dunno".
SJW n. One who posts facts.
Why is it that the medical field gets paid for an incorrect diagnosis and its treatment just as it does for correct ones? I think performance would increase if they knew they wouldn't get paid, or would have to refund the fee.
Excellent point, but we can take it further. How about programmers only get paid when they produce bug-free code?
They are comparing doctor diagnosis vs. self diagnosis. It doesn't surprise me at all that doctors are better.
However, if we compare doctor vs. doctor plus software, the latter wins by a mile. The best diagnosis software out there is Isabel HealthCare, with proven, peer-reviewed results.
The listed authors are someone with a Bachelor of Arts, someone else with a Master of Arts, and a couple of medical doctors. The first MD appears to have completed a research fellowship (probably six months to a year). The senior author appears to be the most scientifically qualified, with an MSc in epidemiology. An MSc isn't exactly a high level of scientific training, although it is pretty good for an MD.
I have to write my own abstract this morning, but a quick scan of this thing brings up some concerns.
First, it's a "research letter" which is basically an abstract. There's very little detail about what they actually did.
Second, and perhaps most important, the responses from the humans were free text, which was evaluated (non-blinded) by the study authors to decide whether or not the respondents had listed the correct diagnosis; there's no discussion of what the evaluation criteria were, what they did if the top three couldn't be established, how partial answers were handled, or what they did if more than three diagnoses were listed or the diagnoses were not ranked.
Third, they have repeated responses from some physicians and not others, but their simple chi-squared test of proportions doesn't take that into account.
Fourth, there's no discussion of how the online programs were used: how did they input the case histories? What did they do if a question couldn't be answered? Was all the information in the case histories used by each of the programs?
Lastly, they list several limitations themselves: the vignettes they used are very simplified, the human respondents weren't controlled and may not be a representative sample (they were doctors who routinely use a volunteer diagnosis web site), and online symptom checkers are not the only type of diagnostic system and others may have superior performance.
A very specific diagnosis is, after all, just another name for a list of the symptoms; you're just complaining that the list isn't precise enough. How much are you willing to spend to try to figure out precisely what's causing it? It's not so much "yeah, dunno" as "yeah, not worth trying to figure it out".
I mean, there are lots of genetic mutations with "variable penetrance". Why do some people get just a touch, and others get slapped down hard? Could be auxiliary genes, could be genetic mosaicism, could be something else. Likewise with common diseases: there are many things that could cause symptoms XYZ, but once you've ruled out the ones that are going to kill you right soon, there's not much point in going on a long hunt for the exact causative agent, because the tests cost a lot and have false positives and negatives. Treat symptomatically. If it doesn't get better, look deeper. But most of the time, it does.