The Fallacy of Hard Tests

Posted by kdawson on Saturday June 16, 2007 @06:47PM from the do-the-math dept.

Al Feldzamen writes in with a blog post on the fallacious math behind many specialist examinations. "'The test was very hard,' the medical specialist said. 'Only 35 percent passed.' 'How did they grade it?' I asked. 'Multiple choice,' he said. 'They count the number right.' As a former mathematician, I immediately knew the test results were meaningless. It was typical of the very hard test, like bar exams or medical license exams, where very often the well-qualified and knowledgeable fail the exam. But that's because the exam itself is a fraud."

Re:Worthless by nephyo · 2007-06-16 19:52 · Score: 5, Informative

His argument is that the harder the test, the less relevant knowledge of the actual answers to the questions posed on the test is to determining your relative score. As a result, on a very hard test, two test takers with vastly different levels of knowledge of the correct answers do not, on average, end up with scores that reflect that difference.
The "educated guess" does not contradict that argument. Again, the harder the test, the smaller the difference between the number of candidate answers you can eliminate and the number another test taker can eliminate. With a sufficiently hard test, "educated guessing" makes no difference whatsoever.
So basically, with a multiple-choice test that counts only the correct answers, increasing the difficulty is not an effective way to make the test better at filtering out candidates with lesser knowledge of the subject matter. Increasing the difficulty only increases the degree to which randomness affects the results.
This is true, well known, and not very controversial. However, you would of course need to examine the specific tests in question to determine whether they are effective. They may have other features that help mitigate this effect. Also, his analysis is purely mathematical. It doesn't take into account the likelihood that a challenging test creates social pressure that leads people to self-filter. It could be argued that most of these tests are not testing the taker's knowledge of the material so much as the taker's ability to study and to react to the pressure the tests create.
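
A rough way to see the effect described in this comment is to compute how often a less knowledgeable candidate ties or beats a more knowledgeable one under number-right scoring. The sketch below uses a deliberately simplified model that is an assumption of this example, not something taken from the article: each candidate knows some questions for certain and guesses blindly among four choices on the rest. The function names are hypothetical.

```python
from math import comb

def score_pmf(n_questions, n_known, p_guess=0.25):
    """PMF of the number-right score under the assumed model:
    known items are always right, the rest are blind guesses."""
    n_guessed = n_questions - n_known
    pmf = [0.0] * (n_questions + 1)
    for lucky in range(n_guessed + 1):
        prob = (comb(n_guessed, lucky)
                * p_guess**lucky
                * (1 - p_guess)**(n_guessed - lucky))
        pmf[n_known + lucky] = prob
    return pmf

def prob_weaker_ties_or_wins(n_questions, known_strong, known_weak, p_guess=0.25):
    """P(weaker candidate's score >= stronger candidate's score)."""
    strong = score_pmf(n_questions, known_strong, p_guess)
    weak = score_pmf(n_questions, known_weak, p_guess)
    total = 0.0
    cum_strong = 0.0  # running P(strong score <= s)
    for s in range(n_questions + 1):
        cum_strong += strong[s]
        total += weak[s] * cum_strong
    return total

# In both cases the stronger candidate knows twice as many answers.
easy = prob_weaker_ties_or_wins(100, known_strong=80, known_weak=40)
hard = prob_weaker_ties_or_wins(100, known_strong=20, known_weak=10)
print(f"easier test: {easy:.6f}")
print(f"harder test: {hard:.6f}")
```

Under this model the knowledge ratio is the same in both runs; only the share of the test answered by guessing changes, which is the mechanism the comment points at.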

--
I grant all that I write to the public domain.
Re:warning moronic blog post linked by Looshi · 2007-06-16 21:11 · Score: 3, Informative

I just skimmed TFA, but it seemed to me like he was advocating a guessing penalty.
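
For reference, one common form of guessing penalty is "formula scoring": number right minus a fraction of the number wrong, with omitted questions scoring zero. Whether TFA proposes exactly this rule is an assumption here; the sketch below only shows why the usual penalty of 1/(choices - 1) makes a blind guess worth nothing on average. The function names are made up for this example.

```python
from fractions import Fraction

def formula_score(n_right, n_wrong, n_choices=4):
    """Number right minus 1/(n_choices - 1) per wrong answer.
    Omitted questions are not counted either way."""
    return Fraction(n_right) - Fraction(n_wrong, n_choices - 1)

def expected_guess_gain(n_choices=4, n_remaining=4):
    """Expected score change from guessing uniformly among the
    n_remaining answers that have not been eliminated."""
    p_right = Fraction(1, n_remaining)
    penalty = Fraction(1, n_choices - 1)
    return p_right - (1 - p_right) * penalty

print(expected_guess_gain(4, 4))  # blind guess among all four choices
print(expected_guess_gain(4, 3))  # one wrong answer eliminated first
print(formula_score(40, 60))      # 40 right, 60 wrong, none omitted
```

With this penalty a blind guess has zero expected value, while a guess made after eliminating at least one wrong answer still pays off on average.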
Re:Worthless by DocDJ · 2007-06-16 21:50 · Score: 2, Informative

Well, your IQ may be 140, but you don't understand IQ tests or probability distributions if you think 2*IQ == twice as smart.
Re:Worthless by gnasher719 · 2007-06-16 22:17 · Score: 1, Informative

'' Also, 2X as smart == 2X right answers? What the hell? My IQ is 140, find me somebody with an IQ of 70 and give us a test on anything. ''

Anyone who really has an IQ of 140 would know that IQ = 140 doesn't mean "twice as smart" as someone with an IQ of 70.

IQ is an adjusted measurement; adjusted in such a way that it is normally distributed with an average of 100 and a standard deviation of 15.
Re:Worthless by Firethorn · 2007-06-17 00:16 · Score: 2, Informative

We give 1 person 5 different tests. We allow for random guessing with no penalty, and the test is very hard. He takes them all and his scores vary wildly, but he averages 65% across all of the tests.

Statistics show that this would be very unlikely for 5 tests with questions pulled from a common pool.

The odds of WAGing a question on a four-choice multiple-choice test are 25%. When distributed over a hundred questions, random guessing will usually score between 20% and 30%, and that's for guessing the entire test.
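
The 20%-to-30% band can be checked directly with the binomial distribution. This is a minimal sketch assuming 100 independent questions, four choices each, and pure blind guessing; the helper names are hypothetical.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def prob_outside(n, p, low, high):
    """P(X < low or X > high) for X ~ Binomial(n, p)."""
    inside = sum(binom_pmf(k, n, p) for k in range(low, high + 1))
    return 1.0 - inside

p_out = prob_outside(100, 0.25, 20, 30)
print(f"P(score < 20 or score > 30) = {p_out:.3f}")
```

Under these assumptions the chance of landing outside the band works out to something on the order of one in five, so such scores are uncommon rather than rare. Widening the band (for example `prob_outside(100, 0.25, 10, 40)`) drives the probability close to zero.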

--
I don't read AC A human right
Medical Specialist Exams have an Oral Component by neoshmengi · 2007-06-17 01:30 · Score: 3, Informative

The hardest part of most medical specialist exams is the orals. Nobody ever complains about the written component. You get to sit in a room with one or more examiners for a few hours of intense grilling. There is no way to hide any lack of knowledge and your deficiencies are exposed for all to see.

Also the US has a strange system of certifying specialists. After completing residency (usually based on putting in your hours) you can practice medicine under the appellation 'board-eligible.' Once you've passed your exams, then you can be called 'board certified.'

In Canada, you can't practice at all unless you pass your board (Royal College) exams. The exams are reputedly harder in Canada as well (from those I know who have written both).
Re:Disturbing by Brother+Seamus · 2007-06-17 02:04 · Score: 3, Informative

Almost all of the Professional Engineering certification exams in the United States are multiple choice, with no penalty for guessing incorrectly.
Re:He is totally and completely wrong. by kklein · 2007-06-17 02:30 · Score: 3, Informative

Well, a poorly-written item will always be out-fitting. If the answer doesn't match the question, then everyone will have to guess. If everyone has to guess, the information curve (a great graph I'd love to show you, but can't here, and need to go to bed) will be about flat. There should be a big hump that shows that it gives us a lot of information about people a certain number of standard deviations above or below the mean. Questions like you describe won't have that.
Also, I wrote about this in another comment, but a lot of the items you get on high-stakes tests don't really go toward your score. They are actually pilot items that the company is trying out a few thousand times to see if they work correctly before they start contributing to anyone's score.
I don't know any cheap item-writers, though. Everyone I know in this field has at least a master's degree, and most have PhDs. We don't come cheap.
As for being a good or bad item writer, there are just a handful of rules to follow to avoid the big blunders. After that, it's all about taking them for a spin and seeing how they handle. As I've said a few times now, there's no telling what will happen when you release these things into the wild. I've written items that I doted on and cared for and nurtured and cuddled and put my all into, fully expecting that they would grow up to be model items, ones that the other items would look up to and aspire to becoming, only to be totally and utterly betrayed by them in real-world piloting, my time and devotion wasted, finally having to drag them out back and shoot them in the back of the head. On the other hand, there are sometimes items you add to a section last-minute, just trying to get the number up for piloting or whatever, and find that you have written some ridiculously wonderful item purely by accident.
It gets easier with practice, though. To be fair, I'm not a very good item-writer. But that is why I, especially, need the stats.
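
Since the information curve mentioned in this comment can't be drawn here, a small numeric sketch may help. It assumes a Rasch (one-parameter logistic) item, which is a guess based on the "out-fitting" terminology and not something the comment states; the function names are invented for the example. It also evaluates an item that everyone answers by blind guessing, whose success probability does not depend on ability at all.

```python
import math

def rasch_prob(theta, difficulty):
    """Probability of a correct answer under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

def rasch_information(theta, difficulty):
    """Item information for a Rasch item: P * (1 - P)."""
    p = rasch_prob(theta, difficulty)
    return p * (1.0 - p)

def information_from_curve(prob_fn, theta, step=1e-4):
    """Fisher information for any right/wrong item:
    (dP/dtheta)^2 / (P * (1 - P)), using a numerical derivative."""
    p = prob_fn(theta)
    slope = (prob_fn(theta + step) - prob_fn(theta - step)) / (2 * step)
    return slope**2 / (p * (1.0 - p))

def pure_guess(theta):
    """An item nobody can answer: 25% correct at every ability level."""
    return 0.25

# Ability (theta) in standard-deviation-like units; item difficulty fixed at 0.
for theta in (-3, -2, -1, 0, 1, 2, 3):
    working = rasch_information(theta, 0.0)
    broken = information_from_curve(pure_guess, theta)
    print(f"theta={theta:+d}  working item={working:.3f}  guessed item={broken:.3f}")
```

The working item shows the hump, peaking where ability matches the item's difficulty, while the guessed item stays flat at zero information.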
Re:Disturbing by CmdrPorno · 2007-06-17 03:36 · Score: 2, Informative

I am currently studying for the bar exam (at the end of July). There is a one-day-long multiple-choice component in most states (including mine), a standardized national test. Every state that I know of also has at least a one-day-long essay component that is graded by an actual human, and many states also have a one-day-long performance test. So admission to the bar is not governed by just a multiple choice test.

--
Sent from my iPhone
Re:Worthless by Puff+of+Logic · 2007-06-17 05:13 · Score: 3, Informative

'' But you seem to be saying that, because such situations do occur, then it would be healthy to severely punish medical errors to the point where most doctors' first instinct is to do nothing, run another test, etc. Even though there may be times when that state of affairs would help certain patients, on the balance I think it would make medical care worse. ''

Indeed it would. My understanding is that the cost of defensive medicine (defensive in terms of liability) is not just measured in dollars; invasive, harmful, or otherwise painful tests are often done in a full-court-press just to say that every possibility was checked, regardless of whether such tests are indicated. That we, as a society, demand a level of perfection from our doctors that is simply unreasonable to expect from any human merely exacerbates matters. A doctor cannot openly say "guys, I screwed this one up, so learn from my mistakes" because the family will be howling for compensation and the lawyers will be trying to hush it all up. A failure to act (doing nothing, as the GPP suggests) is just as damning as doing the wrong thing, so what other choice does a physician have than to fire the medical artillery, even if he thinks only a BB gun is indicated?

I should immediately point out that IANAD but I hope to play one in front of an admissions committee soon, so I may be talking out of my rear. However, the above seems to be the sentiment of most doctors I've spoken to. I just got done with the MCAT recently, so this topic is a bit close to my heart! An interesting site with a good take on the situation is here.

--
P.P.S. I'm doing Science and I'm still alive.
Re:I find Mr. Feldzamen's post hard to believe. by Old+Wolf · 2007-06-17 12:42 · Score: 4, Informative

Did you actually read the article? His whole point was that the multi-choice test is invalid because it is too hard.