The Fallacy of Hard Tests
Al Feldzamen writes in with a blog post on the fallacious math behind many specialist examinations. "'The test was very hard,' the medical specialist said. 'Only 35 percent passed.' 'How did they grade it?' I asked. 'Multiple choice,' he said. 'They count the number right.' As a former mathematician, I immediately knew the test results were meaningless. It was typical of the very hard test, like bar exams or medical license exams, where very often the well-qualified and knowledgeable fail the exam. But that's because the exam itself is a fraud."
What a worthless post. He gave one situation where guessing is more important than knowledge, but didn't at all address the specifics of the tests he was talking about. A typical vapid blog that for some reason gets posted to /.
Stories like this could never get on Slashdot. Seriously, this is like a maths problem I'd give to my Year 9 kids. This is definitely not news, and certainly doesn't matter.
It's hard to believe this guy is really a mathematician. I read this with interest as I teach college classes and have to give tests. However, there's not much content in the article.
His point about only counting the correct answers is rather silly. In a test where each question is either right or wrong, counting the wrong answers into the score does not add any information (you can tell how many are wrong if you know how many are right). The only thing it does is change the scaling of the resulting scores. This only makes a difference if you have an issue interpreting the scores. He seems to want the scores to be proportional to the amount of knowledge someone has, so that if I have twice as much knowledge as you, my score is twice as high. But in the example case of a professional qualifying examination, all that matters is whether or not you achieve some minimum. Whether that is represented as % correct or % correct - % incorrect/2 really makes no difference.
Designing better tests generally involves moving beyond multiple choice, not manipulating the scoring process.
2 + 49 - (49 ÷ 2) = 75.5?
Seems like he added rather than subtracted the (49/2). Pretty much ruins the whole argument.
And now we know why this man is a former mathematician. This is just bad math.
Suppose the test is really hard and contains many answers which are wrong, but can be thought correct by a person who is moderately knowledgeable about the question. Now if you penalize guessing, I may answer 20 questions correctly and 80 with "reasonable" answers which are not correct; my score is 0, assuming 4 choices per question. On the other hand, someone who answers 10 questions correctly and puts random guesses for the other 90 will likely get a score close to 10.
Basically, multiple choice tests which are so hard that even successful candidates will get most questions wrong are worthless. Consider also the potential for undetectable fraud if, say, the janitor cleaning the instructors' room leaks questions in advance.
I haven't had many exams with multiple choice, but my university statistics course was one of them.
:-)
Each question had 5 options, and only one was correct. A correct answer gave 5 points, an incorrect answer gave -1 point.
Now, as the smart reader can guess, 4 x -1 + 5 = 1, so guessing still pays off... especially if one or more of the options are very unlikely to be correct.
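To make that arithmetic concrete, here is a minimal sketch of the expected value of a guess under the +5/-1 scheme with five options. The function name and the `eliminated` parameter are my own illustration, and it assumes exactly one option is correct and that the correct option is never among those ruled out.

```python
from fractions import Fraction

def guess_value(options=5, reward=5, penalty=1, eliminated=0):
    """Expected points from picking uniformly among the options not ruled out.

    Assumes exactly one option is correct and it is never eliminated.
    """
    remaining = options - eliminated
    p_right = Fraction(1, remaining)
    return p_right * reward - (1 - p_right) * penalty

# Expected points per guessed question as more options are ruled out.
for k in range(4):
    print(k, "eliminated:", guess_value(eliminated=k))
```

Under these assumptions a blind guess is worth 1/5 of a point on average, which is the "1 point over five guesses" above, and each eliminated option raises that. A penalty of 5/4 per wrong answer would make the blind guess break even.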
Did the teacher design this test incorrectly, since guessing was rewarded? Well, actually, the only test of real-life application of statistical knowledge was to understand this, so those who started to guess, basically demonstrated their statistical knowledge, and I guess that should be rewarded.
One of the questions was about the outcome of a distribution, where the value should be looked up in a distribution table that was used by the course. Only one of the 5 options was in the table as a result value. That made this one easy.
I am a French student and we very rarely, if ever, have multiple-choice questions (QCM in French) in our exams. When there are some QCM, like in the maths test of the baccalauréat, it counts only as a small part of the final grade, and it is very recent. The only QCM-only test I have taken was the TOEFL.
Is it that common in the US? Is it common even outside scientific studies?
If you have 100 questions, and 20 right ones and 20 wrong ones, it leaves 60 unanswered questions.
That's why the article talks about only counting right ones. In order to avoid guessing, there should be a difference between picking a wrong answer and not picking an answer at all.
As a medical student, I know how much our education is divided into what we do in real life, and what is the proper answer for exams. Quite often, during our education exercises, we're given scenarios like "A patient presents with symptoms X, Y and Z. What do you do next?". At that point, that's when the resident says "You would diagnose condition A from those symptoms, but for the exam, you'd say you'd get an MRI to rule out B". So many questions are basically about having intuition for where the question is guiding you to, rather than practical medicine. Often, it's extremely difficult to discern what the question wants. There will be some question along the lines of "A patient presents with general fatigue over the past 3 months, which one blood test do you want to order?" and you'll narrow down the answer choices to either thyroid stimulating hormone or a complete blood count. Both studies are equally important in the evaluation of fatigue, but the question wants you to know which one is more important. In real life, you would always get both because both conditions are fairly common, and you want to evaluate both at once to save the patient time and effort. However, the question will nail you if you don't know some obscure study which states that there is, like, a 1% difference in the incidence of hypothyroidism vs anemia in fatigue. Moreover, if you were on the hospital floor and you were to say "I'm getting only a CBC, because it's more likely," the resident will chide you for not considering hypothyroidism and getting the thyroid stimulating hormone as well, making you look bad. So yeah, learning for the test doesn't really ever end.
If anything, testing has become FAR FAR too easy; people pass CS courses and come out the other side with only a vague notion of how a computer works.
I won't claim his post is correct or not, but he claims the technology behind such tests is wrong and lets less educated people pass through with guessing, while more educated people try to pass without guessing and fail.
People see the tests produce poor selection, and make the tests harder and harder in an attempt to remedy this (but they won't, since it's the technology of the test that's wrong).
Then you come here and support his opinion 1:1 by claiming tests are too easy (i.e. should be harder) and idiots pass through.
Ironic, isn't it.
I think what he should have said is that multiple choice tests are a stupid idea (it's okay if one or two questions are a block of multiple choice lines but not the whole test). Let the student explain things in his own words.
This is really a question of statistics, not of mathematics. Having done experiments on MBA students, we found that a well written multiple choice question is more accurate than 4 well written essays. The fact that we can easily have 50 multiple choice questions and a maximum of 8 essays makes it a no-brainer that multiple choice is much more accurate.
So it isn't a matter of how you reward guessing (psychologists will say that rewarding guessing actually gives better accuracy). It is a question of how well written the questions are. Further, the pass rate has absolutely nothing to do with the fraction needed to pass. Even high school students understand this one. So he seems totally confused.
His basic assumptions are so retarded as to invalidate his own thesis. Yes, depending on the difficulty of the exam, the range between the best and worst candidates will narrow. But the effect of guessing only becomes important in the extreme cases he looks at (impossibly hard test vs. impossibly easy).
And who the hell sets multiple choice questions with only two options? Rerun the numbers with five options and report back. You'll find the guesser is far more severely punished.
Besides, pass rates as an indication of the difficulty of exams are a myth. Set any exam with even the slightest differentiation, and you can have whatever pass rate you like. You just pick your passing grade appropriately.
...the reasoning is... incomplete because it is based on an undefined variable ("knows twice as much as" (that itself is no easy task to measure)), and excludes the reasoning that, if a test is on the single subject the testee's 'level of knowledge' is "calculated" on, he with more knowledge/experience in that subject and its workings as a whole would have a greater chance of "guesstimating" correctly on the questions he was unable to answer with 100% certainty. Even more so if the test isn't a fixed set of true/false questions.
I'm sure it is possible to reduce such questions to mathematical formulae, but the algorithm would be much more complicated, and even then I think we could only be hitting at "closest averages".
> hard tests are meaningless? what's his solution, easy tests where even an idiot can score 100%?
No, you completely missed the point: hard _multiple choice_ tests are meaningless, esp. when counting only right answers without penalty for wrong ones, because the result depends more on how lucky you are (at guessing) than on actual knowledge. Maybe this is an overstatement, but there is no denying that multiple-choice can be problematic.
Though some of his logic was overblown (see the comments made directly on his blog), I think his larger point has some merit. In fields which require lots of studying before beginning as a professional, such as medicine and law, you always hear that you have to be absolutely brilliant to 'get in'. The fact of the matter is that this is not the case: you should be darn smart, but you needn't be the best student in the world to be successful as a doctor. Many of the students who go to law or medical school (I'd guess most) are completely qualified for positions in their respective fields, but by the same token, are not necessarily any more qualified than their peers: they've all studied the same material, had the same experience in the lab, and know the whole picture within a reasonable approximation of each other.
Yet to maintain the level of exclusivity that these careers have, there must be some way to select a subset of the candidates to proceed, and at this point, there are few distinguishing features among them. Some will be far and away brilliant, and will easily get a career regardless; but the majority can't be differentiated from one another. So, how should it be decided who is a doctor and who isn't? By making a test that's so hard it amounts to a randomising function, and then selecting a subset of top scorers to pass. Passing doesn't mean one is inherently more qualified; it just means one guessed better on that day. This also explains why people can pass on their second or third try: they are no better than their competitors the next time around, but eventually one will guess luckily, and get in. It'd be interesting to do some statistical analysis on how many tries it takes people to 'pass' a particular exam, and see if the results fit probabilistic models: If the results of such analysis fit too well, the test is too hard, whereas if they deviate greatly from probabilistic expectations, then the test is more likely to be an actual test of one's knowledge.
To be sure, there will be some individuals who can pass based entirely on their knowledge, just as there will be some individuals who simply aren't cut out for life as a lawyer that will fail the exam. But ultimately, it allows the higher-ups to select candidates for job positions based on the single indisputable criterion of the candidate having passed an exam, thus avoiding any messy issues when someone complains about them choosing a particular candidate in lieu of one better qualified.
Time for a terrible analogy, since it's 0300 here: Really hard exams are the bouncers at the door to the club of medical careers.
In many professional specialties, including law and medicine, there are times when a quick, decisive educated guess may produce better results than an exhaustively researched, definitively confirmed answer.
So tests that force students to do a lot of guessing may still be good tools for evaluating their professional qualifications.
A doctor or lawyer who can guess right may be superior to one who plods to the right answer only after many expensive lab tests or hours of legal research. That's not to say that doctors and lawyers shouldn't do lab tests and research -- of course they should. But there are many situations, especially time-sensitive ones, where quick judgment is more important than absolute knowledge: during surgery or a health crisis, during a trial or deposition, etc.
I had a teacher in college that gave very hard tests. Intel Assembly class. For a midterm, we had to decipher Object-Oriented Assembly, and decipher self-modifying code. After 3 weeks of introduction to Assembly.
I got an A, with an average of 58% in the class.
For the 2-hour final, he got up at the 1-hour point, and yelled: "The test is over. All pencils down." We just sat there dumbfounded for about 10 seconds, and then he said, "Just kidding. I always wanted to do that."
Ya, a real great pal there!
Worst teacher I had in college. He didn't last long.
I once had a test that had a check box for how confident you were your answer was correct, that affected your score the following way:
If you ticked "confident" and you were wrong, -2
If you ticked "confident" and you were right, +2
If you ticked "unsure" and you were wrong, -0
If you ticked "unsure" and you were right, +1
I guess the point is that it's advantageous to guess, but only if you choose the lesser-scoring option.
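A quick sketch of when each tick box pays off, using only the four point values listed above. The function name and the sample probabilities are my own choices for illustration.

```python
def expected_points(p_correct):
    """Expected score for each tick box, given the chance p_correct of being right."""
    confident = 2 * p_correct - 2 * (1 - p_correct)  # +2 if right, -2 if wrong
    unsure = 1 * p_correct                            # +1 if right, 0 if wrong
    return confident, unsure

for p in (0.25, 0.5, 0.75, 0.9):
    confident, unsure = expected_points(p)
    better = "confident" if confident > unsure else "unsure"
    print(f"p={p:.2f}  confident={confident:+.2f}  unsure={unsure:+.2f}  -> tick {better}")
```

Setting 4p - 2 = p gives a break-even at p = 2/3: below that, "unsure" has the higher expected score, and since "unsure" never loses points, a guess ticked "unsure" always has a non-negative expectation.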
Cue the underground brilliance of every slashdot troll claiming that he is no less than a genius and nothing truly mentally stimulating can be classified as difficult.
TFA makes sense. Observe:
News for nerds?: yes[ ] no[x]
Stuff that matters?: yes[ ] no[x]
Clearly the editorial process is fraudulent - as this is a multiple choice, it is obvious that guessing tends to count much more than knowledge.
From this we can conclude one of two things:
1) Zonk is bad at guessing
2) The author is speaking out of his ass
Tempting as it is, I am going to stick with 2... But I could, of course, be guessing.
You mean it ain't me noggin, it's me teachers?
As many have posted, this blogpost is mostly pretentious at best. However, in the post he states:
Now suppose the test is very hard. As hard as it could be actually. Suppose the test is so hard that I, with lesser knowledge, can only answer one question based on actual knowledge. I answer that question, and guess at the other 99. You, who know twice as much as I, can answer two questions based on knowledge. So you guess at 98 answers. As you can readily imagine, the odds of you getting a higher grade than I are very slight. In fact, over 45 percent of the time, in repeated trials, I would outscore you, even though my knowledge is half that of yours.
I'd like to point out the simple fact: in reality we don't worry about those who are two or three times as smart as the rest; their knowledge is mostly indistinguishable (as pointed out by the blogpost, albeit shakily). Rather, we are looking for those many magnitudes smarter than the rest (so smart in fact that they surpass the flaws that he has pointed out). And that's where those taking 'hard tests' succeed and others do not. All these flaws are arguably non-existent, but even supposing their existence, correcting them would do nothing for our ability to separate those capable of Med School or Law school from those who are not. Those capable, those in the very upper ring, are just so capable that they surpass the very flaws of the test itself.
So it's not just a "Typo" that distracts, it supports a completely faulty conclusion.
One is left wondering what kind of mathematics background the author had. Also, noting the dittography earlier ("question question"), whether proofreading or "checking your work" formed part of the author's training.
In any case, the post also assumes that test-makers don't spend an awful lot of time validating their tests; so instead of taking the rules from any given test, a couple of straw-men examinations are supplied.
Consider, for example, the case where a multiple-choice test featuring 4 possible answers penalizes wrong answers by one third of a point. In that case, guessing is not advantageous unless the examinee can eliminate two answers: hence "partial knowledge" can count for something.
Oh yeah, and who the heck said that the test was "hard" because most of the answers were unknown? Heck, if you look at your big standardized tests (such as the SAT), and just the multiple-choice parts, you'll find that, for those who take the test more than once, there's not much noise in their scores. So why should a medical exam be different?
I love the exams we had: a question was posed or a problem stated which required the knowledge we had learnt to solve it. Sometimes more than one question was asked to offer a lead. But no answer was given. Those are real tests. Applied knowledge. Usually, in multiple choice, with a very basic knowledge of the subject you can sort out, for many questions, the response that is most probable. This is how I breezed through my English multiple-choice exams at the university, and hell, look at how bad (or how good ;)) my English is. Face it, multiple choice might be an easy way out for professors to correct exams, but it is the poorest choice to test the student's knowledge and ability to reason.
In the long run he will score 1 + ( 99 * 0.5 ) = 50.5.
My expectancy is 2 + ( 98 * 0.5 ) = 51.
Seems I score more.
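Expected scores are only half the story; the blog's claim is about how often the weaker candidate comes out ahead. Here is a minimal sketch that computes that exactly for the 100-question scenario (one candidate knows 1 answer, the other knows 2, both guess the rest, only right answers count). The function names and the `options` parameter are mine, and it assumes every guess is an independent uniform pick.

```python
from math import comb

def binom_pmf(n, p):
    """Probability of k correct guesses out of n independent guesses, for k = 0..n."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

def weaker_vs_stronger(total=100, weak_known=1, strong_known=2, options=2):
    """Return (P(weaker scores strictly higher), P(tie)) when both guess all unknowns."""
    p = 1 / options
    weak = binom_pmf(total - weak_known, p)
    strong = binom_pmf(total - strong_known, p)
    win = tie = 0.0
    for x, px in enumerate(weak):
        for y, py in enumerate(strong):
            diff = (weak_known + x) - (strong_known + y)
            if diff > 0:
                win += px * py
            elif diff == 0:
                tie += px * py
    return win, tie

win, tie = weaker_vs_stronger()
print(f"weaker candidate ahead: {win:.3f}, tied: {tie:.3f}")
```

With two options per question the weaker candidate finishes strictly ahead roughly 44 percent of the time under these assumptions, and ties or beats the stronger one exactly half the time, so the half-point gap in expectation is mostly noise. Passing `options=5` reruns the same comparison for five-option questions.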
A close second was counting negatives. To make easy concepts more difficult on tests, professors would often throw in layers of negative concepts (which of these isn't...). As I took the test, I'd count on my fingers while saying negative, positive, negative, positive ad nauseam. Once, I counted five negatives in one question and its correct answer.
This wasn't testing my knowledge of the subject being taught. It was just seeing how well I could parse.
Of course essay tests are much more difficult to administer, though they are better indicators of your grasp of the subject.
I had a physics professor for two entire physics series. This man was... a machine. He was VERY intelligent, and was a VERY good teacher. He was, however, quite anal. He would not expect you to know things he hadn't taught, but he expected you to know what he had taught with *perfect* mastery.
He provided copies of all former tests, along with answers and how to solve them, to the local copy store for students to buy (amazingly, this prof DIDN'T try to take you on them, the only cost was of the copies). The tests changed VERY little over the years. Two nights before the exam, you were welcome to go to a study session, where he would take problems VERY similar to what would be on the test, and walk you through solving them. And he would let you take a 4x5 card into the test with you, with anything you wanted.
His tests consisted of three questions. Just three. At the end of the allotted hour for the exams, the majority of the class would NOT have finished. Those tests were *tough*. I also had a calculus professor who would give exams that consisted of just two problems, and few people finished them completely. That wasn't so much that he gave hard problems, just problems that took a lot of work to solve. That almost seems backwards, since the point of calculus is to make difficult problems easier... or at least possible, anyway. But he was VERY generous with partial credit.
What do you call a person who graduated at the bottom of their class in medical school?
A doctor...
It's a joke. Before you say anything, a doctor isn't a doctor until they pass the medical board tests.
Look, it's all bad 19th century design.
If the question said, pick the 'odd ones out', each worth n%, it's better.
There is no wrong or right unless the answer says so. But did the person designing the questions have a degree in writing/psychology/reading as well?
It's easy to know who is a rote learner vs a true genius; even Hawking flunked a lot at school.
He's right as far as it goes that a multiple choice test where the recipients know almost none of the answers is not very accurate at measuring their marginal knowledge.
However, in my experience, hard multiple choice tests have a different problem.
"Hard" can mean that you compare against a curve that's known for that particular test and that the curve has a long enough upper tail to seem to measure something at the upper end. I.e., the last couple of questions as you approach 100% are worth more than the questions before them.
The problem with that is that it seems to me that a common way to make a test have that longer upper tail is to make some of the questions ambiguous bad questions. If there are 10 questions on a test that are poorly designed where a knowledgeable person is likely to pick a "wrong" answer, then you can count on it that VERY few people will get all of the "right" answers. Instant "hard" test!
Mr. Feldzamen claims to have passed the Virginia bar exam, but I can't find any evidence he was ever admitted to the Virginia bar, or to any state bar (he's not in Martindale-Hubbell). He cites the Virginia bar exam -- which I also passed (IAAL, licensed to practice in CA and VA) -- as one of his examples of a "complete fraud." In fact, when I took the Virginia bar exam it had over a dozen one-hour essay components, testing each and every possible subject. By contrast, the California bar exam had essay tests covering six randomly chosen subjects out of a possible 15 or so, and it had other non-multiple-choice components. The multiple-choice section of every state's bar exam, the Multistate Bar Exam, is no walk in the park. So I don't understand how he includes bar exams in his claim that the tests are invalid. If anything, the low pass rate of bar exams, typically 50% or less among a candidate pool of mostly recent law school grads, suggests that they are very hard indeed.
I find the fact that medical and lawyer exams are based on multiple choice rather disturbing. As an engineer, almost all of my tests were long answer. Sure, some multi questions, but mostly show all your work or explain the whole process. And I just design systems and networks! Now someone can just luckily guess enough multiple choice questions and start slicing me up?
Like I said, disturbing.
A person has heartburn, do you:
A) Perform a colonoscopy
B) Perform open heart surgery
C) Tickle him
D) Fart
E) Refer him to Cowboy Neil
I'm going to Mexico for my next check up. At least you'll get tequila first....
I just skimmed TFA, but it seemed to me like he was advocating a guessing penalty.
Hah! Ok, how about a test on soap operas or celebrity trivia. Or sports?
In our first year of engineering school (in France. Call it college, for the USA), our math teacher only did multiple choice exams. I was always floored by how accurate the results of those exams were. Of course, all answers counted, and guesses were punished.
The rumor was that he had done his thesis on the subject of multiple choice exams. Sadly, he is retired now, and newer students no longer benefit from his type of quick and accurate exams.
The chess ranking is typically a 3-digit figure. Given two chess players, you can work out approximate odds of one winning from the difference in these figures. The figures are compiled from the games people have won, and the ranks of people they have played against. As in multi-choice tests, each individual question or game has a right (win) and a wrong (lose) solution, and a stalemate (not filling in anything) option. From this we can estimate ranks of people we have not met; we can estimate ranks of people in history; we can even estimate corrections for ranks between cultures. For instance in the 19th century, how might a woman chess player in London (where the culture did not encourage chess) rank against a man from Prague (where cafes typically had chess boards in the tabletop, and most people played with friends and strangers in their lunch hours) had their backgrounds been equal, and assuming a native talent for chess is spread equally? This last point is not obvious - the differences between London and Prague and between women and men may not be wholly cultural, but the others can discuss that ad et ultra nauseam.
No-one designed chess with a perfect solution, and yet we can rank people. The IQ tests started from a similar point. People did not understand what intelligence was, exactly, but if they made tests that seemed to be testing the right sort of thing, and got the best people to design the next round of tests, then it was hoped that an incisive test for intelligence would evolve, even though no-one had defined what intelligence was. Unfortunately, in the early days, what was being tested as 'intelligence' was probably better named "how like minded are you to the white male that designed the test". The test can be as exacting as a chess ranking if you do enough tests, but the figure is less useful because it is not a measurement of something abstract and useful (unless you were IBM in the sixties, looking for white males with short hair that would sing the company song, in which case it was perfect).
There is a further downside to IQ tests. If you sit and stare at them, you can often reason a second or third possible answer using different reasoning. I also have a problem with forms that means I think long and hard about the answer, and then tick the wrong box. The trick seems to be to work really quickly, and let your instincts drive your answer. I have only ever done about three IQ tests, and all of these were done ages ago for job applications to computer companies. The last one must have been twenty five years ago. We had two hours and 300 questions. I deliberately hammered through the questions, and handed the paper in after 40 minutes to avoid the temptation of fiddling with the answers. Incredibly, this seemed to cure my usual error rate with forms, and I got a perfect score. I wasn't any smarter that day - I just happened to be in the zone, I guess.
Didn't get the job, though. They thought I was too scary.
Thinking about the original post, though. The guy claimed that a multi-choice was 'fraudulent'. Isn't 'fraud' where someone is trying to deceive someone else? Multi-choice questions are an attempt to separate the test for the presence or absence of knowledge from the talents of presentation (good handwriting, confident presentation style, etc), but often flawed by laziness in trying to pass off examination skills to a computer. A good multi-choice questionnaire would have to be much longer than such tests usually are to reliably separate the thing you are trying to measure from the noise (think how much measurement goes into a chess ranking, for example). But 'fraudulent'? And this was supposed to be a legal exam? I have my doubts about the original posting. Interesting subject, though.
Just make it 10 possible answers... with 5 or 6 quite obviously wrong answers... for someone who studied...
Therefore the more "educated" the guess, the higher the probability of a reward...
The scenarios sketched in the post are not exactly close to the real world. Multiple choice tests, no matter how hard, can easily be constructed so that the probability of passing the exam simply by guessing is insignificant. For example:
There are 20 questions and you have to be correct on at least 16 of them. If there are just two options on each question, your chances of passing by guessing are about 1:170; if there are four options, your chances are about 1:2600000.
If you are interested see http://en.wikipedia.org/wiki/Binomial_distributio
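For anyone who wants to check the two figures for the 20-question example, a minimal sketch using the binomial distribution. The function name and parameters are illustrative, and it assumes every answer is an independent, uniformly random guess.

```python
from math import comb

def pass_by_guessing(questions=20, needed=16, options=2):
    """Probability of at least `needed` correct answers from uniform random guesses."""
    p = 1 / options
    return sum(
        comb(questions, k) * p**k * (1 - p) ** (questions - k)
        for k in range(needed, questions + 1)
    )

for options in (2, 4):
    prob = pass_by_guessing(options=options)
    print(f"{options} options: about 1 in {1 / prob:,.0f}")
```

Both results land close to the rounded odds quoted above. Raising `needed` or `options` shrinks the chance further, which is the point: the pass mark and the number of choices control how much blind luck can matter.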
Sadly it was kdawson who posted this turd. (Or did I miss the memo about trolling slashdot with misinformation that seems to be circulating.)
Summary: 2 + 49 - (49 / 2) = 26.5, not 75.5 as per the article.
I want the minutes of my life spent reading this back. The author rattles off a bunch of crap about his credentials, including math credentials, and then rolls out some bs which pretty much amounts to admitting he is trolling the slashdot submission queue.
Anyone want to refer me to less dumb versions of this site? At this point I'm just waiting for Jon Katz to start posting again.
Funnily enough, one of the most hardassed profs I ever had also taught the introductory assembler class. (except for us it was PDP-11 and 68K) His tests were legendary for their difficulty, and the average was somewhere in the 20-30% range. However, it was curved after the fact and was a perfectly valid exam since there was absolutely no opportunity to guess. He gave us self-modifying assembler code too, without telling us such a thing was possible in advance! He also had a unique way of assigning readings. He would say, "Have you read chapter X yet? If you haven't, you're screwed!" Still, despite his apparent sadism we did learn a lot in his class.
In a later course I had a prof who would run our class through proofs that would span 3 or 4 lectures. If you fell asleep once in that period of time you'd be utterly lost. At the end of his proofs he would often say, "Does this make sense? Does everybody get this? If not, you had better think about dropping the course!" (Somehow it was hilarious in his thick Indian accent. He really rolled the 'r' in dropping too.)
This article is nonsense from beginning to end. First, as others have pointed out, the arithmetic is wrong. Second, the point of penalising wrong answers is misrepresented: it's nothing to do with improving the accuracy of scoring for different abilities, but is to minimise the difference between those with equal abilities who choose to guess answers they don't know and those who don't. Finally, the model of abilities is completely wrong. A better student will not only know the answer to more questions, but will be more likely to be right on questions he guesses. So far from swamping the true difference, guessed answers add to the accuracy of the test.
Maybe I didn't read it carefully, but I think his math in that last example is all wrong. It seemed wrong to me. I didn't see how that kind of adjustment could amplify the difference the way it did.
I get the following:
The guy who answers 1 correctly and guesses at 99 ends up with an expected score of 1 + 49.5 -24.75 = 25.75
The guy who answers 2 correctly and guesses at 98 ends up with an expected score of 2 + 49.0 - 24.5 = 26.5
This is obvious, right?
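For anyone who wants to check the arithmetic, here is a minimal Python sketch. It assumes a 100-question true/false test, blind guessing on every unknown question, and the article's penalty of half a point per wrong answer. The function name `expected_score` and its parameters are made up for illustration.

```python
def expected_score(known, total=100, choices=2, penalty=0.5):
    """Expected score when every unknown question is guessed blindly.

    Assumptions (for illustration only): each known answer is right,
    each guess is right with probability 1/choices, and each wrong
    answer costs `penalty` points.
    """
    guessed = total - known
    expected_right = guessed / choices
    expected_wrong = guessed - expected_right
    return known + expected_right - penalty * expected_wrong


# Compare the one-answer and two-answer test takers from the example.
for known in (1, 2):
    print(known, expected_score(known))
```

Setting `penalty=0` gives the plain count-the-right-answers scoring, so the two schemes can be compared side by side.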
I wonder where he got his math education. It is fairly simple to show that there exists a mapping between the results on a multiple choice test and "actual knowledge" K=T+|e|, where |e| is the statistical error, accounting for guessing statistically. Subtracting for wrong answers etc. is just "psychology". The statistical uncertainty "e" can easily be reduced below any significant value with more choices and more questions.
The example the author shows maximized the statistical uncertainty of guessing, and is not relevant. To illustrate the point: take the 100 question true/false test.
A) If you give 1 point for correct and no point for wrong, the student will score from S0=50 (randomness) to S0=100 (perfect). Now calculate a new score
S = 2*S0 - 100, and you have results from 0 to 100 (round anything less than 0 up to 0).
B) Announce you will subtract 1 point for each wrong. Now you will get scores from T0=0 to T0=100, and your map is just T=T0.
don't cut it off www.mgmbill.org
The problem there is that averages are one thing, but in practice there still is a non-zero chance that he'll actually score higher than you do.
Let's say it's 20 questions, 4 possible answers each. He'll know 5 of those, has to guess 15. There's even a 1 in a billion chance that he'll get all 20 right. (4^15 = 2^30 = approx 1 billion.) If you gave that test in China, by now you'd have at least one guy who pulled exactly that stunt.
There's also the issue of how well those questions fit your and his domain of knowledge. Let's say you can't possibly test _all_ the questions, because that's usually the case. You can do it for state capitals, but you can't possibly cover a whole domain like medicine or law.
There are 50 states, you know 25, the other guy knows, say 12 (rounded down), so it's not impossible that the 20 questions are all from the 25 you don't know, but include all 12 that guy knows. In fact, assuming a very very very large domain (much larger than 50, anyway), there's about 1 in a million chance that all 20 questions will be from the 50% you don't know.
Now, when testing states that doesn't carry a deeper lesson, because (at least theoretically) all states are equally important. In other domains, like medicine, law, even CS, that's not the case: stuff ranges from vital basics to pure trivia that no one gives a damn about. (Or not for the scope of the problem at hand: e.g., if I'm hiring a Java programmer, asking questions about COBOL would be just trivia.)
And a lot of "hard tests" are "hard" just by including inordinate amounts of stuff that's unimportant trivia. E.g., if I'm giving a test for a unix admin job, I can make it arbitrarily "hard" by including such trivia as "in which directory is Mozilla installed under SuSE Linux?" It's stuff that won't actually affect your ability to admin a unix box in any form or shape. The fact that SuSE does install some programs in different directories is just trivia.
(And if that sounds like a convoluted imaginary example, let's say that some "hard" certification exams ask just that: where is program X installed in distribution Y? And at least one version of Sun's Java certification asked such idiotically stupid trivia as in which package is class X, or whether class Y is final. Who cares about that trivia? It's less than half a second to get any IDE to fill in the package for you. E.g., in Eclipse it just takes a CTRL+SPACE.)
And in view of that previous point, including trivia in an exam just to make it "hard" is outright counter-productive. There is a non-null chance that you'll pass someone who memorized all the trivia, but doesn't know the basics.
Not all knowledge is created equal, and that's one point that many "hard" exams and certifications miss. If a lawyer doesn't know the intricacies of Melchett vs The Vatican, who cares? In the unlikely situation that they need it, they can google it. If they don't understand Habeas Corpus, on the other hand, they're just unfit to be a lawyer at all. Cramming trivia into an exam can get you just that kind of screwed up situation: you passed someone who happened to know that Melchett vs The Vatican is actually a gag question, and that case name appears in Stephen Fry's "The Letter", yet flunked someone with a solid grasp of the basics and who knows how to extrapolate from there and where to get more information when he needs it.
Rewarding random guesswork is worse. Probably the most important thing one should know is what he _doesn't_ know, so he can research it instead of taking a dumb uninformed guess. Most RL problems aren't neatly organized into 4 possible answers, so it can be a monumental waste of time to just take wild guesses and see if it works. I've seen entirely too many people wasting time trying wrong guess after wrong guess, instead of just doing some research. E.g., I've actually witnessed a guy trying every single bloody combination between *, & and nothing in front of every single variable in a C function, because he never understood how poin
A polar bear is a cartesian bear after a coordinate transform.
Actually it has to be a % passing. If the supply of licensed doctors and attorneys were not limited, the costs for their services would reduce, so these exams have to be a part of the system to control the supply. A test may be written to ensure a spread (so it tests knowledge) and also to ensure that the passing score is largely unattainable. So, I think the analysis is incorrect. The tests are not too hard to be useful as tests, it is just that there is a conflict of interest as regards their use. As medical care begins to take on the characteristics of a human right as representation in court is a political right, perhaps we'll begin to see a breaking down of the cartel system so that medical and law educations are not restricted and final competency tests can be tests of competency rather than also being a link in a chain of controlling supply to increase price.
--
Electricity without fuel costs: http://mdsolar.blogspot.com/2007/01/slashdot-users-selling-solar.html
What I don't like about his reasoning is his assumption that "hard" tests will test substantial knowledge that even the most educated test-taker will not get correct. I would submit that such tests are poorly designed, at least for a final/qualification test.
If your goal is to teach X amount of material and you want to give tests to see how far along a student is at learning X, then such a test is okay, as there will be naturally parts of X that you haven't even taught yet that the student is likely to get wrong. However, if you're now giving a final/qualification test where a good student is expected to know all of X, then the test should test for all of X, and no more. Many of the students should be scoring very close to 100%. In this way, guessing doesn't become a large statistical factor in overall score.
If you administer such a test and even the best students are missing half the questions, either you're testing for more than X, or X was not taught very well. Now, in college classes, we want X from different classes in different years, at least recent ones, to be equivalent; that is, we want the kid who got 100% on the test this year to know as much as the one who got 100% last year. So one needs to be careful about lowering the standards for what qualifies as X knowledge. However, statistically speaking, it's very unlikely for a college or professional class to have a "poor" year where everyone in the class is a poor learner so they can't even reach 100% of X even if taught properly. So if the high score on a final is only 50%, I don't think it's a bad idea to either make the final easier or change one's teaching methods. The chances of it actually "dumbing down" the qualifications relative to previous years are small.
It's a shame this guy went to the effort of creating this blog post without making any effort to involve useful metrics of how informative a test really is. The words sensitivity, specificity, and variance don't come up at all. There is a kernel of truth here, which is that you can have both noisy items (not informative about the taker's knowledge) and informative items on tests, and hard tests tend to have more noisy items. The author seems to miss the point that two-choice items at which students guess maximize the error variance. In other words, he chooses the best possible case to support his argument, even though it's unrealistic. Five-choice guessing items contribute less, although it depends how the items are structured (if three of the choices are easily eliminated by even the worst students, then it's much closer to a two-choice item). As a thought experiment, if there were a million choices per "hard" item, they would contribute almost no variance to test scores. The article seems to make no reference to the true score variance among the test takers, which is obviously critical.
I would have liked to see an analysis of the relationship between the number of plausible choices per question and the probability of mis-ordering two test-takers (giving the less knowledgeable a higher score). That would have been a lot more informative than simply saying, essentially, "two is bad, you do the math for more -- but trust me, it's a mathematical certainty."
There is a kernel of truth here, that multiple-choice tests are often not that sensitive, and that when everyone is guessing on an item, it contributes only noise to the measure. At issue really is how much variance in the test score is explainable by knowledge. In other words, how much information is contained in the test score. An article that uses phrases like "mathematical certainty" and "complete fraud" is obligated to provide some legitimate analysis, or at least references to the literature, not just anecdotes.
Ugh. I just wrote a pretty polite reply at his page after skimming his idiotic article. Now that I've read it, I'm actually angry.
This guy knows NOTHING about testing. Nothing. He isn't even to the level of Classical Testing Theory (CTT), which is really not much more than means and Pearson correlations, and is nowhere near how high-stakes (and even medium- and low-stakes, increasingly) multiple choice (MC) tests work now, and how they have worked for many many years.
IAAP (I am a psychometrician). A big part of what I do for a living is design a particular MC test, pilot the items, and interpret the results. But I don't just count up the correct items and give you the percentage. Why? Because that would be insane. You can guess on those.
Oh, but he says this:
But suppose the grading attempts to adjust for guessing. There is no way of knowing what is in the mind of the test-taker, so the customary is to subtract, from the number correct, some fraction of the number wrong.
--Which is just fine until I tell you I have NEVER heard of dealing with guessing that way on a professional-level test.
As a general rule, we don't do any easy mathematics. At all.
Here is part of the output for a test I'm working on right now:
This is generated by RUMM2020, a tool for Rasch analysis. The Rasch model was developed in the 60s as an ideal model of item response. These are the stats on 3 items of this test. The two most important columns are Location and Probability.
The location is the item difficulty. Given the sample's performance on this item, and given their ability, how hard is this item? Item 35 is quite difficult; item 36, quite easy.
The probability is the p value for the chi square. Basically, if it's 0.05 or below, that item is operating significantly (statistically significantly, that is) outside of the model. It displays poor "fit." We generally toss these items before going on to the next step (ideally, these are weeded out during pilot testing, before the test goes live--in this case, it is an experimental test of a construct I'm not even sure exists anymore, but I digress). If an item has poor fit with the model, it is too much of a loose cannon, and its results cannot be trusted. This is what the benighted blogger (is there any other kind?) was whining about. That item is hard not because it is good, but because it is evidently stupid. The responses are all over the place, which means people were probably just guessing. Out it goes before it ruins any examinees' lives.
The next step is to get person locations. In the case of people, these numbers indicate the person's ability. This is calculated by looking at their performance on the items, given their difficulty (Which is calculated based on people's performance on them! Incestuous! But given a large enough sample, it all works out to a fine enough grain to be useful). Here is the output for the people:
So, the first person didn't do so hot; the last did pretty well (these usually top out at 3ish). As you can see in "DataPts," there were 125 items on this test. I started with 160. Do you hear that, Mr. Unexpected "Truths?" We have your back! We're not just handing you a naked score based on our crap items. WE PULL THE CRAP ITEMS.
That location score will usually be rescaled to something prettier, since no one would really like to see something like
Had an algorithms prof (of all things) give us a test where every question had the following possible answers:
Yes, No, Sometimes, Maybe, Unknown
Then, he had questions like 1. Some scientists believe that P=NP?
To which, of course, you could argue ANY answer is correct.
That being said, this blog post comes across as the usual whining we've all done or had to put up with through the years. No testing methodology is perfect, and everyone tests differently on different kinds of tests. Fact is, though, they're pretty damn good. It's a common belief that millions of people who are otherwise idiots are graduating with great grades, while millions of geniuses can't test well - but that's horseshit. The majority of people manage to test at their level of understanding. The fact that people actually notice the odd idiot who guesses well is the exception that proves the rule.
Endless arguments over trivial contradictions in books written by ignorant savages to explain thunder in the dark.
He should have stuck with the more useful observation that almost* any test with a very low pass rate will be unreliable.
All tests have a margin of error, although it's a rather taboo subject - when did you ever get a test result that stated the 95% confidence interval? If only a small proportion pass, there is a danger that these errors will dominate.
There are:
Now, an issue with multi-choice is the "guessing" problem, but there are (as TFA points out) work-arounds. TFA misses out the most important way of reducing guessing - which is designing the questions carefully so that each alternative is seductive and/or represents a common error. The real problem with multi-choice is the last two bullets above - it really is the most artificial and superficial form of test possible. Done well, it's a good way of quickly romping through a large domain to offset the "sampling" problem, but it should never be the totality of a test. The depressing problem is that it's so easy to mark and administer - and is cheap to deliver on computer (c.f. more ambitious computer-based testing, which is expensive to develop).
*I'm sure it's possible to contrive a counter-example.
In a survey of 100 programmers, 111111 thought that duck-typing was a good idea.
No, his implied definition of a hard test is that most test-takers know few correct answers, but the passing score is lowered to allow a sufficient number of test-takers to pass. Then, if you do the math (which he got somewhat wrong) allowing people to guess without penalty creates a significant probability that someone who knows less than you will pass even though you failed.
I read tfa, and aside from the bad math in the last paragraph I also have a problem with the logic of his example. He assumes that someone who knows twice as much as me should score twice as high as me. But that doesn't at all take into account that twice VERY LITTLE knowledge is still not a lot. I can understand that multiple choice will not be a perfect way of judging someone's knowledge, but think about how many people need to take the exam, and how few people there are to grade them. Not to mention it is impossible to ask long answer questions on every aspect of a course. Multiple choice questions are still not too bad a way of quizzing a large group of people on subject matter that has a large range of subtopics. Assuming of course the questions aren't uber-hard. Damn, this wasn't exactly something I wanted to read 2 days before my exams (which happen to be largely mc-questions)
assumptions. The premise is "IF we had person A that knows 2 times as much as person B, a well devised test ought to score person A twice as high as person B". No one is saying that they found person A and person B, i.e. two people where they can show beyond any doubt that A knows twice as much as B; that is exactly what an ideal test would do. The problem here is the construction of such an ideal test.
And surely it is possible to construct a test that approaches the ideal without having people A and B. All one has to know about A and B is that A will answer two times as many questions correctly, if they don't answer questions they don't know answers to. So, one can then see how to score tests to reflect that. And the answer is to penalize guessing to some degree, which will depend on the structure of the test.
As the island of our knowledge grows, so does the shore of our ignorance.
It is a naive assumption to think that the more knowledgeable should get in, it seems, but these tests do not test knowledge (all the people attempting BAR or medical license exam already have degrees). They are devised to cull and decide who "gets in", rather than test knowledge. :) World doesn't work that way.
As the island of our knowledge grows, so does the shore of our ignorance.
I know this isn't a comforting thought, but isn't some of the domain of doctors and lawyers in effect specialized, logical guesswork? For example, many diseases could share a common set of symptoms. Certainly, it takes knowledge, but it also takes a wee bit of luck.
...they are designed to touch on subjects which are likely to get them onto news sites like Slashdot...
So, is that the new sport? First there was First Post, then (before they switched) getting 50 Karma Points. Now we have to get Slashdot to Feature our Third Party Blog?
When our name is on the back of your car, we're behind you all the way!
The hardest part of most medical specialist exams is the orals. Nobody ever complains about the written component. You get to sit in a room with one or more examiners for a few hours of intense grilling. There is no way to hide any lack of knowledge and your deficiencies are exposed for all to see.
Also the US has a strange system of certifying specialists. After completing residency (usually based on putting in your hours) you can practice medicine under the appellation 'board-eligible.' Once you've passed your exams, then you can be called 'board certified.'
In Canada, you can't practice at all unless you pass your board (Royal College) exams. The exams are reputedly harder in Canada as well (from those I know who have written both).
I don't care what they don't know.
I give multiple choice exams with between 100 and 200 questions, and 4 possible answers.
Each correct answer is worth 2 points; they need to answer 50 correctly to get 100.
They don't HAVE to answer any question, or any number of questions. If they can answer 30 questions, they can get a D. Any question answered incorrectly is -1 point. This serves two purposes.
It prevents guessing, and it forces the student to consider whether they actually know the answer, or just think they do.
I typically give 4 of these per semester. After the first one I usually get several complaints because they're not used to testing in this way. After the second I usually get one or two stating they can't break the habit of answering every question. After the final, I get many compliments and high marks on my evaluations, and the students tell me they are much more confident in what they've learned than from any other class. I've had occasion to run across previous students from years past, and they claim they still remember more from my class than from others.
I've had administrators forbid me to do it this way. I did it anyway. When they saw the results, they relented, and many suggested the process to others.
"I may be synthetic, but I'm not stupid." -- Bishop 341-B
[i]For True-False exams for example, the number subtracted would most likely be (Number Wrong ÷ 2). Let's see how that would work out, for the sample case above. You, answering two questions correctly and guessing at 98 would be likely, on the average, to get 49 wrong, and so have a final score of 2 + 49 - (49 ÷ 2), or 75.5, while I, again on the average. answering only 1 correctly and guessing at 97, would get a final score of 1 + (97 ÷ 2) - ((97 ÷ 2) ÷ 2)), which comes out to be 25.25. Here there is a substantial difference between our scores, closer to the two-fold difference in our actual knowledge.[/i] Let's think about this: 51-24.5=26.5, not 75.5. Further, knowing one would mean guessing at 99, not 97: 1+(99/2)-(99/4)=25.75. This means the avg. difference when adjusting for guessing moves from .5 (average scores of 50.5 vs 51) to .75, hardly a substantial difference. Of course the numbers will separate out at greater levels of knowledge, as he showed earlier: if one person can answer 50 and the other 25, the average scores will be 62.5 and 43.75.
Now he probably simply didn't check his math, but twice in the same paragraph?
Parent post gets the point, and states it better than TFA.
I think the original article/clueless-blog forgets to factor in a very important fact: in well-designed multiple choice tests, the test has questions across a wide range of difficulties. Say you have a test with 80 questions, divvied up into four levels of difficulty (easy, moderate, difficult, impossible).
If you've mastered half the "moderate" material, you have an automatic ten question advantage over those who know only the easy material. Even if you're only equally adept at guessing the answers to the other questions (fifty for you, sixty for your opponent), it's very unlikely that your opponent will guess well enough to match your score.
Also notice that he insists on talking about tests with only twenty questions, even though most tests are significantly longer. Short tests certainly offer the best chance for guessing your way to a good score. But the only tests I've ever seen that were that short were the Novell Netware certification tests I took in 1999, and those were adaptive (read: the computer administering the test was selecting questions based on how well you were doing. The longer the test, the more questions you were getting wrong.)
You want the truthiness? You can't handle the truthiness!
A well designed test wouldn't be all multiple choice!
There are problems with his analysis. One problem is, the "examples" he cites don't actually exist. I'm guessing that there is no professional test out there where an unqualified candidate would know only one fewer question than a qualified candidate.
The author seems to suggest that a "hard" test is one where every question is brutally hard, and only a true zen master would answer a significant number of questions based on knowledge rather than guessing. In fact, well designed tests try to stagger the difficulty of questions to provide for maximum discrimination between candidates of varying levels of knowledge. There will be a handful of very easy and very difficult questions, but most will be at about "moderate" difficulty (think of it as a bell curve). So in reality, the difficulty of the test is governed primarily by the cutoff score. The higher the score, the less likely the easy questions and judicious guessing are going to save you.
The more knowledgeable candidate should be better at guessing questions he doesn't know. But even ignoring this, even as little as a five question advantage to the more knowledgeable candidate is huge. Say that X knows 25/80 questions, and Y knows only 20/80 questions. Each question has four answers, and each candidate guesses randomly on questions he/she doesn't know. We expect X to wind up with a score of around 38.75, and Y to end up with a score of 35. Even with most of the questions being guesses, there is a 75% chance of X winding up with a better score than Y. If he knows 30 questions, the odds rise to 95%. If he knows 35, the odds surpass the 99% mark. Because the difficulty of the questions is staggered, any two candidates with different amounts of "tested knowledge" are going to have a noticeable difference in the number of questions they can confidently answer.
It's one thing to claim that most tests would benefit by punishing guessing. But the author goes further, dismissing any test that doesn't punish guessing as utterly meaningless. I don't think that anyone who understands the probabilities involved could honestly describe them that way, "former background in mathematics" or no.
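The odds quoted above can be checked directly. The following is a rough sketch, assuming an 80-question test with four choices per question, purely random guessing on unknown questions, no penalty for wrong answers, and "better score" meaning strictly higher (ties count as not winning). The function names are invented for this example.

```python
from math import comb


def binomial_pmf(n, p):
    """Return [P(k successes) for k in 0..n] for n independent guesses."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def prob_x_beats_y(known_x, known_y, total=80, choices=4):
    """Probability that X's raw score is strictly higher than Y's.

    Assumes both candidates answer their known questions correctly and
    guess uniformly at random on the rest, with no guessing penalty.
    """
    p = 1 / choices
    pmf_x = binomial_pmf(total - known_x, p)
    pmf_y = binomial_pmf(total - known_y, p)
    win = 0.0
    for lucky_x, px in enumerate(pmf_x):
        for lucky_y, py in enumerate(pmf_y):
            if known_x + lucky_x > known_y + lucky_y:
                win += px * py
    return win


# X knows 25, 30, or 35 questions; Y always knows 20.
for known_x in (25, 30, 35):
    print(known_x, round(prob_x_beats_y(known_x, 20), 3))
```

The exact figures shift a little depending on how ties are counted, but the trend is the point: a handful of extra known answers quickly dominates the noise from guessing.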
You want the truthiness? You can't handle the truthiness!
Ask anyone with a teaching license and they will tell you that there is a lot of debate over what multiple choice tests actually test: knowledge of the material or ability to take tests. There are a lot of educators who argue that essays or portfolios are a more accurate measure of how much someone knows than multiple choice or true or false tests.
http://www.popularculturegaming.com -- my blog about the culture of videogame players
each correct answer worth c
and each incorrect answer worth -i:
If you have no idea which answer is correct, and (n-1)*i < c, then guess.
Likewise, if you can eliminate some of the answers, so you are only choosing from m possible correct answers, and m < n, then guess if (m-1)*i < c
It got me a full-tuition scholarship to an ivy-league school (where I learned to hyphenate adjectival phrases). Your results may vary.
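The guessing rule above falls out of a one-line expected-value calculation. Here is a small sketch of it, using the same symbols (c points for a correct answer, i points lost for an incorrect one, m choices left after elimination). The function names and the sample point values are assumptions for illustration, not any particular exam's scheme.

```python
def guess_value(remaining, correct_pts, wrong_pts):
    """Expected points from guessing uniformly among `remaining` choices."""
    p_right = 1 / remaining
    return p_right * correct_pts - (1 - p_right) * wrong_pts


def should_guess(remaining, correct_pts, wrong_pts):
    """The rule from the comment: guess when (m - 1) * i < c."""
    return (remaining - 1) * wrong_pts < correct_pts


# Example values only: 4 points per right answer, 1 point off per wrong one.
for remaining in (4, 3, 2):
    print(remaining, should_guess(remaining, 4, 1), guess_value(remaining, 4, 1))
```

The two functions agree by construction: the expected value of a guess is positive exactly when (m - 1) * i < c, and leaving the question blank is worth 0.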
In any case, please be sure to join the slashdotters posting 'you are an idiot' messages on the article's comments. It's important to keep up slashdot's reputation as the premier internet home of arrogant assholes.
Not all knowledge is created equal, and that's one point that many "hard" exams and certifications miss.
That might be less of a problem than you think. See these comments.
If a lawyer doesn't know the intricacies of Melchett vs The Vatican, who cares? In the unlikely situation that they need it, they can google it. If they don't understand Habeas Corpus, on the other hand, they're just unfit to be a lawyer at all.
This is a common misperception about law. It's actually more important to know the laws and cases than abstract concepts, because the concepts are defined solely by specific laws and cases. In applying a concept you must always provide a citation. The best lawyers are those with a giant capacity for remembering specific laws and cases, and applying them to current situations. A general grasp of concepts is useful in writing about law for the general public, but actually not that useful for practicing law.
Build a man a fire, he's warm for one night. Set him on fire, and he's warm for the rest of his life.
can be hard, and they don't mean that much: they are easy for people to cram for and pass without having a clue about how to do the work, and they cover things that you do not see or use in the real world, or they do things in a way that is not the best way to do it.
"'The test was very hard,' the medical specialist said. 'Only 35 percent passed.' 'How did they grade it?' I asked. 'Multiple choice,' he said. 'They count the number right.' As a former mathematician, I immediately knew the test results were meaningless.
He/she must not have been a very good mathematician. They're assuming that the reason that only 35% passed is because each individual question was very hard, in which case their argument is correct. However, it's more likely that each question is relatively easy, but to pass you have to get almost all of them right. Since it's impossible to know which of these situations was the case based on what the doctor said, the former mathematician couldn't have known the test results were meaningless. In my experience taking the MCAT, med school tests, and USMLE licensing exams, they're usually composed of many easy questions, and you have to get almost all of them right. That sort of mimics what being a doctor is like, which is that most of the time the diagnosis and treatment is fairly straightforward, but the tolerance for mistakes is very, very low.
I demonstrated to the very young "professor" that by changing placement alone of various factoids my grade could have been anywhere from a high B to an F. What made matters worse, he declared that the bell shape curve of the outcomes validated his scheme. I couldn't quite get across the bell shape curve that, for instance, loaded dice create.
My adviser agreed with me but stayed out of the dispute. I am still angry!
Yeah, and what if you don't have a Java compiler?! Tests should require you to demonstrate you can compile into bytecode! And what if you don't have a computer?! You should have to be able to execute said bytecode. What if you're forced to do it in zero-G? Can you do it?
In the real world, 'skill' is 'How well can you use tools to desired effect?', not 'How well can you operate without tools?'. Why? Because when people do things, they use tools. All the tools they can, to make their job easier.
If corporations are people, aren't stockholders guilty of slavery?
If nobody has suggested this until here:
What about a plurality of answers being potentially correct ?
Let's say 4 alternatives; and 0,1,2,3,4 may be correct.
Now we could consider
- the answer correct in case of all ticks being correct (resp. correctly unticked)
- to allocate partial marks: '+' for correct ticks and '-' for incorrect ones
At least, in both cases guessing will deliver close to nothing.
The blogger touts his math/stat skills and then argues that multi choice scores are a fraud. Like many self proclaimed experts, this one falls short. He posts a formula without variables and shows the wrong answer.
Worse as a supposed stats expert, he also quotes the formula for guessing incorrectly.
He doesn't mention that standardized tests that use the "guessing formula" do not require one to guess. If you know only two answers and answer ONLY those questions, there is no penalty for unanswered questions.
Also, his extreme examples aren't the ones that support his hypothesis. His two primary examples were both 100-question True/False tests. In one "extreme" example, one person knows one answer and the other knows two. In that case, regardless of the math, we still know, on average, who the more knowledgeable person is. His second example on this test compared two people, one knowing all the answers and one knowing half. Again, applying the guessing formula widens the delta, but we still know who's more knowledgeable.
The example that exposes the fraud is a 100 question T/F test where one person knows 50 and marks guesses for the other 50, while the second person knows 64 answers and doesn't guess, leaving 36 questions unmarked. Person 1 is going to average a score of 75, frequently a passing grade, while the more knowledgeable person scores a 64, often a failing grade.
However, the blame here lies with the test preparation. If there is no "guessing" penalty for wrong answers, then all test takers should guess on all unknown questions. If all do, then the person who knew 64 answers will on average score an 82, beating the person who knew only 50. If there is a "guessing" penalty for wrong answers, then whether or not the test taker makes blind guesses is irrelevant. As another reply to the blogger points out, knowledgeable people rarely are blind guessers and thus should guess, as they are likely to beat the odds of the guessing penalty.
If there is a fraud, it is if the standard for passing is so low that a person making random guesses can pass the exam one out of three or four times.
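The averages quoted above for the 100-question True/False scenarios can be checked with a short sketch. The function names are made up for illustration, and the penalty used is the textbook formula score (right minus wrong divided by choices minus one), which is an assumption about what "guessing penalty" means here:

```python
def expected_number_right(known, total=100, guess_unknown=True, choices=2):
    """Expected score when only correct answers are counted."""
    unknown = total - known
    # Each blind guess is right with probability 1/choices.
    guessed_right = unknown / choices if guess_unknown else 0.0
    return known + guessed_right


def expected_formula_score(known, total=100, guess_unknown=True, choices=2):
    """Expected score with the assumed penalty: right - wrong / (choices - 1)."""
    if not guess_unknown:
        return float(known)
    unknown = total - known
    right = unknown / choices
    wrong = unknown - right
    return known + right - wrong / (choices - 1)


if __name__ == "__main__":
    print(expected_number_right(50))                       # knows 50, guesses 50
    print(expected_number_right(64, guess_unknown=False))  # knows 64, leaves 36 blank
    print(expected_number_right(64))                       # knows 64, guesses 36
    print(expected_formula_score(50))                      # penalty cancels blind guesses
```

Under that assumed penalty, blind guessing neither helps nor hurts on average, so the expected score equals the number of answers actually known.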
Well, since you are much more of an authority in this subject area than most of us on Slashdot, perhaps you could give me some insight on this little conundrum?
If a man is walking in a forest, and he's talking to himself, and there are no women around, is he still wrong?
I just wanted to say, I think that 95 percent of all exams are cop-outs, whether issued so deliberately or just because they were lulled-up that way. This is not including 'take-home' exams. In a perfect world, rather than spend all the resources we have on lawyers, advertising, physical distribution of virtual goods, cash registers their operators, and who knows what else, we could have more people compensated to learn how to teach and have them teach and spend time assessing students individually or in smaller groups over longer periods of examining, paying attention to who they actually are and what they have to say. And even in this perfect world, there would still be more room for people to become teachers through the returns of what that education gives back. To those who say that machines, computers, paperless offices and trust-based systems take jobs away from people, who might also say that exams are the natural result of logistics, I say - please consider the nature of what education provides a society and how far a human mind can actually go. And please do not give up.
The typical college entrance exams I took had 4 choices each, with 4 points for a correct answer and -1 for a wrong answer. The thing was, in a lot of cases you may not know the correct answer, but if you know something about the subject, 1 or 2 of the choices are obviously wrong. Now if you eliminate those and guess amongst the rest, your chances are much higher than if you simply skip a question you don't know the answer to. Mathematically, with 2 choices eliminated you have a .5 chance of guessing right, so the expected value is 4*.5 - 1*.5 = 2 - .5 = 1.5, as opposed to 0 for not attempting. I think this kind of guessing is fair, as you are getting rewarded for your partial knowledge, which lets you eliminate at least the nonsense solutions.
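A small sketch of that expected-value arithmetic, using the +4/-1 scoring described above. The function name is hypothetical, and the guess is assumed to be uniformly random among the choices left after elimination:

```python
from fractions import Fraction


def guess_expected_value(choices_left, reward=4, penalty=1):
    """Expected points from guessing uniformly among the remaining choices."""
    p_right = Fraction(1, choices_left)
    return reward * p_right - penalty * (1 - p_right)


if __name__ == "__main__":
    for left in (4, 3, 2):
        print(left, "choices left:", guess_expected_value(left))
```

With two choices eliminated this gives 3/2, the 1.5 worked out above, and skipping the question is always worth 0.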
**Life is too short to be serious**
I once took a test that I knew absolutely nothing about, and got the best score of anyone taking the test!
...
...
:( But I got the best score out of the 100 or so people taking that test at that site.
I was in high school, so this is almost 30 years ago. Also, I was the brightest kid my school had ever seen, literally. Okay, not all that hard when a typical graduating class only has 50 people in it (this was a rural school system), but still
In my junior year (that is, grade 11), they decided to have me represent the school in all sorts of competitions. Math and science - math was really my specialty, I had taken every math class they had to offer by this time, but I also did well in science. So anyway, this contest comes up in which you're allowed to take 2 subjects. Of course I took math and did okay. Not spectacular, my school didn't offer Calculus or other advanced math classes, but respectable. And since no one else in the school was willing to take the test in Physics, I took that as well.
You have to understand, my school didn't even offer Physics until grade 12. I had taken general science 2 years earlier (and got such a good score that it made everyone else look silly), then Biology and was at the time taking Chemistry, but Physics wasn't until the next year. I figured "How hard could it be?" Well, I got the test, and maybe had an idea how to work out one question.
I wasn't going to leave the test blank. Like the college entrance tests, they actually assign a negative score to wrong answers so that there's not supposed to be any advantage to guessing, but I wasn't going to sit there for an hour and do nothing. So I started looking through the answers
Now that I've been a teacher also, I know about (theoretical) test design. You're supposed to include a couple of reasonable-sounding but wrong answers (referred to as "distractors") to catch the people who have some idea what they're doing but are trying to be lazy, and a couple of completely wrong answers - and of course the correct answer. I was able to eliminate the completely wrong answers, then look just at the others and determine which one had to be the correct answer on over 90% of the questions.
In one sense it didn't work. I still got done with 20 minutes left and had to sit there with nothing to do for the rest of the time.
This was actually a national contest, but I won't give the name. Fortunately, there were people who beat me at other locations; it would have been really embarrassing if I'd won the national competition just by guessing.
And needless to say, I never gave my students multiple choice tests (in Math, that was my subject after all). I know from experience that multiple choice tests are worthless.
Can't speak for law school but virtually every exam my wife took in med school as well as for her licensing exams was multiple choice. She informs me that most if not all med schools in the United States give the vast majority of tests in multiple choice format. The main exceptions are practical examinations where multiple choice is not an option.
Their real purpose is to hinder competitors from entering the market.
If you are worried about quality of care, then transparency is the key. Transparency allows the customer to weed out incompetent or inexperienced practitioners.
I agree with everything you said except this part:
"A multiple choice question might only have one right answer and its point value is the exact same as that of something much easier (especially, when on the harder on, the wrong choice might even be 'righter' than the correct choice on the easy question) -- but thats why there is an entire field of psychometrics out there to ensure that these sorts of exams are doing what they say they are."
Seems to me like that is more an example of psychometricians being forced to accept a less than valid form of test scoring. The proper way to do things has to incorporate Rasch's principle that the likelihood that a given test-taker will give the correct answer (on a question that is valid for the quantity it is being used to measure) depends on the product of the easiness of the question and the ability of the test-taker. For that matter, lumped scores (pass-fail, ranking, or absolute) on professional proficiency exams - which by their nature must test disparate quantities with various non-linear contributions to professional qualification - cannot properly be interpreted as measurements of anything without a well-thought-out unified criterion that describes the contributions and dependencies of the various quantities measured by the questions to the overall measurement of professional competence.
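For reference, the multiplicative form of the Rasch model that this principle points at can be written as below. The symbols are chosen here for illustration: $\xi_v$ is the ability of test-taker $v$ and $\varepsilon_i$ is the easiness of question $i$.

```latex
P(\text{test-taker } v \text{ answers question } i \text{ correctly})
  = \frac{\xi_v \, \varepsilon_i}{1 + \xi_v \, \varepsilon_i},
\qquad
\frac{P}{1 - P} = \xi_v \, \varepsilon_i .
```

In this form the odds of a correct answer are exactly the product of ability and easiness, which is the sense of "depends on the product" above.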
"Is life so dear, or peace so sweet, as to be purchased at the price of chains and slavery?" - Patrick Henry
"Tests don't prove you know anything; they only prove you know how to take tests!"
If you disagree with me on social issues, then it's pretty clear that you are a narrow-minded bigot.
AAAPIT (I am a psychometrician in training). He clearly knows nothing about psychometrics, and is pretty much a fool for assuming that the people who put together the tests have never bothered to think about such elementary problems. There is well-developed statistical methodology behind the scoring of standardized tests. Most licensing tests these days are put together with Item Response Theory, which gives the test developer a very precise idea of how much of a role guessing plays in each question. (You might be surprised to find that the floor guessing parameter is not just based on the number of choices; it varies depending on the details of each question.) IRT also yields a test information function that lets you see how much information the test is giving you along the range of ability levels. The argument he makes about deducting fractions for incorrect answers (known as "formula scoring") is BS, because no standardized test ever reports just the raw score. Different forms of the test differ in difficulty, and so must be equated to one another. In the process, raw scores are converted to scaled scores, and the conversion is typically not a linear one. Formula scoring results in lower raw scores than if you don't apply the penalty (dichotomously scored), but all that means is that the range between the lowest and the highest raw score is less with the dichotomously scored test. If that range is too small, you can always add more questions. Suppose you took two versions of the same test, one dichotomously scored and one with formula scoring. (Assume for the purposes of simplicity that there's no measurement error.) Yes, you would get a higher raw score on the dichotomously scored test, but so would the whole test-taking population. Your percentile rank would not change, and the scaled score would still work out the same.
There's a very easy solution. Require essay tests. Make sure at least 10% of your class doesn't complete the test (but A students get done 20-30m early on a 2 hour test).
It works like a charm, and weeds out exactly who needs weeded out.
Here's a hint. After the first exam of a semester in a weed-out class, the instructors can predict your final grade with 98% accuracy. Your test-taking skills mean shit. If you're not bright when we talk to you, if you're slow on the uptake, you won't do well. The tests are designed to avoid giving you passing grades. Of course some people break the mold, but these people are always super-intelligent and have to put an absurd amount of effort in.
Then, the rest of the semester is really about teaching material, the scores don't really matter that much.
Let me tell you a story. When my parents bought a ZX-81 with 1K RAM back in the day, that thing didn't even have enough memory for an assembler. I learned assembly by translating it all in hex by hand. I had a big notebook with all combinations of opcodes and registers, and their hex codes. Forget writing "for" loops or even "goto", you had to actually count bytes by hand to do a jump.
Or did I tell you about the time when a PHB gave me a computer with a compiler, but literally no editor? (Not even EDLIN.) Yeah, we had to make do with a disk editor until that was sorted out, because the alternative was to sit and twiddle our thumbs, even if with a damn good excuse.
So I _can_ do, and did do, without even the "crutch" of a compiler or assembler or even a text editor. Can _you_?
That said, I genuinely don't miss those days. They're not some "good old days", they're days when I wasted time on stuff that a tool would have done better. That was wasted time. There's a reason there are better tools nowadays, and that is that they genuinely make you more productive. They let you focus on the things that actually _matter_, like algorithm and design, not on the mechanical bullshit that a compiler or assembler does better or faster anyway.
_That_ is what makes a good programmer: algorithms, data structures, patterns, and knowing how to use a tool or library for the rest. Doing stuff by hand that the IDE or compiler does better, that's not a reason for pride, it's a waste of time and (employer's) money.
It's like hiring, say, a gardener and discovering that his grand reason for professional pride is that he can mow the lawn with some small scissors, instead of relying on the "crutch" of a lawn mower. Well, who cares? He's still doing a crap job and wasting more time than someone else. If the tools do that faster, freakin' use them. In fact, if a gardener actually did that, you might even suspect him of fraud: that he's deliberately wasting time so he gets paid for more hours.
A polar bear is a cartesian bear after a coordinate transform.
First off, this is cheating. I'm responding to a comment made at Blogspot here on Slashdot, but there's no way I'm creating an account over there just for this and besides they don't even have threaded conversations. Instead I'll quote the guy from Blogspot's comment here in full and then basically say he's full of shit.
Aaron said...
I am sorry, but as a psychometrician (i.e. someone who writes multiple choice tests and interprets the results), I have to simply chime in with this:
We know. That's why we don't just count correct answers.
Any major test (GRE, LSAT, TOEFL, TOEIC, etc.) uses some kind of item response theory (IRT) to determine the score. This means that the final score is actually the person's ability, given their performance on the items, which are weighted differently (to put it VERY simply) according to people's performance on them. It doesn't matter what easy-to-read numbers the test gives you as your score; your REAL score is a number between 0 and 1. Sometimes that number is rescaled to the actual number of items that were on the instrument to give people the illusion of a classical MC test.
Another point is this: Remember when you took your SAT (I think it was)? They told you not to guess if you weren't sure about that answer, right? The reason for that is that with a really well-worn and robust test, the developers have been able to figure out who picks which distractors, and can therefore derive further meaning from whatever option you choose. So instead of a simple binary item (right or wrong), they can create a partial-credit item. Say "A" is the right answer, but people who are pretty smart seem to pick "B" a lot. So maybe the stats will assign a value of 0.5 for that one. Maybe "C" is just a throwaway distractor and doesn't mean anything other than you missed the question. But what if "D" turns out to really distract total morons? The stats might end up assigning a NEGATIVE value if you pick that. So read the test specifications before you take a big test. If they say not to guess, that's why. What you don't know can actually hurt your score more than just skipping it.
Look into the Rasch model and multi-parameter IRT. It's late and I actually need to develop some questions tonight (no kidding!), so I leave it to you and Wikipedia.
So to sum up: Basically, you are right about the problems with MC tests, but wrong about how much this affects people's lives.
June 17, 2007 4:06 AM
So, as I was saying --bullshit.
I'm also a writer of GRE/TOEFL practice tests and I am quite sure this is not true. This was true for the TOEFL, but only for a few years. With the advent of the computer based TOEFL in 2000 there were weighted responses and the successful implementation of this feature was one of the primary differentiations between software practice test products that were published at that time such as my own which you are welcome to buy on Amazon but I'm sure you won't if you're already reading this in English.
However, that computer test was dropped in favor of a radically redesigned test in 2005 --another reason you probably won't buy it at Amazon-- in which ETS specifically documented that they were dropping weighted scoring entirely. This was specifically stated in documentation from ETS, and it was distressing to me because I was offering one of the few products that had a reasonably accurate weighted scoring system, so I am absolutely sure of this. It cost me money big time.
As for GRE, well, this is location dependent. In some locations the GRE computer-based test still uses weighted scoring, but in most of Asia that test is no longer offered and a non-weighted test is currently the only choice. The reason the
...but my limited math skills are all going red-flag on me at the moment:
For True-False exams for example, the number subtracted would most likely be (Number Wrong ÷ 2). Let's see how that would work out, for the sample case above. You, answering two questions correctly and guessing at 98 would be likely, on the average, to get 49 wrong, and so have a final score of 2 + 49 - (49 ÷ 2), or 75.5, while I, again on the average. answering only 1 correctly and guessing at 97, would get a final score of 1 + (97 ÷ 2) - ((97 ÷ 2) ÷ 2)), which comes out to be 25.25. Here there is a substantial difference between our scores, closer to the two-fold difference in our actual knowledge.
OK, forgive me for RTFA, but how is 2 + 49 - (49/2) equal to 75.5? My trusty calculator tells me this is 26.5, barely more than one point higher than the second example -- as I would expect.
The entire argument is fallacious...I know twice as much as you, so much that I get 100 questions right, you get 50 right and guess at the other 50...50 + 25 - (25/2) = 62.5. Not quite a 2:1 ratio there.
While I agree with the author's premise that guessing should be penalized, he does a terrible job proving his point.
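The arithmetic in question is easy to re-run. This is only a sketch of the quoted True/False rule (subtract half the number wrong); the function name is invented, and guesses are assumed to be right exactly half the time:

```python
def quoted_rule_score(known, guessed, penalty_divisor=2):
    """Score under the quoted rule: right answers minus (wrong / penalty_divisor)."""
    right = guessed / 2          # average number of lucky guesses on True/False
    wrong = guessed - right
    return known + right - wrong / penalty_divisor


if __name__ == "__main__":
    print(quoted_rule_score(2, 98))    # the article's first case
    print(quoted_rule_score(1, 97))    # the article's second case, as quoted
    print(quoted_rule_score(50, 50))   # knows half, guesses half
    print(quoted_rule_score(100, 0))   # knows everything
```

The first case comes out at 26.5 rather than 75.5, and the 50-versus-100 comparison gives 62.5 against 100.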
The real fraud of that sort of test is that the number of passing grades is set first, then the pass/fail cutoff is moved to meet that figure. If few take the bar exam, a drooling moron may pass. If many take it, being well qualified isn't good enough.
It has to be long and hard so moving the cutoff can provide fine-grained control on the number who are admitted into the profession.
The way we assess future professionals may be wrong: We give them a piece of paper or sit them in front of a computer screen full of questions, and ask them to either choose from multiple answers or write down their own answer. However, few of these professionals will ever need to do exactly that in their actual jobs. In essence, we benchmark candidates by asking them to do something they will rarely do in real life. The results are easy to predict: Some will learn how to pass tests without exhibiting real-life performance, while others will be able to do the job but fail the test. In the end, tests seem to mainly assess the candidates' patience and conformity to social hierarchies.
One gets a feeling one's in the wrong crowd after seeing what happened to his comment thread after Slashdot reported this: a genteel and thoughtful chat becomes filled with increasingly crude, uninformed and insulting remarks.
Maybe I don't want to be here....
Doesn't it strike anyone as odd as to how inefficient the education system must be to produce such a high failure rate? The screening process that admits candidates to these elite professional programs must be broken too, as it obviously allows too many candidates that just can't cut it into the programs. On the other hand, maybe it is just the testing process is broken...
For instance, I'm a testing person, but not a content person (i.e., I design towards what the stats tell me, as well as the actual wording and structure of the exam...I always work with someone who understands the content areas from a very advanced level and can deal with that end). One of the last MC exams I was helping validate, I knew NOTHING about the content -- it was a medical exam. First thing I did was go through the entire exam, read all the questions quickly, and see if logic could remove any of the answers. Statistically, I would have gotten a 20% by random means, but in this case, I received somewhere around 43% (if I remember correctly). The educated guess is a BIG part of these things...you aren't just measuring content knowledge, but application and that means if someone can raise the bar, they might actually do well in the real world.
If you knew NOTHING (your words) and you could get 43% through logic on what SHOULD have been 20%, then I think you prove the author's point even more. How good is a 5-choice multiple choice test if someone with ZERO knowledge can score 43% by applying logic/common sense? It sounds like what you are describing is the exact opposite of an educated guess.
JWall: GUI client for IPTables
I have had all kinds of experience. Some of it a little strange.
A couple of things I did around the end of law school bear mention here.
I probably should not discuss it, but I helped calibrate the Multi-State Bar exam, during my third year of law school. Most lawyers will scream bloody murder, that I should have been allowed anywhere near the data.
It is not like it sounds. I was working with a real psychometrician. He knew the statistics and methodology, and I knew the practical parts of computer systems. We both knew SAS, very well.
(Statistical Analysis System - it is its own little language. In many ways, the language is an improvement over languages like Fortran, and I *like* Fortran)
The data was double-blinded. Neither my friend nor I saw the questions or any of the answers. Someone else handled that part. All we knew, for each one of the thousands of examinees and for each question, was whether or not the examinee got the correct answer. The order of the questions was also scrambled, so we did not even know the order of the questions as they were taken by the examinees.
FWIW, for the Multi-State Bar in my State, and many others, only one thing counts: the total correct. Nothing is taken off for wrong answers. A passing score is much higher than 25% of the total. I do not recall now, but 60-80% is the neighborhood of correct answers to pass (actually it was a combined score from the written and Multi-State, but if you got only 25% on the Multi-State, you failed, period.)
Bad statisticians get crappy results because they make wrong assumptions. Whoever the guy is that wrote the article, never let him do your statistics. He makes assumptions that competent psychometricians know are false.
I know the article's assumptions. I made them myself until I worked with my friend.
Strange things happen with really good test questions. This is not all, but most.
First, some guy randomly guessing, say, by going down the questions and always taking the first answer, will fail. Even if he is incredibly lucky and is nearly three standard deviations out (one of the very unlikely possibilities from a uniform distribution), he will still fail the exam.
Second, if his answers were educated guesses instead of blindly picking from a, b, c, d, or e, then his chances of getting the correct answer went down.
You cannot even take the exam until you graduate from an accredited law school, or you practice in California. *Every* single person that took the exam made at least educated guesses on most of the answers.
(One of the top guys in my law school class decided he was going to give the psychometricians a heart attack, by answering every single question correctly. He bragged about it before the exam. The smartest guy I ever met, bar none. Later he said that at the end of the day, he looked up, realized that he had twenty-five questions to go, and there were five minutes left. He *blindly* answered the last twenty-five questions (all with (b), IIRC) and turned in his exam. He passed, of course.)
My friend and I could sort of tell the order of some of the questions, in spite of the double-blind. The more difficult questions, in the last fifty or sixty, clearly had a higher random component of correct answers. Taking into account the difficulty of the question and the size of the random component, we felt confident that we could identify and order three-fourths of the final forty questions.
Difficulty == number of examinees getting a correct answer, adjusted for their relative ability to correctly answer all the other questions. For small numbers of examinees, this is perilous. Our sample set was more than ten thousand, and verified against results from previous years going back. Those answers, in turn, were sampled, then diligently validated against LSAT scores (a law school entrance IQ test), law school grades, relative difficulty of law school, undergraduate grades, and personal inquiries to indiv
All is paradox. Retired lawyer, so this is just one more layman's opinion.
To paraphrase, he said that decent tests count the number of right answers you get, but really good tests also count how many times your answer is, "It depends."
The man who does not read good books has no advantage over the man who cannot read them. - Mark Twain
There is no way to hide any lack of knowledge
Right, because this is an iterative, custom-fit exam. They assume you generally know your stuff, since you passed the writtens; they don't care about where you're strong, they want to know where you're weak, and how weak you are. As soon as the examiners start to smell the whiff of ignorance in an oral exam, they pursue it mercilessly, and work together to explore the depths of your particular areas of ignorance the same way a tag-team of sadistic dentists will use an array of very small and very sharp bits of steel to dig in and thoroughly explore a bad spot on a tooth. God help you if you give them an especially juicy target to work on, or if you give them more than one.
(Shudders when thinking back on doctoral oral exams.)
The man who does not read good books has no advantage over the man who cannot read them. - Mark Twain
My classmates proved this in High School, inadvertently. Our Chemistry class was notoriously hard, and graded on a curve. For the practice final one guy answered 'C' for everything. Another answered randomly. They both finished fairly early, to the chagrin of the teacher, but did fairly well on the final curve. If I recall, "C" beat random. This test did not include the wrong-answer penalty like the SAT.
Why, oh why, didn't I take the Blue Pill?
As a psychometrician, I must disagree with his post and examples. When a latent trait (in this case, knowledge of the subject, or ability) is estimated, the difficulty of each question and the probability of guessing are taken into consideration. As a mathematician, you must be familiar with Item Response Theory (IRT) and the Rasch model, and its modifications. Even if IRT is not used, extremely difficult (like those in your example) or easy items are usually not included in tests, since they do not have any informational value, and the guessing parameter is considered when scoring the responses. Konstantin Augemberg (konstantin at augemberg.com)
Under your system, if I am over 33% confident in my answer, it is still to my advantage to make a guess. Maybe that's the effect that you're going for, but being 34% confident in myself is not enough for me to claim to "know" something.
Imagine the consequences. If I were taking your test, any time I can eliminate just one choice, my expected value (or penalty) for guessing is 0, assuming I don't have clue #1 about the other choices. But if I actually took your class, I would hope that I would at least have clue #1, so any time I could eliminate one choice with confidence, I would take a stab at answering the question.
Looking at it a different way, let's say I'm a slacker and I only know 50% of the material on your test. What score would I get if I took your test? You are hoping that I'll get 50%, typically a failing grade (if I only learned half of what I was supposed to, I'd say failing is appropriate). But in reality, I would expect to pass your test.
Why? Well, I know 50% of the material, so I'm going to get 50% on your test based on that alone. But the story doesn't end there. If I know 50% of the material, I should be able to eliminate two of the four choices on the questions for which I do not know the answer. That means that I will expect to get half of the remaining questions correct (and half incorrect, of course).
On a 100 question test, I will get:
50*2=100 points for my 50% mastery of the material
25*2=50 points for my "good" guesses
25*(-1)=-25 points for my "bad" guesses
That gives me 125 out of 200 points, or 63%. Nothing to post on the refrigerator, for sure, but I passed, eh?
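The numbers above can be reproduced with a short sketch. It assumes the scoring implied by the tally (+2 for a right answer, -1 for a wrong one) and that every unknown question is narrowed to two choices; the function names are made up for illustration:

```python
from fractions import Fraction


def break_even_confidence(reward=2, penalty=1):
    """Confidence p at which guessing has zero expected value: reward*p = penalty*(1-p)."""
    return Fraction(penalty, reward + penalty)


def slacker_score(total=100, known=50, choices_left=2, reward=2, penalty=1):
    """Expected points and fraction of the maximum for someone who guesses the rest."""
    guessed = total - known
    right = Fraction(guessed, choices_left)   # average number of lucky guesses
    wrong = guessed - right
    points = reward * (known + right) - penalty * wrong
    return points, points / (reward * total)


if __name__ == "__main__":
    print(break_even_confidence())   # 1/3, the "over 33% confident" threshold
    print(slacker_score())           # 125 points, 5/8 of the maximum
```

Passing `choices_left=3` models the case where only one choice can be ruled out, and the expected score then falls back to the 100 points earned from actual knowledge.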
They don't grade fathers, but if your daughter's a stripper, you fucked up. --Chris Rock
I have a friend who has - in my presence - suffered from varying forms of seizures and episodes. A few times she almost fainted in my arms, and once her eyes glazed over and she was making weird noises and slumping over until I carried her over to a chair (she came to after a while, but never remembered the time she was "out"). According to her doctor, it was simply because she was too tall and sometimes not getting enough blood circulation to her head (despite no BP issues): no further tests, no prescriptions.
I'm dreading one day where I'll hear she has had a serious accident due to a seizure. I've had little luck helping her find another doctor either as *none* want to contradict a fellow doctor's diagnosis...
I don't think it's #1, since we never named her doctor to others, and #2 doesn't apply since it's Canada and we have public healthcare. I'm not sure about #3, but I would think that another doctor might be willing to *see* the patient before making that assumption.
These were also different doctors in different areas of town, but the impression I got from mine is that the various medical associations frown upon one doctor overruling another's judgement, even if the first was wrong.