Closed-Source Tests
The NYTimes has a lengthy exposé on the actions of a company that creates and administers standardized tests, one destined for RISKS Digest very shortly. A bug in their software sent students to summer school and got teachers and superintendents fired, even though the company was notified of problems early. It's a fascinating story of the risks of going with a closed-source vendor: the company performs damage control, lies, stalls, compartmentalizes the damage by telling each complainer that they are the only one experiencing problems, and finally, most of a year after being notified of the problem, fixes the bug. (It's a two-part series; the first part discusses problems with human scoring of tests.)
From what I understand, open-sourcing the thing wouldn't have done a damn thing.
In my day job, I am Manager of Development at the Indiana University-Purdue University Indianapolis Testing Center. I've read quite a bit on this and have evaluated these guys' software and didn't care much for it (could be that my own software comes up with higher predictors than theirs and is much more flexible). With adaptive testing like theirs (and this is all in layman's terms, lest one of the wannabe psychometricians wants to correct me), ya build the item database, calibrate it, evaluate it and then calibrate it some more. Real testing may be going on all this time, but even static items will be somewhat liquid in their numbers over the years.
Unfortunately, companies like this like to change as many questions each year as possible. Doing this means that you will have better test security, but your items may not have all the correct weighting behind them. How does one Open Source this without losing all the data ya need to make this stuff adaptive? With standard testing, ya may ask 200 questions, and a lot of times you are simply measuring a person's ability to do lots of work in a set amount of time. Adaptive testing uses a lot of calculations to figure out what ya know and what ya don't know. Instead of the 200 questions, ya might get 20 (or fewer on some of the new standardized ones) that you are free to take as long as you'd like on.
If the person taking this knew even a few of the questions they got beforehand, this would throw off the entire test. If you don't think folks cheat on these types of tests, you are an idiot. There are school systems that have gotten ahold of written tests and drilled their students on the exact questions presented. On the high-stakes testing, we find folks that will go to such lengths as to take the test on the east coast in the morning under fictitious names, then fly across the country to California and retake the afternoon test. There was a case where law students were memorizing one question each and, as soon as they were outta the test, would cell phone in the questions, and someone would be selling code-keyed pencils (we have one of these :) with different patterns for the different test versions (I think a set of these set the students back about $5k each).
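To put some code behind the adaptive testing bit above, here is a rough Python sketch, not anybody's production system. It assumes a Rasch (one-parameter) model, a made-up item bank with difficulties from -3 to 3, a grid-search maximum-likelihood estimate, and an examinee who deterministically gets every item at or below their ability right. None of that comes from the vendor in the story or from our own software, and the names `p_correct`, `estimate_ability`, `run_adaptive_test`, and `LEAKED` are hypothetical. It runs the 20-question loop twice: once honestly, and once for the same examinee who has seen a leaked batch of items in advance.

```python
import math

def p_correct(theta, b):
    """Rasch model (an assumption): chance that ability theta gets an item of difficulty b right."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def estimate_ability(responses, lo=-4.0, hi=4.0, steps=161):
    """Grid-search maximum-likelihood ability from (difficulty, was_correct) pairs."""
    best_theta, best_ll = lo, -math.inf
    for i in range(steps):
        theta = lo + (hi - lo) * i / (steps - 1)
        ll = 0.0
        for b, correct in responses:
            p = p_correct(theta, b)
            ll += math.log(p if correct else 1.0 - p)
        if ll > best_ll:
            best_theta, best_ll = theta, ll
    return best_theta

def run_adaptive_test(item_bank, answer, n_items=20):
    """Ask n_items questions, always picking the unused item nearest the current estimate."""
    remaining = list(item_bank)
    responses = []
    theta = 0.0  # start everyone at "average"
    for _ in range(n_items):
        b = min(remaining, key=lambda d: abs(d - theta))
        remaining.remove(b)
        responses.append((b, answer(b)))
        theta = estimate_ability(responses)  # re-estimate after every answer
    return theta

# Hypothetical calibrated item bank: 61 difficulties from -3.0 to 3.0.
ITEM_BANK = [i / 10 for i in range(-30, 31)]
TRUE_ABILITY = 0.5
# Hypothetical leak: the examinee has memorized the items with difficulty 0.6 to 1.5.
LEAKED = {i / 10 for i in range(6, 16)}

def honest_answer(b):
    # Simplification: right whenever the item is at or below true ability.
    return b <= TRUE_ABILITY

def cheating_answer(b):
    # Same ability, but leaked items are always answered correctly.
    return b <= TRUE_ABILITY or b in LEAKED

honest = run_adaptive_test(ITEM_BANK, honest_answer)
cheater = run_adaptive_test(ITEM_BANK, cheating_answer)
print("honest estimate: ", honest)
print("cheater estimate:", cheater)
```

Under these assumptions the second estimate comes out higher than the first even though the underlying ability is identical. With only 20 questions, every answer moves the estimate, which is why item security matters so much for this kind of test.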
Anywho, no amount of Open Sourcing would have helped. Bad software wasn't written; a bad analysis of the data was probably done. Open Source is not the answer to everything in life...
Clif Marsiglio
HTTP://ASSESSMENT.IUPUI.EDU
A school district might not be able to justify the money to check a system, but I suspect it could not justify using a system with known errors and would have an interest in getting it fixed.
Perhaps. I always give my students answer keys after I give them a test. But for big tests such as are described in the story, I would worry that having to change all the questions several times every year would introduce so much opportunity for poorly worded, misleading, or fundamentally flawed questions that this risk would outweigh the current risk of incorrect answer keys.
There is also the cost question. If companies had to rewrite the whole test every time they administered it (because they would publish the answers), then the costs would rise sharply and the tests would become less affordable to struggling school districts (or we would see large tax increases to pay for the tests). The benefits of increased test accuracy might not justify the cost to the taxpayers.
None of these considerations apply to the software used to grade the tests. There seems to be little risk that understanding the way the grades are curved would enable cheating or other gamesmanship. Since this is not networked software, the potential for attacking it from another computer is small (social engineering is still a risk, but no more so for open-source than for closed-source software).
Thus, although it would be tempting to put "Open Answer" on par with "Open Source," the former seems impractical and the latter seems well suited to the problem at hand.