Slashdot Mirror


Closed-Source Tests

The NYTimes has a lengthy expose of the actions of a company that creates and administers standardized tests, one destined for RISKS Digest very shortly. A bug in their software sent students to summer school and resulted in teachers and superintendents being fired from their jobs, even though the company was notified of problems early. It's a fascinating story of the risks of going with a closed source vendor - how the company acts to perform damage control, lies, stalls, compartmentalizes the damage by telling each complainer that they are the only one experiencing problems, and finally, most of a year after being notified of the problem, fixes the bug. (It's a two-part series - the first part discusses problems with human scoring of tests.)

19 of 122 comments

  1. "But Johnny... by Anonymous Coward · · Score: 4

    ...what are these pencil-scrawled changes on your report card?" "Corrections. Just a software bug. Will you just sign it already?"

  2. Pareto analysis of answers. by bstadil · · Score: 3

    The story has an example where the answer sheet had 6 wrong out of 68 questions. As incompetent as this seems, it's mind-boggling that they do not pareto the answers and flag where the "right" answer is not the highest ranked. Much better strategies can be devised, but we are not talking rocket science here. Letting the schools have access to the data, coupled with a few lines of perl, would fix these kinds of problems.
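
    A rough sketch of that check, in Python rather than perl. The function name, the data layout (one list of chosen answers per student), and the sample data are all made up for illustration; nothing here reflects the testing company's actual software.

```python
from collections import Counter

def flag_suspect_items(answer_key, responses):
    """Return (question_index, keyed_answer, most_common_answer) for every
    question where the keyed answer is not the one students picked most often.

    answer_key: list of keyed answers, one per question (hypothetical layout).
    responses:  list of answer sheets, each a list aligned with answer_key.
    """
    suspects = []
    for q, keyed in enumerate(answer_key):
        # Tally how often each choice was picked for this question.
        tally = Counter(sheet[q] for sheet in responses)
        if not tally:
            continue  # no answer sheets, nothing to rank
        top_answer, _ = tally.most_common(1)[0]
        if top_answer != keyed:
            suspects.append((q, keyed, top_answer))
    return suspects

# Invented sample: question 1 is keyed "C", but most students chose "D".
answer_key = ["A", "C", "B"]
responses = [
    ["A", "D", "B"],
    ["A", "D", "B"],
    ["B", "D", "B"],
    ["A", "C", "C"],
]
print(flag_suspect_items(answer_key, responses))
```

    A flagged question is only a candidate for a human to look at. A genuinely hard question can have a popular wrong answer, so this narrows the search rather than proving the key is wrong.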

    --
    Help fight continental drift.
  3. "Obviates"? by VValdo · · Score: 3

    I know this is one of those big words everyone likes to use on slashdot (along with "obfuscate" and a few others), but since this is a thread on education, I hope you'll forgive me.

    -----
    obviate
    v. tr. obviated, obviating, obviates.

    To anticipate and dispose of effectively; render unnecessary.
    -----

    Maybe you meant "illustrates" or "highlights" or "illuminates"?

    W
    -------------------

    --
    -------------------
    This is my SIG. There are many like it, but this one is mine.
  4. Re:it's only good if you read it by clifyt · · Score: 5

    From what I understand, Opensourcing the thing wouldn't have done a damn thing.

    In my day job, I am Manager of Development at the Indiana University - Purdue Universities Testing Center. I've read quite a bit on this and have evaluated these guys' software and didn't care much for it (could be that my own software comes up with higher predictors than theirs and was much more flexible). With adaptive testing like their own (and this is all in layman's terms lest one of the wannabe psychometricians wants to correct me), ya build the item database, calibrate it, evaluate it and then calibrate it some more. Real testing may be going on in all this time, but even static items will be somewhat liquid in their numbers over the years.

    Unfortunately, companies like this like to change as many questions each year as possible. Doing this means that you will have better test security, but your items may not have all the correct weighting behind them. How does one Open Source this without losing all the data ya need to make this stuff adaptive? With standard testing, ya may ask 200 questions and a lot of times you are simply measuring a person's ability to do lots of work in a set amount of time. Adaptive testing uses a lot of calculations to figure out what ya know and what ya don't know. Instead of the 200 questions, ya might get 20 (or less on some of the new standardized ones) that you are free to take as long as you'd like.

    If the person taking this knew even a few of the questions they got beforehand, this would throw off the entire test. If you don't think folks cheat on these types of tests, you are an idiot. There are school systems that have gotten ahold of written tests and drilled their students on the exact questions presented. On the high stakes testing, we find folks that will go to such lengths as to take the test on the east coast in the morning under fictitious names, fly across the country to California and retake the afternoon test. There was a case where Law Students were memorizing one question each and as soon as they were outta the test, would cell phone in the questions, and someone would be selling code-keyed pencils (we have one of these :) with different patterns from the different version tests (I think a set of these put the students back about $5k each).

    Anywho, no amount of Open Sourcing would have helped. Bad Software wasn't written, a bad analysis of the data was probably done. OS is not the answer to everything in life...

    Clif Marsiglio
    HTTP://ASSESSMENT.IUPUI.EDU

  5. Cross-Checking by Detritus · · Score: 3

    Why didn't the company validate the tests and the scoring process before releasing them to the school systems? There can be errors in the code, requirements and statistical models and techniques. They could have given the new tests to a sample of students along with reference tests that are never widely distributed. A comparison of the scores on the two tests should uncover any major errors.
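
    A minimal sketch of that comparison, assuming each student in the sample has a score on both the new test and the reference test, on the same scale. The function names, the 5-point tolerance, and the sample scores are invented for illustration.

```python
from statistics import mean

def mean_score_shift(new_scores, reference_scores):
    """Average of (new - reference) over students who took both tests."""
    if not new_scores or len(new_scores) != len(reference_scores):
        raise ValueError("need one score on each test for every sampled student")
    return mean(n - r for n, r in zip(new_scores, reference_scores))

def looks_miscalibrated(new_scores, reference_scores, tolerance=5.0):
    """True when the new test runs systematically high or low versus the reference.

    tolerance is an arbitrary illustrative threshold, not a recommended value.
    """
    return abs(mean_score_shift(new_scores, reference_scores)) > tolerance

# Invented sample: every student scores about ten points lower on the new test.
new_scores = [40, 55, 62, 30]
reference_scores = [50, 64, 70, 41]
print(mean_score_shift(new_scores, reference_scores))     # -9.5
print(looks_miscalibrated(new_scores, reference_scores))  # True
```

    A real validation would use a larger sample and a proper statistical test, but even a crude shift check like this would show when the whole sample lands well below where the reference test puts it.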

    --
    Mea navis aericumbens anguillis abundat
  6. High Stakes tests are the problem. Open or Closed by PotatoHead · · Score: 3

    What is needed here is a peer process. Teachers get certified about what is important and what is not. Then they evaluate the student as part of their classwork, and input to the class.

    Personality conflicts can cause problems with this, but if there is an appeals process of some kind, most of this can be worked out.

    Tests do not even begin to reveal a student's achievements in school, or their worth to society. Peer review does.

    Imagine the students testing themselves. They know the requirements, let them work toward them. I am not saying let the students choose if they pass or fail, but make them involved in the process so they understand it, and can help each other.

    I know what I did in school. All of the really good stuff that mattered was not on the tests. It was the projects I did, and the papers I wrote, and the arguments I had with staff and friends.

    Of all the classes I have to say that Music and Drama were the most interesting from a testing point of view. These classes are peer reviewed by their nature. How do you know you are doing well? Do others say so? Did your performance at the play get some applause? Your teacher is a mentor in these sort of things. They take what is there and improve it. You will get an 'A' anyway, so why work hard at all? If things work the way they are supposed to in school, the teacher gets you motivated, and sets direction, your peers give you someone to work with and achieve goals and share success. Standardized tests totally ruin all of this.

    Point is simple. Teachers know the students best. Most of them actually care even if they are underpaid. Let them make choices, and help to form good citizens. Taking all the hard work, and boiling it down to one test is stupid. Even a genius will have a bad day. Should the rest of their life be changed because of it?

    Don't think so.

  7. Re:Closed Source? by ahunter · · Score: 5
    Sheesh, read the article. The very first paragraph states that someone outside the company had found out about the problem and notified the company, who promptly sat on it until it went bad.


    A school district might not be able to justify the money to check a system, but I suspect it could not justify using a system with known errors and would have an interest in getting it fixed.

  8. i knew it by TomL · · Score: 3

    i always thought that errors in standardized test scoring put me in a talented and gifted class (i got kicked out of it after two years due to bad academic performance, thank god, heh).

  9. Hmm, I wonder... by 11thangel · · Score: 4

    Could the same bug have resulted in my Computer Programming teacher being hired? I still find it quite odd that she is teaching a top level programming class, yet doesn't understand what a function is. I still recall trying to explain the word "filesystem" to her. Hopefully, the same bug will assist my English grade (hey, it couldn't get worse).

    --

    I am !amused.
  10. Systems without Oversight by AMuse · · Score: 4

    I say that this is just symptomatic of a much bigger problem in the first place: Computer systems not having the proper amount of human oversight.

    Credit bureaus rely on their use of the computing systems for pretty much everything, and look how hard it is to get any error fixed at all.

    This could just as easily have been a private prison company (Which most all prisons in CA are) accidentally sending your traffic-ticket offender to a high security felon bin for 20 years instead of a 6 month stint for not paying their bills.

  11. Closed Source? by faust2097 · · Score: 3

    This has a lot less to do with closed source than it does with quality control in general. Just because something's OSS doesn't mean anyone else has actually looked at the code. School districts aren't known for having budgets for consultants to check it out...

  12. Another UCITA and clickwrap issue by www.sorehands.com · · Score: 3
    If you look at the clickwrap and the UCITA, these damages from known bugs are limited.

    Think of the graduating senior who now has to go to summer school instead of working for tuition money. This could delay their degree by a year. All caused by a known bug!

    Should the software company be held harmless?

  13. it's only good if you read it by Frymaster · · Score: 4
    really, the problem here is that
    a) bad software was written
    b) it was closed source.
    an open source solution would only address this problem if the purchasers (ie the school administration) actually sat down and audited the code. not very likely, imho. not that i'm dissing open source... far be it from me to do that, but oss is only as good as the people willing to audit it. being in the process of writing an online test generator i can tell you that teachers and admins look at oss the same way they do proprietary software... the only difference is that it's good for their budget and doesn't come in a box...

    nuff said.

  14. Closed answer tests also a problem by Phronesis · · Score: 5
    If you read part 1 of the NYT story, you find out that many more students were penalized by incorrect answer keys than by computer errors. Thus, if we are to trumpet open-source as the appropriate way to deal with risks of errors in the algorithm for normalizing percentiles for test difficulty, then do we also conclude that all answers must be revealed so that erroneous answer keys can be caught?

    Perhaps. I always give my students answer keys after I give them a test. But for big tests such as are described in the story, I would worry that having to change all the questions several times every year would introduce so much opportunity for poorly worded, misleading, or fundamentally flawed questions that this risk would outweigh the current risk of incorrect answer keys.

    There is also the cost question. If companies had to rewrite the whole test every time they administered it (because they would publish the answers), then the costs would rise sharply and the tests would become less affordable to struggling school districts (or we would see large tax increases to pay for the tests). The benefits of increased test accuracy might not justify the cost to the taxpayers.

    None of these considerations apply to the case of the software used to grade the tests. There seems to be little risk that understanding the way the grades are curved would enable cheating or other gamesmanship. Since this is not networked software, the potential for attacking it from another computer is small (social engineering is still a risk, though, but no more so for open source than for closed-source software).

    Thus, although it would be tempting to put "Open Answer" on par with "Open Source," the first seems impractical and the latter seems well suited to the problem at hand.

  15. Academic response by Fros1y · · Score: 3

    What I found so horrifying about this situation was the response of the people responsible for the NYC catastrophe. It would seem that those in power were far too concerned with the politics of their job to even think about worrying about the children and subordinates that the test was slowly crushing. Perhaps the most brazen of this behavior was the administrator's decision not to speak out about his concern because of the fear he had about his reputation.

    If we are ever to have upstanding and capable students who know not only logical thought but also ethical beliefs about participation in society, it is precisely this sort of leadership that we can do without. I'm very sorry that the tragedy of these tests hurt so many children and so many competent superintendents, but was this really one of them?

  16. Tip of the Iceberg by Papa+Legba · · Score: 4

    This is going to just get worse for kids. With Pres. Bush pushing to make standardized testing nationwide, this will become more and more common of an occurrence.
    I live in Virginia, a state that implemented Standards of Learning tests (SOLs) years ago. The absolute paranoia that surrounds a test that "was just going to be used for monitoring purposes" is astounding. The schools have stopped pretending that they are teaching knowledge and instead spend all of their class time cramming facts down the students' throats so they can pass the trivia quiz of the SOLs.
    It has gotten so bad that a local city has asked to extend the school year for kids just to prepare for these tests. Once the tests are out of the way, the kids spend three weeks until the end of the year loafing in class, as the teachers have no reason to give them a final; they already had it and passed their SOL. Just an example of how the schools are warping to fit around the SOLs; soon they will be the official final. This is the only outcome you can expect when teachers' and administrators' raises depend on how their school district does on the SOLs and their jobs depend on how well their own classes do on these tests. School should teach kids how to think and solve problems, not how to regurgitate facts at the drop of a hat, facts that can be easily found in a book or on the web if you were not sitting in a proctored testing room.
    This was a great idea to start, but it is getting out of control, just like drug testing in the 80's and early 90's. Seemed like a good idea until fly-by-night testing labs started turning in false positives by the truckload, ruining people and their careers.
    Kids don't need this pressure. Teachers, maybe, but this is not how to apply it. School administrators definitely need to be held accountable. I do not think this is the way to do it, though. Ultimately we do get rid of the incompetents, but we also get rid of the talented teacher. Once the lesson plan is dictated from the state or nation's capital, the chance for real learning is lost and it just becomes a numbers game. Kids are not numbers, they are potentials and should be treated as such! When we take steps like these to teach to the lowest common denominator, the brightest of our children are wasted. We need to stop this and start teaching smart.

    --
    Papa Legba come and open the gate
  17. Testing co's proprietary data are student answers by vls · · Score: 3

    Testing companies such as ETS and CTB quickly gain monopoly power because of a simple, but powerful, network externality:

    As the company administers more tests, the company's database of questions and performance gets larger. Also, the company usually includes experimental questions on the same tests as calibrated questions -- giving powerful statistics on how the new questions will perform relative to older questions.

    Thus as the company administers more tests, the company gets a bigger and bigger lead over its competitors. If a school switches testing companies, they won't be able to track trends from one side of the switch to the other.

    Like other network externality markets -- think operating system -- the monopolist's proprietary edge comes not from the originality or sweat of the incumbent, but simply from the size of the adopted user base.

    But the testing market is subtly different. In testing, much of the proprietary value comes from the answers the students themselves give. Indeed, if the school districts considered these data 'proprietary' then the testing companies might have to 'buy' their monopoly position from the customer.

    But even if the school retained ownership of its own pupils' data, the testing company would still have the power in the relationship. To truly move to 'tester portability' -- and thus competition -- schools would need 1) to be able to retain the actual wording of the tests (and share that with other testing companies) and 2) to be able to insert experimental questions from other testing companies into the testing, to allow for calibration if they were to switch vendors. The only hope I see for such an utter inversion of the relationship would be if districts comprising more than 50 percent of the testing market banded together. Possibly, if the U.S. Department of Education forced a change for a federal program.

  18. Re:This should be interesting by tb3 · · Score: 3
    Yes, you can file lawsuits on software bugs. AECL was sued over the software bug in the THERAC-25 that caused six incidents of injury and/or death. Here's a good write-up.

    If the bug causes sufficient damage or harm, and the company was negligent, then that should be grounds for a lawsuit. (Of course, IANAL, but my sister is.)

    --

    www.lucernesys.com
    Horizon: Calendar-based personal finance

  19. Re:Auditing. by Ryan_Terry · · Score: 4

    ...very nearly criminally negligent?

    Heh, that is my vote for understatement of the day. Do you have any idea the amount of time/money those students who had to take summer school lost? Add all of those together and this becomes a lot more serious. I think the only thing stopping them from serious legal troubles is the fact that these were high school kids. I'm not saying it is right, but teenage Americans don't get the same rights that their adult counterparts do. If this had been a corporation that screwed up on 40,000 paychecks you better believe there'd be a legal battle to remember.

    DocWatson

    --
    MessEdUp
    .sig
    #/var/www/v