Slashdot Mirror


Closed-Source Tests

The NYTimes has a lengthy expose of the actions of a company that creates and administers standardized tests, one destined for RISKS Digest very shortly. A bug in their software sent students to summer school and resulted in teachers and superintendents being fired from their jobs, even though the company was notified of problems early. It's a fascinating story of the risks of going with a closed source vendor - how the company acts to perform damage control, lies, stalls, compartmentalizes the damage by telling each complainer that they are the only one experiencing problems, and finally, most of a year after being notified of the problem, fixes the bug. (It's a two-part series - the first part discusses problems with human scoring of tests.)

63 of 122 comments

  1. Districts use poorer schools to punish TEACHERS. by Anonymous Coward · · Score: 2
    In any large school district that has both poorer and well-to-do schools, where do the newly hired teachers get assigned? Answer: The poorer schools. The poorer schools are also where teachers get transferred to when they're "disciplined" for something but not able to be fired for whatever reason.

    The crime-free schools in the clean white neighborhoods are considered "sweet jobs" where teachers with "a lot of seniority" get to work.

    Is it any wonder that poorer schools do miserably on standardized tests? They're treated like jails, not only from the student side BUT FROM THE TEACHING STAFF SIDE as well. Ever see the James Belushi movie, "The Principal"? There's some truth to that.

    This problem is not unique to schools, either. New cops get put on the worst beats too.

  2. "But Johnny... by Anonymous Coward · · Score: 4

    ...what are these pencil-scrawled changes on your report card?" "Corrections. Just a software bug. Will you just sign it already?"

  3. Re:you think that's bad? by Chris+Johnson · · Score: 2

    Could still be worse. Fast forward to 'Gattaca' or some similar dystopia and the kid could be painlessly destroyed- or subjected to a course of medication to correct defectives. One where there is risk of death or permanent brain damage, but hey, if the subject is already defective what's the diff?

  4. Re:sad... by Danse · · Score: 2

    Never said anything about them "b[ing] criativ indevijuals an fele GUD aboute themsalvs" being the only thing that matters. There will always be a certain amount of memorization that is necessary. I'm just saying that memorization isn't gonna do them a damn bit of good if you haven't taught them to think properly for themselves as well. Something that seems to be completely overlooked as teachers scramble to drill facts into the kids so they can pass the tests.

    --
    It's not enough to bash in heads, you've got to bash in minds. - Captain Hammer
  5. Re:Tip of the Iceburg by Gregg+M · · Score: 2

    So what's your solution? Non-standardized tests? Have every school district make up their own "special" test? Why do you consider this testing any more of a "trivia quiz" than the normal tests teachers make up?

    The point of the article was that New York incorrectly based their cutoff scores for summer school on the test. The people at CTB/McGraw-Hill told them *not* to. Sounds like your example of students zoning out after the SOL is the same problem, but it has nothing to do with the test.

    Your rant about treating kids like potentials is heartwarming for a picket sign, but if you can't give an example how to change things, it's just hot wind. These tests try to *raise* the lowest common denominator. The brightest kids do fine. What's wrong with that?

    I'm sure that you have years of experience in testing science so tell me what type of testing is used for kids who think and solve problems? What's your idea? What type of test do you recommend?

    --
    Linux is only free if your time has no value. Windows is only free if you threaten to use Linux.
  6. Because they don't have to. by bcboy · · Score: 2

    I know several people who have worked for standardized test manufacturers (there's a big one locally). They all left because the entire process was such a crock. From the way the tests are created, to the way the results are interpreted, it's all a pile of assumptions and conjectures.

    And the company was held to no standard whatever by any of their clients (various state school districts). Because of the "high standards!" craze sweeping the nation, they have guaranteed fat money coming down the pipe. They would ship tests that weren't normed correctly, bill clients extra hours to fund parties, etc., with no thought that maybe the money would dry up. It won't, as long as this political craziness continues.

    We can all speculate about where their corporate campaign contributions were going.

  7. Pareto analysis of answers. by bstadil · · Score: 3

    The story has an example in which the answer sheet had 6 wrong out of 68 questions. As incompetent as this seems, it's mind-boggling that they do not Pareto the answers and flag where the "right" answer is not the highest ranked. Much better strategies can be devised, but we are not talking rocket science here. Letting the schools have access to the data, coupled with a few lines of Perl, would fix these kinds of problems.
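
The check described above can be sketched in a few lines. This is an illustrative Python sketch rather than the Perl the poster had in mind; the function name `flag_suspect_keys`, the data layout (one string of answer letters per student) and the sample data are all assumptions, not anything taken from the article.

```python
from collections import Counter

def flag_suspect_keys(answer_key, responses):
    """Return question indices where the keyed answer is not the most common response.

    answer_key: string such as "ABCD", one letter per question (hypothetical layout).
    responses:  list of strings, one per student, same length as answer_key.
    """
    suspects = []
    for q, keyed in enumerate(answer_key):
        # Tally how often each choice was picked for this question.
        tally = Counter(sheet[q] for sheet in responses)
        most_common_choice, _ = tally.most_common(1)[0]
        if most_common_choice != keyed:
            suspects.append(q)
    return suspects

# Made-up data: question index 2 is keyed "C" but most students answered "B".
key = "ABCD"
sheets = ["ABBD", "ABBD", "ACBD", "ABCD", "DBBA"]
print(flag_suspect_keys(key, sheets))
```

A flag here only means "have a human look at this item": a genuinely hard question can also have a popular wrong answer.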

    --
    Help fight continental drift.
  8. Human error by AftanGustur · · Score: 2


    The problem was a design/logical/programming error.

    The error had - erroneously - made the current test appear easier than the previous year's. To make the tests equal in difficulty, the computer had then compensated by making it harder for some students to do as well as they had last time. The error did not change students' right and wrong answers, but it did affect their comparative percentile scores.

    And regarding the 'disclosure of questions', I don't think we're asking for too much if we want the questions/answers one year after the testing. It may be too late to correct the damage, but at least the testing company will know that its work will be double-checked.

    This process is necessary so scores one year can be compared with those from previous years, even if different questions are used. States ask for new questions because they are worried the old questions will leak out.

    Also, there is no such thing as a "computer error", not any more than there is a "pencil error" when writing.
    --
    echo '[q]sa[ln0=aln80~Psnlbx]16isb15CB32EF3AF9C0E5D7272 C3AF4F2snlbxq'|dc

    --
    echo '[q]sa[ln0=aln80~Psnlbx]16isb572CCB9AE9DB03273snlbxq' |dc
  9. just desserts by trb · · Score: 2
    CTB's error hit hardest in New York City, the nation's largest school system. Apart from the children, the most prominent victim may have been the city's schools chancellor, Rudy Crew. The error showed - incorrectly - that reading scores citywide had stagnated after rising for two years, raising questions about Dr. Crew's leadership. Within months, he was out of a job.

    Before the mistake was discovered, Dr. Crew had been a leading advocate for using standardized tests to hold students and educators accountable.

    In the immortal words of WS Gilbert, the punishment fits the crime!
  10. "Obviates"? by VValdo · · Score: 3

    I know this is one of those big words everyone likes to use on slashdot (along with "obfuscate" and a few others), but since this is a thread on education, I hope you'll forgive me.

    -----
    obviate
    v. tr. obviated, obviating, obviates.

    To anticipate and dispose of effectively; render unnecessary.
    -----

    Maybe you meant "illustrates" or "highlights" or "illuminates"?

    W
    -------------------

    --
    -------------------
    This is my SIG. There are many like it, but this one is mine.
  11. Re:it's only good if you read it by clifyt · · Score: 2

    Gorilla - Adaptive tests don't work that way. ya get one right, they give ya a harder question, get it wrong and it gets a little easier...til the computer can figure out what ya know and don't know. If ya are at the top of the test because ya just happened to get the right questions then there is something wrong with the test as it should be punishing ya with harder and harder and harder q's. Yeah, this is all simplified how it works (ya get into Bayesian logic and stuff) but it's essentially how it works. It ain't all about random questions...
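
The "harder after a right answer, easier after a wrong one" loop can be sketched as a simple staircase. This is a deliberately simplified illustration: real adaptive tests estimate ability with item response theory and the Bayesian machinery mentioned above, and the function name, difficulty scale and simulated test-taker below are hypothetical.

```python
def run_staircase(answers_correctly, levels=10, start=5, num_items=8):
    """Simplified adaptive loop: step difficulty up after a right answer, down after a wrong one.

    answers_correctly: callable taking a difficulty level (1..levels) and
                       returning True if the test-taker gets that item right.
    Returns the list of difficulty levels presented, in order.
    """
    level = start
    presented = []
    for _ in range(num_items):
        presented.append(level)
        if answers_correctly(level):
            level = min(levels, level + 1)   # harder next time
        else:
            level = max(1, level - 1)        # a little easier next time
    return presented

# Hypothetical test-taker who can answer anything up to difficulty 7.
history = run_staircase(lambda difficulty: difficulty <= 7)
print(history)
```

With this simulated test-taker the presented difficulty climbs and then oscillates around the level where answers flip from right to wrong, which is the point the post is making: a lucky start does not keep you at the top.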

    Heh! This is already day old news, but I figure folks like me will read back to see if anyone posted any response a few days later :)

    clif

  12. Re:it's only good if you read it by clifyt · · Score: 5

    From what I understand, Opensourcing the thing wouldn't have done a damn thing.

    In my day job, I am Manager of Development at the Indiana University - Purdue Universities Testing Center. I've read quite a bit on this and have evaluated these guys' software and didn't care much for it (could be that my own software comes up with higher predictors than theirs and was much more flexible). With adaptive testing like their own (and this is all in layman's terms lest one of the wanna-be psychometricians wants to correct me), ya build the item database, calibrate it, evaluate it and then calibrate it some more. Real testing may be going on all this time, but even static items will be somewhat liquid in their numbers over the years.

    Unfortunately, companies like this like to change as many questions each year as possible. Doing this means that you will have better test security, but your items may not have all the correct weighting behind them. How does one Open Source this without losing all the data ya need to make this stuff adaptive? With standard testing, ya may ask 200 questions and a lot of times you are simply measuring a person's ability to do lots of work in a set amount of time. Adaptive testing uses a lot of calculations to figure out what ya know and what ya don't know. Instead of the 200 questions, ya might get 20 (or less on some of the new standardized ones) that you are free to take as long as you'd like.

    If the person taking this knew even a few of the questions they got beforehand, this would throw off the entire test. If you don't think folks cheat on these types of tests, you are an idiot. There are school systems that have gotten ahold of written tests and drilled their students on the exact questions presented. On the high stakes testing, we find folks that will go to such lengths as to take the test on the east coast in the morning under fictitious names, fly across the country to California and retake the afternoon test. There was a case where Law Students were memorizing one question each and as soon as they were outta the test, would cell phone in the questions, and someone would be selling code keyed pencils (we have one of these :) with different patterns from the different version tests (I think a set of these put the students back about $5k each).

    Anywho, no amount of Open Sourcing would have helped. Bad Software wasn't written, a bad analysis of the data was probably done. OS is not the answer to everything in life...

    Clif Marsiglio
    HTTP://ASSESSMENT.IUPUI.EDU

  13. Re:Decentralize by Detritus · · Score: 2

    It would also give the local school authorities an opportunity to cook the results. With so much riding on the results of the tests, the temptation to alter the results would be severe.

    --
    Mea navis aericumbens anguillis abundat
  14. Cross-Checking by Detritus · · Score: 3

    Why didn't the company validate the tests and the scoring process before releasing them to the school systems? There can be errors in the code, requirements and statistical models and techniques. They could have given the new tests to a sample of students along with reference tests that are never widely distributed. A comparison of the scores on the two tests should uncover any major errors.
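
A minimal sketch of that comparison, assuming the same sample of students takes both the new test and a trusted reference test scored on the same scale. The function name, the 5-point tolerance and the sample scores are made up for illustration.

```python
from statistics import mean

def compare_to_reference(new_scores, reference_scores, tolerance=5.0):
    """Compare paired scores from a new test and a trusted reference test.

    Both lists hold scores on the same scale for the same students, in the
    same order. Returns (mean_shift, flagged), where flagged is True when
    the average gap exceeds `tolerance` points (an arbitrary threshold).
    """
    if len(new_scores) != len(reference_scores):
        raise ValueError("score lists must be paired")
    diffs = [n - r for n, r in zip(new_scores, reference_scores)]
    mean_shift = mean(diffs)
    return mean_shift, abs(mean_shift) > tolerance

# Made-up sample: every student scores 8 points lower on the new test.
new = [52, 61, 47, 70, 58]
ref = [60, 69, 55, 78, 66]
shift, flagged = compare_to_reference(new, ref)
print(shift, flagged)
```

A consistent shift like this would not say where the error is (code, requirements or statistical model), only that the new test and its scoring need a closer look before release.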

    --
    Mea navis aericumbens anguillis abundat
  15. High Stakes tests are the problem. Open or Closed by PotatoHead · · Score: 3

    What is needed here is a peer process. Teachers get certified about what is important and what is not. Then they evaluate the student as part of their classwork, and input to the class.

    Personality conflicts can cause problems with this, but if there is an appeals process of some kind, most of this can be worked out.

    Tests do not even begin to reveal a student's achievements in school, or their worth to society. Peer review does.

    Imagine the students testing themselves. They know the requirements, let them work toward them. I am not saying let the students choose if they pass or fail, but make them involved in the process so they understand it, and can help each other.

    I know what I did in school. All of the really good stuff that mattered was not on the tests. It was the projects I did, and the papers I wrote, and the arguments I had with staff and friends.

    Of all the classes I have to say that Music and Drama were the most interesting from a testing point of view. These classes are peer reviewed by their nature. How do you know you are doing well? Do others say so? Did your performance at the play get some applause? Your teacher is a mentor in these sort of things. They take what is there and improve it. You will get an 'A' anyway, so why work hard at all? If things work the way they are supposed to in school, the teacher gets you motivated, and sets direction, your peers give you someone to work with and achieve goals and share success. Standardized tests totally ruin all of this.

    Point is simple. Teachers know the students best. Most of them actually care even if they are underpaid. Let them make choices, and help to form good citizens. Taking all the hard work, and boiling it down to one test is stupid. Even a genius will have a bad day. Should the rest of their life be changed because of it?

    Don't think so.

  16. User Group Needed by DonK · · Score: 2

    The simple-minded solution would be to have a 'user group' - i.e., a group formed of a representative educator from each state where the exams are given. Such a group should quickly discover such a nation-wide pattern. Yet, apparently the people in Tennessee and in New York weren't talking. Should one conclude the educators themselves were part of the coverup, trying to conceal their states' performance? Or are they just so parochial that they do not notice what is happening in other states?

  17. There is a closed source solution by Felinoid · · Score: 2

    People forget how the market works when so much of the market isn't working.

    When a company's product is known to be defective you stop using the product.
    Companies usually respond because eventually they will be caught.
    The software industry ignores this rule because so often after being caught in a lie the company in question continues to get away with it. Something is broken.

    Open source and Linux are solutions in that they provide an alternative that isn't crushed by monopolist tactics. Not because open source produces a better product every time but because right now closed source is producing a poor product and an alternative is necessary.

    Open source produces quality even when users have no alternative. Open source isn't dependent on market demands so it isn't affected by them. If it didn't have another way to produce quality none would happen.

    Closed source produces quality as a direct result of competitiveness. Plain and simple. They are not the only kids on the block and if they don't provide the quality software tools users demand the users will get them elsewhere.

    With open source if the software tools the users demand are missing some users add them to the software. Then we all have them.

    Open source doesn't produce better tools in a healthy market. But when the market is poisoned the quality of open source remains strong.

    I'm not saying open source doesn't produce quality software. It does. It just doesn't need market demand to make it happen.

    If all were healthy Linux and Windows wouldn't be very different in quality.

    Microsoft is caught up in this game of ignoring consumer complaints. Usually companies die when they do that. It's not normal.

    I'm sure in the future we'll run into some grand examples of poisoned open source work. Right now however the poison is over at Microsoft.

    --
    I don't actually exist.
  18. Re:If this was anything like what I just experienc by gmhowell · · Score: 2

    While I don't think I'm yet quite old enough to not want to deal with tests (I'm only 28!) I do remember some poorly written tests in grad school. So I made every multiple guess test a short answer type test. Luckily, my classes were small enough that my profs would look at this.

    My first child is due to be born any day now. Idiocy like this might make him home-schooled (much as I dread that). My wife largely lost her teaching job due to not teaching to the test (PG County MD if that means anything) and now works at a private company teaching kids to write and do math that they don't learn at the regular schools (too busy teaching to the test). But, to bring up home-schooling again, she's also teaching reading, writing, math to kids who have parents who are totally unqualified to home school anything, yet think they are.

    I need to find a Jedi and just let my kid become a padawan...

    --
    Jesus was all right but his disciples were thick and ordinary. -John Lennon
  19. Re:Closed Source? by gorilla · · Score: 2

    No. Even if you have source code, and recompile the source, then you do not know what the object code being run will be. See Reflections on Trusting Trust, Ken Thompson, Communications of the ACM, Vol. 27, No. 8, August 1984, pp. 761-763.

  20. Re:it's only good if you read it by gorilla · · Score: 2
    Adaptive testing uses a lot of calculations to figure out what ya know and what ya don't know. Instead of the 200 questions, ya might get 20 (or less on some of the new standardized ones) that you are free to take as long as you'd like.

    I think we should be asking bigger questions. Is this form of testing worthwhile? To my mind, a heck of a lot of these are simply tests of how good you are at taking exams. If you happen to be the sort of person who can memorize large amounts of information and retrieve it in the time period you'll do well. Reducing the number of questions asked makes it more and more an instance of luck instead of skill. If the 20 questions asked happened to be ones you know, then you score brilliantly. On the other hand, if they happen to be questions you don't know, then you're stuck. This is acceptable on "Who Wants to Be a Millionaire", when nothing of consequence is at stake, but it's not acceptable in things which will affect a person's whole life. Unfortunately, those exams which are useful at telling the true skills of a student, e.g. open book essay questions, are expensive to take & grade, and therefore not the sort which are being forced upon us.

  21. Re:it's only good if you read it by gorilla · · Score: 2

    But harder is not absolute. It depends on your knowledge mapping to the question. If the subject was geography, and they asked you which river runs through Paris, and you happened to take your last vacation in Paris, you'd have a good chance of getting the right answer. Does this mean that you are good at geography? No, if they'd asked you the same question about Glasgow, you'd have had no clue, so the Paris question is for you, much easier than the Glasgow one. For someone who happened to go to Scotland on vacation, the opposite would be true. With 20 questions or less, then it would not take a lot of co-incidence for the questions to happen to map really well, or really badly, onto your knowledge.

  22. Re:Closed Source? by ahunter · · Score: 5
    Sheesh, read the article. The very first paragraph states that someone outside the company had found out about the problem and notified the company, who promptly sat on it until it went bad.


    A school district might not be able to justify the money to check a system, but I suspect it could not justify using a system with known errors and would have an interest in getting it fixed.

  23. i knew it by TomL · · Score: 3

    i always thought that errors in standardized test scoring put me in a talented and gifted class (i got kicked out of it after two years due to bad academic performance, thank god, heh).

  24. Re:Auditing. by signe · · Score: 2

    While I agree with the idea that someone should be made responsible for this, exactly who should shoulder what part of the blame?

    On the one hand, we have the testing agency. They had problems with their software, lax quality control, and a PHB who withheld information from the schools. They scored the tests incorrectly (with regards to rankings and previous years), and as a result a lot of people got fired and a lot of kids spent a summer in school needlessly.

    On the other hand, we have the NYC school board. They made the decision to make the standardized test the end-all, be-all, despite the testing company's recommendations to the contrary. Even if the tests were scored and ranked properly, you have the opportunity for kids who test badly to end up in summer school.

    I think that it was "right" for Crew to lose his job, but not for the reasons that were given. Whether there were more people in the school system that needed to share the culpability will probably never be known. However I think that the testing company got off far too easily. And unfortunately, if any of the faculty in NYC that were fired because of the test results tried to sue them, they'd likely weasel out of it by saying "We didn't tell them to use this as a yardstick!"

    -Todd

    ---

    --
    "The details of my life are quite inconsequential..."
  25. Inside opinion by BMonger · · Score: 2

    Please keep in mind that I'm not trying to shift blame either way.

    I work for one of these companies. So just for some "inside info" here's the lowdown.

    As far as the QA process goes it is pretty good. In fact many states QA after we QA to double check. The fact is that things are missed. You wouldn't believe the data that comes through the systems and while everything is supposed to be accounted for it sometimes isn't. I have to say that the testing process is extensive. Oftentimes though the customer demands that the final product be delivered even when we know that the product has not been tested.

    As far as Open Source code, to the best of my knowledge there is no problem with a customer looking at our code. Do they have the monetary resources to hire somebody to look at it? Will that person interpret the specifications correctly? No they don't have the money and it takes a while to learn the process correctly.

    I have never scored an essay in my life but I can see where it would get terribly monotonous. Add on top of that supervisors pushing supervisors pushing little people and the monotonous job becomes a fast paced let's get it out the door deal.

    I do have a problem with schools using these tests as a high stakes test where students graduate/not graduate or similar repercussions. Just as with anything created by man there is possibility of error. Just think of how many various things have been recalled. Heck, yesterday I saw a recall at a store for a Halloween Costume because it was flammable. It's May folks. Halloween was October. But stuff like this happens, be it in consumer goods, software, or whatever.

    I do think the tests are worthwhile (mildly) in comparison to other students, and to see if the student has improved. I do not think they should be used to decide a student's fate by any means.

    I dunno... I don't feel like typing anymore 'cause I could go on and on... anyhow just in case you weren't aware these opinions are my own and not the opinion of the company I work for. If you have any questions feel free to ask them and I'll try to answer them (in my opinion and with the information I have without giving away "those things you're not supposed to when you work for a company").

  26. Legal Team on the way... by Kefaa · · Score: 2

    I see it now, the company claiming their EULA protects them from anything their software might or might not do. The school system saying it is the company's fault and everybody in court. The students who may lose scholarships, parents who now have to pay because of it, teachers and staff that lost their positions, and of course the townspeople of any school district that is somehow in a legal battle.

    In abstract it falls under the "okay we need to do this better next time.." however real people felt a real impact on their lives. How many teachers would have a hard time getting hired AND be able to prove it? Like a claim of "improper behavior," this is not going away even if the screw up is completely resolved. The next school will wonder, are they a screw up or were they screwed? Better to fall on the side of "screw up", at least I won't be held responsible later.

  27. Re:it's only good if you read it by Amokscience · · Score: 2

    This story reminds me of the Therac-25 X Ray machine accidents (do a google search). Of course, instead of getting test scores mixed up, people died.

    Poor QA and safety analysis. Trust in a product that was supposed to be foolproof. Lengthy process to fix the problem and recover damages.

    Frankly I think the problem is more that the company behavior was unethical and immoral. An all too common problem when dealing with money and people.

    --
    Fsck cluebie moderators. I'll say what I want, offtopic or not. And fsck having to qualify every bloody statement just
  28. you think that's bad? by ryantate · · Score: 2

    Boy takes two years of special ed after getting bum I.Q. results

    "Mr. Kumpula hired a psychologist to review the original test results and found the boy's IQ was significantly higher than calculated. On a revised version of the same IQ test, the boy scored 99, which is within the average range of intelligence. Normal intelligence is between 85 and 110, plus or minus five, the statement of claim said.

    "Edmonton lawyer Scott Schlosser, who is acting for the Kumpula family, said the boy has subsequently scored 113, which is considered high average. "
    ...

    "The board has denied the allegations, none of which have so far been proved."

    (found this via the excellent Obscure Store)

  29. Re:Closed Source? by crucini · · Score: 2

    Actually, that raises an interesting point which might apply to other areas like Carnivore. Is it possible to construct a computer which is trusted by two adversarial parties? In other words, the computer can prove to everyone that it is running the same code it claims to be running.
    This could easily be done with the assistance of a trusted party, which would modify the kernel. Doing it fully 'peer to peer' sounds hard, though.

  30. Re:Closed answer tests also a problem by crucini · · Score: 2

    I haven't thought this position through, but I increasingly believe that the 'closed answers' regime is a deeply unfair one, relying as it does on 'security through obscurity.' It creates a small minority of test-takers who know all the (previously asked) questions and answers. I think that this group exists for every sizeable test of this type.
    I wonder if this could be solved with parametrically generated questions - the question itself is merely a framework or script which when run (on a per-test-taker basis) creates a unique question instance.
    Under this scheme, the parametric templates would be released to the students, and would in fact constitute the definitive curriculum.
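
One way to picture such a parametric template, as a sketch only: the template below (a speed and time word problem), the function name `make_question` and the idea of seeding from a test-taker ID are illustrative assumptions, not part of any real testing system.

```python
import random

def make_question(seed):
    """Generate one instance of a hypothetical 'distance = speed * time' template.

    The template itself is public; the seed (for example, derived from a
    test-taker ID) picks the numbers, so each test-taker sees a different
    instance. Returns (question_text, correct_answer).
    """
    rng = random.Random(seed)      # separate generator, reproducible for grading
    speed = rng.randint(30, 70)    # km/h
    hours = rng.randint(2, 6)
    text = f"A train travels at {speed} km/h for {hours} hours. How far does it go, in km?"
    return text, speed * hours

question, answer = make_question(seed=12345)
print(question)
print(answer)
```

Because the same seed always reproduces the same instance, the grader can regenerate the expected answer instead of storing a secret answer key.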

  31. That's interesting, but... by crucini · · Score: 2

    Just because the scanning has been centralized and industrialized on a massive scale doesn't mean it should be. Instead of shipping tons of paper to a central location to put it through a very expensive, fast machine, why not scan it locally through a cheap scanner? Then a computer can upload the data to a central server.
    Another approach would be to use fax machines calling a tally server, since sheet-feeding fax machines are more common than sheet-feeding scanners.

    To put it another way, the power of the large scanner has become a self-fulfilling prophecy - we aggregate because we have large scanners, and need large scanners because we aggregate. But with the internet, physical aggregation is not needed for data aggregation.

  32. Hmm, I wonder... by 11thangel · · Score: 4

    Could the same bug have resulted in my Computer Programming teacher being hired? I still find it quite odd that she is teaching a top level programming class, yet doesn't understand what a function is. I still recall trying to explain the word "filesystem" to her. Hopefully, the same bug will assist my English grade (hey, it couldn't get worse).

    --

    I am !amused.
    1. Re:Hmm, I wonder... by big.ears · · Score: 2
      This is probably a really old problem--I experienced it way back in 1990, when I was taught programming by the typing teacher. At that time, rumor had it that the class wasn't going to be taught again, because new regulations required that someone teaching a programming class must be certified with a master's degree. There are very few master's-degreed computer geeks who would be willing to work for $17,000, not to mention going back to a place where most of them were ridiculed and mocked. To top it off, to teach high school, you also usually need to be certified, which eliminated the possibility of hiring "adjunct" faculty like they do at colleges.

      My question is, which would you prefer: a lame computer class that provides some structure and a potential for learning new things, or no class at all? In today's public school system, those appear to be the choices. The truth is, nobody really learns how to program in a class anyway--they learn how to program by programming, often as a requirement for getting a grade in a class, but frequently not.

  33. Auditing. by kezdeth · · Score: 2

    In my view, this obviates the need for better auditing of the process. If you're going to use a piece of software for anything this important, then you need to be certain that it will work correctly. IMHO, the publisher was very nearly criminally negligent on this issue.

    'Course, I also think that this is yet another good example of the need for open source software, but that's me.

    --
    Kez
    1. Re:Auditing. by Ryan_Terry · · Score: 4

      ...very nearly criminally negligent?

      Heh, that is my vote for understatement of the day. Do you have any idea the amount of time/money those students who had to take summer school lost? Add all of those together and this becomes a lot more serious. I think the only thing stopping them from serious legal troubles is the fact that these were high school kids. I'm not saying it is right, but teenage Americans don't get the same rights that their adult counterparts do. If this had been a corporation that screwed up on 40,000 paychecks you better believe there'd be a legal battle to remember.

      DocWatson

      --
      MessEdUp
      .sig
      #/var/www/v
  34. Broken questions. by FTL · · Score: 2
    Some of my favourite memories of high school and university were spotting errors in exam questions. The best one was in a SmallTalk exam where they gave us half a page of code, then spent several pages asking what outputs would be generated based on various inputs. I spotted that there was a period missing (same effect as a missing semicolon in most other languages). So I simply wrote "parse error, line 18" as an answer to each question.

    The problem was that we never got to examine the marked exams, so to this day I don't know if a clueless droid marked me as wrong, or had a good laugh and granted me the marks.
    --
    Slashdot monitor for your Mozilla sidebar or Active Desktop.
  35. Systems without Oversight by AMuse · · Score: 4

    I say that this is just symptomatic of a much bigger problem in the first place: Computer systems not having the proper amount of human oversight.

    Credit bureaus rely on their use of the computing systems for pretty much everything, and look how hard it is to get any error fixed at all.

    This could just as easily have been a private prison company (which most all prisons in CA are) accidentally sending your traffic-ticket offender to a high security felon bin for 20 years instead of a 6 month stint for not paying their bills.
    ------------------------------------------ --------

  36. A bad answer key should have been detected. by Animats · · Score: 2
    Proper scoring software should detect a bad answer key.

    A basic concept of standardized test design is that each question should have some degree of predictive power for the final result. A bad question or a wrong answer key should make that question show up as a terrible predictor. Test scoring software should be checking this.

    Informally, one way to look at this is to look at the exams for students who score high and see what questions many of them seemed to get wrong. Questions consistently missed by high scorers are in some way defective. This is basic test quality control.

    New York State regulates tests by law. There's a "Truth in Testing" law. Test-takers have the right to see their questions, answers, and the correct answers after the test. So something like this should have been noticed.
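
    The check described above, where each question should predict the overall result, could be sketched roughly as follows. This is only an illustration under stated assumptions, not how any real scoring vendor does it: the function names (`item_discrimination`, `suspect_items`) and the threshold are hypothetical, and the statistic used is a plain correlation between getting a question right and the score on the remaining questions.

```python
from statistics import mean, pstdev


def _pearson(xs, ys):
    """Plain Pearson correlation; returns 0.0 when either side has no variance."""
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(xs), mean(ys)
    cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return cov / (sx * sy)


def item_discrimination(responses, key):
    """For each question, correlate 'answered as the key says' with the
    student's score on all the other questions."""
    # 1 where the student's answer matches the key, else 0
    scored = [[int(ans == k) for ans, k in zip(row, key)] for row in responses]
    result = []
    for q in range(len(key)):
        item = [row[q] for row in scored]
        rest = [sum(row) - row[q] for row in scored]
        result.append(_pearson(item, rest))
    return result


def suspect_items(responses, key, threshold=0.0):
    """Questions whose correlation falls below the (hypothetical) threshold.
    A negative value means high scorers tend to 'miss' the question, which
    points at a defective question or a wrong key entry."""
    stats = item_discrimination(responses, key)
    return [q for q, r in enumerate(stats) if r < threshold]


# Made-up answer sheets: three strong students and three weak ones.
sheets = ["ABCD", "ABCD", "ABCD", "CADA", "BCDB", "DADC"]
bad_key = "ABDD"   # entry for question index 2 is wrong (should be C)
print(suspect_items(sheets, bad_key))
```

    With the miskeyed entry, the strong students all appear to miss that one question, so it stands out immediately. A real test would need far more answer sheets and a more careful threshold than this toy data suggests.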

  37. Re:Tip of the Iceburg by SuiteSisterMary · · Score: 2

    Given the fact that I count three spelling mistakes in the title and first paragraph of your post, let alone the rest, perhaps these tests are not such a bad idea....

    --
    Vintage computer games and RPG books available. Email me if you're interested.
  38. They have a bad reputation on other matters by richmaine · · Score: 2

    One reason some school districts like the Terra Nova-based tests is that they are pretty dumbed down.
    An acquaintance of ours had a highly gifted kid that they were trying to get a grade skip for. She got a shockingly poor (for her) score of 70th percentile in one section on one of the Terra Nova tests and this was being used by the school district as an argument that she wasn't advanced enough for a skip. After getting the raw scores (which there was a lot of resistance to giving, but our acquaintance is...tenacious), they found out that she had gotten every question in the section correct. But so had 30% of the students taking it, so that was a 70th percentile. On some of the other standardized tests, the report has a special notation if you "topped out" the test; the Terra Nova test didn't show that. But most shocking was that 30% of the students got every question in the section right. That indicates an awfully dumbed down test.
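
    The arithmetic behind that 70th percentile can be sketched as below. This assumes the percentile is simply the share of test-takers who scored strictly lower, which is a guess at the convention and not a documented Terra Nova formula; the function names and the sample scores are made up for illustration.

```python
def percentile_below(score, all_scores):
    """Percent of test-takers who scored strictly below `score`."""
    below = sum(1 for s in all_scores if s < score)
    return 100.0 * below / len(all_scores)


def report(score, all_scores, max_score):
    """Return the percentile plus a 'topped out' flag for a perfect score."""
    return percentile_below(score, all_scores), score == max_score


# Made-up section with 20 questions: 3 of 10 students get everything right.
scores = [20, 20, 20, 18, 17, 15, 15, 12, 10, 9]
print(report(20, scores, max_score=20))   # (70.0, True)
```

    Under this convention a perfect score cannot rank above the share of students who were not perfect, so a "topped out" flag like the one returned here is what tells a reader that the percentile is a ceiling effect and not a weakness.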

  39. Closed Source? by faust2097 · · Score: 3

    This has a lot less to do with closed source than it does with quality control in general. Just because something's OSS doesn't mean anyone else has actually looked at the code. School districts aren't known for having budgets for consultants to check it out...

  40. Another UCITA and clickwrap issue by www.sorehands.com · · Score: 3
    If you look at clickwrap licenses and the UCITA, liability for damages from known bugs is limited.

    Think of the graduating senior who now has to go to summer school instead of working for tuition money. This could delay their degree by a year. All caused by a known bug!

    Should the software company be held harmless?

  41. Wake up and smell the FUD by Chester+K · · Score: 2

    It's a fascinating story of the risks of going with a closed source vendor

    Sorry, but no, it's not. All software has bugs. Open Source or Closed Source.

    "Whine whine, but if it was open source, they'd have found the bug and fixed it!"

    Uh huh. Someone tell me again how long that root compromise was in BIND before it was discovered?

    Is someone going to tell me with a straight face that there's a community of developers just salivating to pore over the source code to standardized testing software? The benefits of Open Source become less prevalent as the number of developers with an interest in the code goes down.

    --

    NO CARRIER
  42. my $.02 by Windjammer · · Score: 2

    OK, I work for a Prometric testing center. I am also a teacher. What I noticed with the Scantron sheets (this was the first and last time I gave one) is that when I glanced over all of the tests, I had at least a 35% error rate on them. Mind you, the machines used to score the tests are a heck of a lot higher quality. (I finally ended up hand scoring the things.) As for working with the Prometric system, I can't speak to the actual scoring, but it is nice when the hardware works (as with anything, computerized testing has its share of glitches). But there is one major major major major major advantage that I can see: most schools and testing agencies have to deal with a problem that computerized testing doesn't, namely scanning tests to be entered on the computer, because the answers are already in the computer.

    I also know as a teacher that if I screwed up on a test and graded it incorrectly, my *** would be hung out to dry, and I'd have to answer to quite a few angry parents. This is also because the kids would have a chance to review the questions. Herein lies the problem. I know for a fact that some of the testing agencies value their questions at more than $600.00 per question. Heck, I had to sign I don't know how many waivers just to take a computerized GRE, and then I had to take the thing at a specific time of the month, when ETS was apparently going to rotate their questions again.

    This is the main reason why that parent had to threaten a lawsuit to view the test questions on the test. He was lucky that they did so to begin with, as this was not the only time that weird things have happened with standardized tests. If you check out the book about Escalante, he had a heck of a battle on his hands with ETS's AP Calc test. At least those kids got a second shot at the test.....

    --
    What? Me worry? NEVER.....
  43. Re:Open source across the board! by Misch · · Score: 2

    free as in "time"

    Nope... this would fall under the "Free as in beer" criteria. Why? 1: "Time is money." 2: "Money buys beer." So, "Free as in time", and "Free as in Money" equate to "Free as in beer."

    --

    --You will rephrase your request for me to go to hell. Goto statements are not acceptable programming constructs
  44. it's only good if you read it by Frymaster · · Score: 4
    really, the problem here is that
    a) bad software was written
    b) it was closed source.
    an open source solution would only address this problem if the purchasers (ie the school administration) actually sat down and audited the code. not very likely, imho. not that i'm dissing open source... far be it from me to do that, but oss is only as good as the people willing to audit it. being in the process of writing an online test generator i can tell you that teachers and admins look at oss the same way they do proprietary software... the only difference is that it's good for their budget and doesn't come in a box...

    nuff said.

  45. Closed answer tests also a problem by Phronesis · · Score: 5
    If you read part 1 of the NYT story, you find out that many more students were penalized by incorrect answer keys than by computer errors. Thus, if we are to trumpet open-source as the appropriate way to deal with risks of errors in the algorithm for normalizing percentiles for test difficulty, then do we also conclude that all answers must be revealed so that erroneous answer keys can be caught?

    Perhaps. I always give my students answer keys after I give them a test. But for big tests such as are described in the story, I would worry that having to change all the questions several times every year would introduce so much opportunity for poorly worded, misleading, or fundamentally flawed questions that this risk would outweigh the current risk of incorrect answer keys.

    There is also the cost question. If companies had to rewrite the whole test every time they administered it (because they would publish the answers), then the costs would rise sharply and the tests would become less affordable to struggling school districts (or we would see large tax increases to pay for the tests). The benefits of increased test accuracy might not justify the cost to the taxpayers.

    None of these considerations apply to the case of the software used to grade the tests. There seems to be little risk that understanding the way the grades are curved would enable cheating or other gamesmanship. Since this is not networked software, the potential for attacking it from another computer is small (social engineering is still a risk, though, but no more so for open source than for closed-source software).

    Thus, although it would be tempting to put "Open Answer" on par with "Open Source," the first seems impractical and the latter seems well suited to the problem at hand.

  46. Open source across the board! by swinge · · Score: 2

    yes, it's not just errors in closed-source grading that's sending some kids to summer school. have you considered the impact of closed-source testing? if we open-sourced the exams themselves, that would eliminate more errors than any other measure! not to mention the freedom it would engender: free as in "time".

  47. Academic response by Fros1y · · Score: 3

    What I found so horrifying about this situation was the response of the people responsible for the NYC catastrophe. It would seem that those in power were far too concerned with the politics of their jobs to even think about the children and subordinates that the test was slowly crushing. Perhaps the most brazen example of this behavior was the administrator's decision not to speak out about his concerns because he feared for his reputation.

    If we are ever to have upstanding and capable students who know not only logical thought but also ethical beliefs about participation in society, it is precisely this sort of leadership that we can do without. I'm very sorry that the tragedy of these tests hurt so many children and so many competent superintendents, but was this really one of them?

  48. Re:Tip of the Iceburg by Papa+Legba · · Score: 2

    Well, I will definatly watch my spelling on this reply!

    I don't recomend any testing, I recomend more dynamic funding (note I did not say MORE MONEY, just better directed were it goes). Vouchers look like a good route to me, you fill your school with know nothing teachers and inept administrators and the people will leave and take their money with them. I feel that a lot of the poor performance on the part of the school system is that they are a monopoly that must be payed. Even if you put your child in a private school or home school them you must still pay the taxes that fund the school system.

    As long as administrators and teachers get a free ride then they are not going to want to make things better. The only thing that is going to help the kids is more directed funding allowing a school to offer better classes with smaller groups better sorted by ability. It is just as much of an injustice to force a smart student to attend a remedial class as to force a not so bright student to take advanced classes. Both will start to fail. The smart one out of boredom and the less bright one out of lack of ability.

    More money won't help the problem, the best funded public schools in the nation are amongst the poorest performers. Until their is compition in the market place we will continue to see this lack of education.

    To break this down to it's points : Vouchers good, Testing Bad!

    --
    Papa Legba come and open the gate
  49. Tip of the Iceburg by Papa+Legba · · Score: 4

    This is going to just get worse for kids. With press. Bush pushing to make standerdized testing nation wide this will become more and more common of an occurance.
    I live in Virginia, a state that implimented Standereds Of Learning tests (SOLs) years ago, The absolute paranoia that surrounds a test that "was just going to be used for monitorring purposes" is astounding. The schools have stopped pretending that they are teaching knowledge and instead spend all of their class time cramming facts down the students throats so they can pass the trivia quiz of the SOLs.
    It has gotten so bad that a local city has asked to extend the school year for kids just to prepare for these test. Once the test are out of the way the kids spend three weeks until the end of the year loafing in class as the teachers have no reason to give them a final, they already had it and passed their SOL.Just and example of how the schools are warping to fit around the SOLs , soon they will be the official final. This is the only outcome you can expect when a teacher and administrators depends on their raises based on how their school district does on the SOLs and their jobs depend on how well their own classes do on these tests. School should teach kids how to think and solve problems, not how to regurgitate facts at the drop of a hat, facts that can be easily found in a book or on the web if you were not sitting in a proctored testing room.
    This was a great idea to start but it is getting out of control, just like drug testing in the 80's early 90's. Seemed like a good idea until fly by night testing labs started turning in false positives by the truckload ruining people and their carrers.
    Kids don't need this pressure, Teachers ,maybe, but this is not how to apply it, school adminsitrators definatly need to be held accoutnable. I do not think this is the way to do it though. Ultimatly we do get rid of the incompitents , but we also get rid of the talented teacher. Once the lesson plan is dictated from the state or nations capital the chance for real learning is lost and it just becomes a numbers game. Kids are not numbers, they are potentials and should be treated as such! When we takes steps like these to teach to the lowest common denominator, the brightest of our children are wasted, we need to stop this and start teaching smart.

    --
    Papa Legba come and open the gate
    1. Re:Tip of the Iceburg by Water+Paradox · · Score: 2

      Given the fact that I count one attitude flaw in your post, I'm still referring to the original which has several salient points.

      --
      information is immaterial
  50. Open source alternative? by ColdGrits · · Score: 2

    I find it interesting that although there are a large number of posts slagging off "closed" source, and saying how this problem would have never got as bad with "open" source, not one single post has been able to point to a suitable piece of open source code which they could have used instead.

    CAN anyone point to an existing open source alternative that meets all the necessary criteria?

    Because if not, then I'm afraid discussion of whether or not the problem would have been as big with open source is irrelevant if there IS no such open source product to begin with.

    Yes, "they" could have written software themselves, but do you really think they had the necessary expertise and time to spend writing such a product from scratch? The answer is "no", otherwise they would have.

    Just something to consider.

    --
    People should not be afraid of their governments - Governments should be afraid of their people.
  51. Measure twice, cut once. by 5KVGhost · · Score: 2
    "If you read part 1 of the NYT story, you find out that many more students were penalized by incorrect answer keys than by computer errors. Thus, if we are to trumpet open-source as the appropriate way to deal with risks of errors in the algorithm for normalizing percentiles for test difficulty, then do we also conclude that all answers must be revealed so that erroneous answer keys can be caught?"

    Hmmm. It's not like the answers are unfathomable mysteries revealed only by divine inspiration. Presumably any given answer on these tests could be easily verified by someone who actually knows the topic being tested. Just like that father in the NYT story did. But look at all the nonsense that father had to go through to even find out what the heck his own kid did wrong.

    With a machine graded test, why not prepare two or more separate answer keys independently? Making a proper answer key can't be _that_ hard. Run the tests through once with each key, compare the bulk results, and any problems like this immediately become glaringly obvious. I think a test that determines a student's entire academic future deserves a little simple error checking.
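
    A minimal sketch of that double-key check might look like the following. It assumes answer keys and answer sheets are plain strings with one letter per question; the function names and sample data are invented for illustration and do not describe any testing company's actual process.

```python
def key_disagreements(key_a, key_b):
    """Question indexes where two independently prepared keys differ."""
    if len(key_a) != len(key_b):
        raise ValueError("answer keys cover a different number of questions")
    return [i for i, (a, b) in enumerate(zip(key_a, key_b)) if a != b]


def score(sheet, key):
    """Number of answers on the sheet that match the key."""
    return sum(1 for ans, k in zip(sheet, key) if ans == k)


def changed_scores(sheets, key_a, key_b):
    """Students whose score depends on which key is used, as (score_a, score_b)."""
    changed = {}
    for name, sheet in sheets.items():
        a, b = score(sheet, key_a), score(sheet, key_b)
        if a != b:
            changed[name] = (a, b)
    return changed


# Made-up data: the two keys disagree on one question.
key_a, key_b = "ABCD", "ABDD"
sheets = {"s1": "ABCD", "s2": "ABDD", "s3": "AAAA"}
print(key_disagreements(key_a, key_b))      # [2]
print(changed_scores(sheets, key_a, key_b))
```

    Any non-empty result means a human has to look at the flagged questions before scores go out. The check only catches errors the two key authors did not both make, which is why the keys need to be prepared independently.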

    The larger problem is that the testing companies have no independent oversight. They should be required to place their answer key and a copy of every question into the hands of a neutral party who can check these things if there's a dispute. (That's not necessarily some random government agency, as the gov't also has a pretty dismal track record as far as customer service and full disclosure are concerned.) How about requiring the different major testing companies to hold and audit each other's tests? Competition in action!

    -Bryan

  52. Closed Minds, Not Closed Source by Foggy+Tristan · · Score: 2

    The factors that led to this seem to revolve around closed-minded thinking: Dr. Crew refused to believe the scores could be wrong, despite a dramatic difference. Mr. Tangentt (sorry if I get the name spelled wrong here) refused to admit there might be a problem, and apparently still does.

    Customers not knowing anything about other customers also exacerbated the situation. Had the customers been able to communicate with one another, they may have discovered the commonality of what they were seeing and had it resolved much sooner.

    --
    Beware typoes.
    1. Re:Closed Minds, Not Closed Source by Angel+of+Legaia · · Score: 2

      And the students are the ones who ultimately pay the price. The administrators may get some bad press, but in the end, students are the ones who feel the crunch from this fiasco. Why? Because low scores on standardized tests mean less money and poorer teachers for that school. By the time the school board realizes the students may have done okay on the tests, it will be too late.

      Whatever happened to the days when standardized tests were graded by a machine that checked for markings on the answer sheets? What was wrong with that system? It tended to be fairly accurate.

      Just my opinion; I could be wrong.

      Phoenix

      --
      I've actually come to love Hanson and here's why: These kids are a giant rehab festival just *waiting* to happen!
  53. If this was anything like what I just experienced by trentfoley · · Score: 2

    The consulting company I applied to just made me take an online aptitude test concerning J2EE. First of all, I hate tests. I'm too old for anything harder than "Millionaire". But this one really sucked. Not only were some of the possible answers ambiguous, it suffered from all of the same things that email and websites do -- poor spelling, improper punctuation, and bad grammar. That is arguably not a horrible thing when writing in English; but, when writing in a programming language, especially when that language is case-sensitive, it is absolutely unacceptable. I did not answer two of the questions because of the typos in the code that would have prevented compilation (javax.swing.Jframe, for example). I won't have to take summer school or anything like that, but it might cost me a dollar or two an hour.

  54. closed v. open? by room101 · · Score: 2

    This is interesting, but is it really a strict comparison between closed and open source?

    This is just an example of bad business practices. In fact, I think that any customer should expect better than this. Not to mention that doing this sort of thing is a huge legal liability, thus not done by many companies, either by the feeling of "fair-play" or just cya.

    Just because an external firm reported the bug, it doesn't follow that only a closed source company would be able to bury/hide it. Perhaps the chances are lower that an open source product would get away with it, but it isn't a foregone conclusion that by definition, an open source product couldn't/wouldn't do the same thing.

    --
    room101 -- how much can you stand before they break you?
    (they always break you eventually)
  55. Wait till we have elections run by these same Bozos by funwithBSD · · Score: 2

    You think this is bad?
    Wait till some similar process screws an election up.
    Oh, wait....

    --
    Never answer an anonymous letter. - Yogi Berra
  56. Testing co's proprietary data are student answers by vls · · Score: 3

    Testing companies such as ETS and CTB quickly gain monopoly power because of a simple, but powerful, network externality:

    As the company administers more tests, the company's database of questions and performance gets larger. Also, the company usually includes experimental questions on the same tests as calibrated questions -- giving powerful statistics on how the new questions will perform relative to older questions.

    Thus as the company administers more tests, the company gets a bigger and bigger lead over its competitors. If a school switches testing companies, they won't be able to track trends from one side of the switch to the other.

    Like other network externality markets -- think operating system -- the monopolist's proprietary edge comes not from the originality or sweat of the incumbent, but simply from the size of the adopted user base.

    But the testing market is subtly different. In testing, much of the proprietary value comes from the answers the students themselves give. Indeed, if the school districts considered these data 'proprietary' then the testing companies might have to 'buy' their monopoly position from the customer.

    But even if the school retained ownership of its own pupils' data, the testing company would still have the power in the relationship. To truly move to 'tester portability' -- and thus competition -- schools would need 1) to be able to retain the actual wording of the tests (and share that with other testing companies) and 2) to be able to insert experimental questions from other testing companies into the testing, to allow for calibration if they were to switch vendors. The only hope I see for such an utter inversion of the relationship would be if districts comprising more than 50 percent of the testing market banded together. Possibly, if the U.S. Department of Education forced a change for a federal program.

  57. Re:This should be interesting by tb3 · · Score: 2

    Two other interesting points: there was no mention of lawsuits (and you know there's gotta be some brewing somewhere), and the woman in charge of the department now works for the company that administers the SATs. If I was a college-bound senior, I'd be real worried right now.
    -----------------

    --

    www.lucernesys.com Horizon: Calendar-based personal finance

  58. Re:This should be interesting by tb3 · · Score: 3
    Yes, you can file lawsuits over software bugs. AECL was sued over the software bug in the Therac-25 that caused six incidents of injury and/or death. Here's a good write-up.

    If the bug causes sufficient damage or harm, and the company was negligent, then that should be grounds for a lawsuit. (Of course, IANAL, but my sister is.)
    -----------------

    --

    www.lucernesys.com Horizon: Calendar-based personal finance

  59. Cautionary tale. by 137 · · Score: 2
    Where I live (Michigan), we have statewide proficiency tests that actually are corrected by our teachers. While that's nice (the teacher-graders, not the tests themselves), it's not a solution for standardized tests, where finding the results isn't as simple as #correct/#total=%ile. There's a whole cadre of statistical manipulations the raw results go through, and we only have to look at the article to see how dangerous some of them are:

    Then CTB did something that it would not do in any other state: it simply raised the comparative rankings of many Tennessee students, and lowered some others, to conform with Mr. Sanders's statistical models - even though the company could find no error to justify those changes.

    My, my, my... adjusting results to jibe well with a statistical model, are we? That's some quality data, there. That such manipulation is possible is a little chilling, but this article seems to suggest it is commonplace. School districts are making decisions based on test data that's been twisted and pulled like Silly Putty.

    Seems to me that the real reason we have standardized tests is to cast the legitimizing shadow of external validity on some fundamentally meaningless numbers so that we can claim those numbers constitute a meaningful measure. We need something quantifiable to judge our students against, after all, and if there isn't a good yardstick available we'll use a crappy one even though we know it's crap, because we certainly need something to show the parents.

    Don't get me wrong -- I strongly believe that we need to monitor our educators and drum out the bad ones. We don't tolerate bad surgeons; we shouldn't tolerate bad teachers. But articles like this make it abundantly clear that standardized tests, while appearing to offer an easy gauge of student performance, are all surface and no substance.