Hacker Exposes Evidence of Widespread Grade Tampering In India
Okian Warrior writes "Hackaday has a fascinating story about Indian college student Debarghya Das: 'The ISC national examination, taken by 65,000 12th graders in India, is vitally important for each student's future: a few points determine which university will accept you and which will reject you. One of [Debarghya]'s friends asked if it was possible to see ISC grades before they were posted. [Debarghya] was able to download the exam records of nearly every student that took the test. Looking at the data, he also found evidence these grades were changed on a massive scale.'"
Sometimes you have to do the needful to get into the school you want.
This would be true in the US and the UK, and India doesn't even match up to those "high" standards. He'll be in jail because someone with power will be embarrassed by this.
More for the discussion of statistics than for the really sad excuse for security on those pages.
He tried to kill me with a forklift!
This is the type of coding that you get in India: stuff done on the cheap and likely coded to spec, with no thinking about how bad of an idea this is.
Have you seen the curves? They don't even approach a Poisson distribution.
The test results were manipulated. There are missing scores (from 1-100) on a test taken by 150,000 students. That is not possible. They have been bumped up to passing. The graphs show jagged peaks separated by gaps rather than a curve. Unless his data is incomplete or has been manipulated, there is no reasonable explanation for the jagged charts.
Nothing I hear about education fraud in India surprises me since one of my Indian coworkers explained how people "buy" degrees from Indian universities.
University employees can be bribed to create the records for an entire curriculum, spanning multiple years of attendance. This record is indistinguishable from a valid one and generates a real diploma. The University will confirm education because "it's in the system".
I think he said it cost about $3000 USD or so for a Masters degree.
According to my attorney (a former IT person who went to law school), that qualifies as hacking.
He was helping me with a child custody issue, but he had a case where a woman was accused of hacking. He said clearly she couldn't do it as she could barely use a web browser and she was accused of a fairly sophisticated attack. He was thinking about using me as an expert witness, so we got talking about the subject. He said he'd obviously argue it wasn't if he was the defense attorney, but that under present case law, changing GET parameters qualifies as hacking.
That truly scared me.
Not all his observations. The notable lack of scores leading up to the pass point and the sudden spike at that exact point are particularly notable.
In SOVIET RUSSIA... erm...NSA AMERICA, the Internet logs onto YOU!
I thought so too, but the problem is when you overlay the various tests for different subjects, they all show the same missing points. Standardizing different tests (in different subjects) would not produce identical gaps when overlaid unless all 150,000 students performed exactly the same for each subject – which is just not believable.
"There are missing scores (from 1-100) "
Without knowing how many questions are given in each section, and how they're scored, that's not possible to say. The set of possible scores doesn't necessarily include every value from 1-100.
If there are 30 questions in a section, and it's scored on a straight percentage basis, you're going to see discrete peaks every 3.33%, and nothing in between. Gosh, just like on the graphs.
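A minimal sketch of that arithmetic, assuming (hypothetically, since nobody in this thread knows the real paper layout) 30 equally weighted questions and a score reported as the nearest whole percent:

```python
# Hypothetical layout: 30 equally weighted questions, score = percent correct.
NUM_QUESTIONS = 30


def attainable_scores(num_questions):
    """Return the sorted integer percentages reachable by getting k questions right."""
    return sorted({round(100 * k / num_questions) for k in range(num_questions + 1)})


scores = attainable_scores(NUM_QUESTIONS)
# Integer scores between 0 and 100 that no candidate could ever receive.
missing = [s for s in range(101) if s not in scores]

print(scores)
print(len(missing), "integer scores can never occur under this scheme")
```

Under those assumptions only 31 of the 101 integer scores can occur, and the gaps are evenly spaced. Whether the real exam is scored anything like this is exactly what is in dispute.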
That doesn't explain the odd overall distributions, however.
"National Security is the chief cause of national insecurity." - Celine's First Law
It definitely does not represent standardization to a score of 100. It's not an even distribution of peaks. It is pushed up above the failing mark, and there is no gap from 94-100. Furthermore, all the different tests in different subjects show the same gaps. This is not reasonable at all.
What does "ls -l" do? Please describe below.
That kind of thing. So, I'm not surprised if institutions are manipulating test scores. India is more about the perception of computer savvy developers than the reality of it.
No sigs in BETA. Beta SUCKS.
The Indian system of education doesn't work like that. Here's a post I made on another forum: You can theoretically attain all marks in the 0-100 range because there is no scaling up. Each paper has components that together total up to 100. For example, there could be 10 1-mark questions, 15 2-mark questions, 4 3-mark questions, 3 4-mark questions and 6 6-mark questions. Each question can be graded to a fraction of its worth. So you can get 1.5 on a 2-mark question, 0.5 on a 3-mark question, etc. Thus theoretically, all possible combinations of scores are possible. The absence of certain scores is evidence of tampering. SOURCE: I appeared for the CBSE exams last year. The system is similar, though not the same.
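A quick sketch to check the example layout above. It takes the question counts straight from the comment and, as a deliberately stricter assumption than the comment makes, allows no partial credit at all (each question is all-or-nothing). The function name is made up for illustration.

```python
# Example paper from the comment: (marks per question, number of questions).
PAPER = [(1, 10), (2, 15), (3, 4), (4, 3), (6, 6)]


def reachable_totals(paper):
    """Return every integer total reachable with all-or-nothing marking."""
    totals = {0}
    for marks, count in paper:
        for _ in range(count):
            # Each question either adds its full marks or adds nothing.
            totals |= {t + marks for t in totals}
    return totals


totals = reachable_totals(PAPER)
max_total = sum(marks * count for marks, count in PAPER)
missing = [t for t in range(max_total + 1) if t not in totals]

print(max_total, missing)
```

For this layout the paper totals 100 and no integer from 0 to 100 is unreachable, even before fractional marks are allowed. That only covers this example layout, not the real ISC papers.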
The author answers your objections. First, the missing values didn't have consistent intervals (it wasn't always every 3 points). Second, the grades from 32 to 34 didn't appear in the data. That gap seems unusual. Third, there weren't gaps from 94% to 100%, so it's known to be possible to attain percentages that aren't divisible by three, for example.
There is nothing in the article that indicates caste has anything to do with it. Most of the discussion suggested that the cause may have been to "bump" almost-passing grades to passing grades (and presumably other achievement tiers as well).
"Here Lies Philip J. Fry, named for his uncle, to carry on his spirit"
So let's say that some numbers are "missing." Why would someone manipulate the exact same numbers to be missing across all of the exams? I mean, I could see bumping a 32, 33, or 34 (non-passing) up to a 35 to have pity on some poor schmuck who came really close to passing, but why would, say, someone change a 93? I mean, not just for one student, but all the way across the board? What possible motivation could someone have to say "That's got to be either a 92 or a 94, we can't have any 93s"?
I'm inclined to believe what the poster above said. They're simply rounding numbers based on the number of questions on the test to some nearby value in a way such that not necessarily every integer between 1 and 100 is represented. In other words, if there are 40 questions on the test, you'll have scores of 3 (rounded from 2.5), 5, 8 (rounded from 7.5), 10, etc. You will never have a score of 76 or 94 or 61. I strongly suspect that if he knew exactly how the test was scored, the "missing numbers" explanation would be pretty obvious.
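Here is a rough sketch of that rounding scheme, assuming 40 equally weighted questions and halves rounded up (2.5 becomes 3, 7.5 becomes 8), as in the example above. Note that Python's built-in `round` rounds halves to the nearest even number, so the sketch uses `floor(x + 0.5)` instead.

```python
import math


def rounded_scores(num_questions):
    """Scores reachable when each question is worth 100/num_questions and halves round up."""
    return sorted(
        {math.floor(100 * k / num_questions + 0.5) for k in range(num_questions + 1)}
    )


scores = rounded_scores(40)

print(scores[:5])                                  # lowest few reachable scores
print([s for s in (61, 76, 94) if s in scores])    # which of these are reachable
```

With these assumptions the first reachable scores are 0, 3, 5, 8 and 10, and none of 61, 76 or 94 can occur. It says nothing about whether the real exam is scored this way.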
Back in late 2009 and early 2010 I was scraping jail inmate registry records for Scott and Dakota County, MN. This was simply a script which incremented the ID numbers by one several times a day and put them out into a CSV. I uploaded these to Google Docs and had Docs Widgets build simple charts based on those data for a rolling ~6 month window of inmates.
As I started looking deeper into the data I started noticing I had ages lower than 18. Odd, I thought, but sure enough, Scott County was including their juvenile records in the data mixed with the adults even though it wasn't shown on their public website.
I contacted the County and they fixed the bug (you can read about that here: http://www.lazylightning.org/scott-county-quickly-fixes-juvenile-jail-roster-issue) but I was still surprised at the relative lack of security for juvenile records:
It's surprising how lax security is anywhere and to the poster elsewhere in this thread that said this is what you get when you outsource to India, this particular web stuff was not performed with outsourced talent so that comment was nothing short of asinine.
"Hacked" means "retrieved from a web server in the way they were intended to be retrieved." The fact the webserver was completely unsecured is, however, worrying.
"Widespread grade tampering" means "statistical evidence that the final grades are not the raw grades, but have been adjusted according to some system as yet unidentified." The nature of the adjustment is as yet unidentified - it could be nefarious, or is much more likely to be according to policy. Pretty much every school system in existence does this.
So the headline should really read, "Student stumbles across results on unsecured website and doesn't understand the grading system." It's not really news.
Slashdot - News for Nerds, Stuff that Matters, in ISO-8859-1 Has just realised that beta makes this signature redundant
1. Teachers have to ensure that their class marks have a certain average and median before they submit them. There can't be too many failures either.
2. Teachers know not to give a grade of 49 if the pass is 50 since the student will argue to get that missing point. If you want to be safer, just don't give out anything in the forties.
3. If a test gives letter grades, that equates to a particular number. A = 85, A- = 83, and so on. In that case, no one gets an 84, ever.
"We are here on Earth to fart around. Don't let anybody tell you any different!" -- Kurt Vonnegut
Are you trying to mock educational standards by pretending to be someone who failed statistics?
Poisson distributions have to do with frequency of repeatable events over time. You meant Gaussian or Normal distribution.
Cheating and corruption in *India*?! No. Fucking. Way! I expect nothing less in the rape capital of the world. P.S. My wife is Indian and I have firsthand experience with how corrupt and vile that country is, from cops to repairmen to government officials.
You need to read TFA http://deedy.quora.com/Hacking-into-the-Indian-Education-System that should give you an idea of what the person in the article talks about with tampering data. Even with 1 question asked in the test, the score range should not be this ugly or the evaluation/grading method is not up to par. TLDR summary, it is statistically impossible to miss that "many" score points between 1~100 from this size of data.
On a side note, I am not sure whether the person is going to jail... I hope there won't be a "mysteriously missing or injured" person because Indian culture is not a western culture...
There are ranges where every integer is represented, other ranges where every other one is missing.
The real smoking gun is that several grades just below a passing grade appear to be promoted up to pass.
If you recognize that your evaluation system only has an accuracy of +/- 3% it does make some sense to bump up those below the passing grade by that much to the level of the passing grade. It also saves a whole lot of resources by not having to field requests for regrades and reevaluations from all of those students who are just barely below the cutoff.
When your tools are imperfect (and they all are), there is no absolutely "fair" way of dividing a large group into two mutually exclusive categories. You might be able to say with high confidence "Those who scored a 60 or above know their stuff" and "Those who scored 40 or below do not know enough", but the ones closer to the cutoff are much harder to judge with confidence.
Politically it is easier to bump up the marginal ones. People well below the cutoff line generally do not ask for special treatment as they know they did very poorly, and those who got bumped up won't complain about it because they benefited and they don't even know that they got special treatment. As always, it sucks to be just shy of the "new" line, but if you don't know that the line is there, it doesn't hurt as much.
Technically, in a caste system, you're not allowed to move up except in very narrow circumstances. You're not actually allowed to move at all - up or down. You can be the most brilliant person on the planet, but if you were born to an untouchable in India, well, no one would listen to you.
More likely though, it would be done by people from higher castes because they have a certain image to maintain.
Remember, in Asia, this all derived from the old school British system where exams basically set you on your path through life - basically the final exams at the end of high school were The Final Exam(tm). Score well, and you'd go to university. Score not-so-well, you go to a second-rate college. Score less and you're a lowly tradesperson. Score even worse and you're an unskilled labourer.
So in general, it's an extremely high-stress period where teens would basically be locked in their rooms spending all the time studying because it really is it - no chance to take it over (well, I suppose there are certain humanitarian reasons they allow), and it basically determines your future.
Likewise, for anything with this much pressure on it, people succumb to the human condition - suicide is common, both before and after the exam. Cheating is as well - and many elaborate cheating machines have been conjured up over the years - this isn't your own hide-a-cheat-sheet scale - this is full on tiny 2-way radios and other mechanisms. And of course, hacking of grades to improve one's score.
Interestingly, I think in China one district is forcing all test-takers through a very sensitive metal detector and forcing them to strip - just one step below forcing test-takers to be stark naked during testing. The metal detector is extremely sensitive and basically won't allow anything metal in.
That's how serious the test is, and how serious everyone takes it.
For all its flaws, the modern American system is generally better and more "available" (and even the modern British education system isn't as strict). I'm not entirely sure that letting one test determine your future is entirely wise, and it's one reason why a lot of students travel abroad to study. Some do it because they scored well and got prestigious international study scholarships from their country, but others do it because they couldn't get in, and studying abroad is an option for those that do not pass.
You're lucky that they responded appropriately by calling you and fixing the problem.
The usual response is to accuse you of being a terrorist/hacker/anarchist/etc. and try to put you in jail.
I don't read your sig. Why are you reading mine?
For those who don't read TFA ... why is there an expectation that the results should have been secured? The results are posted on dead tree on all school notice boards. You could go around each of those schools and gather the same data. ...I don't give an eff
1. Kid figures out query params and post fields in HTTP
2. Kid mines data from a public web server to get publicly available information.
3. Kid "analyzes" data statistically, finds a pattern to grading
4. Kid dubs it tampering. (Tampering would be if the evaluators grading were to be replaced with something else. )
5. Tech dumb media latches onto the story, makes a celebrity out of a kid scraping data off a website.
6. Education agency is pissed off for really no fault of theirs. I mean, where is the effing breach?
Potential Consequences:
Agency lodges police complaint based on media reports (India has overbroad cyber crime laws, people have been arrested for making anti gov remarks on facebook)
Kid gets arrested when he lands in India in the summer vacation
Kid asked to surrender passport till the court decides on the case
Case drags for years
Kid screwed
Slashdot : news from half assed unverified sources, stuff that
Why don't you just read the fucking article instead of trying to come up with your own wackjob explanation? He quite clearly explains it:
One of the most common critiques of my theory was this - maybe there were questions with only 3 or 4 mark intervals in all subjects making certain marks mathematically unattainable. My counterargument? All numbers from 94 to 100 are attainable and have been attained. What does this mean? It means that increments of 1 to 6 are attainable. By extension, all numbers from 0 to 100 are achievable.
Let me give you an example. If 99 and 98 were definitely achievable with deductions of 1 and 2 respectively, this means one of two cases - there is a question A worth 1 mark that made 99 occur, and a question B worth 2 marks that made 98 occur, which meant getting A and B both wrong would mean 97 could occur. Case 2 - Question A was worth 1 mark, and question B was worth 1 mark too. The 99 got A wrong, and the 98 got A and B wrong. By this logic, if 97 were not possible, it would mean that there is no other question of 1 mark in the examination or that nobody got a 2 point question wrong and question A or B.
Basically, because 99, 98 and 97 were all attained, then any increment of 1, 2 or 3 points should be possible. The fact that nobody got 80% in any subject in the entire country points to widespread tampering.
Help I am stuck in a signature factory!
The examples in parent post are wrong.
"Breaking and entering" requires physical trespass. There is no trespass involved when using the GET method, which is part of a standard and open protocol, to request a web page, which in this case is unencrypted and easily read by anyone who asks for it.
The "bait car" analogy fails miserably. There is no property theft involved in what was described by TFA since nobody was deprived of use of anything. In the general case, "intellectual property" is not physical property and courts need to recognize the differences.
If anyone needs a physical analog of what this fellow has done, it is like this:
Imagine that for reasons unknown, the New York City Board of Education recorded the student ids and test scores as graffiti on all the park benches in Central Park. Where any passer-by could read them. Each student was directed to the bench where their data was recorded (in indelible magic marker), and the BoE patted itself on the back for having found a way to make use of all those benches. Then this guy comes along and develops an efficient way to go from bench to bench to bench... Data on the Internet, accessible without any protection to anyone who had or could construct the URL, is as freely available as any graffiti written on a park bench.
Questions should begin with why the Indian agency responsible for handling this data put up these web pages without involving anyone who had a year or more of training in information management techniques. They certainly had persons on staff who would have avoided making the JavaScript so readily accessible, and there should have been some kind of password scheme so that only the student would be able to access his own scores. Why were their in-house experts not involved? It is as if those who were delegated to build the web site did not want to involve anyone who knew enough about data management that they would become suspicious about it being manipulated.
I think there is more than enough evidence here that something is very corrupt in the Indian education system. Even if the data obtained had not been so obviously altered, the grossly amateur handling of highly personal information stinks to high heaven.
Will
Kinda like yours, except that you likely know even less about the test than he does.
My conclusion was that they rounded the grades to certain points. I'm not sure where he got the inference of malice or tampering, other than bumping failing grades up, which isn't exactly malicious (though probably unfair).
Never attribute to malice that which can be explained by stupidity... or policy.
Also, I give this guy a couple of days, a week max, before he's in jail for quite a while.
It's better to vote for what you want and not get it than to vote for what you don't want and get it.
- E. Debs
Poisson distributions are found over non-time intervals as well.
.: Semper Absurda
If this had happened in the usa
Something very similar to this did happen in the USA, from some time in the 1980s until around 1995. It involved a government forestry agency, and the database they had to track logging, replanting, spraying, road building, and other commercial forest management activities.
I became involved about 1993 when I was hired by an eco-activist group who had used FOIA to obtain a digital copy of a detail report of the entire forestry database for the region. My task was to develop one-off perl scripts to extract the data from the report format and build a Paradox database that could be queried to see if the forestry records indicated any violations of the laws to protect spotted owl habitat. This was straightforward work: as I recall the hardest part was staying awake when doing the validation cross-checking. (I also dislike reconciling my checking account with the bank statement.)
But what I discovered was that the forestry database was full of crap. You cannot harvest a 20 year old stand of timber from a parcel that had been clear cut just three years earlier; you cannot harvest anything from a parcel before the access road to it is completed. A big portion of the database lacked self-consistency. Years later, I learned that the consultant that the forestry agency had hired to develop and maintain the database had been convicted of fraud, and that there had been a shake-up in the management of that agency. (Since the database records were crap, the eco-activists chose not to use it in their spotted owl fight. Instead a new, and appropriate, attack on the managerial competency of the forestry agency was launched, I believe by persuading one of the State Representatives to demand an investigation.)
I do not think that computer fraud on this scale is likely to happen in the USA now, because I think every manager of any kind of any large government database is well aware that he needs to cover his ass by having his stuff validated by Information Management. However the news indicates this kind of fraud is happening in some small towns, and some of the smaller departments of cities-- places where there is still no easy access to information management professionals, where decisions involving database management have to be made by persons without a background in the subject.
Will
He's at Cornell University. That doesn't discount the possibility of jail time, but it does pretty much eliminate the rendition aspect (he didn't piss off the US government after all).
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
Correct; typical example could come from counting, then plotting, discrete data. Number of children in a family, doors on a car...
Note that whilst you might expect a normal distribution, with events (exam results) distributed evenly but randomly about the mean, the fact that the guy found something that certainly looks non-normal (he did not do normality tests, but having looked at his results, I don't think he needed to) does not itself prove that the results were altered.
Imagine a 'perfect' exam, where the expected (average) result for the student population was 50 out of 100, or 50%
Now imagine an (equally unlikely) 'perfect' candidate population.
If you plotted the exam results, you could expect the population to be centered on a mean result of 50, with half the scores higher, half lower.
If you had a (really getting unlikely now) 'perfect' education system, there would be a low standard deviation in your data, let's say 2%
If the results could be modelled with the Gauss curve, then 99.73% of your distribution would be within +/- 3 sigma (standard deviations) of the mean.
So the lowest expected score would be 50-2*3=44, with the highest 50+2*3=56.
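For anyone who wants to check the three-sigma numbers, here is a small sketch using only the standard library. The mean of 50 and sigma of 2 are the illustrative values from this example, not measured data, and `normal_coverage` is just a helper name chosen here.

```python
import math

MEAN = 50.0   # percent, illustrative value from the example
SIGMA = 2.0   # percent, illustrative value from the example


def normal_coverage(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


low, high = MEAN - 3 * SIGMA, MEAN + 3 * SIGMA

print(low, high)                             # bounds of the +/- 3 sigma band
print(round(100 * normal_coverage(3), 2))    # percent of candidates inside it
```

The band runs from 44 to 56 and holds about 99.73% of an ideal normal population.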
Of course, candidate abilities could be much more varied than this, so sigma could be anything...5%, 10%
Anyway, getting to the point, if the mean of what you *might* be expecting to be a Gauss / Normal curve is shifted sufficiently towards a 'hard' limit (in our example, you cannot score less than 0%, or more than 100%, so both are 'hard' limits, or 'boundaries'), then the data (exam results) do tend naturally to 'pile up' against the limit. (Think of a snow plough pushing snow against a wall - it's got nowhere to go, except up).
Thus you get a non-normal distribution, (typically better modelled with a lognormal or Weibull curve, not Poisson).
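A rough way to see the pile-up is to ask how much of an unbounded normal curve would fall past the 100% limit as the mean moves towards it. The sketch below assumes a sigma of 10% and a few hypothetical means; it treats the overflow as mass that has to land at or just below the limit, which is a simplification of what a real marking scheme does.

```python
import math


def normal_cdf(x, mean, sigma):
    """Cumulative probability of a normal distribution up to x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sigma * math.sqrt(2))))


def mass_beyond_limit(mean, sigma, limit=100.0):
    """Fraction of an unbounded normal curve that would lie above the hard limit."""
    return 1.0 - normal_cdf(limit, mean, sigma)


# Hypothetical means, all with an assumed sigma of 10 percentage points.
for mean in (50.0, 80.0, 90.0):
    print(mean, round(mass_beyond_limit(mean, 10.0), 4))
```

With the mean at 50 essentially nothing hits the wall; at 80 about 2% would, and at 90 about 16% would, which is enough to make the observed curve visibly lopsided.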
But WHAT can cause the mean to shift? For this example:
- Either the exam is "too easy", or
- The students are all very good (yeah, same thing, really), or
- The marking system is biased.
I'll leave you to draw your own conclusions on that one, but I've personally found that in India, (as in other places, including the USA), a little cash can go a long way...
But that was not the most compelling evidence of bias; that would be the very strange 'missing' data points (especially close to critical scores such as the 35 pass). /endoldstatsbore
I don't think it's fair to blame the British for all of that. China has had a stringent civil service exam tradition, for instance, for 1,300 years.
Possibilities:
- There is a national cheating conspiracy ...or....
- The test score is not based on assigning a value to each question and adding up those values.
For example, the test could simply be scored as such:
All answers correct: Score 100
Miss one question: Score 99
Miss two questions: 98
Three questions: 97
Four: 96
Five: 94
Six: 92
etc etc
Miss 20 questions: 35
Miss 21 questions: 31
etc etc.
The author makes the ASSUMPTION that the score of the test must be the sum of the value of the questions answered correctly. There is no basis for that assumption. The fact that certain values are not present, and the values 34, 33 and 32 are not present, are likely by design (i.e. don't make people feel like they just missed passing.)
All the author has shown is that India is apparently doing a very poor job teaching critical thinking skills (as evidenced by the author's inability to exercise critical thinking skills.)