Cheating Detector from Georgia Tech
brightboy writes "According to this Yahoo! News article, Georgia Tech has developed and implemented a "cheating detector"; that is, a program which compares students' coding assignments to each other and detects exact matches. This was used for two undergraduate classes: "Introduction to Computing" (required for any student in the College of Computing) and "Object Oriented Programming" (required for Computer Science majors)." Cuz
remember programmers: in the real world you are fired if you consult
with a co-worker ;)
Some more info on the cheating detector from a Georgia Tech Alum of the CS program.
1. The cheating detector is not new. It's been in place for years. When I took intro programming in 1994 they mentioned it, and it wasn't new then.
2. Everybody at Tech knows about it. They tell you about this script the first day of class. Nobody here should be suprised they were caught. The fact that they were caught only shows them to be some of the stupidest people at Tech.
3. It catches people every term. Usual numbers are below 5% range. The fact that it caught someone isn't news. The fact that it caught 10% of a class is news.
4. These classes are cake. There is no reason anyone should need to cheat to pass these classes. They are the most basic concepts of programming.
Find free books.
You would be surprised how well this works in practice, even in intro classes. When I was a freshman taking intro to cs, they used one of these programs and got few false positives. If it matched exactly, down to the variable names, then it would be completely pointless. The one that my college was using back then matched regardless of variable/function names, or any source formatting. Essentially, it examined the overall algorithm and run-time execution paths to determine if there was likely cheating.
Besides, even if the system turns up a high match between two programs falsely, it is ultimately a human who gets to review the case and make the call, after (presumably) discussing the matter with the student before actually doing anything that would leave a mark on the record.
And as an answer to the knee-jerk reaction of "that's not how it works in the real world!" I tend to agree, but not completely. As an instructor of mine once said you have to learn to dribble before you can play with other people in a team in Basketball, and as such one needs to develop his or her own personal programming skills independently before he or she may work effectively in teams.
Of course, some could argue that learning in teams would be more effective and perhaps more useful, but the point is there needs to be a mix of team and independent projects. Without independent projects at all, it is difficult to be sure that everyone is competent to pull their own weight, and part of the role of Universities in the world of business is to certify that a graduate possesses a good skillset, and without both team and individual assignments, this is impossible.
Of course, as is the case with everything, this doesn't stop cheating. If one collaborates with someone completely unrelated to the class, it can't catch that, but then again, there aren't that many people inclined to work their butt off at no benefit to them just to help some other person get a good grade.... Of course, I have seen the case where a guy goes way out of his way to help a pretty girl, but that is another story entirely...
XML is like violence. If it doesn't solve the problem, use more.
Ok, check for exact match: diff source1.c source2.c
great, I just wrote a program to check for exact matches in source code, and it took me three seconds. Maybe I should apply for a patent for my ingenious approach (maybe I'd get it!!)
At my organisation, (in India) we've been developing something like this for quite some time for our internal tests.
While most of the work isn't (and probably won't be) publicly released, we can look at a systematic approach to building a better detector.
indent -i8 -kr
sed -e 's,\([^ ^I]\)[ ^I]\+,\1
sed -e 'g/^[ ^I]\+$/d'
i=i;
You may also want to first strip all #include <> statements (not #include ""), and run the code through the C preprocessor first to take care of #define, and conditional compilation
There's more obviously that I'm not sharing with you. These are the basics that anyone could figure out in a few minutes - not years.
Do not underestimate the value of print statements for debugging.
First off, diff doesn't work if the kids are smart enough to change their variable names and add spaces here and there.
I was a grader for the C++ and data structures class back when i was in school. And I saw my share of cheating. One instance that stands out is when a bunch of kids had variables called "dude" and "funtime". Problem was, they had enough differences elsewhere in their code, that an automated diff wouldn't have worked. For a while, I was going to write some fancy perl that would look for certain cheating patterns that I was seeing, but then I got lazy.
One deeper way to check for cheating is to pass code through the front end of a compiler and check what comes out. if there are too many simmilarities, they will stand out even if kids change paramater names and the like.
Finaly consider this: Checking for cheaters in a class isn't just doing a diff of two files. For every student in the class, you have to check his code against everyone else's. This is a O(n^2)problem. My class had around 350 people in it
so that's 122500 checks to do. If it is anything more complex than a diff (multiple files, compiler front-end, fancy perl parcing) this can take a mad amount of computing.
Blaze a trail to the New World
As a former Head TA for one of the classes in question (CS 1502 - Intro to Computing), I'll try to elaborate and answer common questions.
;)
;)
No, I have no current affiliation with Georgia Tech.
Yes, the cheatfinder really, really, honest-to-God exists. We used it every quarter that I was associated with the class and caught _lots_ of people. You'd be stunned how many people thought we were just making it up to scare them into not cheating.
Yes, it actually works. It examines mostly source code, although some versions of it were twiddled to look at "in-between" assembler to help catch those who just change variable names and such. It scans for patterns in the logical constructs of code blocks, even if they've been rearranged or altered in other "cosmetic" ways. It also looks for exact matches in text (like the "commas in same places" mentioned by Kurt in the article), but this is misleading -- it does a whole lot more than that.
Yes, depending on how you run it, it can generate a boatload of false positives, but it contains several tweakable threshold levels that let you control how "suspicious" a pair-match has to be before it gets flagged, and these thresholds are made looser for simple programs where there's really only one way to do it.
No, no action is *ever* taken based on the output of the cheatfinder directly. It merely alerts the TA who's responsible for cheatfinder that quarter and he/she then manually reads the source code to see if it looks like a case of cheating. If so, it gets sent on to the professor for a final verification (and possible discussion with the student if it is a borderline case), before being forwarded to Kurt for examination and possible disciplinary action.
Finally, yes, it's an old and very "evolved" codebase. You wouldn't want to be the one to maintain it, but on the other hand, it has been tweaked to the point where you'd be really surprised at the sort of clever cheating it can detect. (i.e. it works a lot better than diffing the source code
Anyway, figured I should throw in my $0.02 on this one, since I used to run that class.
If anybody has any specific questions, please post to this comment and I'll reply. (Questions from current Tech students asking how to "get around" the cheatfinder will be happily ignored, of course.