Slashdot Mirror


Cheating Detector from Georgia Tech

brightboy writes "According to this Yahoo! News article, Georgia Tech has developed and implemented a "cheating detector"; that is, a program which compares students' coding assignments to each other and detects exact matches. This was used for two undergraduate classes: "Introduction to Computing" (required for any student in the College of Computing) and "Object Oriented Programming" (required for Computer Science majors)." Cuz remember programmers: in the real world you are fired if you consult with a co-worker ;)

9 of 941 comments (clear)

  1. Re:I can make one of those by Howie · · Score: 3, Informative

    You failed (ln: File Exists). That would be:

    ln -s /usr/bin/diff cheatingdetector

    --
    "don't fall into the fallacy of believing that Perl can solve social problems. Maybe Perl 6 can, but that's a ways off"
  2. Info on cheating detector by kramer · · Score: 5, Informative

    Some more info on the cheating detector from a Georgia Tech Alum of the CS program.

    1. The cheating detector is not new. It's been in place for years. When I took intro programming in 1994 they mentioned it, and it wasn't new then.

    2. Everybody at Tech knows about it. They tell you about this script the first day of class. Nobody here should be suprised they were caught. The fact that they were caught only shows them to be some of the stupidest people at Tech.

    3. It catches people every term. Usual numbers are below 5% range. The fact that it caught someone isn't news. The fact that it caught 10% of a class is news.

    4. These classes are cake. There is no reason anyone should need to cheat to pass these classes. They are the most basic concepts of programming.

  3. We've been using for 8+ years whats the problem? by Thellan · · Score: 3, Informative

    I just graduated from GaTech in December and I was a Teaching Assistant for the Into to Computing class for 2 and a half years at Tech. The students are told on the first day of class that cheating is not allowed and that if you are caught you will be punished. They are told about the program and whether they believe or not is their problem.

    The students are told it is ok to discuss the homeworks and project with each other and that it is ok to discuss the concepts. However it is NOT ok to copy each other's code.

    The program does not just compare the text of each student's homework which is what some people seem to think it does. The program gets rid of variable names, function names and things like that because a person cheating can simply change those. It compares the style of the code and it is not given common code to look at. The only code checked is the code from problems that generally generate unique solutions.

    In the time I spent there I know of over a hundred cheating cases caught by the program. In some of those cases if you had of given me the 2 pieces of code I never would have said the people were cheating but when asked the students confessed. I have never heard of someone being falsely accused. Most of the time when the 2 cheaters are asked separately they admitt to it.

    Once again, Tech does not have any problem with people helping each other understand concepts like the way pointers or a vector works or the differences between stacks and queues. What they have a problem with is when each studen does not do his own work on an individual homework.

    Eventhough some of the problems may seem not worth it, like writing your own version of strcpy, it is still necessary so that students understand how the library functions work even if they will never be writing library functions in their life.

  4. Re:You're caught by bcrowell · · Score: 4, Informative
    1. The professor should only use it on nontrivial assignments, so this shouldn't be a problem. Any tool can be used incorrectly, and that doesn't mean the tool is bad.
    2. Students tend to believe that there is only one right way to work a problem. In my physics classes, I often have students who try to turn in solutions copied from the solutions I handed out in a previous semester. It's usually painfully obvious. For instance, I teach my students how to do order-of-magnitude estimates, and one of the problems I assign is to estimate the number of blades of grass on a football field. I had a student turn in a copy of my own solution, including all my idiosyncratic assumptions, like assuming exactly the same number of blades of grass per cm2, and assuming the same (wrong) width I had guessed for the field. His work was all in the same order as mine, and laid out on the page with the same indentation, etc. He tried to claim that his answer was just naturally the same as mine, because he got it right.
  5. Re:How exact? by Junta · · Score: 5, Informative

    You would be surprised how well this works in practice, even in intro classes. When I was a freshman taking intro to cs, they used one of these programs and got few false positives. If it matched exactly, down to the variable names, then it would be completely pointless. The one that my college was using back then matched regardless of variable/function names, or any source formatting. Essentially, it examined the overall algorithm and run-time execution paths to determine if there was likely cheating.
    Besides, even if the system turns up a high match between two programs falsely, it is ultimately a human who gets to review the case and make the call, after (presumably) discussing the matter with the student before actually doing anything that would leave a mark on the record.

    And as an answer to the knee-jerk reaction of "that's not how it works in the real world!" I tend to agree, but not completely. As an instructor of mine once said you have to learn to dribble before you can play with other people in a team in Basketball, and as such one needs to develop his or her own personal programming skills independently before he or she may work effectively in teams.

    Of course, some could argue that learning in teams would be more effective and perhaps more useful, but the point is there needs to be a mix of team and independent projects. Without independent projects at all, it is difficult to be sure that everyone is competent to pull their own weight, and part of the role of Universities in the world of business is to certify that a graduate possesses a good skillset, and without both team and individual assignments, this is impossible.

    Of course, as is the case with everything, this doesn't stop cheating. If one collaborates with someone completely unrelated to the class, it can't catch that, but then again, there aren't that many people inclined to work their butt off at no benefit to them just to help some other person get a good grade.... Of course, I have seen the case where a guy goes way out of his way to help a pretty girl, but that is another story entirely...

    --
    XML is like violence. If it doesn't solve the problem, use more.
  6. A better approach by BluesMoon · · Score: 5, Informative

    Ok, check for exact match: diff source1.c source2.c
    great, I just wrote a program to check for exact matches in source code, and it took me three seconds. Maybe I should apply for a patent for my ingenious approach (maybe I'd get it!!)

    At my organisation, (in India) we've been developing something like this for quite some time for our internal tests.

    While most of the work isn't (and probably won't be) publicly released, we can look at a systematic approach to building a better detector.

    1. run indent on all source files to standardise white space usage:
      indent -i8 -kr
    2. Remove excessive white space within statements (students tend to add extra white space:
      sed -e 's,\([^ ^I]\)[ ^I]\+,\1 ,g'
    3. while you're at it, remove blank lines too:
      sed -e 'g/^[ ^I]\+$/d'
    4. Remove repeated lines, or lines that match
      i=i;
    5. Run diff/cmp on the files and check the %ange difference

    You may also want to first strip all #include <> statements (not #include ""), and run the code through the C preprocessor first to take care of #define, and conditional compilation

    There's more obviously that I'm not sharing with you. These are the basics that anyone could figure out in a few minutes - not years.

    --
    Do not underestimate the value of print statements for debugging.
  7. diff don't do it by kippy · · Score: 5, Informative

    First off, diff doesn't work if the kids are smart enough to change their variable names and add spaces here and there.

    I was a grader for the C++ and data structures class back when i was in school. And I saw my share of cheating. One instance that stands out is when a bunch of kids had variables called "dude" and "funtime". Problem was, they had enough differences elsewhere in their code, that an automated diff wouldn't have worked. For a while, I was going to write some fancy perl that would look for certain cheating patterns that I was seeing, but then I got lazy.

    One deeper way to check for cheating is to pass code through the front end of a compiler and check what comes out. if there are too many simmilarities, they will stand out even if kids change paramater names and the like.

    Finaly consider this: Checking for cheaters in a class isn't just doing a diff of two files. For every student in the class, you have to check his code against everyone else's. This is a O(n^2)problem. My class had around 350 people in it
    so that's 122500 checks to do. If it is anything more complex than a diff (multiple files, compiler front-end, fancy perl parcing) this can take a mad amount of computing.

  8. This is old by drix · · Score: 3, Informative

    We have had this in the UC Berkeley computer science department for some time now. IIRC it's been quite effective; when it was first unveiled it nailed many, many students for cheating (I think). The verdict amongst students is, if you're good enough to defeat the cheating detector then doing the assignment on your own should be no problem anyways.

    --

    I think there is a world market for maybe five personal web logs.
  9. Head TA Elaborates by Arkhan · · Score: 5, Informative

    As a former Head TA for one of the classes in question (CS 1502 - Intro to Computing), I'll try to elaborate and answer common questions.

    No, I have no current affiliation with Georgia Tech.

    Yes, the cheatfinder really, really, honest-to-God exists. We used it every quarter that I was associated with the class and caught _lots_ of people. You'd be stunned how many people thought we were just making it up to scare them into not cheating.

    Yes, it actually works. It examines mostly source code, although some versions of it were twiddled to look at "in-between" assembler to help catch those who just change variable names and such. It scans for patterns in the logical constructs of code blocks, even if they've been rearranged or altered in other "cosmetic" ways. It also looks for exact matches in text (like the "commas in same places" mentioned by Kurt in the article), but this is misleading -- it does a whole lot more than that.

    Yes, depending on how you run it, it can generate a boatload of false positives, but it contains several tweakable threshold levels that let you control how "suspicious" a pair-match has to be before it gets flagged, and these thresholds are made looser for simple programs where there's really only one way to do it.

    No, no action is *ever* taken based on the output of the cheatfinder directly. It merely alerts the TA who's responsible for cheatfinder that quarter and he/she then manually reads the source code to see if it looks like a case of cheating. If so, it gets sent on to the professor for a final verification (and possible discussion with the student if it is a borderline case), before being forwarded to Kurt for examination and possible disciplinary action.

    Finally, yes, it's an old and very "evolved" codebase. You wouldn't want to be the one to maintain it, but on the other hand, it has been tweaked to the point where you'd be really surprised at the sort of clever cheating it can detect. (i.e. it works a lot better than diffing the source code ;)

    Anyway, figured I should throw in my $0.02 on this one, since I used to run that class.

    If anybody has any specific questions, please post to this comment and I'll reply. (Questions from current Tech students asking how to "get around" the cheatfinder will be happily ignored, of course. ;)