Cheating Detector from Georgia Tech

Erm. by Dr. Sp0ng · 2002-01-16 07:41 · Score: 5, Interesting

This is new? They used something like this when I was at the University of Maryland a few years ago, and it did more than just check for exact matches: it compared parse trees and so on to check for similar program structure (any matches were, of course, double-checked by a human before ringing the cheating bell). It caught quite a few people I knew.

I remember when my school did this... by gergi · 2002-01-16 07:44 · Score: 5, Interesting

A few years ago, when I was a 2nd or 3rd year at Virginia Tech, a professor built a cheating detector into the automated grader for a class called Intro to C++.
Prior to that year, VT had an average of 75 cheating violations for the WHOLE university (25,000+ students). For that one class, on one assignment, the cheating detector caught 150 students... out of the 500 or so in the class.

Funny as hell

--
Nosce te Ipsum

Re:I remember when my school did this... by deander2 · 2002-01-16 08:27 · Score: 3, Interesting

Yep, I was there that year. It was Dr. Walker who implemented the system. Walker is a fast-moving but excellent programming teacher. That one class did more for the quality and structure of my programming than anything else I have done either before or since. (I'm now leading a team of programmers for the DOD on an enterprise-level Java application, btw.)

--
http://kered.org

Re:Real-world vs. school by susano_otter · 2002-01-16 07:49 · Score: 3, Interesting

Good point. I know a 4th-year student who doesn't know C, but watch out: his Counter-Strike skills are amazing.

Sigh... when will schools implement the other kind of cheating detector?

--

Any sufficiently well-organized community is indistinguishable from Government.

Cheating by Carnage4Life · 2002-01-16 07:50 · Score: 5, Interesting

CmdrTaco says:
Cuz remember programmers: in the real world you are fired if you consult with a co-worker ;)

As someone who TA'd classes at GA Tech, I take a lot of offense at this comment. There is a difference between working as a team in project-based classes (of which GA Tech has a good number, including one where we got to hack the Linux kernel and another where we got to deliver a product to a customer) once you've shown you understand the basics of programming, and wholesale copying of other people's work in entry-level classes where you are supposed to be learning to program on your own.

Beginning programmers need to learn how to program, find information in man pages and API docs, and come up with solutions on their own before being introduced to team-based environments. Otherwise they never learn to be self-sufficient, or even find out whether they are cut out for programming at all.

It is true that in the real world no man is an island, but on the flip side, how many people have worked with co-workers who were completely clueless about how to do their jobs but held degrees or certifications implying they should be knowledgeable about programming? These are the kind of people who hid behind the work of others in team-based projects and submitted others' work on individual projects.

Slashdot Boggles Me Again... by jmaslak · 2002-01-16 07:57 · Score: 5, Interesting

The responses here, at least the ones along the lines of "But collaboration is allowed in the real world", sicken me. I would fire (and HAVE fired) programmers who couldn't program simple stuff on their own. Collaboration in industry is nowhere near the level of syntax and elementary algorithm design.

A University degree is supposed to signify that you demonstrated knowledge in certain areas.

Cheating is not demonstrating knowledge.

Undergraduate level programming assignments do not require even consultation with other students, IMHO. They are too simple. If you can't code an undergraduate programming project without extensive "consulting", then you can't program. Period.

I am sickened by the number of people who hold CS degrees only because of "teamwork" and "consulting". From my experience, I would guess 95% of people with CS degrees can't write a sort routine. Widespread use of these kinds of programs might fix some of this, as would harsher grading. In the real world, you don't get partial credit for a program that only dumps core or doesn't meet any of the design objectives. (In my opinion, any program that doesn't pass a set of tests provided to the students in the project instructions should receive an "F".)

No wonder the software industry is such a mess. I've seen CS *GRADUATE* students who couldn't use malloc(). Note that I did not say "who use malloc() wrong"; no, these students could not even figure out how to call malloc() or explain what it did. Something strange is happening (I call it cheating) when someone can graduate with a CS degree without ever knowingly using dynamic memory allocation...

More Info by pmcneill · 2002-01-16 08:01 · Score: 5, Interesting

Here's some more info, from the perspective of a former TA (once for one of the classes in question). First, everyone at GaTech is required to take the first CS class, not just CS majors (== people in the CoC). Second, GaTech doesn't restrict collaboration in all classes. The first tier of classes is strictly individual, so everyone has to be in front of the computer themselves. In the second tier, CS2130 - Languages and Translation explicitly allows collaboration as long as people turn in their own code. Going further, later classes involve heavy amounts of group work.

With regard to the cheater-detector program (called 'cheatfinder'), it's significantly more complicated than diff(1). It checks the structure of the code (ignoring variable names, indentation, and whatnot). Admittedly, I've never seen the source for it (very few people have), but it's been around since at least 1997. The output of the program is a single number indicating the probability that two people collaborated on an assignment. The threshold is typically set fairly high (0.90+), so false positives are less likely. 187 students, the number caught this time around, is the highest I've heard of, but it's definitely not the first time we've hit a large number; it's just the first time it made the cover of the local newspaper.

Interestingly, many students think cheatfinder is just something the profs made up to scare students. I thought so too, before becoming a TA.

Re:How exact? by Brownstar · 2002-01-16 08:03 · Score: 4, Interesting

I actually had that happen to me. I was taking an assembly course where the teacher wanted us to reverse the order of values in a list.

He gave us a long, complicated piece of C code to do this, but instead I just used a stack (we didn't "learn" about those in class until a few weeks later). Well, it just so happened that one other student also felt like writing the 11-line stack implementation rather than the 100+ line one the teacher recommended. The teacher then said we cheated.

Fortunately, we were both able to explain how our code worked.

Re:How exact? by gmhowell · 2002-01-16 08:07 · Score: 3, Interesting

Not necessarily. I was at a college (which I won't name) where I sat on the Academic Honor Board. Essentially, suspected cheaters were brought before the board, which decided on guilt or innocence and handed out punishment.

Computer cases were the most common (4 of the 5 cases I sat in on). One day, we had three cases, with different defendants in each one. The programs from about 15 students were all essentially identical. What were the differences? Capitalization of variable names, and indenting style. That's it. So, while they were not 'exact' copies, they were close enough in my mind to merit guilt.

They were fairly trivial programs, maybe 150 lines of code in total. I can't remember if it was some form of BASIC or C (I really think it was the former). There were a few ways to do the problem (I think it sorted words or something). But the striking thing is that the variables were typical CS100 nonsense names (variablefoo, variablebar, but NOT simply 'i' for an iterator or 'x') of four to five characters in length, differing only in that some students had all uppercase and others all lowercase.

Now, I suppose that if the instructor had said 'use these variable names' there is a defense. But that was never mentioned.

I think the ultimate answer was that almost everyone admitted that they did some amount of copying, and all got zeroes on the assignment. I can't remember if any failed the class (and no, nobody was tossed from school).

But this is the interesting thing: each of the three cases involved the same instructor and the same program, yet they were brought as three separate cases. We were presented the hard-copy evidence for all three cases at the beginning of the morning. During a break after the first case, I flipped through the other evidence packs and saw that the copying was very, VERY similar in all three cases. In fact, there were more similarities between program A in case 1 and program B in case 2 than between programs A and B within case 1. To my mind, it was clear that the cheating was much broader than indicated. However, I was ignored. Our power was only as petit jury, judge, and executioner; we had no room to act as a grand jury. (In addition, this was my first real-world experience with a judicial system unable to understand technical issues. I was a chem major, and my roommate was a CS major. I was the only hard-science guy on the board; the others were various history/business majors.)

Anyway, the point is: exact copies are probably always cheating. But near copies are also sometimes cheating.

--
Jesus was all right but his disciples were thick and ordinary. -John Lennon

In the teaching trenches... by Embedded Geek · 2002-01-16 09:00 · Score: 4, Interesting

I've taught numerous courses and once figured it wouldn't be too tough to build a detector like this. Inevitably, someone who cheated would follow a very basic procedure:

Copy the original code.
Change every variable name (even if to a less sensible name - HalfCircleWidth instead of Radius).
Rephrase most comments, but in the most transparent manner (e.g. "increment the counter" becomes "the counter is incremented").
Grab one or two lines of code near the top and rewrite them in the most awkward manner possible. Presumably, this is to prove to themselves that they're more clever than the teacher and that they could've actually done the assignment if they'd bothered.

Invariably, it was the trivial stuff (indentation, comment structure) that set off my alarms. Then I'd give them a moment of truth and sit them down to explain how "their" code works. If they couldn't, I'd kick their tails out. If I was teaching a seminar at someone's workplace, I might or might not inform their management. Since all these penalties were spelled out in my syllabus, I never lost any sleep (in fact, putting them in my syllabus tends to ensure no one tries it).

As to the difference between "consulting" with another student and "cheating", I've found that the "explain your own code" test is a pretty good yardstick. If I spend 2-3 hours preparing to teach a lecture, I have no sympathy for someone who won't spend enough time to do the assigned work and cheats instead.

--

"Prepare for the worst - hope for the best."

Thoughts of another Georgia Tech alumnus by wberry · 2002-01-16 09:09 · Score: 3, Interesting

I took Intro to Computing in the Spring of 1996. It was cake for me because I was a Computer Science major and I dig this stuff. But a lot of non-CS people dreaded that class above all others, especially Management, International Affairs, and Architecture majors, but also some engineering people, such as Aerospace and Industrial Engineering.

(And can you really blame them? How many civil engineers really need to know how to sort numbers in O(N log N) time? Or insert into a linked list for that matter? They write hacked-up FORTRAN if they write anything at all.)

Kurt Eiselt came to the first lecture and gave us a scare speech about Cheatfinder. Knowing that it looks for similarities between two students' works, I was worried constantly about my homework answers. A typical problem was to write an inorder binary search tree traversal routine in pseudocode. Honestly, how many different ways are there to do this? And there are 500 people in all sections of the class?

Fortunately, I was never flagged, but I have heard a few stories (which may not be true, you know how that goes) of people who were flagged, and were only vindicated after losing student jobs and failing classes.

I don't think an automated cheat-detection system is applicable to small problem sets like binary search, stacks, and Mergesort. For the later classes, say sophomore level, I have no problem with it, though.

Besides, many Greek orders and clubs on campus have extensive "word" banks--archives of previous homeworks and tests, with solutions, from previous class offerings. Are they going to check against all previous students' work too?

--
LAMP hosting on Debian, SSH, no bandwidth cap, PayPal accepted - http://secondbrainhosting.com/

OT: reversing a list by KnightStalker · 2002-01-16 09:48 · Score: 3, Interesting

Here's 7 lines of C to reverse an array. The assembly would be more or less identical. I don't feel like dredging up my memories of 8086 assembler... it would probably end up screwing up my Perl for the next hour or so :-)

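#include <stdio.h>

int main(void) {
    int list[] = {0,1,2,3,4,5};
    int i, j, len = sizeof(list)/sizeof(list[0]);
    for (i = 0; i < len/2; i++) {   /* swap element i with its mirror */
        j = list[i];
        list[i] = list[len - i - 1];
        list[len - i - 1] = j;
    }
    for (i = 0; i < len; i++)       /* prints: 5 4 3 2 1 0 */
        printf("%d ", list[i]);
    return 0;
}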

Reversing a linked list would be marginally longer, but a doubly linked list would be just as short or shorter than this. Only a real novice would take 100 lines of C to do it. BTW, how could you possibly learn assembly before learning what a stack is?

--
* And remember, it's spelled N-e-t-s-c-a-p-e, but it's pronounced "Mozilla."

A good cheating policy by Dominic_Mazzoni · 2002-01-16 10:07 · Score: 5, Interesting

The best formal cheating policy I've seen was from Professor Steven Rudich at CMU:

CMU 15251 Course Document and Cheating Policy

His policy encourages collaboration and specifically forbids cheating. It itemizes various types of cheating, for example copying from another student, letting another student copy you, and looking at someone else's files online (even if they forgot to set their file permissions).

Furthermore, he requires all of the students in his class to sign a statement saying that they have read and understand the cheating policy. Not only does that discourage some students from cheating, but it also makes it much easier for him to get students into serious trouble with the school when they are caught.

In addition to the course document, here's more or less what he had to say on the first day of class: (I apologize for paraphrasing; this is how I remember it) "Nobody plans to cheat. You all must be very smart, or you wouldn't be here. You think you're going to try hard and do well in this class. But later in the semester you'll get busy with other classes and activities, and all of a sudden an assignment will be due in one day and you haven't started. Or you'll be taking a test and realize that you forgot to study an important equation. Or you'll work hard on an assignment and almost completely get it working, but get stuck on one subroutine. Even though you never planned on cheating, all of a sudden you'll find yourself in a circumstance like that and it will seem tempting."

(BTW, I shouldn't have to say this, but Prof. Rudich's cheating policy is copyrighted. If you're a teacher or T.A., don't copy his cheating policy without his permission. That would be just as dishonest as cheating!!! If you want to use it, contact him and I'm sure he'd be delighted to let you use it, as long as you give him credit.)

Stephen Ambrose, watch out! by MattJ · 2002-01-16 10:37 · Score: 4, Interesting

As you may have read, bestselling historian Stephen Ambrose was recently caught having lifted sentences and even passages from other sources, and passing them off as his own writing in his books. (While he mentioned the source books in footnotes/endnotes, he did not put the cribbed text in quotes.) At least four different Ambrose books have now been shown to have the same pattern of lifted, unattributed passages.

These instances only came to light because an author of a lifted passage noticed it while reading Ambrose's book. Subsequent episodes came about because other authors started looking, and now some people are checking out new likely sources; this works because Ambrose only lifted passages from books that he admired and heavily footnoted (at least, so far as we know!).

Perhaps Ambrose was really just lazy, as he was fairly open about crediting others for the ideas (he "just" failed to credit them for the words, too). There are many cases of sneakier plagiarism than that, both in academia and in journalism.

So, class, the programming problem for today is, given the text of two books, spit out the most likely candidates for lifted passages, based on length and similarity of words. You get a B if you can do this for exact, verbatim matches, an A if you can do it with individual word substitution, and an A+ if you can recognize re-ordered clauses. The end users for this tool would be 1) authors everywhere who want to protect their own writing, and 2) journalists looking for juicy plagiarism scandals.

Re:Talk about cheating by Stephen Samuel · 2002-01-17 01:48 · Score: 3, Interesting

A friend of mine (Dan Wilson) who taught computing at the University of Alberta had a program (20 years ago!) that generated statistics on a program to catch cheaters and copiers. From his description of the program, it seemed to work off the parse tree, so it was essentially immune to simple workarounds like renaming variables or changing the indents.

Of course, he was also aware of the limitations of the program (given that he wrote it), so I don't believe he took the statistics as the sole arbiter of whether or not students were stealing code.

On the other end of the scale, Dan once found out that someone had published a solution to one of his assignments. He publicly announced in class that he was aware of the cheat. For anybody who had already submitted copied code and couldn't come up with a 'real' solution, he offered a partial amnesty (a zero on the assignment, but otherwise no punishment) to those who came forward and fessed up. He also warned that anybody trying to sneak the cheat past him would be failed from the course.

Despite his warning, a number of students still submitted the known cheat. Some blindly submitted the file without any edits whatsoever -- not even bothering to fix a simple syntax error that kept the program from compiling. As promised, they were removed from the course and reported for cheating.
There's no accounting for abject stupidity.

--
Free Software: Like love, it grows best when given away.
