Cheating Detector from Georgia Tech
brightboy writes "According to this Yahoo! News article, Georgia Tech has developed and implemented a "cheating detector"; that is, a program which compares students' coding assignments to each other and detects exact matches. This was used for two undergraduate classes: "Introduction to Computing" (required for any student in the College of Computing) and "Object Oriented Programming" (required for Computer Science majors)." Cuz remember programmers: in the real world you are fired if you consult with a co-worker ;)
Your Hello World program is exactly the same as Johnny's. You fail. You're kicked out of school. Good bye.
And someone forgot that "=" is the variable set operator, not the comparison operator, which is "==", so suddenly the cheating detector gave a lot of people F's...
I am !amused.
You're fired for copying someone else's work and passing it off as your own, particularly if it's a competing product.
If your employer has any integrity at all, that is.
Geeks all over the world discover the diff utility and wonder what the hell the Georgia Tech monkeys are so worked up about.
"Have you ever thought about just turning off the TV, sitting down with your kids, and hitting them?"
Looks like someone at Ga. Tech finally discovered "diff"!
"I feel that if a person can't communicate, the very least he can do is to shut up." -- Tom Lehrer
1993? Think it was written in COBOL? Have they tested it for Y2K compliance?
Kind thoughts do not change the world
Wow.
How the hell are all those lonely CS majors supposed to get in good with the Education majors now?
It hurts when I pee.
Yes, but one of the goals of a CS department should be to produce programmers who are capable of doing work themselves. Would you want to work with (or supervise) a slacker who couldn't code his way out of a paper bag, but who graduated anyway because he cut-and-pasted the work of his (harder-working) classmates?
This is new? They used something like this when I was at the University of Maryland a few years ago. And it did more than just check for exact matches, it compared parse trees and so on to check for similar program structure (any matches were, of course, double-checked by a human before ringing the cheating bell). It caught quite a few people I knew.
I'm not surprised, since systems such as this are already widely used for detecting plagiarized essays.
Remember, folks -- you may not get fired for consulting with a fellow programmer, but if you never learn how to do anything but copy & paste other people's code, you've lost out on a LOT of problem solving skills.
:).
There's a difference, a huge difference, between collaborating and cheating.
In the real world, you _would_ get fired for taking credit for someone else's work and trying to pass it off as your own. Heck, you'd probably violate a bunch of licenses, too.
Not representing or approved by my company or anybody else.
Wow, they learned how to use diff?? :)
Seriously, I'm pretty sure they already do this here at the University of Michigan. At least the professors SAY they do. I suppose it could be a bluff, but I don't see why they wouldn't do it.
yet another reason the switch to MIS...heh
ln -s cheatingdetector /usr/bin/diff
m00.
The Rochester Institute of Technology (R.I.T.) has a "try" command that compiles, tests, runs, and submits a student's coding assignments electronically. I believe the programs are then run through a big hash function to detect similarities between the submitted code and all other submitted code. I don't know how far back their data comparison goes, however.
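How a "big hash function" might detect similarity is not described above, so the following is only a guess at the general idea, not R.I.T.'s actual system: hash every run of k consecutive tokens in each submission and measure how many hashes two submissions share. The names fingerprints and overlap, and the choice of k = 5, are made up for this sketch.

```python
import hashlib

def fingerprints(source, k=5):
    """Hash every run of k consecutive whitespace-separated tokens."""
    tokens = source.split()
    grams = (" ".join(tokens[i:i + k]) for i in range(len(tokens) - k + 1))
    return {hashlib.sha1(g.encode("utf-8")).hexdigest() for g in grams}

def overlap(a, b, k=5):
    """Jaccard overlap of the two fingerprint sets, from 0.0 to 1.0."""
    fa, fb = fingerprints(a, k), fingerprints(b, k)
    if not fa or not fb:
        # Too short to form even one k-token run.
        return 0.0
    return len(fa & fb) / len(fa | fb)
```

Because the hashes are computed over token runs, reformatting whitespace alone does not change the result, while renaming variables does; a real tool would presumably normalize identifiers first.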
Yeah, it's called "diff"...
cheers
The article talks about a program that has been around since 1993 and merely detects exact duplicates of code...
Not really a big deal... Our school uses some University's program and database which not only detects fragment duplication, but also permutations of the code (such as changing variables, white space, etc.). Not sure which University though....
diff(1)
In the simplest case, diff compares the contents of the two files from-file and to-file. A file name of - stands for text read from the standard input. As a special case, diff compares a copy of standard input to itself.
Sounds like they developed diff again.
-M
Professors have been using WinDiff for many years now... what's so big about this?
university of wisconsin had the exact same thing, and i went there 5 years ago.
my friend actually got kicked out of school for a year because his program matched someone else's like 95% or something.
it compared variable names, syntax, style, and just general 'sameness'... i guess for most projects 50% would be average, and they just flag the ones way off the mean.
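The "flag the ones way off the mean" step could look roughly like the sketch below. It assumes each submission already has a single similarity score; the function name flag_outliers and the cutoff of two standard deviations are assumptions for illustration, not anything the Wisconsin tool is known to have used.

```python
from statistics import mean, pstdev

def flag_outliers(scores, threshold=2.0):
    """Return names whose score sits more than `threshold` standard
    deviations above the class mean (hypothetical cutoff)."""
    values = list(scores.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # Everyone scored the same, so nobody stands out.
        return []
    return sorted(name for name, s in scores.items()
                  if (s - mu) / sigma > threshold)
```

Only unusually high scores are flagged; a submission that looks unusually unlike everyone else's is not treated as suspicious here.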
Don't get me wrong, I understand that cheating on a test is wrong. I'm concerned that this sort of thing may help promote the "wasn't built here" syndrome (I believe it's called something else.)
I'm just hoping that this is balanced with a few lessons on reusing and sharing code, for practical purposes.
I worked as a teaching assistant at Purdue and we had a code checker back in 84-85. It compared parse trees of students' programs. It did very well at catching those who just changed variable names and resubmitted work.
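To see why comparing parse trees catches renamed variables, here is a minimal sketch using Python's own ast module. It is not the Purdue checker (whose language and design are not given above); the class _Anonymize and the functions shape and same_structure are invented for the example.

```python
import ast

class _Anonymize(ast.NodeTransformer):
    """Replace every identifier with a placeholder so only structure remains."""

    def visit_Name(self, node):
        node.id = "x"
        return node

    def visit_arg(self, node):
        node.arg = "x"
        return node

    def visit_FunctionDef(self, node):
        node.name = "f"
        self.generic_visit(node)  # keep walking into the body
        return node

def shape(source):
    """Dump the parse tree with names erased; comments and layout are
    already discarded by the parser."""
    tree = _Anonymize().visit(ast.parse(source))
    return ast.dump(tree)

def same_structure(a, b):
    return shape(a) == shape(b)
```

Two submissions that differ only in names, comments, and spacing produce the same dump, while a genuinely different solution does not.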
--morris meyer
This is nothing new, I went/go to George Mason University and they've been doing that for a few years now, and not just for CS related courses ... I'm sure other schools have been too ... Ryan
Shouldn't sufficiently small diffs be enough to flag two assignments as potential cheats?
It better check for exact duplicates only, down to the variable names. Many undergraduate CS assignments are programs so basic that there are really only a few ways to implement them. It would suck to be a student who from scratch used the same algorithm as another student, and have them both flagged as cheaters.
A few years ago, when I was a 2nd or 3rd year at Virginia Tech, some professor implemented a cheating detector into the automated grader for a class called Intro to C++.
Prior to that year, VT had an average of 75 cheating violations for the WHOLE university (25000+ students). For that one class, on one assignment, 150 students were found cheating by the cheating detector... out of the 500 or so students in the class.
Funny as hell
Nosce te Ipsum
Of course, just plugging in sections of work into a google search engine can produce interesting results....
Probably of more value would be to compare this year's results with previous years' submitted answers. Although there would be a lot of people out there with PhDs who would be in trouble if this time-honoured method of passing exams and assignments were cracked down on.
Michael
There is no cryptographic solution to the problem where the intended receiver and the attacker are the same entity.
They have been doing this at MIT for a few years...
Last year about 30% of the superhardcore undergrad Software Engineering 6.170 class was found to have lifted at least some of their code. They did it right after drop date too, so the kids had to petition to drop the class once they found out they would be failing...
I always found that it wasn't easy to cheat. If I copied and pasted somebody's code, I had to go back through and change it all around so that I couldn't be caught cheating. This often proved to be more difficult than actually doing the project myself would have been.
I use code I find on the internet all the time, first thing I do when I start a new routine as a matter of fact. I search for preexisting code and use/modify that accordingly. Of course only freely available code but there are tons of sources for that. How would this program detect that? I would trust some code I got from the net over something another person in class wrote, but then again I know enough to be able to see if the code is good or not and to modify it if I need to.
Mod me down for this, but consulting with a co-worker at a job and obtaining code from a fellow student is NOT the same thing. The purpose of going to school is to learn and therefore they want your work, not your friend's. At a job, they just want the work to get done, they don't care how you do it.
If it were free and open, we could write white papers on why it sucks and/or improve it. That wouldn't be so bad.
That's sarcasm for those of you unfamiliar with the stuff.
7 November 2006: The day Americans realized corruption and incompetence weren't addressing 11 September 2001
There are many programs out there for exactly the same purpose. For example, moss at berkeley lets you do this over the net.
This is not really new at all. At my school (U toronto), a "cheaterbeater" program was used in *every* programming assignment we had. And it detected more than just exact matches - it could figure out if you copied someone's code, shuffled some lines around, changed variable names, or other surface changes that would trick T.A.s. Several people were caught with this program. Why is this article here? It even says in the article that it was developed in '93 !!
Well, actually, if you take work from another co-worker and pass it off as your own, you'll be fired and prosecuted.
Don't Plagiarise - it's the law. (And judging from the snide comment, probably the reason that CmdrTaco never finished college.)
In the real world, you are fired if you steal code from someone else without their permission, pretend it's your own, and incorporate it into the app you're writing for your company. In the real world, people give credit where credit is due.
In a lot of the stories I hear, students plagiarize each other's code without the other student's permission. Many systems have files readable by other students by default, and students don't bother to read-protect their files. Students will take printouts out of the trash. And of course, it's always convenient for students to claim they didn't know the other person copied their work.
It's better for students if professors have an accurate way of detecting cheating. The worst thing is if the method is inaccurate, and innocent students get accused. This method sounds accurate.
Find free books.
Yeah, diff is basically it. There is minimal statistical stuff, they simply strip the comments, change the identifiers to x and check for similarities in the structure. We have it at our uni. Hell, they even mark our programming (beginners stuff) automatically.
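The "strip the comments, change the identifiers to x" recipe can be sketched in a few lines. This version works on Python source via the standard tokenize module, which is an assumption for the example; the language and tool actually used at that university are not stated. The names normalize and similarity are hypothetical.

```python
import difflib
import io
import keyword
import tokenize

# Token types that carry layout or comments rather than program structure.
SKIP = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
        tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER}

def normalize(source):
    """Drop comments and layout, and rename every identifier to 'x'."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in SKIP:
            continue
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            out.append("x")
        else:
            out.append(tok.string)
    return out

def similarity(a, b):
    """Ratio from 0.0 to 1.0 over the normalized token streams."""
    # autojunk is off so the very frequent 'x' token is not ignored.
    return difflib.SequenceMatcher(None, normalize(a), normalize(b),
                                   autojunk=False).ratio()
```

After normalization, what is left to compare is essentially keywords, operators, and literals in order, which is why swapping a few whiles for fors (as the next comment suggests) would lower the score but renaming variables would not.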
This is no news at all. All the students need do is change a few whiles and fors and they are in the clear...
Who cares? If you need to cheat on hello world then you really do have a problem...
~~~
The software that U of T uses was developed somewhere else, I thought it was MIT, but I could be wrong.
They didn't actually tell us that they were using this stuff, I found out after I graduated from reading it in the newspaper. So it's probably in widespread use, it's just not something CS departments brag about (I guess catching cheaters is fun).
Moderators should have to take a reading comprehension test.
If you can't program, switch to MIS.
So how exactly does consulting with a fellow student (or co-worker) result in both parties having identical code? I have to say that this is the most ignorant comment I've seen attached to a slashdot story ever.
At my University they have the same code policy but they encourage you to work with others! Under no circumstances are you to copy their code line by line but you can certainly ask for their help or use a module or two. The only condition to all of this is that you credit them on the cover sheet of your assignment.
Sorry for the flame but I saw that comment and it made me quite irate.
--
Todd's Law: All things being equal, you lose!
When I was a TA, if I saw two assignments that looked suspicious, I'd hold them up side by side and cross my eyes to get the stereogram effect. If it was a bad cheating job, there would be an almost perfect match and my eyes would be able to focus on them, with the differences jumping out at me.
Of course, with freshman assignments, they tend to be pretty damn similar even without cheating (write C code to implement bubble-sort using the pseudo-code in the book as a guide). And usually, the students that were cheating would fail the tests, so there was little need to do anything special.
Why someone would pay tens of thousands of dollars to learn nothing and end up with a job that pays well but that they'll get fired from within a few weeks...
Okay, like others have mentioned, think of diff:
* determines exact matches
umm, sure, diff does that
* written in 1993
I checked man on my machine and got this date: "22sep1993"
Heh.
-ted
to stop cheating, will GT bust them for plagiarism? ;-)
Read the EFF's Fair Use FAQ
I wonder if they will start using this on the resumes of their coaches from now on?
Based on the results, one should not blindly take action, but if these results are used as a guideline for further inquiry, it might be a help. I know that this sort of cheating can get rampant, at least where I went to school. And even if you are working with other students to figure out a program, it's pretty unlikely that all the positioning of characters in chunks of code would be identical.
It might seem like the face recognition stuff that's controversial right now. But the important differences between these two things are:
- it can be done much more precisely with code.
- this is for school assignments, not unknown checks in public places.
Seems like this wouldn't be too hard to trick, if someone could run a script that would randomly insert spaces, tabs, comments etc. throughout the code.
mark
If you want to make an apple pie from scratch, you must first create the universe. -- Carl Sagan
but there is a lot of cheating in undergraduate courses.
I was one of the better students in my comp-sci classes and so other students looked to me for help etc. I would routinely point them to my own finished assignments as examples of how to do something, or provide listings which we would use to discuss the assignment and how to do things.
This worked well until I got called before the teacher in regards to two students having taken my listings and typed them in (with practically no modification whatsoever). I explained the truth - that I provided it for purposes of instruction, not stealing - and managed to escape. The other students were forced to retake the course.
After this incident I kept my eyes wider open and noticed more students "copying"...
It happens. Whether this program is really needed or not I think is more an indication of how well the teacher stresses the students on final exams and such.
There's a gorilla from Manilla whose a fella that stinks of vanilla and has salmonella.
Back in my day, my teachers actually LOOKED over our code. Won't this just make it more convenient for teachers to rubber-stamp grades and not bother to check and make sure that your code is elegant and not spaghetti crap?
Writing a computer program is an exercise in understanding the basic fundamental operations on a computer. Personally I don't want a guy with a CS degree that I hired to come up to me and say "no, I can't figure this problem out. I borrowed all my code from my friends." My response, "may I have your friend's contact info so I can hire someone who can get the job done?". Good grief, the early CS projects are "hello world", and some basic algorithms. They are pretty much canned and it is really important for someone to have a grasp of those fundamentals before moving on.
Later projects need to include collaboration and workgroup projects.
Just my $.02.
John Kramer
God may be my co-pilot, but the devil is my backseat driver.
``But for the most part, the degree of similarity that this program is looking for - the commas are in the same place, the semicolons are in the same place, the spacing is the same, they've made the same mistakes - the only explanation, and what most students will eventually concede, is they actually did it,''
This type of thing should be caught and should be punished. Some people just don't belong in CS, and they should be weeded out instead of riding on another's shoulders.
And if he believes in karma, then Rob will spend the rest of his days being the coworker who has to be continually "consulted" by shouldn't-have-graduated CS cheaters who can't write anything themselves.
There were several people that got caught 'cheating' last year in one of my CS classes at the University of Washington. The software they used to check the homework, which is always electronically submitted, went beyond looking for exact matches; it also looked for similarities in code structure where merely the variable names had been changed. In exchange for some donuts, I once gave a snippet of code to my roommate, who was also enrolled in the course, but we were smart enough to make the code look completely different in his submission so we didn't get caught.
4-bit adder: A snake made of 1's and 0's
Those who can't teach, prevent collaboration.
Besides - those topics aren't covered until
CS3300 (Team based Software Engineering)
Heck, even in the open source world, I can't copy someone else's program verbatim and claim I wrote it.
Even if two people work together on a project, as long as they write their code separately, the code will be significantly different enough that it shouldn't be recognized as cheating.
Probably what this will catch is the last minute "Quick, let me copy your program" right before it's due. And this DOES happen, and I find nothing "right" about that at all. That IS cheating, plain and simple, and should be stopped. In a class of 30 students, the instructor (or TAs) will probably be able to notice similarities. In larger classes, it's easy for these things to slip by, especially if the grading process is split amongst multiple TAs.
-Restil
Play with my webcams and lights here
Boston University has been using software developed at a university in California (if someone knows where, post) for about 3 years now that does the same thing. It is known to make mistakes, and a human must check the papers deemed plagiarism to confirm the program's decision, but it is still very useful for the purposes of a university where they expect you to do your own work. I know many people who have been accused and suspended because of it, mainly because they didn't believe the professor when he told us about the software the first day. Though a lack of collaboration is not realistic when applied to post-college experiences, it is invaluable for the university to ensure that we are getting what we pay for (because most people honestly don't seem to want to learn, so the school tries to make sure they do). It is also useful because most CS classes are curved, and if you have half of the class cheating, they are royally diminishing the chances of good grades for those who do their own work.
It is a questionable practice, but in this specific setting, I believe it is necessary, though a necessary evil perhaps.
BU student
I am a fourth year CS undergrad at Tech and they have had this system in place at least since I took these classes as a freshman, and I know that it was there even before that. A friend of mine got caught one time because he let someone look at his old code, and the idiots copied it and changed variable names. Some people never learn... I am curious as to why it's getting all this media attention (this is the third media report I've seen on it) as it's been at Tech in some incarnation for at least 4 years.
This encourages students to write horribly obfuscated programs that only barely compile (in the hopes that nobody else would do anything that bizarre)...in other words, it prepares them for the real world nicely. :-)
This isn't anything new. I'm a student at Virginia Tech, and the CS dept. has used a "cheating detector" for some years now. It's quite evolved and doesn't only detect obviously copied code (exact copies and copies w/ renamed variables, functions, etc.) but also indications of cheating such as a section of code with a drastically different coding style than the rest of the code. It's quite good, and the CS instructors often brag that while it's rare that students cheat (the Honor Code here is a point of pride), the program has guaranteed convictions in the Honor Court.
May-be /. could youse dis tech 2stop re posts.
-- Knowing too much can get you killed, but knowing who knows too much can make you rich.
Looks like there's finally a market for my new product idea: A code obfuscator that also accesses a thesaurus to substitute different meaningful identifier names!
CmdrTaco says: "Cuz remember programmers: in the real world you are fired if you consult with a co-worker ;)"
As someone who TAed classes at GA Tech, I take a lot of offense at this comment. There is a difference between working as a team in project-based classes once you've shown you understand the basics of programming (GA Tech has a good number of these, including classes where we got to hack the Linux kernel and another where we got to deliver a product to a customer) and wholesale copying of other people's work in entry-level classes where you are supposed to be learning to program on your own.
Beginning programmers need to learn how to program, find information from MAN pages & API docs, and come up with solutions on their own before being introduced into team based environments. If not they never learn how to be self sufficient or even if they are cut out for programming at all.
It is true that in the real world no man is an island, but on the flip side, how many people have worked with co-workers who were completely clueless about how to perform their jobs but held degrees or certifications that implied they should be knowledgeable about programming? These are the kind of people who hid behind the work of others in team-based projects and submitted others' work on individual projects.
My high school CS teacher once found a few identical programs. He printed out the source code to some transparencies, then lined them up, one on top of the other, on the overhead projector. The only blurred spot was the comment with the students' names.
Not a typewriter
When I was taking programming classes in the mid '80s at UCLA, they had a rather clever cheating detection program. It didn't look at the source (Pascal or C) code, but rather at the produced assembler code to see if students were copying others' algorithms.
So you might obfuscate your copied code by moving it around, changing variable names, etc. but it would still catch you.
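The UCLA checker compared assembler output from Pascal or C compilers. As a rough analogy only, the same idea can be tried on Python by comparing the sequence of bytecode instruction names that the interpreter's compiler produces; the function name opcode_shape is invented here, and Python bytecode stands in for real assembler.

```python
import dis

def opcode_shape(source):
    """Compile the source and list instruction names for the module and
    every nested code object (functions, classes, comprehensions)."""
    code = compile(source, "<submission>", "exec")
    ops = []
    stack = [code]
    while stack:
        current = stack.pop()
        ops.extend(ins.opname for ins in dis.get_instructions(current))
        # Nested code objects are stored among the constants.
        stack.extend(c for c in current.co_consts if hasattr(c, "co_code"))
    return ops
```

Variable names and comments never reach the instruction names, so renaming alone leaves the list unchanged. The comparison is coarse, though: operands are ignored, so on some interpreter versions different arithmetic operators share one instruction name, and two snippets are only comparable when compiled by the same Python version.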
Someone did this as a Master's project when I was in grad school 5 years ago. Not new. Not news.
Also, this is another case of "Stupid people deserve to be caught."
But for the most part, the degree of similarity that this program is looking for - the commas are in the same place, the semicolons are in the same place, the spacing is the same...
It wouldn't take a lot of thought to alter these factors. Like I used to tell my undergrad students when I was a graduate TA, if you're not smart enough to answer the questions, you're probably not smart enough to cheat either.
Even in the case of "the mistakes are the same" -- if your verification and debugging skills are that bad, you still deserve to be caught!
Some more info on the cheating detector from a Georgia Tech Alum of the CS program.
1. The cheating detector is not new. It's been in place for years. When I took intro programming in 1994 they mentioned it, and it wasn't new then.
2. Everybody at Tech knows about it. They tell you about this script the first day of class. Nobody here should be surprised they were caught. The fact that they were caught only shows them to be some of the stupidest people at Tech.
3. It catches people every term. Usual numbers are below 5% range. The fact that it caught someone isn't news. The fact that it caught 10% of a class is news.
4. These classes are cake. There is no reason anyone should need to cheat to pass these classes. They are the most basic concepts of programming.
A system similar to this has been in place for a while at the University of Florida. I talked to several of our professors using it and they claim that it is quite effective. It detects the obvious ways that students try to obfuscate their cheating, e.g. changing variable names, whitespace, etc. Whenever the program turns up a match, the professor examines them by hand before calling the students into his office. In almost all cases, the students confess. The first semester it was put in place, nearly a third of the students in the Intro course were caught cheating! The rate hasn't been that high since. When the programs assigned are sufficiently complex, the odds of finding two people with the exact same decision tree are quite small.
A similar system was developed at Columbia.
Shockwave Flash movies are the greatest thing to happen to non-sequitur humor since Japan.
I wonder if this is anything like the MOSS (Measure Of Software Similarity) program developed at Berkeley in 1998?
The press release here.
You can also see the MOSS website here.
As a student at Rensselaer Polytechnic Institute I had a teacher run the entire class's code through MOSS for each assignment last semester, and apparently it caught several people who had very similar code.
MOSS also has the ability to detect similarities in software structure as opposed to just checking for exact duplicates of code.
CS students who cut and paste each other's code deserve to be caught in my opinion.
I believe that there are two types of code... The kind you can (and should) freely copy, and the kind you can't.
This has nothing to do with copyrights or stealing someone's work. If the code fragment actually solves a conceptual problem, some form of algorithm, then it should definitely not be copied. Of course, some fragments are so common (such as a sort) that all code will look the same anyways, and you should be able to copy some of your own code.
But the kind of code that should be copied at all costs are things such as system calls, user interface calls, connecting to various sources, and much of the stuff that has nothing to do with the actual problem. These fragments are frequently tedious, and rarely are useful to understand, and such fragments should be freely available in online databases. Especially the UI stuff, you WANT your program to work the same as others!
My two € cents.
Homework Assignments:
1) Write a program that takes in a temperature in Fahrenheit and converts it to Celsius. Have the program display the correct result on the screen to two decimal places.
2) Write a program that will calculate the day of the week of any date in history*. Use the following formula:
...
3) Write a program that will send a form letter to everyone in the class. Use the following list of names:
... And use the following template for the form letter: ...
The problem is that the solutions to these programs are going to be pretty similar across the students' submissions.
...because they all declared "i" as an integer...
Georgia Tech's Honor Code for CS1321 - Introduction to Computing
For all of you who posted "gee, they invented diff again": it's a little more involved than just "diff". I'm sure other schools have similar cheat-detecting programs as well. Also, why Yahoo decided to pick up on this now and pass it off as news is beyond my comprehension. Maybe they had nothing better to pass off as news. In my entire 4 years at GA Tech, I only heard about this program once and it's not a big deal. "There's nothing here to see people. Please move on with your lives."
Biodiesel : domestic, renewable, clean, and in the fuel tank of my bone stock 2002 New Beetle TDI
Would you be so stupid as to copy it exactly? I mean, there are 20 different ways to do the same thing in just about any language; how stupid would you have to be to copy someone else's code without changing variable names and statements to be slightly different?
CS programs at schools are not out to end collaboration among students. They are aiming to produce students who know how to program. In the real world, YES, you can just copy the code directly from someone else, but what does that teach you? Nothing. Well, how to copy/paste.
I mean, we talked about this today in class: if a guy gets a degree and makes it out of school riding the coattails of others, his degree is worthless. Once he is out in the real world, he also drags down everyone else who has the same degree from the same school, because employers will think - Guys from school x don't know jack.
CS departments are not evil, but they are trying to uphold the principles of school. Don't misinterpret actions such as these as some sort of action to "keep people down".
-RonB
It is human nature to take shortcuts in thinking.
When I was at University, the group approach to individual assignments was a very real problem. In a class of 300, there would be two or three groups of 50 people who all submitted the same solution, with the profs or TAs being none the wiser (or simply not caring). The upshot was that the CS program was producing graduates of programming classes who couldn't actually program even the somewhat trivial class assignments!
I'm all for anything that helps to solve this serious problem that seems to be common in CS departments. CS is about more than programming, but if you claim to have passed a programming class you certainly should have to cut code to get your grade.
You just have to give credit where credit is due.
If you say "Mark wrote this" then you won't fail for cheating; you'll fail not because someone else did it, but because you didn't do what you were supposed to.
If you say "I had trouble with bounding this loop and I got it fixed with help from Mark" then they'll not penalize you in any way, unless that was the whole exercise.
No two programmers are working on the exact same bug/feature. You can't expect to just copy someone else's code or get them to do most of the work for you. At best, you can ask for help, but that's not what the article is talking about. It's talking about straight copying of code (with minor changes to fool a cursory examination).
Virginia Tech has had this since at least 1998. All programs that we submit are run through it, and any violations are sent directly to the Honor Court. Professors have said that the conviction rate is 99%, but I'm not sure if that's a scare tactic or what.
VTCS 2002
I'm a grad student at GaTech, and from what I know of their 'introduction to computing' class, it basically consists of writing a basic webpage (I believe they have to include 1 picture and 1 table) and writing simple pseudocode. Doesn't seem much room for cheating there (and if you have to cheat writing html, you SHOULD be caught). The OOL class, of course, leaves open a lot more room for copying functions and classes and the like. Another thing to think about is that 180 students is around 10% of the campus. Surely that many people weren't involved in some mass cheating scam, were they? And if so, why wasn't it found out before?
IMO, those who choose to cheat should be allowed to. They will not learn the material and will be less intelligent for it. Even if they do graduate, these people should be washed out pretty quickly in the job market because their true skills will be apparent. This will leave more jobs available for those self-motivated people who are truly good at their work.
I was in computer science last year at the University of Guelph, where I (and a bunch of other unrelated students) got busted by their code checking machine.
All programming assignments are handed in online where they are fed through a program that checks them for similarities. Basically if the code is similar the machine flags the program with the other program(s) that are the same and the instructors are informed.
When I was called to the dean's office about this I was handed a hard copy of my program as well as that of the other student involved, with everything similar printed in red text. Even though the code looked nothing alike (yes, it was copied, however), the machine was able to pick up on it.
I learned from my mistake.
spend money here
One of my CS profs created a program to do something similar for himself. It would take two programs and compare them and give a similarity score between 0.0 and 1.0. Seeing anything up to 0.6 in intro courses was considered normal, since the assignments were easier, but much above that and things got suspicious fast. Of course, any red flags were hand checked. Seeing as this is the prof that taught the compiler courses, I don't think there were many false positives. :-)
:-) Having a URL to an identical file from an algorithm archive helped too.
It caught a few guys that I know. When confronted they tried to say that they didn't cheat. So the prof does the only sensible thing that a CS prof should do when dealing with cheating intro students: Single out a common line of code in their programs and ask them what it did. Hint: How many of you knew the ternary operator in your first forays into C?
Pax, Ardax
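For anyone wondering what a 0.0 to 1.0 similarity score might look like in practice, here is a minimal sketch. It is not the prof's program (nobody here has seen that); it just leans on Python's standard difflib module. The function names are made up for illustration, and the 0.6 cutoff is simply the figure quoted in the post above, not a recommended value.

```python
import difflib

# Cutoff quoted in the post above; treat it as an illustrative assumption.
SUSPICION_THRESHOLD = 0.6


def similarity(source_a, source_b):
    """Return a score in [0.0, 1.0] comparing two programs line by line."""
    # Ignore blank lines and leading/trailing whitespace so that
    # re-indenting a file does not change the score.
    lines_a = [line.strip() for line in source_a.splitlines() if line.strip()]
    lines_b = [line.strip() for line in source_b.splitlines() if line.strip()]
    return difflib.SequenceMatcher(None, lines_a, lines_b).ratio()


def needs_hand_check(score):
    """Flag a pair for a human to look at; the score alone proves nothing."""
    return score > SUSPICION_THRESHOLD
```

A score like this only says "look closer". As the post says, any red flag still gets checked by hand.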
It could also look at a batch of source code, and look for similarities. Of course, our class was only 16 people, so you pretty much knew something was up when a person you knew wasn't getting it suddenly got an A on the project.
I just graduated from GaTech in December and I was a Teaching Assistant for the Intro to Computing class for 2 and a half years at Tech. The students are told on the first day of class that cheating is not allowed and that if you are caught you will be punished. They are told about the program and whether they believe or not is their problem.
The students are told it is ok to discuss the homeworks and project with each other and that it is ok to discuss the concepts. However it is NOT ok to copy each other's code.
The program does not just compare the text of each student's homework which is what some people seem to think it does. The program gets rid of variable names, function names and things like that because a person cheating can simply change those. It compares the style of the code and it is not given common code to look at. The only code checked is the code from problems that generally generate unique solutions.
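To make the "gets rid of variable names" idea concrete, here is a small sketch. It is not the Tech program, whose source very few people have seen. It works on Python source with the standard tokenize module only because that is the easiest thing to show; the function name normalize and the placeholder strings ID, NUM and STR are my own inventions.

```python
import io
import keyword
import tokenize

# Token types that say nothing about what the code does.
IGNORED = {tokenize.COMMENT, tokenize.NL, tokenize.ENDMARKER}
# Layout tokens whose text depends on indentation width; keep only their kind.
LAYOUT = {tokenize.NEWLINE, tokenize.INDENT, tokenize.DEDENT}


def normalize(source):
    """Reduce Python source to a token list with names and literals blanked."""
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in IGNORED:
            continue
        if tok.type in LAYOUT:
            tokens.append(tokenize.tok_name[tok.type])
        elif tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            tokens.append("ID")   # any variable or function name
        elif tok.type == tokenize.NUMBER:
            tokens.append("NUM")  # any numeric literal
        elif tok.type == tokenize.STRING:
            tokens.append("STR")  # any string literal
        else:
            tokens.append(tok.string)  # keywords and operators stay as-is
    return tokens
```

Two submissions that differ only in naming, comments and spacing normalize to the same list, which is why renaming variables does not fool this kind of check. Deciding how close two lists have to be before anyone gets called in is a separate question.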
In the time I spent there I know of over a hundred cheating cases caught by the program. In some of those cases if you had given me the 2 pieces of code I never would have said the people were cheating but when asked the students confessed. I have never heard of someone being falsely accused. Most of the time when the 2 cheaters are asked separately they admit to it.
Once again, Tech does not have any problem with people helping each other understand concepts like the way pointers or a vector works or the differences between stacks and queues. What they have a problem with is when each student does not do his own work on an individual homework.
Even though some of the problems may seem not worth it, like writing your own version of strcpy, it is still necessary so that students understand how the library functions work even if they will never be writing library functions in their life.
I wonder how this system compares with a program developed at Berkeley called Moss to serve the same purpose. Moss is free and available as a web service. It is really pretty neat. For those of you advocating the use of 'diff': Moss is quite a bit more complicated than diff. It will match up lines of common code and also compare the choice of token names within the program. Learn more here: http://www.cs.berkeley.edu/~aiken/moss.html
When I was taking some of the early 2000 level CS classes at tech back in '95-96 or so, we were warned from the beginning that they had a cheat-finder script or utility of some kind that they used. IIRC, it was not just a character by character comparison, but used some kind of percent similarity method.
If any current/former gatech TAs/profs want to correct me/add to this, please do...
Andrew
It might be funny to use them against their own source and watch them go crazy *cheat* *cheat*...
The responses here, at least the ones along the lines of "But collaboration is allowed in the real world", sicken me. I would fire (and HAVE fired) programmers who couldn't program simple stuff on their own. The collaboration in industry is not anywhere near the level of syntax and elementary algorithm design.
A University degree is supposed to signify that you demonstrated knowledge in certain areas.
Cheating is not demonstrating knowledge.
Undergraduate level programming assignments do not require even consultation with other students, IMHO. They are too simple. If you can't code an undergraduate programming project without extensive "consulting", then you can't program. Period.
I am sickened by the number of people with CS degrees only because of "teamwork" and "consulting". I would guess, from my experience, 95% of people with CS degrees can't write a sort routine. Widespread use of these kinds of programs might fix some of this. As would harsher grading. In the real world, you don't get partial credit for a program that only dumps core or doesn't meet any of the design objectives. (in my opinion, any program which doesn't properly run a set of tests, provided to the students in the project instructions, should receive an "F" grade)
No wonder the software industry is such a mess. I've seen CS *GRADUATE* students who couldn't use malloc(). Note that I did not say "who use malloc() wrong" - no, these students could not even figure out how to call malloc() nor explain what it did. There's something strange happening (I call it cheating) when someone can graduate with a CS degree yet never use dynamic memory allocation knowingly...
Having spent a considerable amount of time marking undergraduate assignments it is amazing how many people will duplicate the results of others, faithfully reproducing the same bizarre errors. For one reason or another, some folks feel that they are better off cheating than actually performing the work themselves. What is more amazing is that these same folks believe they won't be caught. In these cases I would usually divide-up the mark for this one question amongst the duplicators.
During my own undergraduate years I quickly realized that unless I completed the work myself I was wasting my time and money. Collaboration and discussion of how to solve a problem is useful, but blindly duplicating the results of others is no way to learn.
Catching plagiarism amongst submitted computer code is nothing new. Automagic comparison of submitted code was done during my undergraduate years long ago. Even though our instructor warned us at the beginning of the year, at least half a dozen people were caught using software evaluation.
First, the standard disclaimers: my comments are my own and should not be taken as necessarily representative of the GA Tech administration.
There's a much better and more accurate article on the topic at the AJC. Take the AP version with a grain of salt.
The fact that GA Tech uses software to detect possible cheating should not come as a surprise to anyone. Such systems have been in use at many schools across the country for many different disciplines besides CS. Nor should anyone be disturbed by the use of such systems: their purpose is to detect possible cheating, which according to the AJC article was clearly verboten to the students in the class.
In the real world, a completely different set of rules may exist, but the fact remains that if your boss tells you he wants you to do something on your own, then you'd damn well better do it on your own. When a teacher instructs a student to perform a task on his own, he so instructs not to make life more difficult for the student, but to ensure that the student is capable of independently executing the skills necessary for the completion of the assignment. When that student eventually enters the real world, he has demonstrated the ability to perform the skills to be expected of him in the real world, so when he then has the ability to collaborate with his peers, he can actually contribute to the group's performance. A student who has always relied on others to get by will offer minimal assistance to a group and will typically act as a hindrance.
So sure, in the real world you won't be fired for collaborating with your peers, but you will be if you can't get anything done without collaborating with your peers.
we've had this at RIT for a long time. The teachers coded it. It's really really tricky. Basically they have the attitude of, if you can get by the cheating program, then you know what you're doing and deserve the grade you get.
The GeekNights podcast is going strong. Listen!
The hard part is turning up the "sensitivity," so you get not just exact copies, but also people who have taken parts of a program or made some trivial modifications.
The problem is that it's hard to find info about these systems for the very good reason that this is one instance where security by obscurity makes sense. If students know how the systems work, they can re-implement them and check to see if they'll be caught.
Greg
It was very simple: when we "flagged" someone who was suspect, they got a very simple oral exam to prove the program was written by them.
It was nothing hard, just what vars did you use for what. Why did you loop this, and a couple of other little things that anyone that wrote a program would remember.
It was just a first run...nothing more.
Neck_of_the_Woods
#/usr/local/surf/glassy/overhead
I'm a grad student at Tech who TA's an undergrad programming course. The course is a senior-level design based course that typically has 15 people or so. Even with such a small number, it typically takes me a full 10 hours to grade homework assignments. For a 100-200 person introductory course, this work has to be spread out over multiple TA's. Hence Johnny Apple may copy Suzy Zebra's assignment, but it would never show up on anyone's radar. Using a program to check for similarities across assignments is extremely useful as a first level of cheating detection, and I see no problem as long as the final "this person was cheating" decision falls to a real person (which it does at GA Tech, there's an appeals process and everything).
Some men spend their entire lives trying to kill themselves for having been born. --Ross MacDonald
In college I actually got "caught" cheating by one of these programs. My friend was having trouble with his assignment so I went over to his house and walked him through it as a tutor would do. At no time did he ever have any of my code. However, because I helped him much of his code was like mine. In particular I solved the problem at hand. I almost got kicked out of school for it, but since it was a first offence we just got zeros for the assignment. To this day I don't see what was wrong with what we did. I taught him how to solve the problem, I didn't solve it for him. This sort of thing still pisses me off.
Apoptosis
Ok, check for exact match: diff source1.c source2.c
great, I just wrote a program to check for exact matches in source code, and it took me three seconds. Maybe I should apply for a patent for my ingenious approach (maybe I'd get it!!)
At my organisation, (in India) we've been developing something like this for quite some time for our internal tests.
While most of the work isn't (and probably won't be) publicly released, we can look at a systematic approach to building a better detector.
indent -i8 -kr
sed -e 's,\([^ ^I]\)[ ^I]\+,\1
sed -e '/^[ ^I]\+$/d'
i=i;
You may also want to first strip all #include <> statements (not #include ""), and run the code through the C preprocessor to take care of #define and conditional compilation.
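The recipe above (normalise the layout, squeeze runs of whitespace, drop blank lines, strip #include <> lines) can be approximated in a few lines of Python if indent and sed aren't handy. This is only a rough sketch, not what the poster's organisation runs: normalize_c is a made-up name, the comment stripping is my own addition standing in for the preprocessor pass, and the regexes are naive (they will mangle comment markers or repeated spaces inside string literals).

```python
import re


def normalize_c(source):
    """Strip layout noise from C source before comparing two submissions."""
    # Remove /* ... */ and // comments (naive: does not respect string literals).
    source = re.sub(r"/\*.*?\*/", " ", source, flags=re.DOTALL)
    source = re.sub(r"//[^\n]*", "", source)

    kept = []
    for line in source.splitlines():
        # Drop system includes such as #include <stdio.h>; keep #include "...".
        if re.match(r"\s*#\s*include\s*<", line):
            continue
        # Squeeze runs of spaces and tabs, then trim the ends.
        line = re.sub(r"[ \t]+", " ", line).strip()
        if line:  # drop lines that are now empty
            kept.append(line)
    return "\n".join(kept)
```

Run both submissions through the same function and then diff (or score) the results, so that reformatting alone no longer hides a copy.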
There's more, obviously, that I'm not sharing with you. These are the basics that anyone could figure out in a few minutes - not years.
Do not underestimate the value of print statements for debugging.
In the old sense, meaning people who believe that elegance is important and simplicity is a factor of elegance, and give them identical problems and environments to work in. Syntactically their solutions will most likely be very different (one will comment, the other not, one prefers braces indented one way, the other another way, etc.) but for many, many problems, particularly those posed in academia, their solutions will often be extremely similar, the algorithms possibly identical. How many ways are there to efficiently write a bubble sort, for example, or a node walker? There are lots of sucky solutions but may be very few elegant 'hackerish' ones, in my experience.
So what happens when the cheating detector spits their names out?
After 20 years of professional programming, I've recently gone back to university to get my BS in Computer Science (yes, there is a wall, Virginia.) The first time this happens to me I'm calling my cousin, the lawyer. At the least I expect to have the rest of my tuition paid for. At best I want to see some lazy prof fired. I refuse to put up with this B.S.
This is news? My school (Michigan) has been using programs like this for quite a while (at least a couple of years). Written here, not bought, AFAIK. And yes, they do catch a lot of cheaters, which I think is fantastic. They're still imperfect, though... I've heard of a case or two where the software was tricked when no cheating actually occurred.
Tastes like burning! - Ralph Wiggum
If a class of a hundred are taught in the same way and they are given the same assignment, how similar are all 100 assignments going to be? They are going to be virtually identical in coding style, as they've been taught the same way! The only differences will be layout, comments and variable names. But any good faker will know to change these anyway. So what's the point?
It's a good thing that most of the students don't use a code-reformatter to layout their code ...
Here's some more info, from the perspective of a former TA (once for one of the classes in question). First, everyone at GaTech is required to take the first CS class, not just CS majors (== people in the CoC). Second, GaTech doesn't restrict collaboration in all classes. The first tier of classes are strictly individual so everyone has to be in front of the computer. In the second tier, CS2130 - Languages and Translation explicitly allows collaboration as long as people turn in their own code. Going further, later classes involve heavy amounts of group work.
With regards to the cheater-detector program (called 'cheatfinder'), it's significantly more complicated than diff(1). It involves checking the structure of the code (ignoring variable names, indentation, and whatnot). Admittedly, I've never seen the source for it (very few people have), but it's been around since at least 1997. The output of the program is a single number indicating the probability that two people collaborated on an assignment. The threshold is typically set fairly high (0.90+), so false positives are less likely. 187 students, the number caught this time around, is definitely the highest I've heard of, but it's definitely not the first time we've hit a large number -- just the first time it made the cover of the local newspaper.
Interestingly, many students (including myself before becoming a TA) think (well, thought now) cheatfinder is just something the profs made up to scare students.
It turns out that about half of the people "cheating" really weren't; they just all happened to independently come up with a different working implementation than the one the professors originally intended, one the professors hadn't even thought of themselves.
All that ended up coming of this was that the Professors apologized on the class newsgroup- I think they still check the code using the same program.
bleh -- this doesn't seem particularly groundbreaking. As others have noted for their respective schools, Harvard has had this system in place for years. They keep pretty mum on its inner workings, but we know that it does more than straight text matching, probably using some of the parsing internals of the compiler to break the code down into a canonical form, and then do the necessary comparisons.
-mike
what about last year's students? or the entire internet? there's a ton of code out there that is freely available.
I also know that I would be pissed and question the credibility of the professor that uses this on my code. Because that is exactly what the professor is doing: assuming that everyone is cheating and running them through the detector... maybe we should frisk and do drug testing on all the students, and hook them up to polygraphs during tests too, just to be sure they don't have weapons, cheating aids, or are taking controlled substances or are cheating on the tests as we know they all are.
Any professor that would use this or even allow it to be used on his students has ZERO respect from me as that is the amount of respect he is showing his students.
Do not look at laser with remaining good eye.
We had one like that at Fanshawe College as well. It could detect more than just exact duplicates, like duplicate functions as well. It would then be up to the professor to decide if it is an attempt at cheating.
This is nothing new; Fanshawe College isn't the first to do this.
Programs like this either generate horrible false positives, or they're little more than a diff that ignores comments.
We had a professor who used one of these. I know for a fact that at least one person was copying his buddy's Pascal programs, but rejiggering them to use "repeat until" loops instead of "while do" everywhere, and changing variable names, and this was enough to throw off the scent.
This just in...there's this new thing going around called the "Internet."
Apparently it is some form of communication medium.
More on that as it develops.
Magius_AR
Okay then, all you Open Source advocates and coders are cheaters and bad people who are just plain unintelligent?
No, I do not think you are, and I also do not consider cutting and pasting code to be cheating. The only person you cheat when you cut and paste code is yourself. The goal in CS is to get the job done as quickly and as efficiently as possible. A lot of times those goals involve cutting and pasting code you found somewhere.
"Help me Obi-/.-Kenobi,your my only hope!" -$
Oh, so they've developed "diff -s" now. Watch out for the patent application any day now...
First off, diff doesn't work if the kids are smart enough to change their variable names and add spaces here and there.
I was a grader for the C++ and data structures class back when I was in school. And I saw my share of cheating. One instance that stands out is when a bunch of kids had variables called "dude" and "funtime". Problem was, they had enough differences elsewhere in their code that an automated diff wouldn't have worked. For a while, I was going to write some fancy perl that would look for certain cheating patterns that I was seeing, but then I got lazy.
One deeper way to check for cheating is to pass code through the front end of a compiler and check what comes out. If there are too many similarities, they will stand out even if kids change parameter names and the like.
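As a toy version of the compiler front-end idea, Python's own parser can stand in for "the front end". The sketch below is an assumption-laden illustration, not any school's tool: shape and same_shape are made-up names, it handles Python source rather than C++, and the node-type list is only a rough fingerprint, not a proof of copying.

```python
import ast


def shape(source):
    """List the syntax-tree node types of `source`, ignoring names and values."""
    tree = ast.parse(source)
    # ast.walk visits every node in a fixed order; keeping only the type
    # names throws away identifiers, literals, comments and formatting.
    return [type(node).__name__ for node in ast.walk(tree)]


def same_shape(source_a, source_b):
    """True when two programs parse to trees with identical node types."""
    return shape(source_a) == shape(source_b)
```

Renaming "dude" and "funtime" changes nothing here, because the names never make it into the fingerprint. Swapping one loop construct for another does change it, so a real checker would want a fuzzier comparison than strict equality.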
Finally, consider this: checking for cheaters in a class isn't just doing a diff of two files. For every student in the class, you have to check his code against everyone else's. This is an O(n^2) problem. My class had around 350 people in it, so that's 122500 checks to do. If it is anything more complex than a diff (multiple files, compiler front-end, fancy perl parsing) this can take a mad amount of computing.
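The arithmetic is easy to check. 350 squared is indeed 122500, though if comparing A with B also covers B with A, only the unordered pairs are needed, which is about half that. It is still O(n^2). A quick sketch, with made-up function names:

```python
from itertools import combinations
from math import comb


def comparison_counts(n):
    """Return (n squared, number of unordered pairs) for a class of n students."""
    return n * n, comb(n, 2)


def all_pairs(students):
    """Yield each unordered pair of submissions exactly once."""
    return combinations(students, 2)
```

For a class of 350, comparison_counts(350) gives 122500 and 61075, so checking each pair once roughly halves the work without changing how it grows.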
Blaze a trail to the New World
Maybe the BSA should make their members run this... see who's copying off who...
Could Mr Gates and Mr Jobs please see me after class. I have found a problem with your assignments.
Cruise TT
If this program is really necessary, it suggests that very little effort is being spent evaluating students' work. Given that universities have been cutting back on teaching budgets, it's probably due to too many students in a class or inexperienced professors. If the professors took the time to actually look at the code (or in the other case, read students' papers carefully), it is pretty obvious if a student is cheating.
Also, if the professors took the time to create good homework problems, cheating wouldn't help a student even if they didn't get caught. Of course, if you're teaching two classes with 100 students each, you can't really do that.
Perhaps students should do a little math and figure that if they pay $4k per quarter and take four classes, that's $1k per student per class. And if a professor is teaching 200 students, the university is receiving $200k for the classes. The professor is probably making somewhere around $10k for the quarter, call it $20k to include benefits and taxes. Where is the other $180k going? And when budget crunches come up, perhaps budgeting should be done similar to code profiling -- look to the biggest expenses first, and make sure that the cuts don't damage the overall purpose.
Personally, if I was getting the sort of cheap-ass education where this cheater-detection software is considered a cheap alternative to proper education, I'd expect to pay much less in tuition, rather than the constant 10+% tuition increases students have been seeing for the past 20 years or so.
There is also the problem of getting code from the net and from people unrelated to the University. When I worked in a Comp Sci department in a college of The University of London we would sometimes run reports and thesis submissions against AltaVista and Google; usually if we thought that the student had far exceeded expectations.
One student was caught because the person in the States that the student commissioned to write his final year project for him complained to the University that the student hadn't paid him. We checked into it and the student was expelled.
Kevin
"It's not the cough that carries you off, it's the coffin they carry you off in" O. Nash
One assignment: modify this open source program to fix an assigned bug.
Another assignment: Modify this (Same) open source program and add a feature that they've been wanting for a while.
It's not like the solutions aren't readily available and well documented, it's just like in the real world: it hasn't been done here yet.
As long as intro courses use textbook problems with textbook solutions, students should be penalized for not doing their own work. The point in these classes is to provide an educational foundation. As soon as the foundations are laid, the students should be given work that isn't straight out of a textbook, and should be allowed to use any legal method of getting a solution. It's not like they'll find an exact solution, so they will have to do some tweaking and patching to get it to work anyway.
This was the philosophy my teaching cohort and I presented to the college Provost and the head of the CS department. They bought into it and gave us the class.
Network Security: It always comes down to a big guy with a gun.
Hey, you can't expect TA's to use judgement...these guys can barely pick their ass without notes.
The idea of using judgement and common sense in grading undergrad papers beggars belief. Therefore, they use a program which blindly follows logic thought up by a TA. A professor takes credit and it gets published.
See how the world works? You should cheat, but set it up so you can't get caught.
Here at Stanford, where CS classes account for more than half of all cheating incidents, we've been using a system like this for a couple of years. Apparently, rather than comparing source code, it actually compares object code so that it can detect people who change variable names and so on. The theory is that no two people should actually write code which compiles to the exact same object code, no matter how similar their algorithms, unless they really are cheating.
One of the professors in the CSE department at my school wrote (at least he claims to have written it, and I have no reason to doubt that) a tool that, among other things, tries to match patterns in the ways that memory is allocated and logical paths are constructed. It's actually a whole suite of tools that are used by the department to catch illegal collusion among students, and every submitted program is run through these tests.
I have found there are just two ways to go.
It all comes down to livin' fast or dyin' slow. -REK, Jr.
$ man 1 diff
Why bother.
Someone at the GATech CS dept figured out how to run diff?! Wow, no wonder it's considered a top University. I guess now people will start reformatting the code until the University discovers the -w flag. Once that happens, who knows, CS students might actually have to learn to code.. The horror!
he called it 'diff', and he caught some people too!
"i was saying gnu-rd"
(Geez.. how many people just don't bother reading any articles before posting their "original idea"?)
T
---- It puts the lotion on its skin or else it gets the hose again. It does this whenever it's told.
We have had this in the UC Berkeley computer science department for some time now. IIRC it's been quite effective; when it was first unveiled it nailed many, many students for cheating (I think). The verdict amongst students is, if you're good enough to defeat the cheating detector then doing the assignment on your own should be no problem anyways.
I think there is a world market for maybe five personal web logs.
May B ?, Wood mouse disects two slippery posts.
I'm sick of programmers with degrees who can't program. Honestly, 100 level CS classes are supposed to be the acid test; if you don't have the basic problem solving skills to perform simple conditional logic, understand arrays and symbolic links, and output "hello world," you shouldn't be a CS major. And yet I've seen CS students in upper level classes (there was one in my networking group in his senior year) who can't write a program from scratch without a book or template to guide them.
This is a major difference from consultation. Copying hurts the computer world, it hurts other students (who have one less peer to help them through the rough patches with hints and examples) and it unalienably hurts the cheater most of all. I dropped out of my CS department to work for IBM *because* there were so many students sliding by and yet still maintaining higher averages than mine.
Of course, I am guessing that the GT system will probably be fairly stupid at first, and it won't take long for some overworked freshman to write an app to automatically thwart it. But this system will definitely help keep honest people honest.
Hey freaks: now you're ju
This is nothing new, MIT has been doing this for awhile...
...I wrote BASICA (aka GW-BASIC) code for a living (and you can wipe that smirk off your face right now, I got paid damned well for it). Because this was an interpreted language and worked within (God help me) 64k of memory, space was at a premium.
:)
There was this 3rd party product that did a bunch of nice things, but one of the options was "compact". It would remove all whitespace outside of literals, jam as many statements on a line as possible, replace long var names with "A0", "A1", "A2" and so on, if desired it would renumber lines from 0 on in increments of 1, etc etc, all in the name of reducing the memory footprint.
It would excrete the most horrific, obfuscated, unreadable, unmaintainable little turd of code you can possibly imagine.
I hope something similar is available for these students
The policy when I was a student was that talking about your code with another student was fine but looking at another student's code was not ok. If you wanted to sketch out a basic solution in pseudo-code that was ok too. You just couldn't implement it together.
Later classes actually required collaboration. I took an OS design class (used the dinosaur book by Tenenbaum) and we worked in teams. There were plenty of times when two or three of the teams would sit down and hash through how to do something without getting into actual implementation details. By this time they want to have filtered out the cheaters, because if you are cheating on projects of the magnitude that this one was, the only way you are going to get caught is if the seg fault is on the exact same line as someone else's.
"You can't fight in here! This is the war room" --Dr. Stra
I attend the University of Maryland, College Park, and they've done this for years. In fact, they back-catalogue assignments for years so that at any time they can compare your code with that of programmers that have long since graduated (if they've assigned the same assignment as a previous year).
When I was at Tech, I took the OO programming course. The first day of the course all the TAs and professors were like... If you think this thing doesn't exist then try us. :) We couldn't talk about how we were going to write an algorithm and were definitely not allowed to look at each other's code. But while the diff can't really be fooled... at least the auto grader (we upload our programs to a webpage and it grades the program in about 5 seconds) can be fooled. :) Muha ha ha!
"Today I am going to give you two examinations, one in trigonometry and one in honesty. I hope you will pass them both, but if you must fail one, let it be trigonometry, for there are many good [people] in this world today who cannot pass an examination in trigonometry, but there are no good [people] in the world who cannot pass an examination in honesty."
- Madison Sarratt (1891-1978), dean, Vanderbilt University.
I was in a Joint Enrollment Economics class where the prof. pulled four students out of the final exam: he took them all into the hall, and essentially dismissed them (they came back in sans exam). It turns out the four of them had (while the rest of the class was busy doing original work) been quietly trading answers in the back of the classroom.
The real kicker is that the professor was kind and gave them a second chance: he let them take the exam again after the school day was done. Well, he leaves the room for five minutes and when he comes back, the four cheaters are at it again! Needless to say, they got Fs.
Personally, I can't imagine someone who thinks that it's okay to use someone else's work in any environment where the goal is to measure the individual.
What do you call a person who copies code instead of writing his own? "Hello, Mr. CEO."
Cuz remember programmers: in the real world you are fired if you consult with a co-worker
I guess the university wants to guarantee that there will be well-educated people that workplace cheaters can consult with.
_______
2B1ASK1
There isn't anyone to copy off of because they all have their own projects to work on.
A lot of people have been saying "this is news?" The news is that 187 people were "caught" with this program, not that the program was written. The story appeared in the Atlanta Journal-Constitution today; I presume it went out on the wire services, too.
Frankly, I'm skeptical. I'm sure this program is more sophisticated than diff, which means more likely to get false positives. It was in two introductory programming courses. I'm not sure that there are 187 different ways of writing substantially different programs for that kind of assignment, especially by people who have only seen the style of the teacher and that of the textbooks.
I love this idea. Rather than outlawing good practices, they're doing real education. And if you really cheat with this policy in place, it's very clear-cut.
http://www.cs.berkeley.edu/~aiken/moss.html
The problem is giving credit where credit's due. There was a post on here a few months ago about Redhat releasing a driver (sorry I don't remember the details but it's on /. somewhere...) that was taken from one of the BSDs, without giving credit to the original developer of the driver (as the BSD license requires). I've had professors in the past that had no problem with you borrowing code as long as you cite where you got it from (just like writing a paper). I think Computer Science is still about getting the job done without reinventing the wheel but you need to make sure you aren't outright stealing someone else's work without either their consent or giving them credit.
Some clarifications from a random cs major at 'Tech.
The first class, cs1311, is a Scheme course which is required of every student (as far as I can tell). The second class, cs1312 [Java], is required by CompSci, CompEng, and several other engineering majors (Industrial, Electrical, etc.). Since all the departments pressure the CS dept. to dumb these classes down, there really isn't a reason to copy code except laziness.
The surprise really isn't that they caught people cheating (it's usually around 5%), but that it was around 1 out of every 8 students that did so.
As a side note, most of the assignments given aren't copy-algorithms-out-of-the-book assignments. Moreover, according to the honor code, you are required to cite any book from which you obtained an algorithm to use.
If only they had used this on their football coach before Notre Dame hired him!
sulli
RTFJ.
Cooperate to Graduate... that's the American Way!
i used to work in a lab as an undergrad computer science student, i was SHOCKED at the number of people that would copy. some would get away with it, others would get caught (depending who the prof was). as soon as they started taking more advanced classes, where individual projects were involved, it was very clear who had cheated their way through the easy classes.
now, a few of these people probably managed to get through the entire program, and i am certain this reflected poorly on the school when they got out in the real world and didn't even know what OOP was, or how to write classes in C.
i fully support schools that implement systems like this, as long as they work. it's better for the school, the students, and the eventual employers of the students.
if you can't do the programming, you shouldn't be in the class.
Yeah, six words is enough to determine that we're cheating.
Or that we're both quoting the same relevant material from our (text)book.
That one's fucked.
Curiosity?!? My ass! He stole shit! -T. Carpenter
I'm a CS student at Drexel University in Philadelphia. The last term I had class (out in the real world now, working in Perl on co-op) was last spring, when our prof told us he was implementing a similar program.
At the time, they were still very inaccurate and were being used only to pick which submissions needed to be carefully examined by the TAs. Not really sure how they run now.
Am I Over-Moderating??
Manchester University has had this for years, and it's fairly advanced, checking overall structure, so people who use different variable names or replace do whiles with while loops will still get caught. It's fairly extensive, and also works with Functional languages. It also highlights how much code looks similar, and highlights the pieces of code in question.
There was a link but I can't find it
There is a really simple way of determining whether or not someone has copied code for an assignment:
Remove all whitespace, and change all variable names to a standard name, and then compare the munged sources. It's a good way of figuring out whether anyone has copied code and then just reformatted it and changed the variable names. Sure, it won't catch folks who copy a routine here or there, but it is a quick and easy start, and if you get more sophisticated it can cut down on cheating considerably.
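A rough sketch of that munging step, assuming C-like submissions. The `munge` function and the small `KEYWORDS` set are made up for illustration, and the regex makes no attempt to skip comments or string literals:

```python
import re

# Hypothetical, deliberately tiny keyword list; a real one would cover the whole language.
KEYWORDS = {"if", "else", "while", "for", "do", "return",
            "int", "char", "float", "double", "void"}

IDENTIFIER = re.compile(r"\b[A-Za-z_]\w*")

def munge(source, keywords=KEYWORDS):
    """Rename every non-keyword identifier to ID, then drop all whitespace."""
    renamed = IDENTIFIER.sub(
        lambda m: m.group(0) if m.group(0) in keywords else "ID", source)
    return re.sub(r"\s+", "", renamed)

original = "int total = 0;\nfor (int i = 0; i < n; i++) { total += i; }"
disguised = "int   sum=0;\nfor(int k=0;k<count;k++){sum+=k;}"
print(munge(original) == munge(disguised))  # True
```

Because every identifier collapses to the same name, two genuinely different programs with the same shape will also match, so a hit here is only a reason to look closer.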
I would think these students would receive a bonus for efficiency in reusing existing code. :)
'He was a dreamer, a thinker, a speculative philosopher... or, as his wife would have it, an idiot.' - Douglas Adams
What they really need is a tool that can determine individual contributions to "group" projects and give credit to the people who did them!
One fool actually asked me to translate one of my Java programs into C for him so he could use it in his class! What a yutz.
To make a "cheating detector" work, they'll need to compare students' work against the vast body of public domain, published, and open source code found in the wild. And that just isn't practical. With the current system, they're only going to catch those who cheat off each other, or those who copy from the same source.
All about me
if cheating is so bad in Software Engineering then why don't the universities add Ethics to the list of prerequisites for the course...
I took a number of classes at a Big Ten univ that used systems to do this, and let me tell you, they don't work. How do I know, you ask? I spent a semester under the gun of the academic honor board (or whatever fancy title they use to pad their resumes - they are other students) with the threat of expulsion looming over my shoulder. My school has very strict policies about cooperation among students; essentially, help of any kind is a violation. I asked a friend (who had already taken the class) to help me out on a problem I was having debugging a small section of code. His contribution amounted to 1 line of code (used for debugging ONLY) that I accidentally left in when I submitted my solution, and a 5% increase (from a 91 to a 96%) on the score (or 1% of my total grade for the semester). To this day I do not feel like this was cheating, and if the prof would like to argue otherwise I would say that I was the least of his problems (the class was rife with major league cheating, as in entire assignments) and at least I knew the material (for the most part, obviously).
I went and talked to the prof, and he asked me if I would like to admit to anything. I knew I had technically violated their rules, and that my friend was now open for punishment, so I made the dumbest mistake of my college career - I lied straight to his face and told him I looked at an old solution (they use the same assignments each year - NO changes). Had the prof actually looked at the code I claimed I copied, the code I turned in, and the code that was presumably flagged, he would have easily seen that I was lying about using the old solution. He was more than happy to accept my explanation (look at how effective it is!) even though it was an IMPOSSIBILITY (I NEVER looked at it, and my code was totally different from their code; I finally saw what they accused me of AFTER they handed out my punishment).
To make a very long, convoluted and just plain stupid story short: these systems don't work, there are only so many ways to skin a cat in most cases, and they rely on other methods(FUD) to "catch" "cheaters"
In the end it didn't really matter to me(aside from the massive amounts of stress) - I had my final grade lowered by 11% but I still ended up with an A, but only because I knew the material
My advice to ANYONE flagged by one of these systems - find out immediately what they are accusing you of, and certainly don't try to make things better by making them worse
fvck yeah i'm anon
... you aren't even hired if you don't fucking know how to code in the first place and got your degree by copying others' work.
Don't say such stupid shit, Rob.
- A.P.
"Remember when the U.S. had a drug problem, and then we declared a War On Drugs, and now you can't buy drugs anymore?"
Ok, who will be the first to propose an anti-anti-cheat program?
You give your classmate's source code to it, and it returns an anti-anti-cheat-program code :o)
We must always be a step ahead!
-=-=-=-=
I know life isn't fair, but why can't it ever be un-fair in MY favor!?
I graduated from GA Tech, and I had to take both intro to programming classes, as all engineering majors were required to. The second class was programming in Java, and yeah, they would use the "cheat detector" to weed out those who shared code... and it had all the work of all students going back the previous couple of semesters, so you couldn't even use any old work by someone else. They would check it on everything you turned in: homeworks, programs, design docs. I remember around 50 students every semester got an "F" and a disciplinary warning. It was a scary thing... one wrong move and you were busted, and there was no arguing with the mighty "Cheat Detector".
Such a project is not that difficult; in fact, we had to develop such a program for my senior software engineering class, with a GUI and all. It was based on Halstead's metrics, which are based on lines of code, variable names, function calls, etc.
The problem with such a method is that in introductory CS classes (or even OO classes) there are very few ways to do things. Furthermore, if the professor gives information related to it, it's highly likely such information will appear verbatim in the solutions.
My university uses such a program, and often it seems to flag people who didn't cheat. We might hear about how many people get "caught" but it never shows how many people actually DIDN'T cheat (ie the rate of false positives). Of the three people who I know well that were flagged, none of them cheated and all were exonerated. Fortunately, I attend a small enough school that a few well placed connections can correct this, but I feel bad for people at larger institutions.
As a TA for a senior level class, I still frequently come across copied homework and lab assignments. Unfortunately, all I can do is give the people 0's on the assignment (no harm if they never did it in the first place). I'm wondering what the punishment at other institutions is.
My Slashdot account is old enough to drink...
diff..not much more...
My school, UQAM..(Université du Québec à Montréal), has been using it for ages....If you can't code, switch to MIS.
The point in having first and second semester students do their own work is to evaluate what they have learned, not what their roommates, or classmates, have learned. Once they have the basics down, we expect them to work in teams and to collaborate, as long as they document where they got any code.
I have a hard time having sympathy for someone who turns in work that is not their own, passing it off as their own, and then wondering why they fail. There are rules to follow; the rules are clear; the students agree to follow the rules in order to attend; the rules should be enforced.
Setting his threshold to 5, Sparky eliminated most of the trolls on /.
As has been said for various other schools, the RPI CS dept has a script like this... after a pre-compilation stage, it doesn't matter if you changed variable or function names, or moved the functions around in the file... at that stage of the compile, if the searches find very similar code, it will flag it, and a real person will then compare the findings. Nice, efficient, and tougher than changing "int foo;" to "int bar;"
"It's tough to be bilingual when you get hit in the head."
Anyways... I have no particular problem with finding cheats. Learning the social skills and dynamics of group software development is one thing, but outright cheating is quite something else. When I think back to my school days, those who were pathological in their need for 'help' on their assignments were the ones typically over their head and/or in the wrong profession. Weed them out!
CrazyLegs
"Pork!!" said the Fish, and we all laughed.
Virginia Tech has been doing this for years. The worst part is that anyone caught with this system gets an automatic Honor Code violation.
I mean, what would you name a variable that is the bottom axis of a plot? If you said "x", congratulations, you have an honor code violation. Something like 60% of the students in my programming class got a violation in one form or another.
Of course, some people are so stupid that it's really not necessary to use a script on them. One of my professors told a story of a student who copied someone else's C++ code and changed all of the instances of this to that. Obviously, the new code didn't work too well.
n/t
I thought the best way to discover cheaters was to ask them about 'their' code. If they waffle about how they implemented it, they usually did not do the work.
You might use a tool to help find potential cheating, but using a tool to prove it is very wrong and unfair.
make Linux, not Microsoft. sin(beast) = -0.809016994374947424102293417182819
At my CS department (University of Washington) we had a similar program running. As it turned out, about 10% of the class was cheating. Normally, the faculty can't do anything about cheaters because it takes up so much time cutting through the red tape (that, and the student union is really strong). But for some reason my class had so many cheaters that they decided to actually go after these guys.
The class had about 200 people or so... 20 or so cheaters. That's practically a whole section!
-me
Taco, anytime you want to come work for my school, please feel free. When you're responsible for guaranteeing that the graduating students can think for themselves, making sure that they (and you) understand the difference between collaborating and cheating isn't something to make a snide joke about.
I don't remember any article generating more me too posts. Everybody -1 Redundant.
Knowing slashdot, they'd prolly implement it by comparing how well they compress.
You know they're dying to use this revolutionary compression filter technology in other places!
C-X C-S
I went to Purdue-Cal where we had a guy who was cheating his way through the assignments by getting listings from the TA he was sleeping with. He would show up the day before the project was due, sit at a terminal typing for a couple hours, compile, get up, stretch, walk over to the printer, collect his listing and leave.
When he graduated, he landed a job at a local company where he bluffed his way through the interview and then was more or less useless at his coding job. Guy was an ace at fooling the boss, though - convinced the slob everything was someone else's fault. Conned his way into a project leader job where he didn't need to code and pretty much 'delegated' any real work to his subordinates. Last I heard, he was on the fast-track to a director slot.
Something like this would have kept this guy from ever passing his first ASM class.
Damn it. So that's why they had us write a cheating detector in my freshman computer science class. I should have cheated back then...
--
http://www.aikiweb.com - AikiWeb Aikido Information
It's not about the software, which is implied in the /. version. It's about 186 people caught cheating. 186 students who will probably be going to community college next semester.
The worst thing is that they did it in two of the easiest classes in a computer science program; I got an A while skipping most classes when I went to school.
madness takes its toll please have exact change
background -
I attend an English university of no distinction, and I'm completing a master's degree in CS. It's not a first CS degree, but a conversion for those who did a different subject for their undergrad.
coding -
We've done some x86 assembler, C and Java, which I think is reasonable mix. Some people on the course seem to hate programming. Personally, I wish there was more, but that's by-the-by.
question -
Of the above, the Java project was collaborative. Makes sense, given the increased orthogonality of code units versus C or assembler. I wrote most of the code, someone designed the (optional) database, someone else did most of the worrying, and between them they produced vast amounts of documentation. I know for a fact that our code was not read over, despite the fact that implementation was supposed to be fifty per cent of the marks. We could not have been, as far as I can see, marked fairly. I like my code - it's the thing I do best. I'm not particularly interested in documentation &c. Does anyone else think that code is overlooked in marking, to the advantage of less core CS skills? How common is it for markers to skip looking at code? Has anyone often had their code looked over in great detail by their lecturers?
I wonder how many people have tried compiling their code, then decompiling it and handing in the tidied up decompiled code? Don't know much about decompilers, but I'd imagine they make the code look pretty terse, if the initial compile was any good.
> Prior to that year, VT had an average of 75 cheating violations for the WHOLE university (25000+ students). For that one class, on one assignment, 150 students were found cheating by the cheating detector... out of the 500 or so students in the class.
IIRC, there was a year when 20% of all students busted for cheating at my alma mater were in one of the sections of Intro to CS. I wonder whether cheating is especially endemic there, or whether it's merely the getting caught that's endemic.
After all, CS seems to hold the promise of a gilded career these days, but the subject matter is so difficult that Otto B. Abusinessmajor can't hack it. Lots of motivation and opportunity to cheat, it seems.
Sheesh, evil *and* a jerk. -- Jade
...which I suspect is what they're doing.
It's easy (almost trivial) to instrument a parser to build (say) a Python data structure representing a tree (I wrote some code to do this a while ago; see the CVS repository of http://sf.net/projects/dtct, and please try not to laugh). Applying a diff algorithm to a parse tree (or putting it back out into text and running diff on it, or comparing varieties of text, and so forth) is really quite simple, and not so prone to many workarounds as the system you mention (where using inlines, macros, pragmas modifying compilation &c could have significant effect).
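One way to picture the parse-tree idea, as a sketch only: this uses Python's standard `ast` and `difflib` modules on Python submissions (an assumption; it is not the dtct code mentioned above). The helper names `node_types` and `tree_similarity` are invented for the example.

```python
import ast
import difflib

def node_types(node):
    """Pre-order list of parse-tree node type names; identifiers and literals are ignored."""
    names = [type(node).__name__]
    for child in ast.iter_child_nodes(node):
        names.extend(node_types(child))
    return names

def tree_similarity(src_a, src_b):
    """Ratio in [0, 1] from diffing the two flattened parse trees."""
    a = node_types(ast.parse(src_a))
    b = node_types(ast.parse(src_b))
    return difflib.SequenceMatcher(None, a, b, autojunk=False).ratio()

first = "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"
renamed = "def add_up(values):\n    acc = 0\n    for v in values:\n        acc += v\n    return acc\n"
print(tree_similarity(first, renamed))  # 1.0
```

Renaming variables and reformatting leave the ratio at 1.0 because only node types are compared; how low a ratio still counts as suspicious would have to be tuned per assignment.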
This is diff plus a VB frontend, right? And those guys with PhDs in computer science education put a million bucks and 2 years' work into it.
It's amazing what good PR can do!!
------I can please only one person per day. Today is not your day. Tomorrow isn't looking good either.------
http://slashdot.org/article.pl?sid=01/05/09/198259&mode=thread
"Perhaps most amazingly, votaries of 'diversity' insist on absolute conformity." -- Tony Snow
As a former Head TA for one of the classes in question (CS 1502 - Intro to Computing), I'll try to elaborate and answer common questions.
No, I have no current affiliation with Georgia Tech.
Yes, the cheatfinder really, really, honest-to-God exists. We used it every quarter that I was associated with the class and caught _lots_ of people. You'd be stunned how many people thought we were just making it up to scare them into not cheating.
Yes, it actually works. It examines mostly source code, although some versions of it were twiddled to look at "in-between" assembler to help catch those who just change variable names and such. It scans for patterns in the logical constructs of code blocks, even if they've been rearranged or altered in other "cosmetic" ways. It also looks for exact matches in text (like the "commas in same places" mentioned by Kurt in the article), but this is misleading -- it does a whole lot more than that.
Yes, depending on how you run it, it can generate a boatload of false positives, but it contains several tweakable threshold levels that let you control how "suspicious" a pair-match has to be before it gets flagged, and these thresholds are made looser for simple programs where there's really only one way to do it.
No, no action is *ever* taken based on the output of the cheatfinder directly. It merely alerts the TA who's responsible for cheatfinder that quarter and he/she then manually reads the source code to see if it looks like a case of cheating. If so, it gets sent on to the professor for a final verification (and possible discussion with the student if it is a borderline case), before being forwarded to Kurt for examination and possible disciplinary action.
Finally, yes, it's an old and very "evolved" codebase. You wouldn't want to be the one to maintain it, but on the other hand, it has been tweaked to the point where you'd be really surprised at the sort of clever cheating it can detect. (i.e. it works a lot better than diffing the source code ;)
Anyway, figured I should throw in my $0.02 on this one, since I used to run that class.
If anybody has any specific questions, please post to this comment and I'll reply. (Questions from current Tech students asking how to "get around" the cheatfinder will be happily ignored, of course ;)
Michigan State, at least, has been using such a program in their computer science program for more than 5 years.
It was always interesting, when multiple students used an example of code out of the book for part of their projects, and then were accused of cheating as a result....
As a former TA for one of these classes who nearly ended up working on the cheat finder software for a quarter, let me add some additional fuel for the fire.
1. These are not just "programmers" in the traditional Computer Science major sense. The first class is required for almost all students at Georgia Tech. It started off just for Computer Science and Computer Engineering, then expanded to all engineering majors (civil, mechanical, etc). Now, even management majors (Georgia Tech's version of Communications, Basketweaving, or whatever the weak major that many athletes did at your school) have to take the class. The language used to be a locally developed pseudocode language (affectionately known as Russcal). Right or wrong, many of these students consider the class to be an unnecessary hurdle on their way to a degree, and to a technologically illiterate management major, programming does not come easy, nor are they inclined to learn their ethical obligations as a "programmer" - they just want out of the class.
2. Contrary to many snide remarks, the algorithm is, in fact, quite sophisticated. It is not fooled by extra white space, variable name changes, or simple rearranging. As a TA, I saw even simple algorithms done a slightly different way by every single student. Chances are that a student who will resort to cheating doesn't know enough to rearrange the code beyond the recognition of the cheat-finder and still have it be correct, and a student who does know enough would probably spend as much time dressing it up as it would take them to write the thing in the first place.
3. Once two submissions are flagged as possible copies, they are first reviewed by a student TA. If the TA believes that they are in fact copied, it is escalated to the class manager (GT staff), and then to the dean if need be.
It's not a perfect system, but the cheat-finder does a good job of crunching the role of a human down to a minimum, and leaves room for people to make a subjective judgement. It's pretty good, so cut the sarcasm back a bit - it's unwarranted.
Seen any BadMarketing lately?
The University of Colorado, Boulder, had/has a similar grading/"cheating" program that has been in use for several years. Ask any CS student about DORA and their Intro. to Programming or Data Structures class. The usual reply is, "...that bitch." :-)
We're way ahead of you, Georgia Tech; the University of Toronto's Computer Science department (www.cs.toronto.edu) has been using software to detect plagiarism in code for quite a while now.
Firstly, it doesn't just look for exact matches of code. That would be stupid. As far as I know, it uses certain algorithms that look for similarities in, for example, the structure of various classes used in a program. Presumably, they would also look for blatant similarities in execution patterns.
As for CmdrTaco's ignorant comment on his deluded perception of the "real world" - obviously he doesn't understand the purpose of a university education. As a student, I'm glad that universities take such measures to prevent people from copying code. Especially when assignments don't involve implementing code for the sake of learning a language, but rather involve creating algorithms from scratch, for the purpose of understanding more fundamental concepts in computer science.
and programing is for fags anyway.
Cheating and getting away with it is all the learning I need. Hell, that is the real world.
A similar system exists at the University of Colorado in Boulder. The system does not do anything to those who just consult with each other; it does, however, catch anyone who tries to just turn in someone else's work. It checks for programs that are 96% like other programs that have been submitted to the system. I think it is actually a good method of catching blatant cheaters. I mean, come on, if you're that bad at cheating, you can't claim you know anything about the subject or that you would survive in the real world.
While the big-O notation is O(n^2), a comparison of this type is really n(n-1)/2, so you've *only* got 61075 comparisons for your 350 students...
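For the record, counting each unordered pair of submissions exactly once gives n(n-1)/2, which sits just under n^2/2. A quick check, with the `pair_count` helper invented for the example:

```python
from itertools import combinations

def pair_count(n):
    """Number of unordered pairs among n submissions: n(n-1)/2."""
    return n * (n - 1) // 2

students = 350
# Brute-force check: enumerate every unordered pair of submissions once.
enumerated = sum(1 for _ in combinations(range(students), 2))
print(pair_count(students), enumerated, students ** 2 // 2)  # 61075 61075 61250
```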
In the Pascal course, the students had to generate a maze and solve it, finding the path recursively. They were supposed to use a random-number generator from the textbook, with three 5-digit constants. One student made a typo in one of the constants. Then I saw it again. A quick grep revealed a third program with the same goof... sorry, wrong number.
The other case occurred in a course where all examples and assignments were in ANSI C. Imagine my amazement when a homework file came in using K&R syntax. Just how red does a flag have to be? :-)
-Joe
Lose = not win
Pardon me for posting anonymously, but I've got to let a little venom loose.
I was a TA at a prestigious, well-known computer science program. The professors there always unfurled elaborate anti-cheating policies. Cheating of any kind whatsoever would be brought before the Dean, where you would be subject to a wide range of punishments, including possible expulsion from the University. They purported to be using a script very similar to the one being described here. Yet many of my classmates cut and pasted their way through all the entry-level classes while I labored away at every assignment. How did they get away with it?
That one issue -- that people who did no original work whatsoever got scores at least as high as mine -- has been dismissed as "a fact of life" by friends and family, and I tend to not think about it too much. Why? Because I'm the one getting the education, not them, and in 5 years when college GPAs don't matter a fraction as much as intelligence, experience, and work ability, I'll get sweet (or is it l33t?) revenge.
But in the meantime, I thought, let's become a TA and be on the other side of the fence for a change. Let me do my part to bring all these cheaters to justice.
And you know what? The reason they all got away with it was not because the previous TAs slacked off, but because the professors, when push came to shove, just didn't care. They lied about using the script.
When I brought identical assignments to their attention, they didn't pounce, but gave me options such as taking off some points or letting it go.
As it turns out, we have a very forgiving Dean, and any cheaters brought to his attention will get no more than a slap on the wrist. For that, professors get to do a lot of paperwork, cast themselves as the bad cop in making the case, and get a poor reputation with students who are used to the status quo of a cheat-friendly environment. They don't want to do any of that, so they put on the pretense of being tough on cheating and hope it all goes away.
Slashdot is mostly a young crowd, and the young are naive like I was, so let me break some bubbles: maybe at GA Tech profs let ethics take precedence over apathy, but not everywhere.
And to all you cheaters out there: yes, you're off the hook for now, but wait until we're co-workers.
Can someone please mod this out of the anonymous doldrums? Thanks.
...to check the resumes of candidates for their head football coaching position.
Considering the last guy who coached there was George O'Leary...
I Heart Sorting Networks
Almost 3/4 of all majors are required to take this and many people don't see programming as a skill they need to learn. The program is a little more sophisticated than detecting 'exact' matches. It supposedly is able to detect similar code with different comments or variable names, etc.
Having become familiar with the process by which the cheat-detect system works at Georgia Tech (and not by ending up on its wrong side, either), I have become aware that it is much more complex than a simple "diff filea.java fileb.java" operation. It is much less likely to be either spoofed or given a false positive than many of the people posting to this thread have suggested.
It should also be noted, for those people whining about the innocent suffering at the hands of the automated, zero-tolerance faculty, that having your project and someone else's tagged as possible cheats simply means that both potential offenders are referred to the dean of the College of Computing, or to the instructor of the course. It isn't as if the algorithm is the first, last, and only say in whether or not you are tagged/punished for cheating. That would be stupid.
One thing I've learned as I've become more and more versed in technology is that it should never affect major events, such as one's potential expulsion from school, without a very close degree of human oversight on a per-case basis.
I remember in my Unix class in college, we had to write something in AWK to read in a text file, average some grades and print out a report on the screen. At the beginning of the AWK lesson, the instructor said that if you have any problems with AWK, then use the O'Reilly "Sed and AWK" book. Well, that exact program was used in the book. Everyone figured it out and typed it in and changed variable names. Everyone got an A. If he said anything, we all agreed that we'd mention to him that he was the one who told us to use that book as a reference.
I used it as a TA for a lower division class at Cal and it really was quite impressive--color coding similar sections, etc. A human review is necessary of course, but there was never really any doubt that the top few catches were cheating.
See MOSS and the 1998 Wired article about it.
-- Chris
(free; for instructors only)
Measure Of Software Similarity
While originally written as a "cheat" detector, it also does a good job of identifying similar structures in systems that are good candidates for refactoring. It helps find things that should be functions rather than cut-and-paste-with-trivial-or-no-change.
-dB
"It if was easy to do, we'd find someone cheaper than you to do it."
In the real world you also won't get fired for being a complete fucking idiot. What's your point?
Well, I suppose that depends on where you work. And like whether or not they suck. The fact of the matter is, there are a lot of people out there who do cut a lot of corners in their code, and still manage to make money. Those people suck to work with, of course, because their code is always shit and they don't really know what they're doing.
autopr0n is like, down and stuff.
I teach comp sci at a community college. I used to get all bent out of shape about plagiarism but it seemed to be a losing battle. Now all the students in my classes have to demonstrate their lab solutions to me. I ask them to explain the flow of program, point of the tricky bits, etc. I ask them questions like "if you were to change this subroutine/method to add this extra parameter, show me the places in your code where you'd have to incorporate the changes". I am concerned that the students have learned and understood the material, not how they arrive at the solution. The students are even allowed to demonstrate other people's code if they wish; we discover very quickly how much they know. The demonstration tends to result in much better feedback and interaction between the student and the instructor. I'm fortunate enough to be in an environment where we can give this sort of personal attention.
...good programmers create; great programmers steal.
KangarooBox - We make IT simple!
No one accuses the teacher of cheating when they copy the assignment straight from the textbook. Yet if the student does exactly the same thing with the answer, then suddenly it's bad.
If I assigned the classic "Hello world" program, I wouldn't expect or want something like:
main(){char *p=&13["\0\n!dlrow olleH"];while(*p) {putchar(*p--);}}
I'd expect the student to copy the program exactly. If I was worried about students not learning the material, I'd just assign a problem that hasn't been solved - preferably something actually useful, like "change this open source program to accept '--word' style options as well as the current set", or "change this to work with IPv6".
The question shouldn't be "did you copy?", the question should be "did you learn?".
So, when I was the head grader for the first hard cs course at CMU, I wrote a similar program -- it doesn't need to be very complicated because cheaters are, by definition, lazy. Change some variable names, comments, whitespace, move some code around and maybe break a function into two smaller ones, and that's it. My code just counted the numbers of braces, 'if' statements, parentheses, equal signs, etc., producing a set of numbers for each program. Then, sort them and pick out the ones that are really close to each other. Those get picked out for hand checks.
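A minimal sketch of that counting approach, in Python. The feature list, the Manhattan distance and the threshold are my own assumptions for illustration (the post only names braces, 'if' statements, parentheses and equal signs, and doesn't say how "really close" was measured), so treat this as one possible reading, not the original grader's code.

```python
import re
from itertools import combinations

# Hypothetical feature set: the post mentions braces, 'if' statements,
# parentheses and equal signs; the exact list used is not given.
FEATURES = {
    "open_brace": r"\{",
    "close_brace": r"\}",
    "open_paren": r"\(",
    "if_stmt": r"\bif\b",
    "equals": r"=",
}


def fingerprint(source):
    """Count each feature. Renaming variables or changing whitespace
    leaves these counts untouched."""
    return tuple(len(re.findall(pattern, source)) for pattern in FEATURES.values())


def distance(a, b):
    """Manhattan distance between two fingerprints (an assumed metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def suspicious_pairs(submissions, threshold=2):
    """Return (distance, name_a, name_b) for pairs whose counts are close,
    closest first. These are candidates for a hand check, nothing more."""
    prints = {name: fingerprint(src) for name, src in submissions.items()}
    pairs = [
        (distance(prints[a], prints[b]), a, b)
        for a, b in combinations(sorted(prints), 2)
    ]
    return sorted(p for p in pairs if p[0] <= threshold)


if __name__ == "__main__":
    submissions = {
        "alice": "int main() { int x = 0; if (x == 0) { x = 1; } return x; }",
        "bob": "int main() { int counter = 0; if (counter == 0) { counter = 1; } return counter; }",
        "carol": "int main() { return 0; }",
    }
    for dist, a, b in suspicious_pairs(submissions):
        print(dist, a, b)
```

Small programs will collide on counts like these all the time, which is why the flagged pairs only go to a human for review.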
As for why: everybody was supposed to do their own work -- this was not one of the courses where people were supposed to collaborate on their programming assignments (those courses came later.) Some students went overboard on the restrictions -- there was never anything wrong with discussing the assignments (that's part of the learning process as well), but everybody needed to do their own work to prove that they understood the material.
Now that I'm out in the "real world," this makes sense -- I can tell the people who cheated and slacked their way through school, because they don't last long without understanding what they're doing.
Consider that one of the biggest problems with cheating is that it is often difficult for the professor or marker to determine which work was the original and which was the duplicate. The system below could theoretically completely solve that particular problem.
Using a system similar to carbon-dating, you can, in theory, determine the time at which a particular thing was written. Of course, this would require that the students be given pens with radioactive ink immediately prior to the exam, but in the cases where identical work is found in submissions from people that sit closer together than about 2 or 3 feet, this system could verify who copied off of whom. "Don't mind these glow-in-the-dark pens, class... we use them to detect cheaters."
Of course, nobody would ever actually do this... but it makes for an interesting thought experiment
File under 'M' for 'Manic ranting'
You may not be fired for consulting a coworker, but if you take a coworker's work and then claim you did it yourself, you'd certainly better cover your bases.
C//
When I was in college, one semester the professor teaching the course on compilers failed every single student in the class, except for a couple of them. The reason? They all "cheated". Everyone had been broken up into small teams to work together for the entire course, and in the end each student was supposed to turn in his/her brand new compiler. But because the members of each team had worked very closely together, the team members' work was quite similar (though not exactly the same).
I guess the prof got upset and failed everyone save for a small few who I think hadn't worked with any team. The students protested, having worked very hard and produced some good work, but the school sided with the professor. So an entire class went down for the count.
Now we're making software that can fail you automatically! Great idea!
> Finally consider this: Checking for cheaters in a class isn't just doing a diff of two files. For every student in the class, you have to check his code against everyone else's. This is an O(n^2) problem. My class had around 350 people in it, so that's 122500 checks to do. If it is anything more complex than a diff (multiple files, compiler front-end, fancy perl parsing) this can take a mad amount of computing.
Not so hard, because you just generate a parse tree for everyone, which takes a bit less time than compiling the programs. Then you use Lisp to try pattern matching on the parse trees, and terminate the comparison as soon as you reach a point that won't unify. (Probably no point in checking the rest of the program on that pair.)
Though an O(n^2) problem, the actual number of pairs to check is only about half of 350^2, call it 60K tests. No sweat for a fully automated system on a 1GHz PC, unless the programs are really big.
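To put numbers on the "about half of 350^2" estimate, here is a quick Python check. The function name `pair_counts` is made up for this example; the only input taken from the thread is the class size of 350.

```python
from itertools import combinations
from math import comb


def pair_counts(n):
    """Return (n squared, number of unordered pairs of distinct students)."""
    return n * n, comb(n, 2)


if __name__ == "__main__":
    squared, unordered = pair_counts(350)
    print(squared)    # 122500, the figure quoted above
    print(unordered)  # 61075, i.e. 350 * 349 / 2

    # Sanity check on a small class: enumerating the pairs gives the same count.
    assert len(list(combinations(range(5), 2))) == comb(5, 2)
```

Comparing A against B covers B against A, and nobody is compared with themselves, so the real workload is n(n-1)/2 comparisons. It is still quadratic, just with a smaller constant.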
Sheesh, evil *and* a jerk. -- Jade
No, in the real world, they try not to employ you in the first place if you don't have the basic skills to do your job.
Take a look at any Usenet group for a particular language, alt.comp.lang.learn.c-c++ for example. There are plenty of volunteers in these groups who are happy to help people out with problems. Some, like the group above, are even dedicated to helping newbies, often with their homework assignments. However, most of the professionals there will refuse to post any help at all until it's clear that a respectable effort has been made.
Every now and then, some smart-ass objects and tells them they're something unpleasant, and the reply is always the same. It goes something like this: "I do this for a living. If I help you cheat, then one day you might get a qualification you don't deserve. And then you might wind up working for me."
There's a world of difference between helping someone out who genuinely doesn't know or understand something -- a common and sensible practice in industry -- and doing everything for someone because they're too lazy to do it themselves.
If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
A program such as this has been in use since prior to 1998 here at Virginia Tech
but in these places of higher learning they teach trivial stuff like "write a for loop that counts 1 to 100"...
... and since when do employers make a bunch of codemonkeys do the same thing?
no good debugging skillz.... no design skillz...
no clue how to work on a large software project with a team....
in any real software engineering workplace you better learn real fast how to reuse what's out there so you can get the job done FAST
being slow and stupid gets you fired... not making the best of what's out there...
whew... I sure did get emotional there....
If a person cheats their way through college, ideally, the only people they've hurt are themselves. If the people who work with these supposedly inept programmers were to actually report it no one would be grumbling. I have no sympathy for those people who complain and do nothing about it.
The only way the system will change is when people start truly believing in the part that education plays in becoming a good human being. Bravo to CmdrTaco for pointing out the stupidity of the system.
I was a teaching assistant for a CS1 class. When I found two students handed in virtually identical pieces of code, I'd just grade one then write on the other one "See [other student]'s paper". If they want to cheat, let them. It'll only hurt them in more advanced courses.
What I didn't like was when they fabricated results (no, "^" is not the exponentiation operator in C...). For this I gave them zeros.
[..] that is, a program which compares students' coding assignments to each other and detects exact matches. [...]
;))
Let me guess... Is this what they used ?
This isn't new. Several of my professors at Michigan Tech ran Perl scripts which did this exact thing...in 1995! I never actually saw them, but they did indeed catch anyone stupid enough to cut and paste code. I always made sure to obfuscate all my stolen..err.."borrowed" code.
-These aren't my pants.
in general... they are the extreme... there is a lot of general cheating going on and I think something like this is a good idea... to catch "the smarter cheaters"
-- Note: These Comments are Generated by ME! Not You! ME!
Diff already does this
If this is your solution to the problem of how to create a "Cheat Finder" then you are probably one of the idiots that was caught cheating.
I'm no genius but I know that text comparisons are extremely inefficient in a DB and will only catch the most blatant cheaters anyway. A comparison of the parse trees generated that ignores identifiers would be the barest minimum to catch cheaters in my book.
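A rough sketch of that idea, using Python's standard `ast` module on Python source as a stand-in (the coursework in question was in other languages, so this only illustrates the technique). The class and function names here are hypothetical. Every identifier is replaced with a placeholder before the trees are compared, so renaming variables or functions makes no difference.

```python
import ast


class _StripIdentifiers(ast.NodeTransformer):
    """Replace every variable, argument and function name with '_'."""

    def visit_Name(self, node):
        return ast.copy_location(ast.Name(id="_", ctx=node.ctx), node)

    def visit_arg(self, node):
        node.arg = "_"
        self.generic_visit(node)
        return node

    def visit_FunctionDef(self, node):
        node.name = "_"
        self.generic_visit(node)
        return node


def shape(source):
    """Dump the parse tree with identifiers removed. Comments and layout
    never reach the tree, so they are ignored automatically."""
    tree = _StripIdentifiers().visit(ast.parse(source))
    return ast.dump(tree, annotate_fields=False)


def same_structure(a, b):
    """True when two programs differ only in naming, comments or layout."""
    return shape(a) == shape(b)


if __name__ == "__main__":
    original = "def area(radius):\n    total = radius * radius\n    return total\n"
    renamed = "def f(half_circle_width):\n    x = half_circle_width * half_circle_width  # changed\n    return x\n"
    print(same_structure(original, renamed))
```

An exact match on the stripped tree is the crudest possible test. Reordered functions or a loop rewritten another way would slip past it, so anything it flags (or misses) still needs a person to look.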
You're probably a chickenshit "web developer" aren't you?
I'm in my final year at university and my project is a coursework submission program which will hopefully have plagiarism detection built in...
I wonder if anyone will notice if I just borrow some of the code from the Georgia one...
I would hate to see a new batch of B.Sc's coming out who graduated from copying off one another (cure for cancer would take another 100 years). I guess in computer science individual creativity isn't too important... By the way the only people I hear making the argument that a university education is analogous to the open source movement are hacks that can't code anything for themselves.
Why is this news? NC State University has been using a similar program for at least 5 years now.
I think this is a GOOD thing. Plagiarism at the undergrad level runs rampant. If you're not smart enough to do the work - perhaps you should consider an educational path less taxing on the mind.
Subject: Academic Integrity
Importance: High
Welcome back! I hope that you had a wonderful holiday season. I am writing this note to you to give you a "heads up" regarding a new process for which Villanova has contracted to help us enforce our academic integrity policy.
At Villanova, class papers can now go to "Turnitin.com," which is a search engine that compares papers with others from Villanova and with thousands of websites to determine whether the material is the same. Once the search is complete, faculty receive a detailed report of what materials have been copied and from where.
I am telling you this to help you avoid academic integrity violations. Please be VERY careful and provide complete citations for your work; if your professor has indicated that you are to do your work individually, then do your OWN work; and so on. If you have ANY questions AT ALL, please seek clarification from your professor PRIOR to submitting your work to him/her!
I sincerely wish you a very successful semester!
Best regards,
Dr. Victoria McWilliams
Associate Dean C&F
Welcome to the Life! Boy it sucks!
I'm trying to teach myself to set people on fire with my mind... Is it hot in here?
It really isn't a valid criticism of preventing cheating in a compsci class to refer to the real work world where you DO consult and work together. The point of the university class is to help teach you a computer language - to make sure an individual gets it, understands it, and can use it. Once you get that, THEN you can enter the work world and collaborate instead of simply leeching off the expertise of another.
If you take a chemistry class, it is generally expected that you the individual student will understand the reactions and the processes/tools rather than simply copy or leech off another student who DOES know what they're doing. You gain nothing and provide nothing if you can't do the basics on your own FIRST.
In Bushworld, they struggle to keep church and state separate in Iraq as they increasingly merge the two in America.
So they run this program, and find that 10 people all have the same code. Who copied whom? It's just one person's word against another.
Tim ODonnell (trying to be the most
The problem isn't the cheatfinder.
Using a program to narrow down the field of what a real human needs to scrutinize is fine. The problem is when real humans refuse to use reason, based on the apparent belief that a computer said it so it must be true. In my experience, this has been the situation at Georgia Tech, and it's a sad situation indeed.
is competition good, or is duplication of effort bad?
mod me down if you like, but I've seen it happen in college. No rich kid I ever met worked 40hr/wk and took extra classes
A 'cheat catcher' was in use for intro programming courses at the University of Illinois when I went to grad school there in the early 70s. I suspect 90% of major universities now use some form of automatic similarity scanner.
Not only is this a strangely old topic on the face of it, it's even older than they claim. Michael Wise at Sydney University (now a research fellow at Cambridge) had a system called "yap" (Yet Another Plague) which did fairly advanced plagiarism detection on electronically submitted assignments that was already old hat when I was a 1st year undergraduate, back in 1990.
The first paper on yap (from 1992) cites related work from 1981 (in SIGCSE) and a series of articles on the system "plague" as far back as 1986.
All of this material (as well as the successor system, YAP3) is easily available on the Web. It would be nice to see people do 10 minutes of research before spewing out Yahoo news stories and Slashdot posts (note: "research" is a slightly different skill to "cutting and pasting press releases").
- Copy the original code.
- Change every variable name (even if to a less sensible name - HalfCircleWidth instead of Radius).
- Rephrase most comments, but in the most transparent manner (e.g. "increment the counter" becomes "the counter is incremented").
- Grab one or two lines of code near the top and rewrite them in the most awkward manner possible. Presumably, this is to prove to themselves that they're more clever than the teacher and that they could've actually done the assignment if they'd bothered.
Inevitably, it was the trivial stuff (indentation, comment structure) that set off my alarms. Then, I'd give them a moment of truth and sit them down to try to explain how "their" code works. If they couldn't, I'd kick their tails out. If I was teaching a seminar at someone's workplace, I might or might not inform their management. Since all these penalties were spelled out in my syllabus, I never lost any sleep (in fact, putting them in my syllabus tends to ensure no one tries it).
As to the difference between "consulting" with another and "cheating", I've found that "explain your own code" is a pretty good yardstick. If I spend 2-3 hours preparing to teach a lecture, I have no sympathy with someone who doesn't spend enough time to do the assigned work but instead cheats.
"Prepare for the worst - hope for the best."
just wanted to give my past experience with this kind of thing. at my (undergraduate) college, one of the cs profs had implemented a cheating detector of his own. in my senior year i was in his class and took an extended break one weekend. i had my code done early as i knew i wouldn't be able to work on it at all that weekend, and some people in the class knew i wouldn't be around. a guy who was also in the class came over to my dorm, was let into my room by my roommate, got on my comp, and took my code. the roommate's excuse was "well he came over before to study with you so i figured it was ok to let him on your comp". dumbass.
the prof called us in separately a couple weeks after the assignment was due and i honestly had no idea what was going on. despite my explanations of what happened etc, he decided that it wasn't his job to decide if i was telling the truth, what should be done, etc and so he turned us BOTH over to the honor council. we were tried separately and with my roommate's testimony i was found innocent, and never again gave my l/p to the guy so he could play games on my box when i wasn't around. the other guy got off too, but that was because he was a 2nd semester senior with 2 weeks left and they just decided to get him out of there.
there were similar examples to this (where innocent parties are in trouble unfairly) due to people stealing printouts of people's code in a shared lab, taking printouts from the garbage, stealing floppy discs w/code, stealing code from /tmp, stealing files located in shared storage space with bad modes set (644), etc. all of this happened while i was in college to various friends etc but most of it occurred in the low-level cs classes where the non-cs/non-engineer types were struggling to get "hello world" type programs to work.
when a similar cheating detector was used in the cs101 intro to c class, something like 20% of the class got in trouble. it was a real mess for the honor council. groups of people would steal code from smarter people and then share it around. amazing...
wayne
E V E R Y T H I N G I W R I T E I S F A L S E
This will probably get modded down as off-topic, but this is as good as almost any /. story to complain about redundant links.
So you've found a nice tidbit on the web and decided to drop slashdot a link. That's cool. We love you for that. So you found it on news.yahoo.com. Why provide another confusing link just to the homepage? I know where it is. If I didn't know where it was, I could easily deduce it from the article's url. Moreover, it distracts my attention. It forces me to actually read, parse and interpret the submission, something I try to avoid as much as possible.
Your submission is by no means an exception. It seems to be some sort of editorial policy. Slashdot submissions abound with redundant links to the homepages of "google.com", "cnn.com"... hell it wouldn't surprise me to see "slashdot.org" linked. I think we can assume that people know how to find their way to these sites.
Why am I so upset? Because I hate skimming slashdot in the morning and hitting the wrong links all the time. It costs me time. It spoils my mood. Which makes me yell at my cat.
Links are meant to stand out. Keep them that way by using them sparingly. Stick to the meat.
Being well balanced is overrated. -- John Carmack
My first programming course 20 odd years ago led to a pretty funny cheating story. For a final exam we were given a fairly simple coding assignment to do on the last day of class. We were instructed to write our code down on a sheet of paper, put our name on it, xerox a copy of it and hand it in right then and there. And then we had until the last day of Finals week to get the code to compile and work, and we would be graded on the delta between the first draft and the working code.
The funny thing was that the course was taught in two sessions, a 10:00AM and 11:00AM session, and the two sessions were given completely different coding assignments.
And yet there were a number of people that somehow managed to hand in working code for the other session's assignment instead of their own :).
Feh.
How long will it be before somebody develops a program which takes your code and changes variables and structure enough to get by these cheat screens? It's a lot easier to do with code than with your English paper.
You won't be fired for working with your co-workers. But, this is about ethical behavior.
Consider that you are a consultant and work for 2 different companies. If you take code from one company and use it for the other without permission, then you are acting unethically and will probably be fired if you are discovered.
If you can't even be ethical about writing "hello world", then how can you be trusted with someone's company?!
or htmltidy?
geez, what do we follow the best practices guides for??
see perldoc perlstyle for what i mean
or just grab perltidy and confuse the profs!!
back in the day we didnt have no old school
Instructors shouldn't waste time trying to catch cheating out of class; if people are stupid enough to short *themselves* by cheating, let them.
Catch cheating the same way other courses do: with solo in-class exams that can't be faked.
Ellen
mods metamodded as "Unfair"
I think that, for the classes that it's being used for, it is a reasonable check. I can't even remember how many people I've known who managed to cheat their way through lower level programming courses and then found out in Compiler Construction or Operating Systems that they didn't know how to translate their logic into working code.
The concept of "sharing ideas" is definitely a valid concept but only when the basic ability of writing code has been learned.
I took the course last semester, and well, the rules about cheating are insane. If you so much as look at another student's code in order to help him, or her, you both are technically cheating and if found out will both flunk the course. Also, a lot of people who take the course have no business taking it! Why? Well it's not every student at the college of computing, it's EVERY student who has to take that course. Some of these kids can't move a mouse in a straight line, much less write computer programs! And it IS a programming course, from day 1, you are taught scheme syntax and usage and some stupid sorting algorithms. Half the time, there aren't very many different approaches to the problems they give you (they are VERY structured in how you are to complete them.) Also...at the beginning of the semester there were 5 of these classes being taught, 300 to a class. Of those who remained after drop day, 186 people were ACCUSED of cheating; it is very surprising to me that it was so few! This so-called scandal is not because kids wanted to circumvent the system, it's because the CoC's administration has some overbearing rules.
On a sidenote...this is actually not new...it happens there EVERY semester, it's just the first time it was announced en masse to the press.
Face it, in real life, these students will have to collaborate on projects and problems. Telling them that they can't even give each other hints (I'm not joking, they devoted an entire lecture to what constitutes cheating!) is moronic in my opinion. And no, I was not one of the 186 students, I was so bored in the course that I never even went to class, I only bothered to show up for tests.
Derek Greene
>Cheating is not demonstrating knowledge.
I agree, so why not use our ability with networking to solve the problem. The goal is to see that students can code their way out of a paper bag.
Why not have a class of networked boxes and then 'test' the students by having them come in and write their code while the network is shut down, preventing the students from getting access to any help during the test? Take the floppies and CDROMs out and they can't bring in outside help. There could be bonus marks for speed.
Experiment!
Funny comment on the story about how programmers never can consult other programmers on a project. I do admit cheating in class is a bad idea, but....
They should come up w/ a class on how code REALLY gets written:
1. Do Systems Analysis and Design (ie. 20 minutes of chicken scratching on a piece of paper stolen from the laser printer).
2. Write The Groundwork Of The Program (ie. dig through your and co-workers code to find pieces that will do what you need to do).
3. Write A Little Code (ie. anything that you couldn't scrounge from friends/coworkers/examples on the net)
4. Release Software
This amounts to cutting and pasting from the instructor, rather than another student, but these are intro classes. How do you teach a language without providing example code that illustrates the concepts you're trying to teach? How can the students (who don't already know the language) complete their assignments without anything to work from? Can you imagine an instructor asking a bunch of newbies to format a printf() without an example to work from?
I think a lot of people who are going to get "caught" by a system like this won't be cheaters at all. There are going to be a lot of students getting slammed for using the resources that SHOULD be available to them; textbooks, lecture notes, tutors, etc.
What it WON'T catch are the real cheaters who get their solutions off the internet. A solution to that problem would be a lot more interesting.
Under capitalism man exploits man. Under communism it's the other way around.
I took Intro to Computing in the Spring of 1996. It was cake for me because I was a Computer Science major and I dig this stuff. But a lot of non-CS people dreaded that class above all others, especially Management, International Affairs, and Architecture majors, but also some engineering people, such as Aerospace and Industrial Engineering.
(And can you really blame them? How many civil engineers really need to know how to sort numbers in O(N log N) time? Or insert into a linked list for that matter? They write hacked-up FORTRAN if they write anything at all.)
Kurt Eiselt came to the first lecture and gave us a scare speech about Cheatfinder. Knowing that it looks for similarities between two students' works, I was worried constantly about my homework answers. A typical problem was to write an inorder binary search tree traversal routine in pseudocode. Honestly, how many different ways are there to do this? And there are 500 people in all sections of the class?
Fortunately, I was never flagged, but I have heard a few stories (which may not be true, you know how that goes) of people who were flagged, and were only vindicated after losing student jobs and failing classes.
I don't think an automated cheat detection system is applicable to small problem sets like binary search, stacks, and Mergesort. For the later classes, say Sophomore level, I have no problem with it though.
Besides, many Greek orders and clubs on campus have extensive "word" banks--archives of previous homeworks and tests, with solutions, from previous class offerings. Are they going to check against all previous students' work too?
LAMP hosting on Debian, SSH, no bandwidth cap, PayPal accepted - http://secondbrainhosting.com/
Could something like this be used to detect violations of gpl'd software? It would be especially useful for detecting usage of gpl'd software in proprietary products where the source is never released.
"Don't blame me, I voted for Kodos!"
Cuz remember programmers: in the real world you are fired if you consult with a co-worker
But remember, College ISN'T the real world. The point of College is not to get the job done, but to learn stuff. If you just copy somebody else's stuff, you won't learn anything, and you'll be pretty useless in the "Real World"(tm).
Disclaimer -- when I was grading programs (back in '83/'84) I busted a few people for copying...
Fascism starts when the efficiency of the government becomes more important than the rights of the people.
Here at the Rochester Institute of Technology, I am an undergraduate freshman Computer Engineering student, and one of my CS2 professors here has developed a program more advanced than this one at Georgia Tech. It actually bypasses source code variable names and order, and deeply analyzes the structure of a program, and how the program operates. Much better anti-cheating program, and has huge success rates. It still catches false cheaters, though. It's not perfect. My CS2 professor brags about it.
They started using this in the CS Department at VT in '94. Not only does it compare you to your fellow classmates but it compares to its history of programs it's read... so you can't use the same code someone else used years ago if the project ever came up again. It caught about 60 out of 300-400 people the first time it was used. Whatever happened to "working as a team"? hehe.
"Times may change, but standards must remain the same." - George Carlin.
When I took a lot of creative writing courses in college, the topic of plagiarism often came up, often in the context of "stealing" a single phrase or idea. What I found was that people who were good writers were incapable of cutting and pasting someone else's prose. They'd wind up tweaking, twisting, and "improving" it. By the time they read it aloud to the class, you couldn't recognize it at all.
Then, of course, they'd get all offended that you didn't realize that it was a Hemingway reference or something. Human nature, I guess...
"Prepare for the worst - hope for the best."
How much is cheating? Where is the line drawn between collaboration and cheating?
If I give code to a friend, then that's easy. But for example, what if my friend is having a hard time working out a program, so I show him mine and explain, 'Look, I used a loop here to search through the structure then used an iterative procedure yada yadda' and he goes away and codes his practical using my outline?
...in a few of my entry-level CS classes where on more than one occasion my work looked remarkably like others in the class, including that of the instructor who also worked the assignments during the same timeframe that we did. I didn't copy and neither did anyone else. It's just that the instructor drilled steps into our heads on how we *should* program something. When all was said and done, our source had a certain resemblance to one another. On one of the hardest assignments in the class, a calendar, my source was almost identical in parts to the instructor's. We even chose many of the same variable names. I can easily see how 2 or 3 people out of a class of say 500 could have very similar source. Can't you?
Well we got every assignment in the last 5 years on file. We run hacker (which produces functionally equivalent programs = renames functions and vars) and the like (rotate ORCAD schemes) and got a cgi script that does most of this automagically.
Those dumb enough to carbon copy stuff deserve to be caught.
Cut'n'Paste is evil. Real computer science students should know how to use wizards to write their redundant code.
I hope I don't sound like an old fart when I say this, but if you have to cheat at beginning programming class, you should re-consider whether CS is the right major for you. Seriously.
My freshman year, we got Pascal I & II. I might have been one of four or five people in a class of 15 that wrote all my own code. I mean, it's fucking pascal. If you can't grok pascal, maybe it's time to change careers. (Sorry if you had trouble with pascal...I'm not saying I'm a god but I never had to cheat to get my homework done.)
All this talk of Pascal makes me nostalgic for the goofy "everything is one big nest" code. I bet google could find a pascal compiler for Mac OS X...
Who did what now?
Geez, Taco, wtf? The classes in question aren't the real world -- they are there so you learn how to program, not how to collaborate. Perhaps instead of allowing posters to add their own comments to submissions, they should be forced to attach their witty or insightful comments the same as the rest of us.
Your quip turns the whole focus back onto the school -- they are the ones doing something wrong by looking for cheaters, instead of the cheaters themselves. What's next, complaining about the fact that the assignments are graded at all?
Cuz remember surgeons: in the real world you will have someone assisting you.
Cuz remember pilots: in the real world you will have a co-pilot.
Granted, they were a breeze if you had any coding background at all, or just happened to be someone who thought in that style. But the intro to computing class is required across the board for all majors at Tech. I think when you have your average Management or Architecture major in there trying to decipher the fundamentals of Scheme, you get a lot of clueless people who would definitely not consider the class "cake." Those were usually the people all crowded into the local computer whiz's dorm room, copying and pasting for all they were worth.
I once worked for a company (circa 1990) that was being sued by a competitor for stealing their source code. They did a line-by-line comparison of the codes involved and found what they determined to be a "significant" number of matching lines. Of course these lines (in FORTRAN) were 99% the typical comments, declarations, DO statements, CONTINUE, FORMAT, RETURN, etc... that occurred in most programs anyway, or they were in routines commonly derived from the NETLIB or HARWELL source libraries. They figured a judge and jury wouldn't understand the issues involved, so the burden of proof (or clarification) fell on the accused.
This isn't really anything new.. I live in IL and I've heard of one of the larger state schools doing this.. and I've heard a lot of the younger professors describe this exact same thing from their days in the university..
what's sad is the amount of false positives produced by these things..
I'm just glad my school doesn't implement something like this.. it would be more of a hassle than anything..
It seems to me that good developers are those who can
:-)
*) develop code on their own
*) collaborate with others to produce code
*) apply code that has already been developed to a new project
And the job of a CS program is to produce people who can do all three of these things.
In my experience, collaborating and applying pre-developed code is an important part of development in The Real World. It's pointless to reinvent the wheel if someone else has already come up with a workable solution. That's why there are websites like the Access Web.
It also seems to me fairly likely that people will create similar-looking (and even identical) code independently, especially for basic programs, given the push for coding standards.
Now, none of this suggests that Georgia Tech is wrong to use their cheat-finder program, but I'd be wary about relying on such a program as your sole gauge as to whether a person is cheating (or, more importantly, if it's your sole gauge to determine if a person is *learning*) which, thankfully, doesn't sound like it's the case at GA Tech.
In response to the comments about bad developers, it seems to me that if people are getting CS degrees (or any kind of certification) without the basic skills, the problem isn't just that they're cheating; the problem is that the institutions that are granting those degrees are not teaching their students the right skills and/or are not testing those skills rigorously enough. And simply saying "write a program that does X and doesn't look like anyone else's" is not enough, IMHO.
But then, I *don't* have a CS degree, so what do I know?
-- D.
Well, of all the things to mention about Georgia Tech, this one is pretty lame. I know Georgia Tech is Squeak-central and they have some amazing stuff.
http://www.SqueakLand.org/learn/university.html
Wonder if this system is written in Squeak?
Anybody know?
I actually had the problem of a co-worker taking a bunch of work I had done to the boss and claiming he did the work.
The funny thing is the guy was so stupid he didn't understand that there would be backups that would show who created the files and when.
Here's the flip side of Cmd Taco's comment:
Manager: Hi Sally, this is Bill, our new hire. He graduated without writing a line of original code and is totally clueless, despite successfully litigating against Georgia Tech's use of a cheat detector. You'll be doing all his work as well as your own.
Sally (pulling hair out in clumps): Arrrrgh! Damn you Cmd Taco and your smug comments!
Bill (puzzled): I thought it was okay to consult with co-workers?
Curtain closes.
This mini-software-drama has been brought to you by Beatrice.
Anonymous Kev
Proudly posting as Anonymous Coward since 1997
Here at Iowa State, we have been using something similar for our two data structures classes, CS227 and CS228. In most any other class in the department where people turn in code, the code is run through the same kind of thing. And it doesn't just look for exact matches.. it also looks at the % difference between submissions (only 5% difference raises suspicion).
As far as I know, this process has been in place for at least 2 years now, maybe longer.
Its not what it is, its something else.
I was a TA for an undergraduate course. It was almost 30 years ago - ~1974. We wanted to identify assignments that had been copied. Sometimes variable names were changed, sometimes not. The compiler was hacked to produce a signature derived from parse trees. Graders looking at assignments could compare the signatures. The prof simply announced that independent work was expected and showed the statistics. A few people needed more personal explanations. By and large the exercise was carried out with good humor and good taste.
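A minimal sketch of the parse-tree signature idea, using Python's standard `ast` module instead of a hacked compiler. The function name `structure_signature` and the sample snippets are made up for illustration; this is not the 1974 tool, just one way such a signature could be derived.

```python
import ast
import hashlib


def structure_signature(source: str) -> str:
    """Hash the sequence of AST node types, ignoring names and constants."""
    tree = ast.parse(source)
    # Only the node type names are kept, so renaming variables or changing
    # literal values does not affect the signature.
    shape = " ".join(type(node).__name__ for node in ast.walk(tree))
    return hashlib.sha256(shape.encode("utf-8")).hexdigest()[:16]


# Hypothetical submissions: `renamed` is `original` with new variable names.
original = "total = 0\nfor x in items:\n    total = total + x\n"
renamed = "acc = 0\nfor elem in data:\n    acc = acc + elem\n"
different = "total = 0\nwhile items:\n    total = total + items.pop()\n"

print(structure_signature(original) == structure_signature(renamed))    # True
print(structure_signature(original) == structure_signature(different))  # False
```

Matching signatures only say that two programs have the same shape, so a human still has to look at the pair before anyone gets accused.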
Word of advice. Don't cheat on your last assignment in your last course of your degree.
I'm going for a BSCoE at KU, and in our programming classes last semester, we were told that they used software that basically did the same thing. I don't know how long they've been using it, and I didn't hear of anyone getting caught. I think that this is just the first instance of it CATCHING people.
Comment forecast: Bits of genius surrounded by a sea of mediocrity.
So, we've now proven that in the real world, you can't very well "cheat" off a coworker because they're doing something different. You could reuse code, but that doesn't count either. You can ask for their input, but you can't pass their work off as yours. Try that and see how long you last (probably about as long as those cheater students).
Too big to fail? Does that make me too small to succeed?
you trolls are startin to get desperate for new material it would seem ;)
I've come in and had to clean up behind lots of other programmers who didn't fucking know how to code. The one benefit to this crappy economy is that these people will no longer be able to job hop every 3-6 months due to the shit hitting the fan because they don't know how to program. Oh, the horror stories I could tell...
I'm trying to teach myself to set people on fire with my mind... Is it hot in here?
This is just another reason why CS departments are going to hell. This method of catching cheaters is wrong.
In entry level courses, all the assignments are going to be almost completely identical for students who have never done any programming (or little) before. And colleges are full of these people, chasing the promises of money.
Think about it. A group of students all have the same point of contact to learn naming conventions, data structures, problem solving techniques, and algorithms. Since they all have the same point of contact, their solutions will all look really close. On a sufficiently small project, even as big as a couple hundred lines, there is a large chance that 3 or 4 students are going to produce relatively indistinguishable code.
An exact match would be cause for concern, but a cheater won't have an exact match anyway. So there is a flaw. Anything really close to an exact match is to be expected, and exact matches aren't likely from cheaters.
There used to be a trick that a junior high teacher did with his students. Didn't catch cheaters but made it hard to cheat to begin with.
Exploiting a "feature" in Apple's BASIC, he set up a program that could "poke" in the values for lines numbered above 65000. Essentially, he set up a template containing the student's name, and every program had to be turned in on that template. Without "poke"ing the values around (which he didn't teach), you couldn't change those lines.
If you wanted to cheat, you would have to retype the entire assignment into your own template.
Not foolproof, but for that level, it was pretty effective.
My school (University of California, Santa Cruz) as well as many of the other UC campuses have had this sort of software in place for years. Ours checks for code where only the comments have been changed, code where the variable names have been changed, or code which is just flat out identical. It keeps archives of past assignments for all classes, and checks submissions against the entire archive too. It may also do other stuff, but those are the few details I've gotten out of various professors.
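As a rough sketch of the three checks described above (comments changed, variable names changed, flat-out identical) plus the archive lookup: the code below normalizes Python source with the standard `tokenize` module and compares it against stored submissions. The names `normalize`, `find_matches` and the archive keys are hypothetical, and nothing here is claimed to be how the UC software actually works.

```python
import io
import keyword
import tokenize

# Token types that carry no structural information once comments are gone.
_SKIP = {tokenize.COMMENT, tokenize.NL, tokenize.ENDMARKER}
_LAYOUT = {tokenize.NEWLINE, tokenize.INDENT, tokenize.DEDENT}


def normalize(source: str) -> tuple:
    """Token sequence with comments dropped and identifiers masked as 'ID'."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _SKIP:
            continue
        if tok.type in _LAYOUT:
            out.append(tokenize.tok_name[tok.type])   # ignore exact whitespace
        elif tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            out.append("ID")                          # hide variable names
        else:
            out.append(tok.string)                    # keywords, operators, literals
    return tuple(out)


def find_matches(submission: str, archive: dict) -> list:
    """Names of archived submissions that normalize to the same token sequence."""
    target = normalize(submission)
    return sorted(name for name, src in archive.items()
                  if normalize(src) == target)


# Hypothetical archive of past assignments.
archive = {
    "2001-alice": "def area(w, h):\n    # width times height\n    return w * h\n",
    "2001-bob": "def area(w, h):\n    return 2 * (w + h)\n",
}
submission = "def size(a, b):\n    return a * b  # my own work\n"
print(find_matches(submission, archive))  # ['2001-alice']
```

On tiny assignments many honest solutions will normalize to the same thing, which is exactly the false-positive problem other posters are worried about.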
I go to UVA and this program was put to use and modified to check term papers for identical strings of words greater than 6. Two semesters ago about 145 Honor Charges were brought against students in a single class, leading to quite a few expulsions. The program isn't perfect, since it still raises a flag for quotes and bibliography, but it seems to work in exposing an existing problem.
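A minimal sketch of that kind of word-run check, assuming a run length of six words (the post only says "greater than 6", so treat `n` as a tunable guess). The function names and the two sample sentences are invented for illustration and are not taken from the UVA program.

```python
import re


def shingles(text: str, n: int = 6) -> set:
    """All runs of n consecutive words, lowercased, punctuation ignored."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_runs(a: str, b: str, n: int = 6) -> set:
    """Word runs of length n that appear in both texts."""
    return shingles(a, n) & shingles(b, n)


# Hypothetical excerpts from two term papers.
paper_a = ("The honor system at the university dates back to the "
           "nineteenth century and is run by students.")
paper_b = ("Some argue that the honor system at the university dates "
           "back to the founding itself.")

for run in sorted(shared_runs(paper_a, paper_b)):
    print(" ".join(run))
```

As the post says, quotations and bibliography entries will trip a check like this, so the shared runs are a list of places to look at, not a verdict.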
----
Striving to put right what once went wrong, and hoping each time that his next leap, will be the leap ho
....who thought he would cheat by copying someone else's code.
But he was pretty paranoid about getting caught, and realised that a verbatim copy wasn't enough, you'd have to change the variable names, comments etc.
So he did some research, wrote himself a little parser that read in the source code and built a parse tree of the program. He then wrote another function that spat out all the code again but with different spacing, block ordering and some simple variable renaming (e.g. x,y,z->a,b,c)
To make sure the structure of the code didn't give him away, he wrote a few code transformations, e.g. if a then b else c became if !a then c else b. The order of non-conflicting assignments was swapped, and mathematical expressions were re-arranged (sometimes actually optimising the original code in the process!).
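For the curious, the "if a then b else c became if !a then c else b" rewrite can be sketched in a few lines with Python's `ast` module (3.9+ for `ast.unparse`). The class and function names are made up, and this covers only that one transformation, not the renaming or expression shuffling from the story.

```python
import ast


class SwapBranches(ast.NodeTransformer):
    """Rewrite `if a: b else: c` as `if not a: c else: b`."""

    def visit_If(self, node: ast.If) -> ast.If:
        self.generic_visit(node)      # handle nested ifs first
        if node.orelse:               # only swap when there is an else branch
            node.test = ast.UnaryOp(op=ast.Not(), operand=node.test)
            node.body, node.orelse = node.orelse, node.body
        return node


def swap_branches(source: str) -> str:
    """Return source code with every if/else negated and swapped."""
    tree = SwapBranches().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)          # requires Python 3.9 or newer


# Hypothetical input program.
source = (
    "def sign(x):\n"
    "    if x < 0:\n"
    "        return 'neg'\n"
    "    else:\n"
    "        return 'non-neg'\n"
)
print(swap_branches(source))
```

The rewritten program behaves the same, which is why detectors that compare structure have to canonicalise this sort of thing before comparing.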
Still wasn't good enough, the comments needed changing and the structure of the code looked the same. So he linked in a thesaurus and NLP/AUG engine to change the words in a meaning-preserving manner. Same principle could be applied to the more complex variable and function names, so buildTree became makeStructure etc.
Finally, to put the icing on the cake he modified the program so it could output the code in a couple of different functional languages. Made the plagiarism almost impossible to spot.
Best programmer I ever met.
I admit, I cheated on intro to VB! I stayed out drinking almost every night right before the program was due, so I schmoozed up those nerdy chicks to give me the code so that I wouldn't fail. Honestly now, why waste time coding VB, which I never use, when I could be out partying, and drinking, which I always use? Cheating in VB is just as much a part of getting a well rounded college education as getting laid is!
This is nothing new; many universities employ such automated scripts. My London university has been doing this for at least 7 years, and probably since before I started.
When I was in college learning programming, we all started with the same basic assignments. These consisted of, "Type this in. Watch it work." And later, "Modify this by changing X line to read Y." For a long time, I tended to use the same generic variable names. I remember comparing my code to friends in the class, and many times sections of our codes (functions, procedures, etc.) would match word for word, even though we hadn't even discussed the projects among us. I don't think this is going to be taken into account here, and I shudder to think what will happen to the education of young programmers.
I think it is even more interesting that this happened, considering I had started programming about 9 years earlier. The elementary school I was in in 3rd grade started teaching us basic programming on the TI-99/4A. I miss that hunk of junk. I think I'll go look and see if I can find one, just for nostalgia.
It's easy to stand out when the general level of competence is so low.
Alex Aiken at Berkeley came up with something like this years ago.
We have been using a system to detect cheating for years---it started before I got here. The one we use is Moss (from Berkeley). How does Moss work? I'm not sure, except that it does examine program structure, at least to an extent. I can comment on how it's used.
In an intro-sized course, 200-400 students, it's impossible to check the programs by hand, especially when they are graded by different TAs. Moss is very useful as a first pass in detecting cheating. When Moss flags a pair of assignments that are very similar, we examine them by hand and make a judgement.
If there's any error, the process errs on the side of the student. If there is plagiarism that is not caught by Moss, then the students will probably get away with it, since the chance that the TAs will discover it is small (although it does happen). No accusation of plagiarism relies on Moss---Moss is only used as a tool to narrow the manual comparison process.
The purpose of a programming class is different than a commercial software project. That does raise the interesting question of someone reusing code from an open source project as part (not all) of a programming assignment.
I teach an introductory, pseudocode-only programming logic course at Pierce Community College. Enrollment is usually 50-60, with typically 25 students completing the course. This is a public institution, fees are $11 per unit, and about $38 for enrollment and night parking, so we aren't ripping people off.
I assign 10-12 pseudocode homeworks per year. At the beginning, as many posters have noted, there are few logical ways to implement most of the algorithms. So cheating may occur, but I usually only catch it at this point when they repeat a unique blunder. This is all graded by hand.
As the algorithms get harder, the cheating becomes more obvious. Also, the number of submissions drops, so similarities are more pronounced. (In this class, a harder algorithm is gymnastics scoring for 75 contestants, file input, discard high and low from ten scores per contestant and calculate average, show 1st-2nd-3rd, use subalgorithms and arrays, et cetera - this is not intended to be a very rigorous course.)
Punishment for cheaters is usually to note the solution similarity to all parties involved. Once is usually enough.
...then I taught a fourth-semester OOP course. By the third assignment, it was easy to spot the - um - excessive consultation. (As a more real-world experience, the last project was a team effort, with only one set of code submitted for the team.)
For those who have not taught, and haven't graded 60 papers a week from beginning students with varying degrees of aptitude and comprehension - an automated system like MOSS can be a great aid. In particular, it lets the instructor concentrate on what each student is comprehending, and I know that I need this feedback weekly.
Let them cheat all they want, it won't help anyone a penny when it comes to Exam time. If you can ace your homework, but fail your final.. and still actually TAKE the Exam .. that should be factored into your grade.
I'm not sure I've ever taken a class where one could outright FAIL the Exams and still pass the course.
Damn, we're old, aren't we :)! I think most of those MS BASICs wouldn't allow direct entry of lines above 63999.
Another proud carrier of the $rtbl flag
Although this kind of software has existed for some time, it is to be taken with a grain of salt. Any instructor who blindly points the finger at the 'cheaters' this software discovers, is opening himself and the institution up to a barrage of lawsuits.
I've spent many evenings correcting C++ homework for a fellow teacher (who doesn't know squat about C++, of course). In the first few weeks, about half the group was copying off each other. Some of them were sneaky enough to swap lines of code around, or change a few variable names, but with a little thinking it was obvious that the code was identical. It was most obvious when those little modifications resulted in code that didn't even compile. Of course the students would argue that they did not copy, some cocky fucks even had their parents call me to personally insult me (it was a prestigious college, lots of the parents were ambassadors and/or wealthy business meatheads). Needless to say, we had them kicked out of the course very quickly while the college's mgmt handled the legal threats.
There were also some cases that looked similar, yet once you grokked the code you could sense the subtle differences in that particular student's reasoning; those cases would probably turn up as false positives using cheat-sniffing software. In such a case, being sued would be a very bad thing, since the odds are on the student's side : punitive damages and bad publicity are a great way to destroy an organization.
Finding cheaters can be easy, just as it can be devious. I wouldn't trust such a task to any dumb software.
-Billco, Fnarg.com
I must have gone to the most enlightened university in North America going by the responses by most of the crowd here.
First of all, go look at MOSS before you call all cheat-detection tools copies of diff. For my university's more rigorous classes, they've set the threshold scores VERY low (I've been told about 20-30% of a perfect match) and only 5-10% of the class meet that. The majority of that group were dumb as a rock - cutting and pasting ALL code, INCLUDING comments, etc. There was even a case where twins were in the same program, same class, and they used the SAME code in a major assignment. Didn't even change the documentation to reflect the "new" author. There have been false positives, but they are rare and the department errs on the side of the student if there is reasonable doubt.
Like a few have said - these are tools and should be used accordingly. Not everyone who wields a scalpel is qualified to perform surgery, nor should one tool be used to perform cheating tests. A school of any size gets lawsuits for everything. Students can appeal just about anything. Why would they suddenly trust one source of error when students can be ejected from school?
Consider the alternative argument: many students ARE cheating. Hmmm - a course forced on most students, CS degree or not. Many prior students take the course, probably got away with quite a bit.
Collaboration is one thing. Cheating because you think you can get away with it - oops, not anymore - is another thing. Flag the code matches, look at it with your eyes and ASK THE STUDENT. If they can't answer a single question about a 200 line program they wrote, the odds are slim that they wrote it. (maybe we can finally use drinking as a defense though!)
Make sure punishment is swift, certain and (relatively) severe, and in a few years either a) students will do their own work or b) the mandatory course's workload will be reduced.
This is the stupidest software I ever heard of. If ya ask me, the best way to tell is to just read the code! The teacher looks at the code with his/her own eyes anyway, right? Humans can see algorithms, software can't.
python >>>
reduce(lambda x,y:x+y,map(lambda x:chr(ord(x)^42),tuple('zS^BED\nX_FOY\x0b')))
I go to school at UVic, in BC... They have been using a system like this for years.
There is absolutely nothing new about this.
The University of Nebraska-Lincoln CS department has been using something like this for a few years now. They call theirs handin and it handles all programming submissions and checks them for cheating patterns. Big deal...
user@host:/usr/bin$ whatis
java: nothing appropriate.
Too bad they didn't have this when George O'Leary first became coach, they could have run it on his resume.
Here's 7 lines of C to reverse an array. The assembly would be more or less identical. I don't feel like dredging up my memories of 8086 assembler... it would probably end up screwing up my Perl for the next hour or so :-)
int list[] = {0,1,2,3,4,5};
int i,j,len=sizeof(list)/sizeof(int);
for (i=0; i < len/2; i++) {
    j = list[i];
    list[i] = list[len - i - 1];
    list[len - i - 1] = j;
}
Reversing a linked list would be marginally longer, but a doubly linked list would be just as short or shorter than this. Only a real novice would take 100 lines of C to do it. BTW, how could you possibly learn assembly before learning what a stack is?
* And remember, it's spelled N-e-t-s-c-a-p-e, but it's pronounced "Mozilla."
How does this comment possibly deserve score:5? His quote was as follows:
>> Cuz remember programmers: in the real world you are fired if you consult with a co-worker ;)
He then went on to discuss how in the real world you are fired if you "steal code from someone else without their permission". How is this even remotely similar to "consult with a co-worker"?
In my life experience, both with 10 years of programming, and years working with creative/future problem solving tasks, adding a member to the group often doesn't just increase output linearly, but exponentially. Bouncing ideas off someone else is a great way to help with your own creative ideas as well. And that's what the poster of this article meant by "consult". He didn't mean "steal".
Also, in the vein of this cheat detector, compare programming to math. In math, there is one solution. In programming, as long as the requirements are strict enough, there is often only one solution as well: for example, a program accepts user input, parses it, and displays it in a precise manner to STDOUT. There are only so many ways this can be done. Just because two students happened to use the same algorithm doesn't mean they're cheating.
Unfortunately, a certain degree of trust in the standards of the students is required. I'm not going to say which one I'm from, but there are an awful lot of people who are here because of the fact that their parents had a lot of money and that the only way little johnny was going to get ahead in life is if he goes to university.
I don't want this "cheat detector", but I'm caught in a Catch-22, because I don't really think most of the people in my class should be trusted..
So this is why I wouldn't compare structure (i.e., which algorithm is implemented), but I would compare the implementation. Actually, I have a feeling this would have more to do with mathematics (statistical analysis of filesizes or something) than coding.
How do you think bugs spread in m$ programs? Cut and paste and cut and paste...
Imperium et libertas
Autocracy and freedom
People have been doing this for years. We use MOSS (Measure Of Software Similarity), which was written by Alex Aiken at Berkeley. As for the editorial comment, I think that there's a little difference between asking a coworker (or another student) for help and copying code directly from another source. But everyone's entitled to their opinion.
--aiee
+1, piss-pants-hillarious-'cause-it's-true! ;)
I went to CMU. I saw a fair number of folks get "convicted" of cheating...
Take the case of a friend of mine, Doug: His program was considered "similar" to another student's. It didn't matter that he hadn't cheated. It didn't matter that he was far, far above the bell curve, even for CMU. It didn't matter that the class was not required for his degree -- He was taking it only because he wanted to learn more.
His code was sufficiently similar. That was enough. His choices were to accept a forced-failure mark for all his hard work, or fight it at the disciplinary committee -- a well known rubber stamp group -- who would kick him out of the school.
He opted for keeping his degree.
Sooner or later, somebody will have the balls to hire a lawyer... I'm hoping it's sooner!
Let's see, so the point of this program is what? To teach every one to write their code in a different manner than everyone else, so only they can read it and understand what's going on?
My entire CS class would have failed if this had been used. Why? Because we were taught to all use the same indentation system. And how do the semi-colons get put anywhere other than where they're supposed to be? We were all told to use the same basic setup and variable naming scheme so that anyone will be able to read your code and understand it. And I can't tell you how many times we borrowed functions from each other. Or how many times each person in a group wrote one part of the function and it was all combined to a final program, that each person copied and turned in.
But of course real programmers never trade ideas or code. And everyone does everything uniquely, so no one should have to worry about this at all.
Could you imagine running this on something like GNOME and KDE? We would discover that both were written by others.
T Money
World Domination with a plastic spoon since 1984
If we're talking about code that solves a novel problem and does it in a new way, we're also talking about "minor changes" that MAKE IT WORK! I don't fear the programmers who copy others' work. I fear the ones who think they have to write every piece of code they use themselves. (wheel meet reinvention, maintainer meet torture)
Now, I expect a professor ought to be able to tell the difference between students doing what was assigned and those not doing it. But I wouldn't expect their code to be able to make such a distinction un-aided. Regardless, homework policing is beyond what a university professor should have to do. Do the work or don't, it should be the tests and unique projects that decide your merit. Perhaps professors should be working on getting to the unique stuff sooner if "weeding" is a problem. Or, perhaps we should have grad students teach smaller CS intro classes, instead of professors teaching huge ones...
Cuz remember programmers: in the real world you are fired if you consult with a co-worker ;)
In the real world at least one programmer has to know the answer.
My friend Allen wrote the original cheatbuster in 1993. He described its workings as
:)
"My version worked by compiling the students' Pascal source code into assembler source, which was then diffed. The compiling process got rid of variable names, function names, and a lot of the little variations, and left mostly structure. We then diff'd everyone against everyone else, and output a list of the smallest diffs. All of those programs were then examined by hand to see which ones had real evidence of cheating."
As a freshman CS major in 1993 and 94, we heard stories from the TAs about the powers of cheatbuster. They ranged from 'a modified diff' to something involving blood rituals that let professors see the guilt in your soul.
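The "diff everyone against everyone else and list the smallest diffs" step could look roughly like this, using Python's standard `difflib`. The compile-to-assembler step is assumed to have already happened; the listings, student labels and function names below are all invented for the example, not taken from cheatbuster.

```python
import difflib
from itertools import combinations


def diff_size(a: str, b: str) -> int:
    """Number of lines that differ between two listings (added plus removed)."""
    diff = difflib.ndiff(a.splitlines(), b.splitlines())
    return sum(1 for line in diff if line.startswith(("+ ", "- ")))


def smallest_diffs(listings: dict, top: int = 3) -> list:
    """All pairs ranked by diff size, smallest (most suspicious) first."""
    pairs = [(diff_size(listings[a], listings[b]), a, b)
             for a, b in combinations(sorted(listings), 2)]
    return sorted(pairs)[:top]


# Hypothetical compiler output for three students.
listings = {
    "s1": "push bp\nmov bp, sp\nmov ax, 1\nadd ax, 2\npop bp\nret",
    "s2": "push bp\nmov bp, sp\nmov ax, 1\nadd ax, 2\npop bp\nret",
    "s3": ("push bp\nmov bp, sp\nmov cx, 5\nloop_top:\ndec cx\n"
           "jnz loop_top\npop bp\nret"),
}

for size, a, b in smallest_diffs(listings):
    print(size, a, b)
```

Comparing every pair is quadratic in class size, which is fine for a few hundred students, and the ranked list is only the shortlist for the by-hand examination the quote describes.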
CMU 15251 Course Document and Cheating Policy
His policy encourages collaboration and specifically forbids cheating. It itemizes various types of cheating, for example copying from another student, letting another student copy you, and looking at someone else's files online (even if they forgot to set their file permissions).
Furthermore, he requires all of the students in his class to sign a statement saying that they have read and understand the cheating policy. Not only does that discourage some students from cheating, but it also makes it much easier for him to get students into serious trouble with the school when they are caught.
In addition to the course document, here's more or less what he had to say on the first day of class: (I apologize for paraphrasing; this is how I remember it) "Nobody plans to cheat. You all must be very smart, or you wouldn't be here. You think you're going to try hard and do well in this class. But later in the semester you'll get busy with other classes and activities, and all of a sudden an assignment will be due in one day and you haven't started. Or you'll be taking a test and realize that you forgot to study an important equation. Or you'll work hard on an assignment and almost completely get it working, but get stuck on one subroutine. Even though you never planned on cheating, all of a sudden you'll find yourself in a circumstance like that and it will seem tempting."
(BTW, I shouldn't have to say this, but Prof. Rudich's cheating policy is copyrighted. If you're a teacher or T.A., don't copy his cheating policy without his permission. That would be just as dishonest as cheating!!! If you want to use it, contact him and I'm sure he'd be delighted to let you use it, as long as you give him credit.)
Total: 10 points
I don't see why such a big deal was made out of this. This is old news really. Most universities use cheating scripts to determine if something is wrong. My CSE 142 class at the University of Washington (that's the lowest CS class there) used one in the automated homework turnin. (Yes, for the first assignment, 90% of the class copied off each other). It seems like it's only a big deal because it sounds like Georgia Tech is the last school on the planet to develop one or something.
Hrm, now if their cheating script looks like another school's cheating script, do they get in trouble for cheating?
Just wanted to let you guys know.
Drexel University has been using similar cheat-detection software for about a year now. Drexel's checks for minute similarities between all pieces of code submitted and then returns a percentage for the chance that the code was copied. I really don't see what the big deal is.
But my biggest worry about programs like this is what would happen if someone's code was stolen? This happened to me in one of my early CS classes and thankfully nothing came of it. (Now where did that disk go?) I found out later that over ten people turned in 'revised' versions of my program. Now how would the real programmer prove that out of all of the 'cheaters', that he/she turned in genuine work?
Hopefully you and your coworkers are working on the same exact code. That's what version control is all about.
"Cuz remember programmers: in the real world you are fired if you consult with a co-worker ;)"
since when does education have any connection at all with the real world?
This just goes to show that Profs are really just too damn lazy to ACTUALLY look at the code they grade...
Oh many ages ago when I was finishing my CS degree, I had an assembly course - I *KNEW* people were stealing my code from the trash because they were too stupid to write their own...
So I planted stuff in the trash that would print out "I was stolen from ..." when executed... LOL
On the lighter side - CS Dept's have been wrestling with this issue forever. They've concocted scripts written in a zillion different languages - but for the same reason that we can't filter out all the spam, they will never be able to detect all the cheaters...
Once the detection mechanism is known it's trivial to avoid - even if your program is "the same" as someone else's...
My advice would be for CS depts to actually have people with a brain teaching the courses, and have as many assignments as there are people in the course. That is, if you have 20 students, you have 20 different projects - each project is designed to test the same CS concept, but each requires a different program to be written. This would ENCOURAGE thought on the part of the Prof, and the students... Alas, it's too much to expect...
Except it's called MOSS
Measure Of Software Similarity
And, yes, I know it's against the interest of the AC that posted his account of pay-for-code to help me out here, but I thought I'd give it a shot.
"Prepare for the worst - hope for the best."
Many of the students caught "cheating" may have been applying an Extreme Programming technique called "pair programming".
I just finished up an intro class in CS (creatively named "Introduction to Programming"). I had so many opportunities to cheat, but wouldn't learning the concept I'm paying $1500 for be more worthwhile? Cheating teaches you nothing. You might pass a test, or an entire class, but if you rely on cheating in college, how will you survive when your job is on the line? Copying someone else's code off the net sounds just like copying an essay off the net and changing the words around and handing that in. If you want to program, then learn to program your own stuff. If you have a hard time programming simple stuff, maybe you shouldn't be in CS. Put the effort of cheating into where it's more acceptable, like political science.
There are three types of programmers:
1) collaborative coders
2) solo coders
3) adaptive coder
4) copy coders
Guess who's getting the most done?
Hammer of Truth
Whoa... they use such a "cheat" detector in their O-O course, yet teaching O-O teaches "reuse". Well, I see a dichotomy there, even if others do not. If I am reusing my own objects, fine; but reusing others.... wait, isn't that how the real world works? It does in my organization. For Christ's sake, anything in the world of source code can be construed as "non-original". Only some new algorithm would hold under scrutiny as "original". Don't teach reuse if you are going to scan the student's code for reuse .
This was used for two undergraduate classes: "Introduction to Computing" (required for any student in the College of Computing) and "Object Oriented Programming" (required for Computer Science majors)." Cuz remember programmers: in the real world you are fired if you consult with a co-worker ;)
They use it for Object Oriented Programming class, yet the entire concept of OOP is code reuse. That's a contradiction. It would mean that if program A is: class foo{ main() { cout << "foo" << endl; } } then this program will get you expelled: class bar{ main() { cout << "foo" << endl; } } and this will give you an A: class bar : public foo;
Our profs at VT use MOSS to figure out if we are cheating or not.
The only people that I've seen caught with it, though, are the really lazy people (those who just copy their program without even attempting to make it look like their own).
Got Freedom?
Thinking?
First of all, did anyone else actually read past the title of the article? The article is about an astonishing amount of cheating in the classes, not the amazing advance of cheatfinder. Nobody is claiming that cheatfinder is original; they even mention similar things for English papers in the article on Yahoo. Cheatfinder does have some advanced portions to it from what I've heard, though I have not actually seen it run, nor the code.
Cheating is a problem in the two mentioned classes. The point of the programs is to help teach the material for the class. Understanding what an AVL tree is and how to implement it by glancing at lecture slides doesn't work for most people I'm willing to bet. With the programs, the students have some experience in everything taught in the class (or at least the main material, a few offshoots aren't given as assignments). If people are cheating on the programs, they aren't learning the full material they should, and may be able to slide by just by knowing some key words on the test. (Example question: which is a heap and which is a BST...) If you want to work with someone who has a degree in computer science but can't program a binary search tree, then there's something wrong with you, cause your boss isn't going to notice that you have 100% working code and theirs does nothing, instead he's going to notice that the two of you only have 50% working code. Have fun getting yelled at.
Also, in addition to running cheatfinder and finding similarity, all of the suspected students will go through a formal trial system where they can defend themselves against the charges (no lawyers or anything). I'm not entirely sure how the trial system works, but I know that it is fair to the students and those who are innocent will be able to demonstrate that.
We've had a similar program here at the University of Bristol for years!
The University of Utah also uses one of these to keep students from "cheating". In fact, in my last CS course 5 people were kicked out of the major because their code was "too similar". It wasn't even exactly the same, but they had too many similar styles/variable names. Of course, if anyone was thinking that college actually prepared you for the real world, they were always wrong, but whatever, I didn't go to college expecting that so I'm not surprised that it's not preparing me for it.
This would be a nonexistent problem if profs would just assign different assignments to everyone in the class. Oh wait, they want to have the simplicity of one style of project and not have the hassle of checking for cheating.
As you may have read, bestselling historian Stephen Ambrose was recently caught having lifted sentences and even passages from other sources, and passing them off as his own writing in his books. (While he mentioned the source books in footnotes/endnotes, he did not put the cribbed text in quotes.) At least four different Ambrose books have now been shown to have the same pattern of lifted, unattributed passages.
These instances only came to light because an author of a lifted passage noticed it while reading Ambrose's book. Subsequent episodes came about because other authors started looking, and now some people are checking out new likely sources; this works because Ambrose only lifted passages from books that he admired and heavily footnoted (at least, so far as we know!).
Perhaps Ambrose was really just lazy, as he was fairly open about crediting others for the ideas (he "just" failed to credit them for the words, too). There are many cases of sneakier plagiarism than that, both in academia and in journalism.
So, class, the programming problem for today is, given the text of two books, spit out the most likely candidates for lifted passages, based on length and similarity of words. You get a B if you can do this for exact, verbatim matches, an A if you can do it with individual word substitution, and an A+ if you can recognize re-ordered clauses. The end users for this tool would be 1) authors everywhere who want to protect their own writing, and 2) journalists looking for juicy plagiarism scandals.
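For anyone who wants to try for the B, here is a rough Python sketch of the exact-match version only. It assumes a passage counts as "lifted" when every run of n consecutive words also appears in the other book; the function names and the window size are made up for illustration, and word substitution or re-ordered clauses are not handled.

```python
import re


def words(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def shared_passages(text_a, text_b, n=5):
    """Return runs of words from text_a whose every n-word window also occurs in text_b.

    Results are sorted longest first, so the likeliest candidates come out on top.
    The window size n is an assumed tuning knob, not anything from the article.
    """
    a, b = words(text_a), words(text_b)
    grams_b = {tuple(b[i:i + n]) for i in range(len(b) - n + 1)}
    passages, start, end = [], None, None
    for i in range(len(a) - n + 1):
        if tuple(a[i:i + n]) in grams_b:
            if start is None:
                start = i          # a new matching run begins here
            end = i + n            # extend the run to cover this window
        elif start is not None:
            passages.append(" ".join(a[start:end]))
            start = None
    if start is not None:          # flush a run that reaches the end of text_a
        passages.append(" ".join(a[start:end]))
    return sorted(passages, key=len, reverse=True)
```

Feed it the full text of two books and read the top few results by hand; short common phrases will show up too, which is why a human still has to look.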
We (it was a project for the grad students) had to write an online project submittal system that included a system to address cheating. We checked for white spaces, variable naming, complexity analysis, etc...
Of course, with simple programs there's the issue of triviality (e.g. "Hello World"), and with those types of programs we just flagged them and left them to be dealt with at the discretion of the professor or TA.
I think that these types of programs are good, but for more complex programs. Honestly, there are only so many ways to do a "Hello World" program when you're dictated a structure to use and what features of a programming language to use (e.g. pointers aren't usually taught until the later end of first year programming classes).
So the idea is hardly hot news.
As a graduate of Georgia Tech's Computer Science program (BSICS 1981), I can tell you this has been done before now. I was an assistant for the Survey of Programming Languages class and as such graded programs for the course. I found several duplicate LISP, PASCAL, and SNOBOL programs, down to the placement of which columns the parens or semi-colons were in, that also had duplicate comments. Those I found simply because of recollection of what they looked like in the printouts. After that, some simple "diff" runs on the submitted files turned up others that I might have missed had I looked only at the printouts.
It amazes me that people are dumb enough to think they won't get caught. Some of the folks that submitted copies even forgot to take out the comments that included the name of the person they copied from!
You've never been to Villanova University, where collaboration is punished by death at all undergraduate and graduate levels in the Computer Science program.
Back in school, a cheesy community college, we had one professor who always used a program called SHERLOCK.EXE to compare two files. She thought this was an infallible method to catch cheaters. We, as a class, chose to mess with her mind one day. We all submitted the same program for a weekly project, but we GREPped the hell out of the program, changing variable names willy nilly and rearranging the procedure calls. Then to add some insult to injury, we added superfluous comments here and there. Wahoo, the program did not flag a single instance of "cheating". Sure the include files/libraries were the same, what could she expect? I personally had one include file that contained all the required libraries. The layout was identical for all programs, being a requirement. Wubba wubba. Exactly how many ways can you execute the same algorithm? Gazillions, or 34 to be exact. Just to let you know, we told her about her GREAT program and its limitations. Bummer for the next class, but my group got decent grades. Yes the program worked per specification. Ho hum.
So what is this "new technique" that lets them detect cheaters? diff file1 file2? Sheesh.
I mean, it seems you are in college. How did you get admitted without being able to spell? Just wondering.
They've been using something like this at the University of Michigan for years.
For each class there is a database of every student and teacher solution to the assignment going back several years (maybe all the years they've used electronic submissions). It checks program structure, etc... so simple changes of variable names will still be caught.
Thing works pretty well too. (but yes, I have seen ppl beat it.. Most get caught though)
Maybe they should focus more on writing code that detects someone cheating the hiring process and/or discrepancies in media guides.
Sure, CS100 classes and maybe the second or third programming classes a student takes don't need collaboration, though many students will need help (i.e. teaching, even if it's not done by a professor) in understanding what's going on with their programs, why they're not working, learning to use the cardpunch (:-) or editor or whatever local facilities you have. But by the third or fourth programming class you're taking, you should be doing things that are large enough to require group work, whether it's writing a small operating system or writing a simulation of some complex activity, because most real projects are too big for one person to do.
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
A decent program doesn't just do a String compare, it checks the structure i.e. if you change variable names and minor details it'll still catch the duplication.
Of course, why this is terribly relevant when the program was written several years ago and plenty of other places have implemented this sort of stuff since... not my place to say :)
********************
I object to Intellect without Discipline.
So how exactly does consulting with a fellow student (or co-worker) result in both parties having identical code?
I have to say that this is the most ignorant ... well, you get the idea:
John: Hey, Bob, you got a sec? I'm trying to get this stupid utility done and it's not working right ...
Bob: Yeah, sure ... hmm ... hey, I wrote something just like this a while back. Hang on, lemme go look for it.
(a few ls's, cp's, and cc's later)
John: Cool, thanks!
Granted, school and work have different goals (learning vs. getting things done) and this argument doesn't apply to students, so I agree with the point that students shouldn't be copying without actually thinking about the assignment, but for the same reason it's absurd to say that co-workers shouldn't just copy code either.
Dumb cheater like that deserves to have you take his money and give him an Obfuscated C Code program that prints out "you fool, I AM the TA".
Bill Stewart
Besides which, this is a boring repetitive task, exactly the kind of task computers should be used for. Why check for cheating manually, which takes hours and is really dull, if a computer can do the work for you?
No-one is suggesting that computers replace marking work, just that they ease the pain a bit.
A student came up and said "I've written all of the assignment, but the compiler is broken." My friend looked at the error output from the toy teaching language compiler.
"Unknown keyword 'From:' in line 1 'From: student2@cs.university.edu'"
"Unknown keyword 'Subject:' in line 2 'Subject: Assignment 2 answers'"
The student tried to insist that it was all his own work.
GUGC (Griffith University Gold Coast) had a similar thing in place 10 years ago. It was used for COBOL assignments, but could be adapted to other languages. It compared literal blocks of text, parse trees, variable names etc. I'm pretty sure Colin Thorne wrote it. You could take a program, cut it into sections, rename large chunks, move things around and it would assign a correlation with other works in the class.
You could then set 'honeypots' up and see if there was a pattern of collusion or plagiarism.
I was also thinking, for some types of projects wouldn't the code come out quite similar if it was done correctly? If you are teaching C++ classes and implementing some simple program to test the knowledge of the students, then a correctly written program would be very similar to another correctly written program. Otherwise you are not teaching standard techniques very well, imho. A shell sort is a shell sort, not much difference between properly implemented versions.
What if the students have used c/etags? Then they're screwed!
The university I go to has been using a system like this in its introduction to algorithms course for about 2 years. It doesn't just check for exact copies but will detect copying even after minor changes. It also looks at the size of the code, because the probability of similar programs is higher for shorter programs.
Sorry, I meant indent; it's been a hard day! :)
Sometimes the formats are the same because people use automated tools, like indent. Somebody else mentioned EMACS automatically formatting code.
And if you develop your code in traditional newbie style - write some code that doesn't work, hack on it for a while until it actually does work, and send it in - then you probably should run it through indent or your favorite language's equivalent reformatter anyway.
Bill Stewart
Couldn't you get around this by adding extra functions or something... say your program is this:
int main(int argc, char **argv){
// do work
printf("Hello, World!\n");
return 0;
}
you could change it to:
int work(int argc, char **argv){
//do work
printf("Hello, World!\n");
return 0;
}
long int factorial(short int n){
// calculate a factorial, recursively :)
}
int main(int argc, char **argv){
int r = 0;
factorial(16);
r = work(argc, argv);
return r;
}
--
Would this get through the cheating detector? With that added factorial and work call, it doesn't look at all the same (and it uses more CPU ;). But the end result is the same; the program prints "Hello, World" and exits with status 0.
And if your teacher asked "why the extra factorial", you'd say "I just wanted to try recursive functions! The program still works, right? And it completes the assignment."
And she would say "ah yes! what a wonderful student."
Heh heh heh.
My other car is first.
So Georgia Tech has an (arguably) sophisticated system for detecting cheating. You'd think what they really need is a decent system for detecting lies on resumes.
For those who don't follow - the former head coach of their football team (George O'Leary) lied on his resume, including degrees from schools he never attended.
Lemme tell ya, if you're not going to hire me simply because I don't know what malloc() does, then you can keep your job.
If you're finding that 95% of the candidates you talk to can't write a sort routine, and that you're having to fire people because they can't "program simple stuff on thier own," then you might want to re-think your hiring process. Sounds like you're getting all the wrong people.
Dear professor Andrew S. Tanenbaum of the Free University of Amsterdam and his colleagues already used this kind of program to check the hand-in programs we students wrote in Modula-2 some 8 years ago, and it was actually kind of a big thing in Dutch news back then. Guess these guys just reinvented the wheel.
so
The point of the Yahoo! News article was not that Georgia Tech, like many other universities, is using cheat detectors, but that out of 1700 students last semester, it nailed 187. That's around 11% and seems to be news-worthy to me.
"Which is more musical: a truck passing by a factory or a truck passing by a music school?" - John Cage
Virginia Tech has had something similar, and maybe better, for some time. It was written by a graduate student (I believe), and it pays no attention to variable names; it focuses purely on the syntax of a program. Programs with a similar enough syntax are flagged and looked at by TAs.
I know because I went through the process (long story), and it's mentioned at the beginning of any coding-intensive class at the 1000 or 2000 level.
My college (University of Kent at Canterbury, UK) has been using this kind of software for years for CS code (mainly Java, but also Haskell, C, Occam and a few others). Isn't really news!
Cuz remember programmers: in the real world you are fired if you consult with a co-worker ;)
That reminds me of something that happened during a CS lecture last year...
Prof: All of my tests are open-book, because the real world is open-book.
Jesse: Can I use Google during a test?
Prof: No.
The shareholder is always right.
do u think they all got busted for copying stdio from each other? :)
Patrick Doyle
I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....
Our first assignment in BASIC progging in high school consisted of..
print "Hello World"
I should think there aren't THAT many permutations of that :-)
While I can understand that direct copying could be easily caught, does this thing go down to every exact detail?
Oh my, you used the same variable "name" here and so did Johnny. That's a bit ridiculous.
Granted there is more than one way to do everything, but how many are there, especially on the introductory level (ie, basic Hello World applications).
I'm all for students doing their own work. I just don't think innocents should be persecuted like this.
- Nothing is true, everything is permitted
I remember taking a class (I think in BASIC) at a community college. I already knew BASIC and had been programming in it for several years. (I was only taking the class for the easy credits.)
After a test was turned in, the teacher called me over. He showed me a code fragment submitted by another student. It was practically identical, even to the variable names. (Of course, in this old dialect of BASIC, variables were single letters.)
How did this happen? Outside this class, the other student and I were collaborating on an astrophysics simulator (also in BASIC) for another class. Today, our style of coding is called Extreme Programming. In the course of this we had tacitly developed common coding conventions and styles.
Even so, I was surprised how similar our independent output was.
Fortunately for me, the teacher was a friend of mine and he believed my explanation. Even so, I sensed some doubt on his part. Were he a relative stranger, things might have gotten messy.
I honed my programming skills in college by writing a 10,000 line perl script which transforms other students' work in ways not identifiable by automatic cheat detectors.
-mjjm
---
Nothing is more depressing than the sight of people who believe they are following collective manias of their own free will.
--Czeslaw Milosz
Any student not smart enough to do a global search & replace on the variable names, function names and comments in the code they have "borrowed" from another student deserves to be flunked! Plus, you can't really flunk a student for using the same logic as another student, can you? The only real defense against cheating is teachers/proctors that know what the hell they are doing. And yes, I was a computer lab assistant in college...
Big-Oh notation doesn't mean anything in terms of how difficult the problem is. For this, you should be using Big-Omega.
"Evil will always triumph over good, because good is dumb." - Dark Helmet (Spaceballs)
I used to work for two brothers that were coding (separately) modules for a project. Their code looked almost identical; variable names the same, data structures and algorithms almost identical...
These two both had the same training, and guess what...their code looked really similar.
Besides, how many different ways can you code "hello world" without failing for using inefficient/bad programming style?
-ted
Actually, until this past semester, the cheating detector was simply an urban legend here at GA Tech. It didn't exist, or at least not to the extent that the professors described it. Sure, some students were caught cheating in the intro CS class in the past, but the majority of those cheaters were turned in by a friend, roommate, or even themselves.
The professors have always described the cheat finder as a white-space-eliminating, pattern-matching, we-will-catch-you-every-time cheat detector. This went on for years, when in fact the real script was quite lame. It is my understanding that the previous script was a simple "catch em if they copy code exactly" script.
They finally deployed the legendary cheat finder once and for all at the end of last semester, and caught a significant number of students. Why did they catch so many students? An improved cheat finder can't be the only reason, and it isn't.
- Last semester was also the first time (that I know of) that the same CS course was required of all entering freshmen (yes, even management majors and such). What do you expect when you put a bunch of people in a class who don't give a hoot about programming (let alone people who don't know you can actually catch cheaters automatically)?
- Last semester was the first time an actual executable language was used for intro cs. Prior to that, GT was using pseudo code. I'd think the temptation to cheat is higher when you can readily see if your program works or not before you turn it in.
Was in a class where the instructor asked us to write a program to perform an ascii sort of a file (kind of like 'sort' actually). I specifically asked if we could use libraries, and he said yes. Of course most of the students were using Pascal...
You can probably guess what I did. My program featured the prominent use of "qsort()" out of the C library. Even though I had learned about callbacks with the thing, he really didn't like it. Made me go back and reimplement it so that there was an actual "sort" being performed in my code. Ug.
Now I'm a Principal Engineer.
That's NIH, or Not Invented Here, syndrome, and yes, it is a major problem in this industry, caused mainly by developers' perception that it's easier to write their own code than to understand somebody else's. In fact, it may be quicker to write your own, however, somebody else's code should already have been debugged, which usually takes longer than writing.
First of all, this is no innovation, and therefore doesn't qualify as news. I am a University professor, and we had more-sophisticated plagiarism detectors in use as far back as the middle 1980's at SUNY Potsdam, and I'm sure other colleges were doing it earlier.
Secondly, (as many posters have indicated), there is a difference between consultation and copying. I can't tell you how many CS-1 students I have had over the years who were mentally unable to either read or write code! In many cases, these were hard-working students who, while otherwise highly intelligent, simply did not have the correct wiring in their brains to be programmers. Driven to desperation (these students are entirely un-used to failing courses), they take someone else's code, and, with no understanding of what the code actually does, make cosmetic changes, and submit the program as their own.
Programs modified in this way can be matched with their original in several ways, most of which require no CPU assistance:
1) The program contains a weird error or unusual logic. When the grader sees this, a bell goes off in the brain, and then you look for the program where you first saw the same pattern.
2) Identifiers are slightly "off". For example, a variable that should be named "speed" is instead named "how_fast". When many identifiers are slightly misnamed, it is a good sign of use of global-search-and-replace.
3) (Used after plagiarism is suspected) The window test: put one listing on top of the other, looking at the "envelope" of the text to see if they essentially match.
4) A computer program can match programs using several techniques. It can count tokens, compute a moment of inertia on the text, or do a simple diff, among other techniques.
This fourth family of techniques works better than the others when you are mass-producing CS-1 students; the others work better when the professor does the actual grading him(her)self.
In any case, an accusation of plagiarism should not be made until a human being has personally inspected both listings. In my case, I don't assert plagiarism. I summon the individuals into my office and state, "These two programs look remarkably alike. Would you care to explain?" Usually, self-incrimination is the result.
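To make the token-counting idea concrete, here is a rough Python sketch. It is only an illustration under my own assumptions, not any particular school's tool: the keyword list, the function names, and the choice of cosine similarity between token counts are all made up for the example.

```python
import math
import re
from collections import Counter

# Identifiers, integers, and single punctuation characters of a C-like language.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|[^\s\w]")
# A deliberately short, assumed keyword list; a real tool would use the full language.
KEYWORDS = {"if", "else", "for", "while", "return", "int", "char", "switch", "case"}


def token_profile(source):
    """Count tokens, collapsing every non-keyword name to ID and every number to NUM."""
    counts = Counter()
    for tok in TOKEN.findall(source):
        if tok[0].isalpha() or tok[0] == "_":
            counts[tok if tok in KEYWORDS else "ID"] += 1
        elif tok[0].isdigit():
            counts["NUM"] += 1
        else:
            counts[tok] += 1
    return counts


def similarity(src_a, src_b):
    """Cosine similarity of the two token-count profiles, from 0.0 to 1.0."""
    a, b = token_profile(src_a), token_profile(src_b)
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0
```

Renaming "speed" to "how_fast" leaves the profile unchanged, so the score stays at 1.0. A high score is only a reason to pull both listings and look, since short programs score high against each other anyway.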
More-sophisticated copying probably cannot be easily caught. However, it is usually easier to write the assignment yourself than to take someone else's assignment and re-write it so that the copying won't be detected. In any event, if a student has the mental wherewithal to edit another program into something that both evades plagiarism-detection and also works, that student probably could successfully write it him(her)self anyway.
In my courses, I have just introduced a new plagiarism-deterrence policy: on exams, I will give a problem on an exam that is similar to one that had been assigned as a programming project. My syllabus specifies that if the student cannot solve a problem on an exam that (s)he has solved successfully on a project, that project will be assumed to have been plagiarized.
Wasn't this written yeeears ago?
?> diff jims/file1.cpp janes/file1.cpp
I do everything the voices in my head tell me to...
I'm a teaching assistant for Harvard University's intro CS course. We have our own custom made software that we've used successfully for many years, comparing every student's code to every piece of code ever handed in for that assignment. The program identifies code that is similar so people can look and see if people have cheated.
Very few people are allowed to use the program (I'm not), but everyone who uses it comes back absolutely astounded at how difficult it is to fool. I guess changing your variable names won't save you here...
The program run for my class, interestingly enough, apparently gets more people thrown out for cheating every semester than every other class at the college, put together. People, for some reason, don't take its existence seriously...year after year after year...
Your signatures belong to me.
The cheating community dealt universities a harsh blow today when they unveiled a Perl one-liner that thwarts all attempts to catch cheaters with their fancy-schmancy new program.
"And like that
The technology not only detects exact matches, but it compares memory allocations on two programs so that massive search and replacing, whitespace, and subroutine order does not matter in match detection. The professors frequently threaten students with the cheat detecting program, however in large classes (~60+) it would be quite unreasonable to compare each student to every other student, so they must use some scheme to pick who gets compared. Most recent submissions or random detections are possibilities.
I really don't see why this is a big deal.
Why do it to intro classes? All the assignments are so short that most programs will be alike. Why not do it only on longer and more drawn out assignments?
I bet this program would bring up some flags:
#include <stdio.h>
int main() {
printf("Hello World\n");
return 0;
}
-TC
This really isn't that new at all. I did some development on an autograding system at UCSD and incorporated a cheat detector using an age old system developed at Berkeley called MOSS. You simply tar up all your files and send them via mail to moss and they send email back to you with a link to an HTML page that gives percentages and all. Great system, uses program structure and logic to determine matches, not comments and variable names. Give Berkeley the credit. DWP
Cuz remember programmers: in the real world you are fired if you consult with a co-worker ;)
The point of not cheating is so that the person learns something. Then when you consult with your coworker they may actually be able to help you.
.. has been doing this, and doing it better, for years. I used it as a TA in Brown's CS department, and it helped us track down quite a few cheaters.
I am in Intro to Computing this semester at GaTech, and we had a 20 minute lecture yesterday about this very topic. They do not automatically kick you out of school if your homework or project gets kicked out by the program. In fact, my professor told us that he got 87 of these papers that he had to hand check over Christmas, all of which were found to be cheating. The program checks for similarity where there should be differences, and takes the purpose of the program into account, so that variable names can be changed and you'll still get caught if you are cheating. The program is very sophisticated; it is not merely the 'diff' command.
The reason that so many different people get caught is that they only review the cheating at the end of the semester, so it gives everybody who wants to cheat the opportunity before they are caught. EVERY student is told all of these details in lecture at the beginning of the semester, so it should not be a shock, but some people don't believe that it actually exists and don't even try to change things. Some people put their CS programs on the network, or leave them on a shared computer and other people steal them without even knowing the other person. The administration is generally pretty good about finding those who are guilty, and those who are merely ignorant. But as the article indicates, most people are just plain cheating.
I will just come out and say it how it is... GT sucks a fat one, all they are out to do is shaft us students
I remember there were some cheaters who got caught, and the way they were caught was that both of their assignments did the same thing. Too bad for them, the assignment did not call for that, and I don't know how they got that interpretation.
I wrote the original one in RPG II.
How low will Georgia Tech stoop?!
Give a man a fire, and he'll be warm for a day, but set him on fire, and he'll be warm for the rest of his life.
Purdue had a simple but effective cheater-detector back in 1987 or so when I was in grad school there. It ignored variable names and comments, and simply stripped a program down to a list of keywords and tokens (for C, things like if, switch, =, etc; there was a Fortran one too). Then it took the case-insensitive token list, and ran sum on it. Then it took the sums of all the student programs, sorted the list, and ran uniq -d on it. (This was more efficient than just running a massive diff.) Matches were considered potential cheating cases, and were examined by hand. Usually they were, and usually it was fairly obvious that they were. This method didn't care about renaming variables, or comments, or indentation, so those common methods of hiding cheating failed. It was effective enough to keep freshmen and sophomores honest, so it did its job.
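For the curious, the pipeline above can be sketched in a few lines of Python. This is a reconstruction from the description, not the Purdue code: a SHA-256 digest stands in for sum, a dictionary stands in for sort plus uniq -d, and the keyword list, regexes, and function names are my own assumptions.

```python
import hashlib
import re
from collections import defaultdict

# Assumed, abbreviated C keyword list.
KEYWORDS = {"if", "else", "for", "while", "do", "switch", "case", "break",
            "continue", "return", "int", "char", "float", "double", "void", "struct"}
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+(?:\.\d+)?|==|!=|<=|>=|&&|\|\||[^\s\w]")


def strip_comments(source):
    """Naive removal of /* */ and // comments (ignores comment markers inside strings)."""
    return re.sub(r"/\*.*?\*/|//[^\n]*", " ", source, flags=re.S)


def fingerprint(source):
    """Checksum of the keyword/operator stream; names, numbers and layout are dropped."""
    kept = [t.lower() for t in TOKEN.findall(strip_comments(source))
            if t.lower() in KEYWORDS or not (t[0].isalnum() or t[0] == "_")]
    return hashlib.sha256(" ".join(kept).encode()).hexdigest()


def duplicate_groups(submissions):
    """Map {student: source} to sorted lists of students sharing one fingerprint."""
    groups = defaultdict(list)
    for name, source in submissions.items():
        groups[fingerprint(source)].append(name)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)
```

Each program is hashed once, so a class of any size costs one pass plus a grouping step instead of a diff for every pair. Only exact token-stream matches are flagged, which is why adding a single statement defeats it.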
At least mafia-owned pizzarias make excellent pizza. Compare to Bill Gates.
I saw something like this at my school in the early 80's.
Not too sophisticated -- eat the whitespace and comments, change the variable names to a1, a2, a3, etc. Change the function names to f1, f2, etc. Diff.
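Something along those lines is easy to sketch in Python. The version below is my own guess at the recipe, not the original tool: it calls a name a function if it is followed by "(" the first time it appears, uses an abbreviated keyword list, and uses difflib's similarity ratio in place of a literal diff.

```python
import difflib
import re

# Assumed, abbreviated C keyword list.
KEYWORDS = {"if", "else", "for", "while", "do", "switch", "case", "break",
            "return", "int", "char", "float", "double", "void", "struct"}
# String literals stay whole; then names, integers, single punctuation characters.
TOKEN = re.compile(r'"(?:\\.|[^"\\])*"|[A-Za-z_]\w*|\d+|[^\s\w]')


def normalize(source):
    """Drop comments and whitespace; rename variables a1, a2, ... and functions f1, f2, ..."""
    source = re.sub(r"/\*.*?\*/|//[^\n]*", " ", source, flags=re.S)
    tokens = TOKEN.findall(source)
    names, out = {}, []
    n_vars = n_funcs = 0
    for i, tok in enumerate(tokens):
        if (tok[0].isalpha() or tok[0] == "_") and tok not in KEYWORDS:
            if tok not in names:
                # Heuristic: a name first seen right before "(" is treated as a function.
                if i + 1 < len(tokens) and tokens[i + 1] == "(":
                    n_funcs += 1
                    names[tok] = f"f{n_funcs}"
                else:
                    n_vars += 1
                    names[tok] = f"a{n_vars}"
            tok = names[tok]
        out.append(tok)
    return out


def similarity(src_a, src_b):
    """Ratio from 0.0 to 1.0 of matching normalized tokens between the two programs."""
    return difflib.SequenceMatcher(None, normalize(src_a), normalize(src_b)).ratio()
```

Two submissions that differ only in names, comments, and layout normalize to the same token list and score 1.0. Reordering functions changes the numbering, so this catches lazy copying and not much else.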
I don't know if it had a parse tree. (I probably didn't know what that was at the time.) Never heard of anyone being accused because of it.
If Chaos Theory has taught us anything, it's that we must kill all the butterflies.
Do professors get an F or fired for using the author's lectures, in powerpoint format, from the author's or publisher's web site as a crutch?
SIOT in NJ
It's their time and money being wasted. Let them cheat on homework. I always thought homework should be optional anyway. WE are paying the school. WE should tell them what to do. But then I think testing should be done by independent 3rd parties too, not the schools themselves. Even within the same school Professor Y can have much easier homework and tests than Professor X, and between schools the difference can be just as huge. A degree earned by one person isn't equiv to a degree earned by another. I know some CS grads that kick ass and a lot of others who shouldn't be hired to count beans.
At what price learning? At what cost wisdom? The price is a man's peace of mind, and the cost is his life.
Then they would have known he cheated. Hopefully they used it on Chan Gailey's ;-)
Pardon me if I'm too drunk to be posting, but...
That looks like four words to me.
As a TA for a similar course, I've been very suspicious of certain assignments. The problem is, it's very difficult and time consuming to prove beyond a doubt that a student is cheating. The only time you really can nail them is if the other person's name has been left on the assignment or they're exactly identical (it's actually happened). Otherwise they get off scot free, though they are usually more cautious.
This does bother me, but I can take solace in the fact that those students who cheat usually end up doing poorly on midterms and finals and ultimately flunk the course in any case.
The majority of people (at least that I have heard of) caught cheating were NOT computer science majors, just computer science students. At Georgia Tech, all students in all majors are required to take intro to CS, and some majors are required to take CS2.
Every university CS department runs these cheat detector programs, and every semester several people get caught and punished. They never publicize it though.
-
Cuz remember programmers: in the real world you are fired if you consult with a co-worker
This casual quip misses the point. One does not produce identical answers to nontrivial assignments merely by "consulting." The most common cause of identical duplication typically derives from non-collegial efforts, such as "frat house sharing" (not limited to fraternities) and other modes of mass-distribution of a single individual's work.
This casual quip further misses the point. Each student who cheats in this manner, however collegially, typically has violated an express oath and promise taken at some point. YMMV as to specific honor codes. It is one thing to break the rules because one thinks one knows better -- it is another to lie about it.
The purpose of these exercises is to train each individual so they can later be part of a team. In my view, pair programming should also be taught as an inherent and principal development skill -- but a particular mode of pedagogy is the discretion of the professor.
I must confess that one of the most difficult moments of my young adult life was serving as a teaching assistant for an introductory programming course at Cornell (hundreds and hundreds of students). A massive amount of cheating was discovered, and we ultimately sorted and compared all assignments and then called in the groups, one at a time, giving them an opportunity to account for the similarities. It was horrible, for me, the professor and the students alike.
People cheating just because they (thought they) could. It was a very sad thing.
So, most programmers actually use wizards all the time. The key thing is to know when to use them.
There are reasons why democracy does not work nearly as well as capitalism.
-- David D. Friedman
You can't compare the real-world with academia. In college, you have to do your own work because that's how you're graded. In the real-world, consulting with a co-worker is acceptable because the job has to get done, and management doesn't give a damn about who gets it done and whether or not that person had help as long as it gets done. It's probably past time that our institutions of higher learning change the way they evaluate student performance so as to take into consideration interpersonal skills. Then again what do I know?
This problem just screams Scheme. Throw the code through a parser function that puts the code into a syntax tree, then call a helper function that counts the number of similar tree segments (using recursion to count from certain subtree segments).
The parser could store the variables in lexical addressing so as not to get fooled by "Find...Replace-all."
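A rough sketch of the subtree-counting idea, with two loud caveats: it is in Python rather than Scheme, and it analyzes Python source using the standard ast module. It also takes a shortcut on names: instead of lexical addressing, every signature is built from node types only, so identifiers and constants are ignored outright. The function names subtree_shapes and similarity are hypothetical.

```python
import ast
from collections import Counter


def subtree_shapes(source):
    """Count a structural signature for every subtree of the parsed program."""
    shapes = Counter()

    def shape(node):
        # Signature = node type plus the signatures of its children, recursively.
        children = [shape(child) for child in ast.iter_child_nodes(node)]
        signature = type(node).__name__ + "(" + ",".join(children) + ")"
        shapes[signature] += 1
        return signature

    shape(ast.parse(source))
    return shapes


def similarity(source_a, source_b):
    """Fraction of subtrees shared by both programs, relative to the larger one."""
    shapes_a, shapes_b = subtree_shapes(source_a), subtree_shapes(source_b)
    shared = sum((shapes_a & shapes_b).values())  # multiset intersection
    total = max(sum(shapes_a.values()), sum(shapes_b.values()))
    return shared / total if total else 1.0
```

Because names never enter the signature, "Find...Replace-all" changes nothing. The price is that two genuinely different programs with the same shape also score 1.0, so a human still has to look.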
The profs have long had a similar cheat detection tool, although I've heard it's somewhat more sophisticated in that it even checks code for being the same with different variable names and that sort of trick. Supposedly it catches code blocks that are arranged in a different order too.
;)
Then again, I'm not on the inside, so a lot of that could just be the professors trying to scare us students into not cheating....
As a side note, though, in the courses there, a lot of the time, sharing code with each other was allowed as long as it was cited.
If work is being copied then there's at least one innocent victim here.
Plagiarism happens all the time in these classes and the hardest working & best student is the one who gets copied the most.
I know a 1st class honors graduate who got his degree because he used to take the second class of a set of two. At the start of each class he'd pull the old printouts from the waste paper bin and use that as his starting point. He'd have a complete solution at the start of the exercise which he could then refine.
I'd like to see guys like him get caught, but what about the people he copies from? Unless the professors are going to take the time to investigate this there will be innocent victims. Tenured professors can be heartless and lazy and the administrators in academic bureaucracies would put Rudolf Hess to shame.
If this continues then we need regulations to protect students.
Basically, all these posts about groupthink skills being a necessary component of the CS curriculum are uninformed. As we went through the curriculum, eventually almost every class at Georgia Tech (at that time) became group oriented.
The classes where "cheatfinder" was used were frosh classes DESIGNED to measure individual capacity. Anyhow, I was the TA assigned to run the stupid thing week after week on hordes of Pascal programs, and later Java programs. So I want to clear up a thing or two:
The cheatfinder program only flagged similar programs so that they could be visually inspected by a person (me). Nobody ever got nailed b/c the cheatfinder ratio was 100.00% correlation.
What I would do was set a ratio (like 99% correlation), run the thing, and if there were no flags, lower the ratio (to 95%), until I got about 15 or so pairs or triplets of programs. Then I dumped those to the printer and looked them over. VERY OFTEN it was just two or three very good programmers who had the same (best) way of solving the problem.
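The lower-the-ratio-until-there-is-a-pile-to-read routine looks roughly like this in Python. This is a sketch of the procedure as described, not cheatfinder's code: the pair scores are assumed to already exist as percentages, and the step size, floor and the name flag_for_review are invented for illustration.

```python
def flag_for_review(scores, target=15, start=99, step=4, floor=50):
    """Lower the threshold until about `target` pairs are flagged for human review.

    scores: dict mapping (name_a, name_b) -> similarity percentage (0-100).
    Returns (threshold_used, sorted list of flagged pairs).
    """
    threshold = start
    while True:
        flagged = sorted(pair for pair, pct in scores.items() if pct >= threshold)
        # Stop once there is enough to read, or the floor has been reached.
        if len(flagged) >= target or threshold <= floor:
            return threshold, flagged
        threshold = max(floor, threshold - step)
```

With the defaults the thresholds tried are 99, 95, 91 and so on. Everything returned still goes to the printer for a person to judge.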
Then there were the ones who got busted. You know what killed them? The comments! It was hilarious how these guys would go to all kinds of lengths to make the program "look" different: change variable names, move functions around, and so on, (and cheatfinder was never fooled by such tactics), but they would leave the original owner's comments (often with the same spelling errors). To me, it showed that they thought the cheatfinder only cared about code (TRUE) but they forgot that there would be a human being looking at the code.
There were some sad scenes. One guy was an Industrial Engineering major fulfilling his CS requirement. He had a job with Andersen Consulting lined up, was set to graduate that semester, the whole family including grandma had bought plane tickets to Atlanta to see him graduate, etc. I believed him, but I still nailed him.
M.
I don't see why every university doesn't just implement this, instead of trying to write elaborate AI anti-cheating scripts....
>|<*:=
"a program which compares students' coding assignments to each other and detects exact matches" There must be some really sharp coders at GT. Maybe a marketing major will come up with a clever name for their program (how about diff?)
Many years ago I took a Cobol class that I hated. It also coincided with the peak of my slacker years at college. The assignments were put into a manila envelope and slid under the prof's door. One week a friend and I went to turn in a very incomplete project. And lo and behold, a bit of manila sticking out from under the door. It turned into a game after that; we even got one of those flexible grabby things. How much can we change the code without changing the logic?
People in my high school C++ class change stuff around like couts and function names when they copy programs.
If you don't understand any of my sayings, come to me in private and I shall take you in my German mouth.
When I was a TA at a prestigious private university that had recently decided to change its admissions policy to take into account the parents' ability to pay, we were told that any cases of suspected cheating would need to be referred to the prof. We were told that the incoming class would have certain students that might not be as well prepared as the others--but certainly better connected.
I suggested in the meeting where I was introduced to this unique view of academic integrity, "So why don't we just grade on the basis of how much the kids' parents make?" I was sooo popular with the faculty after that. Not.
Sometimes life really sucks.
As a University of Georgia grad and a programmer, it doesn't surprise me one bit that students at the North Avenue Trade School would have to cheat.
"Laugh and the world laughs with you. Cry and I'll give you somethin' to cry about!"
You'd know all he did was copy buzzwords....
I go to RIT and they use lexical analysis programs to analyze work submitted by each student.
Your work is tokenized, functions are reordered, variable names are reduced to enumerations, etc., then compared.
If the code has a high enough similarity to another person's code, it is hand checked. Pretty hard to get away with cheating, unless you are capable of changing it enough (which usually implies you could have done it by yourself anyways).
Speaking as a professor, this story seems off; maybe the student left out some details. There are two sets of cross allegations: one of the prof's lame teaching, the other of the student's academic dishonesty. The prof's lame teaching may have gotten him/her into some trouble, or there may be other unreported circumstances. However, let's consider the academic dishonesty. What if you were beat out of an A by some students who cheated, and wound up getting an inferior grade? Is that fair?
If the story isn't completely bogus (which it might be) the only wrong behaviors of the Professor could be:
In any case, if this guy wasn't allowed to collaborate, he shouldn't have done it. Most faculty will sometimes give a hard problem and NO student can solve it, mainly to see just how good the best students are. The direction of copying does NOT matter and it is not the faculty's responsibility to figure it out. If there is a ring of many cheaters, they could all get off the hook by claiming to be the originator if that were allowed.
However, I hope that I'm never in a position where, if I catch a bunch of students in this situation (and I have a record of applying the F penalty), some cowed administrator reinstates the grade of a student who cheated (or tries to force me to do it). Cheating sucks the life out of good students and drags everything to the lowest common denominator.
As a professor, my best students need to benefit from a good recommendation. Cheating degrades this by attempting to subvert the measurement process. And the measures may not be completely fair, but they must be consistent.
Worse still, your company could be forced out of business by litigation and not only you but ALL YOUR COWORKERS could be found on PUD's hall of shame. Just look at what happened at Enron and Arthur Andersen: not every employee in the company needs to be a wrongdoer to be taken down by a serious ethics violation. Hell, at your next job you might not just work for the dept. of corrections (as prison labor), you might be a client.
I'm a CS student at Purdue University, and they put all of our source code into their security software to check for those naughty cheaters. They caught ten people who had "colluded" on the same project last semester. The final project for a graduate level course is to defeat that software.
Electron Pulse...indie rock/jazz/blues
When I took this course at Ga Tech back in 1995, the intro to computing course was taught in a pseudo language similar to Pascal. Since "code" submitted never had to compile it was very unlikely two students would have the exact same code without cut and paste copying. They also told us the cheat program would be used to analyze our assignments.
Years ago, I was accused of cheating in one of these classes and I must say that it was an extremely traumatic experience. My only intention was to learn and to help others to learn. Eventually the charges were dropped, but for the rest of my career at Ga Tech, I was reluctant to work with others. On the surface, this seems like an admirable effort to weed out academic dishonesty, but for me at least, Kurt's holy crusade resulted in destroying any sense of community I felt at the school.
I'd just like to say to any Ga Tech students who will be taking these classes... If you want to teach others, wait until after the class is over and become a TA. As a student, you'll be far better off working alone, otherwise you may go down the same path I did. You don't want to deal with Kurt. He is as unfeeling and sarcastic as they come. He will have no sympathy for you, no matter what your intentions.
..Unless they've fixed it since the last time I've used it (Circa two years ago).
After hearing the big bad 'we know if you copy each other's code or not!' talk, a friend and I submitted the exact same code to it. Nothing. Not even a glare from our professor.
I miss the coffee machines in the CS building.
As a student at tech I've known many friends over the years that have gotten I's (incompletes) for coming up on cheatfinder. They have told me that they were just required to explain their code and what it does. Like if you can explain the code and understand it then they can't prove that you cheated, however if you can't then how can you explain yourself writing the code in the first place. The problem with this is when they quiz you on a program that you wrote months ago of which you have already forgotten the concepts used.
These are called "Undergraduate TAs" these days ;)
yours,
kbs
I don't think cheaters should be caught and here's why. Cheaters won't learn and in the real world will not succeed as developers. So who cares if they pass a few courses along the way. I knew a cheater in school who later cheated at work and was fired.
But instituting a cheat-detector, especially as lamely as some have indicated, will force students to find "creative" ways to make their code different from their classmates. Not better, but different. These people will be educated enough to be successfully employed. But their code will be fraught with illogic and inelegance because of the ingrained "must be different" attitude. Coding standards will be ignored completely. People will begin to submit production code to the IOCCC. And software quality as a discipline will take a huge step backwards.
Anybody want a peanut?
The University of Washington's had this kind of software for about 5-6 years for their intro C/C++ courses. The programs they use check for cheating in all sections of the class by checking for similar design flaws, variable names, formatting, etc. Of course, it's all been set up so using 'i' as an index doesn't set off an alarm. It detects things like a group of students all using a variable called DataSetArrayofIntsOrWhatever. Even when students know about the checking, 2-3 people are typically caught cheating every other quarter. The penalty varies depending upon the professor. University policy says you're expelled, though some professors will deal with it personally and, depending on the nature of it, give you an F for the assignment or an F for the course.
DETAILS MAN, DETAILS! You left them out, I love this kind of story, so fill in the details. I, for one, would like to read them.
Linux is so bad it's free and most people don't use it. But you have the source code, so it's your fault.
I had a friend in college that got charged with plagiarism for turning in the same term paper in two different classes. It was relevant to both assignments and fulfilled each, but the professors saw it the other way. Dumb. Self-plagiarism.
Back in school I was so poor I would write other people's coding assignments for money. I made what's equivalent to a few thousand dollars doing this. The trick, of course, was to do things differently for every single person. This meant coding it in different ways which couldn't be caught.
Needless to say I was in "high demand" by several groups. Advertising was purely through word of mouth, and I never got caught.
Do I regret it? Absolutely not. Those same students now can't code their way out of a paper bag, and I got to put myself through school.
Well, when you look at it, it's only 187 students out of about 1600 that were taking the classes during the semester, and it's mostly freshmen. Plus there is a project due almost every day in those classes, and just the work load that's assigned makes them 2 of the hardest classes on campus. Hell, when I took it we had projects due on the weekends. Talk about being a waste of a Friday night. Well, all that's left to say is that freshmen do the stupidest things, including not renaming the variables or constants. But not to worry about the stupid freshmen, since GT suspends on the first honor code violation so that the cheaters can learn how to cheat AND not get caught.
.j.
...this wouldn't bother me anyways. Even if I *did* copy some other person's code, I'm *really* anal about having my code look like I want it to. I'll even make sure that it's tabs instead of spaces.
As for the cheating part, when I took a CS class in high school I had an algebra major for a teacher. The guy knew nothing about network administration, yet ran the computer lab. We had lots of fun playing jokes on him, and the final exam was a total fiasco.
He gave us worksheets which detailed the project. I was at the time, under his orders to use my laptop for all projects. I typed up all 3 versions of the final exam, using three different programming 'styles'. I printed them out, and put them on disk, and handed them out to half of the class. I typed up mine not using any of my previous notes, because it was such a simple program anyways. The guy never figured it out.
Model 551, Chambered in 6mm
I am a CS teacher (hum, preparing a PhD actually), and let me tell you that this kind of tool isn't necessary (-:
When you read/correct code, cheaters are usually the clueless, panicked students; hence, they copy whatever they can, including the stupidest mistakes. Tracking them is rather easy, really.
But this is for "paper work"; note that we never ask students to perform computer assignments alone, we just want to know who they worked with.
Singularity stupid: stupid gotten so dense that no intellect can escape
It is time for some really smart young student who is obviously wasting his time in one of these required intro classes to write a pair of programs. One program takes a program as input and attempts to perform as many permutations as possible which change the source code to be unique without affecting its function, i.e. rename variables, change white space at random, change formatting and layout, add comments at random, etc.
The second program again takes a program as input, but this program attempts to transform its input such that all inputs that perform the same function will look the same in source also,
i.e. strip all comments, use a mechanical formatting system, break all compound statements into their most simple forms, introduce temporaries, do dependency analysis to reorder statements in a mechanical order, etc.
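A toy version of the second program is not hard to sketch for Python source, using the standard ast module (ast.unparse needs Python 3.9 or later). It covers only the easy steps: comments vanish in parsing, formatting is regenerated mechanically, and names are replaced in first-seen order. Breaking up compound statements, introducing temporaries and dependency-based reordering are not attempted. The names _Renamer and canonical_source are hypothetical.

```python
import ast


class _Renamer(ast.NodeTransformer):
    """Replace every function, argument and variable name with v0, v1, ... in first-seen order."""

    def __init__(self):
        self.names = {}

    def _canon(self, name):
        if name not in self.names:
            self.names[name] = "v%d" % len(self.names)
        return self.names[name]

    def visit_FunctionDef(self, node):
        node.name = self._canon(node.name)
        self.generic_visit(node)  # then arguments and body
        return node

    def visit_arg(self, node):
        node.arg = self._canon(node.arg)
        return node

    def visit_Name(self, node):
        node.id = self._canon(node.id)
        return node


def canonical_source(source):
    """Return the program re-printed with mechanical layout and canonical names."""
    tree = _Renamer().visit(ast.parse(source))
    return ast.unparse(tree)
```

Note that builtins such as print get renamed too, so the output is meant for comparing, not for running.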
It's time to fight back!
I never cheated in my CS classes. In fact when I took N602 Pascal at Drexel U in the late 80s I used to sleep in the front row of the lectures except for the ones where they handed out assignments. On these days I would pick up the assignment at the beginning of class and go back to the dorm and code up a solution. I usually had it done by the time the other kids came back from class, at which point I distributed it to anyone who asked and explained it to anyone who asked.
While I was working at Leeds University we had similar programs, but it wasn't exact matches we were searching for. It's far harder to detect plagiarism than that. You have to tokenize the work to get rid of variable names (although you can take that into account later) get rid of all the formatting, try shuffling it all round a bit, and then seeing if it matches.
Most of the people we caught cheating had at least made a cheap attempt to hide their wrongdoing.
How does this make a story?
jh
<InMyDay>
I think that I can now safely admit that I did cheat once at U, in a hellishly tricky lab assignment that took several days work stretched out over a couple of weeks. Even students who did everything right tended to end up short of time to do all the data reduction and write it up (this was at a time when programmable calculators were still in the luxury goods bracket as far as the average student was concerned). Anyway, two of us on that assignment didn't do everything right, and ended up the day before the result was due with abysmal data.
So we used the data we had as a guide, and "adjusted" it - aka invented readings out of thin air. It took until the early hours of the morning to get it all done, so we were pretty whacked out when the time arrived to go over our report with the supervisor. To cut a long story short, we got away with it and even got a reasonable grade. We'd taken care that the invented data was still bad, but that it was consistent with a plausible reason - in fact, the reason that the data we'd actually measured was so bad in the first place.
Would've been easier to just take the bad grade, in retrospect, but a lot of marks were hanging on that assignment.
</InMyDay>
His post was funnier than the other guy's post.
This isn't new stuff. I wrote this code in 1983. Students taking Purdue's EE 263 class wrote programs in Pascal and Fortran. My program made a pattern of their programs then compared for a percentage. The pattern method would instantly find two programs that were no different except for recommented or variable names changed. Seem to remember catching quite a few cheaters.
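The original code is long gone, so the following is only a Python sketch of what "make a pattern, then compare for a percentage" could look like. The Pascal-ish keyword list, the ID/NUM placeholders, and the use of difflib.SequenceMatcher for the percentage are all assumptions, as are the names pattern and percent_match.

```python
import difflib
import re

# Assumed, partial Pascal keyword list (Pascal is case-insensitive).
KEYWORDS = {"program", "begin", "end", "var", "if", "then", "else",
            "for", "to", "do", "while", "integer"}

# Pascal comments: { ... } and (* ... *)
COMMENT_RE = re.compile(r"\{.*?\}|\(\*.*?\*\)", re.DOTALL)
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|:=|[^\w\s]")


def pattern(source):
    """Reduce a program to keywords, punctuation, and ID/NUM placeholders."""
    tokens = []
    for tok in TOKEN_RE.findall(COMMENT_RE.sub(" ", source)):
        if tok.lower() in KEYWORDS:
            tokens.append(tok.lower())
        elif tok[0].isalpha() or tok[0] == "_":
            tokens.append("ID")   # any user-chosen name
        elif tok[0].isdigit():
            tokens.append("NUM")  # any numeric literal
        else:
            tokens.append(tok)
    return tokens


def percent_match(source_a, source_b):
    """Similarity of the two patterns as a percentage (100.0 means identical patterns)."""
    matcher = difflib.SequenceMatcher(None, pattern(source_a), pattern(source_b),
                                      autojunk=False)
    return 100.0 * matcher.ratio()
```

Two programs that differ only in comments or variable names come out at 100.0, which is the instant-match case described above.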
This isn't very new news; RIT has had this plan for years now. My friends at other schools also have similar programs. What is really interesting is that our school supposedly keeps a database of all programs ever written, so not only does it compare the current class, but all past classes too.
I go to the University of Maryland now and for 3 years now they have been warning us that they use a program like this to check our code. It has even caught some people. I only know of one person who has been falsely caught by this and that was because the code it matched was stolen from him by his roommate and handed out to others. He eventually got it overturned and stayed in school.
......right after Apple's GUI division got fired(remember Xerox PARC?)
I'm not positive, but I think the usual penalty for cheating was a zero on the assignment/exam or an F in the course.
I taught a junior level CS course in the past, and used Alex Aiken's MOSS (developed in the mid 90s - this is a new version of an old story) to detect cheaters. Found quite a few. Every accusation was based on my inspection of the similarities; I treated MOSS as an aid in helping me spot those similarities but not as proof that the students were cheating.
Tools like MOSS will help you find a certain type of cheater (those who neither grasp the problem nor have the ability to code the solution) but not others (those who can code, but do not understand the problem to be solved). The latter group can look at working code (eureka!) and then code their own solution. The amount of work required to take existing code and make it appear different to MOSS was non-trivial; renaming variables, reordering code, moving code from one function to another - these were all things that were caught. My belief is that for someone to fool MOSS, he would have to be well versed in the language, and would have been able to finish the assignment without resorting to cheating.
As for making the accusations - I consulted with other instructors and faculty and found varying opinions. Some felt it was their duty to report all cheaters; others felt it was too much work and that nothing would happen. There was some consensus, though. I had to be absolutely certain that the students cheated; if I had any doubt I should not make an accusation. Furthermore, I had to be able to prove to others that the student cheated. "Others" refers to a college or university level committee that handles appeals; this group would have members who would not have CS backgrounds. Since my only proof consisted of the similarities in the students' programs, it had to be good enough to convince such a committee.
Even so, I heard of appeals being successful. A student had clearly cheated, and was given an F for that course. This would have dropped the student's GPA below a certain average, which would have meant he lost his scholarship, which in turn would mean he'd have to drop out of school. The committee weighed this heavily and overturned the instructor's ruling.
If you ever wondered why those guys who cheated got a better grade than you did - well, maybe there were other things going on that you were not aware of.
What are you going to do, run to your Mommy?
Fucking candy-ass baby.
You don't think the really good ones can spot the fakes quick?
Berkeley has a much more sophisticated project that does far more than find exact matches. Most Computer Science departments know about it, and many classes use it to detect cheaters:
http://www.cs.berkeley.edu/~aiken/moss.html
I am a junior/3rd year student at Georgia Tech. I took both of those classes back in my freshman year. Cheat finder existed back in those days as well, but it seems they have changed their algorithms because far fewer people were labeled cheaters in the past.
One problem I have is that their definition of cheating not only includes copying the work of others but merely discussing the projects in any way, shape, or form with a fellow student. At least that is what I was told specifically by a TA back when I took the class. Copying work is most certainly cheating. However, in college other students are great tools to learn from and there is no reason why discussing strategies for solving a problem shouldn't be allowed. I believe we like to call that cooperative learning? I remember reading some studies about how well that worked in a psych class I took at Tech....
The other problem is that the simple nature of the problems could cause many students to use similar methods that could be picked up by cheat finder. I happen to know a freshman who struggled greatly in the class and came to me for help on many occasions. This particular student never discussed the projects with any other people in the class, and never shared code, but was picked up for cheating because there were trends in his code that matched other students. I realize that by the strict and stupid nature of their definition of cheating he couldn't even ask me about solving some of his problems (I didn't write any code for him, just gave him some ideas how to do certain things since he had never coded before) but the fact is he now knows the material and deserves to get the B his grades earned him and not an F for all the time and effort he put into learning it.
The CS department needs to focus more on teaching their students rather than catching them for cheating. Maybe if they focused their effort on creating a class that better taught the material, fewer students would feel the need to use the students who already know the material as a crutch to make it through. Especially in a class required by all majors.
I'm a freshman student here at Georgia Tech and I took the 1321 computer science class in which about 150 students were caught for cheating. Some of my friends were "caught" and are now having to face appeal courts. The last week of school, right before finals, is called dead week. It's called that for a reason: no projects or tests are to be given out to students because they are supposed to be studying for their tests. Well, our soul-less profs decided to give us the largest homework assignment of the year in that week. This was the first semester that our professors used the book we had and also it was the first semester that gatech used the program DrScheme for its course. So naturally the professors hadn't taught all the material to us in time and they crammed the rest of it in the last two weeks. And so our last hw assignment was HUGE. People were stressing out over finals and also over this assignment, and so the majority of the 150 people that were caught for cheating were caught on this assignment. And for the most part, it was for sharing code on the hardest problem. I guess people wanted to help their friends when it got down to crunch time and they are paying the consequences for sharing their code with a friend who was too stupid to re-arrange and rename some of the code.
The University of Wollongong in Australia had similar software based on analysis of the solution.
But their favourite method was to scale your assignment marks if you failed the exam. Since most exams had to be more than 60% of your marks, if you failed you were in deep water.
That's a good one, wish I had some mod points!
~ now you know
It makes you wonder what educators are thinking. That's just a plain ignorant way of looking at things. A year ago I was working on a project using C to make a real time video game based on physical movement measured through digital cameras. If you told me to reproduce it right now, there's no way in hell I could. Professors are so quick to lose sight of the fact that people who haven't been doing the same thing for 30 years have a hard time remembering every little detail of it on the tests.
~ now you know
I once worked for a company which makes computers named after fruit, and a "well respected" engineer on staff took all of our design and test plans and wrote a book, for which this company paid him handsomely. He took credit for every line of it, but me and a friend counted only 15 lines of original text in a book of 280 pages.
The only good thing about it was the fact that one of the engineers was about to get fired so he wrote a bunch of bogus stuff, which was published right along with the rest...
Fast machines, powerful AI, impulsive invention,... All I lack is a good espresso machine!
Back when I was a student, I took a Data Structures course where the professor would do automatic grading. We'd simply email the code files to his account before a certain time and the program would run and generate output. There would be 9 tests that the program would run to make sure it was getting the correct output. You would always get 10 points for handing in something, so if you even bothered to send an email you got a 10. Now if your program passed a test, another 10 points; all 9 tests and 100%. When the assignment was given out, we'd be given 6 of the 9 tests that were going to be run. During the semester I was also taking other time intensive courses so I did not have time to sit down and code some of the projects correctly, so I would analyze the inputs, find the differences and have my program act as a parser finding similarities and printing out prefabricated output files for the 6 tests we had. This would ensure me a 70%, and a B+ for the class (I also scored very highly on all the exams so I had an A in the course). The instructor leading my course was an assistant professor who was looking to make a name for himself. Turns out there were quite a few people doing this as well. So he made an example out of me, saying how I was cheating and instructed other students to do so. Instead of accusing me of the cheating scheme I described above, he accused me of plagiarism. This turned into my savior since I could easily claim that my code was unique from all others. Hell, some of the other scams would find the output files the professor was comparing to and copy them as their own output files so the program would always pass the test. Now I work for Microsoft ;)
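For the record, the grading arithmetic above works out as follows. This is just a Python restatement of the rules as described; autograder_score is a made-up name, not anything from the professor's grader.

```python
def autograder_score(tests_passed, total_tests=9, base=10, per_test=10):
    """10 points for submitting anything, plus 10 points per passed test."""
    if not 0 <= tests_passed <= total_tests:
        raise ValueError("tests_passed must be between 0 and total_tests")
    return base + per_test * tests_passed
```

Passing only the 6 published tests gives 10 + 6 * 10 = 70, and all 9 give 100, which is why canned output for the known tests was worth a guaranteed 70%.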
I recently took a quiz in a Digital Logic class of about 15 students. We all made exactly the same mistake. Turns out that the prof didn't fully explain the methodology for finding the solution and we all went down the wrong path. Guess we all should be brought up on cheating charges.
I also took celestial mechanics as an undergrad. We had 4 students. 1 A and 3 C's. You think the prof thought we were cheating? No, there was one smart person and the rest of us were average. The prof was not interested in raising us to the smart person's level. If we had studied together and cooperated we probably would have done better.
This originality thing can go too far. It results in committees with no subject matter experts on them. "The experts have screwed this up so why get them involved."
OK. For all problems that have a discrete set of answers there is one solution or group of solutions that is/are the best solution(s). Why not learn the best solution? If a group of students turn in the best solution then their programs will look very similar. You can't assume they were cheating unless you have corroborating evidence. Well, maybe the academics are afraid we'll stifle the next Einstein (who by the way probably got a lot of good ideas for things needing improvement from working in the patent office. Did he cheat then?)
If the prof is bad, and the book is hopeless you have to look elsewhere for help. First I look for another book. Is it cheating if I find a way to solve the problem in another book? Second I check on-line. Third I'll ask the TA. So if the TA tells everyone the same way to solve the problem we all get hauled in for cheating. Real learning is about methodologies and learning which to apply to the problem. I wish the teachers and books would outline the method prior to solving the problem.
Supra et Ultra