NoOneInParticular · Slashdot Mirror

Re:Hrmm on Student Fights University Over Plagiarism-Detector · 2004-01-18 06:20 · Score: 1

Fundamental misunderstanding: the student is not signing over his copyright; he is merely asked (informally) to grant the right to store his paper so that it can be checked for plagiarism. This particular student in the article does not want to grant this right. The most likely outcome is that the informal transfer will become a formal one, and yet another lawyer can make a living.

Re:Hrmm on Student Fights University Over Plagiarism-Detector · 2004-01-17 22:18 · Score: 1

The fun thing about being a teacher is that *you* make the rules. So you can demand Word, LaTeX, RTF and/or plain text. I usually demanded either PostScript files for papers created in LaTeX or HTML files created from Word. It's fairly easy to get the plain text out of these. If a paper didn't come in one of these formats, it went back.
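Getting plain text out of the HTML case takes only the Python standard library. The following is a minimal sketch, assuming the paper arrives as a single HTML string; it uses `html.parser`, drops `<script>` and `<style>` contents, and collapses whitespace. The name `html_to_text` is illustrative only, and PostScript input would need a separate converter.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text content of an HTML document, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; becomes &
        self.parts = []
        self._skip = 0  # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse every run of whitespace (including newlines) into a single space.
    return " ".join(" ".join(parser.parts).split())
```

For a Word document saved as HTML, `html_to_text(open("paper.html", encoding="utf-8").read())` would give a single line of text ready for the phrase checks.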

Re:Well how can they safeguard against this? on Student Fights University Over Plagiarism-Detector · 2004-01-17 22:09 · Score: 1

Of course it would say that a quoted block can be found somewhere else. Note the choice of words: all this does is check for probable sources of segments. It says nothing about plagiarism. If you get positives using this trick, you'd better check them and see whether they are indeed plagiarism, random hits (very unlikely), or properly quoted pieces of text.
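Checking whether a hit is a properly quoted passage is easy to do by hand, but a small helper can pre-sort the positives. A minimal sketch, assuming quotations are marked with straight or curly double quotes (block quotes set off only by indentation would be missed); `is_quoted` is a hypothetical helper, not part of the original script.

```python
import re

# Straight or typographic double quotes: a deliberately simple assumption.
QUOTED = re.compile(r'["\u201c]([^"\u201d]*)["\u201d]')


def normalize(text):
    """Lower-case and collapse whitespace so line breaks don't prevent a match."""
    return " ".join(text.lower().split())


def is_quoted(paper, passage):
    """Return True if `passage` occurs inside a double-quoted span of `paper`."""
    target = normalize(passage)
    return any(target in normalize(q) for q in QUOTED.findall(paper))
```

A hit for which `is_quoted` returns True still deserves a look at the citation, but it is unlikely to be copying.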

Re:Turnitin@home on Student Fights University Over Plagiarism-Detector · 2004-01-17 11:00 · Score: 1

My guess is it happens more often than you think

All other points in your message can be debunked just as easily. No time for that now, time for bed.

Re:Turnitin@home on Student Fights University Over Plagiarism-Detector · 2004-01-17 10:56 · Score: 1

You've missed the point completely. What do you expect this script to do? Flunk students and drag them in front of a fraud board without human intervention? No, it just creates a small report listing all links where potential originals are kept, including the 'offending' passages. Just check these links and form your own opinion.
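A report like the one described could be produced along these lines. This is a sketch, assuming the search step has already produced a list of (url, passage) pairs; the `build_report` name and the output layout are illustrative. It only groups and sorts, leaving the judgement to the teacher.

```python
from collections import defaultdict


def build_report(hits):
    """Group hits by URL and format a plain-text report for a human to review.

    `hits` is a list of (url, passage) pairs, one per query that matched.
    Nothing is decided automatically; the report only lists what to check.
    """
    by_url = defaultdict(list)
    for url, passage in hits:
        by_url[url].append(passage)
    lines = []
    # Most-hit URLs first, since several disjoint matches are the strongest signal.
    for url, passages in sorted(by_url.items(), key=lambda kv: (-len(kv[1]), kv[0])):
        lines.append(f"{url} ({len(passages)} matching passage(s))")
        for p in passages:
            lines.append(f'    "{p}"')
    return "\n".join(lines)
```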

And do you seriously think a student will keep a paper on a web page long enough for Google to index it before it gets handed in?

Why 10 words? Simple: that's the maximum length Google accepts for a single query. And while you may guess that the same sequence of 10 words turns up more often than you think, my experience suggests otherwise. I've tried the script on many texts (including papers of my own), and although a hit for a single 10-word sequence does very occasionally occur, it has never happened that two hits on the same webpage for disjoint sequences failed to point to a few paragraphs of identical text. Granted, this can be a literal citation, but I never said you should automate your response based on this. Just take your own message apart, feed it through Google 10 words at a time, and see how original you actually are! You might be surprised (as I was when I first ran this script and found it solved the problem directly) how much variation there actually is in human language.
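Splitting a message into consecutive 10-word phrases, as suggested above, might look like this. A minimal sketch, assuming whitespace-separated words and exact-phrase queries written in double quotes; the helper names are hypothetical.

```python
def disjoint_windows(text, n=10):
    """Split text into consecutive, non-overlapping runs of n words.

    A trailing run shorter than n words is dropped, since short phrases
    match by chance far more often.
    """
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(0, len(words) - n + 1, n)]


def as_phrase_queries(text, n=10):
    # Wrap each run in double quotes so the search engine treats it as an exact phrase.
    return [f'"{w}"' for w in disjoint_windows(text, n)]
```

A 25-word message yields two queries (words 1 to 10 and 11 to 20); the last five words are left out.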

And btw, the sentence "My guess is it happens more often than you think" does not get any hits on Google. You seem to have produced a truly original sentence.

Re:Google might ban you on Student Fights University Over Plagiarism-Detector · 2004-01-17 09:46 · Score: 1

I know, that's why I used a similar strategy of going easy on Google, and also didn't distribute the script, as Google might take offense. At some point I ported it to the Google API, which limited me to 1000 searches a day; that was good enough for my purposes.
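One way to stay light on a search service is to space out requests and respect a daily quota. The sketch below is an illustration only: the 1000-per-day figure comes from this post, while the one-second delay and the `QueryBudget` class are assumptions, not the original code. The clock and sleep functions are injectable so the logic can be tested without waiting.

```python
import time


class QueryBudget:
    """Spaces queries at least `delay` seconds apart and caps them at `per_day`.

    The 1000-per-day default comes from the post; the 1-second delay is an assumption.
    """

    def __init__(self, per_day=1000, delay=1.0, clock=time.monotonic, sleep=time.sleep):
        self.per_day = per_day
        self.delay = delay
        self.clock = clock
        self.sleep = sleep
        self.used = 0
        self.day_start = clock()
        self.last = None  # time of the previous query, if any

    def acquire(self):
        """Wait if needed; return True if a query may be sent, False if the quota is spent."""
        now = self.clock()
        if now - self.day_start >= 24 * 60 * 60:  # start a fresh 24-hour window
            self.day_start, self.used = now, 0
        if self.used >= self.per_day:
            return False
        if self.last is not None and now - self.last < self.delay:
            self.sleep(self.delay - (now - self.last))
        self.last = self.clock()
        self.used += 1
        return True
```

Calling `acquire()` before every query and stopping when it returns False keeps the script within the quota.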

Re:Turnitin@home on Student Fights University Over Plagiarism-Detector · 2004-01-17 09:43 · Score: 1

Ignoring grammar for a moment, 10 consecutive words from a vocabulary of, say, 10,000 words (college students should know about 50,000) give 10,000 to the power 10, which equals 10 to the power 40, different messages. That's quite a lot; there's not much chance of duplicates here.
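The arithmetic, written out with vocabulary size V and phrase length k (ignoring grammar, as the post does):

$$
N = V^{k}, \qquad V = 10^{4},\; k = 10 \;\Rightarrow\; N = \left(10^{4}\right)^{10} = 10^{40}
$$

With the 50,000-word vocabulary mentioned in parentheses, the count only grows:

$$
\left(5 \times 10^{4}\right)^{10} = 5^{10} \times 10^{40} \approx 9.8 \times 10^{46}
$$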

Ok, grammar reduces this, but there's still a huge amount of variability in sentence structure. Don't take my word for it: go to some website and pick out an innocent-looking sentence of 8 words or more. Feed it into Google (with quotes), and see how many copies there are.

As an example, take the innocent-looking description "This article describes both the program's specifications and its role." Now how many pages could there be that contain this sentence fragment? On the surface, it seems anything that has to do with some form of programming could come up with this sentence. In fact, there's only one page on the web that has it, exactly the one I took it from. I didn't cheat here; this is the general case. Try it out with a few pages and convince yourself. As long as the phrase isn't a common remark, it will almost always be unique (or you will find literal copies of the entire paragraph or article).

Re:Wrong. on Open Source Awards 2004 · 2004-01-17 09:09 · Score: 4, Insightful

Don't know about the other ones, but valgrind is a life-saver for development. It is a tremendous help for any kind of C/C++ development on Linux/x86, and has helped developers on Linux platforms create much more robust and stable code. Without tools like these, projects like OpenOffice could only advance at a much slower pace.

I've worked with many bounds/integrity checking programs, both on Windows and Linux, commercial and otherwise, and oddly enough valgrind beats them hands down in quality.

Re:Hrmm on Student Fights University Over Plagiarism-Detector · 2004-01-17 06:16 · Score: 1

I didn't check thoroughly; I just pasted the query string I sent to Google through Python's urllib into my browser, and it worked. When I captured the response in my script, I got a 'Forbidden' message from Google. As I said, it wouldn't be too hard to work around, but I would prefer to use a method that Google doesn't explicitly disallow.
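For reference, an exact-phrase query URL can be built with `urllib.parse.urlencode`, and a refusal shows up in `urllib` as an `HTTPError` with code 403. A minimal sketch: the base URL is an assumption, and since automated queries against the web search are exactly what Google disallows, the sketch only reports the refusal instead of working around it.

```python
import urllib.error
import urllib.parse
import urllib.request


def phrase_query_url(phrase, base="https://www.google.com/search"):
    """Build a search URL for an exact-phrase query (the phrase wrapped in quotes)."""
    return base + "?" + urllib.parse.urlencode({"q": f'"{phrase}"'})


def fetch(url):
    """Fetch a URL; return None instead of raising when the server answers 403 Forbidden."""
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        if err.code == 403:
            return None  # 'Forbidden': the service does not accept this client
        raise
```

Pasting the output of `phrase_query_url` into a browser reproduces the manual check described above.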

Re:Hrmm on Student Fights University Over Plagiarism-Detector · 2004-01-17 04:55 · Score: 1

What do you think? Would you go to the trouble of spell-correcting the paper first, then running your "cheater checker"?

If students started doing this, I would indeed run the paper through a non-interactive spellchecker first, and if the number of errors exceeded a certain rate (say 1 spelling error in 500 words), I would send it back immediately without bothering with the plagiarism detector.
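The pre-filter described here could be as simple as the following. A sketch, assuming a plain set of known lower-case words stands in for the non-interactive spellchecker; `misspelling_rate` and `should_return` are hypothetical names, and the 1-in-500 threshold is the one from the post.

```python
import re


def misspelling_rate(text, dictionary):
    """Fraction of words in `text` not found in `dictionary` (a set of lower-case words)."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    unknown = sum(1 for w in words if w not in dictionary)
    return unknown / len(words)


def should_return(text, dictionary, max_rate=1 / 500):
    # Send the paper back when errors exceed 1 in 500 words.
    return misspelling_rate(text, dictionary) > max_rate
```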

Re:Hrmm on Student Fights University Over Plagiarism-Detector · 2004-01-17 04:47 · Score: 1

Ok, bad choice of words: s/illegal/not allowed/

Re:Hrmm on Student Fights University Over Plagiarism-Detector · 2004-01-17 04:43 · Score: 3, Insightful

I can see that the profit motive seems to be a problem. Remember, however, that the only reason the papers are worth anything is that students tend to copy them to save themselves the trouble of writing something of their own, and there's a market for plagiarism detection. Apart from that, they're worthless (generally speaking; some papers are good).

In effect, a company like Turnitin would only be interested in the student papers from universities that use its service, simply because students are much more likely to exchange papers (without using the internet) with others at the same university than with students from other universities. In cases where the internet is used, Turnitin is perfectly capable of finding the material itself, perfectly legitimately, because it is publicly available.

A university that uses Turnitin would, when explicitly asked, undoubtedly allow Turnitin to use the university's entire archive for detecting plagiarism by *its own* students. Maybe it would not allow the archive to be used for other universities' students; I would doubt that, however.

The point, however, is that there is very little value (except maybe for advertising) in collecting papers from one university and applying them to the next, unless these papers are available on the net, in which case they're freely available anyway. In conclusion, I think the 'making money off the students' work' argument is far-fetched.

Re:Well how can they safeguard against this? on Student Fights University Over Plagiarism-Detector · 2004-01-17 03:00 · Score: 1

Has it occurred to you that these 5 separate sources ALL have something to do with Turnitin? What do you think the chances are that different authors of this sentence are at work here?

I think it's an excellent example of plagiarism, particularly as it was the first generic sentence I could find on the Turnitin front page, and it turns out to have been copied a few times.

Re:Hrmm on Student Fights University Over Plagiarism-Detector · 2004-01-17 02:49 · Score: 2, Interesting

I could, in principle. I've lost the script that used the Google API to do this, but I still have the original one that sent queries over plain HTTP and parsed Google's HTML results. Unfortunately, it has stopped working, as Google currently seems to check user agents or something similar and disallows the script as a flagrant violation of its terms of service... which it is. I don't think it is wise to hack around this (which should be perfectly doable) as it's illegal.

If there's any interest in this, it would be fairly easy to set it up to use the Google API and make it a small SourceForge project. It seems that if users of the script obtain a valid API key, such a script does not violate Google's TOS for the API.

The one that checks for occurrences in other papers is so easy (using a Python dictionary) that I leave it as an exercise for the reader :-)
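For readers who want to do the exercise anyway, here is one possible solution. It is a sketch, not the original code: it assumes each paper is a plain string keyed by a hypothetical author name, and it indexes every 10-word window in a dictionary that maps the window to the set of papers containing it.

```python
from collections import defaultdict


def shared_windows(papers, n=10):
    """Find n-word sequences that occur in more than one paper.

    `papers` maps an author name to the paper's text. Every n-word window is
    used as a dictionary key; the value is the set of authors whose paper contains it.
    """
    index = defaultdict(set)
    for author, text in papers.items():
        words = text.lower().split()
        for i in range(len(words) - n + 1):
            index[" ".join(words[i:i + n])].add(author)
    return {window: authors for window, authors in index.items() if len(authors) > 1}
```

Indexing every window rather than random samples costs more memory but catches any copied run of 10 or more words; for a class-sized set of papers that trade-off is usually fine.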

Re:Well how can they safeguard against this? on Student Fights University Over Plagiarism-Detector · 2004-01-17 02:05 · Score: 5, Informative

The other problem would be false positives when people write with similar styles in two different parts of the nation/world. Given enough "samples" in their filter, the accuracy drops because you now have a much higher likelihood of turning up a match

Do you actually have any idea what the probabilities are of someone writing the exact same sentence to describe the same thing? Just take this particular post apart, feed ten consecutive words at a time through Google, and see how many hits you get.

Also, take a fairly generic sentence such as "to improve writing and research skills, encourage collaborative online learning" and try to find out where I got it from.

Turnitin@home on Student Fights University Over Plagiarism-Detector · 2004-01-17 01:50 · Score: 4, Interesting

I mentioned this in another post for this story, but it might be interesting for teachers reading this site.

It's frightfully easy to write your own plagiarism detector. All you have to do is write a script that scans the paper and runs a few samples of 10 consecutive words from the paper as search terms through Google. If two different queries return the same site in the Google result list, it's a practical certainty that you've found a copy at that site. The chances of someone coming up with the same wording on some subject in two disjoint fragments of 10 words are abysmally small.
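The decision rule itself fits in a few lines. A minimal sketch, assuming `search` is some function (hypothetical here) that takes an exact-phrase query and returns a list of result URLs, for instance a wrapper around a search API you have a key for; a site is flagged only when at least two different fragments lead to it.

```python
from collections import defaultdict


def suspicious_sites(phrases, search):
    """Return URLs that were found for at least two different phrases.

    `phrases` should be disjoint 10-word fragments of the paper; `search` is any
    function mapping an exact-phrase query to a list of result URLs.
    """
    found_by = defaultdict(set)
    for phrase in phrases:
        for url in search(f'"{phrase}"'):
            found_by[url].add(phrase)
    # One hit can be chance; two hits on the same site for disjoint fragments rarely are.
    return sorted(url for url, hit in found_by.items() if len(hit) >= 2)
```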

Given that most plagiarism happens by copying from the internet (and students usually use Google to find such documents in the first place), you can use Google in the same way yourself.

I once wrote a 20-line Python script to do just this, and it worked very well. It even found plagiarism inside an (awarded) document that had been plagiarised.

Re:Hrmm on Student Fights University Over Plagiarism-Detector · 2004-01-17 01:37 · Score: 4, Interesting

That's akin to saying manufacturing anything is a job for engineers: they're supposed to know the material and how to build stuff with it. Well, once the initial design is done, it's a lot more efficient to create a machine that does the manufacturing for you. We call this the industrial revolution.

As a former university teacher, I've never used this Turnitin site, but I did use a 30-line Python script that would take random fragments of 10 consecutive words from the papers and run them (a) through Google and (b) against all other papers that were turned in. This worked awesomely well and saved me a lot of time that I could spend actually assessing the quality of the non-fraudulent papers.
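Picking the random fragments could be done like this. A sketch under the assumption that fragments start at multiples of 10 words, which keeps them disjoint; the function name, the default of 5 samples, and the `seed` parameter are illustrative choices, not taken from the original script.

```python
import random


def random_fragments(text, k=5, n=10, seed=None):
    """Pick up to k random, non-overlapping runs of n consecutive words from text."""
    words = text.split()
    starts = range(0, len(words) - n + 1, n)  # aligned starts keep fragments disjoint
    rng = random.Random(seed)
    chosen = sorted(rng.sample(starts, min(k, len(starts))))
    return [" ".join(words[s:s + n]) for s in chosen]
```

Passing a fixed `seed` makes a run reproducible, which helps when a result has to be shown to a student later.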

Plagiarism simply happens and I don't see the problem with automated checking for it. Automating tasks that formerly needed insight, training, and knowledge might be called the information revolution.

Re:Um, like duh! on Eight Biggest Tech Flops Ever · 2004-01-01 04:46 · Score: 5, Funny

No way! Don't you know that the difference between a virus and MS Windows is that a virus is tightly coded, does what it is intended to do, and does not break down under load?

Re:Try Turing or Zuse on Happy Birthday, Von Neumann (And Linus!) · 2004-01-01 00:28 · Score: 1

Indeed, you are right, I was wrong, apologies.

(still nitpicking: didn't you mean Turing complete instead of Turing hard?)

Re:Try Turing or Zuse on Happy Birthday, Von Neumann (And Linus!) · 2003-12-29 03:59 · Score: 1

Not to nitpick (ok, granted, in order to nitpick), but first you say:

They are an extremely valuable analytical tool, because they're usually the easiest Turing-hard model of computation to implement in whatever theoretical construct that you're trying to prove is Turing-hard

And then you say:

it was straightforward to implement Turing machines in lambda calculus, but it took a couple of decades before theoreticians managed to formally implement lambda calculus in Turing machines

Hence it didn't turn out to be that easy to implement lambda calculus in Turing machines, right? In my experience, lambda calculus is much easier to implement anything in (and yes, I have programmed in both, much to my horror). In any case, Turing machines are still used in computational theory, but usually in their lambda calculus incarnation (the search for the smallest universal Turing machine is done in combinatory logic, a very nice branch of lambda calculus, or should I say unlambda calculus?). Apart from practical use, LC has supplanted TMs as an analytical tool as well.
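To give a flavour of how little machinery combinatory logic and lambda calculus need, here is a small sketch using Python lambdas as stand-ins for lambda terms. It defines the S and K combinators, derives the identity as S K K, and encodes numbers as Church numerals; none of this is the poster's code.

```python
# Combinators S and K; in combinatory logic every term is built from these two.
S = lambda x: lambda y: lambda z: x(z)(y(z))
K = lambda x: lambda y: x
I = S(K)(K)  # the identity combinator, derived as S K K

# Church numerals: the number n means "apply f n times".
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))
add = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))


def to_int(n):
    """Convert a Church numeral back to a Python int by counting applications."""
    return n(lambda k: k + 1)(0)


two = succ(succ(zero))
three = succ(two)
```

Here `to_int(add(two)(three))` evaluates to 5, without any data type other than functions being involved in the arithmetic itself.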

Re:Try Turing or Zuse on Happy Birthday, Von Neumann (And Linus!) · 2003-12-27 23:03 · Score: 2, Insightful

Another big difference between the two is that lambda calculus is actually useful, while the Turing machine has some analytical but mostly entertainment value.

By the way, I've never heard of Turing actually implementing his machine in hardware. It was a hardware design, implementable with pen and paper, but I don't think he actually went to the trouble of creating the machine. Got any refs for that?

Re:Try Turing or Zuse on Happy Birthday, Von Neumann (And Linus!) · 2003-12-27 22:52 · Score: 1

Even more pedantically: technically speaking, you don't need infinite memory for Turing completeness, unbounded memory is enough.

Re:GOOD IDEA!!!! on SCO Gets More Desperate; Sends More Letters · 2003-12-22 07:28 · Score: 1

Your example is right, that's not price orchestration. Price orchestration can only happen when you amass so much stock that by using it strategically you can drive the price in whichever direction you choose. So if Howard Stern, after getting his audience to buy about 10% of the stock, started to orchestrate the price, he would be in serious trouble if he did it in Europe. I'm pretty sure the same would go for the US as well, simply because otherwise institutions (which amass massive amounts of stock) could determine the price of practically any stock and make extreme profits at the expense of other players.

Buying the stock, getting influence, and using it strategically was what the great ancestor of this post proposed. Still sounds like orchestration to me.

Re:GOOD IDEA!!!! on SCO Gets More Desperate; Sends More Letters · 2003-12-22 02:53 · Score: 2, Interesting

Not insider trading, but a clear case of price orchestration. Also illegal in Europe.

Re:Interesting question on Boston's Big Dig Finally Open · 2003-12-21 05:44 · Score: 1

That's what I meant by inflation: lives were less of an issue back then.

Slashdot Mirror

User: NoOneInParticular

Comments · 2,094