marpot · Slashdot Mirror

Re:Can I do my own searches? on How Do You Visualize 100 GB of Google Text Data? · 2011-01-15 04:05 · Score: 1

Try this: http://www.netspeak.org/?query=*%20microsoft%20sucks%20*

Re:Been done? on Developing a Vandalism Detector For Wikipedia · 2010-02-28 11:36 · Score: 1

Exactly, but both kinds of tools need to solve the same underlying problem: given an edit, is it vandalism? The better those tools answer this question, the more time Wikipedia editors save.

Re:Existing on Developing a Vandalism Detector For Wikipedia · 2010-02-28 11:19 · Score: 1

Me too, experience that is. We took the high-throughput features from our research and implemented a live edit analysis for the English portion of Wikipedia. It listens on the IRC channel, downloads the wikitext of the old and new revisions of each edit, and then does its magic. And it once did so on an old laptop. The computer was connected at 1 GBit/s at most.
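
A minimal sketch of the step after the download, assuming the old and new wikitext are already available as strings. The IRC listening and the download are left out, and the three features shown are hypothetical illustrations, not the features from the research mentioned above.

```python
import difflib

def inserted_words(old_text, new_text):
    """Return the words that the edit added or substituted in."""
    old_words, new_words = old_text.split(), new_text.split()
    matcher = difflib.SequenceMatcher(None, old_words, new_words, autojunk=False)
    inserted = []
    for tag, _, _, j1, j2 in matcher.get_opcodes():
        if tag in ("insert", "replace"):
            inserted.extend(new_words[j1:j2])
    return inserted

def edit_features(old_text, new_text):
    """Compute a few cheap, illustrative (hypothetical) features of an edit."""
    words = inserted_words(old_text, new_text)
    letters = [c for w in words for c in w if c.isalpha()]
    upper = sum(c.isupper() for c in letters)
    return {
        "inserted_word_count": len(words),
        "size_change": len(new_text) - len(old_text),
        "upper_case_ratio": upper / len(letters) if letters else 0.0,
    }

# Made-up revisions for illustration only.
old_revision = "Weimar is a city in Germany."
new_revision = "Weimar is a city in Germany. THIS PAGE SUCKS"
features = edit_features(old_revision, new_revision)
print(features)
```

Features of this kind need only one pass over the two revisions, which is consistent with the point that a laptop can keep up with live edits.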

Re:Existing on Developing a Vandalism Detector For Wikipedia · 2010-02-28 11:16 · Score: 1

I cannot agree more with what you say, but I'd like to give it a twist: I want computers to assist me, and I want them to do it well, reliably, and robustly. If I happen to be a Wikipedia editor, that doesn't change a thing: I still want the computer to assist me with what I'm doing. Currently there is no such thing, and the only thing I'd like to do is foster research toward it.

Now, some always go ten steps further when someone talks about a new "solution" based on computers. They immediately envision a world where computers take over. And that, apart from being unrealistic today, must be considered ideological rather than logical.

After all, all you see here and all you see on Wikipedia is made possible only by machines working with intelligent algorithms.

Re:quite a bit of work on this on Developing a Vandalism Detector For Wikipedia · 2010-02-28 09:54 · Score: 1

You're right, it's machine learning, data mining, NLP, and information retrieval. But the fun thing is turning a research prototype into a tool that can be left alone most of the time. That hasn't happened yet. Also, research on this problem only started in 2008, while rule-based tools developed by Wikipedians have been around since 2006. The works you listed are actually all there is! That's not much to work with, is it?

Re:Been done? on Developing a Vandalism Detector For Wikipedia · 2010-02-28 09:43 · Score: 2, Informative

We are very aware of the existing tools (Huggle, Twinkle, and so on). See the links in the above post, and see the links in the resources section of the competition Web page. An accurate vandalism detector will take a lot of research and development, just like spam detectors did... Why did you stop developing your tool, anyway?

Re:Existing on Developing a Vandalism Detector For Wikipedia · 2010-02-28 09:40 · Score: 2, Interesting

This is vastly overestimated. Depending on how elaborate your edit model is, you can analyse edits live on a laptop.

Re:Existing on Developing a Vandalism Detector For Wikipedia · 2010-02-28 09:38 · Score: 5, Informative

We have studied the accuracy of ClueBot, and found that (on a small corpus) it has very good precision (low false positive rate), but a very low recall (low true positive rate). (See: http://www.uni-weimar.de/medien/webis/publications/downloads/papers/stein_2008c.pdf) But the picture might look quite different on a large scale.
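
To make the terms concrete, here is a small sketch of how these measures follow from confusion counts. The counts are made up for illustration and are not the figures from the paper; the function name is hypothetical.

```python
def detector_metrics(tp, fp, fn, tn):
    """Return (precision, recall, false positive rate) from confusion counts.

    tp: vandalism edits flagged       fp: regular edits flagged
    fn: vandalism edits missed        tn: regular edits left alone
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0  # also called true positive rate
    false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return precision, recall, false_positive_rate

# Made-up counts: a cautious detector that rarely flags a regular edit
# but misses most vandalism.
precision, recall, false_positive_rate = detector_metrics(tp=30, fp=2, fn=70, tn=900)
print(precision, recall, false_positive_rate)
```

With counts like these, almost everything the detector flags is vandalism (high precision), yet most vandalism goes unflagged (low recall).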

Re:Wrong Problem on Competition Seeks Best Approaches To Detecting Plagiarism · 2009-04-28 11:26 · Score: 1

Don't you think there's a difference between the odds of accidentally writing 10 words that have been written before and the odds of accidentally writing 100? The former can hardly be called plagiarism, the latter won't happen accidentally.

Re:You said it: Plagiarism detection is easy on Competition Seeks Best Approaches To Detecting Plagiarism · 2009-04-28 10:59 · Score: 1

For a human it is really quite easy to find different writing styles, but for a computer it isn't, yet. That's why there is an analysis task dedicated to this problem at the competition.

Re:Irony on Competition Seeks Best Approaches To Detecting Plagiarism · 2009-04-28 10:51 · Score: 1

There's only so much that can be done with current plagiarism detection approaches. We definitely expect similar approaches from unrelated participants.

Re:monkeys on a typewriter on Competition Seeks Best Approaches To Detecting Plagiarism · 2009-04-28 10:49 · Score: 1

No, simply because near-duplicate texts of sufficient length are not written accidentally and independently of one another. Take the comments on this page as an example: although many discuss the same arguments, I bet you won't find 10 words in a row which appear twice.
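
The "10 words in a row" check can be sketched as a comparison of word n-grams. This is a minimal illustration under simple assumptions (lower-cased words, punctuation ignored), not the method of any particular detection tool; the function names are hypothetical.

```python
import re

def word_ngrams(text, n):
    """Return the set of all runs of n consecutive words in the text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def shared_ngrams(text_a, text_b, n=10):
    """Return the word n-grams that occur in both texts."""
    return word_ngrams(text_a, n) & word_ngrams(text_b, n)

# Made-up texts that share a run of 12 words.
text_a = "the quick brown fox jumps over the lazy dog near the river bank today"
text_b = "yesterday the quick brown fox jumps over the lazy dog near the river"
print(len(shared_ngrams(text_a, text_b)))
```

Two texts written independently would be expected to yield an empty set for n = 10, while a copied passage yields one shared n-gram for every position inside the copied run.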

Re:Plausible test? on Competition Seeks Best Approaches To Detecting Plagiarism · 2009-04-28 10:45 · Score: 1

It's true, random text operations are not all too realistic. However, if a tool manages to find all of the randomly created cases accurately, it will definitely find the subset of texts that are also human-readable.

Re:Insightful fact... on Competition Seeks Best Approaches To Detecting Plagiarism · 2009-04-28 10:07 · Score: 1

I'd say if you obfuscate something enough it eventually becomes an original. Paraphrases are originals, aren't they?

User: marpot

Comments · 14