Algorithm Rates Trustworthiness of Wikipedia Pages


Posted by CowboyNeal on Thursday August 30, 2007 @11:38PM from the getting-it-right dept.

paleshadows writes "Researchers at UCSC developed a tool that measures the trustworthiness of each Wikipedia page. Roughly speaking, the algorithm analyzes the entire 7-year user-editing-history and utilizes the longevity of the content to learn which contributors are the most reliable: If your contribution lasts, you gain 'reputation,' whereas if it's edited out, your reputation falls. The trustworthiness of a newly inserted text is a function of the reputation of all its authors, a heuristic that turned out to be successful in identifying poor content. The interested reader can take a look at this demonstration (random page with white/orange background marking trusted/untrusted text, respectively; note "random page" link at the left for more demo pages), this presentation (pdf), and this paper (pdf)."
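
A rough sketch of the idea described in the summary, not the UCSC implementation. It assumes each revision is reduced to an author plus the set of text chunks present after that edit; the names `update_reputations` and `text_trust`, the fixed gain/loss of 1.0, and the plain average over authors are all illustrative assumptions.

```python
from collections import defaultdict


def update_reputations(revisions, gain=1.0, loss=1.0):
    """Derive author reputations from a chronological edit history.

    revisions: list of (author, set_of_chunk_ids) pairs, oldest first.
    Returns (reputation, chunk_author). Both the input shape and the
    scoring constants are assumptions made for this sketch.
    """
    reputation = defaultdict(float)
    chunk_author = {}
    previous = set()
    for author, chunks in revisions:
        # Chunks that first appear in this revision are credited to its author.
        for chunk in chunks - previous:
            chunk_author.setdefault(chunk, author)
        # Text that survives someone else's edit raises its author's reputation.
        for chunk in chunks & previous:
            if chunk_author[chunk] != author:
                reputation[chunk_author[chunk]] += gain
        # Text that someone else edits out lowers its author's reputation.
        for chunk in previous - chunks:
            if chunk_author[chunk] != author:
                reputation[chunk_author[chunk]] -= loss
        previous = set(chunks)
    return dict(reputation), chunk_author


def text_trust(authors, reputation):
    """Trust of a piece of text as the mean reputation of its authors.

    Authors with no recorded history count as 0.0. Averaging is an
    assumption; the summary only says trust is "a function" of reputation.
    """
    if not authors:
        return 0.0
    return sum(reputation.get(a, 0.0) for a in authors) / len(authors)


history = [
    ("alice", {"a1"}),
    ("bob", {"a1", "b1"}),
    ("carol", {"a1", "c1"}),  # carol keeps alice's text, removes bob's
]
reputation, chunk_author = update_reputations(history)
print(reputation)
print(text_trust(["alice", "bob"], reputation))
```

In this toy history alice's text survives two edits by other people and bob's is removed once, so alice ends up with the higher reputation. A real system would have to track text at a finer grain than whole chunks.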

16 of 175 comments

Light Bulb Moment by dsginter · 2007-08-30 23:41 · Score: 5, Funny

Someone should make a Wikipedia entry for this algorithm to see how trustworthy it is.

--
More
1. Re:Light Bulb Moment by marcello_dl · 2007-08-31 01:33 · Score: 5, Interesting
  
  Sounds crappy. Let's say you expose some important misdeed. You're likely to be edited out by an army of paid staff who keep an eye on the 'net. (Don't tell me I'm paranoid, because I saw it happening and read about stuff like that in the news, even on Slashdot.) You are not contributing much else to Wikipedia because you simply wanted to expose what's in your knowledge, so you'll end up with low karma.
  
  Anyway, I guess it'll be another PageRank or Slashdot filter affair. People trying to beat it, devs trying to make it better.
  
  The plus is, there is not only Wikipedia. You can always search the rest of the web.
  The minus is, you search the rest of the web with Google, which is equivalent if not worse.
  
  We need a good search engine on top of a Tor network, and the bandwidth to make it run smoothly. There are not many other ways to achieve real net freedom.
  
  --
  ---- MISSING MISCELLANEOUS DATA SEGMENT --- [sigdash] trolololol
Seems a bit dangerous by fymidos · 2007-08-30 23:45 · Score: 4, Insightful

>If your contribution lasts, you gain 'reputation,' whereas if it's edited out, your reputation falls

And the editor wars start ...

--
Washington bullets will simply be known as the "Bulle
1. Re:Seems a bit dangerous by N!k0N · 2007-08-30 23:52 · Score: 4, Insightful
  
  Yeah, that is a bit of a "dangerous" way to go about rating the content; however, I think it could be a step in the right direction. If this can be improved, perhaps the site will gain a better reputation in the eyes of professors. Now, I don't doubt that there is a lot of misinformation on the site (intentional or otherwise); however, a good deal of the information I have used for research papers or to quickly check something seems to be confirmed elsewhere (texts, journals, etc).
Doesn't take into account common myths by Cryophallion · 2007-08-30 23:51 · Score: 5, Interesting

So, if there is a myth that a lot of people believe is true, then it will stay up there as it is not challenged. So, it still gets reputation, and therefore more credibility, making it more likely that the myth will be perpetuated.

Also, if someone hasn't noticed something that is wrong on an esoteric entry, it will also be given credibility, and once again be more likely to be considered to be fact.

While you could add voting to the algorithm to have people vote on whether it is true, that still gets destroyed by someone who just votes because they think it's true, not because they have verified it.

Either way, it potentially gives additional credibility to something that may be very wrong.
Seems to work ... by Purity+Of+Essence · 2007-08-30 23:51 · Score: 5, Funny

Seems to work, the entire page turned orange.

--
+0 Meh
hmmm... by PJ1216 · 2007-08-30 23:52 · Score: 5, Funny

They should just call it wiki-karma.
#REDIRECT by Chris+Pimlott · 2007-08-30 23:56 · Score: 4, Insightful

It appears they include #REDIRECT pages; the very first page the random link took me to was Cheliceriformes, with the #REDIRECT line in orange. Seems an easy way to gain trust: once a redirect is created, it is hardly ever changed.
I dunno about this system. by Wilson_6500 · 2007-08-30 23:57 · Score: 5, Insightful

Does it take into account magnitude of error corrections? If major portions of someone's articles are being rewritten, that's a good reason to de-rep them. If someone makes a bunch of minor spelling or trivial errors, then that's not necessarily a reason to do so.

And, of course, there is the potential for abuse. If the software could intelligently track reversions and somehow ascribe to those events a neutral sort of rep, that would probably help the system out.

As it stands, they're essentially trying to objectively judge "correctness" of facts without knowing the actual facts to check. That's somewhat like polling a college class for answers and assigning grades based on how many other people DON'T say that they disagree with a certain person in any way.
I suspect this heuristic measures.... by Anonymous Coward · 2007-08-30 23:57 · Score: 5, Insightful

the relative controversy of the item being edited.

If I edit a history page of a small rural village near where I live, I can guarantee that it will remain unaltered. None of the five people who have any knowledge or interest in this subject have a computer.

If I edit an item on Microsoft's attitude to standards, or the US occupation of Iraq, I'm going to be flamed the minute the page is saved, unless I say something so banal that no one can find anything interesting in it.

But my Microsoft page might be accurate, and my village history a tissue of lies....
Tuned for Subject Matter by erroneous · 2007-08-30 23:58 · Score: 5, Insightful

Sounds like a worthy start to the process of introducing more trustworthiness into Wikipedia entries, but this may need tuning for content type too.

After all, just because someone is a reliable expert at editing the Wikipedia entries on Professional Wrestling or Superheroes doesn't necessarily mean we should trust their edits on, for instance, the sensitive issues of Tibetan sovereignty.

--
erroneous: look me up in a dictionary
Unpopular but neutral points of view? by Knuckles · 2007-08-31 00:01 · Score: 5, Interesting

I realize that an encyclopedia by definition will always emphasize the established majority opinion about any given subject. But it seems that this tool might strengthen majority opinions beyond what is reasonable. If you happen to edit an article by adding valid but unpopular dissenting points of view, and the other contributors are sufficiently boneheaded, you lose karma (or whatever the tool calls it) for no good reason. This might then easily develop a life of its own, and you are screwed.

--
"When I first heard Daydream Nation it quite frankly scared the living shit out of me." -- Matthew Stearns
Tyranny of the majority by G4from128k · 2007-08-31 00:06 · Score: 5, Insightful

Although this method will certainly help filter pranks and cranks, it won't help if the "consensus" among Wikipedia authors is wrong. If a true expert edits a page, but the masses don't agree with the edit, they will undo the expert's addition and give the expert a low reputation. Thus, the trust rating becomes a tool for maintaining erroneous but popular ideas.

That said, I can't help but believe that this tool is a net positive because it makes points of debate more visible. One could even argue that it literally highlights the frontiers of human knowledge. That is, high-trust (white) text is well known material and highlighted (orange) text represents contentious or uncertain conclusions.

--
Two wrongs don't make a right, but three lefts do.
1. Re:Tyranny of the majority by Anonymous+Brave+Guy · 2007-08-31 00:35 · Score: 4, Insightful
  
  Yes, this system demonstrates the correlation between the content and the majority opinion, not between the content and the correct information (assuming such objectively exists).
  
  Of course, if you take as an axiom that the majority opinion will, in general, be more reliable than the latest random change by a serial mis-editor, then the correlation with majority opinion is a useful guideline.
  
  Something that might be rather more effective, though perhaps less practical, is for Wikipedia to bootstrap the process much as Slashdot once did: start with a small number of designated "experts", hand-picked, and give them disproportionate reputation. Then consider secondary effects when adjusting reputation: not just whether something was later edited, but the reputation of the editor, and the size of the edit.
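
A minimal sketch of that weighting, under stated assumptions: the function names, the starting value of 10.0 for hand-picked experts, and the `rep / (1 + rep)` squashing of the editor's reputation are all hypothetical choices, not anything Wikipedia or the researchers are known to use.

```python
def seed_reputations(experts, expert_rep=10.0):
    """Give hand-picked experts a disproportionate starting reputation."""
    return {name: expert_rep for name in experts}


def penalty(editor_rep, removed_chars, total_chars, base=1.0):
    """Reputation lost by an author when an editor removes part of their text.

    The penalty scales with the fraction of text removed and with the
    editor's own reputation, squashed into [0, 1).
    """
    if total_chars <= 0:
        return 0.0
    fraction = min(max(removed_chars / total_chars, 0.0), 1.0)
    positive_rep = max(editor_rep, 0.0)
    weight = positive_rep / (1.0 + positive_rep)
    return base * weight * fraction


def apply_edit(reputation, author, editor, removed_chars, total_chars):
    """Return a new reputation table after `editor` trims `author`'s text."""
    updated = dict(reputation)
    editor_rep = updated.get(editor, 0.0)
    cost = penalty(editor_rep, removed_chars, total_chars)
    updated[author] = updated.get(author, 0.0) - cost
    return updated


reputation = seed_reputations(["expert"])
# An expert removing half of a newcomer's text costs the newcomer something.
after_expert_edit = apply_edit(reputation, "newbie", "expert", 50, 100)
# An unknown account blanking the expert's text costs the expert nothing.
after_anon_edit = apply_edit(reputation, "expert", "anon", 100, 100)
print(after_expert_edit, after_anon_edit)
```

With these choices a small fix by anyone barely moves the author's score, and a wholesale removal only hurts when the person doing it has earned some reputation first.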
  
  This doesn't avoid the underlying theoretical flaw of the whole idea, though, which is simply that in a community-written site like a wiki, edits are not necessarily bad things. Someone might simply be replacing the phrase "(an example would be useful here)" with a suitable example. This would be supporting content that was already worthwhile and correct, not indicating that the previous version was "untrustworthy".
  
  --
  If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
Goddamn... by gowen · 2007-08-31 00:10 · Score: 5, Funny

How did they pass up the chance to name this algorithm "Truthiness"?

--
Athletic Scholarships to universities make as much sense as academic scholarships to sports teams.
It doesn't have to be perfect by KingSkippus · 2007-08-31 00:14 · Score: 5, Insightful

No algorithm, except maybe personally checking every single article yourself, will ever be perfect. I suspect that the stuff you talk about will be very rare exceptions, not the rule. In fact, one of the reasons that it is so rare is that people who know what the actual truth of a matter is can post it, cite it, and show for all to see that some common misconception is, in fact, a misconception. This is much better than, say, a dead tree encyclopedia where, if something incorrect gets printed, it will likely stay that way forever in almost every copy that's out there. (And, incidentally, no such algorithm can exist for them, since dead tree encyclopedias generally don't include citations and/or articles' editing histories.)

The goal wasn't to create a 100% perfect algorithm; it was to create an algorithm that provides a relatively accurate model and that works in the vast majority of cases. I don't see any reason this shouldn't fit the bill just fine.