Greatest Task of Web 2.x: Meta-Validation

Posted by ryuzaki0 on Sunday December 3, 2006 @02:33PM from the vetting-the-metadata dept.

CexpTretical writes "This Technology Review article about Web 2.x problems fails to mention the 800 pound gorilla in the room when it comes to fulfilling the dreams of the Semantic Web — i.e., assumptions about the validity of metadata or tagging schemes. We can add all of the metadata and/or tags we want to web resources but that does not mean that the 'data about the data' honestly or accurately describe the resource or are 'about the data' at all. This is why Google does not place much importance on the metadata already contained in HTML document headers for search ranking, because it cannot be trusted. And to validate it would require more effort than to search and index that data from scratch. Ensuring or verifying the validity of metadata would be a task equal to that of initially creating it, but would have to be repeated on an ongoing basis. Hence all of the talk about 'trusted networks,' which then require trusting the gatekeepers of those networks. Talk about 'semantics.'" Slashdot's moderation and meta-moderation offer one example of getting useful metadata in a non-trusted environment.
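
As a rough illustration of why checking metadata costs about as much as indexing the page, here is a minimal Python sketch. It assumes single-word keywords and a hypothetical helper, `keyword_support`, that is not taken from any search engine. It can only flag keywords that never appear in the body; it cannot tell whether the metadata honestly describes the page, and it still has to read the whole body to do even that.

```python
import re

def keyword_support(declared_keywords, body_text):
    """Return the fraction of declared keywords that occur in the body text.

    Hypothetical helper for illustration only; assumes single-word keywords.
    """
    # Tokenize the body into lowercase alphanumeric words.
    words = set(re.findall(r"[a-z0-9]+", body_text.lower()))
    keywords = [k.strip().lower() for k in declared_keywords if k.strip()]
    if not keywords:
        return 0.0
    hits = sum(1 for k in keywords if k in words)
    return hits / len(keywords)

# Keywords that match the page versus keywords that do not.
matching = keyword_support(["linux", "kernel"], "The Linux kernel release notes")
mismatched = keyword_support(["linux", "free", "prizes"], "Buy cheap watches today")
print(matching, mismatched)
```

A page stuffed with keywords that do appear in its text would still pass this check, which is the trust problem the submission describes.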

13 of 161 comments

Meta data by Andrewkov · 2006-12-03 14:37 · Score: 4, Interesting

Slashdot's moderation and meta-moderation offer one example of getting useful metadata in a non-trusted environment.
The tagging system might be a better example, or at least an example of mostly useless meta information.
Speaking of Slashdot's metadata... by Anonymous Coward · 2006-12-03 14:39 · Score: 5, Insightful

What about the removal of accurate metadata, such as Slashdot's disabling of the "dupe" tag?
1. Re:Speaking of Slashdot's metadata... by Anonymous Coward · 2006-12-03 14:45 · Score: 4, Funny
  
  That's because there have never been any dupes. Ever.
  
  Please move along folks, there's nothing to see here.
2. Re:Speaking of Slashdot's metadata... by ewl1217 · 2006-12-03 15:05 · Score: 5, Funny
  
  A tag isn't useful if it works for every article...
You can't trust the moderation system either by BadAnalogyGuy · 2006-12-03 14:40 · Score: 4, Insightful

Especially here at Slashdot where a certain type of groupthink is very prevalent, it's not so much a matter of whether a comment is insightful or interesting so much as it adheres to the consensus view of the moderators. A non-conforming view is labeled 'Troll'. So in one sense, the metadata provided by the moderation system is useful in that you can tell at a glance how well a comment conforms to the Slashdot zeitgeist just by looking at its moderation score.

However since posts lower than zero do not get displayed automatically, views that are unappealing to the Slashdot community are relegated to obscurity regardless of their validity and correctness.

Linux sucks.
1. Re:You can't trust the moderation system either by TheFlyingGoat · 2006-12-03 14:48 · Score: 4, Interesting
  
  You are definitely correct, but I wonder if this would be the same in a search environment like Google. First, you have a much broader selection of people that can mark meta-data as being accurate or not. Second, people will not see the meta-data without specifically searching for it. This means that the people searching for "swingers in Milwaukee" will most likely be people that don't frown upon such behavior. There are still obvious issues, like people searching for more general controversial terms like creationism/evolution or people that disagree with a certain behavior organizing against certain sites by "moderating" them poorly. I could easily see this happening in politics and religion.
  
  --
  You have enemies? Good. That means you've stood up for something, sometime in your life. --Winston Churchill
2. Re:You can't trust the moderation system either by aquaepulse · 2006-12-03 14:54 · Score: 4, Insightful
  
  it's not so much a matter of whether a comment is insightful or interesting so much as it adheres to the consensus view of the moderators
  You seem to be arguing against yourself. Moderators are chosen from a large pool according to rules described in moderation guidelines. It stands to reason that if these moderators come to consensus about a post, then that consensus would be descriptive of the post.
3. Re:You can't trust the moderation system either by grcumb · 2006-12-03 15:01 · Score: 5, Insightful
  
  [S]ince posts lower than zero do not get displayed automatically, views that are unappealing to the Slashdot community are relegated to obscurity regardless of their validity and correctness.
  
  Here's a thought: Rather than indulging in self-satisfied name-calling, why not perform some analysis on the moderation system and actually try to provide some evidence for your facile assertion? It's pretty easy to do, precisely because the kind of abuse you claim is rampant here would also be completely transparent, if it were happening.
  
  For my part, I have no inclination to agree with your assertion, because in the 2 years I've been meta-moderating daily, I haven't seen more than about 1% of posts[*] that show such symptoms. On the contrary, if my experience is any guide, there's a far more common tendency to mod content-free comments like yours upward than to mod unpopular, but well-argued, comments downward. The consistency of the data, and the fact that it's semi-randomly selected for me, lead me to believe that it's statistically significant, and that my experience doesn't differ significantly from anyone else's.
  
  YMMV, but the burden of proof does lie with the accuser, so please back your assertion with evidence.
  
  [*] I base that on viewing slightly less than 1 abusive down-mod a week, or 1 in 80-90 moderations.
  
  --
  Crumb's Corollary: Never bring a knife to a bun fight.
Idioms by fossa · 2006-12-03 14:59 · Score: 4, Informative

I thought it was "elephant in the room"? Googlefight! We're talking orders of magnitude here... Please tell me that lame TV commercial that botched the idiom isn't starting a trend? I think 800 lb gorilla should remain as the Urban Dictionary's "an overbearing entity in a specific industry or sphere of activity" and not expand to the more abstract, from Wikipedia, "an obvious truth that is being ignored".
The difficulty: association is not relation by traindirector · 2006-12-03 15:06 · Score: 5, Insightful

Working with metadata from a non-trusted community is a few orders of difficulty harder than working with trusted metadata. All the examples from non-trusted user groups that I've seen are either 1) only able to track fairly simple data or 2) ambitious but disappointing. I'd put Slashdot's moderation and metamoderation in the first category. Relevance, quality, and a few kinds of description are possible, but these are fairly simple things to track. Most internet resources would require metadata that is much harder to validate to be useful.

A primary example of this that comes to my mind is the current crop of music recommendation services. The idea behind these sites is that they can, through one of various methods, recommend music to you based on what you like. I've experimented somewhat extensively with Pandora and Last.fm, and the difference in the quality of their suggestions is amazing.

Last.fm uses community data for recommendations. It tracks tags that users attach to songs and the collection of artists that each user listens to. Based on what artists you have listened to or which tags you select, it attempts to point out other artists you might like.

Pandora makes recommendations based on musical qualities. The data the service uses comes from the Music Genome Project, which paid people who have studied music to catalogue the musical qualities of songs in their database. Employees listen to songs and select which attributes are applicable to the song from a list of hundreds of attributes. To use the service, you enter some songs and artists that you like, and based on the musical attributes of those songs and artists, it recommends other songs you might like.
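
The two approaches described above can be contrasted with a small Python sketch. Everything in it is assumed for illustration: the artist names, listening histories, and attribute lists are made up, and neither scoring function is claimed to be what Last.fm or Pandora actually runs. `co_listening_score` uses only who listens to what (Jaccard overlap of listeners), while `attribute_score` uses only catalogued musical qualities (cosine similarity).

```python
from math import sqrt

# Hypothetical community data: user -> set of artists they listen to.
histories = {
    "u1": {"popular_band", "pop_star"},
    "u2": {"popular_band", "pop_star"},
    "u3": {"popular_band"},
    "u4": {"obscure_band"},
}

# Hypothetical expert-catalogued data: artist -> {attribute: weight}.
attributes = {
    "popular_band": {"minor_key": 1.0, "acoustic_guitar": 1.0},
    "obscure_band": {"minor_key": 1.0, "acoustic_guitar": 1.0},
    "pop_star": {"major_key": 1.0, "synth": 1.0},
}

def listeners(artist):
    """Return the set of users whose history contains the artist."""
    return {user for user, artists in histories.items() if artist in artists}

def co_listening_score(a, b):
    """Jaccard overlap of the two artists' listeners (community data only)."""
    la, lb = listeners(a), listeners(b)
    union = la | lb
    return len(la & lb) / len(union) if union else 0.0

def attribute_score(a, b):
    """Cosine similarity of the two artists' attribute vectors."""
    va, vb = attributes[a], attributes[b]
    dot = sum(weight * vb.get(name, 0.0) for name, weight in va.items())
    norm = sqrt(sum(w * w for w in va.values())) * sqrt(sum(w * w for w in vb.values()))
    return dot / norm if norm else 0.0

for other in ("pop_star", "obscure_band"):
    print(other,
          round(co_listening_score("popular_band", other), 2),
          round(attribute_score("popular_band", other), 2))
```

In this toy data the community score favors the artist that shares an audience, while the attribute score favors the artist that shares musical qualities but has no listeners in common.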

The results that the services provide, at least in my case, are like night and day. Last.fm's recommendations are heavily influenced by what's popular and how a common user would categorize an artist or song. They sort of hit the right areas, but they don't get much better than Amazon's recommendations. Pandora's recommendations always seem to be more on target, even though it uses only a few artists or songs that you enter at the start, in contrast to Last.fm, which can use my entire play history.

I guess a lot of this can be chalked up to the difference between association and relation: without some kind of innovation, it seems that community-based metadata can only be based on association, which is a far cry from relation. Yes, association is a type of relation, but a set of data has qualities that a few simple tags from users are not going to be able to touch. It seems to me the next generation of metadata will only be possible when we can figure out a way to get the sort of data that Pandora uses from a community group. It's a daunting challenge that tagging and simple user activities like the Google Image Labeller have only just begun to touch.
Mod Spam? by quanticle · 2006-12-03 15:17 · Score: 4, Interesting

Here on Slashdot, there is a selection process and a reputation system that determines who has the ability to moderate. How does this "Web 2.0" address the fact that anyone can attach and moderate tags?

--
We all know what to do, but we don't know how to get re-elected once we have done it
Yep. No functionality aside from in-jokes by patio11 · 2006-12-03 15:44 · Score: 4, Insightful

You can't search on them, you don't have any incentive to tag them for yourself (since everyone is limited to the same 5 tags or so), and you can't get "More articles like this". Is it any shocker that they've turned into a veritable festival of in-jokes which provide no information you couldn't get from reading the summary? Heck, after you've read the headline you can provide all the tags:

"Is Linux ready for desktop?"

yes, no, fud, notfud -- and it would be marked omgponies, dupe, and thistagisfreakinguseless if any of those options weren't automatically stripped.

It's almost like tags are designed to be useless here, in a way that they're not with delicious (put the periods in wherever you want them -- I use www.delicious.com and I am so very glad it works). I can use delicious as a "Hmm, I want to read this later" bookmark-shared-across-machines, to categorize Java samples for my own use later, and to do things which are of use to *me*. The social aspect grows naturally from the personal uses, because when you mark Sun's whitepaper as being about Java or this photo on flickr as being of sakura everyone else gets to piggyback on your diligence. But if there isn't any personal use possible then tagging is just textual autoeroticism.

You can mark me fud and omgponies if you want.

--
Help poke pirates in the eyepatch, arr.
Embrace the future. by Kadin2048 · 2006-12-03 22:07 · Score: 4, Funny

I think the tags are great; they let me get my whole article's worth of Slashdot groupthink in just a few seconds of skimming.

For instance: "IT: Vista Designed to Make Malware Easy" is tagged "troll, fud, vista, notfud, microsoft". I mean -- that's it! That's the whole discussion right there. Point, spastic head-nodding, counterpoint, rehash of the original article. Thank you sir, may I have another.

I'm hopeful that on some future "Slashdot Mobile," they'll remove everything but the titles and tags, and display it as a feed. Maybe after that, they'll even get rid of the titles, so you can just see a constant stream of tags.

Forget a boot stamping on the face of humanity; that's the future for you: "microsoft fud notfud troll itsatrap google dupe evil internet hardware nvidia slashvertisement pigpile dupe sun esr fud ubuntu dupe microsoft dupe ... "

--
"Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."