Greatest Task of Web 2.x: Meta-Validation

Posted by ryuzaki0 on Sunday December 3, 2006 @02:33PM from the vetting-the-metadata dept.

CexpTretical writes "This Technology Review article about Web 2.x problems fails to mention the 800 pound gorilla in the room when it comes to fulfilling the dreams of the Semantic Web — i.e., assumptions about the validity of metadata or tagging schemes. We can add all of the metadata and/or tags we want to web resources but that does not mean that the 'data about the data' honestly or accurately describe the resource or are 'about the data' at all. This is why Google does not place much importance on the metadata already contained in HTML document headers for search ranking, because it cannot be trusted. And to validate it would require more effort than to search and index that data from scratch. Ensuring or verifying the validity of metadata would be a task equal to that of initially creating it, but would have to be repeated on an ongoing basis. Hence all of the talk about 'trusted networks,' which then require trusting the gatekeepers of those networks. Talk about 'semantics.'" Slashdot's moderation and meta-moderation offer one example of getting useful metadata in a non-trusted environment.

Speaking of Slashdot's metadata... by Anonymous Coward · 2006-12-03 14:39 · Score: 5, Insightful

What about the removal of accurate metadata, such as Slashdot's disabling of the "dupe" tag?
You can't trust the moderation system either by BadAnalogyGuy · 2006-12-03 14:40 · Score: 4, Insightful

Especially here at Slashdot where a certain type of groupthink is very prevalent, it's not so much a matter of whether a comment is insightful or interesting so much as it adheres to the consensus view of the moderators. A non-conforming view is labeled 'Troll'. So in one sense, the metadata provided by the moderation system is useful in that you can tell at a glance how well a comment conforms to the Slashdot zeitgeist just by looking at its moderation score.

However since posts lower than zero do not get displayed automatically, views that are unappealing to the Slashdot community are relegated to obscurity regardless of their validity and correctness.

Linux sucks.
1. Re:You can't trust the moderation system either by aquaepulse · 2006-12-03 14:54 · Score: 4, Insightful
  
  it's not so much a matter of whether a comment is insightful or interesting so much as it adheres to the consensus view of the moderators
  You seem to be arguing against yourself. Moderators are chosen from a large pool according to rules described in moderation guidelines. It stands to reason that if these moderators come to consensus about a post, then that consensus would be descriptive of the post.
2. Re:You can't trust the moderation system either by grcumb · 2006-12-03 15:01 · Score: 5, Insightful
  
  [S]ince posts lower than zero do not get displayed automatically, views that are unappealing to the Slashdot community are relegated to obscurity regardless of their validity and correctness.
  
  Here's a thought: Rather than indulging in self-satisfied name-calling, why not perform some analysis on the moderation system and actually try to provide some evidence for your facile assertion? It's pretty easy to do, precisely because the kind of abuse you claim is rampant here would also be completely transparent, if it were happening.
  
  For my part, I have no inclination to agree with your assertion, because in the 2 years I've been meta-moderating daily, I haven't seen more than about 1% of posts[*] that show such symptoms. On the contrary, if my experience is any guide, there's a far more common tendency to mod content-free comments like yours upward than to mod unpopular, but well-argued, comments downward. The consistency of the data, and the fact that it's semi-randomly selected for me, leads me to believe that it's statistically significant, and that my experience doesn't differ significantly from anyone else's.
  
  YMMV, but the burden of proof does lie with the accuser, so please back your assertion with evidence.
  
  [*] I base that on viewing slightly less than 1 abusive down-mod a week, or 1 in 80-90 moderations.
  
  --
  Crumb's Corollary: Never bring a knife to a bun fight.
3. Re:You can't trust the moderation system either by bunions · 2006-12-03 17:21 · Score: 3, Insightful
  
  the contrarian viewpoint always looks insightful, regardless of its merits.
  
  The fact that there's a general consensus viewpoint that tends to reinforce itself is just an artifact of human nature. Slashdot, not being any great exception to the human condition, does what it can to reduce this, and in my eyes does about as decent a job as you're going to have done when you let the mob moderate itself.
  
  --
  there is no need to sign your posts. this isn't usenet. your username is right there above your post. stop it.
4. Re:You can't trust the moderation system either by logicnazi · 2006-12-03 21:26 · Score: 2, Insightful
  
  Wow, that's an incredible example. I've been arguing for a long time for a more punitive meta-moderation system. That is, a system that yanks moderation rights from anyone who gets enough bad meta-moderation from enough different sources (since you can't select what you meta-mod, this avoids the stalker problem).
  
  Additionally to reply to your parent the moderation issue is still a problem even without evil abuses like this. Back when I used to post more frequently I noticed that my comments that were equally if not better reasoned but supported views some people wanted to dismiss as obviously false would disappear below threshold sometimes (not always but it never happened with comments with more slashdot mainstream type conclusions). For instance comments expressing non-standard moral conclusions, e.g., killing people isn't itself a harm only the suffering it causes to the living matters.
  
  The system is reasonably good for fairly simple points and I think most people try to moderate fairly. It doesn't scale well to more complex issues or any situation where people think some things are just stupid even though they aren't, e.g., divisive issues like abortion.
  
  --
  If you liked this thought maybe you would find my blog nice too:
5. Re:You can't trust the moderation system either by SpectreHiro · 2006-12-03 23:15 · Score: 2, Insightful
  
  I generally agree with your point, but I'm also reminded of a quote that makes its way around the internet periodically:
  
  "Democracy is two wolves and a lamb voting on what to have for dinner."
  
  You've picked out a particular failing of a republic, and it's a valid point. The thing I never see, and I'd very much like to, is a recommendation for a better system. It's not enough to complain about the state of things. A complaint is worthless if it isn't accompanied by a superior solution.
  
  So, how do we combat groupthink? Is there something better than consensus to evaluate the worth of a post? The current system does a reasonable job of suppressing the posts that hold little or no intrinsic value, but how do we do that without also smothering otherwise intelligent posts which hold a dissenting opinion? Of course, there's always anarchy, but that has drawbacks which I think are obvious enough without specific illustration.
  
  Bitch of a question, isn't it? If you've got an answer, I'm sure the slash-mods would love to hear it.
  
  --
  You can't win, Darth. If you mod me down, I shall become more powerful than you could possibly imagine.
The difficulty: association is not relation by traindirector · 2006-12-03 15:06 · Score: 5, Insightful

Working with metadata from a non-trusted community is a few orders of difficulty harder than working with trusted metadata. All the examples from non-trusted user groups that I've seen are either 1) only able to track fairly simple data or 2) ambitious but disappointing. I'd put Slashdot's moderation and metamoderation in the first category. Relevance, quality, and a few kinds of description are possible, but these are fairly simple things to track. Most internet resources would require metadata that is much harder to validate to be useful.

A primary example of this that comes to my mind is the current crop of music recommendation services. The idea behind these sites is that they can, through one of various methods, recommend music to you based on what you like. I've experimented somewhat extensively with Pandora and Last.fm, and the difference in the quality of their suggestions is amazing.

Last.fm uses community data for recommendations. It tracks tags that users attach to songs and the collection of artists that each user listens to. Based on what artists you have listened to or which tags you select, it attempts to point out other artists you might like.

Pandora makes recommendations based on musical qualities. The data the service uses comes from the Music Genome Project, which paid people who have studied music to catalogue the musical qualities of songs in their database. Employees listen to songs and select which attributes are applicable to the song from a list of hundreds of attributes. To use the service, you enter some songs and artists that you like, and based on the musical attributes of those songs and artists, it recommends other songs you might like.
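To make the contrast between the two approaches concrete, here is a minimal sketch in Python. Everything in it is a hypothetical illustration: the artist names, listener sets, and attribute vectors are made up, and neither function reflects how Last.fm or Pandora actually compute their recommendations. The first scorer uses listener overlap (community association); the second uses similarity of hand-labelled attributes.

```python
from math import sqrt

# Hypothetical community data: which users listened to which artist.
LISTENERS = {
    "artist_a": {"u1", "u2", "u3"},
    "artist_b": {"u2", "u3", "u4"},
    "artist_c": {"u5"},
}

# Hypothetical expert-labelled attributes (say: fast tempo, acoustic, vocal-heavy).
ATTRIBUTES = {
    "artist_a": [1.0, 0.0, 1.0],
    "artist_b": [0.0, 1.0, 0.0],
    "artist_c": [1.0, 0.0, 1.0],
}


def association_score(a, b):
    """Jaccard overlap of listener sets: shared listeners / all listeners."""
    la, lb = LISTENERS[a], LISTENERS[b]
    union = la | lb
    return len(la & lb) / len(union) if union else 0.0


def attribute_score(a, b):
    """Cosine similarity between the two artists' attribute vectors."""
    va, vb = ATTRIBUTES[a], ATTRIBUTES[b]
    dot = sum(x * y for x, y in zip(va, vb))
    norm = sqrt(sum(x * x for x in va)) * sqrt(sum(y * y for y in vb))
    return dot / norm if norm else 0.0


def best_match(seed, score):
    """Return the other artist that scores highest against the seed."""
    others = [name for name in LISTENERS if name != seed]
    return max(others, key=lambda other: score(seed, other))


by_association = best_match("artist_a", association_score)
by_attributes = best_match("artist_a", attribute_score)
```

With this toy data the two scorers disagree: listener overlap picks `artist_b`, which shares an audience with the seed, while attribute similarity picks `artist_c`, which nobody in the seed's audience has listened to but which has the same musical profile.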

The results that the services provide, at least in my case, are like night and day. Last.fm's recommendations are heavily influenced by what's popular and how a common user would categorize an artist or song. They sort-of hit the right areas, but it doesn't get much better than Amazon's recommendations. Pandora's recommendations always seem to be more on target, even though it uses only a few artists or songs that you enter at the start, in contrast to Last.fm, which can use my entire play history.

I guess a lot of this can be chalked up to the difference between association and relation - without some new innovation, it seems that community-based metadata can only be based on association, which is a far cry from relation. Yes, it is a type of relation, but a set of data has qualities that a few simple tags from users are not going to be able to touch. It seems to me the next generation of metadata will only be possible when we can figure out a way to get the sort of data that Pandora uses from a community group. It's a daunting challenge that tagging and simple user activities like the Google Image Labeller have only just started to touch.
One thing that I don't see mentioned by ameyer17 · 2006-12-03 15:11 · Score: 3, Insightful

For metadata to be useful at all, there has to be some way to come to a consensus, and the most logical way to come to a consensus is by what the majority thinks. However, there are too many examples where the majority is wrong for metadata to be truly useful in my opinion.
Greatest Task of Web 2.0: Materialization by Dracos · 2006-12-03 15:39 · Score: 2, Insightful

Web 2.0 is an empty buzzword for the evolution of the internet. There is no single event that can unequivocally be called the start of "Web 2.0".

According to Daniel Glazman, Tim Berners-Lee has officially given up on XHTML as of last week's W3C Advisory Committee meeting in Tokyo, and then apparently explains what Web 3.0 is supposed to be.

TBL is apparently not the visionary we all thought he was. Apparently no one in the W3C can (or is willing to) figure out how to relegate HTML to the junk heap, like a 286 computer: it was a good idea at the time, but newer technology has come along. Eventually, someone will want to see one in a museum. Contrary to popular reports, the W3C has not fixed itself, but merely rolled back the clock on itself a decade or so.

After 8 years, what do all the developers who embraced XHTML get for our efforts? Our smorgasbord of web standards becomes a (tag) soup kitchen once again.

Web 2.0 is a fleeting concept with no substance; its existence can only be inferred by surreptitiously attributing semi-related events to its influence. Now that the inventor of the WWW has bought into this folly, and simultaneously abandoned one of the W3C's greatest achievements, how can anyone put any stock in what he or anyone else at W3C says?

I held out longer than most in my hopes that web standards could be straightened out, but now the W3C is dead by its own hand, after 6 or more years of atrophy, manic depression, and schizophrenia.
Yep. No functionality aside from in-jokes by patio11 · 2006-12-03 15:44 · Score: 4, Insightful

You can't search on them, you don't have any incentive to tag them for yourself (since everyone is limited to the same 5 tags or so), and you can't get "More articles like this". Is it any shocker that they've turned into a veritable festival of in-jokes which provide no information you couldn't get from reading the summary? Heck, after you've read the headline you can provide all the tags:

"Is Linux ready for desktop?"

yes, no, fud, notfud -- and it would be marked omgponies, dupe, and thistagisfreakinguseless if any of those options weren't automatically stripped.

It's almost like tags are designed to be useless here, in a way that they're not with delicious (put the periods in wherever you want them -- I use www.delicious.com and I am so very glad it works). I can use delicious as a "Hmm, I want to read this later" bookmark-shared-across-machines, to categorize Java samples for my own use later, and to do things which are of use to *me*. The social aspect grows naturally from the personal uses, because when you mark Sun's whitepaper as being about Java or this photo on flickr as being of sakura everyone else gets to piggyback on your diligence. But if there isn't any personal use possible then tagging is just textual autoeroticism.

You can mark me fud and omgponies if you want.

--
Help poke pirates in the eyepatch, arr.
Re:Screw the meta-data validation... by timeOday · 2006-12-03 15:54 · Score: 2, Insightful

What I want from Web 2.0 is micropayments, by which I mean a form of digital cash with no more than 1% transaction fee down to a minimum transaction fee of 1 cent. I suspect all the web content that's free now would still be free, but the ability to make money straight from viewers of a web page would be a revolution.
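One way to read that fee schedule is as 1% of the payment with a floor of 1 cent, sketched below in Python. The function name `transaction_fee_cents` and the choice to round the 1% down to whole cents are assumptions made for illustration, not part of any real payment system.

```python
def transaction_fee_cents(amount_cents: int) -> int:
    """Fee under the assumed reading: 1% of the amount, never below 1 cent.

    Amounts are whole cents to avoid floating-point rounding. The 1% is
    rounded down so the percentage part never exceeds 1% of the payment.
    """
    if amount_cents <= 0:
        raise ValueError("amount must be positive")
    one_percent = amount_cents // 100
    return max(1, one_percent)


# A few sample payments: 5 cents, 50 cents, $2.50, $100.00.
fees = [transaction_fee_cents(a) for a in (5, 50, 250, 10_000)]
```

Under this reading, the 1-cent floor dominates for any payment under $2.00, so a 5-cent payment effectively pays a 20% fee, and the 1% cap only really holds from about a dollar upward.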
You can't trust Cowboy Neal either by Anonymous Coward · 2006-12-03 15:59 · Score: 1, Insightful

Interesting. Maybe some "intelligent poster" should study what groupthink* really is, as opposed to what this forum thinks it is.

*Not just definitions, but mechanisms, and scope. Throw in statistics and psychology for extra credit. Serve to an audience that will go "ho hum".
Wrong, bucko! by Anonymous Coward · 2006-12-03 16:20 · Score: 1, Insightful

After 8 years, what do all the developers who embraced XHTML get for our efforts? Our smorgasbord of web standards becomes a (tag) soup kitchen once again.

No. What we get is the XHTML 1.0 and XHTML 1.1 standards to work with. For the vast majority of Web-based tasks, those are more than suitable. Being based on XML, they put a great deal more emphasis on correctness and consistency. While this puts an increased burden on the developer of Web sites and applications, it does often lead to far higher-quality pages. In addition, those standards have helped out browser developers extensively.

We had a client with a site mostly written in HTML 4.0. It didn't display well with several browsers, including Opera. So we did pretty much a straight conversion to XHTML 1.1. Now their site works perfectly fine with every browser we tried, including more picky browsers like Amaya, and text-based browsers like Links, Lynx, and w3m.

What we've seen of "Web 2.0" has been crappy. It's built upon layers of shit like JavaScript and Flash. For any serious Web page, those technologies are often best avoided. They bring nothing but browser incompatibilities and hassle. So we find that XHTML 1.1 works very well, with the result being web pages that display in virtually every browser out there, even ones as old as Netscape Navigator 4.x.
Just Asking For It by mattwarden · 2006-12-03 16:39 · Score: 2, Insightful

Slashdot's moderation and meta-moderation offer one example of getting useful metadata in a non-trusted environment.

Why, oh why, would you include that at the end of the summary? Even if there weren't horrible issues with the moderation system (there are), this particular audience is going to rip that comment apart.
Re:Slashdot's moderation is pretty good by Doc+Ruby · 2006-12-03 17:53 · Score: 2, Insightful

Er, you've posted 23 times. You don't have anywhere near enough experience to make the kinds of claims (and dismissals) that you're making.

Which is reflective of the quality of discourse on Slashdot. Kinda fun, but far from rigorous enough to be taken seriously. The shabby meta/moderation system reinforces that low quality.

--
--
make install -not war
Re:Mod Spam? by PCM2 · 2006-12-03 18:54 · Score: 3, Insightful

Here on Slashdot, there is a selection process and a reputation system that determines who has the ability to moderate.

Is that true? My understanding was that any registered user with an account older than X period of time was eligible to moderate.
If there really is some sort of reputation system, I'm not sure I approve of that. For example, I've been reading Slashdot for close to 10 years. Check out my account number. Presumably I have a pretty good "reputation." But then again, I love a really good troll.(*) I've been known to post a few, too. (Ssshh!) Based on those facts, should I really be allowed to moderate more than somebody else, just because my "reputation" is ostensibly more established?
Wait ... did I say that? Or only think it?
(*) It's a pity there are so few really good trolls anymore.

--
Breakfast served all day!
Metadata, Ajax and Trusted by logicnazi · 2006-12-03 20:18 · Score: 2, Insightful

First of all it just isn't true that slashdot moderation is an example of useful metadata from an untrusted source. The *presenter* of the metadata, i.e., slashdot, is a trusted source. When we see a comment with moderation 5 we know the slashdot system has moderated it 5 and that some random spammer didn't just lie and give it moderation 5. Sure this metadata is created based on 'untrusted' input but that is a different matter entirely and in reality the sources are sorta trusted because only accounts who contribute sufficiently get to moderate. The tagging thing might be an example of a useful app where the metadata is formed from untrusted input but either way the example isn't quite on target.

As for the issue of metadata on the web it is a serious concern and search engines can't continue to just ignore it. As ajax and other dynamic presentation technologies become more and more common less and less of the content on the web will be encoded in simple HTML. Sure everyone who writes up some fancy ajax site and isn't an idiot will leave some html files around for google to index but this doesn't solve the problem. If everyone who visits the site sees something other than the info in the HTML then the HTML itself has become the metadata.

This problem is solvable since, as the success of google itself indicates, if the data is being used by the end user for some significant purpose the authors stay honest. The reason websites sometimes give bogus meta tags is because it doesn't affect the user's experience in the least. If we get something like the semantic web where the users are actually making use of the metadata then things are no different than they are now.

I hope this is what happens as the other option where google starts learning to crawl through ajax calls is much less pleasant. It was bad enough when all ruby actions were GETs and google would trigger all sorts of things to happen in your app. It will be far worse if they deliberately trigger all the JS scripts on your page in order to search effectively. And they *need* to be able to search effectively as that is the heart of why the web works.

Alternatively maybe google could start incentivizing accurate metadata descriptions of *other* pages (via outgoing links) by giving your web page a boost in the rankings. Thus, like wikipedia, perhaps enough good contributions would outweigh the bad ones.

--
If you liked this thought maybe you would find my blog nice too:
Web 2.0 is schizophrenic by idlake · 2006-12-03 20:22 · Score: 2, Insightful

On the one hand, people are trying to sell Web 2.0 as the "semantic web", on the other hand, AJAX is a big part of Web 2.0 apps and makes it harder and harder to actually get at other people's semantic data.

In the end, the whole thing is just marketing hype. Web 2.0 is just the haphazard collection of messy technologies people happen to be using on the web in 2006, and don't expect things to get any better in the next few years either: the W3C, Adobe, and Microsoft will see to it that things remain messy and complex, because, heck, if we actually made the technologies clean and simple, how would these companies and the swarm of overpaid and underqualified consultants make a living?
Re:Mod Spam? by CastrTroy · 2006-12-04 01:39 · Score: 2, Insightful

5) That the user is a positive contributor, meaning that they have a non-negative karma.
This is the kicker. In order to be a moderator, you must have a positive karma, which means you must post comments that contribute to the slashdot groupthink. Anybody can think up formulaic posts in order to get their karma up, but you rarely see this, because you don't get modpoints that often, and trying to rig the entire moderation system would be hard, impossible, or simply just not worth anyone's time, as it would require many accounts. So, the moderation system just enforces the groupthink, because the only people with the ability to mod are the people that have the same views as everyone else.

--

Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.