Slashdot Mirror


Greatest Task of Web 2.x: Meta-Validation

CexpTretical writes "This Technology Review article about Web 2.x problems fails to mention the 800 pound gorilla in the room when it comes to fulfilling the dreams of the Semantic Web — i.e., assumptions about the validity of metadata or tagging schemes. We can add all of the metadata and/or tags we want to web resources but that does not mean that the 'data about the data' honestly or accurately describe the resource or are 'about the data' at all. This is why Google does not place much importance on the metadata already contained in HTML document headers for search ranking, because it cannot be trusted. And to validate it would require more effort than to search and index that data from scratch. Ensuring or verifying the validity of metadata would be a task equal to that of initially creating it, but would have to be repeated on an ongoing basis. Hence all of the talk about 'trusted networks,' which then require trusting the gatekeepers of those networks. Talk about 'semantics.'" Slashdot's moderation and meta-moderation offer one example of getting useful metadata in a non-trusted environment.

5 of 161 comments

  1. Idioms by fossa · · Score: 4, Informative

    I thought it was "elephant in the room"? Googlefight! We're talking orders of magnitude here... Please tell me that lame TV commercial that botched the idiom isn't starting a trend? I think "800 lb gorilla" should keep the Urban Dictionary meaning, "an overbearing entity in a specific industry or sphere of activity", and not expand to the more abstract sense, from Wikipedia, of "an obvious truth that is being ignored".

  2. The Great Google Metadata Myth by MisterBad · · Score: 3, Informative

    So, I'm really dubious about one of the myths about Google and metadata: that Google doesn't use metadata because it's unreliable.

    Google does, in fact, use metadata -- tons of it. Google uses explicit metadata built into headers (like the description, robot control); it uses the rel-license microformat; and it uses titles and h1 headers. It also uses some crucial metadata that's not self-reported by the Web site -- namely, the number and text of links inbound towards a page. It also uses metadata in HTTP headers.
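
    As a rough illustration of the self-reported metadata listed above (meta description, robot control, rel-license links, titles and h1 headers), here is a minimal sketch using Python's standard html.parser. The names MetadataCollector and collect_metadata are made up for this example, and nothing here reflects how Google's crawler actually works; it only shows how easily a spider can read these fields, and equally how easily a page author can fill them with anything.

```python
from html.parser import HTMLParser


class MetadataCollector(HTMLParser):
    """Collects self-reported metadata from one HTML page (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.meta = {}       # <meta name=... content=...> pairs
        self.licenses = []   # href values of elements marked rel="license"
        self.title = ""
        self.h1 = []         # text of each <h1> element
        self._capture = None  # tag whose text is currently being captured

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel_values = (attrs.get("rel") or "").lower().split()
        if tag == "meta" and attrs.get("name"):
            self.meta[attrs["name"].lower()] = attrs.get("content") or ""
        elif tag in ("a", "link") and "license" in rel_values:
            self.licenses.append(attrs.get("href") or "")
        elif tag == "title":
            self._capture = "title"
        elif tag == "h1":
            self._capture = "h1"
            self.h1.append("")

    def handle_endtag(self, tag):
        if tag == self._capture:
            self._capture = None

    def handle_data(self, data):
        if self._capture == "title":
            self.title += data
        elif self._capture == "h1":
            self.h1[-1] += data


def collect_metadata(html):
    """Return the self-reported metadata found in an HTML string."""
    parser = MetadataCollector()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title.strip(),
        "meta": parser.meta,
        "h1": [text.strip() for text in parser.h1],
        "licenses": parser.licenses,
    }
```

    Note that none of these fields is verified against the page body: a page titled "Cheap Pills" can carry a description about gardening, which is exactly the trust problem the story describes.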

    Google also uses lots of data that is unreliable or could be dishonest. After all, there's a huge dark business of blackhat SEO whose sole intention is to trick Google's bots into thinking pages are more important (or are on a different subject) than they actually are. There is no particular part of an HTML page or any other Web resource that cannot be a lie. Web spiders have to deal with this all the time, and they have to balance the information they get from different data sources to determine what's true and what's not.

    It's true that Google's search results don't depend on the specific meta keywords as heavily as many first-generation search engines did. But I think that says more about the remarkable naivete of early search engines than anything else.

    --
    Evan Prodromou | evan@prodromou.name | http://evan.prodromou.name/
  3. Re:Yep. No functionality aside from in-jokes by ben+there... · · Score: 2, Informative
    "You can't search on them"

    You (sort of) can. Go to http://www.slashdot.org/tags/foo

    "and you can't get 'More articles like this'."

    Click the tags that are listed, rather than clicking the arrow. If the tags were meaningful you'd get similar articles.
  4. Re:You can't trust the moderation system either by Eli+Gottlieb · · Score: 2, Informative

    Ironically, you've been modded +5 Insightful. The groupthink thinks there's a groupthink.

    Anyway, you should be happy that we have Slashdot's moderation system. Here, content-free jokes and trolls get modded up, and nearly anyone with a long, reasoned-out post can receive some upmodding (though not necessarily to +5). On sites like Digg and Reddit, disagreeing with the consensus opinion gets you modded deep into the bowels of hell, because everyone has a mod point for each post and the site places no caps or floors on moderation values.

  5. Metadata is a great idea... by zappepcs · · Score: 2, Informative

    The trouble is that if there were an assured way to implement it, it would already have been implemented. Metadata and tags are simply the 'killer app' for Web 2.0.

    Despite all that has been said in the comments and elsewhere, there simply is no good implementation of metadata for the Internet that applies to all types of data and all instances of data sharing.

    If you want to be a hero, figure this little problem out and the world will beat a path to your door... so to speak.