Slashdot Mirror


Greatest Task of Web 2.x: Meta-Validation

CexpTretical writes "This Technology Review article about Web 2.x problems fails to mention the 800-pound gorilla in the room when it comes to fulfilling the dreams of the Semantic Web, i.e., assumptions about the validity of metadata or tagging schemes. We can add all of the metadata and/or tags we want to web resources, but that does not mean that the 'data about the data' honestly or accurately describe the resource or are 'about the data' at all. This is why Google does not place much importance on the metadata already contained in HTML document headers for search ranking: it cannot be trusted, and validating it would require more effort than searching and indexing that data from scratch. Ensuring or verifying the validity of metadata would be a task equal to that of initially creating it, but would have to be repeated on an ongoing basis. Hence all of the talk about 'trusted networks,' which then require trusting the gatekeepers of those networks. Talk about 'semantics.'" Slashdot's moderation and meta-moderation offer one example of getting useful metadata in a non-trusted environment.

4 of 161 comments

  1. Speaking of Slashdot's metadata... by Anonymous Coward · · Score: 5, Insightful

    What about the removal of accurate metadata, such as Slashdot's disabling of the "dupe" tag?

    1. Re:Speaking of Slashdot's metadata... by ewl1217 · · Score: 5, Funny

      A tag isn't useful if it works for every article...

  2. Re:You can't trust the moderation system either by grcumb · · Score: 5, Insightful

    [S]ince posts lower than zero do not get displayed automatically, views that are unappealing to the Slashdot community are relegated to obscurity regardless of their validity and correctness.

    Here's a thought: Rather than indulging in self-satisfied name-calling, why not perform some analysis on the moderation system and actually try to provide some evidence for your facile assertion? It's pretty easy to do, precisely because the kind of abuse you claim is rampant here would also be completely transparent, if it were happening.

    For my part, I have no inclination to agree with your assertion, because in the 2 years I've been meta-moderating daily, I haven't seen more than about 1% of posts[*] that show such symptoms. On the contrary, if my experience is any guide, there's a far more common tendency to mod content-free comments like yours upward than to mod unpopular, but well-argued, comments downward. The consistency of the data, and the fact that it's semi-randomly selected for me, lead me to believe that it's statistically significant, and that my experience doesn't differ significantly from anyone else's.

    YMMV, but the burden of proof does lie with the accuser, so please back your assertion with evidence.

    [*] I base that on viewing slightly less than 1 abusive down-mod a week, or 1 in 80-90 moderations.

    --
    Crumb's Corollary: Never bring a knife to a bun fight.

  3. The difficulty: association is not relation by traindirector · · Score: 5, Insightful

    Working with metadata from a non-trusted community is a few orders of magnitude harder than working with trusted metadata. All the examples from non-trusted user groups that I've seen are either 1) only able to track fairly simple data or 2) ambitious but disappointing. I'd put Slashdot's moderation and metamoderation in the first category. Relevance, quality, and a few kinds of description are possible, but these are fairly simple things to track. For most internet resources, useful metadata would be much harder to validate.

    A primary example of this that comes to my mind is the current crop of music recommendation services. The idea behind these sites is that they can, through one of various methods, recommend music to you based on what you like. I've experimented somewhat extensively with Pandora and Last.fm, and the difference in the quality of their suggestions is amazing.

    Last.fm uses community data for recommendations. It tracks tags that users attach to songs and the collection of artists that each user listens to. Based on what artists you have listened to or which tags you select, it attempts to point out other artists you might like.

    Pandora makes recommendations based on musical qualities. The data the service uses comes from the Music Genome Project, which paid people who have studied music to catalogue the musical qualities of songs in their database. Employees listen to songs and select which attributes are applicable to the song from a list of hundreds of attributes. To use the service, you enter some songs and artists that you like, and based on the musical attributes of those songs and artists, it recommends other songs you might like.

    The results that the services provide, at least in my case, are like night and day. Last.fm's recommendations are heavily influenced by what's popular and how a common user would categorize an artist or song. They sort of hit the right areas, but they don't get much better than Amazon's recommendations. Pandora's recommendations always seem to be more on target, even though it uses only a few artists or songs that you enter at the start, in contrast to Last.fm, which can use my entire play history.

    I guess a lot of this can be chalked up to the difference between association and relation: without some new innovation, it seems that community-based metadata can only be based on association, which is a far cry from relation. Yes, association is a type of relation, but a set of data has qualities that a few simple tags from users are not going to be able to touch. It seems to me the next generation of metadata will only be possible when we can figure out a way to get the sort of data that Pandora uses from a community group. It's a daunting challenge that tagging and simple user activities like the Google Image Labeller have only just begun to touch.
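
To make the association-versus-relation distinction in the last comment concrete, here is a minimal Python sketch of the two styles of recommendation it describes. This is not how Last.fm or Pandora actually work. The artist names, listening histories, attribute lists, and both function names are invented for illustration, and plain co-occurrence counting and Jaccard similarity are simple stand-ins chosen for brevity.

```python
from collections import Counter

# Hypothetical community data: each user's set of listened-to artists.
HISTORIES = {
    "u1": {"Quiet Folk", "Chart Pop"},
    "u2": {"Quiet Folk", "Chart Pop"},
    "u3": {"Quiet Folk", "Chart Pop", "Gentle Strings"},
    "u4": {"Chart Pop", "Loud Rock"},
}

# Hypothetical expert-assigned musical attributes per artist.
ATTRIBUTES = {
    "Quiet Folk": {"acoustic guitar", "soft vocals", "slow tempo"},
    "Gentle Strings": {"acoustic guitar", "soft vocals", "string section"},
    "Chart Pop": {"synthesizer", "fast tempo", "soft vocals"},
    "Loud Rock": {"electric guitar", "fast tempo", "shouted vocals"},
}


def recommend_by_association(liked, histories, top_n=3):
    """Rank unseen artists by how many listeners of a liked artist also play them."""
    counts = Counter()
    for artists in histories.values():
        if artists & liked:                # this user shares a liked artist
            for artist in artists - liked:
                counts[artist] += 1
    # Sort by count (descending), then by name so ties are deterministic.
    return sorted(counts, key=lambda a: (-counts[a], a))[:top_n]


def jaccard(a, b):
    """Overlap between two attribute sets, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_by_attributes(liked, attributes, top_n=3):
    """Rank unseen artists by attribute overlap with the liked artists."""
    profile = set().union(*(attributes[a] for a in liked))
    scores = {
        artist: jaccard(profile, attrs)
        for artist, attrs in attributes.items()
        if artist not in liked
    }
    return sorted(scores, key=lambda a: (-scores[a], a))[:top_n]


if __name__ == "__main__":
    liked = {"Quiet Folk"}
    print(recommend_by_association(liked, HISTORIES))
    print(recommend_by_attributes(liked, ATTRIBUTES))
```

With this toy data, the association-based ranking puts the popular "Chart Pop" first simply because most listeners play it, while the attribute-based ranking puts "Gentle Strings" first because it shares more musical qualities with the liked artist. That is the popularity bias the comment attributes to community data, shown on made-up numbers only.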