Slashdot Mirror


The Anti-Thesaurus: Unwords For Web Searches

Nicholas Carroll writes: "In the continual struggle between search engine administrators, index spammers, and the chaos that underlies knowledge classification, we have endless tools for 'increasing relevance' of search returns, ranging from much ballyhooed and misunderstood 'meta keywords,' to complex algorithms that are still far from perfecting artificial intelligence. Proposal: there should be a metadata standard allowing webmasters to manually decrease the relevance of their pages for specific search terms and phrases."
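
A rough sketch of how a search engine might honor such a standard, for illustration only. The proposal above does not define a syntax, so the `<meta name="unwords">` tag, the comma-separated format, the function names, and the `penalty` factor are all assumptions made for this example, not part of any existing standard or search engine.

```python
from html.parser import HTMLParser


class UnwordsParser(HTMLParser):
    """Collects terms from a hypothetical <meta name="unwords"> tag."""

    def __init__(self):
        super().__init__()
        self.unwords = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "unwords":
            content = attr.get("content") or ""
            # Assumed format: comma-separated terms or phrases.
            for term in content.split(","):
                normalized = " ".join(term.lower().split())
                if normalized:
                    self.unwords.add(normalized)


def extract_unwords(html):
    """Return the set of unwords a page declares (empty if none)."""
    parser = UnwordsParser()
    parser.feed(html)
    return parser.unwords


def adjusted_score(base_score, query, unwords, penalty=0.1):
    """Scale a page's score down when the query contains a declared unword.

    The multiplicative penalty is an arbitrary choice for this sketch.
    """
    padded_query = " " + " ".join(query.lower().split()) + " "
    # Match whole words or whole phrases, not substrings of words.
    if any(" " + term + " " in padded_query for term in unwords):
        return base_score * penalty
    return base_score
```

A page that declared `<meta name="unwords" content="DMCA, printer repair">` would keep its normal score for unrelated queries and be demoted for queries containing "DMCA" or the phrase "printer repair". Demoting rather than excluding leaves the page findable when nothing better matches.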

5 of 148 comments

  1. I search for 'slash' and 'dot' and end up *here*?! by Overcoat · Score: 3, Interesting
    Is the phenomenon of people naming their website something that has nothing to do with its content so widespread that it necessitates a new metadata tag and the consequent alteration of search engines to recognize it?

    Google seems to do a good enough job of filtering out irrelevant responses as it is.

  2. Re:Sounds Good But... by Krimsen · Score: 4, Interesting

    You are basing this on the assumption that all people are consumers and that all they are searching for is goods and services. What if I am searching the web for info on the DMCA and someone's webpage is called "DMCA", short for "David, Michael, Cathy and Andrea" (or whatever)? If they find that a lot of people are coming across the page accidentally, they can lower the page's relevance for searches on "DMCA"...

  3. Better Metadata by nyjx · Score: 4, Interesting
    While the idea would probably do some good if widely adopted, what's really needed is to reduce the need for text-based indexing of web sites by increasing the amount of explicit semantic information about their content.

    Marking up pages with information about the meaning of the terms on them is the main thrust of the work on semantic web - see http://www.daml.org/ (for DAML - the DARPA Agent Markup Language), http://www.semanticweb.org/ (One of the main information sources) and finally the new W3C activity on the subject: http://www.w3.org/2001/sw/.

    How far and how fast it will go is another matter, but there's certainly a lot of interest in creating a more "machine readable" web.

    --
    .sig
  4. Re:Proposal won't work: No incentive! by Nate+Eldredge · Score: 5, Interesting
    I work as a sysadmin for a computer science department. Until recently, the system staff would frequently get messages along the lines of

    From: frankie3327@aol.com
    To: staff@cs.here.edu
    Subject: help!

    i have a lexmark 4590 and it wont print in color.
    it only makes streaks. also the paper always
    jams. how do i fix it? please reply soon!

    The senders never had any connection to the college or the department. We'd reply telling them we had no idea what they were talking about, and that they should seek help elsewhere. It was rather annoying.

    We eventually figured it out. The department web site maintains a collection of help documents for users of the systems. One of them talked about how to use the department's printers, what to do if you have trouble, etc. At the bottom it listed staff@cs.here.edu as the contact address for the site.

    You've probably guessed it by now. That page came up as one of the top few hits when you searched for "printing" on one of the major search engines (I forget which one). Apparently lusers would find this page, notice that it didn't answer their question, but latch on to the staff email address at the bottom, as if we were an organization dedicated to helping people worldwide with their printers. Furrfu!

    I think we reworded the page to emphasize that it only applied to the college, and we haven't received any more emails lately. But if we could have kept search engines from returning it, that would have been even better. Since in our case the page was intended for internal use, we don't care whether anyone can find it from the Internet. Our real users know where to look for it.

    So in answer to your question: When a search engine returns a page that doesn't answer the user's question, the user will often complain to the webmaster. That's a clear incentive to the webmaster not to have the page show up where it's not relevant. Also, it's not the goal of every site simply to be read by millions of people; some would rather concentrate on those to whom it's useful.

  5. The Semantic Web by mike_sucks · Score: 5, Interesting

    Surely this kind of issue is what Tim Berners-Lee and the W3C are trying to address with the Semantic Web.

    The problem with content on the web today is that while it is perfectly readable by humans, it is incomprehensible to machines. If Tim and Co get their way, and I for one would love to see the Semantic Web catch on, then we can get rid of kluges like the Anti-Thesaurus, HTML meta keywords and the like.

    --
    -- "So, what's the deal with Auntie Gerschwitz et all?"