Slashdot Mirror


The Anti-Thesaurus: Unwords For Web Searches

Nicholas Carroll writes: "In the continual struggle between search engine administrators, index spammers, and the chaos that underlies knowledge classification, we have endless tools for 'increasing relevance' of search returns, ranging from much-ballyhooed and misunderstood 'meta keywords,' to complex algorithms that are still far from perfecting artificial intelligence. Proposal: there should be a metadata standard allowing webmasters to manually decrease the relevance of their pages for specific search terms and phrases."

6 of 148 comments

  1. Isn't that what - is for? by pen · · Score: 2, Informative
    If I'm searching for something and the wrong sites come up, I simply look for a keyword that appears on most of the sites I don't need but wouldn't appear on the sites I do need, and then exclude it by prefixing it with a minus sign.

    For example, if I'm looking for info on a Toyota Supra and too many Celica-related pages come up, I'll type:

    toyota supra -celica
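
    Conceptually, the minus operator is a filter over the result set. A minimal sketch, assuming documents are plain strings matched on lowercase whole words (real engines tokenize and rank far more carefully); parse_query and search are hypothetical names:

```python
def parse_query(query: str) -> tuple[set[str], set[str]]:
    """Split a query into required terms and excluded (-prefixed) terms."""
    required, excluded = set(), set()
    for token in query.lower().split():
        if token.startswith("-") and len(token) > 1:
            excluded.add(token[1:])
        else:
            required.add(token)
    return required, excluded


def search(query: str, documents: list[str]) -> list[str]:
    """Return documents containing every required term and no excluded term."""
    required, excluded = parse_query(query)
    hits = []
    for doc in documents:
        words = set(doc.lower().split())
        if required <= words and not (excluded & words):
            hits.append(doc)
    return hits


docs = [
    "toyota supra turbo specs",
    "toyota celica and supra comparison",
    "toyota celica buyers guide",
]
print(search("toyota supra -celica", docs))  # ['toyota supra turbo specs']
```

    The second document mentions both cars, so it is dropped even though it contains "supra"; that is the price of a blunt exclusion.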

    On a related note, does anyone else find it aggravating that Google applies its built-in list of ignored common words (a, 1, of) even inside phrases?
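
    To see why that hurts, here is naive stop-word removal applied to whole phrases. The stop-word list is an assumption for illustration, not Google's actual list, and strip_stop_words is a hypothetical helper:

```python
# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"a", "1", "of", "the", "to", "be", "or", "not"}


def strip_stop_words(phrase: str) -> str:
    """Drop stop words from a phrase, as a naive engine might."""
    kept = [w for w in phrase.lower().split() if w not in STOP_WORDS]
    return " ".join(kept)


print(repr(strip_stop_words("War of the Worlds")))   # 'war worlds'
print(repr(strip_stop_words("To be or not to be")))  # ''
```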

  2. Re:Proposal won't work: No incentive! by Ex+Machina · · Score: 4, Informative

    But if we could have kept search engines from returning it, that would have been even better. Since in our case the page was intended for internal use, we don't care whether anyone can find it from the Internet. Our real users know where to look for it.

    http://www.robotstxt.org/wc/exclusion.html

  3. robots.txt ? by Atrax · · Score: 3, Informative

    Did you have the page disallowed for search engines? If something is for internal use only, you really ought to have dropped in a robots.txt to exclude it altogether.

    If more people used robots.txt, a lot of 'only useful to internal users' sites would drop right off the engines, leaving more relevant results for the rest of the world...

    Just a thought...
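
    For reference, a minimal robots.txt that keeps compliant crawlers out of an internal area can be checked with Python's standard urllib.robotparser module. The /internal/ path and example.com host are placeholders, and keep in mind that robots.txt is advisory: well-behaved spiders honor it, but it does not hide or protect anything.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block every crawler from /internal/, allow the rest.
ROBOTS_TXT = """\
User-agent: *
Disallow: /internal/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in ("http://example.com/internal/status.html",
            "http://example.com/index.html"):
    # can_fetch(useragent, url): may this crawler request this URL?
    print(url, parser.can_fetch("*", url))
# http://example.com/internal/status.html False
# http://example.com/index.html True
```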

    --
    Screw you all! I'm off to the pub
  4. Re:How about this? by 21mhz · · Score: 4, Informative

    This is where Google's PageRank(tm) system comes in: an Alan Turing biography linked from fifty sites, each with a decent rating of its own, will undoubtedly be rated higher than a porn site that just lists "alan turing britney spears anthrax riaa cowboyneal" in its meta keywords and is linked only by a handful of similar sites among millions. Use the great cross-linking fabric of the Web, Luke.

    Disclaimer: I'm in no way associated with Google.
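
    The intuition can be sketched with the textbook form of PageRank, computed by power iteration with the commonly used damping factor of 0.85. This is a simplified illustration, not Google's production ranking, and the link graph is invented: three sites cite the biography, while two spam pages only link to each other. Meta keywords play no part in the score.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Simplified PageRank by power iteration.

    links maps each page to the pages it links to. A page with no
    outgoing links spreads its rank evenly over all pages.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets the "random jump" share, then rank flows along links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            targets = outlinks or pages  # dangling page: share with everyone
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical graph for illustration.
graph = {
    "turing_bio": ["site_a", "site_b", "site_c"],
    "site_a": ["turing_bio"],
    "site_b": ["turing_bio"],
    "site_c": ["turing_bio"],
    "spam_1": ["spam_2"],  # spam pages only link to each other
    "spam_2": ["spam_1"],
}

ranks = pagerank(graph)
for page, score in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{page:12s} {score:.3f}")
```

    The biography comes out on top because rank flows in from every page that links to it, while the spam pair can only pass rank back and forth between themselves.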

    --
    My exception safety is -fno-exceptions.
  5. What about !keyword? by Ed+Avis · · Score: 3, Informative
    I thought we already had this by prefixing keywords with a ! sign. For example, the BSD FAQ used to have the line:
    Keywords: FAQ 386bsd NetBSD FreeBSD !Linux

    Presumably the same could be done for <meta name="keywords"> in HTML.
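
    Here is how a search engine might honor such a line, sketched under the assumption that a !-prefixed keyword means "do not return this page for that term." No standard defines this for <meta name="keywords">; the convention and the parse_keywords and is_relevant helpers are hypothetical:

```python
def parse_keywords(line: str) -> tuple[set[str], set[str]]:
    """Split 'Keywords: a b !c' into (positive, negative) keyword sets."""
    _, _, value = line.partition(":")
    positive, negative = set(), set()
    for word in value.split():
        word = word.lower()
        if word.startswith("!") and len(word) > 1:
            negative.add(word[1:])
        else:
            positive.add(word)
    return positive, negative


def is_relevant(query: str, keywords_line: str) -> bool:
    """Hypothetical rule: drop the page if any query term is a negative keyword."""
    _, negative = parse_keywords(keywords_line)
    return not any(term in negative for term in query.lower().split())


line = "Keywords: FAQ 386bsd NetBSD FreeBSD !Linux"
print(is_relevant("freebsd faq", line))  # True
print(is_relevant("linux faq", line))    # False
```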

    --
    -- Ed Avis ed@membled.com
  6. mod_rewrite reference, examples by Dr.+Awktagon · · Score: 3, Informative

    Well, some docs are here, and the mod_rewrite reference is here.

    Here is a goofy example that redirects visitors arriving from a Google search back to their Google query, with the word "porn" appended. As an added bonus, it only does so when the clock's seconds are an even number (or you could run the same test on the last digit of their IP address). Replace the plus sign before "porn" with about 100 plus signs and they won't notice the addition: each plus sign decodes to a space, so the extra word is pushed out of view in the search box. The "%1" is a back-reference to their original query, captured by the last RewriteCond.

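    RewriteEngine On
    # Only on even seconds (00, 02, ... 58)
    RewriteCond %{TIME_SEC} [02468]$
    # Only for visitors arriving from a Google results page
    RewriteCond %{HTTP_REFERER} google\.com/search [NC]
    # Capture the q= parameter; %1 refers to this last matched condition
    RewriteCond %{HTTP_REFERER} [?&]q=([^&]+)
    # Temporary (302) redirect back to Google with the extra word appended
    RewriteRule . http://www.google.com/search?q=%1+porn [R=temp,L]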
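
    The same logic can be traced in Python to show what each condition checks and why the padding is invisible: in a query string, + decodes to a space. This is an illustrative sketch, not how Apache evaluates rules; redirect_target and its padding parameter are hypothetical, and the referer is a made-up example.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def redirect_target(referer: str, second: int, padding: int = 100) -> Optional[str]:
    """Mimic the rule set: return the redirect URL, or None to serve the page."""
    if second % 2 != 0:  # RewriteCond %{TIME_SEC} [02468]$
        return None
    if not re.search(r"google\.com/search", referer, re.IGNORECASE):  # [NC]
        return None
    match = re.search(r"[?&]q=([^&]+)", referer)  # group(1) plays the role of %1
    if match is None:
        return None
    # The query is copied still URL-encoded, then padded with '+' signs.
    return "http://www.google.com/search?q=" + match.group(1) + "+" * padding + "porn"


target = redirect_target("http://www.google.com/search?q=alan+turing&hl=en", second=42)
decoded = parse_qs(urlsplit(target).query)["q"][0]
print(decoded.split())     # ['alan', 'turing', 'porn']
print(decoded.count(" "))  # 101: one from the original query, 100 from the padding
```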

    Here's another one that checks the User-Agent for a URL and then redirects to it. This keeps many spiders and other bots off your pages, since they usually put their URLs in the User-Agent:

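    RewriteEngine On
    # Capture the first http:// URL in the User-Agent (up to a space or ")")
    RewriteCond %{HTTP_USER_AGENT} "(http://[^ )]+)"
    # Permanent (301) redirect the client to that URL
    RewriteRule . %1 [R=permanent,L]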
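
    To see what that condition captures, here is the same regular expression applied in Python to a Googlebot-style User-Agent string; crawler_url is a hypothetical helper. Note that the pattern only matches http://, so a bot advertising an https:// URL would slip through.

```python
import re
from typing import Optional

# Same pattern as the RewriteCond: "http://" up to the first space or ")".
UA_URL = re.compile(r"(http://[^ )]+)")


def crawler_url(user_agent: str) -> Optional[str]:
    """Return the URL a crawler advertises in its User-Agent, if any."""
    match = UA_URL.search(user_agent)
    return match.group(1) if match else None


ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
print(crawler_url(ua))  # http://www.google.com/bot.html
print(crawler_url("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))  # None
```

    Ordinary browsers don't put a URL in their User-Agent, so only self-identifying bots get bounced.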

    Anything you can think of is possible. You can even hook it into external scripts through RewriteMap's prg: map type.