Slashdot Mirror


The Anti-Thesaurus: Unwords For Web Searches

Nicholas Carroll writes: "In the continual struggle between search engine administrators, index spammers, and the chaos that underlies knowledge classification, we have endless tools for 'increasing relevance' of search returns, ranging from much ballyhooed and misunderstood 'meta keywords,' to complex algorithms that are still far from perfecting artificial intelligence. Proposal: there should be a metadata standard allowing webmasters to manually decrease the relevance of their pages for specific search terms and phrases."

148 comments

  1. Sounds Good But... by TMacPhail · · Score: 3, Insightful

    This sounds like a good plan, but I don't think anyone would be willing to risk having their page show up lower in a search when someone was intending to find it. Plus anyone that finds the page in a search by accident is just a new potential customer.

    1. Re:Sounds Good But... by /Wegge · · Score: 2, Interesting
      Plus anyone that finds the page in a search by accident is just a new potential customer.

      On the other hand, any potential customer who finds the page as a result of a broader match than the page warrants might also remember the site as one that doesn't have what he needs. I don't claim to understand mainstream consumerism, but in my professional capacity, I tend to avoid companies that try to make a follow-up sale on a completely unrelated issue.

      --
      //Wegge
    2. Re:Sounds Good But... by Krimsen · · Score: 4, Interesting

      You are basing this on the fact that all people are consumers and all they are searching for are goods and services. What if I am searching the web for info on the DMCA and someone's webpage was called "DMCA" -short for "David, Michael, Cathy and Andrea" (or whatever) If they find that a lot of people are coming across the page accidentally, they can lower the relevance on the page on searches for "DMCA"...

    3. Re:Sounds Good But... by TMacPhail · · Score: 1

      Ok, so this hypothetical "David, Michael, Cathy and Andrea" site might get more hits than they wanted. If they were not intending to sell something, then they probably don't actually care about the number of hits they get. In all likelihood the site is a free site hosted by GeoCities or some similar service. In that case it would be considerate of them to use DMCA as a nonword in the meta tag, but they would have no responsibility to actually do so. Sites that are trying to gain business through the web usually like the extra hits because they create potential customers, or at least people who might mention something they saw to another potential customer.

    4. Re:Sounds Good But... by Krimsen · · Score: 2, Insightful

      Agreed on all points. I guess this concept of nonwords really is kind of dependent on people putting some effort towards something that doesn't immediately benefit them. Eventually "what goes around, comes around," and if everyone uses the non-words, searches will become better. However, I'm not so sure that people are willing to put effort into something that they won't see a return from right away.

    5. Re:Sounds Good But... by jaavaaguru · · Score: 3, Insightful

      If David, Michael, Cathy and Andrea were paying per megabyte for the bandwidth used by their site (for instance, if they required what some ISPs consider premium services, such as ASP or PHP), they would not want everyone looking for DMCA information to view their site, since that would most likely more than double their bandwidth consumption. By listing a frequently searched word such as DMCA as a nonword for their site, they save both their own money and the performance of their ISP's network and servers. Another example would be someone whose surname is the same as the name of a commercial organisation: they do not want all of that organisation's customers wandering into their site by accident.

    6. Re:Sounds Good But... by FleshWound · · Score: 1

      Wow..."Flamebait"?

      We've got some real winners modding around here as of late... *sigh*

    7. Re:Sounds Good But... by cetan · · Score: 1

      Since this summer (or winter if you're below the equator) there has been a concerted effort to completely f the moderation system. Someone(s) or something(s) are specifically targeting good posts and modding them down. Individual posters have been targeted as well.

      The only thing to do about it is to metamoderate and make sure lame behavior like modding your parent post "Flamebait" gets marked "Unfair".

      --
      In Soviet Russia...michael would be rotting in Siberia!
    8. Re:Sounds Good But... by Bobo+the+Space+Chimp · · Score: 1

      > should be a metadata standard allowing webmasters
      > to manually decrease the relevance of their pages
      > for specific search terms and phrases."

      Last time I checked, the problem was stopping XXX BRITNEY NIPSLIP from turning up as the result to "+car +transmission +repair".

      --
      I am for the complete Trantorization of Earth.
  2. How about this? by NitsujTPU · · Score: 4, Insightful

    Just shitlist any site that is obviously reaching for hits? If a porn site has the words "Alan Turing" in its metadata and doesn't mention anything about Turing later in the site, list them as not being allowed to participate in your search.

    Hell, an engine that did that would almost be useful.
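    A rough sketch of the check suggested above, assuming the page's meta keywords and visible body text have already been extracted: it measures what fraction of the comma-separated keywords actually appear in the body. The function names and the 0.5 threshold are hypothetical, not any search engine's real rule.

```python
import re

WORD = re.compile(r"[a-z0-9']+")


def keyword_support(meta_keywords: str, body_text: str) -> float:
    """Fraction of comma-separated meta keywords whose words all occur in the body."""
    body_words = set(WORD.findall(body_text.lower()))
    keywords = [k.strip().lower() for k in meta_keywords.split(",") if k.strip()]
    if not keywords:
        return 1.0  # no claims made, so nothing to contradict
    supported = sum(
        1 for k in keywords if all(w in body_words for w in WORD.findall(k))
    )
    return supported / len(keywords)


def looks_like_stuffing(meta_keywords: str, body_text: str, threshold: float = 0.5) -> bool:
    """Hypothetical rule: flag pages where most claimed keywords never appear in the text."""
    return keyword_support(meta_keywords, body_text) < threshold
```

    A page whose keywords are "alan turing, enigma" but whose text is only ad copy scores 0.0 and gets flagged; a real Turing biography scores 1.0.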

    1. Re:How about this? by H310iSe · · Score: 3, Funny
      from webmonkey on search engine foolin' software:

      You can guess why: Search engine developers buy copies of the same software, learn how to recognize its output, and then demote your site or block it altogether when they spot that pattern in your pages.


      No hard "this site was banned" statement, but it seems there are some who do demote or block sites if they catch you putting garbage in your keyword list.

      PS if any porn site puts 'alan turing' in their keywords I would actually want to go there - shows some imagination to say the least, gotta give them props for that...
      --
      closed minded is as closed minded does
    2. Re:How about this? by 21mhz · · Score: 4, Informative

      This is where Google's PageRank(tm) system chimes in: an Alan Turing biography linked by half a hundred sites, each with decent ratings of its own, will undoubtedly be rated higher than a porn site that just listed "alan turing britney spears anthrax riaa cowboyneal" in its meta keywords and is linked by a handful of similar sites among millions. Use the great cross-linking fabric of the Web, Luke.

      Disclaimer: I'm in no way associated with Google.
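      The link-counting idea can be sketched as the simplified, textbook form of PageRank (power iteration with a damping factor). This is only an illustration; Google's production ranking uses many more signals, and the page names and the 0.85 damping value here are assumptions for the example.

```python
def pagerank(links: dict, damping: float = 0.85, iterations: int = 50) -> dict:
    """Simplified PageRank. `links` maps each page to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Each page splits its current rank evenly among its outbound links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # A page with no outbound links spreads its rank over every page.
                for target in pages:
                    new[target] += damping * rank[page] / n
        rank = new
    return rank


# Fifty independent sites link to the biography; only two spam pages link to the porn site.
web = {f"site{i}": ["turing-bio"] for i in range(50)}
web["spam1"] = ["porn-site"]
web["spam2"] = ["porn-site"]
ranks = pagerank(web)
```

      The biography ends up ranked well above the keyword-stuffed page, no matter what either page claims in its meta tags.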

      --
      My exception safety is -fno-exceptions.
    3. Re:How about this? by Anonymous Coward · · Score: 0
      Your scheme can't be automated. This scheme can. Your scheme doesn't scale to the needs of a modern search engine. This one does.

      Hey - I'm fucking drunk and I know this. Now I'm gonna have a shower and go to bed yes yes

  3. You know this is going to happen by Satai · · Score: 4, Funny
    I can see it now. To Do lists are being written up as we speak...

    1. Increase relevance for Penis Enlargement.
    2. Decrease relevance for Bullshit.


    1. Re:You know this is going to happen by leuk_he · · Score: 2

      just as webmasters used to spend hundreds of largely-wasted hours trying to manipulate SEs through the META KEYWORDS tag

      You are right. Any system where webmasters have an impact on search relevance will be beaten. Hey, they even found a way to beat Google: just create a fake front end that looks serious, with one button, "naked pictures". The system he describes works best for AltaVista-like systems (as of six months ago).

      Even /. has lots of trolls spending 1000ths of hours whose biggest effort is to lead you to goatse.sx.

    2. Re:You know this is going to happen by DoorFrame · · Score: 1

      goatse.cx, not .sx.

      important distinction.

    3. Re:You know this is going to happen by Kyobu · · Score: 1

      Even several hundred "1000ths of hours" don't really add up to all that much.

      --
      Switch the . and the @ to email me.
  4. I search for 'slash' and 'dot' and end up *here*?! by Overcoat · · Score: 3, Interesting
    Is the phenomenon of people naming their website something that has nothing to do with the content of the website so widespread that it necessitates a new metadata tag and the consequent alteration of search engines to recognize it?

    Google seems to do a good enough job of filtering out irrelevant responses as it is.

  5. Proposal won't work: No incentive! by dstone · · Score: 1, Redundant

    Proposal: there should be a metadata standard allowing webmasters to manually decrease the relevance of their pages for specific search terms and phrases.

    Okay, pretend I'm a webmaster. What's my incentive to have my page show up LESS in anyone's search results?!

    If someone didn't want my site, why do I care if they get it? And if someone wants my site, I don't want to take any chance with an "anti-thesaurus" that might end up excluding my site!

  6. mod_rewrite is your friend by Dr.+Awktagon · · Score: 4, Insightful

    Well, it's not as good or effective an idea as what this fellow is suggesting, but you can have a lot of fun with people based on their Referer fields. For instance, use it to just bounce them back to their queries, or bounce them to a different query (one for porn sites is always fun), or bounce them to a more relevant page, or fuck with them however you like. If you've ever had to set up Apache to block people from linking your images, you already know how to do it.
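    For readers who haven't written the Apache rules, the decision that a `RewriteCond %{HTTP_REFERER}` hotlink block encodes can be sketched in Python. This is not Apache configuration; `OWN_HOSTS` and the extension list are hypothetical placeholders for your own site.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical: the hostnames your own pages are served from.
OWN_HOSTS = {"example.org", "www.example.org"}
IMAGE_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png")


def hotlink_action(request_path: str, referer: Optional[str]) -> str:
    """Return 'serve' or 'block' for a request, judged by its Referer header."""
    if not request_path.lower().endswith(IMAGE_EXTENSIONS):
        return "serve"  # only images are protected
    if not referer:
        return "serve"  # browsers and proxies often omit Referer; don't punish them
    host = urlparse(referer).hostname or ""
    return "serve" if host in OWN_HOSTS else "block"
```

    The same Referer test is what you would branch on to bounce search visitors elsewhere; note that it only stops other pages from embedding your images, not people from saving copies.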

    1. Re:mod_rewrite is your friend by ConsumedByTV · · Score: 2

      If you've ever had to set up Apache to block people from linking your images, you already know how to do it.

      Can you point me to a good howto?

      --


      "Not my manner of thinking but the manner of thinking of others has been the source of my unhappiness." - M
    2. Re:mod_rewrite is your friend by Zocalo · · Score: 1

      There's a pretty good "howto" thing here that should get you started.

      --
      UNIX? They're not even circumcised! Savages!
    3. Re:mod_rewrite is your friend by Anonymous Coward · · Score: 0
      That article was written by a moron. Yes, you can somewhat stop people from putting image requests to your server in their pages, but you can't stop people from snarfing your images. Even the casual Internet Exploder user can save a page with all the images, and it all happens automatically, with all the proper referrers. In Netscape, right click, save image, and you don't have to go digging through your cache.

    4. Re:mod_rewrite is your friend by pricedl · · Score: 1

      Yes, you can somewhat stop people from putting image requests to your server in their pages, but you can't stop people from snarfing your images.

      So now they have my images, and put them on their own site, which doesn't cost me any bandwidth. Sounds like a good thing to me.

      The reason to stop them linking to images on my server is to save me bandwidth, not to prevent people from stealing my images. (That's what copyright is for. :-)

  7. A bit negative? by ukryule · · Score: 2, Interesting

    Wouldn't it be better to put more effort into describing what a site IS about, rather than what it ISN'T?

    After all, if you describe your site, a good search engine will use this information well (so you shouldn't get too many erroneous hits). However, if you list your non-words, a bad search engine will just see this list and treat them as keywords!

    1. Re:A bit negative? by Anonymous Coward · · Score: 0

      The Anti-Thesaurus is a great idea! You are right that until the major search engines support the feature, it would actually be worse to add anti-keywords.

      If I had a site "appleman.com" dealing with tasty apples, and I happen to mention computers somewhere on the page, I might get lots of people looking for "apple.com". If I could list "computer" as an anti-keyword, I'd save a lot of bandwidth. Until we get anti-keywords, putting "This is not Apple Computer, try apple.com instead" on your page would just get more traffic.

  8. Turning lemons into lemonade by Walter+Bell · · Score: 2, Interesting

    When I first read this, it seemed like a good idea. However, it quickly dawned on me that this is a solution in search of a problem. How many people are actually complaining about too many hits to their web site?

    Please forgive me for mentioning capitalism on Slashdot, but a website that receives many misdirected hits is perfect for targeted marketing. Think of the possibilities: if your web site is getting mistaken hits for "victor mousetraps," sell banner ads for "Revenge" brand traps and make a killing on the click-throughs. With a little clever Perl scripting, determine which banner ad to show based on which set of "wrong keywords" show up in the referer. Companies will pay a lot of money for accurately targeted advertisements. Selling these ads would undoubtedly pay the whole bandwidth bill and probably make a profit to boot.

    So no, unwords are not necessary. Unless you're running a website off a freebie .edu connection and aren't allowed to make a profit off of it. Otherwise you're just throwing money away.

    ~wally
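    The "clever Perl scripting" step can be sketched (here in Python) as: pull the search words out of the Referer URL and pick the first banner whose keyword set they cover. The banner names, the keyword sets, and the assumption that engines put the query in a `q` or `p` parameter are all hypothetical.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical ad inventory, most specific keyword sets first.
BANNERS = [
    ({"victor", "mousetrap"}, "revenge-traps.gif"),
    ({"mousetrap"}, "generic-traps.gif"),
]
DEFAULT_BANNER = "house-ad.gif"
QUERY_PARAMS = ("q", "p")  # assumed query parameter names; engines differ


def search_terms(referer: str) -> set:
    """Lower-cased words from the search query embedded in a Referer URL."""
    params = parse_qs(urlparse(referer).query)
    words = set()
    for name in QUERY_PARAMS:
        for value in params.get(name, []):
            words.update(value.lower().split())
    return words


def pick_banner(referer: str) -> str:
    terms = search_terms(referer)
    for keywords, banner in BANNERS:
        if keywords <= terms:  # every keyword in the set appeared in the query
            return banner
    return DEFAULT_BANNER
```

    Visitors with no search query in their Referer simply get the house ad.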

    1. Re:Turning lemons into lemonade by Anonymous Coward · · Score: 0

      misdirected hits [...] accurately targeted advertisements.

      How about accurate hits and misdirected advertisements? I don't give a shit about the commercial web.

    2. Re:Turning lemons into lemonade by utdpenguin · · Score: 0
      An interesting idea, but I wonder about implementation: Dear Revenge Mousetraps, a lot of people come to my site by pure accident. They are not looking for my site. They are not looking for your product. They are looking for your competitor's product by name. This is a great advertising opportunity.


      Or, for people looking for Hannibal: Dear Cannibals Society, I hear you are having a member drive . . .


      Or, better yet, for "stalking on the internet": Dear Equal Rights for Perverts Society . . . .

      --
      In Soviet Russia you dant have to put up with these crappy jokes
    3. Re:Turning lemons into lemonade by smackmonkey · · Score: 0

      Welcome to the Soviet Union of America, comrade.

      --

      --
      CNN declares War on Islam!
      Left-wing America declares War on its Civil Liberties!
    4. Re:Turning lemons into lemonade by Stultsinator · · Score: 1

      While it may take a leap of logic to want to do this for external search engines, I ran into this problem when building the search engine for our e-commerce site.

      At first we just allowed our out-of-the-box search engine package to index our catalog, but the problem we kept running into was the relevance of the results (for example, returning VCR stands ahead of an actual VCR when the search was "VCR").

      So to solve this, our merchandisers manually added keywords to each group of products that amounted to a thesaurus. We coded the indexing to give these keywords a higher weight than title words, and title words a higher weight than body text.

      It's actually a bigger problem than most geeks realize (as our CEO pointed out.) We were trying to return not just pages that corresponded to the search string, but to the intent of the user. That takes a little more thought on the part of the search engine coders and the implementers.
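      The weighting scheme described above (merchandiser keywords ahead of title words, title words ahead of body text) might look roughly like this. The 5/3/1 weights and the `Product` fields are assumptions for illustration, not the poster's actual values.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Hypothetical field weights: curated keywords > title > body text.
WEIGHTS = {"keywords": 5.0, "title": 3.0, "body": 1.0}


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


@dataclass
class Product:
    title: str
    body: str
    keywords: List[str] = field(default_factory=list)


def score(product: Product, query: str) -> float:
    """Sum of query-term occurrences in each field, multiplied by that field's weight."""
    terms = tokenize(query)
    fields = {
        "keywords": tokenize(" ".join(product.keywords)),
        "title": tokenize(product.title),
        "body": tokenize(product.body),
    }
    return sum(
        WEIGHTS[name] * sum(words.count(t) for t in terms)
        for name, words in fields.items()
    )


vcr = Product("Four-head VCR", "Records VHS tapes.", keywords=["vcr", "video recorder"])
stand = Product("VCR stand", "Oak stand that holds a VCR and a TV. Fits any VCR.")
```

      Here the VCR scores 8 and the stand 5 for the query "vcr". With the keyword field left empty, the stand (5) would outrank the VCR itself (3), which is the problem described above.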

    5. Re:Turning lemons into lemonade by metamatic · · Score: 1

      How many people are complaining about too many hits?

      Well, speaking personally, I don't want people arriving at my web site unless they're actually looking for the content that's on it. That's because I pay for bandwidth.

      I also know plenty of people who have web sites for their friends, but have ended up being pestered by online perverts after they ended up in search engine listings.

      --
      GCHQ Quantum Insert installed. If only our tongues were made of glass, how much more careful we would be when we speak
    6. Re:Turning lemons into lemonade by guinsu · · Score: 2

      Yes, this isn't something the typical dishonest commercial web site would ever do (the marketing dept. would have a fit), but for an information site (the type that provides real content) it would be great. And it would save time for people who were searching for information, not products.

    7. Re:Turning lemons into lemonade by Walter+Bell · · Score: 1

      There's nothing dishonest about targeted advertising. Why do you think you get coupons in the mail for Wonder bread after you've bought a loaf of Butternut with your supermarket discount card? (Although the practice can sometimes raise privacy concerns, it doesn't in the "victor mousetrap" case.)

      Why would anyone want to pay for their bandwidth if they could easily get commercial sponsors to pay for it?

      ~wally

    8. Re:Turning lemons into lemonade by andy@petdance.com · · Score: 2
      How many people are complaining about too many hits?

      Me, definitely.

      I have a section of my site related to Steve Albini's bands, including Big Black. I get tons of hits looking for things like:

      • big black boys fucking white girls
      • big black tits
      • big black nigger dick
      • big black dick fuck white pussy
      • fuck me with that big black dick
      • big black women who shit
      • big black nude guys
      • shake that big black ass
      • beautiful big black booty
      • big black asses in a skirt
      • big black asses in London
      • big black booty in leather pants
      • big black rumps
      • first big black cock in her pussy
      • kiss my big black booty

        and my favorite...
      • black men with big black fuck sticks
      Maybe the short answer is that we need a <META KEYWORD="non-porn"> tag.
    9. Re:Turning lemons into lemonade by vectro · · Score: 1
      Quoth the poster:
      Why would anyone want to pay for their bandwidth if they could easily get commercial sponsors to pay for it?

      Perhaps they don't like advertising? Perhaps they think that American culture is toxic, and that one of the main causes of the destructive consumerist society they live in is the spread of advertising onto virtually every surface?

      Or, perhaps they just think they will get better service if their provider is beholden to them, and not to some advertiser.

  9. Bad planning by ahoehn · · Score: 5, Funny

    Not such a bright idea to whine about too much traffic on your website and then get a link to your site from a slashdot article.

    --
    Mod my comments down. It'll be fun.
  10. You're going to have to excuse me... by Telek · · Score: 1, Flamebait

    If I think that this is just a retarded stupid idea.

    The people whose web pages are being thrust to the top of the query lists are the people who are polluting the metadata and other tags for the sole purpose of getting their sites higher in the search lists.

    So lemmy get this straight: you want all good and honest people (who aren't causing the problem in the first place) to opt-out of common searches (which they'd never want to do), and this will thus remove the legitimate entries from the pool of queries, returning an even more polluted list from your search engine.

    am I missing something here?

    Although there are a few people who would be helped by removing absolutely irrelevant queries, the vast majority would actually suffer if they used this.

    --

    If God gave us curiosity
    1. Re:You're going to have to excuse me... by vidarh · · Score: 2

      No, he wants them to opt out of searches that they know have no relevance to the content, and where they know that the users who get there will just get annoyed and go somewhere else anyway. For people trying to make money on the web, this is a way to reduce bandwidth costs and to better target people who are actually interested in what they provide (and thus more willing to pay or click on ads).

  11. The US Gov't Won't Like It by Asahi+Super+Dry · · Score: 1

    when it realizes that all the TERRORISTS have to do is put the following bit in their HTML: to conceal their web-based activities....

  12. Better Metadata by nyjx · · Score: 4, Interesting
    While the idea would probably do some good if widely adopted, what's really needed is to reduce the need for text-based indexing of web sites by increasing the amount of explicit semantic information about their content.

    Marking up pages with information about the meaning of the terms on them is the main thrust of the work on semantic web - see http://www.daml.org/ (for DAML - the DARPA Agent Markup Language), http://www.semanticweb.org/ (One of the main information sources) and finally the new W3C activity on the subject: http://www.w3.org/2001/sw/.

    How far, how fast it will go is another matter but there's certainly a lot of interest in creating a more "machine readable" web.

    --
    .sig
    1. Re:Better Metadata by Chris+Croome · · Score: 1

      It seems to be a chicken-and-egg situation at the moment: I'm doing quite a lot of work producing Dublin Core metadata in XHTML and RDF format for a content management system, but no search engines yet support indexing or searching this metadata.

      When they do then a proposal like this might make (some) sense.

      --
      Check out MKDoc a mod_perl CMS
    2. Re:Better Metadata by nyjx · · Score: 2
      I think the semantic web effort has the same problem - no incentive to mark up if there are no search engines / agents to read the stuff. No incentive to build the agents if it isn't out there.

      --
      .sig
    3. Re:Better Metadata by Alomex · · Score: 2

      The problem is not so much to understand the content of a page. That can be done in many instances. It is not that hard to understand if a page is talking about a river "bank" or a money "bank". Usually there are enough quotes and links within the page to allow for this automated differentiation.

      The real problem is on the other side, when the user fires up Google and enters the standard 2-4 query terms, say "bank australia". There is a lot less information there for a computer to decide that the user is looking for a bank in Australia.

      Metadata on the web pages is pretty much useless for understanding what the user wanted.

  13. search issues by jahjeremy · · Score: 2, Interesting
    The problem stems from the basic lack of data tagging standardization on the internet. HTML describes the form of data rather than indicating what types of data are present. While META keywords are useful, validation is a problem using this method, given the huge number of pages and the propensity of some webmasters to fill this section with irrelevant garbage.

    The main power technique, at least on google, is utilizing quotes and AND/OR to limit search results. Rather than spewing a line of text, enclosing specific "phrases" often gives more accurate results.

    Then again, I have been able to simply cut n' paste error messages into the groups.google.com form and immediately receive accurate, useful hits. I think that though the internet and webpages are generally disorganized and decentralized, an outside entity can impose order given enough bandwidth, time, energy and intelligence. In the future, web services, probably based on CORBA and SOAP, will allow sites to return messages to searchers or indexing services, thus doing away with a lot of the mystery in the current system.

    All that said, I have had excellent luck with google finding about 95% of all the information I have searched for in the past couple months, showing that a well-written spider and intelligent classification and rating can circumvent the problem of so much untagged, nebulous information.

    The internet is something like the world's largest library, where anyone can insert a book and random organizers may (if they wish!) go through and make lists, hashes and indexes of the information for their own card catalogs. Right now, each search service maintains its own separate list! The crawler is like a super-fast librarian who can peruse the book. The coming paradigm will be fewer, more accurate and useful catalogs, along with books that "insert themselves" into these schemes intelligently and discretely after a validation of informational content.

    1. Re:search issues by funky+womble · · Score: 1
      Google does well because it pays attention to the text *inside the hyperlink to pages*. For example, this link for news for nerds, stuff that matters means that google searches on news, nerds, stuff, and matters, are more likely to show /.

      Once you've thrown out the 'click here' and 'this link' junk, this is far more reliable than using meta tags, and often more reliable than looking for keywords within the page itself.
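      A toy version of anchor-text indexing: each link's text is credited to the page it points at, and boilerplate like "click here" is skipped. The junk list and the data layout are assumptions for illustration; a real engine combines this with many other signals.

```python
import re
from collections import defaultdict

# Hypothetical stoplist of link text that says nothing about the target page.
JUNK_ANCHORS = {"click here", "this link", "here", "link"}


def index_anchor_text(links):
    """links: iterable of (target_url, anchor_text). Returns term -> {url: count}."""
    index = defaultdict(lambda: defaultdict(int))
    for url, text in links:
        normalized = " ".join(re.findall(r"[a-z0-9]+", text.lower()))
        if not normalized or normalized in JUNK_ANCHORS:
            continue  # skip empty or boilerplate anchors
        for term in normalized.split():
            index[term][url] += 1
    return index


links = [
    ("http://slashdot.org/", "News for nerds, stuff that matters"),
    ("http://slashdot.org/", "click here"),
    ("http://example.com/", "nerds"),
]
idx = index_anchor_text(links)
```

      A search for "nerds" would then consider both target pages, while "click" indexes nothing at all.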

    2. Re:search issues by John+Hasler · · Score: 1

      "While META keywords are useful, validation is a problem using this method, given the huge number of pages and the propensity of some webmasters to fill this section with irrelevant garbage."

      Search engines should reduce the relevance of pages with huge META sections.
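      One hypothetical way to apply that suggestion is a score multiplier that leaves short META keyword lists alone and shrinks as the list grows. The 20-term cutoff is an arbitrary assumption.

```python
import re


def meta_penalty(meta_keywords: str, max_terms: int = 20) -> float:
    """Multiplier in (0, 1]: 1.0 for modest META keyword lists, smaller for huge ones."""
    terms = [t for t in re.split(r"[,\s]+", meta_keywords) if t]
    if len(terms) <= max_terms:
        return 1.0
    return max_terms / len(terms)
```

      A page's relevance score would be multiplied by this value, so a 40-term list halves it.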

      --
      Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
  14. Not even slashdotted by evil_roy · · Score: 0, Redundant

    I reckon his site can handle the superfluous hits.

  15. Sounds Good by Kira-Baka · · Score: 1

    My friend found that one of the highest things people were finding his webcomic by was "Digimon Porn"... And his comic has no "digimon" or "porn" about it...

    1. Re:Sounds Good by pen · · Score: 1, Funny

      my site gets at least 50 hits a month from searches for "swedish porn". also, "amputated penis", "charcoal underwear", and "president bush daughter".

    2. Re:Sounds Good by c=sixty4 · · Score: 1
      Ah, the joys of analog. I regularly look through my log files for interesting stuff. Things people have been searching for when they found my web site (which is not as perverted as this list suggests) include:
      • "Long fingernail and long toenail fetish"
      • "mime nude photos"
      • "16 year old boys whith arm pit hair"
      • "easy and fast directions to make crack cocaine in the microwave"
      • "but she was my student why did i have impure thoughs"
      • "nude cartoons inspector gadget"
      • "secrets on how to suntan through your computer"
      --
      "The good die first." "Most of us are morally ambiguous, which explains our random dying patterns." --- MST3K
  16. The Wayback Machine by wormyguy1 · · Score: 1

    With all the terabytes a day coming into the Wayback Machine (http://web.archive.org), plus the tons and tons of stuff they have from ancient times (as far back as 1996!), it would be awesome if it were searchable, even with some kind of mundane search. Sure, Google's index is great, but this blows Google way out of the water. I've found sites in there I made in middle school and never wanted to see again, but data is data.

    --
    NerfOnline - Because Nerf Guns aren't just for kids -
  17. Isn't that what - is for? by pen · · Score: 2, Informative
    If I'm searching for something and the wrong sites come up, I simply look for a keyword that is present on most of the sites I don't need that wouldn't be present on the sites I do need, and then add it to the exclusion list.

    For example, if I'm looking for info on a Toyota Supra and too many Celica-related pages come up, I'll type:

    toyota supra -celica

    On a related note, does anyone feel that Google's built-in exclusion list of universal keywords (a,1,of) is really aggravating when Google excludes those words in phrases?
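    The `-term` exclusion behaves roughly like this simplified matcher, which requires every plain term and rejects any page containing an excluded one. It is a sketch of the general idea, not Google's actual query parser.

```python
import re
from typing import List, Tuple


def parse_query(query: str) -> Tuple[List[str], List[str]]:
    """Split a query into required and excluded terms; '-term' marks an exclusion."""
    required, excluded = [], []
    for token in query.lower().split():
        if token.startswith("-"):
            if len(token) > 1:
                excluded.append(token[1:])
            continue  # a bare "-" is ignored
        required.append(token)
    return required, excluded


def matches(page_text: str, query: str) -> bool:
    words = set(re.findall(r"[a-z0-9]+", page_text.lower()))
    required, excluded = parse_query(query)
    return all(t in words for t in required) and not any(t in words for t in excluded)
```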

    1. Re:Isn't that what - is for? by vidarh · · Score: 2
      That is completely different.

      The suggestion was intended to tell the search engines which words on your site aren't relevant for search purposes. So a site primarily about Toyota Celicas that mentions the Supra in a couple of places might want to add Supra to its "nonwords" entry, to avoid confusing people looking for info about Supras.

      So if the suggestion were in use by most people, you might not have to add "-celica" to your search, as it would be easier for the search engine to exclude pages that contain the word "Supra" but aren't relevant to your search.

      It's in no way a perfect idea. But if enough people use it it may have some value.
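      Under the proposal, an engine could honor such an entry by demoting, rather than dropping, pages whose declared nonwords match the query. The `nonwords` field, the term-count scoring, and the 0.1 demotion factor below are all hypothetical, chosen only to illustrate the idea.

```python
import re
from dataclasses import dataclass, field
from typing import List, Set


def tokens(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


@dataclass
class Page:
    url: str
    text: str
    nonwords: Set[str] = field(default_factory=set)  # hypothetical "nonwords" metadata


def rank(pages: List[Page], query: str, demotion: float = 0.1) -> List[str]:
    """Order pages by query-term count, demoting pages that list a query term as a nonword."""
    terms = tokens(query)
    scored = []
    for page in pages:
        words = tokens(page.text)
        score = float(sum(words.count(t) for t in terms))
        if any(t in page.nonwords for t in terms):
            score *= demotion  # the webmaster says this page is not about that term
        scored.append((score, page.url))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [url for s, url in scored if s > 0]


celica = Page("celica-club", "Celica tuning. Our Celica beats a Supra, a Supra, and another Supra.", {"supra"})
supra = Page("supra-faq", "Toyota Supra FAQ.")
```

      Although the Celica page mentions "Supra" three times, its nonwords entry pushes it below the Supra FAQ for a "supra" search, while it still ranks normally for "celica".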

  18. Mike Bouma, open source hero dead at 36 by Anonymous Coward · · Score: 0

    I just heard some sad news on talk radio - open source hero Mike Bouma was found dead in his San Francisco home this morning. There weren't any more details. I'm sure everyone in the Slashdot community will miss him - even if you didn't enjoy his work, there's no denying his contributions to the open source community. Truly an American icon.

  19. That's not going to help bandwidth by Rosco+P.+Coltrane · · Score: 3, Funny

    If you replace <meta="keywords" content="mickey mouse"> by <meta="nonwords" content="bestiality mouse-fucking zoophilia kinky ....>, you might draw more Disney lovers and less perverts to your site, but I suspect your HTML file will grow quite a lot bigger ...

    --
    "A door is what a dog is perpetually on the wrong side of" - Ogden Nash
    1. Re:That's not going to help bandwidth by pogen · · Score: 1
      If you replace <meta="keywords" content="mickey mouse"> by <meta="nonwords" content="bestiality mouse-fucking zoophilia kinky ....>, you might draw more Disney lovers and less perverts to your site

      Mommy, what does "view source" mean, and why is the computer swearing at me?

    2. Re:That's not going to help bandwidth by Kanasta · · Score: 3, Insightful

      Yes, unless the same Disney lovers use filtering software, which probably won't be incredibly impressed by the number of banned words in your HTML...

  20. Other Uses by Solidblu · · Score: 0

    Not only could it be used to make some pages better, but it would also be interesting to see how it could dumb down legal jargon, such as laws, so the average person could read them without banging their head against the wall repeatedly over a parking ticket.

  21. Re:Proposal won't work: No incentive! by Nate+Eldredge · · Score: 5, Interesting
    I work as a sysadmin for a computer science department. Until recently, the system staff would frequently get messages along the lines of

    From: frankie3327@aol.com
    To: staff@cs.here.edu
    Subject: help!

    i have a lexmark 4590 and it wont print in color.
    it only makes streaks. also the paper always
    jams. how do i fix it? please reply soon!

    The senders never had any connection to the college or the department. We'd reply telling them we had no idea what they were talking about, and that they should seek help elsewhere. It was rather annoying.

    We eventually figured it out. The department web site maintains a collection of help documents for users of the systems. One of them talked about how to use the department's printers, what to do if you have trouble, etc. At the bottom it listed staff@cs.here.edu as the contact address for the site.

    You've probably guessed it by now. That page came up as one of the top few hits when you searched for "printing" on one of the major search engines (I forget which one). Apparently lusers would find this page, notice that it didn't answer their question, but latch on to the staff email address at the bottom, as if we were an organization dedicated to helping people worldwide with their printers. Furrfu!

    I think we reworded the page to emphasize that it only applied to the college, and we haven't received any more emails lately. But if we could have kept search engines from returning it, that would have been even better. Since in our case the page was intended for internal use, we don't care whether anyone can find it from the Internet. Our real users know where to look for it.

    So in answer to your question: When a search engine returns a page that doesn't answer the user's question, the user will often complain to the webmaster. That's a clear incentive to the webmaster not to have the page show up where it's not relevant. Also, it's not the goal of every site simply to be read by millions of people; some would rather concentrate on those to whom it's useful.

  22. Why not use the HTTP Referer header? by Advocadus+Diaboli · · Score: 1
    I can understand the author of the proposal, but I'm afraid that his proposal won't help the usual web searcher.

    So I would suggest that he think about checking the referrer, as this site shows, and perhaps redirect all users who come from a search engine to a page offering a search engine limited to his site. Since the referrer also includes the whole search string, he could even use it to pre-fill his search form.
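    Here is a minimal sketch of the referrer idea, assuming Python on the server side. The host-to-parameter table is an assumption: Google's q parameter is real, but search.example.com is a made-up placeholder. /search stands in for a hypothetical on-site search page. None of this comes from the original proposal.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Assumed table: search-engine host -> name of its query parameter.
# Google's "q" is real; search.example.com is a made-up placeholder.
SEARCH_ENGINES = {
    "www.google.com": "q",
    "search.example.com": "query",
}


def search_terms_from_referer(referer):
    """Return the visitor's search string if the Referer is a known engine, else None."""
    if not referer:
        return None
    parsed = urlparse(referer)
    param = SEARCH_ENGINES.get(parsed.hostname or "")
    if param is None:
        return None
    values = parse_qs(parsed.query).get(param)
    return values[0] if values else None


def landing_url(referer, site_search="/search"):
    """Send search-engine visitors to the (hypothetical) on-site search, pre-filled."""
    terms = search_terms_from_referer(referer)
    if terms is None:
        return None  # not from a known engine: serve the requested page as usual
    return site_search + "?" + urlencode({"q": terms})
```

    Browsers don't always send a Referer, so anything keyed on it should fall back to serving the requested page normally.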

    I would even prefer this method, because I often enter a site via a link from a search engine only to find that the result page is just part of a frameset and is missing things like JavaScript variables. If I redirected search engine users to a defined starting point on my site, they would have fewer problems. (Don't start a discussion about the sense and use of frames here :-) )

  23. PROLOG HELP!!! by clinko · · Score: 0, Offtopic

    Someone, quick! I have a program due in PROLOG in about 5 hours!

    ok, I just need to convert a string to all caps so I can compare it to its reverse (simple palindrome program)

    I've gotten everything to work except converting the string to all caps, or all lowercase, or finding a caseless compare statement. Any 1 of the 3 will work and save my ass.

    Thanks for the help!!!

    1. Re:PROLOG HELP!!! by Anonymous Coward · · Score: 0

      That's easy: it's all about defining requirements.

      First thing the program does is print: "This program requires Caps Lock to be on at all times. The Shift key should never be used."

  24. Of course... by Dog+and+Pony · · Score: 1

    ... you could just get people to switch to Google instead.

    1. Re:Of course... by mojo-raisin · · Score: 1

      wow. that's a cool site. thanx.

  25. I thought of a similar idea and worded it as such: by Bakajin · · Score: 1

    On my idea notepad I said this:

    "Technique to negate words in a document for better searching. For instance, include files cause a phrase like 'How we converted to XHTML 1.0' to show up on every page. Only the page with the actual information should show up in search, not every page with the include file."

  26. Re:I thought of a similar idea and worded it as su by Bakajin · · Score: 1

    To further clarify: search engines should look for patterns of words which indicate that they are being over-used. This may be very difficult, but I think recognizing include files/libraries might be feasible.
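    Here is one way to read the include-file idea, sketched in Python under loose assumptions. Any line that appears on most of a site's pages is treated as boilerplate and left out of the index. The 80% threshold and the line-level granularity are arbitrary choices for illustration.

```python
from collections import Counter


def shared_lines(pages, threshold=0.8):
    """Return lines that appear on at least `threshold` (a fraction) of all pages.

    `pages` maps URL -> page text. Lines repeated across most pages are
    treated as include-file boilerplate.
    """
    counts = Counter()
    for text in pages.values():
        # Count each distinct non-blank line at most once per page.
        counts.update({line.strip() for line in text.splitlines() if line.strip()})
    cutoff = threshold * len(pages)
    return {line for line, n in counts.items() if n >= cutoff}


def indexable_text(text, boilerplate):
    """Return the page text with boilerplate lines removed, ready for indexing."""
    kept = [line.strip() for line in text.splitlines()
            if line.strip() and line.strip() not in boilerplate]
    return "\n".join(kept)
```

    There is a catch. The one page genuinely about the XHTML conversion would also lose that line. A real implementation would therefore want to keep boilerplate that also appears in a page's title or main body.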

  27. Not quite correct. by Anonymous Coward · · Score: 0
    After reading the value/reference calling thread, I checked out the section "Wish you were here" and found two errors:

    Extensions: Unless you are modifying the Java interpreter, even the 'core' libraries (on my platform, anyway) must be in the classpath. So 'extending' the language consists of putting a jar file in the classpath. C# has the same thing, called the global assembly cache. Now, before you say "yes, but you have to add a reference to it," remember that you have to reference every assembly you use, including System.dll; there is a (customisable) set of references appended by default by the C# compiler.

    Dynamic class loading: you skip over Reflection everywhere, as far as I can see, and this is no exception. I have written an app that finds all the DLLs in a directory, instantiates each class in those DLLs that implements an interface or has a certain (custom) attribute, and then calls methods on, and responds to events from, those classes. Using reflection's Emit classes, it is even possible to have your code write those classes before calling them. I have used the same technique to accept URLs of web services and call them dynamically (for testing). How is it possible you missed something so major to the language? (Check out Assembly.Load(), Object.GetType(), and Type.InvokeMember().)

    It makes me wonder if I can trust the research done on the rest of the article. Thanks for the effort, much of it is very well written... but if I can't trust it all, it's not much use to me.

    Sincerely, Mike Bouma

    1. Re:Not quite correct. by Anonymous Coward · · Score: 0

      Dude - I heard you were dead! What gives?

  28. Interesting by Anonymous Coward · · Score: 0
    Some quotes from the article I find disturbing: "Federal agencies are imposing a stricter standard in reviewing hundreds of thousands of Freedom of Information Act requests from the public each year; officials no longer have to show that disclosure would cause "substantial harm" before rejecting a request. Watchdog groups say they have already started to see rejections of requests that likely would have been granted before."

    "Officials acknowledge that there are very few examples of terrorists actually using public records to glean sensitive information, but they say that the terrorist attacks prove the need for extraordinary caution."

    "We have to get away from the ethos that knowledge is good, knowledge should be publicly available, that information will liberate us," said University of Pennsylvania bioethicist Arthur Caplan. "Information will kill us in the techno-terrorist age, and I think it's nuts to put that stuff on Web sites."

    "Indeed, chemical and water industry groups are lobbying the Bush administration to curtail regulations providing public access to the operations of public facilities, data that environmentalists say are critical to ensuring safety."

  29. Filenames as an unname by t0qer · · Score: 1

    I use filenames all the time on Google to find what I want. Sometimes I get lucky and find the file in a directory along with many other files related to the ones I am looking for. An added bonus is that I don't have to wade through annoying banner ads or popup windows.

  30. Then what? by Anonymous Coward · · Score: 0
    So when all this information is destroyed/access limited and no one has "how to" instructions for committing any act of violence, then what will we destroy when the violence persists? Each other?


    If someone wants to commit a violent act, they can easily succeed WITHOUT a "how to" manual. They may not get away with it but that hardly matters if the violence results in deaths.


    Take away documentation on bridges, buildings, weapons and whatever you want. They'll ALWAYS figure out another means of attack that wasn't considered.


    In fact, the current state of affairs can be considered a side effect of their attack that the terrorists probably hadn't considered, but which is surely welcome news to them regardless. Terrorism has infected America and its effect is spreading from within. Terrorists attack our way of life. We'll destroy our way of life by trying to protect ourselves from another such attack.


    How about this: Let's just completely dispose of the Bill of Rights, right now, in the name of national security! I mean, really, we may all die because of the freedoms it allows. Do away with freedom and we'll live forever. Freedom isn't all that it's made out to be anyway. Take Cuba and China for example. They're wonderful places to live. All the people throughout history that died fighting for their freedom must have been idiots, huh? The people that died for America's freedom and ultimately the Constitution and Bill of Rights. What a waste when all they've done is ensure our death at the hands of someone that has learned to build a bomb from publicly available information.


    I prefer to die free, fighting for freedom, than to "live" shackled and bound.


    The problem isn't information availability. The problem is how we treat each other, which can infuriate someone to the point of hatred.

  31. Why this is redundant, and overly subjective by K-Man · · Score: 2

    Given a particular word on a particular website, it's fairly easy to decide if it's relevant or not. How? By looking for links to that website from other websites which mention the same word. That's the idea behind Teoma and a number of other search algorithms. Sites which "unintentionally" get hits for unrelated topics simply don't register on these engines. Link analysis provides much more accurate metadata, because it's based on other people's opinions.
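    This is a much-simplified illustration of link-based relevance, not Teoma's or anyone else's actual algorithm. It counts the pages that both mention a term and link to the target. pages and links are hypothetical in-memory stand-ins for a crawler's data.

```python
def link_support(term, target, pages, links):
    """Count other pages that link to `target` and also mention `term`.

    pages: URL -> page text; links: URL -> set of URLs that page links to.
    Plain substring matching is used, which is a deliberate simplification.
    """
    term = term.lower()
    return sum(
        1
        for url, outlinks in links.items()
        if url != target and target in outlinks and term in pages.get(url, "").lower()
    )
```

    A page that merely contains "printing" but is never linked from printing-related pages scores zero, which is the effect described above.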

    Another problem with metadata in general, of which spam is but one symptom, is the fact that creators of content often have no idea of how their content appeals, or fails to appeal, to other people. Did Mahir have any idea that his name would become a top-ranked search term? Does anyone have any idea how his content should be ranked for a given search term (besides number one, of course)?

    What is the number one piece of metadata found in spam messages? "This is not spam."

    --
    ---- "If we have to go on with these damned quantum jumps, then I'm sorry that I ever got involved" - Erwin Schrodinger
  32. Domain names by Breace · · Score: 1, Offtopic

    On a related subject, I've been looking for a domain name that a) is easy to remember, b) does not generate a zillion hits if you type the name into a search engine, and c) is not a silly long string of words.

    It's funny how most people think that common-word domains are valuable, but forget that a name which jumps out as the only result when typed into a search engine is pretty valuable too, especially if it sounds the way it is spelled.

    Maybe not the best example, but since four-letter .com names are practically all gone, I was going to register duxo.com. Unfortunately, one of the many domain hogs got it the day I was going for it. :o(

    I got another one, though, but it's not up yet so I won't tell what it is! ;o)))

    1. Re:Domain names by t · · Score: 1
      You know, I've noticed this kind of behaviour before. It's too coincidental to attribute to chance. I suspect that there are people who monitor what domain names people are querying and then register them in the hopes of reselling them. Does anyone know about this?

      t.

  33. Re:Proposal won't work: No incentive! by Ex+Machina · · Score: 4, Informative

    But if we could have kept search engines from returning it, that would have been even better. Since in our case the page was intended for internal use, we don't care whether anyone can find it from the Internet. Our real users know where to look for it.

    http://www.robotstxt.org/wc/exclusion.html
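    For the internal-page case, a robots.txt exclusion is short. The sketch below assumes the help pages live under a hypothetical /help/ directory. It uses Python's standard urllib.robotparser to check how a well-behaved crawler would read the file. The cs.here.edu host is the placeholder from the parent post.

```python
from urllib.robotparser import RobotFileParser

# robots.txt for the department site; /help/ is an assumed location
# for the internal help pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /help/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def crawler_may_fetch(url, agent="ExampleBot"):
    """True if a crawler that honours robots.txt would fetch `url`."""
    return parser.can_fetch(agent, url)
```

    robots.txt is advisory. It keeps compliant spiders out, but it does not hide the pages from anyone who already knows the URL.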

  34. This will never work by SendBot · · Score: 1

    More hits is almost NEVER a bad thing for a site's main purpose (getting people to see it, and hopefully take an interest in what's there)

    It's for the same reason the automotive industry has made clean-fuel vehicles standard, and it's the very way our capitalist world operates: the time (money) it takes to implement this thing to make the world a better place cannot be justified. Granted, if a lot of sites did this, everyone would have more time to play with their dog rather than dig through irrelevant search results. But Joe Webmaster's company is never going to pay him to do it, and he's not going to spend his free time doing it when he could be spending time with his dog.

    That's the way the world is working right now, and people who want to change the world to a better place will probably spend their time doing other things rather than putting unwords in their web documents.

  35. Re:Proposal won't work: No incentive! by -brazil- · · Score: 1
    What's my incentive to have my page show up LESS in anyone's search results?!


    Saving bandwidth, perhaps? For a hobbyist's website hosted cheaply (and thus with a low transfer limit), it might be quite desirable not to attract too many visitors who aren't actually interested in the site's contents. Of course, that's not a very common scenario; good search engines will give such sites a low priority anyway, because they're not linked to very often.

    --

    The illegal we do immediately. The unconstitutional takes a little longer.
    --Henry Kissinger

  36. A part they left out of the story; by vectus · · Score: 4, Funny

    Webmasters, however, should be careful with these new "anti-words", as when they mix with their word counterpart, a gigantic explosion results.

    1. Re:A part they left out of the story; by jrboynton · · Score: 1

      This is why XHTML requires all attribute values to be quoted.

  37. It might help only minimally with spammers by Billly+Gates · · Score: 0, Offtopic
    I used to have an account here on Slashdot with an email address. I also had one at ZDNet's Talkback with my email address on it as well. I got constantly spammed and it annoyed the hell out of me. I read here on Slashdot that spammers use bot machines running Perl scripts which read Slashdot and ZDNet posts for email addresses and then send them to a database which spams them around the clock. Another really bad place is newsgroups. Sadly, mostly pedophiles and pornographers spam the hell out of anyone who posts to these groups, thinking everyone uses them just for porn.



    In the old days of the internet, back when it was run by the government, you could literally be expelled from using it if you ever did this. Now it's standard practice, and many schools ban the newsgroups, even though they are part of the very fabric of how the internet got started and contain valuable learning materials. Why? Well, thank these porn spammers! Boy, does that piss me off more than anything else. Anyway, I think the indexing metadata is a good idea for web searching. It will make searching for valuable data a lot easier and give AOL users a reason to switch. You might hate AOL, but the users I know who use it say everything is organized right in front of you, at your fingertips. No searching needed. If you ever need to search for something specific, you can always find what you need immediately. This is quite difficult on the World Wide Web unless you know exactly where to look.

  38. I can see it now... by dun0s · · Score: 2, Insightful

    Porn sites that promote (through a variety of means) the words "free, porn, sex" and the like, and then demote "pay, fee, membership, credit card".

    This proposal will not make the indexing of sites more reliable. If anything, it will add to the common confusion associated with meta keywords. Yes, it is quite a nice idea in theory, but I can't see anyone wanting to exclude words from being searched. The main point of the proposal was that the author felt guilty about pulling in people who had entered search terms that appeared on his page. One might ask why he is publishing information on the internet if he doesn't want people to look at it. A better solution would be to get people to use search engines properly. As an example, take his stalking-on-the-internet case. If people put these words into Google and come up with his page, then perhaps they should have modified their query to something like "stalking on the internet", and they might not have found his page. On the other hand, if his page contains the phrase "stalking on the internet", it might be just what the seeker was looking for.

    To this proposal I say nay. Or perhaps oink.

  39. robots.txt ? by Atrax · · Score: 3, Informative

    did you have the page disallowed for search engines? if something is for internal use only, you really ought to have dropped in a robots.txt to exclude it altogether.

    if more people used robots.txt, a lot of 'only useful to internal users' sites would drop right off the engines, leaving relevant results for the rest of the world...

    just a thought......

    --
    Screw you all! I'm off to the pub
    1. Re:robots.txt ? by SilencedScream · · Score: 1, Interesting

      That's just it, though. You say to use robots.txt to have it excluded from search engines, but that would exclude it altogether. With this new meta tag, it would only have kept search engines from returning the page for, say, a search on "printers", while still returning it for "tech support", if that is indeed what the page was intended for. I think this is a great idea.

    2. Re:robots.txt ? by Atrax · · Score: 1

      missing the point. the post talked about *internal* pages; this isn't a page that should even be looking for a search engine listing, apart from perhaps some altruistic urge. an outside user should end up at the real support site, not a university CS dept., which is *just* for the university. it's a pretty big issue, and i was only pointing at a small part of it...

      j

      --
      Screw you all! I'm off to the pub
    3. Re:robots.txt ? by Anonymous Coward · · Score: 0

      He specifically said, "We don't care whether anyone can find it from the Internet. Our real users know where to look for it." He should be using robots.txt to keep the spiders out, period.

    4. Re:robots.txt ? by ameoba · · Score: 2

      The only problem is that the site WAS helpful to people other than internal users. Third-party troubleshooting information is often a useful resource, particularly for older hardware. Publishing information like this where any ol' luser can find it is about the same as releasing Free software: it doesn't make much more work for you, but it helps the community.

      And, if the admin had a clue, a simple "WTF did you get my addy?" emailed to joe6paq@aol.com would probably have explained everything.

      --
      my sig's at the bottom of the page.
  40. The Semantic Web by mike_sucks · · Score: 5, Interesting

    Surely this kind of issue is what Tim Berners-Lee and the W3C are trying to address with the Semantic Web.

    The problem with content on the web today is that while it is perfectly readable by humans, it is incomprehensible to machines. If Tim and Co. get their way, and I for one would love to see the Semantic Web catch on, then we can get rid of kludges like the Anti-Thesaurus, HTML meta keywords and the like.

    --
    -- "So, what's the deal with Auntie Gerschwitz et all?"
    1. Re:The Semantic Web by Alomex · · Score: 3, Insightful

      Surely this kind of issue is what Tim Berners-Lee and the W3C are trying to address with the Semantic Web.

      Indeed, but how close are they to achieving anything of significance? AI has been working on a Universal Ontology for ages and gotten nowhere.

      The fact that Berners-Lee agrees it would be a "cool thing to have" does not make it any more likely to happen (by the way, TB-L first proposed the semantic web almost five years ago).

    2. Re:The Semantic Web by poot_rootbeer · · Score: 1


      The problem with the Semantic Web is that humans, in general, write web pages to be readable by humans, not by machines.

      This is not likely to change anytime soon.

    3. Re:The Semantic Web by Zspdude · · Score: 2, Interesting

      What you're suggesting is that rather than trying to make machines as linguistically competent as we are, we should instead adjust to fit their convenience. (I'd never have thought I'd see the day that we began to negotiate compromises with machines, but that's offtopic.) The problem is that, besides being very useful and efficient, it would restrict the versatility of our communication and make surfing a lot less fun. No longer would we ever find great web sites by accident. Where would we be without our great and ambiguous language, which allows us to say "Time flies like an arrow," and yet does not exclude "Fruit flies like a banana"? Go figure.

      --
      What's in a Sig?
    4. Re:The Semantic Web by mike_sucks · · Score: 1

      "What you're suggesting is that rather than trying to make machines as linguistically competent as we are, we should instead adjust to fit their convenience."

      No, not at all. It's easy to retrofit a web site with RDF metadata about its content, and doing so requires no human-visible changes to the site. Metadata can be stored in HTML meta tags or, preferably, in separate RDF description files. None of this affects the way people surf the Web, and unless they have a good browser they won't even know the additional metadata exists.

      In addition, using SW-friendly content in web pages (like strict XHTML, CSS for all styling, and other XML dialects like SVG, MathML, CML and so on) only aids machine comprehension while not detracting a single iota from human comprehension.

      It's possible to have web content that is both human- and machine-comprehensible, but it unfortunately takes a little more effort than making content that is just human-readable.

      --
      -- "So, what's the deal with Auntie Gerschwitz et all?"
  41. Could have done with this years ago by Curl+E · · Score: 2, Funny

    A long time ago (in a galaxy far away) I kept a playlist of my radio show, with one page per month. One month I played Porno for Pyros' "Pets" twice. Guess which web page in our department had the highest hit count for the next year...

    --
    Backups are for wimps. Real men post their data in comments and have slashdot mirror it
  42. What about !keyword? by Ed+Avis · · Score: 3, Informative
    I thought we already had this by prefixing keywords with a ! sign. For example, the BSD FAQ used to have the line:
    Keywords: FAQ 386bsd NetBSD FreeBSD !Linux

    Presumably the same could be done for <meta name="keywords"> in HTML.
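    Here is a sketch of how an engine might honour such a line. It assumes the convention is simply "a leading ! means this document is not about that term". That convention is read off the BSD FAQ example above and is not a formal standard. The function names are made up.

```python
def split_keywords(value):
    """Split a Keywords value into (wanted, unwanted) lowercase term sets.

    Assumed convention: a leading "!" marks a term the document is NOT about.
    """
    wanted, unwanted = set(), set()
    for token in value.replace(",", " ").split():
        if token.startswith("!") and len(token) > 1:
            unwanted.add(token[1:].lower())
        else:
            wanted.add(token.lower())
    return wanted, unwanted


def should_return(doc_keywords, query):
    """False if any query term is one the document explicitly disclaims."""
    _, unwanted = split_keywords(doc_keywords)
    terms = {t.lower() for t in query.split()}
    return not (terms & unwanted)
```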

    --
    -- Ed Avis ed@membled.com
  43. I like the idea by Florian+Weimer · · Score: 2

    In some jurisdictions, you get into trouble if a search engine refers to one of your pages when someone enters a trademark (and you are not entitled to use that trademark). This way, you could easily tell search engines not to list your pages when such a trademark is present in the query. Complying with court orders wouldn't be a major problem any more.

    However, you could show some information if people visit with a certain Referrer header, directing them to more useful pages. This works in the majority of cases, and it doesn't need much cooperation from the search engines.

  44. Re:I r0X0r!! by smackmonkey · · Score: 0

    Did she squirt?

    --
    CNN declares War on Islam!
    Left-wing America declares War on its Civil Liberties!
  45. Re:Proposal won't work: No incentive! by dun0s · · Score: 0, Redundant

    Isn't this what robots.txt is for? You disallow all search engines apart from your own from indexing pages that you don't really think people outside your department will want to see. Think how long it would take to put excluded words into every page of your site when a single line in robots.txt would suffice :)
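    The "only your own engine" variant is sketched below with Python's urllib.robotparser. DeptSearch is a hypothetical user-agent name for the department's own crawler. An empty Disallow line means that agent may fetch everything, while every other agent is shut out.

```python
from urllib.robotparser import RobotFileParser

# "DeptSearch" is a hypothetical name for the department's own crawler.
ROBOTS_TXT = """\
User-agent: DeptSearch
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())
```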

  46. Strings convey no meaning out of context by SgtChaireBourne · · Score: 1
    Whether you put them in meta elements (keyword, antithesaurus) or in the body of the document, strings by themselves have no meaning, no connection to the concept which they represent.

    Take for example a search for the string tar, which will yield documents containing:
    tar -zxf update.tgz, or cp update.tar update.old, or roofing tar, or jeg tar en øl nu (Scandinavian for "I'll have a beer now")

    Each instance of tar above has a different meaning, but the same spelling. When you get into misspellings, spelling variations, and conjugation, then the actual concept is even harder to associate with a given range of strings.

    Even Google searches are for strings and not concepts, but Google's ranking algorithm relies on which pages get the most links from pages that also get the most links. However, you'll still get different results for color vs. colour and tyre vs. tire. Because the algorithm only reflects how people have chosen their links, it does, from time to time, give unusual associations. ;)
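    The spelling-variant half of the problem can be patched with a normalisation table, as in this minimal sketch. The two-entry table is purely illustrative. It does nothing for the harder half: every tar above still maps to the same string.

```python
# Illustrative two-entry table; a real engine would need far more entries,
# plus stemming to cope with conjugation.
VARIANTS = {"colour": "color", "tyre": "tire"}


def normalize(query):
    """Lowercase the query and map known spelling variants to one form."""
    return [VARIANTS.get(word, word) for word in query.lower().split()]
```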

    --
    Beta is broken and the link to classic doesn't work. Stop wasting our time or there won't be anybody left here.
  47. Exclude genealogy pages; nonsearch tag by texchanchan · · Score: 1
    1. When you search on almost any name of European origin, hundreds of genealogy pages show up. Including -genealogy -rootsweb -descendants only works to a certain extent. Many people would be grateful for genealogy page exclusion.

    2. Some sites have menus on each page listing every topic on the site. You search on a word and get every page in the site returned, including those that mention the topic only in the menu. A tag such as this <nonsearchable> </nonsearchable> surrounding the menus might aid in solving this problem.
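    Here is a sketch of how an indexer could honour the proposed tag, using Python's standard html.parser. <nonsearchable> is the hypothetical element suggested above, not part of any HTML standard. The parser simply drops text nested inside it.

```python
from html.parser import HTMLParser


class SearchableText(HTMLParser):
    """Collect page text, skipping anything inside the hypothetical
    <nonsearchable>...</nonsearchable> element."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # how many nonsearchable elements we are currently inside
        self.chunks = []  # indexable text fragments

    def handle_starttag(self, tag, attrs):
        if tag == "nonsearchable":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "nonsearchable" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if not self.depth and data.strip():
            self.chunks.append(data.strip())


def searchable_text(html):
    """Return only the text an indexer should see."""
    parser = SearchableText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```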

  48. The Wrong Tree by karb · · Score: 2

    Unfortunately, these problems are always better solved by stronger search engines. Even though it is several orders of magnitude harder for a search engine to figure out that those things aren't important, it's several orders of magnitude easier to get google to do it than it is to convince 10 million web page maintainers to do it.

    --

    Jack Valenti and the MPAA are to technology as the Boston strangler is to the woman home alone

  49. The Load on Search Engines. by Anonymous Coward · · Score: 0

    I believe that most search engines would implement this by not indexing those words for that page. It is the only way to do it without increasing the load on the search engines. The other way, no matter how efficiently implemented, would add processing needed to produce results, which means more machines would need to be added to the clusters.

    Very few webmasters complain about users finding their site because of bad search results. Most of them are happy to have traffic.
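    The index-time approach described above can be shown as a toy inverted index in Python. The per-page unwords list is a stand-in for whatever metadata format a standard might define. Because those words are never indexed for that page, queries pay no extra cost.

```python
from collections import defaultdict


def build_index(pages):
    """pages: URL -> (text, unwords). Unwords are never indexed for that page."""
    index = defaultdict(set)
    for url, (text, unwords) in pages.items():
        skip = {w.lower() for w in unwords}
        for word in text.lower().split():
            if word not in skip:
                index[word].add(url)
    return index


def search(index, query):
    """Return URLs containing every query term (no extra work for unwords)."""
    sets = [index.get(w, set()) for w in query.lower().split()]
    return set.intersection(*sets) if sets else set()
```

    The unwords remain visible on the page itself. Only the index forgets them.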

  50. Why this won't work by fleener · · Score: 2

    Most web sites don't have meta tags, but most web designers do want their clients to see impressive hit counts in their traffic reports. Ummm, so who thinks web designers are going to take the time and trouble to add a feature that will decrease traffic?

  51. Re:Sounds Good But..... Useful? by The+Purple+Wizard · · Score: 1
    i dont think anyone would be willing to risk having their page show up lower in a search

    Oh you capitalist-thinkers. Spare a thought for GeoCities/Hypermart users who have to start shelling out money if they cross a certain hit threshold.

  52. Unmarketing by istartedi · · Score: 2

    there should be a metadata standard allowing webmasters to manually decrease the relevance of their pages for specific search terms and phrases."

    So, in other words... businesses will want to reduce their exposure on the web? I don't think so.

    --
    For all intensive purposes, "whom" is no longer a word. That begs the question, "who cares"?
  53. This is backwards by Nick+Arnett · · Score: 2, Insightful

    Picking out the "irrelevant" words is much harder than creating tags that contain the most relevant ones, which is the main point of meta tags. Most of us have brains that are trained to pick out what is important, not the opposite, so few people would bother to implement this. Language is hard, computers are dumb, and few people have been willing to "explain" language to them to make search smarter. In other words, nothing like this works on a significant scale if much effort has to go into it. Tagging important words can be semi-automated with summarization software, which will accomplish much more in terms of relevancy ranking than tagging the ones to ignore. And by the way, this proposal misunderstands robots.txt. The point isn't to conceal the existence of pages; it is to tell *robots*, not people, to stay away from them. (I'm the owner of the mailing list for it.)

  54. sorry by leuk_he · · Score: 1

    too much ... makes one blind. 8-)

  55. Just overdo it! Re:The Wrong Tree by leuk_he · · Score: 2

    stronger search engines

    The more traditional search engines (not Google?) have protections against sites that do extreme things to get to number 1 in the hit list. They have protections against repeating one word many times (META="sex, sex,sex"). Repeating your "exwords" in the normal meta tag many times should trigger the search engine's "spam alert" and decrease the search relevance.
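    Here is a toy version of that kind of spam check. It assumes a simple per-term repetition limit on a comma-separated keywords value. Real engines' rules are not public, and the threshold of 3 is arbitrary.

```python
from collections import Counter


def looks_stuffed(keywords, max_repeats=3):
    """Flag a meta keywords value in which any single term repeats more
    than `max_repeats` times (an arbitrary illustrative threshold)."""
    terms = [t.strip().lower() for t in keywords.split(",") if t.strip()]
    return any(n > max_repeats for n in Counter(terms).values())
```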

  56. No more meta tags by surfnerd · · Score: 1
    Many search engines today use very little of the actual text on a webpage for indexing. The "good" ones use the title and the anchor text of the pages that link to a given page as the main scoring features for page relevancy. Only when there are very few hits will a search engine resort to using the actual content text on the page and it is even less likely to use the meta data.

    There were a couple of interesting papers at the ACM's SIGIR this year that use only the anchor text pointing to a webpage to get a description of that page, and they could do some cool things like language translation with just that data.
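    Here is a rough sketch of scoring from the title and inbound anchor text only, ignoring the body. The page structure and the weights are assumptions for illustration. They are not taken from any of the papers mentioned.

```python
def score(page, query, title_weight=2.0, anchor_weight=1.0):
    """Score a page from its title and the anchor text of links pointing at it.

    page: {"title": str, "anchors": [str, ...]}; body text is deliberately
    ignored. The weights are arbitrary assumptions.
    """
    terms = query.lower().split()
    title = page["title"].lower().split()
    anchors = [a.lower().split() for a in page["anchors"]]
    total = 0.0
    for t in terms:
        total += title_weight * title.count(t)
        total += anchor_weight * sum(a.count(t) for a in anchors)
    return total
```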

  57. Re:I search for 'slash' and 'dot' and end up *here by Relic+of+the+Future · · Score: 1

    Does Google even use metadata? I thought their big thing was external linking.

    --
    Those who fail to understand communication protocols, are doomed to repeat them over port 80.
  58. Invisible pages for the pissed-off by spaceyhackerlady · · Score: 1

    I know of at least one web page that has been very carefully constructed so that search engines won't find it, but people who know what they're looking for will find it easily.

    With no subject-specific keywords, however, unless you do know what the author is talking about, you won't have any idea what she's so pissed off about.

    No, don't ask: I am routinely pissed off for the same reason, and will not post the URL here.

    I wouldn't mind if searches for my name brought up my current web page, rather than the one I had in 1995. But that's another matter.

    ...laura

    1. Re:Invisible pages for the pissed-off by t · · Score: 1
      Why don't you just put spaces in the keywords? Like saying "This page is about S n o o p y." Or, if you were ranting in a blog and didn't want to get perv hits, you could easily bitch that "That a s s h o l e emailed me yesterday!" etc... I don't expect search engines to ever want to fix that.

      t.

  59. Biblio entries by DaoudaW · · Score: 1

    Matteo Ricci (he's listed in a bibliography; there is no info to speak of)

    While I have occasionally found a source I needed from a hit on a bibliographic entry, one of my pet peeves, even on Google, is long lists of nothing but bibliographic entries. Usually it's a pretty clear sign that there isn't much on the topic available on the Internet, but sometimes I just need to change my search terms slightly.

    But I think nonword is a bad idea. If the website's editors decide to keep a word, and Google's page-rank technology shows it to me, I'm willing to check it out.

  60. mod_rewrite reference, examples by Dr.+Awktagon · · Score: 3, Informative

    Well some docs are here, and the mod_rewrite reference is here.

    Here is a goofy example that does a redirect back to their google query, except with the word "porn" appended to it. As an added bonus, it only does it when the clock's seconds are an even number. (Or do the same test to the last digit of their IP address). Replace the plus sign before "porn" with about 100 plus signs and they won't see the addition because each plus sign becomes a space. The "%1" refers to their original query.

    RewriteEngine On
    RewriteCond %{TIME_SEC} [02468]$
    RewriteCond %{HTTP_REFERER} google\.com/search [NC]
    RewriteCond %{HTTP_REFERER} [?&]q=([^&]+)
    RewriteRule . http://www.google.com/search?q=%1+porn [R=temp,L]

    Here's another one that checks the User-Agent for a URL and then redirects to it. This keeps most spiders and stuff off your pages, since they usually put their URLs in the User-Agent:

    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} "(http://[^ )]+)"
    RewriteRule . %1 [R=permanent,L]
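The same User-Agent test, sketched in Python for readers who want to see which strings it catches. The `redirect_for_user_agent` helper and the `ExampleBot` string are hypothetical; note that the pattern only matches `http://`, so a bot advertising an `https://` URL would slip through.

```python
import re
from typing import Optional

# Same pattern as the RewriteCond: "http://" followed by anything
# that is not a space or a closing parenthesis.
UA_URL = re.compile(r"(http://[^ )]+)")

def redirect_for_user_agent(user_agent: str) -> Optional[str]:
    """Return the URL a bot advertises in its User-Agent, or None for ordinary browsers."""
    m = UA_URL.search(user_agent)
    return m.group(1) if m else None

bot = "Mozilla/5.0 (compatible; ExampleBot/1.0; +http://www.example.com/bot.html)"
browser = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
print(redirect_for_user_agent(bot))      # http://www.example.com/bot.html
print(redirect_for_user_agent(browser))  # None
```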

    Anything you can think of is possible. I think you can even hook it into external scripts.

    1. Re:mod_rewrite reference, examples by rabidcow · · Score: 1

      This keeps most spiders and stuff off your pages since they usually put their URLs in the User-Agent:

      Why not just use robots.txt? Either way you're relying on the spider operator to write their bots in a particular way.
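For comparison, this is what honoring robots.txt looks like from the spider's side, using Python's standard `urllib.robotparser`. The robots.txt contents, the `ExampleBot` name, and the example.com URLs are made up for illustration; as the comment says, nothing forces a crawler to run this check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for www.example.com
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A polite crawler asks before every fetch.
print(parser.can_fetch("ExampleBot", "http://www.example.com/private/notes.html"))  # False
print(parser.can_fetch("ExampleBot", "http://www.example.com/index.html"))          # True
```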

  61. Irrelevant visitors are often the best by Anonymous Coward · · Score: 1, Interesting

    It's even worse than a lack of incentive to decrease relevance. There's actually a strong incentive not to: advertising.

    CPM ads pay the same regardless of relevance. CPC ads tend to pay *even more* for visitors who aren't interested in your content, since they're more likely to click on the ad on the way out.

  62. Preventing image by Anonymous Coward · · Score: 0
    Sound advice for those fortunate enough to be running Apache... but due to circumstances beyond my control, I'm on IIS 5.0.

    I googled around a bit and found a Java applet and browser plugin that can do this, but does anyone know of a straight-up IIS service-level configuration method of preventing "image theft," much like the method for Apache described in the howto above?

    Links to FAQs, HOWTOs appreciated!

  63. very useful for single site search engines by jrboynton · · Score: 1

    For a search engine at a single site, this is very useful. You watch the queries and results. If a page doesn't show up, but it should, you add the search terms to the keywords. If it shows up, but you don't want it to, what do you do? Create an anti-keyword field.
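A minimal sketch of such an anti-keyword field for a single-site search, assuming each page carries hand-maintained `keywords` and `anti_keywords` sets. The field names, the +5 boost, and the rule that any anti-keyword hit drops the page are all hypothetical choices, not part of any standard.

```python
from dataclasses import dataclass, field

@dataclass
class Page:
    url: str
    text: str
    keywords: set[str] = field(default_factory=set)       # terms to boost
    anti_keywords: set[str] = field(default_factory=set)  # terms to suppress

KEYWORD_BOOST = 5.0  # hypothetical weight

def score(page: Page, query: str) -> float:
    words = page.text.lower().split()
    total = 0.0
    for term in query.lower().split():
        if term in page.anti_keywords:
            return 0.0                  # webmaster said: not relevant for this term
        total += words.count(term)      # plain term frequency
        if term in page.keywords:
            total += KEYWORD_BOOST
    return total

def search(pages: list[Page], query: str) -> list[str]:
    """Return URLs with a positive score, best first."""
    scored = [(score(p, query), p.url) for p in pages]
    return [url for s, url in sorted(scored, key=lambda x: -x[0]) if s > 0]

pages = [
    Page("/family.html", "David Michael Cathy and Andrea (DMCA) family photos",
         anti_keywords={"dmca"}),
    Page("/law/dmca.html", "The DMCA is a US copyright law", keywords={"dmca", "copyright"}),
]
print(search(pages, "dmca"))    # ['/law/dmca.html']
print(search(pages, "family"))  # ['/family.html']
```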

  64. OT: Re:Sounds Good But... by SnapShot · · Score: 1

    I don't have mod points right now, but has anyone else noticed this? If you use a wheel mouse under Windows, you pick your mod and then "wheel down" to reach the moderation button. If you don't remember to click away from the mod box first, you end up giving the poor person a completely different mod than you intended.

    Maybe this is only an Opera issue?

    --
    Waltz, nymph, for quick jigs vex Bud.
    1. Re:OT: Re:Sounds Good But... by Anonymous Coward · · Score: 0

      It's actually functioning as intended. The wheel usually does the equivalent of pressing the up/down arrow several times (e.g. 3). Those virtual arrow-key presses are interpreted by whichever application has the focus. The standard web browser interface uses the Tab key to move between form fields and the arrow keys to select items within a single form field (e.g. a select box, radio button group, or textarea); thus, when the web browser is in the foreground and the focus is on a multi-select form field, wheel motion changes the current selection.

  65. look at Dublin Core by Anonymous Coward · · Score: 0

    There's a standard evolving, but nobody's using it.

    http://dublincore.org/

    -- Ender, Duke_of_URL

  66. Yup... by Da VinMan · · Score: 2

    IANAL and I don't have specific knowledge of this occurring, but really, what's to stop it from happening?

    My suggestion to anyone is that they develop three good domain names that they would be happy with. But for god's sake, do it *offline*! Don't search for them, don't try them in your browser, and don't tell anyone what they are. *Then* just go register one or all of them. Don't wait, don't search, and don't even breathe until they're yours.

    Oh, and don't forget to trademark the language in those URLs (it can't be plain English, remember). If someone sees your new URL and likes it, they could register the TM if you don't. Then they can sue you for ownership of the domain, since you're clearly infringing on their TM, and they'll probably get the domain in the end.

    Hey, I don't make the rules...

    And my favorite word today is don't.

    --
    Please mod this post only if you think others should/n't read this. I have enough ego^H^H^Hkarma. Thanks!
  67. Google Goggles by Tablizer · · Score: 0

    But search engine spammers can do the same thing: buy a bunch of other sites and put links to their target site.

    1. Re:Google Goggles by 21mhz · · Score: 1
      But search engine spammers can do the same thing: buy a bunch of other sites and put links to their target site.

      Their sites would have low initial ratings. As long as no one links to them from outside, their total rating pool remains low. So, to actually rise high, you have to attract links from other popular sites. Combined with a shit filter (bad words decrease ratings), this should sort the uglies out fairly well.
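The "rating pool" argument can be checked in a textbook PageRank model. This is a minimal sketch, assuming the classic formula with damping 0.85 and a toy graph in which every page has at least one outgoing link; the page names and links are invented. A group of pages that nobody outside links to holds exactly its share of pages (here 4 of 8, so 0.5) of the total score however it links internally, and only an inbound link from the rest of the web raises that total. The bound is on the group as a whole: the farm can still funnel its pool into one target page, which is where the proposed content filter would have to help.

```python
def pagerank(links: dict[str, list[str]], d: float = 0.85, iters: int = 100) -> dict[str, float]:
    """Textbook PageRank by power iteration; assumes every page has at least one outlink."""
    n = len(links)
    rank = {page: 1.0 / n for page in links}
    for _ in range(iters):
        new = {page: (1.0 - d) / n for page in links}   # random-jump share
        for src, outs in links.items():
            share = d * rank[src] / len(outs)           # split src's score over its links
            for dst in outs:
                new[dst] += share
        rank = new
    return rank

web = {
    # ordinary sites linking to each other
    "A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["A", "C"],
    # hypothetical spam farm: F1..F3 all point at target T, which links back to F1
    "F1": ["T"], "F2": ["T"], "F3": ["T"], "T": ["F1"],
}
farm = ["F1", "F2", "F3", "T"]

before = sum(pagerank(web)[p] for p in farm)
# Now a popular page (C) also links to the target.
after = sum(pagerank(dict(web, C=["A", "T"]))[p] for p in farm)
print(round(before, 6), after > before)  # 0.5 True
```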

      --
      My exception safety is -fno-exceptions.
  68. Searching for BSML: Bull Shit Markup Language by SimHacker · · Score: 2
    I received the following email message from the CFO of a company called LabBook, about my Bull Shit Markup Language (BSML) web page.

    Apparently, they would prefer that people searching for "BSML" did not turn up my web page. I wonder if they've tried to get the Boston School for Modern Languages to change their name, too?

    Now isn't the whole point of properly using XML and namespaces to disambiguate coincidental name clashes like this? If LabBook thinks there's a problem with more than one language named BSML, then they obviously have no understanding of XML, and aren't qualified to be using it to define any kind of a standard.

    Maybe LabBook should put some meta-tags on their web pages to decrease their relevance when people are searching for "Bull Shit" or "Modern Language".

    -Don

    ========

    From: "Gene Van Slyke" <gene.vanslyke@labbook.com>
    To: <don@toad.com>; <dhopkins@maxis.com>
    Sent: Monday, November 12, 2001 10:36 AM
    Subject: BSML Trademark

    Don,

    While reviewing the internet for uses of BSML, we noted your use of BSML on http://catalog.com/hopkins/text/bsml.html.
    While we find your use humorous, we have registed the BSML name with the United States Patent and Trademark Office and would appreciate you removing the reference to BSML from your website.

    Thanks for your cooperation,

    Gene Van Slyke
    CFO LabBook

    ========

    Here's the page I published years ago at http://catalog.com/hopkins/text/bsml.html:

    ========

    BSML: Bull Shit Markup Language

    Bull Shit Markup Language is designed to meet the needs of commerce, advertising, and blatant self promotion on the World Wide Web.

    New BSML Markup Tags

    CRONKITE Extension

    This tag marks authoritative text that the reader should believe without question.

    SALE Extension

    This tag marks advertisements for products that are on sale. The browser will do everything it can to bring this to the attention of the user.

    COLORMAP Extension

    This tag allows the html writer complete control over the user's colormap. It supports writing RGB values into the system colormap, plus all the usual crowd pleasers like rotating, flashing, fading and degaussing, as well as changing screen depth and resolution.

    BLINK Extension

    The blinking text tag has been extended to apply to client side image maps, so image regions as well as individual pixels can now be blinked arbitrarily.

    The RAINBOW parameter allows you to specify a sequence of up to 48 colors or image texture maps to apply to the blinking text in sequence.

    The FREQ and PHASE parameters allow you to precisely control the frequency and phase of blinking text. Browsers using Apple's QuickBlink technology or MicroSoft's TrueFlicker can support up to 65536 independently blinking items per page.

    Java applets can be downloaded into the individual blinkers, to blink text and graphics in arbitrarily programmable patterns.

    See the Las Vegas and Times Square home pages for some excellent examples.

    --
    Take a look and feel free: http://www.PieMenu.com
  69. BSML prior art? by SimHacker · · Score: 2
    Oh no, I am quaking in my hip boots, and up to my chin in deep doo doo. A big corporation is trying to claim the rights to BSML, the name of my invention: Bull Shit Markup Language.

    The wheels of government and commerce would grind to a halt were they not well lubricated with Bull Shit. So I created the Bull Shit Markup Language and published the BSML web page years ago, putting it in the public domain for the good of mankind. Now somebody has finally taken it seriously, and is trying to monopolise BSML!

    He who controls BSML controls the Bull Shit... and he who controls the Bull Shit controls the Universe!

    http://catalog.com/hopkins/text/bsml.html

    Does anyone know of any prior art pertaining to Bull Shit and Markup Languages? What about VRML? Maybe I could get Mark Pesce to testify on my behalf? c(-;

    Here's a list of the huge faceless multinational corporations I'm up against:
    http://www.labbook.com
    "IBM, NetGenics, Apocom, Bristol-Myers Squibb, Wiley and other leaders of the life sciences industry support LabBook's BSML as the standard for biological information".

    To paraphrase Pastor Martin Niemöller:

    First they patented the Anthrax Vaccine
    and I did not speak out
    because I did not have Anthrax.
    Then they patented the AIDS Drugs
    and I did not speak out
    because I did not have AIDS.
    Then they patented Viagra
    and I did not speak out
    because I already had an erection.
    Then they came for the Bull Shitters
    and there was no one left
    to speak out for me.

    -Don

    --
    Take a look and feel free: http://www.PieMenu.com
  70. Google by tedgyz · · Score: 1

    'nuf said

    Ok. I'll say some more. For most searches, Google's algorithm does a tremendous job of bringing the relevant sites to the top of the list.

    In fact, when I look for product info and don't get the manufacturer's site first in the list, I consider that a strike against them - i.e. their web presence is put into question.

    --
    "No matter where you go, there you are." -- Buckaroo Banzai
  71. The The by tedgyz · · Score: 1

    Remember the band 'The The' from the '80s? It would seem to be damn near impossible to find them via normal search techniques. :-)

    I did a quick test; here are the results:
    Yahoo: A (listed the band site via their web site listings; official site was 4th in list)
    Google: F (quoting didn't help)
    Northern Light: C (found relevant matches, but the official site was nowhere to be found on the first 2 pages)
    AltaVista: A+ (official band site was #1 in list)

    Nowadays, you need to think about "searchability" when picking the name for just about anything. That is, assuming you want to be easily found on the web.

    I guess that's where dopey marketing names like 'Itanium' actually make sense. Very unambiguous search criteria.
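One plausible reason 'The The' is so hard to find: engines of the time commonly dropped very frequent words such as "the" before matching. A minimal sketch with a tiny, hypothetical stopword list shows what that does to this query compared with an invented name like 'Itanium':

```python
# Hypothetical stopword list; real engines used their own, usually longer, lists.
STOPWORDS = {"a", "an", "and", "in", "of", "the", "to"}

def query_terms(query: str) -> list[str]:
    """Lowercase, split on whitespace, and drop stopwords, as a naive engine might."""
    return [w for w in query.lower().split() if w not in STOPWORDS]

print(query_terms("The The"))        # []: nothing left to search for
print(query_terms("The The band"))   # ['band']
print(query_terms("Itanium"))        # ['itanium']
```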

    --
    "No matter where you go, there you are." -- Buckaroo Banzai
  72. Security hole - marketing wars by tedgyz · · Score: 1

    Imagine a company hiring hackers to break into competitors' sites to put important keywords in the unthesaurus.

    For example, what if you hacked 3Com's site to put the words 'ethernet' and 'network' in their unthesaurus? It's unlikely that a professional company like Linksys or others would do this, but it is entirely possible.

    You could argue that meta keywords should take precedence, but I'm sure the hacker would remove those words from the meta keyword list.

    --
    "No matter where you go, there you are." -- Buckaroo Banzai