The Anti-Thesaurus: Unwords For Web Searches
Nicholas Carroll writes: "In the continual struggle between search engine administrators, index spammers, and the chaos that underlies knowledge classification, we have endless tools for 'increasing relevance' of search returns, ranging from much ballyhooed and misunderstood 'meta keywords,' to complex algorithms that are still far from perfecting artificial intelligence. Proposal: there should be a metadata standard allowing webmasters to manually decrease the relevance of their pages for specific search terms and phrases."
This sounds like a good plan, but I don't think anyone would be willing to risk having their page show up lower in a search when someone was intending to find it. Plus, anyone who finds the page in a search by accident is just a new potential customer.
Just shitlist any site that is obviously reaching for hits? If a porn site has the words "Alan Turing" in its metadata and doesn't mention anything about Turing later in the site, list them as not being allowed to participate in your search.
Hell, an engine that did that would almost be useful.
Google seems to do a good enough job of filtering out irrelevant responses as it is.
Proposal: there should be a metadata standard allowing webmasters to manually decrease the relevance of their pages for specific search terms and phrases.
Okay, pretend I'm a webmaster. What's my incentive to have my page show up LESS in anyone's search results?!
If someone didn't want my site, why do I care if they get it? And if someone wants my site, I don't want to take any chance with an "anti-thesaurus" that might end up excluding my site!
Well it's not as good/effective an idea as what this fellow is suggesting, but you can have a lot of fun with people based on their Referer fields. For instance, use it to just bounce them back to their queries, or bounce them to a different query (one for porn sites is always fun), or bounce them to a more relevant page, or fuck with them however you like. If you've ever had to set up Apache to block people from linking your images, you already know how to do it.
Wouldn't it be better to put more effort into describing what a site IS about, rather than what it ISN'T?
After all, if you describe your site, a good search engine will use this information well (so you shouldn't get too many erroneous hits). However, if you list your non-words, a bad search engine will just see this list and treat them as keywords!
When I first read this, it seemed like a good idea. However, it quickly dawned on me that this is a solution in search of a problem. How many people are actually complaining about too many hits to their web site?
Please forgive me for mentioning capitalism on Slashdot, but a website that receives many misdirected hits is perfect for targeted marketing. Think of the possibilities: if your web site is getting mistaken hits for "victor mousetraps," sell banner ads for "Revenge" brand traps and make a killing on the click-throughs. With a little clever Perl scripting, determine which banner ad to show based on which set of "wrong keywords" show up in the referer. Companies will pay a lot of money for accurately targeted advertisements. Selling these ads would undoubtedly pay the whole bandwidth bill and probably make a profit to boot.
So no, unwords are not necessary. Unless you're running a website off a freebie .edu connection and aren't allowed to make a profit off of it. Otherwise you're just throwing money away.
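For anyone who wants to try the referer-based banner idea, here is a rough sketch in Python rather than Perl. The keyword-to-banner table, the file names, and the assumption that the search engine passes the query in a `q` parameter are all made up for illustration; real engines differ.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical mapping from "wrong keywords" to banner ads.
# The file names are illustrative only.
ADS_BY_KEYWORD = {
    "mousetraps": "revenge-traps-banner.gif",
    "victor": "revenge-traps-banner.gif",
}
DEFAULT_AD = "house-banner.gif"


def search_terms(referer):
    """Return the lowercase search terms found in a referer URL.

    Assumes the engine passes the query in a `q` parameter.
    """
    query = parse_qs(urlparse(referer).query)
    terms = []
    for value in query.get("q", []):
        terms.extend(value.lower().split())
    return terms


def pick_ad(referer):
    """Pick the ad for the first search term that has one, else the default."""
    for term in search_terms(referer):
        if term in ADS_BY_KEYWORD:
            return ADS_BY_KEYWORD[term]
    return DEFAULT_AD
```

Calling `pick_ad` with the Referer header of each request would be enough to choose the banner; visitors with no search terms just get the default one.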
~wally
Not such a bright idea to whine about too much traffic on your website and then get a link to your site from a slashdot article.
Mod my comments down. It'll be fun.
If I think that this is just a retarded stupid idea.
The people whose web pages are being thrust to the top of the query lists are the people who are polluting the metadata and other tags for the sole purpose of getting their sites higher in the search lists.
So lemme get this straight: you want all good and honest people (who aren't causing the problem in the first place) to opt out of common searches (which they'd never want to do), and this will thus remove the legitimate entries from the pool of queries, returning an even more polluted list from your search engine.
Am I missing something here?
Although there are a few people who would be helped by removing absolutely irrelevant queries, the vast majority would actually suffer if they used this.
If God gave us curiosity
when it realizes that all the TERRORISTS have to do is put the following bit in their HTML: to conceal their web-based activities....
Marking up pages with information about the meaning of the terms on them is the main thrust of the work on semantic web - see http://www.daml.org/ (for DAML - the DARPA Agent Markup Language), http://www.semanticweb.org/ (One of the main information sources) and finally the new W3C activity on the subject: http://www.w3.org/2001/sw/.
How far, how fast it will go is another matter but there's certainly a lot of interest in creating a more "machine readable" web.
.sig
The main power technique, at least on google, is utilizing quotes and AND/OR to limit search results. Rather than spewing a line of text, enclosing specific "phrases" often gives more accurate results.
Then again, I have been able to simply cut n' paste error messages into the groups.google.com form and immediately receive accurate, useful hits. I think that though the internet and webpages are generally disorganized and uncentralized, an outside entity can impose order given enough bandwidth, time, energy and intelligence. In the future, web services, probably based on CORBA and SOAP, will allow sites to return messages to searchers or indexing services, thus doing away with a lot of the mystery in the current system.
All that said, I have had excellent luck with google finding about 95% of all the information I have searched for in the past couple months, showing that a well-written spider and intelligent classification and rating can circumvent the problem of so much untagged, nebulous information.
The internet is something like the world's largest library where anyone can insert a book and random organizers may (if they wish!) go through and make lists, hashes and indexes of the information for their own card catalogs. Right now, each search service maintains its own separate list! The crawler is like a super-fast librarian who can peruse the book. The coming paradigm will be fewer, more accurate and useful catalogs along with books that "insert themselves" into these schemes intelligently and discreetly after a validation of informational content.
I reckon his site can handle the superfluous hits.
My friend found that one of the highest things people were finding his webcomic by was "Digimon Porn"... And his comic has no "digimon" or "porn" about it...
With all the terabytes a day coming into the Wayback Machine (http://web.archive.org), plus the tons and tons of stuff they have from ancient times (as far back as 1996!) it would be awesome if it was searchable. Even some kind of mundane type of search. Sure, Google's index is great, but this blows Google way out of the water. I've found sites in there I made in middle school and never wanted to see again, but data is data.
NerfOnline - Because Nerf Guns aren't just for kids -
For example, if I'm looking for info on a Toyota Supra and too many Celica-related pages come up, I'll type:
toyota supra -celica
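A toy illustration of what the minus operator does, assuming a very simple whole-word match. This is only a sketch of the idea, not how any real engine implements it, and the function names are made up.

```python
def parse_query(query):
    """Split a query into required terms and excluded (-prefixed) terms."""
    include, exclude = [], []
    for token in query.lower().split():
        if token.startswith("-") and len(token) > 1:
            exclude.append(token[1:])
        else:
            include.append(token)
    return include, exclude


def matches(text, query):
    """True if text has every required word and none of the excluded ones."""
    words = set(text.lower().split())
    include, exclude = parse_query(query)
    has_all = all(term in words for term in include)
    has_excluded = any(term in words for term in exclude)
    return has_all and not has_excluded
```

With this, `matches("Toyota Supra turbo specs", "toyota supra -celica")` passes, while a page that also mentions the Celica is dropped.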
On a related note, does anyone feel that Google's built-in exclusion list of universal keywords (a,1,of) is really aggravating when Google excludes those words in phrases?
I just heard some sad news on talk radio - open source hero Mike Bouma was found dead in his San Francisco home this morning. There weren't any more details. I'm sure everyone in the Slashdot community will miss him - even if you didn't enjoy his work, there's no denying his contributions to the open source community. Truly an American icon.
If you replace <meta name="keywords" content="mickey mouse"> by <meta name="nonwords" content="bestiality mouse-fucking zoophilia kinky ...">, you might draw more Disney lovers and fewer perverts to your site, but I suspect your HTML file will grow quite a lot bigger ...
"A door is what a dog is perpetually on the wrong side of" - Ogden Nash
Not only could it be used to make some pages better, but it would also be interesting to see how it would dumb down legal jargon such as laws, to see if the average person can read them without banging their head against the wall repeatedly over a parking ticket.
From: frankie3327@aol.com
To: staff@cs.here.edu
Subject: help!
i have a lexmark 4590 and it wont print in color.
it only makes streaks. also the paper always
jams. how do i fix it? please reply soon!
The senders never had any connection to the college or the department. We'd reply telling them we had no idea what they were talking about, and that they should seek help elsewhere. It was rather annoying.
We eventually figured it out. The department web site maintains a collection of help documents for users of the systems. One of them talked about how to use the department's printers, what to do if you have trouble, etc. At the bottom it listed staff@cs.here.edu as the contact address for the site.
You've probably guessed it by now. That page came up as one of the top few hits when you searched for "printing" on one of the major search engines (I forget which one). Apparently lusers would find this page, notice that it didn't answer their question, but latch on to the staff email address at the bottom, as if we were an organization dedicated to helping people worldwide with their printers. Furrfu!
I think we reworded the page to emphasize that it only applied to the college, and we haven't received any more emails lately. But if we could have kept search engines from returning it, that would have been even better. Since in our case the page was intended for internal use, we don't care whether anyone can find it from the Internet. Our real users know where to look for it.
So in answer to your question: When a search engine returns a page that doesn't answer the user's question, the user will often complain to the webmaster. That's a clear incentive to the webmaster not to have the page show up where it's not relevant. Also, it's not the goal of every site simply to be read by millions of people; some would rather concentrate on those to whom it's useful.
So I would suggest that he think about checking the referrer, as this site is showing, and maybe direct all users that come from a search engine to a page where he offers a search engine that is limited to his site. Since the referrer also includes the whole search string, he could maybe even use it to fill out his search form.
I would even prefer this method, because it often happens to me that I enter a site via a link from a search engine and then find out that the result page is just a part of a frameset and it's missing properties like Javascript variables. If I redirected search engine users to a defined starting point on my site they would have less trouble. (Don't start a discussion about the sense and use of frames here :-) )
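Something along these lines might do the redirect. It is only a sketch: the list of search engine hosts, the `/search` path on the local site, and the `q`/`p` parameter names are assumptions, not anything the original site actually uses.

```python
from urllib.parse import urlparse, parse_qs, urlencode

# Assumed values for illustration; adjust for the engines you care about.
SEARCH_ENGINE_HOSTS = {"www.google.com", "search.yahoo.com"}
SITE_SEARCH_URL = "/search"  # hypothetical local search page


def site_search_redirect(referer):
    """Return a local search URL pre-filled with the visitor's query.

    Returns None when the visitor did not come from a known search engine.
    """
    parsed = urlparse(referer)
    if parsed.hostname not in SEARCH_ENGINE_HOSTS:
        return None
    params = parse_qs(parsed.query)
    for key in ("q", "p"):  # assumed query parameter names
        if key in params:
            return SITE_SEARCH_URL + "?" + urlencode({"q": params[key][0]})
    return SITE_SEARCH_URL
```

A visitor arriving from a query for "stalking on the internet" would land on the site's own search form with the same words already filled in, instead of on a stray frame.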
Someone, quick! I have a program due in PROLOG in about 5 hours!
OK, I just need to convert a string to all caps so I can compare it to its reverse (simple palindrome program).
I've gotten everything to work except converting the string to all caps, or all lowercase, or finding a caseless compare statement. 1 of the 3 will work and save my ass.
Thanks for the help!!!
... you could just get people to switch to Google instead.
On my idea notepad I said this:
"Technique to negate words in a document for increased searching. For instance, include files that cause a phrase like 'How we converted to XHTML 1.0' to show up on every page. Only the page with actual information, should show up in search, not every page with the include file."
[news for me, stuff that doesn't matter]
To further clarify, search engines should search for patterns of words which indicate a phrase is being over-used. May be very difficult, but I think recognizing include files/libraries might be feasible.
[news for me, stuff that doesn't matter]
Extensions: Unless you are modifying the java interpreter, even the 'core' libraries (on my platform, anyway) must be in the classpath. So 'extending' the language consists of putting a jar file in the classpath. C# has the same thing, called the global assembly cache. - now, before you say, yes, but you have to add a reference to it, I want you to remember that you have to reference every assembly you use, including System.dll - there is a (customisable) set of references appended by default by the c# compiler.
Dynamic class loading: you skip over Reflection everywhere, as far as I can see, and here is no exception: I have written an app that finds all the .dll's in a directory, instantiates each class in those dll's that implements an interface or has a certain (custom) attribute, and then calls methods and responds to events from those classes. It is possible, using reflection's emit classes, to have your code write those classes before calling them. I have used this same thing to accept url's of web services to call them dynamically (for testing). How is it possible you missed something so major to the language? (Check out Assembly.Load(), Object.GetType(), and Type.InvokeMember().)
It makes me wonder if I can trust the research done on the rest of the article. Thanks for the effort, much of it is very well written... but if I can't trust it all, it's not much use to me.
Sincerely, Mike Bouma
"Officials acknowledge that there are very few examples of terrorists actually using public records to glean sensitive information, but they say that the terrorist attacks prove the need for extraordinary caution."
"We have to get away from the ethos that knowledge is good, knowledge should be publicly available, that information will liberate us," said University of Pennsylvania bioethicist Arthur Caplan. "Information will kill us in the techno-terrorist age, and I think it's nuts to put that stuff on Web sites."
"Indeed, chemical and water industry groups are lobbying the Bush administration to curtail regulations providing public access to the operations of public facilities, data that environmentalists say are critical to ensuring safety."
I use filenames all the time on google to find what I want. Sometimes I get lucky and find the file in a directory, with many other files related to the files I am looking for. Another added bonus is I don't have to wade through annoying banner ads or popup windows.
If someone wants to commit a violent act, they can easily succeed WITHOUT a "how to" manual. They may not get away with it but that hardly matters if the violence results in deaths.
Take away documentation on bridges, buildings, weapons and whatever you want. They'll ALWAYS figure out another means of attack that wasn't considered.
In fact, the current state of affairs can be considered a side effect of their attack that the terrorists probably hadn't considered but is surely welcome news to them regardless. Terrorism has infected America and its effect is spreading from within. Terrorists attack our way of life. We'll destroy our way of life by trying to protect ourselves from another such attack.
How about this: Let's just completely dispose of the Bill of Rights, right now, in the name of national security! I mean, really, we may all die because of the freedoms it allows. Do away with freedom and we'll live forever. Freedom isn't all that it's made out to be anyway. Take Cuba and China for example. They're wonderful places to live. All the people throughout history that died fighting for their freedom must have been idiots, huh? The people that died for America's freedom and ultimately the Constitution and Bill of Rights. What a waste when all they've done is ensure our death at the hands of someone that has learned to build a bomb from publicly available information.
I prefer to die free, fighting for freedom, than to "live" shackled and bound.
The problem isn't information availability. The problem is how we treat each other that can infuriate someone to the point of hatred.
Given a particular word on a particular website, it's fairly easy to decide if it's relevant or not. How? By looking for links to that website from other websites which mention the same word. That's the idea behind Teoma and a number of other search algorithms. Sites which "unintentionally" get hits for unrelated topics simply don't register on these engines. Link analysis provides much more accurate metadata, because it's based on other people's opinions.
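To make the link idea concrete, here is a deliberately tiny sketch: it counts how many other pages both mention a word and link to the target. This is not Teoma's (or anyone's) actual algorithm, and the `pages` structure with `text` and `links` fields is invented for the example.

```python
def link_support(pages, target, term):
    """Count other pages that mention `term` and also link to `target`.

    `pages` maps a URL to a dict with "text" (page text) and
    "links" (list of URLs it links to). Both field names are assumptions.
    """
    term = term.lower()
    count = 0
    for url, page in pages.items():
        if url == target:
            continue  # a page cannot vouch for itself
        mentions_term = term in page["text"].lower().split()
        if mentions_term and target in page["links"]:
            count += 1
    return count
```

A page that merely contains a word gets a score of zero for it unless other people who use that word also link there, which is why accidental matches tend not to register.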
Another problem with metadata in general, of which spam is but one symptom, is the fact that creators of content often have no idea of how their content appeals, or fails to appeal, to other people. Did Mahir have any idea that his name would become a top-ranked search term? Does anyone have any idea how his content should be ranked for a given search term (besides number one, of course)?
What is the number one piece of metadata found in spam messages? This is not spam.
---- "If we have to go on with these damned quantum jumps, then I'm sorry that I ever got involved" - Erwin Schrodinger
On a related subject, I've been looking for a domain name that is a) easy to remember and b) does not generate a zillion hits if you type the name in a search engine. (and c) is not a silly long string of words).
:o(
;o)))
It's funny how most people think that common-word domains are valuable, but forget that a name that, when typed into a search engine, jumps out as the only result is pretty valuable too. Especially if it sounds like it is spelled.
Maybe not the best example, but since the 4-letter domain names are practically all gone, I was going to register duxo.com. Unfortunately one of the many domain hogs got it the day I was going for it.
I got another one though, but it's not up yet so I won't tell what it is!
But if we could have kept search engines from returning it, that would have been even better. Since in our case the page was intended for internal use, we don't care whether anyone can find it from the Internet. Our real users know where to look for it.
http://www.robotstxt.org/wc/exclusion.html
More hits is almost NEVER a bad thing for a site's main purpose (getting people to see it, and hopefully take an interest in what's there)
For just the same reason as the automotive industry has made clean fuel vehicles standard, and the very way our capitalist world operates. For the time (money) it takes to implement this thing to make the world a better place, the costs can not be substantiated. Granted, if a lot of sites did this, there would be more time for everyone to spend playing with their dog rather than dig through irrelevant search results. But Joe webmaster's company is never going to pay him to do it, and he's not going to spend his free time doing it when he could be spending time with his dog.
That's the way the world is working right now, and people who want to change the world to a better place will probably spend their time doing other things rather than putting unwords in their web documents.
Saving bandwidth, perhaps? For a hobbyist's website hosted cheaply (and thus having a low transfer limit), it might be quite desirable not to attract too many visitors who aren't actually interested in the site's contents. Of course, that's not a very common scenario; good search engines will give such sites a low priority anyway because they're not linked to very often.
The illegal we do immediately. The unconstitutional takes a little longer.
--Henry Kissinger
Webmasters, however, should be careful with these new "anti-words", as when they mix with their word counterpart, a gigantic explosion results.
In the old days of the internet, back when it was run by the government, you could literally be expelled from using it if you ever did this. Now it's a standard practice and many schools ban the newsgroups, which are the very fabric of how the internet got started and contain valuable learning materials. Why? Well, thank these porn spammers! Boy, does that piss me off more than anything else. Anyway, I think the indexing metadata idea is a good one for web searching. It will make searching for valuable data a lot easier and give AOL users a reason to switch. You might hate AOL, but the users I know who use it say everything is organized right in front of you at your fingertips. No searching needed. If you ever needed to do a search for something specific you can always find what you need immediately. This is quite difficult with the world wide web unless you know exactly where to look.
http://saveie6.com/
Porn sites who promote (through a variety of means) the words "free, porn, sex" and the like and then demote "pay, fee, membership, credit card".
This proposal will not make the indexing of sites more reliable. If anything it will add to the common confusion associated with meta keywords. Yes, it is quite a nice idea in theory, but I can't see anyone wanting to exclude words from being searched. The main point in the proposal was that the author felt guilty about pulling in people who had entered search terms that appeared on his page. One would ask why he is publishing information on the internet if he doesn't want people to look at it. A better solution would be to get people to use search engines properly. As an example I will use the stalking on the internet term. If people put these words into google and come up with his page, then perhaps they should have modified their query to something like "stalking on the internet" and they may not have found his page. On the other hand, if his page contains the phrase "stalking on the internet", it might be just what the seeker was looking for.
To this proposal I say nay. Or perhaps oink.
did you have the page disallowed for search engines? if something is for internal use only, you really ought to have dropped in a robots.txt to exclude it altogether.
if more people used robots.txt, a lot of 'only useful to internal users' sites would drop right off the engines, leaving relevant results for the rest of the world...
just a thought......
Screw you all! I'm off to the pub
Surely this kind of issue is what Tim Berners-Lee and the W3C are trying to address with the Semantic Web.
The problem with content on the web today is that while it is perfectly readable by humans, it is incomprehensible to machines. If Tim and Co get their way, and I for one would love to see the Semantic Web catch on, then we can get rid of kluges like the Anti-Thesaurus, HTML meta keywords and the like.
-- "So, what's the deal with Auntie Gerschwitz et all?"
A long time ago (in a galaxy far away) I kept a playlist of my radio show. I had one page per month. One month I played Porno for Pyros' "Pets" twice. Guess which web page in our department had the highest hit count for the next year...
Backups are for wimps. Real men post their data in comments and have slashdot mirror it
Presumably the same could be done for <meta name="keywords"> in HTML.
-- Ed Avis ed@membled.com
In some jurisdictions, you get into trouble if a search engine refers to one of your pages when you enter a trademark (and you are not entitled to use that trademark). This way, you could easily tell search engines not to list your pages when such a trademark is present in the query. Complying with court orders wouldn't be a major problem any more.
However, you could show some information if people visit with a certain Referrer header, directing them to more useful pages. This works in the majority of cases, and it doesn't need much cooperation from the search engines.
Did she squirt?
--
CNN declares War on Islam!
Left-wing America declares War on its Civil Liberties!
Isn't this what robots.txt is for? You disallow all search engines apart from your own from indexing pages that you don't really think people outside your department will want to see. Think how long it would take to put excluded words into every page of your site when a single line in robots.txt would suffice :)
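For the curious, Python's standard `urllib.robotparser` can show how a well-behaved crawler would read such a file. The robots.txt below is just an example I made up: `LocalSearchBot` is a hypothetical name for your own site's indexer, and the URL is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt: let our own (hypothetical) indexer in,
# keep every other robot out of the whole site.
ROBOTS_TXT = """\
User-agent: LocalSearchBot
Disallow:

User-agent: *
Disallow: /
"""


def allowed(robots_txt, user_agent, url):
    """Return True if `user_agent` may fetch `url` under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Bear in mind this only affects robots that bother to honour the file; it does nothing about people following links.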
Take for example a search for the string tar, which will yield documents containing:
tar -zxf update.tgz, or cp update.tar update.old, or roofing tar , or jeg tar en øl nu
Each instance of tar above has a different meaning, but the same spelling. When you get into misspellings, spelling variations, and conjugation, then the actual concept is even harder to associate with a given range of strings.
Even Google searches are for strings and not concepts, but Google's ranking algorithm relies on which pages get the most links from pages that also get the most links. However, you'll still get different results for color vs. colour and tyre vs tire. Because the algorithm only reflects how people have chosen their links, it does, from time to time give unusual associations. ;)
Beta is broken and the link to classic doesn't work. Stop wasting our time or there won't be anybody left here.
2. Some sites have menus on each page listing every topic on the site. You search on a word and get every page in the site returned, including those that mention the topic only in the menu. A tag such as this <nonsearchable> </nonsearchable> surrounding the menus might aid in solving this problem.
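On the indexing side, honouring such a tag could be as simple as the sketch below. Note that `<nonsearchable>` is the hypothetical tag proposed above, not a real HTML element, and the regex-based tag stripping is a shortcut for illustration rather than proper HTML parsing.

```python
import re

# Matches the hypothetical <nonsearchable>...</nonsearchable> regions.
NONSEARCHABLE = re.compile(
    r"<nonsearchable>.*?</nonsearchable>", re.DOTALL | re.IGNORECASE
)
# Crude matcher for any remaining markup.
TAG = re.compile(r"<[^>]+>")


def indexable_text(html):
    """Return the page text with nonsearchable regions and tags removed."""
    without_menus = NONSEARCHABLE.sub(" ", html)
    text = TAG.sub(" ", without_menus)
    return " ".join(text.split())
```

The menu words never reach the index, so a page is only returned for topics it actually discusses in its body.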
Unfortunately, these problems are always better solved by stronger search engines. Even though it is several orders of magnitude harder for a search engine to figure out that those things aren't important, it's several orders of magnitude easier to get google to do it than it is to convince 10 million web page maintainers to do it.
Jack Valenti and the MPAA are to technology as the Boston strangler is to the woman home alone
I believe that most search engines would implement this by not indexing those words for that page. It is the only way to do it without increasing the load on the search engine. The other way, no matter how efficiently implemented, would add processing needed to produce results. This means more machines need to be added to the clusters.
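A minimal sketch of the index-time approach, assuming each page arrives with its text and its list of unwords (the input format is invented for the example). The excluded words are simply never written to the inverted index, so query processing is unchanged.

```python
from collections import defaultdict


def build_index(pages):
    """Build an inverted index, skipping each page's own unwords.

    `pages` maps a URL to a (text, unwords) pair; this layout is an
    assumption made for the sketch.
    """
    index = defaultdict(set)
    for url, (text, unwords) in pages.items():
        skipped = {word.lower() for word in unwords}
        for word in text.lower().split():
            if word not in skipped:
                index[word].add(url)
    return index
```

All the extra work happens once per page at crawl time, which is the point being made above.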
Very few webmasters complain about users finding their site because of bad search results. Most of them are happy to have traffic.
Most web sites don't have meta tags, but most web designers do want their clients to see impressive hit counts in their traffic reports. Ummm, so who thinks web designers are going to take the time and trouble to add a feature that will decrease traffic?
Oh you capitalist-thinkers. Spare a thought for Geocities/ Hypermart users who have to start shelling out money if they cross a certain hit threshold.
there should be a metadata standard allowing webmasters to manually decrease the relevance of their pages for specific search terms and phrases."
So, in other words... businesses will want to reduce their exposure on the web? I don't think so.
For all intensive purposes, "whom" is no longer a word. That begs the question, "who cares"?
Picking out the "irrelevant" words is much harder than creating tags that contain the most relevant ones, which is the main point of meta-tags. Most of us have brains that are trained to pick out what is important, not the opposite, so few people would bother to implement this. Language is hard, computers are dumb and few people have been willing to "explain" language to them to make search smarter. In other words, nothing like this works on a significant scale if much effort has to go into it. Tagging important words can be semi-automated with summarization software, which will accomplish much more in terms of relevancy ranking than tagging the ones to ignore. And by the way, this proposal misunderstands robots.txt. The point isn't to conceal the existence of pages, it is to tell *robots*, not people, to stay away from them. (I'm the owner of the mailing list for it.)
too much ... makes one blind. 8-)
stronger search engines
The more traditional search engines (not google?) have protections against sites that do extreme things to get to number 1 in the hit list. They have protections against repeating one word a lot of times (META="sex, sex,sex"). Repeating your "exwords" in the normal meta tag so many times should trigger the search engine "spam alert" and decrease the search relevance.
There were a couple of interesting papers at the ACM's SIGIR this year that use only the anchor text that points to a webpage to get a description of the pointed-to page, and they could do some cool things like language translations with just that data.
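I don't know what the real engines do internally, but a repeat check could be as simple as this sketch. The threshold of two repeats is an arbitrary assumption, and the function name is made up.

```python
from collections import Counter


def stuffed_keywords(meta_keywords, max_repeats=2):
    """Return keywords that appear more than `max_repeats` times.

    `meta_keywords` is the comma-separated content of a keywords meta tag.
    The default threshold is an arbitrary choice for illustration.
    """
    words = [word.strip().lower() for word in meta_keywords.split(",")]
    counts = Counter(word for word in words if word)
    return sorted(word for word, n in counts.items() if n > max_repeats)
```

An engine could then lower the page's relevance for whatever words come back from the check.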
Does Google even use metadata? I thought their big thing was external linking.
Those who fail to understand communication protocols, are doomed to repeat them over port 80.
I know of at least one web page that has been very carefully constructed so that search engines won't find it, but people who know what they're looking for will find it easily.
With no subject-specific keywords, however, unless you do know what the author is talking about, you won't have any idea what she's so pissed off about.
No, don't ask: I am routinely pissed off for the same reason, and will not post the URL here.
I wouldn't mind if searches for my name brought up my current web page, rather than the one I had in 1995. But that's another matter.
...laura
Matteo Ricci (he's listed in a bibliography; there is no info to speak of)
While I have occasionally found a source I needed from a hit on a bibliographic entry, one of my pet-peeves, even on Google, is long lists of nothing but bibliographic entries. Usually it's a pretty clear sign that there isn't much on the topic available on the Internet, but sometimes I just need to change my search terms slightly.
But I think nonword is a bad idea. If the website's editors decide to keep a word, and Google's page-rank technology shows it to me, I'm willing to check it out.
Well some docs are here, and the mod_rewrite reference is here.
Here is a goofy example that does a redirect back to their google query, except with the word "porn" appended to it. As an added bonus, it only does it when the clock's seconds are an even number. (Or do the same test to the last digit of their IP address). Replace the plus sign before "porn" with about 100 plus signs and they won't see the addition because each plus sign becomes a space. The "%1" refers to their original query.
Here's another one that checks the user-agent for an URL, and then redirects to it. This keeps most spiders and stuff off your pages since they usually put their URLs in the User-Agent:
Anything you can think of is possible. I think you can even hook it into external scripts.
It's even worse than a lack of incentive to decrease relevance. There's actually a strong incentive not to: advertising.
CPM ads pay the same regardless of relevance. CPC ads tend to pay *even more* for visitors who aren't interested in your content, since they're more likely to click on the ad on the way out.
I googled around a bit and found a Java applet and browser plugin that can do this, but does anyone know of a straight-up IIS service-level configuration method of disabling "image theft," much like the method for apache described in the howto above?
Links to FAQs, HOWTOs appreciated!
For a search engine at a single site, this is very useful. You watch the queries and results. If a page doesn't show up, but it should, you add the search terms to the keywords. If it shows up, but you don't want it to, what do you do? Create an anti-keyword field.
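Here is roughly what I mean, as a sketch for a small site search. The page records and the `keywords` / `anti_keywords` field names are invented for the example; a real search product would have its own schema.

```python
def rank(pages, query):
    """Rank pages for a query, honouring keyword and anti-keyword fields.

    Each page is a dict with "url", "text", "keywords" and "anti_keywords"
    (sets of lowercase words). These field names are assumptions.
    """
    terms = set(query.lower().split())
    results = []
    for page in pages:
        if terms & set(page["anti_keywords"]):
            continue  # the editor asked for this page to be hidden
        searchable = set(page["text"].lower().split()) | set(page["keywords"])
        score = len(terms & searchable)
        if score:
            results.append((score, page["url"]))
    # Highest score first, ties broken by URL for stable output.
    results.sort(key=lambda item: (-item[0], item[1]))
    return [url for _, url in results]
```

The keywords field pulls a page into results it was missing from, and the anti-keyword field takes it out of the ones where it keeps showing up by mistake.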
I don't have mod points right now, but has anyone else noticed that if you use a wheel mouse under windows, you do your mod, and then you "wheel down" to click the moderation button? If you don't remember to click away from the mod box, you end up giving the poor person a completely different mod than you intended.
Maybe this is only an Opera issue?
Waltz, nymph, for quick jigs vex Bud.
There's a standard evolving, but nobody's using it.
http://dublincore.org/
-- Ender, Duke_of_URL
IANAL and I don't have specific knowledge of this occurring, but really, what's to stop it from happening?
My suggestion to anyone is that they develop three good domain names that they would be happy with. But for god's sake, do it *offline*! Don't search for them, don't try them in your browser, and don't tell anyone what they are. *Then* just go register one or all of them. Don't wait, don't search, and don't even breathe until they're yours.
Oh, and don't forget to trademark the language in those URLs (can't be plain English remember). If someone sees your new URL and likes it, they could register the TM if you don't. Then they can sue you for ownership of the domain, since you're clearly infringing on their TM; and they'll probably get the domain in the end.
Hey, I don't make the rules...
And my favorite word today is don't.
Please mod this post only if you think others should/n't read this. I have enough ego^H^H^Hkarma. Thanks!
But search engine spammers can do the same thing: buy a bunch of other sites and put links to their target site.
Table-ized A.I.
Apparently, they would prefer that people searching for "BSML" did not turn up my web page. I wonder if they've tried to get the Boston School for Modern Languages to change their name, too?
Now isn't the whole point of properly using XML and namespaces to disambiguate coincidental name clashes like this? If LabBook thinks there's a problem with more than one language named BSML, then they obviously have no understanding of XML, and aren't qualified to be using it to define any kind of a standard.
Maybe LabBook should put some meta-tags on their web pages to decrease their relevance when people are searching for "Bull Shit" or "Modern Language".
-Don
========
From: "Gene Van Slyke" <gene.vanslyke@labbook.com>
To: <don@toad.com>; <dhopkins@maxis.com>
Sent: Monday, November 12, 2001 10:36 AM
Subject: BSML Trademark
Don,
While reviewing the internet for uses of BSML, we noted your use of BSML on http://catalog.com/hopkins/text/bsml.html.
While we find your use humorous, we have registed the BSML name with the United States Patent and Trademark Office and would appreciate you removing the reference to BSML from your website.
Thanks for your cooperation,
Gene Van Slyke
CFO LabBook
========
Here's the page I published years ago at http://catalog.com/hopkins/text/bsml.html:
========
BSML: Bull Shit Markup Language
Bull Shit Markup Language is designed to meet the needs of commerce, advertising, and blatant self promotion on the World Wide Web.
New BSML Markup Tags
CRONKITE Extension
This tag marks authoritative text that the reader should believe without question.
SALE Extension
This tag marks advertisements for products that are on sale. The browser will do everything it can to bring this to the attention of the user.
COLORMAP Extension
This tag allows the html writer complete control over the user's colormap. It supports writing RGB values into the system colormap, plus all the usual crowd pleasers like rotating, flashing, fading and degaussing, as well as changing screen depth and resolution.
BLINK Extension
The blinking text tag has been extended to apply to client side image maps, so image regions as well as individual pixels can now be blinked arbitrarily.
The RAINBOW parameter allows you to specify a sequence of up to 48 colors or image texture maps to apply to the blinking text in sequence.
The FREQ and PHASE parameters allow you to precisely control the frequency and phase of blinking text. Browsers using Apple's QuickBlink technology or MicroSoft's TrueFlicker can support up to 65536 independently blinking items per page.
Java applets can be downloaded into the individual blinkers, to blink text and graphics in arbitrarily programmable patterns.
See the Las Vegas and Times Square home pages for some excellent examples.
Take a look and feel free: http://www.PieMenu.com
The wheels of government and commerce would grind to a halt were they not well lubricated with Bull Shit. So I created the Bull Shit Markup Language and published the BSML web page years ago, putting it in the public domain for the good of mankind. Now somebody has finally taken it seriously, and is trying to monopolise BSML!
He who controls BSML controls the Bull Shit... and he who controls the Bull Shit controls the Universe!
http://catalog.com/hopkins/text/bsml.html
Does anyone know of any prior art pertaining to Bull Shit and Markup Languages? What about VRML? Maybe I could get Mark Pesce to testify on my behalf? c(-;
Here's a list of the huge faceless multinational corporations I'm up against:
http://www.labbook.com
"IBM, NetGenics, Apocom, Bristol-Myers Squibb, Wiley and other leaders of the life sciences industry support LabBook's BSML as the standard for biological information".
To paraphrase Pastor Martin Niemöller:
First they patented the Anthrax Vaccine
and I did not speak out
because I did not have Anthrax.
Then they patented the AIDS Drugs
and I did not speak out
because I did not have AIDS.
Then they patented Viagra
and I did not speak out
because I already had an erection.
Then they came for the Bull Shitters
and there was no one left
to speak out for me.
-Don
Take a look and feel free: http://www.PieMenu.com
'nuf said
Ok. I'll say some more. For most searches, Google's algorithm does a tremendous job of bringing the relevant sites to the top of the list.
In fact, when I look for product info and don't get the manufacturer's site first in the list, I consider that a strike against them - i.e. their web presence is put into question.
"No matter where you go, there you are." -- Buckaroo Banzai
Remember the band 'The The' from the '80s? It would seem to be damn near impossible to find them via normal search techniques. :-)
I did a quick test, here are the results:
Yahoo: A (listed the band site via their web site listings; official site was 4th in list)
Google: F (quoting didn't help)
Northern Light: C (found relevant matches, but the official site was nowhere to be found on the first 2 pages)
AltaVista: A+ (official band site was #1 in list)
Nowadays, you need to think about "searchability" when picking the name for just about anything. That is, assuming you want to be easily found on the web.
I guess that's where dopey marketing names like 'Itanium' actually make sense. Very unambiguous search criteria.
"No matter where you go, there you are." -- Buckaroo Banzai
Imagine a company hiring hackers to break into competitors' sites to put important keywords in the unthesaurus.
For example, what if you hacked 3Com's site to put the words 'ethernet' and 'network' in their unthesaurus? It's unlikely that a professional company like Linksys or others would do this, but it is entirely possible.
You could argue that meta keywords should take precedence, but I'm sure the hacker would remove those words from the meta keyword list.
"No matter where you go, there you are." -- Buckaroo Banzai