Domain: searchenginewatch.com
Stories and comments across the archive that link to searchenginewatch.com.
Comments · 285
-
It's easy...
Just don't visit sites that do Paid Inclusion, or realize that some results may be "tainted". Personally, I find that All The Web still gives decent results for most things. This is because All The Web & Google seem to have this thing about who can get the most pages indexed.
For a good list of what search engines show what ads in what ways, check out this page at Search Engine Watch. -
Google Isn't Immune
Google will start adding similar revenue-generating ideas, or their financial backers will start demanding changes.
Two-thirds of Google's revenue is from ads. They are opening new sales offices (e.g. Germany), but slowing down tech hiring. That suggests they are betting on increasing ad revenue at a time when their competitors have decided that ads alone can't sustain search-engines. Google's techie hiring cutback also suggests that they don't think additional software R&D can help them grow as much as investing in non-tech areas. [Estimates I've seen of Google's revenues are US$30M - $70M a year, with their CEO saying that makes them just about profitable.]
Worse for Google, they hold few patents for their basic technological advantage, and their infrastructure (including their huge database) could be rebuilt in a few weeks by a cash-rich M$. The only protection they have against Teoma et al is their staff -- but loyalty can be bought. (Google uses options to encourage employees to stay. If the options cease to look promising, some people will leave.)
Another problem facing Google is their staff itself. 50 of their 250 employees are PhD's. That means they have lots of valuable technical knowledge, but it also means that 50 of their highest-paid employees have a collective 0 years experience in business planning. Consider that their senior management lacks a CFO at all, and is loaded with CS doctors who tend (like normal geeks) to want to work on "cool" things instead of profitable ones.
Google's proud of its lack of advertising -- but don't they also lack the marketing that would produce such advertising? Look at two of recent new products: the USENET database (cool, but what good does it do for *Google*?), and the shopping-catalog database (a possible money source...but very risky, requiring licensees to share their revenue stream and catalog-shoppers to change their habits.)
Being private means Google can avoid stockholder demand for quick profits...sort of. Their only source of funds is two VC firms, since the founders had little money of their own. The two firms [1][2]-- each of whom has a seat on Google's board -- will eventually demand return on their $25 million investment. Remember, the folks who gave Google its money want to see profits, and have *lots* of experience in tweaking start-ups to generate them.
Don't get me wrong -- Google's great; Brin & Page deserve copious kudos & cash. However, I'm watching for some danger signs:
- Lots of new "Sales" or commission-based positions at company
- An exodus of employees. (With their high retention rate, "exodus" might mean 10 people.)
- Research efforts into non-Linux infrastructure.
- A lot of new product offerings that target consumers directly.
- A removal of one (or both) founders from day-to-day operations.
- More partnerships with content producers.
- Another level of financing (demonstrating VC belief that they can grow.)
-
Google NOT the most Popular Search engine
-
listing of search voyeur sources
Danny Sullivan has the best list of realtime and delayed voyeur sites I've found so far at his What people search for page.
-
Re:same old bull again
Google is the most popular search engine, that's a fact.
That's a belief, that you didn't even bother to check. Check this out, and then read this.
probably even the most popular site, it's THE BEST STATISTIC YOU CAN GET.
What part of your ass are you pulling all this from? Most popular site?? Best statistic you can get???
Instead, why don't you review this guy's post.
Oh, and FUDing Linux with "Nobody uses it" and calling people who dare to look at real life statistics crack addicts, *IS* Linux-bashing.
I'm still waiting for you to look at "real life" statistics, and not ones that skew them toward your beliefs. Hey, if you can show a real world study that shows Linux on the desktop with any sort of marketshare > 1%, I'll admit I was wrong. So far, you haven't shown jack, except a lot of wrong assumptions (Google "most popular site"???).
Just out of curiosity, are you willing to admit that Linux on the desktop is just not there, or do you "just know" that it's better than what the numbers show?
-
Re:why i don't love anything but google...
Google's strategy of providing a simple effective search engine has been a breath of fresh air in the industry and its success has been incredible. Take a look at the latest audience reach ratings here. The graph comparing Google to AltaVista is particularly startling. When AltaVista relaunched as a portal site in Nov 99 they initially gained users, but as soon as Google appeared it started dropping like a stone. No other search engine outside the major players (Yahoo, MSN, AOL, Lycos, Netscape) has managed to maintain its position against Google and it is likely that Google will pass Netscape in the next few months. Even more impressive when you consider that this is only google.com's market share and doesn't count hits from Yahoo or Google's international versions.
-
Integrity Versus Security
This is slightly off topic, however it does apply when you start thinking about what information is currently available.
A lot of information on the web has recently been deleted. While it is true that Google has much of this material cached, more and more information related to war, disease, and terrorism will go away.
While we need to worry about security, we also need to care about integrity. When folks get information, they can make choices. When choice is available, we have room for freedom.
-
Teoma discussed earlier on /. interesting article
Teoma was discussed earlier on /.. The article featured in that posting was quite interesting in its own right and worth a close read, even if you don't go through the comments of the earlier post.
--CTH -
Link should be ...
[...] enhances the link analysis idea [...]
-
search technology not perfected yet
launched a beta version of its search engine which enhances the link analysis idea
Sorry! The file you requested couldn't be found in Search Engine Watch. Here are some ways to find what you may be looking for: -
link still not working
Thank God for that post-posting editing function, eh Hemos?
;) Within 10 minutes, the link to searchenginewatch.com changed 4 times: teoma%20.html, toema.html, toema. html.. Oh well. -
Bad URL
The poster accidentally put a space in his http tag.
-
How flimsy is this?
If this page was any more slanted, Nader and his consumer watchdogs could go after its authors for kinking people's necks. "Paid placement"? In a "sponsored links" box? How deceitful!
You know what else I heard, you can advertise cars... in the newspaper's "Wheels" section! For money! Therefore, all journalists are whores.
-
Re:Typical Slashdot FUD
Wow, I can't imagine why Google would need 8000 computers for its search engine when you would think others do it with less.
-
Re:Don't just sit there, do something about it !
Search engine technology was well established before these patents were filed. I remember using Lycos in '95 or '96.
I don't remember the first time I used a search engine, but I suspect it was in the autumn of 1993.
IIRC, the search engine I used, was either the WWW Worm or WebCrawler (most likely the former at first, see below), and AltaVista came a while later as a "revolutionizing" new thing from DEC, partially to promote their Alpha 21164 processors (launched in 1995).
Search Engine Watch seems to agree with some of what I remember; AltaVista opening in December 1995 and WebCrawler in April 1994.
That's about where you get when it comes to prior art; the WWW wasn't much before 1993, and DEC most certainly wasn't the first player in the open. Proving prior art to most of the claims should be relatively easy, unless the patents are so specific that they only cover the things that AltaVista did and nobody else had done before (I don't quite see how that happened, the clue about AltaVista was that it was fast). -
Foundation needs Fixing
I think the "search industry" needs fixing if the internet is ever going to live up to its potential. The November edition of Danny Sullivan's SEARCH ENGINE REPORT makes the problem pretty evident, and I think my post to this other slashdot subject board offers the solution.
-
The Bugaboo is Relevancy
The biggest problems with Search Engines, is relevancy. The problem being that when I do a search for a word like "magic" the search engine will return results based upon its algorithm, but trying to produce relevancy from a single search word is just about impossible as a task. With a term like "magic" I could be looking for:
- Magic as in Magic the Gathering - a collectible card game I used to play.
- Magic as in the occult.
- Magic as in sleight-of-hand.
Or any of a large number of subjects that I could have in mind at the time of my search. A search engine such as Google will rank pages which contain the word magic in the page title, multiple times in the body of the page, in the META tags, in or near HREF links, or which are linked to by many other sites higher than those which do not meet these criteria. It differs from search engine to search engine, depending on criteria.
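As a rough illustration of that kind of ranking, here is a minimal sketch of a keyword scorer. The Page fields, the WEIGHTS values, and the score and rank helpers are all made up for the example; they are assumptions, not any real engine's formula.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """A hypothetical record of what a spider might store about one page."""
    url: str
    title: str
    body: str
    meta_keywords: list = field(default_factory=list)
    inbound_links: int = 0


# Invented weights, chosen only to make the example readable.
WEIGHTS = {"title": 5.0, "body": 1.0, "meta": 2.0, "inbound": 0.5}


def score(page, term):
    """Score a page for a single search term using simple counts."""
    term = term.lower()
    title_hits = page.title.lower().split().count(term)
    body_hits = page.body.lower().split().count(term)
    meta_hits = sum(1 for k in page.meta_keywords if k.lower() == term)
    return (WEIGHTS["title"] * title_hits
            + WEIGHTS["body"] * body_hits
            + WEIGHTS["meta"] * meta_hits
            + WEIGHTS["inbound"] * page.inbound_links)


def rank(pages, term):
    """Return pages ordered from highest to lowest score."""
    return sorted(pages, key=lambda p: score(p, term), reverse=True)
```

Nothing in the score depends on which sense of "magic" the searcher meant: a card-game page and an occult page with the same counts get the same number.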
None of these criteria for ranking take into account the nature of my query - what I had in mind when I did the search. In other words they do not directly address the relevancy of the results. If a search engine offered me the opportunity to pick from results it returned and gradually refine the search to produce better results it would be addressing this situation. Some do with a "search again in this result set" or "more like this" type option on their results pages, but it's still kinda mechanical, and not all that reliable.
I think it will take some sort of AI analysis of search requests based on user-feedback of some sort and with a learning capability to surpass the current crop of search engines. Until such time as we have some smart systems working behind the scenes on searching any improvements will no doubt be incremental rather than radical.
Now, as for keeping the specifics of how a page is ranked secret, I think it's absolutely necessary. There is a constant, quiet war going on between the search engines and the folks who want to get their websites listed at the top of the page when a result set is produced. The people who regularly submit their sites to the various search engines, with each search engine receiving a specially made page generated just for its benefit to ensure that the website gets the best ranking possible etc, are not interested in how accurate the search engine is, they simply want to come up first. The folks at the search engine generally want the most relevant pages to be returned. There is an essential difference of purpose between the two camps.
On the side of the search engines, they have control over their ranking system, and change it periodically to prevent abuse of the system. The folks who are seriously trying to get to the top of the heap in the search engine results are constantly trying new methods to get ahead.
For instance, at one point some webmasters were creating their webpages with a lot of text at the bottom of the page that was the same font color as the background, so that the search engines would spider the contents of the page but users would never see those contents. This let them list all sorts of words that scored higher in the search engines' returns, but had little or no relevancy to the page contents. The search engines got wise to this trick and now most will penalize you for using it.
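How the engines actually catch this is not public, but a minimal sketch of one possible check might look like the following. It assumes old-style markup where the background is set with a bgcolor attribute on the body tag and text color with font tags; HiddenTextFinder and find_hidden_text are hypothetical names, and a real check would also have to deal with stylesheets, near-matching colors and so on.

```python
from html.parser import HTMLParser


class HiddenTextFinder(HTMLParser):
    """Collects text whose <font color> equals the <body bgcolor>."""

    def __init__(self):
        super().__init__()
        self.bgcolor = None     # background color declared on <body>
        self.color_stack = []   # colors of the currently open <font> tags
        self.hidden = []        # text runs that match the background

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "body":
            self.bgcolor = (attrs.get("bgcolor") or "").lower() or None
        elif tag == "font":
            self.color_stack.append((attrs.get("color") or "").lower() or None)

    def handle_endtag(self, tag):
        if tag == "font" and self.color_stack:
            self.color_stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text or self.bgcolor is None:
            return
        # Only an exact string match counts; near-identical colors slip through.
        if self.color_stack and self.color_stack[-1] == self.bgcolor:
            self.hidden.append(text)


def find_hidden_text(html):
    """Return the list of text runs drawn in the page's background color."""
    finder = HiddenTextFinder()
    finder.feed(html)
    finder.close()
    return finder.hidden
```

An engine could then discount or penalize a page when find_hidden_text returns anything, which is roughly the penalty described above.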
Opening up the search engines' ranking rules would only make the system easier to abuse more precisely. No matter how many eyeballs pore over the code, it will still not change the nature of the guy who will use any method at his disposal to get his porn page returned as Link #1 when you do a search for MP3 because it's the hottest term currently being searched for.
Google has altered this battle somewhat by ranking pages higher in their results based on how many other webpages contain links to that page (and also based upon the nature of the linking page. They use a distinction between pages which contain a lot of links - like a web directory such as my own Omphalos - and those which are linked to by a lot of other pages. Both get points for different reasons and in different instances. I don't remember the details), but even this is open to abuse, although with a bit more effort required. I know of a website which has over 200 different URLs registered and operational, all of which contain pages which point back to the main URL they are promoting. When a search engine such as Google goes to analyze this website, it will rank it higher because it is linked to by so many separate domains and so many separate pages on those domains. It's harder to abuse, but it can be done.
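To make the abuse concrete, here is a minimal sketch that counts how many distinct hosts link to each site. This is plain link counting, not Google's actual algorithm, and inbound_domain_counts and the example hostnames are invented for the illustration.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def inbound_domain_counts(links):
    """Count distinct linking hosts per target host.

    links is an iterable of (source_url, target_url) pairs.
    Links from a host to itself are ignored.
    """
    sources = defaultdict(set)
    for src, dst in links:
        src_host = urlsplit(src).hostname
        dst_host = urlsplit(dst).hostname
        if src_host and dst_host and src_host != dst_host:
            sources[dst_host].add(src_host)
    return {host: len(hosts) for host, hosts in sources.items()}
```

Feed it 200 registered domains that each point at one promoted URL and that URL scores 200, while a site with a handful of honest links scores in the single digits. A count like this cannot tell the two cases apart without extra signals.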
Of course, this is all basically irrelevant, since each of the search engine companies keeps their methodology and their source code highly protected. It is worth millions of dollars in revenue, and I cannot honestly see any of them deciding to release their software in this way.
If you have not noticed, practically every graduate student who devises a new and effective method of indexing and ranking search results ends up creating their own company once they have delivered their thesis and entered the real world. That is certainly how Google started, and I believe is also how Ask Jeeves got going. I am sure that most of the other main search engines have gotten going in the same or similar manners.
All that said, if you want to play with a true search engine that is GPLed and works quite well, although not on the scale of a Google or an Altavista, try UDMSearch. It runs just fine under Linux or FreeBSD (I have installed it on both in the past) and I am using it on my site under Solaris. It is still in an intense development cycle and new versions are released regularly, but it's worth exploring if you are interested in how a search engine works, and want to get your hands dirty.
For more information on the big boys, check out Search Engine Watch, and finally, if you are simply interested in Space, Space Exploration or Space Science, check out SpaceRef.
-
Search Engines - Skewed Results - Doing Research
I've been following the wonderful world of search engines for several years in my role as web educator and maintainer for a University library. Skewed results seem to be an inevitable part of commercial engines - Alta Vista, et al. were doing it long before Google burst on the scene. One of the great weaknesses of the Internet is the inadequacy of search engines and directories in support of serious research. While librarians seem to think that they could nicely organize the whole thing, I have my doubts that Dublin Core metadata or some extension of MARC into site classification will ever solve the problem. That said, Google is still probably the best general, "comprehensive" search tool available today. Expecting dispassionate morality from a business entity, however, is naive -- so naive that I'm a bit surprised that SlashDot's cynical staffers find it noteworthy. If you'd like to dip into the sordid world of internet search tools check out Search Engine Watch -- it's a good starting point to find out about business relationships as well as characteristics and performance of the various engines.
-
Re:Search Engine Comparisons
> Is there a site out there that has thorough and *unbiased* comparisons of the different search engines out there?
Search Engine Watch is a monthly newsletter which does in-depth comparison reviews of search engines.
-
Wait, isn't deep linking OK?
Matter of fact, Slashdot had a story about this earlier this year, where a federal judge ruled that deep linking was OK, as long as people knew that they were going to someone else's site. The RIAA had a similar situation with Lycos' MP3 search page, but nothing came about it.
In this case, if MP3Board.com is throwing links to different FTP servers out, but people know that these sites aren't run by MP3Board.com, then doesn't that make it OK?
Thoughts?
-- -
Google Vs. Links2Go (was Re:Shameless Plug)
There is some similarity between the way Google uses link popularity to weight links and the way Links2Go uses link popularity to weight links. However, there is also similarity between the way Google uses link popularity and the way Inktomi, Excite, FAST, Go, and Northern Light use link popularity. In fact, all the major search engines use link popularity to rank search results [Search Engine Watch].
The difference between Links2Go and Google is that Links2Go learns a topical directory of links from an analysis of web pages. See, for example, links and topics related to Linux. Links2Go chose these topics and links because web page authors as a whole tended to use those topics and links to organize information on the web (not because a single human somewhere decided that they were the most interesting links).
This allows Links2Go to organize links by topic, allows users to navigate the topic hierarchy (topic "drill down") as they would with one of the hand-created directories, and bias keyword search based on topics and links related to the search terms. These are all features that are lacking in Google, because Google has no taxonomy of topics or method for classifying links by topic.
I like Google and use it all the time when I'm looking for specific information. But, when I want to research a topic in general and explore links related to a topic, I use Links2Go.
-
Good to see a study on this...
Of COURSE there are a bunch of 'dead end' and non-connected sites. There are a thousand web rings for Leonardo DiCraprio just languishing, having been abandoned for whoever is hot now... argh...
I love Google, but lately when I search I get more results consisting of dead links and posts to message boards than any useful info. I've been on the mailing list for the Search Engine Watch newsletter for a couple years now, and while there's a lot being done to weed through all the fluff, IMHO the fluff is growing at too high a rate for the technology to keep up with presently.
Anybody currently active in the industry got an insight into how search engines are combatting all this expired flotsam?
The Divine Creatrix in a Mortal Shell that stays Crunchy in Milk -
WAP search engines
Given the complaints many people have that an HTML page can not always be rendered very nicely into WML, I tried to find search engines that search exclusively for WML pages. I found at least four different engines. Two of them seem to be more or less bogus (Waply and WapWarp). The best one seems to be FAST Wap Search. Are there any other good WAP search engines out there?
-
AltaVista.com wasn't squatting
The company that registered it was named AltaVista. It was poor judgment on Digital's part to give it a name already in use and a domain previously registered. Though the company that did own AltaVista.com later capitalized, I'm sure they also encountered way more traffic than they were planning for on their website.
-
Backing up the Web
According to Inktomi, the web today is about 1 billion pages. I haven't found any estimate of how much data there is in gigabytes (does anybody know of any?), but if you just limit yourself to the textual data, there should be more than enough room on one disc to backup the entire "hypertext part" of the web (and maybe some graphics too :-)
But then again, when (if at all) this becomes available, the Web has probably grown way beyond that... Oh well.
Another good reference is this page at Search Engine Watch.
-
rank links according to use by other searchers
One thing that might possibly improve search engines is a new layer of link-relevance ranking. Most search engines rank links based on how many of your keywords appear in the text associated with that link. Google has added a nice new wrinkle with their "how many sites link to this site" ranking. It seems that another layer of ranking would be possible and useful, one based on the behavior of other link searchers: how many times a given link was chosen by someone else who looked for the same (or similar) keywords as you did. This won't work, of course, if most search queries are unique, but maybe most aren't, I don't really know. Regardless, it sure is amusing watching what people are searching for at search-voyeur sites.
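A minimal sketch of that extra layer, assuming the engine logs which result each searcher clicks. ClickLog and its methods are hypothetical, and "similar" keywords are reduced here to "same words in any order, ignoring case".

```python
from collections import Counter, defaultdict


class ClickLog:
    """Remembers which results earlier searchers chose for each query."""

    def __init__(self):
        self._clicks = defaultdict(Counter)

    @staticmethod
    def _key(query):
        # Same words in any order and any case count as the same query.
        return " ".join(sorted(query.lower().split()))

    def record(self, query, url):
        """Note that a searcher picked url after searching for query."""
        self._clicks[self._key(query)][url] += 1

    def rerank(self, query, results):
        """Move the results other searchers chose most often to the top."""
        clicks = self._clicks.get(self._key(query), Counter())
        # sorted() is stable, so unclicked results keep the engine's order.
        return sorted(results, key=lambda url: -clicks[url])
```

As the comment says, this only helps when queries repeat: for a query nobody has made before, rerank returns the engine's original order unchanged.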
An aside about the changing nature of the web-wandering public:
Wow, I've just come back from checking out the unfiltered (i.e. allows porn-associated searches to appear) metaspy voyeur site. Folks, I think the internet public may be changing. When I first checked this out for several weeks ~1 yr ago, most of the searches were porn related. This time, out of about ~100 search queries, I saw only a few sex-related ones. Are things changing? That would be nice.
______________________( // ///#\) -
true content vs. commercialism
Another thing the search engines could do is figure out how to ignore "trolling" pages. i.e. those which are nothing but index spam, a catchy title, and a refresh tag to ship your browser off to fetch their actual main page
Right on! Nothing burns me more than to see the 'enter here' page when I go to a site. Some flash or other animation or some huge graphic that loads up, and then either sits there, forcing you to 'click to enter' or redirects you to the rest of the site, which is where I wanted to be in the first place...what purpose does that serve? Oh, sorry, it impressed the client you built the page for. (I will admit I have done this once or twice, but not until after trying to talk them out of it.)
As far as searches, I agree that we need a new standard, one that is not only intelligent and dynamic, but that can outwit those who try to trick it. I believe that's quite a way off... until then I'll keep reading /. and Search Engine Watch.
The Divine Creatrix in a Mortal Shell that stays Crunchy in Milk -
Trademarks and Meta Tags
> Is there a precedent for this case?
There have been a number of cases involving meta tags. I keep a page about the topic at http://searchenginewatch.com/resources/metasuits.html.
The short answer is:
-- Anyone can sue anyone
-- Winning is another issue. The legalities of trademark terms in meta tags (or indeed, anywhere on a web page) are still being sorted out. Unfortunately, there's been too much emphasis on meta tags as somehow requiring special regulation. -
Let's talk precedents
A friend of mine who runs SearchEngineWatch has been very interested in the meta tag lawsuits and has actually been an expert witness for Terri Welles in Playboy's meta tag lawsuit against her.
He has a very interesting page listing Meta Tag Lawsuits which summarizes some of the recent cases, whether they have been settled, and the importance of the settlement.
My understanding is that the courts have ruled that you cannot use trademarked meta tags if you are attempting to be deceptive with them (see Oppedahl & Larson v. Advanced Concepts) or attempting to "hijack" another web page. But if you have a legitimate reason to be using a trademark term to properly catalog your site then its use would be legitimate.
I don't think the courts have made a definitive ruling on the legitimate use of trademarked terms in meta tags yet. But that might happen in the Terri Welles countersuit. And she did win her original case allowing her to use trademarked terms on her site. And Playboy was denied an appeal.
I'd say Pez fan sites have a legitimate reason to use the term in meta tags based on the Playboy vs. Terri Welles case. But the sad thing here is that PezCandy has a right to sue and small players without the resources to fight back will simply back down.
Hopefully we'll get a definitive ruling on this soon from somebody who can afford to fight back.
Joachim
-
Meta - Lawsuits
You all may want to check out this link... http://www.searchenginewatch.com/resources/metasuits.html It has some good info RE: meta tags and lawsuits. The company I work for has seen our competitors use our trademarked name within their meta tags in the hopes of trying to capture our traffic and to place higher on search engines. It may not matter to review sites but it definitely causes a problem when people try and snipe your customers away by using the name you've built. Over the hub, Through the switch, Around the firewall...Nothing but 'Net -
Re:Why I have given up on search engines....
The main problem with search engines, is the results are usually polluted with porn, warez and other such stuff, by people with loads of METAs or whatever the search engine looks for. What we need is a search engine that starts (from either a directory, or a classic search engine), and re-ranks the results according to how useful people found the site they clicked on.
But, wait a minute, they're here! There are several sites that use a popularity-based reranking system, excite and snap.com being among the most popular. Of course, the other engines are following in their footsteps - it appears that direct hit (who power hotbot and several others) are using the same sort of thing.
It'd be even better if it could group users into profiles according to their (user-selected) demographic (eg: doctors, british). That way, if an American types "football" they get links about American Football (gridiron?), if a Brit types "football" they get links about soccer, and if an Australasian types it they get links about rugby.
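A minimal sketch of how profile grouping could sit on top of click-based reranking, assuming each user picks one profile label. The make_profile_ranker helper and the example hostnames in the usage note are invented for illustration and are not how any of the engines named above actually work.

```python
from collections import Counter, defaultdict


def make_profile_ranker():
    """Return (record, rerank) functions that share one click history."""
    clicks = defaultdict(Counter)  # (profile, query) -> clicked URL counts

    def record(profile, query, url):
        """Note that a user with this profile chose url for query."""
        clicks[(profile, query.lower())][url] += 1

    def rerank(profile, query, results):
        """Order results by what users with the same profile chose."""
        seen = clicks.get((profile, query.lower()), Counter())
        # Stable sort: results nobody clicked keep their original order.
        return sorted(results, key=lambda url: -seen[url])

    return record, rerank
```

After record("british", "football", "soccer.example") and record("american", "football", "gridiron.example"), the same "football" query puts soccer.example first for the british profile and gridiron.example first for the american one.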
Search Engine Watch has an article about such a system, by the same people who provide the technology behind snap.com (disclaimer: okay, I work at globalbrain, but I'm talking generally).
See also, the cnet Search Engine Shoot-Out