Why AltaVista Lost Ground To Google Sooner Than Expected
techtsp writes: Marcia J. Bates, UCLA Professor Emerita of Information Studies, recently explained why Google's birth led to the downfall of AltaVista. According to Bates, early search engines including AltaVista adapted the classical IR methods. At the other hand, Google founders started off with a completely different approach in mind. Google successfully recognized the potential of URLs, which could be added to the algorithms for the sake of information indexing altogether. Google's modern age techniques were a huge boost to those older techniques. Whatever other business and company management issues AltaVista faced, it was the last of the old style information retrieval engines.
https://www.quora.com/Why-did-Altavista-search-engine-lose-ground-so-quickly-to-Google/answer/Marcia-J-Bates
The article says "URLs" when the Quora post, cited as the source, says LINKS. Also, the article is basically devoid of any information other than "Google did better because it used LINKS to help determine ranking." Thanks for the headline, with a summary, linking to an article that misquotes the linked source and has a headline's worth of information. No really, thanks.
Often wrong but never in doubt.
I am Jack9.
Everyone knows me.
What is this non-article doing on the front page? There is absolutely no useful content or detail; the summary is even better than the complete article.
Submitter "techtsp" seems to just spam links to this low-quality pc-tablet site; I guess one randomly passed the filter.
I found the summary of this article very confusing. Phrases such as "At the other hand" and "indexing altogether"?? Oh, and call me ignorant for not understanding what "IR" means. Infrared? Then I read the article and found that the summary is just a badly strung together quotation of the text, including all of the grammatical errors. I'm still confused, but slightly less so.
Literally. That's what the article says if you click through the summary and the rewrite to actually read it. To quote: "What the Google founders recognized about search on the Web was that information about LINKS could be added to the algorithms." Which isn't wrong, of course, but if you call yourself a nerd you already know a hell of a lot more about the page ranking algorithm than this.
Altavista was popular for a small web. Once it got big we needed a better tool.
Now, oddly enough, we tech guys, who seem to be in charge of state-of-the-art technology, are very reluctant to change. So Altavista didn't change fast enough for the newer, larger web.
Google, when it came out, was built and designed for a larger web. And what made it stay was the fact that they weren't afraid to give us die-hard techies the middle finger and make a lot of upgrades; however, they did it in a way that most of us didn't notice much.
If something is so important that you feel the need to post it on the internet... It probably isn't that important.
Providing links to search results was obviously far more useful to web users than Infrared.
Duh.
Whenever someone says Information Retrieval I think about that agency in Brazil.
Some drink at the fountain of knowledge. Others just gargle.
Then you are doing it wrong. I find everything that isn't totally obscure, like some error cases in a library almost nobody uses.
First paragraph at Wikipedia: "AltaVista was an early web search engine founded in 1995. It was once one of the most popular search engines, but it lost ground to Google and was purchased by Yahoo! in 2003, which retained the brand but based all AltaVista searches on its own search engine. On July 8, 2013, the service was shut down by Yahoo! and since then, the domain redirects to Yahoo!'s own search site.[2]"
Second and third lines of TFA: "Founded in 1995, AltaVista was a very popular Internet search engine website. Nevertheless, AltaVista lost ground to Google and was purchased by Yahoo! in 2003. Ten years later, Yahoo! officially shut down AltaVista in July 2013 and redirected the domain name to its own search engine website."
Hmm...
Firehose moderation picked this article? Editors allowed it in? Or did Dice just take a big payoff?
I don't specifically recall using Alta Vista, but I do remember how terrible all of the search engines were before Google came along. They didn't return the most relevant results, they returned the web sites that paid them to be placed higher; Google was the first one to actually do what the user wanted from a search engine - return relevant results.
Instead of ranking relevancy by hits of a word inside the document, which was how it was done before, Google ranked relevancy by references to the content.
Note that most in-house wikis still rank things the old way, which is why most search results from your internal wiki suck. Even Google's custom search on your internet page sucks... because without humans performing relevancy ranking via links, Google is just as bad as the old stuff.
By the time Altavista got popular, the interface was a cluttered mess where you could hardly find the search line. Google came with an almost empty screen with a logo and a search line. You'd have switched just to save your eyes. More like the good old Webcrawler interface.
i don't pretend to understand the system, but if this made it through, slashdot is clearly broken.
it does not seem possible that rubbish like TFA could make it through the review process.
am i therefore correct to assume that slashdot, being owned, now posts what it wants, when it wants?
irrespective of the algorithm that supposedly IS slashdot?
please tell me i'm wrong..
i usually am.
It's a paid-for "article" linking to an ad-infested link farm.
Here's a link to the ACTUAL story: https://www.quora.com/Why-did-...
If you want news from today, you have to come back tomorrow.
Altavista had better results than Google for years, especially because you could use all sorts of search modifiers that Google didn't support until later, like -no_pages_with_this_word or +must +have +all +these, and logical operators.
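For anyone who never used them, the +/- modifiers amount to a simple filter over each page's words. Here is a minimal Python sketch, assuming plain whole-word matching; `matches` and its exact semantics are hypothetical, not AltaVista's actual query engine, and the logical operators are left out.

```python
import re


def tokenize(text: str) -> set:
    """Lowercase a document and split it into a set of whole words."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def matches(query: str, document: str) -> bool:
    """Check a document against +required, -excluded and plain optional terms.

    Assumed (hypothetical) semantics: every +term must appear, no -term may
    appear, and a query made only of plain terms needs at least one of them.
    """
    words = tokenize(document)
    required, excluded, optional = [], [], []
    for term in query.lower().split():
        if term.startswith("+"):
            required.append(term[1:])
        elif term.startswith("-"):
            excluded.append(term[1:])
        else:
            optional.append(term)
    if any(t not in words for t in required):
        return False
    if any(t in words for t in excluded):
        return False
    if optional and not required:
        return any(t in words for t in optional)
    return True


print(matches("+alpha -physics", "Alpha servers from Digital"))  # True
print(matches("+alpha -physics", "Alpha particles in physics"))  # False
```

The point is precision: one minus sign throws out a whole class of irrelevant pages instead of hoping the ranking pushes them down.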
But then as the leaders they got cocky and wanted to be a portal and filled up the page with so much crap and spam it hurt. Meanwhile Google's page was still just search box, go, I'm feeling lucky, and a few other tiny things.
That's why I switched after Google got good enough that they were comparable, NOT better. It was just less annoying. That's why most of the people I knew back then switched.
AltaVista realized what they'd done and tried to rebrand as 'Raging' with just a simple search page, but by then it was too late.
I'm sure the Google approach is much more scalable, but the article seems terribly confused and like it's trying to make some bizarre sense out of a cultural artifact from a time they can't comprehend.
...a coworker pointed out that this new search engine "Google" was much better for finding academic papers. At that time, Google was excellent for academic papers, but useless for most other things.
My how times have changed. Not that I can obtain academic papers without paying through the ... nose ... anyway.
It was not Google that caused AltaVista to fail.
Yes, Altavista was better than Yahoo. I remember reading that Yahoo was a static directory, updated by humans; whereas AV had a newfangled web crawler. Anyone remember the term 'spider'? Altavista wasn't known all that well though, and it was part of my geek cred to show it to users. And usually, it found what they were looking for.
Pretty much the moment Google came on the scene, though, it was better than Altavista. AV's answer? Plaster the front page with ads and 'content'. Make it a 'portal' to the web.
Heh. Wrong answer.
A problem was that these search engines, unlike Google, were doing a kind of "grep" for the search word through all of their data, to yield a larger number of results. Searching for "book" gave results like "bookmark", "bookmaker", "bookkeeper", etc., whereas Google returned results about books and their derivatives.
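The difference between the two behaviours is easy to show. A rough Python sketch, assuming a naive plural-folding rule (strip a trailing "s") as a stand-in for real stemming; neither function is what any of these engines actually ran.

```python
import re


def grep_match(term: str, text: str) -> bool:
    """Substring match: the "grep" behaviour described above."""
    return term.lower() in text.lower()


def word_match(term: str, text: str) -> bool:
    """Whole-word match with naive plural folding.

    Assumption: stripping a trailing "s" stands in for real stemming,
    so "books" counts as "book" but "bookmark" does not.
    """
    words = re.findall(r"[a-z]+", text.lower())
    stems = {w[:-1] if w.endswith("s") and len(w) > 3 else w for w in words}
    return term.lower() in stems


pages = [
    "A good book on physics",
    "Two books about Einstein",
    "Set a bookmark here",
    "Ask the bookmaker",
]
print([p for p in pages if grep_match("book", p)])  # all four pages
print([p for p in pages if word_match("book", p)])  # only the first two
```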
Slashdot, fix the reply notifications... You won't get away with it...
Today search is ALMOST ENTIRELY SHIT. It is used because shit is king.
If you think that, you don't remember Alta Vista, which had millions of links to "Page not Found" and in the search results had multiple listings to the same (often broken) page.
"First they came for the slanderers and i said nothing."
There's a thing called the Science Citation Index that gives papers that are referenced more a higher score than those that are referenced less, and it's a good way to find the papers on a topic that others have found most useful.
Google saw it worked and applied a similar method using links (as the above poster wrote). That method brought human judgment that had already been applied into the mix and enabled them to index far more rapidly than AltaVista with better results than AltaVista's simple keyword searches. It was more likely to lead people to a key site that many used instead of an abandoned fan site.
That's the main difference.
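The citation-index idea boils down to counting inbound references. A minimal Python sketch with made-up page names; note it is a plain count, whereas PageRank also weights each link by the rank of the page it comes from, which this sketch does not do.

```python
from collections import Counter


def rank_by_citations(citations: dict) -> list:
    """Rank items by how many *other* items cite (or link to) them.

    `citations` maps each paper or page to the list of items it references.
    """
    counts = Counter()
    for source, targets in citations.items():
        # A set ignores repeated citations from one source; self-citations are dropped.
        counts.update(set(targets) - {source})
    for source in citations:
        counts.setdefault(source, 0)  # uncited items still get listed, with 0
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


links = {
    "fan-site": ["key-site"],
    "blog": ["key-site", "fan-site"],
    "forum": ["key-site"],
    "key-site": [],
}
print(rank_by_citations(links))
# [('key-site', 3), ('fan-site', 1), ('blog', 0), ('forum', 0)]
```

The key site that many people point at comes out on top, and the abandoned fan site does not, which is the effect described above.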
Inertia? AltaVista, Hotbot, and Excite had the inertia. They were the big players when a couple of college students thought up the idea that became Google. AltaVista and the other established players had the inertia.
The established search engines also had algorithms based on word frequency in various parts of the page. I did search engine optimisation back then, so I studied it in detail. The simplified explanation is that searching for "Einstein" would return whichever page had the word Einstein repeated the most on the page. Minus points for repeating it "too many" times.
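A toy Python version of that kind of scoring, assuming the score is simply the term's share of the words on the page and that anything above a hypothetical 30% share counts as stuffing and scores zero. Real engines of the time also weighed things like titles and position on the page; their actual weights and thresholds are not reproduced here.

```python
import re


def keyword_score(term: str, page: str, stuffing_limit: float = 0.3) -> float:
    """Score a page by the share of its words that are `term`.

    Hypothetical rule: a share above `stuffing_limit` is treated as keyword
    stuffing and scores zero (the "minus points" for repeating it too much).
    """
    words = re.findall(r"[a-z]+", page.lower())
    if not words:
        return 0.0
    share = words.count(term.lower()) / len(words)
    return 0.0 if share > stuffing_limit else share


pages = {
    "bio": "Albert Einstein was a physicist who developed the theory of relativity",
    "short": "Einstein biography: Einstein was born in Ulm",
    "spam": "Einstein Einstein Einstein Einstein buy cheap stuff",
}
for name, text in sorted(pages.items(), key=lambda kv: -keyword_score("einstein", kv[1])):
    print(name, round(keyword_score("einstein", text), 3))
# short 0.286
# bio 0.091
# spam 0.0
```

Note that the short, repetitive page beats the genuinely informative one, which is exactly the weakness SEO people exploited.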
Google had a revolutionary idea: if lots of good* pages link to abouteinstein.com, it's probably a good page. That's Page Rank, and it worked quite well. That's far and away the most important reason Google won: their ranking system was far superior because it was based on a different, better idea.
* You might wonder how Google knows which pages are "good", in order to calculate which pages are linked to by good pages, and are therefore also good. It's recursive across the whole internet. If lots of pages link to princeton.edu/physics/, and princeton.edu/physics/ links to lab.gov/particles/, then lab.gov/particles/ gains some "good" points. Specifically, it gains the same share of the Princeton page's rank value as every other link on that page. In other words, whatever value a page has, that value is divided equally among the pages it links to. So a page "vouches" for each page it links to, but if it links to many pages, it can only pass a small amount of credibility to each.
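The "divide your value equally among your links" rule, applied recursively, can be written as a short power iteration. A minimal Python sketch: the damping factor of 0.85 and the handling of pages with no outgoing links are assumptions taken from the commonly described PageRank formulation, not from the comment, and the page names are illustrative.

```python
def pagerank(links: dict, damping: float = 0.85, iterations: int = 50) -> dict:
    """Repeatedly split each page's rank equally among the pages it links to.

    `links` maps every page to the list of pages it links to; every target
    must also be a key. Assumptions: the damping factor, and pages with no
    outgoing links spreading their rank evenly over all pages.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page, targets in links.items():
            receivers = targets if targets else pages
            share = damping * rank[page] / len(receivers)
            for t in receivers:
                new[t] += share  # equal share per outgoing link
        rank = new
    return rank


web = {
    "a.example": ["princeton.edu/physics/"],
    "b.example": ["princeton.edu/physics/"],
    "c.example": ["princeton.edu/physics/"],
    "princeton.edu/physics/": ["lab.gov/particles/"],
    "lab.gov/particles/": ["princeton.edu/physics/"],
}
for page, score in sorted(pagerank(web).items(), key=lambda kv: -kv[1]):
    print(f"{score:.3f}  {page}")
```

Nobody links to lab.gov/particles/ except the Princeton page, yet it ends up far above the three pages nobody links to, because it inherits Princeton's credibility.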
Learn your lesson. Avoid posting anything sourced from websites such as these. There are lots of them, the BuzzFeed of IT-related news, mostly done by Indians, and all they do is copy from other websites and even forum posts.
When Google first came on the scene, most people accessed the internet by dialup. Google's simple page loaded faster. That's it. There is no other reason. Anyone who doesn't remember the "dialup" internet cannot comment on why one page was more popular than another.
Around '92/93 I was an Alta Vista user. Then they decided that if you shoveled money their way, they would put your results at the top of the list. I, and evidently a couple of others, said "fark that" and went looking for alternatives. Google was the alternative that gave the best search results.
Fark Alta Vista, I'm glad you're dead and buried.
Alta Vista decided to go the portal route, with a bunch of crap on the search page. Google came out with a simple look, with only the keyword field.
https://web.archive.org/web/19981202230410/http://www.google.com/
vs.
https://web.archive.org/web/19990125093146/http://www.altavista.com/
"Even for Slashdot, that was a very obscure reference!" - Anonymous Coward
Altavista tried to monetize their search by biasing results based on ad revenue; Google didn't (at first). It turns out, people aren't interested in a biased product, even if it's free.
From my recollection, it was because it did away with the mess of the portal concept, did away with intrusive ads, and focused on search. It was simple and effective. Everything else was a marketer's wet dream, but a mess for anyone else.
I am sure people who used the net back then can confirm that it was the simplicity and elegance of Google that gave it the advantage. I certainly switched because of that.
Jumpstart the tartan drive.
Hit 1: Wikipedia entry.
Hit 2...n: Random URLs.
Though much has been made about "the potential of URLs" for searching, aka PageRank, my own experience as someone who used AltaVista up to the moment he discovered Google was that Google was the first full-text web search engine, or at least the first one I experienced.
Prior to Google, all the search engines simply indexed extracts of pages, primarily meta-data such as a page's own description of itself. That led to frequent disconnects between the preview content provided by the search engine and the actual content of the resulting page. Sometimes, I would search on a quoted term, see that on the search results, then not find it on the page. Very frustrating. Though I preferred AltaVista at the time, the other major search engines of that time all had the same problem and were all pretty comparable in terms of user experience.
Upon first using Google, it quickly became clear that Google was different. You could actually tell from the user experience that it was a full-text search, unlike all the others. Basically, the problem above never happened. Although PageRank may also have been an important part of its success, the difference between full-text search and what the others were providing at the time was so compelling that it just didn't matter: there simply wasn't any comparison from the user's point of view.
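The coverage difference can be illustrated with a toy inverted index, built once from a page's self-description and once from its full text. This is a minimal Python sketch under that assumption only; it doesn't model how any 1990s engine actually stored pages, and the page is hypothetical.

```python
import re
from collections import defaultdict


def build_index(docs: dict) -> dict:
    """Inverted index: word -> set of document ids whose indexed text contains it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index


def phrase_search(phrase: str, index: dict, docs: dict) -> set:
    """Intersect the posting lists, then confirm the exact phrase in the indexed text."""
    words = re.findall(r"[a-z0-9]+", phrase.lower())
    if not words:
        return set()
    candidates = set.intersection(*(index.get(w, set()) for w in words))
    return {d for d in candidates if phrase.lower() in docs[d].lower()}


# Hypothetical page: its self-description versus its full text.
meta_only = {"page1": "Computer history"}
full_text = {"page1": "A history of DEC Alpha servers and the AltaVista search engine"}

print(phrase_search("alpha servers", build_index(meta_only), meta_only))  # set()
print(phrase_search("alpha servers", build_index(full_text), full_text))  # {'page1'}
```

Because the full-text version checks the phrase against the same text it indexed, a quoted hit is guaranteed to be on the page, which is the experience described above.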
Now (and for many years past), all of the major search engines provide full-text search, so we just take it for granted. They probably also all use something like PageRank, which probably isn't too hard to implement once somebody has thought of it. Personally, I find it hard to tell the difference between them now, though I still prefer Google, probably simply because of having had a long and happy experience using it. (Oh, except for when they shut me off once years ago for doing too many queries via a Python script...)
They had a complementary idea, not a different idea. Page Rank ranks a page in general terms, but tells you nothing about whether it has anything to do with Einstein (from what I understand). You still need some form of the old way of judging the Einsteininess of a page.
Troll is not a replacement for I disagree.
My progression - Archie and gang - http://www.albany.net/allinone... which was a lot of different search engines depending upon your quest (now some sort of search engine yet still a bookmark) - AltaVista (which was good for hacks and cracks) - Yahoo (for its then search ability) - Google (its result listings then rated by the most active web pages, and its sparse look).
If you POP3 your mail, you understand just how badly Microsoft and Yahoo want to route your email. I still have a handle used in my e-mailer, Forte Agent, hard-coded by Microsoft Live. (Not off topic; they are fighting Gmail.)
It was a technology demonstration of DEC's (remember Digital Equipment? If so you are old!) new Alpha chips and servers, so powerful that they could index the entire early 1990s web. A very minor side project.
When Compaq bought DEC, they were surprised to find that they had also bought Alta Vista. Around then somebody tried to commercialize it and killed it in the process.
https://en.wikipedia.org/wiki/...
My progression - Archie and gang - http://www.albany.net/allinone... which was a lot of different search engines depending upon your quest (now some sort of search engine yet still a bookmark) - AltaVista (which was good for hacks and cracks) - Yahoo (for its then search ability) - Google.
Edit: something didn't look right after I submitted; a double check of my bookmarks showed that the one good for hacks and cracks was astalavista.box.sk, not AltaVista, which was the popular search engine at the time.
It probably also helped that Google had a simple UI, whereas AltaVista and all the others were aiming for portal-type UIs with ever-increasing clutter and load times.
I wonder how Yahoo still exists and how people still use its services despite all the security problems it has had in the past.
The details are already vague, but as far as I recall, Google was so much better at finding things, while Altavista links were getting stale and polluted with a lot of rubbish in between. It took so much more effort to find links related to your actual search in Altavista.
Plaster the front page with ads and 'content'.
Hell, yes. It took longer to find the frigging search box than to return the results. Also, IIRC, Altavista didn't even own altavista.com, which was some completely separate business. cf. Google.
Google had a cooler logo.
systemd is Roko's Basilisk.
It was about speed of loading. Google had a blank white page with a search box. Altavista had gone the horrible "portlet"-style approach of gluing loads of things together. Google's page loaded quickly, Altavista's did not.
When I, and those I was working with, first switched to Google, the actual search results were different to what you'd "expect" (Altavista's results were the gold standard; any deviation was looked on suspiciously), but they were about the same in quality. Later they became better, but that wasn't the driver at first; it was all about the clean page.
No, that is not what killed them. They were already dead. Altavista and the other search engines of the era that started off good (such as HotBot) all died due to an inability to prevent gaming. Google's techniques were opaque for a long time, and very robust in the face of attempts to influence page rank, and hence Google continued to provide a useful service even after becoming popular. Altavista was a fantastic search engine until others began to abuse it; a typical "tragedy of the commons" case, or "survival of the fittest", take your pick. I think something has been lost in the inability to search based on page content to find useful, but more obscure, information (AND, OR, NEAR, etc.). Having said that, the web has changed so much in this time; Wikipedia, for example, greatly fills this niche. Ultimately only humans can determine useful content, hence the need to rely on popularity or human-vetted content (such as journals). So here we are.
This was my first reaction after hearing an ACM presentation circa 1992-4 about this new search mechanism, which I realized, after shuffling through their academic gobbledegook, was essentially page ranking, even with its "refined" inheritance method. I thought: "A search term will be judged the most relevant because of how many pages link to it. Purely a frequency (popularity) criterion with 'fancy' ways of using frequency to assess quality." 20 years later, the Kardashians become the Gabors of the net...
Charly in SJ
The whole bit about Google using links as an integral part of PageRank (and this being different from AltaVista, et al) has been public information since around the day Google went live. Google, for all their secretiveness, has never been shy about that bit. (And, of course, it led to the creation of the SEO industry, since AltaVista-baiting by simply stuffing keywords colored white past the article over and over stopped working.)
"in order to calculate which pages are linked to by good pages, and are therefore also good. It's recursive across the whole internet"
You speak in the present tense, but I think it's widely believed that today, the original pagerank algorithm plays only a minor role. The original algorithm was very easy to game by building a site with a million auto-generated pages, all linking to each other and to the main page. How they actually do it today is a closely guarded secret, although it's likely that links between sites and internal links play a role.
Avantslash: low-bandwidth mobile slashdot.
On top of this, Google was fast!
It is hard to imagine now, but in those days "surfing" included a good deal of waiting, because of slower connections and probably slower servers. I remember Altavista being significantly faster than e.g. Yahoo search, and Google being faster than Altavista, most likely because the two academics that started it had a more sober web site.
True! Suddenly I didn't have to search for the search box among a cacophony of blinking and bleeping cascades of disturbance. It was like walking the red light district and then stepping into the library. And the search results were presented just as soberly.
Aaah! Peace! I immediately fell in love. <3
Before Altavista there was Webcrawler. But Yahoo took over both and replaced the engines with their own crappy variants.
Before them there was Veronica and Gopher.
If builders built buildings the way programmers wrote programs, then the first woodpecker would destroy civilization.
Bing - not even a serious competitor. Heck, even DuckDuckGo does a better job.
As soon as someone figures out how to play the ranking game the rules will change. It has been played over and over again. If I remember correctly there was a hack that caused a search for "general failure" (or similar) to direct to G. W. Bush.
Rather - Yahoo killed the decent engine and put in their own and that was the final nail in the coffin for Altavista.
There were some crap results in the searches, but with good search statements you could get what you wanted, until the Yahoo engine made it impossible.
Every sentence is a paragraph. Most aren't even complete sentences. It's like a random collection of thoughts. Taken from other sources. Or something.
There. Fixed it for ya :^)
The search page was slow, was full of ads and the results were almost irrelevant. The search quality really took a dive when sites started loading up the metadata with keywords. Unsurprisingly when a better search engine appeared everyone jumped ship.
I remember AltaVista well and broken links in many search engines in the 1990s. But just because search engines don't direct you to broken links as often anymore doesn't mean they're better.
Now, rather than millions of broken links, search engines direct me to millions of websites that don't contain my search terms and often have nothing to do with what I'm searching for.
Is that really much of an improvement? Actually, I think it's worse -- because it takes me a fraction of a second to see a 404 error and go back and try a different hit. But when a search engine directs me to mostly links that have nothing to do with my search terms, it can take me many seconds of skimming to discover that a particular hit is bogus.
Google probably reached its optimum usefulness a little over 10 years ago. Ever since, it has gradually tried to become more like "Ask Jeeves" and less useful for people who actually have serious research to do. First, you had Google offering corrections to misspellings (a useful feature), but then those would replace your actual search. Then came the dropping of the default "AND" operator that made Google efficient and useful at the beginning. Then they dropped the "+" operator a few years ago. Then they broke double quotes and verbatim search to various unpredictable degrees. And now whenever I search for obscure terms, by default Google tries to replace them with what it thinks are "synonyms" (but which often aren't, or which I don't want). So I stopped using Google for many of my default searches a few years back... and there's really nothing out there that rivals the efficiency and precision of Google ca. 2000.
Bottom line: If you're a moron who can't spell, can't bother to think about what words might actually appear in what you're looking for, and likely don't even really have a clue what you're even looking for -- well, today's search is much "better" for you. Granted, 90+% of people are probably like this, so that's why Google targets the "lowest common denominator." If you're actually looking for a serious SEARCH engine, it's not Google anymore.
The problem is the rise of SEO. If Google just gives the straight, obvious answers (which they did 10 years ago), then people will SEO and you will get garbage. Google really had a down period 7 years ago, because of all the SEOers trying to push garbage pages up in the results.
I remember that Google loaded much faster over my dial up modem but mainly the quality of the porn^H^H^H^H tech results was better. Yes, better results for technical information, not porn, nothing whatever to do with porn.
All I want is a secure system where it's easy to do anything I want. Is that too much to ask ~~ Randall Munroe
It still "works" to create thousands of pages which link to the page you want ranked high, but it doesn't (and didn't) work that well because the feeder pages lack PR. I know of only two significant changes in that regard. As always, a page can only pass on the PR that it has. Because no one links to your feeder pages, they each have an actual PR of 1/(number of pages on the entire internet). To have a really high PR without external links, the number of pages you create has to be a significant fraction of all pages on the internet. So millions of billions of pages WILL create PR.
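The "a page can only pass on the PR that it has" arithmetic can be checked quickly. A rough Python sketch, assuming the usual damped formulation, in which a page nobody links to holds about (1 - d)/N rather than exactly 1/N, and assuming each feeder links only to the target; the web size is a made-up round number.

```python
def farm_target_rank(feeders: int, total_pages: int, damping: float = 0.85) -> float:
    """Approximate rank of a page whose only inbound links come from a link farm.

    Assumptions: no one links to the feeder pages, so each holds only the
    floor rank (1 - d) / N; each feeder links to the target alone and
    passes on d times its rank.
    """
    floor = (1.0 - damping) / total_pages
    return floor + feeders * damping * floor


N = 10_000_000_000  # hypothetical number of pages on the web
for k in (1_000, 1_000_000):
    # Ranks sum to 1, so an average page holds 1 / N.
    ratio = farm_target_rank(k, N) / (1.0 / N)
    print(f"{k:,} feeders -> about {ratio:,.0f}x an average page")
```

Under these assumptions each feeder adds only d(1 - d)/N, roughly 13% of an average page's rank, so the target's rank grows linearly with the number of feeders and only becomes large relative to the whole web when the farm is a sizable fraction of all pages.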
Two simple new additions make manufacturing PR more difficult. First, duplicate page detection: you need millions of DIFFERENT pages. Second, and more important, Domain Rank. It's calculated just like page rank, but with domains instead of full URLs. If lots of different domains link to wkp.org, then wkp.org is ranked high. Pages on many different domains link to wikipedia.org, so wikipedia.org has strong domain rank. To manufacture this, you need not thousands of pages, but thousands of domains. From there, it's simple fraud detection to find the few people who buy up thousands of domains and put bogus pages on them. Are thousands of similar pages, devoid of content, hosted at the same place? They might be BS, and therefore penalized specifically, without changing the basic algorithm.
The key algorithms don't need to be secret; the thresholds for certain penalties do.
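Domain-level rank as described can be sketched by collapsing the page graph into a domain graph and running the same rank-sharing iteration on it. A minimal Python sketch: dropping same-domain links, the 0.85 damping factor, and the example domains are all assumptions for illustration, not Google's documented method.

```python
from urllib.parse import urlsplit


def domain_graph(page_links: dict) -> dict:
    """Collapse page-to-page links into domain-to-domain links.

    Assumption: links between pages on the same domain are dropped,
    so a site cannot vouch for itself.
    """
    graph = {}
    for src, targets in page_links.items():
        src_dom = urlsplit(src).hostname
        graph.setdefault(src_dom, set())
        for target in targets:
            dst_dom = urlsplit(target).hostname
            graph.setdefault(dst_dom, set())
            if dst_dom != src_dom:
                graph[src_dom].add(dst_dom)
    return graph


def domain_rank(graph: dict, damping: float = 0.85, iterations: int = 50) -> dict:
    """Same rank-sharing iteration as page rank, run over domains."""
    domains = list(graph)
    n = len(domains)
    rank = {d: 1.0 / n for d in domains}
    for _ in range(iterations):
        new = {d: (1.0 - damping) / n for d in domains}
        for d, targets in graph.items():
            receivers = targets if targets else domains  # dangling: spread evenly
            for t in receivers:
                new[t] += damping * rank[d] / len(receivers)
        rank = new
    return rank


page_links = {
    "https://a.example/notes": ["https://wikipedia.org/wiki/Einstein"],
    "https://b.example/blog": ["https://wikipedia.org/wiki/Einstein"],
    "https://c.example/essay": ["https://wikipedia.org/wiki/Einstein"],
    "https://wikipedia.org/wiki/Einstein": [],
}
# A hypothetical farm: 1,000 pages on one domain, all pointing at one target.
for i in range(1000):
    page_links[f"https://farm.example/p{i}"] = ["https://farm.example/target"]

ranks = domain_rank(domain_graph(page_links))
for dom, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{score:.3f}  {dom}")
```

Three links from three different domains outweigh a thousand links from one domain, because at the domain level the farm's thousand links collapse into nothing.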
Understanding PR, one way of finding good pages relevant to Einstein is obvious:
1) You already have the PR, so you know how popular each page is.
2) Disregard all pages that don't mention Einstein.
3) Run PR again, starting with each page's general PR as the initial seed.
From this you'll find that many good pages which mention Einstein link to wikipedia.org/einstein/. Therefore, that page is probably relevant to people looking for information about Einstein.
If you want to, you can also subtract a portion of the page's non-Einstein PR. In other words, although Einstein pages link to blah.com, so do NON-Einstein pages. Links from pages which do NOT mention Einstein weaken the inference that the page is relevant to Einstein. So, total Einstein rank is the PR from Einstein pages minus the PR from non-Einstein pages.
Altavista was great, and I loved being able to narrow my results by using formal search parameters, etc. Fine-tuning a Google search was initially possible by adding a word or two, but now, with all its ignoring of words, attempts at identifying typos, etc., rather than tuning, each new search is just another throw of the dice. Keep rolling until you get a 6.
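The steps above, plus the subtraction, can be sketched in Python. This is a simplified, hypothetical version: it does a single round of link votes weighted by each page's general PR rather than a full second PageRank run, because a fully converged damped PageRank ends up at the same values whatever it was seeded with, so the seed from step 3 would otherwise wash out. The `penalty` weight and all page data are made up.

```python
def topic_rank(links: dict, general_pr: dict, on_topic: dict, penalty: float = 0.5) -> dict:
    """Score link targets for one topic, following the steps above (simplified).

    links:      page -> list of pages it links to
    general_pr: page -> general PageRank, computed beforehand (step 1)
    on_topic:   page -> True if the page mentions the topic (step 2)
    penalty:    hypothetical fraction of an off-topic page's vote to subtract
    """
    score = {}
    for src, targets in links.items():
        if not targets:
            continue
        vote = general_pr[src] / len(targets)  # the page's rank, split per link
        weight = 1.0 if on_topic[src] else -penalty
        for t in targets:
            score[t] = score.get(t, 0.0) + weight * vote
    return dict(sorted(score.items(), key=lambda kv: -kv[1]))


links = {
    "physics-blog": ["wikipedia.org/einstein/", "blah.com"],
    "princeton.edu/physics/": ["wikipedia.org/einstein/"],
    "recipe-site": ["blah.com"],
    "news-site": ["blah.com"],
}
general_pr = {"physics-blog": 0.2, "princeton.edu/physics/": 0.4, "recipe-site": 0.2, "news-site": 0.2}
on_topic = {"physics-blog": True, "princeton.edu/physics/": True, "recipe-site": False, "news-site": False}
print(topic_rank(links, general_pr, on_topic))
```

Here wikipedia.org/einstein/ comes out on top with 0.5, while blah.com ends up negative: one Einstein page links to it, but two off-topic pages do too.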
Got them moderator blues I blieve I walk out the do', With these mod-points I been gettin', I 'most never post no mo'
"miserable failure", "french military victories" "worst band in the world"
Snowden and Manning are heroes.
My experience with Alta Vista was that sometimes it seemed to go "off-track", answering, not my question, but a similar question. When I tried to refine my query, it still seemed stuck on what it thought I asked before, not what I was asking.
I later found out that they used "Bayesian Logic", where the answers to the previous questions guided the answer to the new question. No wonder I had this problem!
When Google came along, of course I went with them, and still do. They are still the #1 Search Engine, although some of their other services, like Google Maps, have become untrustworthy.
Yep. Speed of loading and no clutter. I switched the instant I saw Google's home page, because while I was doing fine with Altavista's search results I hated all the crap that took forever to load.
Since 2010, of course: DuckDuckGo. For similar reasons, really.
wg
RIP to my first porn search engine. I was in teenage-heaven when they introduced picture and video searching.
This is a little article that tells us everything we already know, after going through the clickbait. Thanks /.