Google's Weakness, AltaVista's Strength
Some people love the results they get at Google; others are often disappointed. To a large extent, both the pluses and the minuses derive from Google's ranking system, which (as the folks at Google explain at www.google.com/technology) depends largely on the number of links to a particular page, the relevance of the content on those linking pages to the content on the target page, and the quality of the pages doing the linking.
Thanks to that complex and brilliant system, over time, the best pages often rise to the top of search lists. But that takes time -- a lot of time.
It works great for old, established sites to which many other old, established sites have linked. (It works great for my site :-) www.samizdat.com ). But new sites, regardless of the quality of their content, get short shrift. It takes 2-3 months for the new pages to get into the Google index. Then it takes time -- perhaps years -- for other "important" sites to discover the new site and link to it; and then months more for the new versions of those pages with those new links to get into the Google index.
So if I'm looking for content that is likely to have been on the Internet for a year or more, Google is great. But if I'm looking for fresh content, I'll go elsewhere.
For me, for years "elsewhere" meant AltaVista -- for two reasons. AltaVista used to add new pages to its index, for free, within two days of submission, while other search engines typically took weeks or even months. That meant they had the freshest content. In addition, AltaVista provided you with a set of very precise commands that couldn't be matched anywhere else.
Over the last year, as AltaVista has struggled to become profitable, they have destroyed their beautiful free submission process, trying to force Web sites to pay for submission. Free submissions (which typically come from the kinds of content-rich sites that I'm interested in) now seem to take three months or more -- no better than the other search engines and often worse.
Fortunately, the powerful commands remain -- for instance, the ability to exclude as well as include terms in your query. AltaVista lets you use minus signs and plus signs to indicate what you really don't want and what you do want. And for some specialized searches the exclusion is essential.
For instance, say you want to know what Web pages outside of your own site have links to your pages. At Google, I can do a search for link:samizdat.com or get the same results by going to their "Advanced" search and using their "page specific search" to find pages that link to a particular page. But my results are then littered with pages from my own site -- information I don't need and don't want. At AltaVista, I can search for +link:samizdat.com -host:samizdat.com and get exactly what I want -- finding out who thinks enough of my pages to have linked to me without my having contacted them: a valuable list of well-wishers and potential partners.
Similarly, Google lets me restrict a search to a particular Web site. For instance, if I include in my query the term site:samizdat.com or in Advanced search under Domains I choose to restrict the search to that domain, Yes, I get results only from that site. But to use that command, I need to have additional query terms: site:samizdat.com alone generates no results.
At AltaVista, however, I can search for host:samizdat.com and get a complete list of all the pages at my site that are in the AltaVista index. Or I can search for url:samizdat.com/isyn and get a list of all the pages in that directory at my site that are in the AltaVista index. Or I can search for url:samizdat.com/consult.html to see if that particular page is in the index.
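The plus/minus queries described above are just strings with a fixed shape, so they are easy to generate for any site. The following is a minimal sketch in Python; the helper name `altavista_query` is hypothetical and only assembles the query text shown in this article. It does not contact any search engine.

```python
def altavista_query(include=(), exclude=()):
    """Build a query string in the +term / -term style described above.

    `include` terms get a leading plus sign, `exclude` terms a leading
    minus sign. This helper is illustrative, not part of any real API.
    """
    parts = [f"+{term}" for term in include] + [f"-{term}" for term in exclude]
    return " ".join(parts)


# Pages that link to the site, leaving out the site's own pages.
inbound = altavista_query(include=["link:samizdat.com"],
                          exclude=["host:samizdat.com"])
print(inbound)

# Every indexed page on the host, or under one directory of it.
print(altavista_query(include=["host:samizdat.com"]))
print(altavista_query(include=["url:samizdat.com/isyn"]))
```

Swapping in another domain gives the same "who links to me, other than me" query for that site.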
In other words, AltaVista provides a higher level of precision and the ability to get information that is particularly valuable to people in charge of Web sites and Web-based marketing projects. And if they'd just fix their free submission process and provide the service they used to, they'd kick Google's ass for searches for current information.
P.S. -- The folks at Google are very proud that their system defies human tampering. In fact, what they've done is encouraged the development of bizarre business models structured to take advantage of their link-based ranking system. For instance, Webseed Publishing now has over 1000 sites, all with different domain names. These content-rich sites are each run by different dedicated individuals. (I'm one of them :-) In many cases, the content deserves high rankings for its quality. You might wonder why the umbrella business for all these sites bothers to maintain over a thousand different domain names, when it would be far simpler and cheaper to have them as directories under a single domain. But because the domains are different, the many thousands of links these sites have to one another all count toward the automated calculation of their popularity and quality at Google, giving them all a boost in the rankings and hence bringing Webseed more traffic and hence more revenue.
P.P.S. -- AltaVista appears to be making a comeback. Six years ago, when I was in the Internet Business Group at Digital and Digital owned AltaVista, about a third of the traffic to my Web site came by way of AltaVista. Whenever AltaVista had a glitch, I saw it immediately in my traffic stats. In fact, I sometimes was able to alert the engineers at AltaVista about problems before they had noticed them themselves. Over the years, due to increased competition from other search engines and also due to the business folks at AltaVista making bad decisions and jettisoning great capabilities/services (like 2-day free submissions, their affiliate program, LiveTopics, and newsgroup search), the number of people finding my pages by way of AltaVista plummeted. By January 2002, only 1% of my traffic was coming by way of AltaVista, despite the fact that as a long-standing fan and also as co-author of the book The AltaVista Search Revolution, I had lots of information about AltaVista at my site. I was actually getting twice as much traffic from the International Atomic Energy Agency (part of the UN), when I had no information at all related to atomic energy. But in recent weeks the traffic from AltaVista has climbed sharply. It now amounts to 6% of my total. I wish I knew why that was happening. In any case, I hope that trend continues.
I'm amazed, an entire article astroturfing AltaVista. Sadly, the author is a bit short-sighted, and doesn't realize how quickly stuff appears in Google's cache (often within weeks, less than a month), or that even if something accidentally ranked lower because of the number of links a given page receives, it still ends up in the first page or two anyway. *Sigh*
Did somebody declare a new holiday today without telling me? Is this now "Pick on Google Day"?
Skiers and Riders -- http://www.snowjournal.com
This could be an autonomous anomaly just with me, but when I am looking for a certain topic, the webpage I need comes up on the first page with Google. It is not just the best web site. A lot of them really suck, but they just seem to always have that one obscure piece of information I have been looking for.
I once shot a man who posted too many, "Imagine a beowulf cluster of these"
I'm usually satisfied with the search results I get at google. I suppose I'd say that if I find it, I find it at google.
I haven't used Altavista much, except for babelfish, but after reading this I may have to give it a try sometime.
long after banner ads had come to altavista, you could avoid them easily by using its text-only mode.
powerful commands and no ads... what a concept!
i only switched to google after altavista finally got rid of their text-only page.
What I don't find on one, I look for on the other; if I can't find what I want on either, I change my criteria. And so on until I either find what I want, find something close to what I want, or fall asleep trying.
crazy dynamite monkey
Whenever Google results have been disappointing, I hop over to AltaVista and search there.
For me, Google doesn't have to be the perfect search engine - it's already enough. I type in google.com and it loads damn near instantly. There's no annoying advertisements, and I can search in h4x0r or Sveedish Chef, bork bork bork.
If I can't find what I want on Google, fine, I'll use another engine. And what's wrong with that? We honestly can't have too many search engines (Well, business problems aside), because each one ends up with different ranking systems, different data pulled up from queries, etc.
.. are you suggesting that different goals require different tools, possibly made by different companies? Don't let the OS market know this, or we will kill the thriving flamebait OS war scene. :(
Actually, there is lots of good information you provide on the capabilities of search engines. I, for one, would love to see more "A is good for this, B is good for this", instead of simply grouping and competing A & B, suggesting that one can only use one.
IMHO, this is where (free) web services really rule - I can't buy 5 different cars for 5 different reasons I use cars, but in the case of these types of services, the cost of using and switching between these services is very next-to-nil. Hopefully, web services will start encouraging companies to share again, as Google and Altavista may very well demonstrate that sharing market segments with other players makes everyone happier in the long run.
"Old man yells at systemd"
Google returns search results based on the "rating" of a site. The rating is mostly based upon how many other sites in Google's database link to that site. While this scheme is more tamper-proof than the "greatest word match" that some search engines use, it isn't invincible.
It's quite easy to get your site rated high: Create a hundred free web sites on geocities and post a page full of nothing but links to the site you want to pump up. You'll get rated "10/10" in no time.
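The link-farm claim above can be illustrated with a toy link-based ranking. The sketch below is a simplified PageRank-style iteration, not Google's actual algorithm; the page names, the damping value of 0.85, and the iteration count are all assumptions chosen for the example.

```python
def rank(links, damping=0.85, iterations=50):
    """Simplified PageRank-style scoring.

    `links` maps each page to the list of pages it links to.
    Returns a dict of page -> score. Illustrative only.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    score = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small base score.
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current score among the pages it links to.
                share = damping * score[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * score[page] / n
                for p in pages:
                    new[p] += share
        score = new
    return score


# Two comparable pages, neither with any inbound links yet.
web = {"a": ["b"], "b": ["a"], "target": ["a"], "rival": ["a"]}
before = rank(web)

# Add a hundred throwaway pages that do nothing but link to "target".
farm = dict(web)
for i in range(100):
    farm[f"farm{i}"] = ["target"]
after = rank(farm)

print(before["target"], before["rival"])
print(after["target"], after["rival"])
```

In this toy model the farm pages lift "target" well above "rival" even though nothing about its content changed. A real engine also weighs the quality of the linking pages, which this sketch only partly captures.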
Now it took months to get into DMOZ, but we did. Yahoo - still hasn't accepted us into our proper category even after 2 or 3 tries over a year and a half.
I think Google could benefit by adding some more advanced filtering commands like Altavista has - I agree they are nice. But the bottom line is, for obscure sites, once you get in Google, look out. Months later we finally got into the other mainstream search indexes (we submitted to them all at the same time) and in the end Google is THE place for referrals. By orders of magnitude. YMMV, but it seems the other search indexes blew it when they killed free submits since folks know that they will only return paid sites (plus rank skewing, for $$$, etc)
Only time will tell, but I use Google daily and am happy with the results and performance - no other search engine comes close IMHO
Top Most Bizarre/Disturbing Error Messages
Altavista used to be my search engine of choice, but I gradually abandoned it around 3-4 years ago - shortly after it was spun off from DEC I noticed a general decline in quality.
The one thing I've noticed about these "flaws" in Google "exposed" on /. today is that they are being done in an organized fashion by intelligent (and somewhat witty) people. I agree that there is significant potential for Google-bombing to be exploited for commercial gain in the coming years. But I don't think it can be nearly as bad as some of the awful stuff that's done with meta tags. I'm sticking with Google (for now) because it is still lightning fast and doesn't put a bunch of crap up on my screen.
i wish more sites would develop tool bars similar to google... it is extremely convenient.
on all my windows boxes it is one of the first things i install.
google is probably the best search tool right now, and they make using it a breeze. altavista used to be the best search tool, but they made it harder and harder to use, and then that search tool lost its top spot. totally different situation. if google loses its top spot in the search tool field, i'll still use it for its ease of use.
Perhaps AltaVista is indeed better (or used to be, as the author points out) at indexing new content, but I'd never know, as I have been using Google exclusively almost since its public debut. However, I think that this point will become less and less important.
Yes, it's true that Google's algorithm prevents new content from being ranked high, because no one has linked to it yet necessarily, but that's by design - it is indeed at that point unproven in terms of quality. However, the spidering process can use improvement so that when many many people link to this new site just a few days later, it now ranks higher.
Google specifically mentions (in previous interviews I read with employees) that they're always working on updating the speed, as well as the precision. The longterm goal is to significantly decrease the amount of time it takes to respider everything, and therefore make the info more relevant faster. I trust that they will continue to improve, and eventually this differentiation between "Altavista is better for new stuff, Google for old" will go away completely.
Boolean mumbo-jumbo? That's the best PART of AltaVista. Google limits queries to 10 words? That stinks! Google is great for simple queries about common subjects. AltaVista's boolean query is great for finding that site whose link you can't remember but you remember some of the words that were close together. AltaVista's boolean query is great for finding information on little-known subjects where you can pretty well guess what keywords will be near each other. I used to use AltaVista's boolean query exclusively. Now, I find it's best to try both AltaVista and Google. Each finds content the other won't.
Judging by the eloquence of your post, I'm going to be honest when I say that Sesame Street is probably too complex for you.
I dunno, this may be off topic, but according to this link, Google does not accept any ads for companies that have websites or products in any way affiliated with firearms or knives.
This comes as a disappointment for someone who regularly visits Geeks With Guns.
Say it ain't so...
Of course you can find things with search engines now. Google's "trick" of counting links helps a little bit for a particular class of query, which is when you know the name of an organization and you want to find its site...it works well because more people will link to the site as opposed to other sites that discuss it. But as I have written elsewhere, if AltaVista is 99% lame, then maybe Google is only 97% lame...which is three times better, but still terrible if you take a step back.
Now Google is doing a lot of good things aside from its basic search engine, which should be applauded. The caches, saving old Usenet posts, the image and catalog searches, etc. are all good things -- but they don't affect its basic ability to search well.
Further karma ho' expounding can be found right here.
- adam
After reading this article, tons of /.'ers are now hitting altavista and doing a
+link:mysite.com -host:mysite.com
to see how many people have linked to them
:) (myself included) :)
"Just tell him ya did it! That's what he wants to hear anyway..."
Sure I go to AltaVista and others after hitting a brick wall with Google but that is very rare for me. Perhaps the issue is when I do searches I am looking for info on technical issues usually revolving around compiling this or that GNU package or Service.
_ __
No tool is the best tool for every purpose and perhaps many people should give other search engines a try and see the strengths.
However, I don't really see the point of an article that is simply a Hoorah for one service over another with differing models of profit and aims.
The author could have simply pointed out that AltaVista, as opposed to other search engines, has advanced searching abilities including the ability to exclude terms. No, it has to be an AltaVista over Google article.
Different tools for different times and different uses.
_______________________________________________
ACK
So basically you are saying that you are simple.
If it wasn't for the "I'm feeling lucky" button then some days I'd have no luck at all.
I've hit Karma 50 and gotten a Score:5, Troll... I win!
I don't know about you, but this guy's article didn't even seem coherent. He seemed to jump all around different points to come to some conclusion. I'm a little disappointed in O'Reilly's publishing standards. I'm accustomed to seeing good content at sites like onjava.com, but not this rubbish.
(Warning: moderation bitching follows.)
What the fuck? When I tuned in to this thread, the parent to this post was marked 4: Insightful, because the author liked the name and thought it looked pretty.
Don't get me wrong: those are two of the same reasons I use Google. But seriously, think of better uses for your mod points, people.
Google has exclusions, site and link queries too.
See http://www.google.com/help/refinesearch.html
5 years ago Altavista was my search engine of choice. Both for my own searches and as the number 1 engine for getting my clients' websites ranked in.
Back then you could submit to Altavista, and have a good ranking within a week.
Over time, the relevance of the returned results dropped dramatically and the time to get a site listed ballooned, quite often taking longer than Yahoo!
Then Google came along and I haven't looked back since. I've consistently been able to find the results I'm after thanks to the way Google indexes sites.
I'm now able to almost guarantee clients that their sites, whether old sites that are being revamped or new sites that are freshly hatched, will be ranked well within Google and also ranked within a short period of time. I think the longest I've ever had to wait for a site to be fully indexed is three months.
Plus the indexing of database generated pages and PDF documents by Google is a life saver. Without this feature a lot of the content I develop would be lost.
I think it will take a miracle to get Altavista back on track. I wish it was as great as it once was, but for now it's relegated to one of the less important engines both from a searching and a submitting point of view.
... because it is so good.
...
I'm a librarian. It is the most difficult time in history to do library research. There are hundreds of overlapping commercial databases out there, each with their own coverage, interface, and search engines.
Students used to locating information with Google are appalled at the steps it takes to locate a scholarly journal. You need to browse a list of subject databases, search them, then locate a printed copy of the journal via our catalog (a growing but still small percent of journals are available online).
Someday searching the various literary databases may be as easy as Google, but in the meantime there are drastic capitalist impediments to making it easy to do library research.
... so ask a Librarian if you ever need help
I've used google exclusively for the past 2 years. Never, not once, have I had to go to another search engine. 99% of the time, what I search for with google, I will find what I am looking for within the first page (10 results), very very often in the first 2 or 3 results.
I have no need for altavista. I don't care if you use altavista. Google works just fine for me. If altavista works just fine for you, so be it. Use it. No one cares.
All this speculation on the future of google recently is ludicrous. "google bombing" poses no threat. The people who work there are extremely talented. If it becomes a problem, they will undoubtedly fix it.
Google is the most popular search engine in the world, and with good reason. They are not going to give that up.
So will everyone please just sit down, shut up, and stop bickering. Use whatever tool works best for you.
Joseph?
Are there any hard-core Free Software advocates who are hard-core enough to boycott Google because they don't release the source to their search engine?
After all, isn't it your right to view the source code to any application you use?
And if your response is, "well, Google isn't an application, it's a service delivered over the web". Well then, does the freedom of an application depend on whether the processor is accessible to you or not?
Sometimes it's best to just let stupid people be stupid.
Could google extract Whois info and IP Address ownership info to determine if linked sites are related? I don't know about the IP Address info but the Whois info could probably be extracted by a spider. Eliminating internally linked sites would be a way of revising the rankings to better reflect their value....
Just a thought
J:)
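The idea above, discounting links between sites that share an owner, could be sketched as follows. This is only an illustration: the ownership table is hand-written sample data standing in for whatever a Whois lookup might return, and the function name and the `.test` domains are hypothetical.

```python
def independent_links(links, owner_of):
    """Keep only links whose source and target have different known owners.

    `links` is a list of (source_domain, target_domain) pairs.
    `owner_of` maps a domain to an owner label (sample data here, not a
    real Whois result). Links with an unknown owner are kept.
    """
    kept = []
    for source, target in links:
        src_owner = owner_of.get(source)
        tgt_owner = owner_of.get(target)
        same_owner = src_owner is not None and src_owner == tgt_owner
        if not same_owner:
            kept.append((source, target))
    return kept


# Hypothetical registrations: two domains share one umbrella owner.
owner_of = {
    "site-one.test": "Umbrella Co",
    "site-two.test": "Umbrella Co",
    "outsider.test": "Someone Else",
}
links = [
    ("site-one.test", "site-two.test"),   # same owner, dropped
    ("outsider.test", "site-two.test"),   # different owner, kept
    ("unknown.test", "site-one.test"),    # owner unknown, kept
]
print(independent_links(links, owner_of))
```

Only the surviving links would then be fed into the ranking calculation. Whether registration data is reliable enough for this is a separate question the sketch does not address.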
Oh well, no point in steering now.
Here's a very relevant /. example:
The other day /. posted that netscape 6 is supposedly spyware. One poster replied that he was going to screw up the spyware system by searching google for "CROSSDRESSING MONKEY PORNO" a bunch. I replied with a physical link to search google for this. Sometime later an anonymous coward posted that the /. article had become the #1 result for those search words. It has since fallen back to the original results, but it shows that google can be tampered with using lots of hits.
But these posts on /. today can argue all they want, but IMO Google's results are qualitatively more relevant than altavista. So if this is going to be a problem, we haven't seen it yet.
-Sean
Because the name is better, and because it's clean? But no mention of it returning the most relevant results.
Are you telling me that if google switched names and interfaces with a terrible search engine (like, say, excite or lycos), you would start using that?
You, sir, are stupid.
Joseph?
Go here.
Nope, no sig
Maybe if Google had indexed the article faster we would've been able to avoid this repeat!
...but you can also make Google pop up when you click the "Search" button in IE. This makes Google searching even easier since you can have the search window open on the left and hit your search results on the right. (Yay for "tabbed browsing", IE style.)
Also, the coolest feature of the Google toolbar IMHO is not even the instant search, but the "Highlight" button. Gone forever is hitting Ctrl-F and typing in a search term. Just search for something in Google, go to a result, and hit "Highlight" -- the search terms are instantly highlighted. This saves me an incredible amount of time when I'm searching through, say, mailing list archives.
The Google toolbar is one of the biggest reasons I use IE. (Well, that and the fact that page developers, including myself, follow the rule of thumb "Design so that it looks good in IE and works in Netscape.") But anyway, I digress. If you're using IE, check out toolbar.google.com and download it.
I would like to see a program and specification that dictates a formal data format for information in a mathematical schema. This could be the foundation for a universal translator and certainly a decent means of doing a search engine.
The idea is pretty simplistic, although the implementation is complex.
Any communication takes place by translating an idea into a sensory input form.
Examples: Sight (written language, video, sign-language), Touch (braille, texture), Sound (conversation, music), Taste (Like water for chocolate?), Smell (pheromones?).
Obviously, not all of these mediums are easy to work with, but we can certainly start with written language.
All languages use the same basic principle: convey relevant information about a central subject. How they go about doing it is different even between versions of the same language (British English vs. American English).
If we described an objective hierarchy of physical objects described by pure mathematics and implanted them into a central, world-wide database then open-source parsers for each language could handle the task of translating any written text, in any supported language, into this common language. If correctly implemented a search engine could enter into a short dialogue with a person performing a search and then return information very specifically relevant to what the user was searching for.
Example dialogue:
[user]I want information on Mary Jane Carpenter.
[google]There is a very famous person by that name. Her official website is [here]. [Here] is a list of fansites and [here] are some other sites which discuss her. That name is mentioned in [these] sites, but it is unclear if they are talking about the same person. [Here] is a list of other people with that name.
[user]The person I am looking for isn't famous.
[google]Then you are probably looking for one of [these] people.
[user] Are any of those people from St. Louis?
[google] [Here] is a site dedicated to a Mary Jane Carpenter from St. Louis.
This may sound like an impossible stretch but it really isn't. The famous Mary Jane Carpenter has a unique id on her object and many thousands of attributes which uniquely distinguish it from any other Mary Jane Carpenters. Ambiguity is dictated by the same rules that govern conversation: context.
If I have a page that contains no content other than Mary Jane Carpenter sucks! then a simple fuzzy logic routine should be able to infer that the Mary Jane Carpenter I am talking about is probably the famous one. Other clues could be gained from other parts of my site or other documents which have me as a source.
I realize that I am talking about a HUGE database, but it sure would be handy...
My $0.02 will always be worth more than your €0.02, so
And that is that I sometimes NEED to use the near keyword in altavista to get a complex search to work correctly. If google added the near keyword I would get rid of my quicklink bar entry for altavista's advanced search.
p.s.
the advanced search page is all text, not even a banner ad so it's almost faster than google to load.
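For readers who have not used a proximity operator, the following sketch shows the kind of test a "near" match performs on a page's text. It is not AltaVista's implementation; the function name and the default window of 10 words are assumptions made for the example.

```python
import re


def near(text, first, second, window=10):
    """Return True if `first` and `second` occur within `window` words.

    Words are compared case-insensitively. The window size is an
    assumption for illustration, not a documented engine setting.
    """
    words = re.findall(r"\w+", text.lower())
    pos_first = [i for i, w in enumerate(words) if w == first.lower()]
    pos_second = [i for i, w in enumerate(words) if w == second.lower()]
    return any(abs(i - j) <= window for i in pos_first for j in pos_second)


page = "Compiling the GNU package failed on Solaris until I patched the Makefile"
print(near(page, "gnu", "solaris"))            # within the default window
print(near(page, "gnu", "solaris", window=2))  # too far apart for a window of 2
```

A plain keyword match would accept any page containing both words anywhere. The proximity test narrows that to pages where the words appear close together.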
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
Similarly, Google lets me restrict a search to a particular Web site. For instance, if I include in my query the term site:samizdat.com or in Advanced search under Domains I choose to restrict the search to that domain, Yes, I get results only from that site. But to use that command, I need to have additional query terms: site:samizdat.com alone generates no results.
You can use the following workaround to do a site: search on google without any keywords. Just do "site:yoursite.com -stuff" where stuff is gibberish (bang on the keyboard a bit). For example, this search shows 1,290 pages from samizdat.com. On the other hand, an altavista search for that site shows 1,090 hits for pages on that site.
I don't know why Google doesn't allow simultaneous "site:" and "link:" searching, as that is something many users would like to do.
In my experience, Google websearch is best for specific web searches... Dmoz.org directory is best for broad Directory style searches, where you know the broad category that your search fits into, and you wish to find several sites that have this topic in common. (Yahoo, prior to advertisement bombardments held first place in this category) Google websearch is also among the best for file searches... try including "index of" (with quotes) in a search for a specific file.. (example: "index of" passwords.doc for interesting results) Google websearch is best for up to date news story searches... (try including "news" in the search query.) Limewire is best for music and video searches, both general and specific. Overall, Google is best for nearly all searches, in my opinion... and is usually more effective than using search boxes on specific websites...
Holy shit! That guy got to plug his website NINE TIMES in an article. I can't imagine how much he had to pay for exposure like that. Next we'll be seeing ads like this:
Features: ICMP echo requests are 37337!
Posted by CmdrTaco on 03:35 PM -- Wednesday March 13 2002
from the leet-nettools-impress-chicks dept.
Hey, Slashdotters! I just found this 37337 tool called pign. You can use it to send an ICMP echo request to IBM.COM. You just type "ping ibm.com"...
And it pings IBM.COM! Check it out:
>ping ibm.com
Pinging www.ibm.com [129.42.17.99] with 32 bytes of data:
Reply from ibm.com : bytes 32 time 80ms TTL=128
Reply from ibm.com : bytes 32 time 80ms TTL=128
Reply from ibm.com : bytes 32 time 80ms TTL=128
Reply from ibm.com : bytes 32 time 80ms TTL=128
Reply from ibm.com : bytes 32 time 80ms TTL=128
Reply from ibm.com : bytes 32 time 80ms TTL=128
Reply from ibm.com : bytes 32 time 80ms TTL=128
Reply from ibm.com : bytes 32 time 80ms TTL=128
Reply from ibm.com : bytes 32 time 80ms TTL=128
(Read more...)
Seriously -- I'm sure more curious people clicked over to samizdat.com than clicked on any of the other ads on the screen (thinkgeek and ibm for me). Maybe there is something to text ads on community sites (ala kuro5hin)
sm
I could say more, but then I'd have to kill you.
-- Nobody should take away Microsoft's freedom to innovate, particularly since they haven't used it yet
A lot of people ignore the single biggest innovation for quality results that Google did: default 'and' states for keywords. I worked at AltaVista for a year and tried to convince people that it was the way to go but no one would listen. When combined with their ranking technology [which is impressive but not infallible] it yields the best results.
fun fact: I also tried to get a proposal started for AltaVista to acquire Google in the summer of '99. Aren't you glad I failed?
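The difference between default 'and' and default 'or' matching mentioned above is easy to see on a tiny index. This is a minimal sketch with made-up documents; the function names are hypothetical and it does not reflect either engine's internals.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query, mode="and"):
    """Return matching document ids under 'and' or 'or' keyword semantics."""
    sets = [index.get(word, set()) for word in query.lower().split()]
    if not sets:
        return set()
    if mode == "and":
        # Every keyword must appear in the document.
        return set.intersection(*sets)
    # Any single keyword is enough.
    return set.union(*sets)


docs = {
    1: "altavista boolean query",
    2: "google ranking",
    3: "google boolean search",
}
index = build_index(docs)
print(search(index, "google boolean", mode="and"))
print(search(index, "google boolean", mode="or"))
```

With 'and' only document 3 matches. With 'or' all three do, which is why an 'or' default tends to bury the page you wanted under loosely related ones.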
Now I just don't see how AltaVista can give anyone more current results if their bots are featherbedding.
___
"with their freedom lost all virtue lose" - Milton
Damn straight. I couldn't say it better myself. I only said what I think everytime I goto google, I'm not looking for karma.
It's called XML
I react to only the most volatile substances.
XML requires a DTD which isn't mathematical in nature and XML is inherently bound to the English language.
What I am talking about is a mathematical DTD for every type of object in language. A truly universal language.
My $0.02 will always be worth more than your €0.02, so
The mailto link "Richard Seltzer" is woefully malformed.
"mailto:seltzer@samizdat.com or http://www.samizdat"
Please fix it.
When it is fixed, please don't fuck up my karma by marking this as redundant.
I would consider subscribing if it would guarantee proper links and spellchecking.
Mr. Seltzer thinks that the shortcomings of Google are that it doesn't allow for more "powerful" or "expressive" queries like "link:samizdat.com" or "url:samizdat.com/isyn". The question is: how many people really use such queries? How many times have you (also not the typical user, but lets assume so) wanted to see who links to a particular site? Typically, someone who knows that site well or has already found it will look for such information. As far as I'm concerned, Google does a tremendous job of finding informative sites for me, quickly. Usually when I search, I have a keyword or two in mind, and start with that. Within a couple of clicks (or just 1, if "I'm feeling lucky") I'm on my way. Probably Mr. Seltzer is biased because he is ex-Digital or something, and was pleasantly surprised at the uptick in Altavista referrals to his sites.
When google doesn't work for me, I go to Teoma.
What would you do if your best friend cuddles up with your biggest enemy?
It's alive.
How did you know that was my favorite show? Can you read my mind as well?
Alas, they can no longer be reached. Their search engine is seriously broken. It picks on a site and hits it hard and repeatedly.
They will make 100,000 requests on a site with only 20,000 static items within 24 hours. On our co-operative co-located server, we host around 80 sites, many of which are content rich. When Alta Vista chooses to visit just one of them, our total bandwidth usage jumps by an order of magnitude.
We have been unable to get past their front line support, and I am not prepared to maintain robots.txt on all of our members' sites just to control their broken robot, so we had no alternative but to block their entire subnet at our firewall.
If anyone has evidence that the AV robot is fixed, I'd be happy to let them back in.
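For anyone who would rather try the robots.txt route before firewalling a crawler, here is a minimal sketch of what such a file might look like and how to check its rules with Python's standard urllib.robotparser. The user-agent name "Scooter", the paths, and the delay value are assumptions for illustration. Crawl-delay is a non-standard directive, and nothing here shows that the AltaVista robot actually honoured it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for one hosted site. The crawler name "Scooter",
# the /archive/ path and the 30-second delay are illustrative assumptions.
ROBOTS_TXT = """\
User-agent: Scooter
Crawl-delay: 30
Disallow: /archive/

User-agent: *
Disallow:
"""

def build_parser(text):
    """Parse robots.txt content supplied as a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser

parser = build_parser(ROBOTS_TXT)

# What a well-behaved crawler identifying as "Scooter" would conclude:
scooter_may_read_archive = parser.can_fetch("Scooter", "http://example.org/archive/page.html")
scooter_may_read_home = parser.can_fetch("Scooter", "http://example.org/index.html")
scooter_delay = parser.crawl_delay("Scooter")

# Any other crawler falls under the "*" entry, which disallows nothing.
other_may_read_archive = parser.can_fetch("OtherBot", "http://example.org/archive/page.html")
```

This only helps against robots that read and obey the file. A crawler that ignores it still has to be handled at the firewall, as described above.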
Isn't that what Ask Jeeves was all about? Type a question in and it comes up with a list of relevant results. The problem is that pages are not easy to interpret. What happens if some of the words are GIFs, or the page is a troll page from a porn site, or it's a /. comment listing with a huge range of topics? It's a complex problem to solve.
Computer systems don't do language very well. They can't think or reason without formalised instruction segments which have been predefined for them. Sad state of affairs, but that's the way it is right now.
How much does it cost to have slashdot run a story?
Read the excellent information Google has provided about how the engine works, and use the engine with its inner workings in mind. When you meet the machine halfway instead of trying to dumb it down for the user, you'll get a hell of a lot more done.
In Google's case, taking half a minute to think about what you're looking for, then tossing in a few related bits of jargon or other words relevant to the context you're after does amazing things. With a little forethought, you can almost always find what you're after and be down to a page with nothing but relevant links with just an extra word or two added as filters.
According to http://www.leekillough.com/robots.html - iaea.org is commonly used as a fake referrer by spam harvesters.
My amazing wife - Artist, Author, Philosopher - Laurie M
After having used Altavista, I found the search-results giving me 2 or 3 pop-unders, how goddamn annoying is that? Also, the results of searches on Google are just simply more accurate and detailed.
Fuck Altavista.
Everyone seems to be missing the point that I was left thinking about when I read this article.
The government is trying to do something that has been proven to fail due mostly to the amount of information exceeding the ability of the technique designed to filter it. On top of this, they are proposing it be done with a vastly larger base of information.
How does the government propose to read every e-mail, website, IM (IRC, ICQ, whatever your flavor) and file transfer, pick out the child pornographers, "terrorists", and whatever other evildoers are the pick of the year, and do so with any accuracy?
The article very clearly paints a picture that indicates the government will either have to develop an AI, or approve human cloning so they can staff an extra 10 billion CIA secretaries to read every word the rest of the world types. I don't even want to try and think of the funding a project like this would need before it began being useful in any way.
They don't try to record each phone call that every person makes, why should their e-mail be any different?
Well, let's see...
Ahh, you mean slashdot.jp?
Which happens to be registered to VA Linux Systems Japan, whereas slashdot.org is registered to OSDN, who happens to own VA Linux Systems Japan?
You mean that link? You mean Slashdot Japan?
How ridiculous is that?
The author is basing this on outdated information. Google knows to crawl sites that change frequently more often than those that don't. Here is a concrete example:
I posted Two Kinds of Order by John Marks on March 11th, and mentioned this to some colleagues who might be interested. I linked to it from a Weblog or two, and Doc Searls did too.
Today it is number 1 on a search for 'two kinds of order' out of over 2 million, and a search for John Marks brings the page up in 5th position, despite there being lots of other John Marks's on the net.
That's what I call fast (and relevant).
After reading a slashdot article a couple weeks ago (Tiniest R/C cars), I started looking for more info on Tomy's "Bit Char-G" toys. Searching for "bit char" on google gives me tons of relevant results, and then some stuff on variable data types, etc. Altavista gives me no relevant results. "bit char tomica" gives me mostly garbage, and "bit char tomy" finally gives me 1-2 relevant pages.
All hail Google. And yes, the toolbar rocks. ^__^
I've built up so much character I have an alter-ego
As it was over-explained, Google ranks pages according to how many links elsewhere point to that page.
:D
Remember this post from Slashdot? It is about Macromedia wanting Flash to be used to design the entirety of a site.
So, I don't suppose Google can fetch the URLs inside a Flash file (correct me if I'm wrong), so, if Macromedia's dream comes true, how would Google cope with it?
BTW, how would any search engine deal with such a catastrophe?
Cheers.
Crack-smoking moderators note: The third link, a.k.a "some japanese site" is slashdot.jp. And if you want "More results from slashdot.org", you click on the damn link. Is it good that a search engine lists three million links to the same domain by default?
Google treats new sites as having low utility, but that doesn't mean that Google is out of luck on new content. Google knows that certain web sites, especially web logs (like Slashdot itself) and news sites are updated very frequently, and re-indexes them more often. Thus, if you're interested in current events, Google will tend to return results on current events from "reputable" sites. (I've been unable to find a reliable reference for this; you can check out this one from DaveNet.)
This doesn't help you out if you're trying to get your new business noticed, which is something site managers care about desperately. It also doesn't help you find the new business that appeared two weeks ago that might be able to help with your problem. Sadly, it's generally the same business owners who care about that case, too, since in general somebody has already beaten you to the punch with their web site and the customer gets the problem solved, without you.
No, it's not perfect, but it solves the problems of web searchers very, very often. It may be less good for web site owners, but compared to the searchers they are in the minority.
The other day I played with the Google advertising generator, just to see how much an ad would cost and how it worked, not with any intention of advertising. (Check it out, it's fun.) Anyway, I pretended to be advertising a local special-interest club where I am a member. By the time I had picked the advertising keywords that gave me the ad traffic that I wanted, those very same words typed into the search box brought up the club's web site as the third link on page one.
I would advertise why, exactly?
.. search engine for a very long time. I told and advised everybody to use it as the first page to pop up. I loved the search and I was fairly handy at finding stuff with it fast. And I considered AltaVista the most complete search engine of its generation. Then along came Google, and presto, as the indexed number of pages grew steadily, I came to love its way of detecting good websites on a particular topic.
Of course, finding good stuff easily is easy to fall in love with, but it seems I have forgotten about the secret new treasures hidden away in the dark digitlands, that are roamed primarily by the old giant. Thanks for all this insight, really.. I might reconsider my search engine of choice, and it is good to know I was right all along.
With great power comes great electricity bills.
and the fifth link on google is for. Quite Slashdot.org today!
autopr0n is like, down and stuff.
The problem with popularity-based systems such as Google and Freenet is that popular != good. For years, the most popular television show in America was Married with Children, a program of such ungodly awful lowest-common-denominator content that it frankly horrifies me to think that alien civilizations may someday receive those television signals and decide not to contact us. The books on the bestseller lists are often the lowest grade of junk -- Danielle Steel and Sidney Sheldon, anyone? -- and, as is often lamented by Slashdot stories and posters, the most popular songs on the radio are also of dubious value. The same applies, mutatis mutandis, to the web.
When rating systems -- including Slashdot's -- increase the visibility of what is already popular, they only serve to reinforce the status quo. What's "cool" stays cool, not even necessarily because the audience is that monotonically unimaginative, but because new and different things are filtered out. If, for example, Microsoft actually managed to produce a solid, reliable, inexpensive, and reasonably licensed piece of software, this is about the last place you'd hear about it. With Google's link-popularity system, websites presenting unpopular or dissenting views are much, much harder to find than knee-jerk me-too reactionary sites. This is no small issue, considering that the benefits of a free society are built, ultimately, upon dissent -- and the ability to spread dissent. This is no less true when the dissent is artistic than when it is political.
Almost a decade ago, I used to laugh at the efforts of old-media companies to transform the web into another form of television. It's a lot harder to laugh now, as the chief gateways to the net for the vast majority of the population use sophisticated software to dumb down the net. Sure, the "other" stuff is still out there, but if you can't find it, it may as well not be.
Proud member of the Weirdo-American community.
Uh....
"AltaVista lets you use minus signs and plus signs to indicate what you really don't want and what you do want. And for some specialized searches the exclusion is essential"
Google does the same thing. It's not like Altavista has this special feature that Google doesn't. In fact I've found the main reason many people are not happy with Google is due to the fact that they don't use it enough to understand the best way to write a query.
My wife was looking for some information and since she's heard me say that Google is the best engine to use she tried it but couldn't find the information. When I got home I did the same thing and found the information on the first query. It's all in the fingers baby... but seriously like any tool it takes time before you learn to use it the way it's meant to be used.
The way this article comes across is the same as me saying Borland Turbo C++ is better than MS Visual C++ because it has the following features:
Include directives
Printf
The ability to compile applications
when in fact MS Visual C++ does the same things, just perhaps a little differently.
Did you ever read anything????
They said(not exact quote) "We linked to this three days ago" in the story!
It's a rebuttal, duh!
The point of the original article was hidden in the last few paragraphs. He was making a point about various governments' attempts at universal surveillance, i.e. attempting to log all packet traffic, etc...
His discussion of web search techniques was to illustrate the nature of the problem these would-be omniscients face. Because the data they collect does not have the richly linked nature of web content, all that these government entities will be left with is mountains of meaningless data. They will be stuck using AltaVista-like searching and matching techniques.
And we all know how useful Altavista is these days.
-josh
So, I wonder if Flash might implode on the basis of their success in the ad market coupled with all the problems of using Flash to generate your pages, plus the simple fact that almost no Flash site actually delivers anything that's still interesting after the first visit so who'd miss it?
TWW
"Encyclopedia" is to "Wikipedia" what "Library" is to "Some people at a bus stop"
So if I'm looking for content that is likely to have been on the Internet for a year or more, Google is great. But if I'm looking for fresh content, I'll go elsewhere.
How do you explain this, then? It's a standard Google search for the terms 'Andrea', 'Yates' and 'verdict'. The top link is hardly a year old, but rather an extremely recent and relevant link to CNN's site about the trial verdict.
--
A scribe trying to do web based research, discovers all the hits relayed to its browser
are content lacking sites created and owned by internic.net.
It's odd when you notice it for the first time, it's annoying the second time, and it's BOGUS the third time!
It's almost as exciting as doing web based legal research on a justice department public server.
There is a lighter side however, with all the lameness one gets motivated to actually visit the library for a change.
Any other jokers seen such?
---> Ich bin ich das Ex ---
Just to toss into the mix... I usually use sites like google or altavista when I'm planning to spend a lot of time wandering around the web getting different views on a topic. When it's something where I want to get to solid information quickly though, I usually end up somewhere like ask.com or about.com. Having people manually organize the entire internet is hopeless (as Yahoo has shown). Still, I do like using sites that take the trouble to set aside a few quality sources where they'll be more accessible, for when I don't have the time (or enough knowledge!) to solidly judge the quality of the information on a topic for myself.
I created my own homepage about 2 months ago. I just searched for my name on Google. The page showed up as the first hit. Then I searched for my name on AltaVista, and I scanned through several pages of hits without seeing my homepage.
Major Astroturfing here... This guy makes his living because of AltaVista, so of course he's going to plug it.
His links at google
I don't mind him plugging AV, I just wish he'd disclosed his relationship a little more clearly, and a little earlier, in his "article".
LongTail SSH Brute Force analysis tool is here!
Gotta love Opera for the url linking concept..."g whatever" searches Google for whatever, "a whatever" does altavista, along with a host of others and you can easily set up more (fr searches Freshmeat for me)...between that, the ability to kill pop-ups, and the mouse click navigation, Opera is one bad browser.
check out lsdie : unformed.hypermar.net
(i wrote it)
it's for ie, it installs a little bar like the IE Search bar but a lot smaller, and no ads, and has support for selecting among about 20 different search engines, easily selectable in a convenient interface....
Sounds good! Why don't you enter the Google programming contest? This idea would be interesting to implement, even if in testbed conditions. Incidentally, Teoma (or Vivisimo -- forget which) already offers something like this, though not on the scale you outline (they have "topic clusters").
Many of the features described above as 'missing' from Google are in fact there, often in exactly the same form they're described in. For those (including the submitter) who can't be bothered to explore new tools, here's a quick run-down...
Google in fact lets you do exactly the same thing. There's an example in rebuttal to another point later on, so I'll leave it at that.
Strangely, Google allows an eerily similar query: try +link:samizdat.com -site:samizdat.com (note the plus and minus I promised earlier...)
Okay, admittedly this one's a little different, but it is possible. Try site:samizdat.com +samizdat.com
So Google would seem to do everything the submitter is looking for as well as having a more efficient ranking system. The recent 'abuses' of the Pagerank system are nothing compared to the flood of porn sites that once greeted anyone who searched with a few common keywords.
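To make the queries above easy to try, here is a small sketch that turns a query string with operators into a search URL using Python's standard library. The helper name search_url is hypothetical, and the sketch only shows how the plus, minus and colon characters get encoded. It makes no claim about how any engine interprets the operators.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Base address of the Google results page; pass another base to reuse the helper.
GOOGLE_SEARCH = "http://www.google.com/search"

def search_url(query, base=GOOGLE_SEARCH):
    """Return a search URL with the query safely percent-encoded."""
    return base + "?" + urlencode({"q": query})

# The two example queries quoted in the comment above.
who_links_to_me = "+link:samizdat.com -site:samizdat.com"
own_pages_only = "site:samizdat.com +samizdat.com"

for query in (who_links_to_me, own_pages_only):
    url = search_url(query)
    # Round-trip check: decoding the URL gives back the original query text.
    decoded = parse_qs(urlsplit(url).query)["q"][0]
    print(url, "->", decoded)
```

The encoding step matters because a literal "+" in a URL query string is read as a space, so the plus operator would silently disappear if the query were pasted in unencoded.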
I heard all the raves about tabbed browsing and I downloaded a copy of Mozilla 0.9 to try it out. It's the exact same thing as the Search/Favorites/History panel in IE! I don't get how this is "revolutionary". (Please, if you care to enlighten me on how it is different, do so. I'd welcome a better explanation.)
Check out this screenshot that I did for my friend, and tell me how this is different from tabbed browsing in Mozilla.
Simpli - Your source for San Jose dedicated servers and colocation!
What I find amazing is how many people are totally clueless in defense of their chosen search engine.
All these +/- things this guy is talking about can be done at Google too. Yes, even with the site: and link: specifications.
Do a little research next time.
Samizdat is a Russian word, and in samizdat people never published anything for profit as your site is trying to do. No wonder that it doesn't attract any links except through AltaVista, which is fitted to your needs. Bad for AltaVista... I vote for Google.
"I was actually getting twice as much traffic from the International Atomic Energy Agency (part of the UN), when I had no information at all related to atomic energy."
What the author doesn't know is that the IAEA's URL (iaea.org) is used as a phony referrer string by a number of site-crawling robots that look for email addresses for spam. So he probably wasn't getting a single real link from the IAEA and should have given AltaVista more credit all along...
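A rough sketch of how one might discount such phony referrers when tallying where traffic comes from. The list of suspect hosts, the function name and the sample referrer strings are all made up for illustration. A real log analysis would need its own list and its own log parsing.

```python
from collections import Counter
from urllib.parse import urlsplit

# Hosts assumed to show up mostly as forged referrers (illustrative list only).
SUSPECT_REFERRER_HOSTS = {"iaea.org", "www.iaea.org"}

def referrer_counts(referrers, suspect_hosts=SUSPECT_REFERRER_HOSTS):
    """Count referring hosts, skipping empty referrers and suspect hosts."""
    counts = Counter()
    for ref in referrers:
        host = urlsplit(ref).hostname  # lower-cased by urlsplit; None for "-"
        if host is None or host in suspect_hosts:
            continue
        counts[host] += 1
    return counts

# Made-up referrer field values, as they might appear in a web server log.
sample = [
    "http://www.google.com/search?q=samizdat",
    "http://iaea.org/",
    "-",
    "http://WWW.IAEA.ORG/index.html",
    "http://search.example/results?q=isyn",
    "http://www.google.com/search?q=isyn",
]
counts = referrer_counts(sample)
```

With the forged entries dropped, the remaining counts give a fairer comparison between search engines than the raw referrer totals do.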
It's Slashdot's evil twin... SlashNOT
Aha! Okay, I knew I was missing something. I used to use that kind of stuff in Opera. Yeah, it's cool, but Windows XP does that with IE as well (well, in a sense: it groups all IE windows together.) I could see how this would be beneficial, though. I'll keep it in mind when Moz 1.0 comes out. :)
Thanks again.
Simpli - Your source for San Jose dedicated servers and colocation!
Nice one!
Do you believe in death after life?
I couldn't agree more on the need for more than one search engine. We all have our own particular favourites, be it football team, car or brand of tea. The same can also be said of search engines, and for many people at the moment it seems to be Google. On the courses that I run I usually ask the question 'what's your favourite search engine?' and the two that are most often mentioned are Google and Yahoo!
At the Online Conference in London recently I attended several talks on search engines and workshops regarding them, and AltaVista was barely mentioned. However, even just a couple of years ago AltaVista was highly regarded as perhaps the best of the free text search engines; it had a large database which was regularly updated, it was also constantly updating and adding new features and its search syntax was very flexible. Yet now it's being seen as an also-ran, and on at least one newsgroup that I take (alt.internet.search-engines) the majority of web authors say that they hardly pay any attention to it.
I chose to look at AlltheWeb, otherwise known as 'FAST' rather than Google, if for no other reason than many people are already aware of what Google can offer, and I thought it would be more interesting to concentrate on a slightly lesser known search engine, but one which is increasingly being mentioned these days.
AlltheWeb is owned by Fast Search & Transfer ASA (FAST), a Norwegian company. FAST claims that it has over 625,000,000 web pages indexed, which is certainly an acceptable size and is comparable to AltaVista and Northern Light, but still lagging some way behind Google. However, it is making considerable claims for both the freshness of its data - it claims a rate of between 9 and 12 days, which puts it way out in front of the other major engines - and for its news stories, claiming in a press release "Indexing up to 800 news stories per minute and real-time indexing of news stories from over three thousand online sources" AllTheWeb Upgrade Announcement
The main search page is very clear and uncluttered, consisting of a single screen, which makes a change from the confusing approach taken by AltaVista, while providing more immediate functionality than Google. The user has several immediate options: a choice of language to search in (almost 50 different languages), the search box itself to enter terms, a tick box to tell the engine to search for the exact phrase, and options to search for web pages, news, pictures, videos, MP3 files and FTP files. An important fact to note is that with AllTheWeb you can search directly in news - something that can't be done with Google.
AllTheWeb has one of the most customisable interfaces I've seen in a very long time.
The search results page more than makes up for some of the other less exciting features. The screen is clear and uncluttered, with none of those 'featured sites' that are becoming increasingly common with other search engines, such as AltaVista. At the very top of the list of results are a number of 'Beta Fast Topics', which are a dozen or so specific topics related to the results retrieved - rather like the Northern Light customised search folders, and which provide the same function - a quick way of narrowing a search down to a smaller tightly focussed group of pages. AllTheWeb provides a brief summary of the page returned, the size, and if appropriate, the opportunity of retrieving more hits from that specific site, using the same approach that AltaVista uses. Another nice feature is that even if the user does a search for web pages, a small box pops up on the right hand side giving the results of a multimedia search, with an indication of the number of hits found for images and video.
In conclusion therefore, AllTheWeb combines many of the best features of other search engines, with few of their disadvantages. That combined with the freshness of its data does make it look a very attractive alternative to Google, and worryingly for AltaVista, a very viable replacement for their own offering. I suspect that in the coming few months I shall be paying rather more visits to AllTheWeb, and rather less to AltaVista.
See here for the complete article written on this subject including actual comparisons Goodbye Altavista, hallo AllTheWeb
Every time I hear somebody rant about something like Mr Doctorow does about AltaVista, I think "this guy probably has no clue". In 90% of the cases I'm right.
AltaVista was a great search engine. Google is better, but that doesn't give him or anyone the right to put it down like this. It's not like Google was created out of the void. It was created on the ground prepared by AltaVista and it had to perform according to the high standards set by it.
I passed the Turing test.
I tutor newbies about basic computer use (including web and email). First thing I show them is Yahoo, then I show them Google. They stay with Google once I'm done explaining all its features.
:P).
For newbies, it's hard to use a search engine that is hardly a search engine (could a newbie really tell that yahoo has a search box just from the front page? "I THUT YAHO WAS FOR XMUS CRDS???"). I usually find what I'm looking for in the first page of results; worst case, it's on the second page.
I think we're headed for "search engine wars" with people like this. Google is great for the power user and newbies alike. It also proves you can advertise without pushing crap into people's faces (that's it, piss them off with annoying ads so they go and use the competition instead, THATS; GUUD BUSYNESS NO?!?!
Hopefully, Macromedia's dream *won't* come true simply because Google refuses to index Flash files. That way, Flash will continue to be restricted to mostly worthless junk, just like today. No one wants to publish their precious content in a way that renders it invisible to Google.
On the other hand, MM will probably be more than happy to supply Google with the tools they need to grab URLs and content from their (proprietary format?) files.
1. Photography: Google requires a minimum of text to index a site. I had a site of all my own photo work and Google wouldn't touch it.
2. I can say first hand that the linking rating system does not work. A site I worked on, http://www.williamkowalski.com/ , does not show up even when typing in a search for william kowalski, and it's the author's official site!
Someone even went to the trouble of translating the interface into a language with probably fewer native speakers than Klingon: Irish! Brilliant! My interface language of choice már is feidir liom. Now if only that someone would go the extra mile and do the About, Help etc pages as Gaeilge...
The conclusion of your syllogism, I said lightly, is fallacious, being based on licensed premises
I just did a quick search on Google for information on The Fightin' Whities (a basketball team in Colorado). The first link that came up is from a newspaper article on the team published two days ago. I can't think of any other search engine that would index something that quickly.
God invented whiskey so the Irish would not rule the world.
I remember when, due to Google's linking technology, those searching google for "dumb motherfucker" came up with George W. Bush's website as the #1 link due to the vast number of other websites linking to George W. Bush with those words. The Bush people complained and threatened to sue because google's technology revealed that thousands of website authors felt the president was a "dumb motherfucker." The Google company caved in and adjusted the search logic for www.georgewbush.com to exclude any association his website has with that phrase.
Google isn't cool anymore. When your mom starts boasting about all the googling she's been doing (with the tool bar even!) it's time to move on.
Alta Vista has a clunky retro vibe which just hits the spot.
Life's good in a paisley shirt.
have they finally provided this option? sometimes google is too tiresome to search. of course, you can always string a bunch of string1 OR string2 OR ... :)
Cory Doctorow's article, BTW, is riddled with errors and false assumptions about the ranking algorithms used by both Google and AltaVista, which have evolved radically since their introduction. AltaVista, for example, ceased to be a purely keyword-based engine even before it hit the street. And, now that Google actually has a substantially sized audience, they're having to deal with a lot more spam attacks than ever before.
The real battle to come isn't going to be entirely technology based, however. It's going to be a duel between CMGI and Kleiner-Perkins to see who can figure out how to make a profitable (with good ROIC) search engine business before AltaVista and Google drain their coffers. In that aspect, Google is woefully behind.
-=paulf
...-.-
I agree that being able to type 'g search text' in the location bar is a useful thing. You can tweak Mozilla to do this, with about ten lines of javascript (in the navigator.js file in chrome/comm.jar). You can also get it to search e2 for you when you type 'e search text', or whatever. A particularly useful and fast tweak is to set it up so any text entered in the location bar which features a space character is automatically treated as search text, and searched for at Google.
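The dispatch logic described here is simple enough to sketch. The following is written in Python rather than as the actual Mozilla JavaScript tweak, which is not reproduced here. The function name and the "e" engine's URL are placeholders, and only the Google results URL is a real address.

```python
from urllib.parse import quote_plus

# Keyword -> URL template. The "e" entry is a placeholder address,
# not the real search URL of any site.
SEARCH_KEYWORDS = {
    "g": "http://www.google.com/search?q={}",
    "e": "http://example.com/search?q={}",
}

def resolve_location(text):
    """Decide what the location bar should load for the typed text."""
    text = text.strip()
    keyword, _, rest = text.partition(" ")
    # Case 1: "g search text" -> search the engine registered under "g".
    if rest and keyword in SEARCH_KEYWORDS:
        return SEARCH_KEYWORDS[keyword].format(quote_plus(rest.strip()))
    # Case 2: anything else containing a space is treated as a Google search.
    if " " in text:
        return SEARCH_KEYWORDS["g"].format(quote_plus(text))
    # Case 3: no space, so assume it is an ordinary address and pass it through.
    return text

print(resolve_location("g search text"))
print(resolve_location("two kinds of order"))
print(resolve_location("http://www.samizdat.com"))
```

Adding another engine is a one-line change to the table, which is roughly why this kind of tweak stays at about ten lines.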
Mozilla also lets you search for highlighted text via the context menu.
Three cheers for highly configurable software!
M-x praise-emacs
[google]There is a very famous person by that name. Her official website is [here]. [Here] is a list of fansites and [here] are some other sites which discuss her. That name is mentioned in [these] sites, but it is unclear if they are talking about the same person. [Here] is a list of other people with that name.
[user]The person I am looking for isn't famous.
[google]Then you are probably looking for one of [these] people.
Actually, this isn't anywhere near as hard as you make it out. I have worked on a number of projects that have this as the goal. While perfect language understanding is a long way off, problems like this require only basic understanding.
Specifically, when you index pages you compare them using inverse document frequency (IDF). By iterating, it is relatively trivial to split the related documents into pretty little folders; infogistics is one example, and I've used a demo of a different one that draws a graph of related documents. Interestingly, the complex boolean queries require AltaVista as a backend. Structured searches like this are the next step for search engines; I'd be betting on seeing it outside the lab some time year depending on VC funding.
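For readers who haven't met the term, here is a minimal sketch of inverse document frequency using one common definition, log(N / df), where N is the number of documents and df is the number of documents containing the term. The tiny corpus and the whitespace tokenisation are assumptions for illustration. The clustering step mentioned above is not shown, and this is not the code of any of the projects named.

```python
import math
from collections import Counter

def inverse_document_frequency(documents):
    """Map each term to log(N / df) over a list of document strings."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        # Count each term once per document, however often it repeats.
        doc_freq.update(set(doc.lower().split()))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}

# Made-up three-document corpus.
corpus = [
    "google search engine",
    "altavista search engine",
    "flash site",
]
idf = inverse_document_frequency(corpus)

# Terms that appear in many documents score low; rare terms score high.
for term in ("search", "flash"):
    print(term, round(idf[term], 3))
```

Terms with high IDF are the ones that distinguish documents from each other, which is what makes them useful for grouping related pages.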
The level of interaction you're talking about is also obtainable. Here (for text users: infinity.otago.ac.nz:8080/teKaitito) we have a research project that does exactly this and more, although there isn't much use visiting it since we have most of the functionality hidden. A similar project provides an interactive tour of a museum.
A web search for resolution and inference engines should convince you this is a current research topic and should be widely available quite soon. So really all you're asking is for a good inference engine to be integrated into a state-of-the-art search engine. By far the hardest part will be getting the CPU requirements down. I'd say it will take two years at maximum.
"...most used piece of freeware on my windows machine (IE doesn't count...I only use it because of the google toolbar.)"
That, and IE isn't freeware - contrary to popular belief, you paid for it when you paid for your O.S. Microsoft didn't make it for nothing, and where do you think their money came from?
It's just a thought.
Oh yeah, and download mozilla and then go here and then download this. Click on it in mozilla and it will install their plugin for you. Then, don't use IE, and smile.
*kudos* to the guy who posted this before me, and definitely to the Mozilla programmers who wrote it. I just installed their pseudo-toolbar and it is definitely cool.
Who is this Anonymous Coward character, how does he post so much, and why is he always such a whore?
You may not realize it, but what you are talking about is strong AI. The only existing implementation of the data structure you want is in the human brain.
My company's website rose to second from the top of a generic two-word search on Google in about 4 weeks. Our sales department was impressed. AltaVista indexed the site 2 months later, and regardless of the specificity of the search we never got above link 100.
I won't list the site cause we recently started using Adwords (by the by, they work great!) and I fear the SD bill.
PS Looking back at AltaVista, it seems they've tried to Googlify things a bit - it's much cleaner now.
You're the other guy who actually reads the articles! Wow, it's like finding my identical hand twin.
My other
It's funny: my site isn't what I'd call big, but it's indexed in Google and it didn't take 2-3 months. I have friends with websites they just opened, and the first thing I get from them is "what the fuck is Googlebot doing in my logs already?" In fact, why don't you try it yourself: set up a domain and see how long it takes Google to get there, index it, and then have it in its search engine. I'd make a 50 dollar bet it'll be max 4 weeks.
It takes the current clipboard, and opens a Google search for it. Small tweaking (it's only a short amount of code) would, I'm sure, make it work for other sites. Telsa claims it as working with Gnome, but I use IceWM, and it works great. A real boon, and no wasted space in my netscape toolbar.
Author, Shell Scripting : Expert Re
Indeed. Would that include spell-checking of random comments from users who type "guarantee" and "dont"?
Better to
For March, I've been hit:
1, 2, 4, 5, 6, 7, 8, 11, 12, 13th March.
I'm pretty happy with Google. It gets me results, and it searches me pretty often. 225 hits in March. The only thing it doesn't seem to have picked up, is the homepage (/). It changes pretty often (new headlines, whatever I'm thinking about today), but I guess people link to the content.
Author, Shell Scripting : Expert Re
One thing that Google has over AV is their lack of dumb advertising tricks - as I was reading the comments to this article, an AV window I had open brought itself to the front. I don't mind annoying ads, but when you start trying to control what I do on MY computer, you've overstepped your boundaries.
The best thing about a boolean is even if you are wrong, you are only off by a bit.
I use Copernic for many of my searches. It uses many search engines and seems to always find what I'm looking for. (www.copernic.com) Google and Copernic are all I find I need.....
Fiddlesticks!
The slashdot software fouled up and got my email address and URL wrong.
I'm at seltzer@samizdat.com and
http://www.samizdat.com
As for Google, I've submitted pages for many sites -- including four that I run myself -- and it typically takes 2-3 months for a free submission to get into their index.
...with Altavista is that it insists on giving me the friggin home page in French.
OK, OK, I admit I live in France and speak French, so it's not mortal; but Google has a way of turning the French off whereas Altavista doesn't -- or if it does it makes a bloody good job of hiding it.
No, your children are not the special ones. Nor are your pets.
I have to admit that, after reading Richard Seltzer's response to Mr. Doctorow's article, I'm wondering which search engine he's really arguing for. He freely admits two key facts: that poor business decisions led to a decline in the quality of Altavista searches, even before Google arrived on the scene; and that Google is a lot better at turning up quality, long-standing content than Altavista is.
So maybe Google doesn't have ALL the latest stuff....... it's extremely good at 95% of what I need, and I don't even need to use those silly "+", "*", "-" symbols anymore. Also, is it just me, or did Altavista only recently decide to fix the special character parsing of keywords in its searches? I remember a time in the not-too-distant past when doing a search on keywords containing "." or "'" would choke, and only return the prefix part of the keyword. And perhaps the best thing about Google? No popups, no sloppy "I paid for advertising" links posted at the top of my result set, no banner ads on each and EVERY DAMNED PAGE -- Google searches, and that's it!
But even if this sounds like I'm bashing Altavista, I'm not. I still fall back on it to see what it'll turn up when I'm looking for rather obscure subject matter, and besides, I don't think Google would be where it is today without having another search engine to compete against. And now that Google's fortunes are rising, Altavista has started focusing more on the quality of ITS searches once again -- a situation where we all win.
For heaven's sake, the /. crowd is getting lamer by the day.
:-),
The fastest site out there is: http://alltheweb.com
(though google is trying hard to catch up -- think they have, haven't tested lately.)
Google is ideal for:
USENET (groups.google.com), images (images.google.com), and
searching M$ sites like MSDN and Microsoft Support (http://support.microsoft.com). And 'course the highlight feature is very handy.
For security and other reasons, I wouldn't recommend customizing IE with stuff like the google toolbar.
I tried it and ended up with a UN flag of toolbars (dictionary, research, language translators, tech searches, domain search toolbars, etc.)
Instead get Opera. Your productivity will go up by a factor of 4.
Bottom line: Read up on the advanced features of every search site occasionally and do not tie yourself to one search option, otherwise you'll be no different than a person living in a trailer park.
(they only know what is around their park).
Ohh..yeah. Altavista is great, especially when you are trying to search for linux and get told that you can enlarge your penis another 5"...
the first time I saw googlebots in my access logs i nearly fell over, we hadn't asked anyone to index our site.
but I have to admit to being very impressed, every month the googlebots come to visit, they don't disrupt the site (the National Library of Australia hit us with a denial of service attack called "Pandora" when they tried to suck down the entire site in one go, complete with recursing loops), and they rank us very highly (perhaps too highly, there are more authoritative sites in our region, we do more comment).
anyway I suspect the author forgot that most users of search engines aren't website owners hoping to be indexed, but people doing searches.
Sites that have been regularly updated for a couple of years tend to be a better source of information than those slapped up yesterday.
'There is a Light that never goes out.'
Could he have mentioned his own website any more? Talk about taking advantage of slashdot.
What signature defines me as a person?
Quite a few errors.
. or g/
:) )
First off, Google does indeed support + and - searches. By default everything is + but this can be changed when using an advanced search.
Google does not support wildcards to my dismay, or REALLY complex boolean statements, but uh, hehe, I have not actually HAD to use those after I switched over to using Google.
Wow, I remember when having parens four or five layers deep was rather common, LOL!
You can just type in a web site and Google will report to you what information it has on it.
For instance if I type in my site's address (which was submitted to Google a week or so ago) of com2kid.netfirms.com I get an ever so nice
"Sorry, no information is available for the URL com2kid.netfirms.com/index.html"
Right below this though is a link that says
"Find web pages that contain the term "com2kid.netfirms.com"".
If a page DOES exist in the directory then I can easily select to list sites that link to it (right off of the main page, VERY easy to do) and find pages that are similar to it (and for once this feature WORKS!)
http://www.google.com/search?q=related:slashdot
nifty.
Not a VERY good example, but it works wonders compared to what other search engines manage to turn up.
I can search by domain (VERY handy) or EXCLUDE domains (also very handy).
If any kiddoes are near by I can turn on safe search and not have to worry about as much of the seedier side of the web turning up. This is especially useful when searching with images.google.com
But there is one final proof that clinches it all for me.
On Google a search for my online name turns up 121 results.
On Altavista it turns up 11 results
Need help treating your acne? Come here!
I do really quick Google searches straight from the address bar in Mozilla. goo red hat brings me the same page as if I had entered "red hat" in the text box on Google's home page. gool debian is just as if I had pressed "I'm Feeling Lucky" after entering "debian".
To set this up, first bookmark the following links:
- Google Search
- Google I'm Feeling Lucky
Then go to Manage Bookmarks, right-click on the Google Search bookmark, open its properties page, and enter "goo" in the keyword field. Do the same for the "lucky" bookmark, but enter "gool" as the keyword (or "lucky", or whatever). Restart Mozilla, and then you've got super-fast searches.--Bruce
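The keyword trick above boils down to a lookup plus a substitution. Here is a rough Python sketch of what the browser is presumably doing, not Mozilla's actual code. The two URL templates are assumptions (the bookmark links themselves did not survive in this post), in particular the `btnI` parameter for "I'm Feeling Lucky", and `KEYWORDS` and `expand` are made-up names for illustration.

```python
from urllib.parse import quote_plus

# Hypothetical keyword table. Both URL templates are assumptions,
# not the original bookmarks; %s marks where the typed terms go.
KEYWORDS = {
    "goo": "http://www.google.com/search?q=%s",
    "gool": "http://www.google.com/search?q=%s&btnI=1",
}


def expand(address_bar_text):
    """Turn 'goo red hat' into a search URL, or None if no keyword matches."""
    # The first word is the bookmark keyword, the rest is the query.
    keyword, _, terms = address_bar_text.partition(" ")
    template = KEYWORDS.get(keyword)
    if template is None:
        return None
    # URL-encode the query (spaces become '+') and drop it into the template.
    return template.replace("%s", quote_plus(terms))


print(expand("goo red hat"))
print(expand("gool debian"))
```

Adding another engine is just one more entry in the table, which is why the same idea carries over to other search sites.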
There are 10 kinds of people in the world: those who understand binary, and those who don't.
I've had a google popup in Mozilla since, well, since I was still using Netscape. I've also got a dictionary popup in my toolbar, and one for FOLDOC. See bookmarklets
The antidote for misuse of freedom of speech is more freedom of speech.
-- Molly Ivins
Try searching for 'God' in Google and you will come to http://www.phpnuke.org , the site of the famous free portal system PHPnuke.
:D
Perhaps there is no genuine "god" on that site, but for putting in so many hours coding such a great program for free, maybe the Vatican should have a look at it for the next canonization
I've been in a constant state of seething pissed-offness since Northern Light shut down their search engine... I had only recently recovered from Infoseek's conversion to useless weirdness a couple years back.
This very night, we've been searching for a company that does speedometer refurbishing, so with the search "refurbished speedometer" in hand, off we went...
* tried HotBot (his fave engine) -- the results sucked... a link to Firewire iBook was _not_ what we had in mind.
* tried Google (the one I have resorted to lately based on the advice of small children) -- the results sucked less (at least some of the results were about cars), but nonetheless sucked.
* logged into /. to relieve some stress... Voila!!! Got diverted to AltaVista and got 5 out of 10 relevant results! Actual companies that will actually repair our actual speedometer!!!
As for the time it takes to get a site into a search engine, all of them are advertising 1 to 3 month time periods for "free indexing"... I submitted a new site a couple weeks ago to both Google and AltaVista -- neither one of them has searched the site yet.
My opinion is that people want to search and get results quick, not to make a dialog or comprehend a paragraph - but it'd be interesting for laymen. Not a bad idea, he should enter the contest yeah.
Pretty much the one situation that sends me to Altavista these days is when I'm looking for a phrase or a quote. Altavista can do strict strings and complex boolean fun better than Google.
So let me get this straight... from your sample of one, you can draw a conclusion?
I was recently searching for 'tetrachromicity'
on Google and Google went like, 'Do you mean
tetrachomity'? And indeed, that was the correct
term I was looking for. That was quite impressive.
I think there is a place for another engine a-la
Alta Vista that doesn't try to be smart about things but does keyword matches very well. It's
really good for those cases when you remember nothing from the site except an exact sentence
or a few words exactly as they come in the text.
I like the norwegian fastsearch thingie for that
even better than Alta Vista - it may be an even
larger version of the same thing.
Ask Jeeves is OK, too. Of course, the natural
language processing doesn't work, but it's pretty
good as metasearch. Apparently, they get the best
stats and the most useful inputs because when
people try to enter fully qualified English queries, they simply end up giving them more search terms than usual.
I don't know why Google doesn't allow simultaneous "site:" and "link:"
It does.
Demonstration
You just can't do it from the advanced search page, you have to (gasp!) type it into the box yourself.
BugBear
Ignorance is curable. Stupid is forever.
Why would this be a catastrophe? I've banished Flash from every box I own, because I only saw it used for content-free, garish spam. Good riddance, and any such ignorantly designed page deserves to be ignored.
Students used to locating information with Google are appalled at the steps it takes to locate a scholarly journal.
Well, I don't know about your "students" - but I personally like both!
Especially in the last few years the Library indexes have vastly improved! I remember when InfoTrac came out w/ its monochrome screen, CD-ROM jukebox, and cheesy IBM dot-matrix printer (sometimes thermal).
*THAT* was nothing compared to some of the systems in use today - plus the fact that many publishers provide indexes for all of the journals they publish. Most of the time you can get a
But, that's just me.
I'm a 2000 man.
> I don't suppose Google can fetch the URLS
> inside a Flash file (correct me If I'm wrong),
Sorry, you're wrong. Google does index the links inside a Flash movie, but not the content.
Of course anyone who takes search engine marketing seriously wouldn't dream of hiding indexable content in unstructured binary proprietary formats anyway.
Calum
--
Calum I Mac Leod, Scottish Borders
With Google you can play your dialog by phrasing it in terms of keywords to be found on the pages, and this is exactly what I do when I use Google:
[user] "Mary Jane Carpenter"
[google] (the first 1-10 links return bio, official website, etc. etc.)
[user] (doesn't find the MJC wanted; determines a distinctive attribute that is likely to appear on somebody's webpage) "Lewis"
[google] returns more specific results
There are plenty of keywords to try - if I have trouble finding something, it probably means that it isn't on the Web in the first place, not that Google doesn't give me a way to locate it.
(But that claim can be researched, of course.)
Of course you have to learn how to spell. Google can employ soundex to return "Louis" matches in response to "Lewis" queries. But if it did this kind of thing automatically, its results would deteriorate, because false positives with unknown causes are much more disruptive to the user than an excess of genuine hits; the latter can simply be filtered away by adding more search terms.
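For anyone who hasn't met soundex: here is a minimal Python sketch of the classic American Soundex code, just to show why "Lewis" and "Louis" would collide. This is the textbook algorithm only; nothing here says how (or whether) Google does phonetic matching, and the function name is my own.

```python
# Classic Soundex digit groups: consonants that sound alike share a digit.
CODES = {}
for digit, letters in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for ch in letters:
        CODES[ch] = str(digit)


def soundex(name):
    """Return the 4-character Soundex code for a name ('' if no letters)."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    result = first.upper()
    prev = CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w are skipped and do not separate equal codes
        code = CODES.get(ch, "")  # vowels (and y) have no code
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code counts again
    # Pad with zeros or truncate to exactly one letter plus three digits.
    return (result + "000")[:4]


print(soundex("Lewis"), soundex("Louis"))
```

Both names come out with the same code, which is exactly the kind of fuzzy match that would add false positives if it were applied to every query automatically.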
"because the domains are different, the many thousands of links these sites have to one another all count toward the automated calculation of their popularity and quality at Google"
Wrong. PageRank counts links, whether they're on the same domain or not.
"giving them all a boost in the rankings and hence bringing Webseed more traffic and hence more revenue"
Wrong. Webseed tripped Google's spam penalty and their hub has had a PageRank of exactly zero since about January 25.
"AltaVista appears to be making a comeback"
Wrong. AltaVista is on its last legs. Fast Search is a quick, comprehensive search engine with advanced features. Inktomi is a good referrer for site owners because it powers big sites like MSN Search. Google is a big, quick search engine that's extremely popular, very easy to use and it still has a habit of putting the better sites near the top (even though people are trying very hard to spam it).
Calum
--
Calum I Mac Leod, Scottish Borders
"It works great for old, established sites to which many other old, established sites have linked. (It works great for my site :-) www.samizdat.com ). But new sites, regardless of the quality of their content, get short shrift."
Google weights its results to the new or recently updated, and when I submitted a web page, it was available within two weeks, and high on the list within two more.
Google also updates *daily* for certain special categories, like news sites.
"Fortunately, the powerful commands remain -- for instance, the ability to exclude as well as include terms in your query."
So does Google. Did you even TRY it?
"At AltaVista, I can search for +link:samizdat.com -host:samizdat.com and get exactly what I want"
GOOGLE link:samizdat.com -site:samizdat.com
Works beautifully.
"But to use that command, I need to have additional query terms: site:samizdat.com alone generates no results."
site:samizdat.com +a
site:samizdat.com filetype:htm OR filetype:html
"+a" is particularly helpful, as it will find damn near anything.
"...also as co-author of the book The AltaVista Search Revolution..."
Well, that explains it, then.
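If you want to script the queries quoted above rather than type them, a small Python sketch follows. The operators (link:, -site:, site:) are taken straight from this comment; the `/search?q=` URL shape and both helper names are assumptions for illustration, not an official API.

```python
from urllib.parse import urlencode


def google_search_url(query):
    """Build a search URL; urlencode handles ':' and a literal '+' safely."""
    return "http://www.google.com/search?" + urlencode({"q": query})


def inbound_links_query(domain):
    """Pages that link to a domain, excluding the domain's own pages."""
    return "link:%s -site:%s" % (domain, domain)


print(google_search_url(inbound_links_query("samizdat.com")))
print(google_search_url("site:samizdat.com +a"))
```

One thing the encoding makes visible: a literal "+" in the query has to go out as %2B, because a bare "+" in a URL query string means a space.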
a little fish in a big pond (thomas weigel)
Explicit submission is a toy to make webmasters feel good.
I applied at Google to work as a programmer. I observed the same basic problem that seems to plague most Silicon Valley companies, and that is that Google employs too many people who just don't know what they are doing. Why? Because like most of the companies in the valley Google hires young kids with no life experience and usually a college degree. These kids just have an arrogant attitude and a sufficiently conservative bent to impress the dimwit managers. In short, a useless bunch of people. Programming is simply not a blue-collar job, no matter what Silicon Valley wants to believe. It's a profession and companies need to respect that.
Apparently, the domain name is included in the list of queryable words, so a query on
site:samizdat.com samizdat.com
will give you all pages on samizdat.com. It generates 1,290 results, just like your workaround.
You know, Microsoft's street address also says a lot about their mentality.
It does. Demonstration [google.com] You just can't do it from the advanced search page, you have to (gasp!) type it into the box yourself
Nope, not really -- I tried that too. It just treats "+link:samizdat.com" as the search phrase "link samizdat.com".
There's a much easier way. If you want even more speed then feed your search directly into the search engine's CGI using the IE4 powertools "Quick Search". That is why I still use IE4 ;-) It's way better than the Google toolbar.
Simply feed in the URL and the location of the search parameters using %s and you're set - despite using Google several times a day I haven't seen its front page in months (except when I used my friend's computer).
For Google the script is:
http://www.google.com/search?q=%s&sa=Google+Search
I type g nuclear physics
For altavista boolean search the script is:
http://www.altavista.com/sites/search/web?q=%s&r=&pg=aq&search=Search
I type ava (nuclear NEAR physicists) AND scientist AND (glow NEAR in NEAR the NEAR dark)
For Google you can't beat getting good hits at the top, but then with Altavista the boolean queries are so good that I sometimes get *ONE* hit and it's exactly what I was looking for. For example try this precision search. I'm surprised that none of the /. crowd's Perl and shell hackers have mentioned using a script like this. Bajeeeeeeeesus.
A caveman dreams of being us, the incalculable power and riches. We dream of being Q, then what?
I didn't use much of altavista for years... each time I want to type it in my address bar, I type "astalavista"... maybe because I search for special things :)
"Science will win because it works." - Stephen Hawking
P.P.S. -- AltaVista appears to be making a comeback. Six years ago, when I was in the Internet Business Group at Digital and Digital owned AltaVista, about a third of the traffic to my Web site came by way of AltaVista.
I don't see this as a good thing. You seem to see it as a good sign if a significant number of hits to your website come by way of AltaVista, and that it's bad when that percentage goes down. But I wouldn't feel very comfortable if a large chunk of my business was so reliant on a single path. Redundancy is key. I would be much happier if the number of customers I got came from a large variety of sources. It's like a mutual fund - don't put all your money in one stock. Don't put all your eggs in one basket.
As a high-school librarian in California I encourage students to use Google when they know exactly what they are looking for; otherwise I suggest SurfWax, which offers a great deal more flexibility as a research tool. I have found that students especially like the ability to save research in a web-based folder that can be shared with other students (great for collaboration).
I tried Seltzer's approach and he's all full of hot air!
Google gave me exactly what I wanted and AV filled pages upon pages with repetitive stuff! Google suppresses the repetitive stuff unless you want to view it.
Fair enough, it is not my authority to dictate what is right and wrong. I am not some moral mecca to disperse my mighty knowledge throughout the world. My apologies to you.
I once shot a man who posted too many, "Imagine a beowulf cluster of these"
Surely you mean the Netherlands and not New Zealand? (goes quietly mad answering his own posts)
Video Game cheats, hints a