Search Beyond Google


Posted by ryuzaki0 on Monday February 23, 2004 @06:00AM from the hand-that-mocked-them,-and-the-heart-that-fed dept.

An anonymous reader writes: "'Search Beyond Google', the cover story of the March issue of Technology Review, is one of the few current Google stories that discusses whether their technology can stay ahead of the competition in the months to come."

google needs "stemming" by elwinc · 2004-02-23 06:13 · Score: 4, Informative

I'm a heavy google user, but I still miss altavista's ability to search for stems. For example, an altavista search for "slid* rul*" will get 'slide rules,' 'sliding rulers,' and plenty of other variations. Google does support whole word wildcards (try "miserable * failure") but stems are even more useful.
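
To make the wildcard semantics concrete, here is a minimal Python sketch of how a query like "slid* rul*" could be matched against text. It only illustrates what the trailing * means; it is not how AltaVista or Google implement it, and the function name stem_query_to_regex and the sample documents are made up for the example.

```python
import re

def stem_query_to_regex(query):
    """Turn a query like 'slid* rul*' into a regex matching consecutive words."""
    parts = []
    for term in query.split():
        if term.endswith("*"):
            # A trailing * means: this prefix followed by any word characters.
            parts.append(re.escape(term[:-1]) + r"\w*")
        else:
            # A term without * must match exactly.
            parts.append(re.escape(term))
    # Terms must appear as whole words, in order, separated by whitespace.
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)

pattern = stem_query_to_regex("slid* rul*")
docs = ["Vintage slide rules for sale", "sliding rulers", "rules of the slide"]
matches = [d for d in docs if pattern.search(d)]
```

Scanning every document with a regex like this would not scale to a web index; a real engine would more plausibly look the prefixes up in a sorted term dictionary, but that is an assumption, not something stated above.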

--
--- Often in error; never in doubt!
1. Re:google needs "stemming" by Kelerain · 2004-02-23 06:55 · Score: 3, Informative
  
  Google seems to do this by default (bottom of page). Notice the search for slide rules has several instances of 'rules' highlighted?
  Additionally, in the 'one up the competition' category, google can search synonyms of words like this:
  Google: ~slide ~rule. I learned that one in the 'favorite google features' thread. More info on their advanced help page (3rd down, "~" searches). I also really love (and use heavily) their other search operators.
In 3 months? by oGMo · 2004-02-23 06:15 · Score: 5, Informative
People seem to think Google is simply a place to find HTML pages. You type in your words, and poof, you get some relevant sites. Could this be replaced in 3 months? Google has a huge index, a very good search algorithm, and works for most people, but (in theory) someone might come up with a working alternative in that period. However:
- Images is great for searching for pictures. The results are uncannily good.
- Groups lets you search Google's huge Usenet archive (remember when they purchased this from Deja?).
- News is my primary source for world news.
- Froogle is great when searching for where to buy almost anything.
- Answers lets you pay for research when the rest don't cut it.
- Catalogs lets you search mail order catalogs for when Froogle doesn't cut it.
And more. Babelfish translation? Caching like a billion pages? Simple design, with text ads that are actually relevant? In 3 months.

Yeah, right.
--
Don't think of it as a flame---it's more like an argument that does 3d6 fire damage
Re:Google can't rest on its successes by Anonymous Coward · 2004-02-23 06:21 · Score: 4, Informative

http://search.yahoo.com/
Meta search engines by steve.m · 2004-02-23 06:30 · Score: 2, Informative

I quite like Vivisimo (after I figured out how to make it include Google in its query by adding 'google' to the 'sources=' part of the query URL).

dogpile is also quite good, when you've got it set to display results by relevance rather than by engine.

Remember, Amazon isn't the only online bookstore, ebay isn't the only online auction site and google isn't the only search engine...
Re:All good things ... by indigeek · 2004-02-23 06:35 · Score: 5, Informative

Google works approximately by modding up the sites that get linked to the most. All the contributing links seem to carry equal weight. This allows scamming by forming webrings and similar circular linking schemes.
Another approach I heard being discussed is to give links from more popular sites a higher weight, i.e., if a site has a lot of pages linking to it, the sites it links to must also be good. Apparently, if done right, you can do a few iterations and get to a better algo.
Or perhaps assign a number (karma, if you will) to each site. Then divide this karma by the number of sites it links to and add the result to each of the linked sites. Eliminate the cycles in the graph and iterate.
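A rough Python sketch of the karma idea above: every site splits its karma equally among the sites it links to, and the process repeats. The function name spread_karma, the four-site link graph, and the damping value are assumptions for illustration; the damping term is borrowed from the published PageRank formulation and is used here in place of removing cycles, so that the numbers settle even when sites link in a loop. This is not a claim about what Google actually runs.

```python
def spread_karma(links, rounds=50, damping=0.85):
    """Repeatedly pass each site's karma, split equally, to the sites it links to.

    links maps a site name to the list of sites it links to.
    damping is an assumption of this sketch: each site keeps a small
    baseline so that closed loops of links cannot hoard all the karma.
    """
    sites = set(links) | {t for targets in links.values() for t in targets}
    karma = {s: 1.0 / len(sites) for s in sites}
    for _ in range(rounds):
        # Every site starts the round with the same small baseline.
        new = {s: (1.0 - damping) / len(sites) for s in sites}
        for site in sites:
            targets = links.get(site, [])
            if not targets:
                continue  # a site with no outgoing links passes nothing on
            share = damping * karma[site] / len(targets)
            for target in targets:
                new[target] += share
        karma = new
    return karma

# Hypothetical link graph: a, b and d all link to c; c links back to a.
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
karma = spread_karma(links)
```

In this toy graph c ends up with the most karma because three sites link to it, and a comes second because its single inbound link is from the high-karma c. That is the "links from popular sites count for more" effect described above.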
Three keys to the search game by LostCluster · 2004-02-23 06:35 · Score: 5, Informative

There are three very distinct elements involved in creating a powerhouse search engine:

- A large crawl: A search engine with nothing in its database isn't going to work very well. A search engine needs as big a crawl as possible in order to have any results at all. This takes huge resources in terms of bandwidth and computing power. Some of the early search engines met their demise when they couldn't afford to keep their crawlers growing as fast as new web content came out.

- The Sorter: Once the long list of results that match the keywords is pulled out of the crawl, a sort needs to be applied in order to locate the best results and present them first. Google got vaulted to the top because PageRank was better than anything anybody else had put out. However, PageRank isn't perfect, so there is still room for somebody to make something better than PageRank.

- Promotion: A web site just sits there unused if it isn't promoted. Google never spent much on advertising and just relied on word of mouth, since it was so strong in the other two areas. And now that everyone turns to them first without even checking other engines, they have the advantage of a strong brand image. However, we've seen plenty of cases where better technology has been beaten out by better marketing. If somebody's tech passes Google, nobody will know about it without marketing. Therefore, look for the challengers to launch major ad campaigns inviting people to at least try them before they assume Google is better.

Can anybody put it all together? We're about to find out...
Re:Regexps, please! Anyone! by LostCluster · 2004-02-23 06:37 · Score: 3, Informative

The problem with regexps is that they can be used to create very database-expensive queries. No search engine is ever going to allow a query that returns the entire database as the result set either.
Re:Regexps, please! Anyone! by roman_mir · 2004-02-23 06:40 · Score: 3, Informative

But you wouldn't be able to run a regexp against the entire document base, since Google does not store the entire document for the purposes of indexing (googlecache is for a different purpose). What kind of computing power would one need to search all documents in Googlecache with a regexp in under one second? And for more than one user at a time?

--
You can't handle the truth.
Re:Search engine spam is the key... by mopslik · 2004-02-23 06:41 · Score: 5, Informative

"I thought the whole concept of google was that it ranked pages higher if lots of other pages linked to it."

And this is exactly one of the problems that is now coming to light. Spammers set up hundreds of tiny sites that do nothing but point to each other, thus inflating their PageRanks. They've saturated Google to the point that searching for information about commercial products usually returns only about 2 legitimate pages out of 10.

At least, that's been my experience.
Google has added stemming by interiot · 2004-02-23 06:47 · Score: 5, Informative

Google recently added stemming, as a search for {quit smoke} will reveal. You can read about it in their help section. Stemming can be disabled on specific words. The update came around November 15, 2003, but it is probably still in flux, so there isn't much good info about it yet.
Re:How I'd fix Google... by Ratcrow · 2004-02-23 06:56 · Score: 3, Informative

This was posted on /. a while ago under a similar story, but in case you missed it, there is a place to report spam on Google:

http://www.google.com/contact/spamreport.html

I now have it as a bookmark so I can hit it quickly.
Sometimes you want spam by Anonymous Coward · 2004-02-23 06:56 · Score: 1, Informative

I discovered Discount Watcher via what seemed like a spam link on Google but it turns out to be a very cool service that finds the latest discounts on almost anything you want and turns it into an RSS feed. Now my aggregator is filled with spam. But it is spam I want.
Re:Search engine spam is the key... by justMichael · 2004-02-23 07:09 · Score: 4, Informative

try using this

something interesting -site:example.com

At this point there's no way to save it as a pref, but you could always drop it in a text file to keep a big list
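
A small Python sketch of the text-file idea: keep one domain per line and generate the -site: exclusions from it. The helper names load_blocklist and build_query and the listed domains are hypothetical; only the -site: operator itself comes from the post above.

```python
def load_blocklist(text):
    """Parse blocklist text: one domain per line, blank lines and # comments skipped."""
    domains = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            domains.append(line)
    return domains

def build_query(terms, blocked_sites):
    """Append a -site: exclusion for every blocked domain to the search terms."""
    exclusions = " ".join(f"-site:{domain}" for domain in blocked_sites)
    return f"{terms} {exclusions}".strip()

# In practice this text would be read from the file; it is inlined here
# so the example runs on its own.
blocklist_text = """
# hypothetical domains to exclude
example.com
example.net
"""
query = build_query("something interesting", load_blocklist(blocklist_text))
print(query)
```

Search engines cap query length, so a very long list may not fit into a single query; the exact limit is not something this sketch assumes.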
Re:Google can't rest on its successes by edsarkiss · 2004-02-23 07:13 · Score: 2, Informative

that's because yahoo *offers* much more than google does.

if you want a simple search box, navigate to the yahoo! search page.

--

SIGUSR1
Re:How I'd fix Google... by inxil · 2004-02-23 07:20 · Score: 3, Informative

The google toolbar already has voting buttons. Not quite what you're talking about, but...

--
--
Why the hell not? Here's some SEO: Home Inspector
Look harder by stewby18 · 2004-02-23 07:46 · Score: 3, Informative

Actually, there essentially is a meta-moderate link tucked down at the bottom of the page:

Dissatisfied with your search results? Help us improve.

It's not an automated system, but it does let you report "bad moderation".
Re:Search engine spam is the key... by mopslik · 2004-02-23 08:06 · Score: 2, Informative

Yes, Google has tweaked their algorithms and added filters to strip out some of the obvious abuses. But lately it seems like each time they remove a link, two more replace it.

Maybe they've got some super-sneaky solution they're working on right now to remedy this. It would certainly help prevent searches like:

+product +information -buy -deals -ReferralFarmName -otherRedirectTerms -...
Re:I've heard the New Coke disaster was planned by Shenkerian · 2004-02-23 08:20 · Score: 3, Informative

Yeah, I'd read that, too. But Snopes claims it's not true.

--
You tell me how "whilst" differs from "while," and I'll stop calling you a pretentious jackass.
Beyond google... by Anonymous Coward · 2004-02-23 08:49 · Score: 2, Informative

Some of these smaller natural language engines are beginning to look very promising, see: answerbus,brainboost,webqa

It's interesting that the big boys are largely ignoring this domain. I suspect old man jeeves has turned people off to the possibility of reliable QA.
Re:All good things ... by trenton · 2004-02-23 08:53 · Score: 3, Informative

Yes, this is kinda how miserable failure points to where it does. A bit on the technique behind this here.

--
Too big to fail? Does that make me too small to succeed?
Re:Search engine spam is the key... by Idarubicin · 2004-02-23 11:23 · Score: 3, Informative

I'm surprised (in retrospect) that it took so many years for so-called ``google-whacking'' to emerge.
A Googlewhack is a two-word Google query that returns exactly one result.
The term you're looking for is probably Googlebombing, which refers to deliberately placing keywords and links on multiple domains to boost a site's PageRank. Originally, Googlebombs were pranks or in good fun, like a search for weapons of mass destruction.
Now "Googlebombing" is being expanded by some to include manipulating PageRanks for commercial ends. I'll leave it to the armchair etymologists of Slashdot to decide if that is a correct use of the term.

--
~Idarubicin