Slashdot Mirror


Indexing the Entire Web?

cah1 writes "BBC is carrying a story about another new search engine, All The Web. The designers are planning to have the whole shooting match, all billion pages, indexed by the end of the year." You can also read press from the company. I'm skeptical: they claim to be able to catch up within the first year, and keep up thereafter. But they claim to have 200 million already, so who knows?

2 of 98 comments

  1. Wow - Looks Great by Aaron M. Renn · Score: 3

    I judge search engines by the most important criterion of all - how many references to me they have. AllTheWeb now has vastly more than runner-up Google, making them the biggest ever. I typed in "Aaron M. Renn" and got 1604 on AllTheWeb, ~500 on Google and only ~180 on AltaVista. Even if that number drops as I search through the pages, it's still impressive. I did look through the plain "Aaron Renn" listings too, where they also crushed the competition (though it's a much smaller number of pages since I virtually always use my middle initial). Believe it or not, there is a page out there with another "Aaron Renn" on it. Pretty weird.

  2. Re:It seems that... by davie · Score: 3

    Not to harp on one of my pet ideas or anything, but I think a distributed spidering project could be pulled off. The trick would be to delegate the work based on compute power and bandwidth, with the "low-end" clients doing the grunt work of spidering, then passing the raw data up to the bigger iron with more bandwidth where the relationships between sites could be ferreted out, keywords could be indexed and context established, etc. These sites could then pass the cooked data back to the top level servers (compressed, of course) for whatever final work needs to be done and then insertion into the database. The idea is to have each client do the work it's best suited for, and to distribute the load more evenly. Bandwidth could be a problem, but I think a lot of the data could be "tokenized" somewhat once references have been established, and some compression would probably help.
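To make the division of labor above a little more concrete, here is a minimal Python sketch, not a real design. It assumes two tiers only ("spider" for low-end clients, "indexer" for bigger iron), and every name in it (`Client`, `assign_tier`, `tokenize_urls`, `pack`, `unpack`, and the cutoff values) is hypothetical and chosen purely for illustration. It shows three pieces of the idea: delegating roles by compute power and bandwidth, "tokenizing" repeated references into small integers, and compressing data before it is passed up the chain.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Client:
    name: str
    cpu_score: float       # hypothetical relative compute rating
    bandwidth_mbps: float  # hypothetical measured bandwidth


def assign_tier(client, cpu_cutoff=10.0, bw_cutoff=10.0):
    """Pick a role for a client; the cutoffs are arbitrary assumptions.

    Clients with plenty of both compute and bandwidth do the indexing;
    everyone else does the grunt work of spidering.
    """
    if client.cpu_score >= cpu_cutoff and client.bandwidth_mbps >= bw_cutoff:
        return "indexer"
    return "spider"


def tokenize_urls(urls):
    """Replace each distinct URL with a small integer token.

    Returns the lookup table and the token sequence, so a URL that is
    referenced many times is only spelled out once.
    """
    table = {}
    tokens = []
    for url in urls:
        if url not in table:
            table[url] = len(table)
        tokens.append(table[url])
    return table, tokens


def pack(payload):
    """Compress a text payload before sending it to the next tier."""
    return zlib.compress(payload.encode("utf-8"))


def unpack(blob):
    """Reverse of pack(): decompress and decode a received payload."""
    return zlib.decompress(blob).decode("utf-8")
```

A spider might call `tokenize_urls` on the links it found, serialize the result, and `pack` it before upload; an indexer would `unpack` it on arrival. How work is actually scheduled, verified, and merged is left out entirely here.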

    If I had the networking know-how I would put together a proposal and start taking flame-mail, er, suggestions. Since I don't, I hope someone who does and is as crazy as me will pick up on the idea.

    --
    slashdot broke my sig