Slashdot Mirror


Searching the 'Deep Web'

Posted by ryuzaki0 on Tuesday March 9, 2004 @01:50AM from the sounds-more-like-the-deep-hurting dept.

abysmilliard writes "Salon is running a story on next-generation web crawling technologies, specifically Yahoo's new paid "Content Acquisition Program." The article alleges that current search services like Google manage to access less than 1% of the web, and that the new services will be able to trawl the "deep web," or the 90-odd percent of web databases, forms and content that we don't see. Will access to this new level of specific information change how we deal with companies, governments and private institutions?"

3 of 193 comments

AKA goodbye robots.txt by Anonymous Coward · 2004-03-09 01:52 · Score: -1, Redundant

AKA "What's a robots.txt file?" says the innocent web crawling robot. :P
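
For anyone wondering what honoring robots.txt looks like from the crawler's side, here is a minimal sketch using Python's standard `urllib.robotparser`. The rules text, the `ExampleBot` user agent, and the `/private/` path are made up for illustration; nothing here reflects how Yahoo's or anyone else's crawler actually behaves.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules: every robot is asked to stay out of /private/.
EXAMPLE_RULES = """User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    for url in ("http://www.site.com/index.asp?content=10",
                "http://www.site.com/private/data.asp"):
        print(url, allowed(EXAMPLE_RULES, "ExampleBot", url))
```

The file is only a request: a crawler that never runs a check like this is not stopped by anything.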

Get ready to tighten up those dynamic site scripts by pubjames · 2004-03-09 02:00 · Score: 0, Redundant

My guess is that they will be looking at ways of automatically polling dynamic web sites to extract all the data from the database. So if a site has a page, for instance

www.site.com/index.asp?content=10,

the search engine will try content=1 to content=n to see what it gets.
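
A rough sketch of that kind of polling, in Python. This is only a guess at the idea, not any search engine's actual method: the function names `candidate_urls` and `probe` are invented here, and `fetch` is a stand-in the caller supplies (returning the page body, or None when there is nothing at that URL).

```python
from typing import Callable, Dict, Iterator, Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def candidate_urls(url: str, param: str, start: int, stop: int) -> Iterator[str]:
    """Yield copies of url with param set to each integer from start to stop."""
    parts = urlsplit(url)
    # Keep every other query parameter; drop the one being enumerated.
    others = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
              if k != param]
    for value in range(start, stop + 1):
        query = urlencode(others + [(param, str(value))])
        yield urlunsplit((parts.scheme, parts.netloc, parts.path,
                          query, parts.fragment))


def probe(url: str, param: str, start: int, stop: int,
          fetch: Callable[[str], Optional[str]]) -> Dict[str, str]:
    """Try each candidate URL and collect the ones where fetch returns a body."""
    found = {}
    for candidate in candidate_urls(url, param, start, stop):
        body = fetch(candidate)  # hypothetical fetcher supplied by the caller
        if body is not None:
            found[candidate] = body
    return found


if __name__ == "__main__":
    for u in candidate_urls("http://www.site.com/index.asp?content=10",
                            "content", 1, 3):
        print(u)
```

Passing `fetch` in as a parameter keeps the sketch free of network code; a real crawler would also need rate limiting and some way to decide what n is.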

Deep crawling my hard drive? by pieterh · 2004-03-09 02:17 · Score: -1, Redundant

Surely if it's not been published on a web site, it's not meant to be accessible and indexed. The hidden 90% is mostly confidential data, private documents, porn, and miscellaneous files. Why would anyone want to crawl this?
