Slashdot Mirror


The Problem of Search Engines and "Sekrit" Data

Nos. writes: "CNet is reporting that not only Google but other search engines are finding password and credit card numbers while doing its indexing. An interesting quote from the article by Google: 'We define public as anything placed on the public Internet and not blocked to search engines in any way. The primary burden falls to the people who are incorrectly exposing this information. But at the same time, we're certainly aware of the problem, and our development team is exploring different solutions behind the scenes.'" As the article outlines, this has been a problem for a long time -- and with no easy solution in sight.

3 of 411 comments (clear)

  1. Stopping Google won't stop the problem... by Kr3m3Puff · · Score: 5, Insightful
    The big complaint of the article is that Google is searching for new types of files, instead of HTML. If some goofball left some link to a Word document with his passwords in it, he gets what he deserves.

    The quote from that article about Google not thinking about this before the put it forward is idiotic. How can Google be responsible for documents that are in the public domain, that anyone can get to by typing a URL into a browser. It isn't insecure software, just dumb people...

    --
    D.O.U.O.S.V.A.V.V.M.
  2. Re:Simple but burdensome solution by Xerithane · · Score: 5, Insightful

    It is a burden, but the responsibility does not lie on a crawling engine. You could check any 10 digit number (and expdate with a lune check if available) but with all the different formatting done on CC numbers (XXXX-XXXX-XXXX-XXXX, XXXXXXXXXXXXXXXX, etc) the algorithm could get ugly to maintain.

    I don't see why Google or any other search engine has to even acknowledge this problem, it's simply Someone Else's Problem. If I was paying a web team/master/monkey any money at all and found out about this, heads would roll. It seems that even thinking of pointing a finger at google is the same tactic Microsoft is doing at those "irresponsible" individuals pointing out security flaws.

    If anything Google is providing them a service by telling them about the problem.

    --
    Dacels Jewelers can't be trusted.
  3. Re:Well Behaved Crawlers by ryanvm · · Score: 5, Insightful
    The Robot Exclusion Standard (e.g. robots.txt) is mainly useful for making sure that search engines don't cache dynamic data on your web site. That way users don't get a 404 error when clicking on your links in the search results.

    You should not be using robots.txt to keep confidential data out of caches. In fact, most semi-intelligent crackers would actually download the robots.txt with the specific intention of finding ill-hidden sensitive data.