Slashdot Mirror


The Problem of Search Engines and "Sekrit" Data

Nos. writes: "CNet is reporting that not only Google but other search engines are finding passwords and credit card numbers while doing their indexing. An interesting quote from the article by Google: 'We define public as anything placed on the public Internet and not blocked to search engines in any way. The primary burden falls to the people who are incorrectly exposing this information. But at the same time, we're certainly aware of the problem, and our development team is exploring different solutions behind the scenes.'" As the article outlines, this has been a problem for a long time -- and with no easy solution in sight.

3 of 411 comments

  1. How can this happen? by Nonesuch · · Score: 4, Redundant
    To the best of my knowledge, search engines all work by indexing the web, starting with the base of web sites or submitted URLs, and following the links on each page.

    Given this premise, the only way that Google or another search engine could find a page with credit card numbers or other 'secret' data would be if that page were linked to from another page, and so on, leading back to a 'public' area of some web site.

    That is to say, the web-indexing bots used by search engines cannot find anything that an ordinary, very patient human could not find by randomly following links.
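
For what it's worth, the premise above can be sketched as a breadth-first walk over a link graph. This is only an illustration of the "follow the links" model described in this comment, not how any real search engine is implemented. The `crawl` function, the `SITE` dictionary, and the `/orders.txt` path are all made up for the example, and the "site" is an in-memory dict so nothing touches the network.

```python
from collections import deque


def crawl(link_graph, seeds):
    """Return every page reachable from the seed pages by following links."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        # A page missing from the graph is treated as having no outgoing links.
        for target in link_graph.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical site: /orders.txt sits on the server, but no page links to it.
SITE = {
    "/": ["/about", "/shop"],
    "/about": [],
    "/shop": ["/shop/cart", "/"],
    "/shop/cart": [],
    "/orders.txt": [],
}

indexed = crawl(SITE, ["/"])
```

Under this model `/orders.txt` never ends up in `indexed`, because no chain of links leads to it. Add a single link to it from any reachable page (a directory listing, say) and the crawler picks it up like anything else.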

  2. Easy solution by Arethan · · Score: 1, Redundant

    Your crawler is caching credit card numbers, you say? Simple: check the content you cache for 16-digit numbers. Any that you find, you check with a simple Luhn (mod 10) algorithm. If it passes, you replace the number with "################" or a similar mask.

    There, all credit card numbers will now be filtered from your cache.

    I understand the severity of the issue, and it's good to know this is happening, but the solution is simple.
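
A rough sketch of the filter described in this comment, assuming plain text input and card numbers written as 16 unbroken digits. The names `luhn_valid`, `mask_card_numbers`, and `CANDIDATE` are invented for the example; the Luhn check itself is the standard mod 10 checksum.

```python
import re

# Exactly 16 digits, not embedded in a longer run of digits.
CANDIDATE = re.compile(r"(?<!\d)\d{16}(?!\d)")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn (mod 10) check."""
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as adding the two digits of the product
        total += d
    return total % 10 == 0


def mask_card_numbers(text: str) -> str:
    """Replace every 16-digit run that passes the Luhn check with 16 '#'."""
    def replace(match):
        number = match.group(0)
        return "#" * 16 if luhn_valid(number) else number

    return CANDIDATE.sub(replace, text)


sample = "order 4111111111111111 shipped, tracking 4111111111111112"
masked = mask_card_numbers(sample)
```

Here the first number (a widely used test card number) passes the check and is masked, while the second fails and is left alone. The sketch also shows where "simple" runs out: numbers written with spaces or dashes, or with a length other than 16 digits, slip through, and roughly one in ten arbitrary 16-digit strings passes Luhn, so some harmless numbers would get masked too.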

  3. It's not Google's fault you're a dipshit. by Wakko Warner · · Score: 1, Redundant

    If you somehow manage to post your credit card info on the web, exactly whose fault is it? The only way it *can't* be your fault is if it's a poorly-constructed e-commerce site that leaks out that kind of info.

    I just don't see what the big deal here is.

    - A.P.

    --
    "Remember when the U.S. had a drug problem, and then we declared a War On Drugs, and now you can't buy drugs anymore?"