Slashdot Mirror


The Problem of Search Engines and "Sekrit" Data

Nos. writes: "CNet is reporting that not only Google but other search engines are finding password and credit card numbers while doing its indexing. An interesting quote from the article by Google: 'We define public as anything placed on the public Internet and not blocked to search engines in any way. The primary burden falls to the people who are incorrectly exposing this information. But at the same time, we're certainly aware of the problem, and our development team is exploring different solutions behind the scenes.'" As the article outlines, this has been a problem for a long time -- and with no easy solution in sight.

11 of 411 comments (clear)

  1. Tangential Google Question by banuaba · · Score: 5, Interesting

    How does the Google Cache avoid legal entanglements, both for stuff like cc numbers and copyright/trademark infringement?
    If I want to find lyrics to a song, the site that has them will often be down, but the cache will still have them in there.. Why is what google is doing 'okay' but what the origional site not okay? Or do they just leave google alone?

    --


    Brant

    Argle. Bargle.
    1. Re:Tangential Google Question by CaseyB · · Score: 3, Interesting
      Good question.

      Given that they do have (for now) some sort of immunity, it opens a loophole for publishing illegal data. Simply set up your site with all of Metallica's lyrics / guitar scores (all 5 of them, heh). Submit it for indexing to Google, but don't otherwise attract attention to the site. When you see the spider hit, take it offline. Now the data is available to anyone who searches for it on Google, but you're not liable for anything. The process could be repeated to update the cache.

  2. Re:A symptom of poor programming... by ChazeFroy · · Score: 5, Interesting

    Try the following searches on google (include the quotes) and you'll be amazed at what's out there:

    "Index of /admin"
    "Index of /password"
    "Index of /mail"
    "Index of /" +passwd
    "Index of /" password.txt

  3. Many crawlers ignore robots.txt by Ars-Fartsica · · Score: 3, Interesting

    I do not know if this is still the case, but Microsoft's IE offline browsing page crawler (collects pages for you to read offline) ignored robots.txt last time I checked. I know many other crawlers do likewise.

  4. Sure enough. by Joe+Decker · · Score: 3, Interesting
    Looked up the first 8 digits of one of my own CC numbers, and, while I didn't find my own CC # on the net, I did immediately find a large file full of them with names, expiration dates, etc. (Sent a message to the site manager, but this case is pretty clearly an accidental leak.)

    At any rate--scary it is.

  5. standing naked in front of the window by eddy+the+lip · · Score: 3, Interesting

    But other critics said Google bears its share of the blame.

    "We have a problem, and that is that people don't design software to behave itself," said Gary McGraw, chief technology officer of software risk-management company Cigital, and author of a new book on writing secure software.

    also known as ostrich security...if you're s00p3r s3cr37 files are just lying around waiting for idle surfers, search engines are the least of your worries. if you don't know enough to protect your files (by, say, not linking to them, or .htaccess files, or encrypting them), it's not the search engines fault. it's you're own dumb ass.

    this guy's just looking for free hype for his book. if that's the kind of advice he offers, he's doing more harm than good.

    --

    This is the voice of World Control. I bring you Peace.

  6. Re:A symptom of poor programming... by Legion303 · · Score: 5, Interesting
    Please give credit where credit is due. Vincent Gaillot posted this list to Bugtraq on November 16.

    -Legion

  7. Re:Stopping Google won't stop the problem... by Zspdude · · Score: 3, Interesting

    It's definately very true that if there were no stupid people these things would not be an issue of controversy. However, society has struggled for a very long time to resolve the question, "Should stupid people be protected from themselves?" There will always be those who( whether they're just technologically inept or for whatever reason) will not act sensibly and not realize they are being foolish. Do they deserve protection as well, even though they don't know how to protect themselves? That's a question which is not quite as easy to answer....

    --
    What's in a Sig?
  8. Blissful ignorance backfires again. by hkmwbz · · Score: 3, Interesting
    That a search engine is able to harvest this kind of data just proves that some people don't know what they are doing. Forgive me if I seem judgmental, but these people are probably the same people who think Windows XP is the next step and that IE is the only browser in the world. But as is proven again and again, ignorance backfires. Not only are they attacked by viruses and worms and have all backdoors and security holes exploited - they are ignorant enough to leave users' data in the open, for everyone to get.

    Google's comment was:

    "The primary burden falls to the people who are incorrectly exposing this information."

    This is where they should have stopped. Those who find their credit card information in a search engine will learn a lesson and use services that actually take care of their customers' security and privacy. Google shouldn't have to clean up incompetent people's mess.

    In the long run, these things can only lead to the ignorant (wannabe?) players in the market slowly dying because they don't know what they are doing.

    I personally hope someone gets a taste of reality here, and that only the serious players survive. The MCSE crowd may finally learn that there's more to it than blind trust in their own (lacking) ability.

    --
    Clever signature text goes here.
  9. Different file types make my day by srichman · · Score: 3, Interesting
    The big complaint of the article is that Google is searching for new types of files, instead of HTML.
    The only people who complain about this are obviously the folks using crossed fingers for security. The rest of us love that Google indexes different file types.

    I'll never forget the day I first saw a .pdf in Google search result. Not that long ago I saw my first .ps.gz in a search result. I mean, how dope is that!? They're ungzipping the file, and then parsing the postscript! Soon they'll start uniso-ing images, untarring files, unrpming packages, .... You'll be able to search for text and have it found inside the README in an rpm in a Red Hat ISO.

    Can't wait until images.google.com starts doing OCR on the pix they index...

  10. Password search by azaroth42 · · Score: 3, Interesting
    Or for more fun, do a search like


    filetype:htpasswd htpasswd


    Scary how many .htpasswd files come up.


    -- Azaroth