Slashdot Mirror


Online Search Engines Lift Cover Of Privacy

Rican writes "MSNBC has an interesting article about how 'Googledorks' are using the powerful search engine to do searches across the web for sensitive and/or private information. Some of this information includes 'Medical records, bank account numbers, students' grades, and the docking locations of 804 U.S. Navy ships, submarines and destroyers.'"

9 of 460 comments

  1. The worst example.. by centralizati0n · · Score: 5, Informative

    The worst example I saw was the FBI NCIC 2000 manual [PDF]. It gives you examples of how to look up criminal records and such... which could be very useful to the criminally vested social engineer.

  2. Re:Um. by mhesseltine · · Score: 4, Informative

    .htaccess anyone?

    That, along with an appropriate robots.txt file, should be all you would need to prevent a crawl, right?

    --
    Overrated / Underrated : Moderation :: Anonymous Coward : Posting
  3. Re:Um. by Elwood+P+Dowd · · Score: 4, Informative

    Here's how it works. Let's say you put a page on your site called

    http://yoursite.com/temporary/hidden/dontreadthis/private_document.html

    And it is not linked to ever.


    I realize this is redundant, and you were likely trolling, but Google will leave you right the fuck alone, so long as you put another little file at:

    http://yoursite.com/robots.txt

    That contains the text:

    User-agent: *
    Disallow: /

    I realize this is opt-out rather than opt-in, but there's just one place you have to opt, and there isn't another way that Google could possibly do their job. Everybody else seems to understand that the internet is a publicly accessible network.

    So who's to blame? You. You put a sensitive document in a publicly accessible location on the internet, and took no precautions to keep it secure. Not linking to it is not a precaution.
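
    To see what that two-line robots.txt actually tells a well-behaved crawler, here is a minimal sketch using Python's standard urllib.robotparser. The helper name is_crawl_allowed and the bot names are made up for illustration; the URL is the example from above. Note that this only models a crawler that chooses to honor robots.txt. It does nothing to stop anyone who requests the URL directly.

```python
from urllib.robotparser import RobotFileParser

# The exact robots.txt text quoted in the comment above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a crawler honoring robots_txt may fetch url.

    Hypothetical helper for illustration; it parses the rules from a
    string instead of downloading them from a live site.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "http://yoursite.com/temporary/hidden/dontreadthis/private_document.html"
    # "SomeOtherBot" is an invented name; "User-agent: *" covers every crawler.
    for agent in ("Googlebot", "SomeOtherBot"):
        print(agent, "allowed:", is_crawl_allowed(ROBOTS_TXT, agent, url))
```

    An empty "Disallow:" line has the opposite effect and permits everything, so the slash matters.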

    --

    There are no trails. There are no trees out here.
  4. Re:Why Google? by Xenographic · · Score: 4, Informative

    1) This is old. I remember searching for things like '"index +of" vti' and other such things (try it and modify that search if you like, but it was interesting to find out just what sort of interesting tidbits one might find in such a folder).

    2) This is an article from MSN. This information was available long before Google, but it is, at the very least, curious to see this sort of article from Microsoft when they have been going to the press lately about how Microsoft intends to develop their own search technology...

  5. Re:Kazaa and Gnutella are cooler by tsvk · · Score: 5, Informative
    Go into kazaa and gnutella and search for any .doc files. Or some likely sounding names like "resume" or "job application".

    Other examples are ".dbx", the file name extension for mail folders in Outlook Express. Or ".pwl", the Windows 9x system password file (supposedly easily crackable with the correct tool).

    Unfortunately, there are clueless users who share their whole hard drive. File sharing programs have, however, started getting better at discouraging or preventing users from doing this.

  6. What I like by Anonymous Coward · · Score: 5, Informative

    The thing is that most people will literally inadvertently share their entire hard drive's contents, or at least all "media files".

    What I like to do is go on gnutella or kazaa and search for "DSN" or one of a number of similar prefixes. Why? Because most digital cameras save their files in a specific hardwired format, and the kind of people who leave their entire hard drive shared on kazaa are the kind of people who don't rename their digital camera files.

    You can find the most random, interesting, occasionally personal shit that way.

    I'm trying to remember the other common prefixes besides DSN and failing.

    -- Super ugly ultraman

  7. Get a clue by Chuck+Chunder · · Score: 4, Informative

    The Google Mediapartners bot, which looks at pages for the purposes of advertising (such as in Opera), is different and separate from the bot that adds pages to Google's search database. The Mediapartners bot does not feed the Google search engine.

    --
    Boffoonery - downloadable Comedy Benefit for Bletchley Park
  8. Re:Uh-huh. by Anonymous Coward · · Score: 5, Informative
    > Want to expand on that or are you just trolling? How did the
    > existence of that page get from Opera to Google such that it
    > could pin-point (not crawl) that page?

    Opera submits the URLs its users browse to Google when advert support is turned on.

    http://www.opera.com/adsupport/

    From that page:
    --------
    What is the connection between the Web page and the relevant ad displayed by Google?
    Opera's interaction with the Google ad system:

    The Opera browser sends Google the URL of the web page you are visiting and your IP address (with the exceptions Opera filters out -- see below)
    --------

    Exceptions are https, forms, passwords, cgi, and non-http URLs.

    As an example from my apache log file last night, when I gave a friend a URL to a photo:
    xxxxxxx.upc-g.chello.nl - - [10/Feb/2004:02:23:53 +1100] "GET /temporary/sooted.jpg HTTP/1.1" 200 74339 "-" "Opera/7.23 (X11; Linux i686; U) [en-GB]"
    crawler8.googlebot.com - - [10/Feb/2004:02:28:39 +1100] "GET /temporary/sooted.jpg HTTP/1.0" 200 74339 "-" "Mediapartners-Google/2.1 (+http://www.googlebot.com/bot.html)"
    It's surprising how many Opera users will deny this happens, despite the evidence. That's a delay of just under 5 minutes; Google is pretty quick with its crawling. Personally, I don't mind. I put things up in my temporary directory and pull them down fairly soon after. I know nothing is secure if it's just an unprotected URL, so I'm not worried like the grandparent poster. However, Opera does send URLs to Google, and Google does come back and check them out.
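
    For anyone who wants to check their own logs for the same pattern, here is a rough sketch that parses the two log lines quoted above and measures the gap between the browser hit and the Mediapartners-Google hit. The regex, parse_line and crawl_delay_seconds are my own illustration, written on the assumption that the log is in Apache's combined format as those two lines appear to be; the %b month parsing also assumes an English/C locale.

```python
import re
from datetime import datetime

# The two access-log lines quoted in the comment above.
LOG_LINES = [
    'xxxxxxx.upc-g.chello.nl - - [10/Feb/2004:02:23:53 +1100] '
    '"GET /temporary/sooted.jpg HTTP/1.1" 200 74339 "-" '
    '"Opera/7.23 (X11; Linux i686; U) [en-GB]"',
    'crawler8.googlebot.com - - [10/Feb/2004:02:28:39 +1100] '
    '"GET /temporary/sooted.jpg HTTP/1.0" 200 74339 "-" '
    '"Mediapartners-Google/2.1 (+http://www.googlebot.com/bot.html)"',
]

# Assumed layout: host, ident, user, [time], "request", status, size,
# "referer", "user agent".
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def parse_line(line):
    """Split one log line into a dict, with 'time' as an aware datetime."""
    match = LOG_PATTERN.match(line)
    if match is None:
        raise ValueError("unrecognized log line: " + line)
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    return entry


def crawl_delay_seconds(lines, path):
    """Seconds from the first non-bot request for path to the first
    Mediapartners-Google request that follows it, or None if no such pair."""
    first_visit = None
    for entry in map(parse_line, lines):
        if entry["path"] != path:
            continue
        is_bot = "Mediapartners-Google" in entry["agent"]
        if not is_bot and first_visit is None:
            first_visit = entry["time"]
        elif is_bot and first_visit is not None:
            return (entry["time"] - first_visit).total_seconds()
    return None


if __name__ == "__main__":
    print(crawl_delay_seconds(LOG_LINES, "/temporary/sooted.jpg"))
```

    On the two lines above this gives 286 seconds, i.e. 4 minutes 46 seconds between the Opera request and the bot's fetch.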
  9. Re:Enough of the bullshit! by Syre · · Score: 4, Informative
    Hmm... if Opera doesn't send URLs to Google, why does it say on the page you linked (bold and italics mine):

    Opera's interaction with the Google ad system:
    • The Opera browser sends Google the URL of the web page you are visiting and your IP address (with the exceptions Opera filters out -- see below)
    • Google tries to determine your general geographic location based on your IP address, to better target the ads
    • The Google ad server consults Google's web database to find out what kind of content is on that page
    • Ads that are deemed most relevant are then served based on geographic location and the Web page accessed