Slashdot Mirror


Online Search Engines Lift Cover Of Privacy

Rican writes "MSNBC has an interesting article about how 'Googledorks' are using the powerful search engine to do searches across the web for sensitive and/or private information. Some of this information includes 'Medical records, bank account numbers, students' grades, and the docking locations of 804 U.S. Navy ships, submarines and destroyers.'"

30 of 460 comments

  1. The worst example.. by centralizati0n · · Score: 5, Informative

    The worst example I saw was the FBI NCIC 2000 manual [PDF]. It gives you examples of how to look up criminal records and such... which could be very useful to the criminally vested social engineer.

  2. Nothing new by dattaway · · Score: 3, Informative

    People have used this for years to find things like Bill Gates' social security number and all kinds of things we think should be private. Chances are, if it's in a record somewhere, that information will leak onto the internet sooner than most people think.

  3. Re:Kazaa and Gnutella are cooler by baryon351 · · Score: 1, Informative

    They don't seem to be, although many could. There's just too many unique ones out there IMHO.

    Then again I don't have a WP that'll run those scripts.

  4. Re:FUD Story to pump MSN Search? by npistentis · · Score: 3, Informative

    it was an AP story- I read the same thing in this morning's washington post.

    --
    Gentlemen, you can't fight in here! This is the War Room!
  5. Re:Um. by mhesseltine · · Score: 4, Informative

    .htaccess anyone?

    That, along with an appropriate robots.txt file should be all you would need to prevent a crawl, right?

    --
    Overrated / Underrated : Moderation :: Anonymous Coward : Posting
  6. Re:Um. by Elwood+P+Dowd · · Score: 4, Informative

    Here's how it works. Let's say you put a page on your site called

    http://yoursite.com/temporary/hidden/dontreadthis/private_document.html

    And it is not linked to ever.


    I realize this is redundant, and you were likely trolling, but Google will leave you right the fuck alone, so long as you put another little file at:

    http://yoursite.com/robots.txt

    That contains the text:

    User-agent: *
    Disallow: /

    I realize this is opt-out rather than opt-in, but there's just one place you have to opt, and there isn't another way that Google could possibly do their job. Everybody else seems to understand that the internet is a publicly accessible network.

    So who's to blame? You. You put a sensitive document in a publicly accessible location on the internet, and took no precautions to keep it secure. Not linking to it is not a precaution.
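
    As a rough illustration of the opt-out described above, here is a minimal Python sketch of how a well-behaved crawler might consult that two-line robots.txt before fetching anything. The user-agent name "ExampleBot" and the function name are made up for the example, and the file contents are fed to the parser directly rather than fetched over the network. This only models a crawler that chooses to honor the file; nothing here stops a client that ignores it.

```python
from urllib.robotparser import RobotFileParser

# The two-line robots.txt from the comment, supplied inline (no network access).
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def polite_crawler_may_fetch(url, user_agent="ExampleBot"):
    """Return True if a crawler that honors robots.txt would fetch this URL.

    "ExampleBot" is a hypothetical user-agent name used only for illustration.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # An unlinked page like the one in the example above.
    url = "http://yoursite.com/temporary/hidden/dontreadthis/private_document.html"
    print(polite_crawler_may_fetch(url))
```

    With "Disallow: /" under "User-agent: *", every path on the site is off limits to any crawler that bothers to ask.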

    --

    There are no trails. There are no trees out here.
  7. Re:Why Google? by Xenographic · · Score: 4, Informative

    1) This is old. I remember searching for things like '"index +of" vti' and other such things (try it and modify that search if you like, but it was interesting to find out just what sort of interesting tidbits one might find in such a folder).

    2) This is an article from MSN. This information was available long before Google, but it is, at the very least, curious to see this sort of article from Microsoft when they have been going to the press lately about how Microsoft intends to develop their own search technology...

  8. Re:Kazaa and Gnutella are cooler by tsvk · · Score: 5, Informative
    Go into kazaa and gnutella and search for any .doc files. Or some likely sounding names like "resume" or "job application".

    Other examples are ".dbx", the file name extension for mail folders in Outlook Express. Or ".pwl", the Windows 9x system password file (supposedly easily crackable with the correct tool).

    There are unfortunately clueless users who share their whole hard drive. File sharing programs have however started getting better in discouraging or preventing the users from doing this.

  9. What I like by Anonymous Coward · · Score: 5, Informative

    The thing is that most people will literally inadvertently share their entire hard drive's contents, or at least all "media files".

    What I like to do is go on gnutella or kazaa and search for "DSN" or one of a number of similar prefixes. Why? Because most digital cameras save their files in a specific hardwired format, and the kind of people who leave their entire hard drive shared on kazaa are the kind of people who don't rename their digital cameras.

    You can find the most random, interesting, occasionally personal shit that way.

    I'm trying to remember the other common prefixes besides DSN and failing.

    -- Super ugly ultraman

  10. Re:Hard to hide by You're+All+Wrong · · Score: 2, Informative

    """
    one of the central tenets of computer network security: If it is connected to the Internet, it can be accessed
    """

    That's not one of the central tenets of computer network security.
    If it's not connected to the internet, it cannot be accessed, but that doesn't imply what you've said.

    If it's connected to the internet, and there's a daemon which answers requests with the information requested, then it can be accessed. There's a subtle difference though - namely the daemon which answers the requests. Without that there's no access, and there can never be any access.

    YAW.

    --
    Your head of state is a corrupt weasel, I hope you're happy.
  11. Re:Um. by lambent · · Score: 2, Informative

    robots.txt doesn't matter worth a damn, if you're not feeling polite.

  12. Get a clue by Chuck+Chunder · · Score: 4, Informative

    The google mediapartners bot which will look at pages for the purposes of advertising such as in Opera is different and separate from the bot that adds pages to Google's search database. The mediapartners bot does not feed the Google search engine.

    --
    Boffoonery - downloadable Comedy Benefit for Bletchley Park
  13. Noindex by Zenmonkeycat · · Score: 1, Informative

    Please webmasters, learn to use the proper code for preventing bots from scanning your page. The Robot meta tag will do that quite effectively. Alternately, you could just /not/ make a webpage with your usernames and passwords, and that would be a lot easier.
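
    For reference, the tag in question is usually written as <meta name="robots" content="noindex"> in the page head. The Python sketch below shows one way a crawler could look for it before indexing a page. The class name, function name, and sample page are invented for the example, and it is not a claim about how any real search engine implements the check.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute names arrive lowercased; values may be None.
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        for directive in attr_map.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def page_asks_not_to_be_indexed(html):
    """Return True if the page carries a robots meta tag with 'noindex'."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical page used only to exercise the parser.
SAMPLE_PAGE = """<html><head>
<meta name="robots" content="noindex, nofollow">
<title>Hypothetical private page</title>
</head><body>...</body></html>"""

if __name__ == "__main__":
    print(page_asks_not_to_be_indexed(SAMPLE_PAGE))
```

    Like robots.txt, the tag is a request to cooperating bots, so the easier option of not publishing the passwords at all still applies.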

    --

    *****
    Dear Mary,
    I yearn for you tragically,
    A.T. Tappman, Chaplain, U.S. Army.

  14. Enough of the bullshit! by Chuck+Chunder · · Score: 3, Informative
    --
    Boffoonery - downloadable Comedy Benefit for Bletchley Park
    1. Re:Enough of the bullshit! by Syre · · Score: 4, Informative
      Hmm... if Opera doesn't send URLs to Google, why does it say on the page you linked (bold and italics mine):

      Opera's interaction with the Google ad system:
      • The Opera browser sends Google the URL of the web page you are visiting and your IP address (with the exceptions Opera filters out -- see below)
      • Google tries to determine your general geographic location based on your IP address, to better target the ads
      • The Google ad server consults Google's web database to find out what kind of content is on that page
      • Ads that are deemed most relevant are then served based on geographic location and the Web page accessed
  15. Re:Hardc0re hax0r. by nick0909 · · Score: 2, Informative

    Is googledorks a real hacker movement or just some random key word any one with a high ranking web page can abuse?

    It appears to be a buzzword that Johnny Long just kinda made up. I used Google to "hack" away and find his website: http://johnny.ihackstuff.com/
    It appears his definition of googledorking (?) is not just finding private info, but just anything wacky/weird/different, private is just one of those things.

    Do we now call it g00g|3?

  16. Re:Uh-huh. by Anonymous Coward · · Score: 5, Informative
    > Want to expand on that or are you just trolling? How did the
    > existence of that page get from Opera to Google such that it
    > could pin-point (not crawl) that page?

    Opera submits URLs browsed to by users, to google, when advert support is turned on.

    http://www.opera.com/adsupport/

    From that page:
    --------
    What is the connection between the Web page and the relevant ad displayed by Google?
    Opera's interaction with the Google ad system:

    The Opera browser sends Google the URL of the web page you are visiting and your IP address (with the exceptions Opera filters out -- see below)
    --------

    Exceptions are https, forms, passwords, cgi, and non-http URLs.

    As an example from my apache log file last night, when I gave a friend a URL to a photo:
    xxxxxxx.upc-g.chello.nl - - [10/Feb/2004:02:23:53 +1100] "GET /temporary/sooted.jpg HTTP/1.1" 200 74339 "-" "Opera/7.23 (X11; Linux i686; U) [en-GB]"
    crawler8.googlebot.com - - [10/Feb/2004:02:28:39 +1100] "GET /temporary/sooted.jpg HTTP/1.0" 200 74339 "-" "Mediapartners-Google/2.1 (+http://www.googlebot.com/bot.html)"
    It's surprising how many Opera users will deny this happens, despite the evidence. That's a 5 minute delay, google is pretty quick with its crawling. Personally, I don't mind. I put things up in my temporary directory and pull them down fairly soon after. I know nothing is secure if it's just an unprotected URL, so I'm not worried like the grandparent poster. However, Opera does send URLs to google, and google does come back and check them out.
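
    The delay between the two requests can be checked directly from the timestamps. The Python sketch below parses the two quoted log lines, assuming they are in Apache's combined log format and that month names parse under the default C/English locale. The regular expression and function names are written for this example only.

```python
import re
from datetime import datetime

# The two access-log lines quoted above (hostname was already redacted in the post).
LOG_LINES = [
    'xxxxxxx.upc-g.chello.nl - - [10/Feb/2004:02:23:53 +1100] "GET /temporary/sooted.jpg HTTP/1.1" 200 74339 "-" "Opera/7.23 (X11; Linux i686; U) [en-GB]"',
    'crawler8.googlebot.com - - [10/Feb/2004:02:28:39 +1100] "GET /temporary/sooted.jpg HTTP/1.0" 200 74339 "-" "Mediapartners-Google/2.1 (+http://www.googlebot.com/bot.html)"',
]

# Combined log format: host ident user [time] "request" status bytes "referer" "agent"
LINE_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"$'
)


def parse_line(line):
    """Split one log line into named fields, with the time as a datetime."""
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError(f"unrecognized log line: {line!r}")
    fields = match.groupdict()
    # %b relies on English month abbreviations (the default C locale).
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    return fields


def followup_delay_seconds(lines):
    """Seconds between two requests for the same path."""
    first, second = (parse_line(line) for line in lines)
    if first["path"] != second["path"]:
        raise ValueError("requests are for different paths")
    return (second["time"] - first["time"]).total_seconds()


if __name__ == "__main__":
    print(followup_delay_seconds(LOG_LINES))
```

    For these two lines the gap is 286 seconds (4 minutes 46 seconds), which matches the "5 minute delay" above, and the second request identifies itself as Mediapartners-Google.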
  17. Re:Now to use it for good by taped2thedesk · · Score: 2, Informative
    You're a credit card holder..... Now go google your credit cards... DO IT NOW. Did you find it? I didn't.
    Oh sure, it's all fun and games until your credit card number gets displayed on the Live Query screen at Google HQ... :-p
  18. finding out whether something has leaked about you by ajagci · · Score: 2, Informative

    You can find out whether personal information about you is available accidentally by searching for your name and a piece of your sensitive information on Google, say, your name and the last four digits of your SSN, the last four digits of a credit card number, parts of your phone number, or your street address. Leaked personal information would have to contain both your name and that other information. Chances are that you will retrieve only a few documents, which you can quickly review.

    Keep in mind, however, that Google queries are not encrypted and are not guaranteed to be private or secure, so, for your search, don't use the full SSN or anything else that shouldn't be disclosed.
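
    A small Python sketch of that advice: build the search phrase from a name plus a short fragment, and refuse anything that looks like a full number. The function name and the four-digit cutoff are assumptions made for this example (a street address with a longer house number would need a looser rule), and the code only builds the query text without sending it anywhere.

```python
from urllib.parse import urlencode

MAX_DIGITS = 4  # assumed cutoff, following the "last four digits" suggestion


def build_leak_query(full_name, fragment):
    """Return a quoted search phrase pairing a name with a short fragment.

    Raises ValueError if the fragment contains more than MAX_DIGITS digits,
    so a full SSN or card number is never put into a query.
    """
    digit_count = sum(ch.isdigit() for ch in fragment)
    if digit_count > MAX_DIGITS:
        raise ValueError("use at most four digits of any sensitive number")
    return f'"{full_name}" "{fragment}"'


if __name__ == "__main__":
    # "Jane Doe" and "1234" are placeholder values.
    query = build_leak_query("Jane Doe", "1234")
    print(query)
    print(urlencode({"q": query}))
```

    Quoting both parts asks for pages that contain the exact name and the exact fragment, which keeps the result list short enough to review by hand.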

  19. It's quite clear if you actually read properly by Chuck+Chunder · · Score: 2, Informative
    I said Opera doesn't "send such urls" to Google. Specifically the post I was replying to talked about pages that are the result of form submissions. The page I linked to states Opera does not send:
    • URLs with CGI arguments (E.g: http://www.example.com?formsdata)
    • Forms data in POST requests
    (as well as a few others).
    --
    Boffoonery - downloadable Comedy Benefit for Bletchley Park
  20. Re:Fuck that shit by finkployd · · Score: 3, Informative

    Not if the robots.txt file prevents you from accessing that data, which it does.

    No, it does not. It provides absolutely NO access control whatsoever. It simply tells a search engine crawler "please do not catalogue these pages".

    Finkployd

  21. Military Records by prestidigital · · Score: 2, Informative

    Just tonight I was Googling for "number personnel U.S. military" and I was surprised to find many links along the lines of "How to find U.S. military personnel." The site with the most links to directories has a Netherlands domain name, which seemed odd. I tried to find some family members and did turn up some information. Some sites were DoD and had recognizable warnings about monitoring. Another was a .com for the military community and required standard registration procedures. I don't know if it's a good idea to have this information online and I wonder what military folks think about it. I reckon there are pros & cons.

  22. Some clues for you by Chuck+Chunder · · Score: 3, Informative

    a) Mediapartners-google does check robots.txt
    b) Opera always has the name "Opera" in its UA string, even when masquerading as IE.
    c) Mediapartners-google doesn't feed the Google search engine. It is only used for Google adverts.

    --
    Boffoonery - downloadable Comedy Benefit for Bletchley Park
  23. Re:Could happen to you by Norman+the+Wise · · Score: 2, Informative

    Google does retain information on search queries in some form. If you go and check the Google Zeitgeist (Weekly Version & the Annual Version) they have statistics on most searched terms, time graphs showing, for example the spike in search queries after the California Quake, and lots of other interesting information.

    For the week ending February 2, the top search terms in the US were:

    1. janet jackson
    2. superbowl halftime
    3. mtv
    4. justin timberlake
    5. tom brady
    6. groundhog day
    7. cbs
    8. oscar nominations
    9. kazuhito tadano
    10. john kerry
    --
    Just another two cents from the Norm...
  24. Re:Plagiarism by dedazo · · Score: 1, Informative

    The MSNBC article fully credits the WP. What's your problem?

    --
    Web2.0: I love when people Flickr my cuil and digg my boingboing until my google is reddit and I start to yahoo
  25. Re:There's good stuff out there not on Google by almightyjustin · · Score: 2, Informative
    This might have something to do with it...

    User-agent: *
    Disallow: /Archives
    Disallow: /Archives/bin
    Disallow: /Archives/dev
    Disallow: /Archives/etc
    Disallow: /Archives/ftp
    Disallow: /Archives/gopher
    Disallow: /Archives/tmp
    Disallow: /Archives/usr
    Disallow: /cgi-bin
    Disallow: /bin
    Disallow: /oursite/previews
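
    Disallow rules in the original robots.txt convention are matched as plain path prefixes. Under that assumption, the Python sketch below lists which of the rules quoted above are already covered by a shorter one. The helper names are invented for the example, and it assumes a single "User-agent: *" group with no Allow lines or wildcard extensions.

```python
# The robots.txt quoted above, supplied inline.
ROBOTS_TXT = """\
User-agent: *
Disallow: /Archives
Disallow: /Archives/bin
Disallow: /Archives/dev
Disallow: /Archives/etc
Disallow: /Archives/ftp
Disallow: /Archives/gopher
Disallow: /Archives/tmp
Disallow: /Archives/usr
Disallow: /cgi-bin
Disallow: /bin
Disallow: /oursite/previews
"""


def disallowed_prefixes(robots_txt):
    """Return the non-empty Disallow values in file order."""
    prefixes = []
    for line in robots_txt.splitlines():
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            prefixes.append(value.strip())
    return prefixes


def redundant_rules(prefixes):
    """Rules that start with another, shorter rule in the same list."""
    return [
        rule
        for rule in prefixes
        if any(other != rule and rule.startswith(other) for other in prefixes)
    ]


if __name__ == "__main__":
    for rule in redundant_rules(disallowed_prefixes(ROBOTS_TXT)):
        print(rule)
```

    Under prefix matching, the single line "Disallow: /Archives" already covers all seven /Archives/... entries, so the whole archive tree is closed to cooperating crawlers.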

    --

    Omnes arx vestrum sunt adiuncta nobis.

  26. Re:Fuck that shit by devilspgd · · Score: 2, Informative

    Just wildcard it. Use robots.txt to say that /secretstuff/* should not be indexed, that still won't help the l33t hax0r determine that it's /secretstuff/toodumbtouseapassword/bush-secret-nuke-codes.lnk.exe.pif.scr which is the hidden file to destroy the world.

    --
    Give a man a fish, he'll eat for a day, but teach a man to phish...
  27. Re:Cited MSNBC web page severely crippled by Tonttoro · · Score: 2, Informative

    Maybe you should try a later version of Mozilla. You know the older ones have bugs that are fixed in later ones.

    --
    when everyone gives everything, then everyone everything will get
  28. if they put it there themselves, yes, but... by tuxette · · Score: 2, Informative

    A lot of the personal data that is publicly accessible was not made publicly accessible by the data subject, but by a third person/party.

    --
    People say I'm crazy, I got diamonds on the soles of my shoes...
  29. Re:Fuck that shit by saforrest · · Score: 2, Informative

    More specifically, it says "Please do not enter my house and steal my jewelry and banknotes which are in the safe in the bottom-right of the bedroom closet."

    Sure, you could do

    Disallow: /house/closet/bottomright/safe/jewelry
    Disallow: /house/closet/bottomright/safe/banknotes

    Or, if you want to be simpler, you could just do

    Disallow: /house/ :)