Online Search Engines Lift Cover Of Privacy
Rican writes "MSNBC has an interesting article about how 'Googledorks' are using the powerful search engine to do searches across the web for sensitive and/or private information. Some of this information includes 'Medical records, bank account numbers, students' grades, and the docking locations of 804 U.S. Navy ships, submarines and destroyers.'"
The worst example I saw was the FBI NCIC 2000 manual [PDF]. It gives you examples of how to look up criminal records and such... which could be very useful to the criminally vested social engineer.
People have used this for years to find things like Bill Gates' social security number and all kinds of things we think should be private. Chances are, if its in a record somewhere, that information will leak onto the internet sooner than most people think.
it was an AP story- I read the same thing in this morning's washington post.
Gentlemen, you can't fight in here! This is the War Room!
.htaccess anyone?
That, along with an appropriate robots.txt file should be all you would need to prevent a crawl, right?
Overrated / Underrated : Moderation
Here's how it works. Let's say you put a page on your site called
i s/ private_document.html
/
http://yoursite.com/temporary/hidden/dontreadth
And it is not linked to ever.
I realize this is redundant, and you were likely trolling, but Google will leave you right the fuck alone, so long as you put another little file at:
http://yoursite.com/robots.txt
That contains the text:
User-agent: *
Disallow:
I realize this is opt-out rather than opt-in, but there's just one place you have to opt, and there isn't another way that Google could possibly do their job. Everybody else seems to understand that the internet is a publicly accessible network.
So who's to blame? You. You put a sensitive document in a publicly accessible location on the internet, and took no precautions to keep it secure. Not linking to it is not a precaution.
There are no trails. There are no trees out here.
1) This is old. I remember searching for things like '"index +of" vti' and other such things (try it and modify that search if you like, but it was interesting to find out just what sort of interesting tidbits one might find in such a folder).
2) This is an article from MSN. This information was available long before Google, but it is, at the very least, curious to see this sort of article from Microsoft when they have been going to the press lately about how Microsoft intends to develop their own search technology...
Other examples are ".dbx", the file name extension for mail folders in Outlook Express. Or ".pwl", the Windows 9x system password file (supposedly easily crackable with the correct tool).
There are unfortunately clueless users who share their whole hard drive. File sharing programs have however started getting better in discouraging or preventing the users from doing this.
The thing is that most people will literally inadvertantly share their entire hard drive's contents, or at least all "media files".
What I like to do is go on gnutella or kazaa and search for "DSN" or one of a number of similar prefixes. Why? Because most digital cameras save their files in a specific hardwired format, and the kind of people who leave their entire hard drive shared on kazaa are the kind of people who don't rename their digital cameras.
You can find the most random, interesting, occationally personal shit that way.
I'm trying to remember the other common prefixes besides DSN and failing.
-- Super ugly ultraman
The google mediapartners bot which will look at pages for the purposes of advertising such as in Opera is different and seperate from the bot that adds pages to Google's search database. The mediapartners bot does not feed the Google search engine.
Boffoonery - downloadable Comedy Benefit for Bletchley Park
Opera doesn't even send such urls to Google.
Boffoonery - downloadable Comedy Benefit for Bletchley Park
> existance of that page get from Opera to Google such that it
> could pin-point (not crawl) that page?
Opera submits URLs browsed to by users, to google, when advert support is turned on.
http://www.opera.com/adsupport/
From that page:
--------
What is the connection between the Web page and the relevant ad displayed by Google?
Opera's interaction with the Google ad system:
The Opera browser sends Google the URL of the web page you are visiting and your IP address (with the exceptions Opera filters out -- see below)
--------
Exceptions are https, forms, passwords, cgi, and non-http URLs.
As an example from my apache log file last night, when I gave a friend a URL to a photo:It's surprising how many Opera users will deny this happens, despite the evidence. That's a 5 minute delay, google is pretty quick with its crawling. Personally, I don't mind. I put things up in my temporary directory and pull them down fairly soon after. I know nothing is secure if it's just an unprotected URL, so I'm not worried like the grandparent poster. However, Opera does send URLs to google, and google does come back and check them out.
Not if the robots.txt file prevents you from accessing that data, which it does.
No, it does not. It provides absolutely NO access control what so ever. It simply tells the a search engine crawler "please do not catalogue these pages".
Finkployd
a) Mediapartners-google does check robots.txt
b) Opera always has the name "Opera" in it's UA string, even when masquerading as IE.
c) Mediapartners-google doesn't feed the Google search engine. It is only used for Google adverts.
Boffoonery - downloadable Comedy Benefit for Bletchley Park