Online Search Engines Lift Cover Of Privacy

Posted by timothy on Monday February 9, 2004 @02:18PM from the bathwater-around-the-baby dept.

Rican writes "MSNBC has an interesting article about how 'Googledorks' are using the powerful search engine to do searches across the web for sensitive and/or private information. Some of this information includes 'Medical records, bank account numbers, students' grades, and the docking locations of 804 U.S. Navy ships, submarines and destroyers.'"

The worst example.. by centralizati0n · 2004-02-09 14:22 · Score: 5, Informative

The worst example I saw was the FBI NCIC 2000 manual [PDF]. It gives you examples of how to look up criminal records and such... which could be very useful to the criminally vested social engineer.
Nothing new by dattaway · 2004-02-09 14:25 · Score: 3, Informative

People have used this for years to find things like Bill Gates' social security number and all kinds of things we think should be private. Chances are, if it's in a record somewhere, that information will leak onto the internet sooner than most people think.
Re:FUD Story to pump MSN Search? by npistentis · 2004-02-09 14:30 · Score: 3, Informative

It was an AP story. I read the same thing in this morning's Washington Post.

--
Gentlemen, you can't fight in here! This is the War Room!
Re:Um. by mhesseltine · 2004-02-09 14:41 · Score: 4, Informative

.htaccess anyone?

That, along with an appropriate robots.txt file should be all you would need to prevent a crawl, right?

--
Overrated / Underrated : Moderation :: Anonymous Coward : Posting
Re:Um. by Elwood+P+Dowd · 2004-02-09 14:42 · Score: 4, Informative

Here's how it works. Let's say you put a page on your site called

http://yoursite.com/temporary/hidden/dontreadthis/private_document.html

And it is not linked to ever.

I realize this is redundant, and you were likely trolling, but Google will leave you right the fuck alone, so long as you put another little file at:

http://yoursite.com/robots.txt

That contains the text:

User-agent: *
Disallow: /

I realize this is opt-out rather than opt-in, but there's just one place you have to opt, and there isn't another way that Google could possibly do their job. Everybody else seems to understand that the internet is a publicly accessible network.
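As a rough illustration of what a well-behaved crawler does with that file, here is a minimal Python sketch using the standard library's urllib.robotparser. The helper name `allowed` and the "Googlebot" user-agent string are assumptions made for the example; this is not a description of how Google's crawler is actually implemented.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt body quoted above: every crawler, every path.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    `allowed` is a hypothetical helper for this sketch; it parses the
    rules from a string instead of downloading them from the site.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "http://yoursite.com/temporary/hidden/dontreadthis/private_document.html"
    # A crawler that honours robots.txt would check this before fetching.
    print(allowed("Googlebot", url))
```

A crawler that performs this check skips the page. Nothing in the check stops a client that never asks.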

So who's to blame? You. You put a sensitive document in a publicly accessible location on the internet, and took no precautions to keep it secure. Not linking to it is not a precaution.

--

There are no trails. There are no trees out here.
Re:Why Google? by Xenographic · 2004-02-09 14:44 · Score: 4, Informative

1) This is old. I remember searching for things like '"index +of" vti' and other such things (try it and modify that search if you like, but it was interesting to find out just what sort of interesting tidbits one might find in such a folder).

2) This is an article from MSN. This information was available long before Google, but it is, at the very least, curious to see this sort of article from Microsoft when they have been going to the press lately about how Microsoft intends to develop their own search technology...
Re:Kazaa and Gnutella are cooler by tsvk · 2004-02-09 14:45 · Score: 5, Informative

Go into kazaa and gnutella and search for any .doc files. Or some likely sounding names like "resume" or "job application".

Other examples are ".dbx", the file name extension for mail folders in Outlook Express. Or ".pwl", the Windows 9x system password file (supposedly easily crackable with the correct tool).

There are unfortunately clueless users who share their whole hard drive. File sharing programs have however started getting better in discouraging or preventing the users from doing this.
What I like by Anonymous Coward · 2004-02-09 14:46 · Score: 5, Informative

The thing is that most people will literally inadvertently share their entire hard drive's contents, or at least all "media files".

What I like to do is go on gnutella or kazaa and search for "DSN" or one of a number of similar prefixes. Why? Because most digital cameras save their files with a specific hardwired filename prefix, and the kind of people who leave their entire hard drive shared on kazaa are the kind of people who don't rename their digital camera files.

You can find the most random, interesting, occasionally personal shit that way.

I'm trying to remember the other common prefixes besides DSN and failing.

-- Super ugly ultraman
Get a clue by Chuck+Chunder · 2004-02-09 15:02 · Score: 4, Informative

The Google Mediapartners bot, which looks at pages for advertising purposes (such as the ads shown in Opera), is separate from the bot that adds pages to Google's search database. The Mediapartners bot does not feed the Google search engine.

--
Boffoonery - downloadable Comedy Benefit for Bletchley Park
Enough of the bullshit! by Chuck+Chunder · 2004-02-09 15:14 · Score: 3, Informative

Opera doesn't even send such urls to Google.

--
Boffoonery - downloadable Comedy Benefit for Bletchley Park
1. Re:Enough of the bullshit! by Syre · 2004-02-09 16:19 · Score: 4, Informative
  Hmm... if Opera doesn't send URLs to Google, why does it say on the page you linked (bold and italics mine):
  
  Opera's interaction with the Google ad system:
  
  The Opera browser sends Google the URL of the web page you are
  visiting and your IP address (with the exceptions Opera filters
  out -- see below)
  
  Google tries to determine your general geographic location based on your
  IP address, to better target the ads
  
  The Google ad server consults Google's web database to find out what kind of content
  is on that page
  
  Ads that are deemed most relevant are then served based on geographic location
  and the Web page accessed
Re:Uh-huh. by Anonymous Coward · 2004-02-09 15:52 · Score: 5, Informative

> Want to expand on that or are you just trolling? How did the
> existence of that page get from Opera to Google such that it
> could pin-point (not crawl) that page?

Opera submits URLs browsed to by users, to google, when advert support is turned on.

http://www.opera.com/adsupport/

From that page:
--------
What is the connection between the Web page and the relevant ad displayed by Google?
Opera's interaction with the Google ad system:

The Opera browser sends Google the URL of the web page you are visiting and your IP address (with the exceptions Opera filters out -- see below)
--------

Exceptions are https, forms, passwords, cgi, and non-http URLs.

As an example from my apache log file last night, when I gave a friend a URL to a photo:
xxxxxxx.upc-g.chello.nl - - [10/Feb/2004:02:23:53 +1100] "GET /temporary/sooted.jpg HTTP/1.1" 200 74339 "-" "Opera/7.23 (X11; Linux i686; U) [en-GB]"
crawler8.googlebot.com - - [10/Feb/2004:02:28:39 +1100] "GET /temporary/sooted.jpg HTTP/1.0" 200 74339 "-" "Mediapartners-Google/2.1 (+http://www.googlebot.com/bot.html)"

It's surprising how many Opera users will deny this happens, despite the evidence. That's a 5 minute delay, google is pretty quick with its crawling.

Personally, I don't mind. I put things up in my temporary directory and pull them down fairly soon after. I know nothing is secure if it's just an unprotected URL, so I'm not worried like the grandparent poster. However, Opera does send URLs to google, and google does come back and check them out.
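For anyone who wants to check their own logs for the same pattern, here is a minimal Python sketch that parses Apache combined-format lines and measures the gap between the two requests. The regex and the function names `parse_line` and `follow_up_delay` are assumptions made for this example; the two sample lines are the ones quoted above.

```python
import re
from datetime import datetime, timedelta

# Assumed layout: Apache "combined" log format, as in the two lines above.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

SAMPLE = [
    'xxxxxxx.upc-g.chello.nl - - [10/Feb/2004:02:23:53 +1100] '
    '"GET /temporary/sooted.jpg HTTP/1.1" 200 74339 "-" '
    '"Opera/7.23 (X11; Linux i686; U) [en-GB]"',
    'crawler8.googlebot.com - - [10/Feb/2004:02:28:39 +1100] '
    '"GET /temporary/sooted.jpg HTTP/1.0" 200 74339 "-" '
    '"Mediapartners-Google/2.1 (+http://www.googlebot.com/bot.html)"',
]


def parse_line(line: str) -> dict:
    """Split one combined-format log line into named fields."""
    match = LOG_PATTERN.match(line)
    if match is None:
        raise ValueError(f"unrecognised log line: {line!r}")
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    return entry


def follow_up_delay(lines, path: str, first_agent: str, second_agent: str) -> timedelta:
    """Time between first_agent requesting path and second_agent's later request."""
    entries = [parse_line(line) for line in lines]
    first = next(e for e in entries
                 if e["path"] == path and first_agent in e["agent"])
    second = next(e for e in entries
                  if e["path"] == path and second_agent in e["agent"]
                  and e["time"] >= first["time"])
    return second["time"] - first["time"]


if __name__ == "__main__":
    delay = follow_up_delay(SAMPLE, "/temporary/sooted.jpg",
                            "Opera", "Mediapartners-Google")
    print(delay.total_seconds())  # 286.0 seconds, i.e. just under 5 minutes
```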
Re:Fuck that shit by finkployd · 2004-02-09 17:34 · Score: 3, Informative

Not if the robots.txt file prevents you from accessing that data, which it does.

No, it does not. It provides absolutely NO access control whatsoever. It simply tells a search engine crawler "please do not catalogue these pages".

Finkployd
Some clues for you by Chuck+Chunder · 2004-02-09 17:40 · Score: 3, Informative

a) Mediapartners-google does check robots.txt
b) Opera always has the name "Opera" in its UA string, even when masquerading as IE.
c) Mediapartners-google doesn't feed the Google search engine. It is only used for Google adverts.
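
To tell these clients apart in a server log, a simple substring check on the UA string is enough. The Python sketch below is only an illustration: the function name `describe_user_agent` and the category labels are made up for the example, and the Googlebot and "Opera masquerading as IE" strings in the demo are assumed samples rather than strings taken from this thread.

```python
def describe_user_agent(ua: str) -> str:
    """Classify a UA string with plain substring checks.

    The order matters: "Opera" is tested before "MSIE" because Opera
    keeps its own name in the string even when it identifies as IE.
    """
    if "Mediapartners-Google" in ua:
        return "google-ads-bot"      # used for adverts only
    if "Googlebot" in ua:
        return "google-search-bot"   # feeds the search index
    if "Opera" in ua:
        return "opera"
    if "MSIE" in ua:
        return "internet-explorer"
    return "other"


if __name__ == "__main__":
    samples = [
        "Mediapartners-Google/2.1 (+http://www.googlebot.com/bot.html)",
        "Opera/7.23 (X11; Linux i686; U) [en-GB]",
        # Assumed sample strings, for illustration only:
        "Googlebot/2.1 (+http://www.googlebot.com/bot.html)",
        "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1) Opera 7.23 [en]",
    ]
    for ua in samples:
        print(describe_user_agent(ua), "<-", ua)
```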

--
Boffoonery - downloadable Comedy Benefit for Bletchley Park