The Problem of Search Engines and "Sekrit" Data
Nos. writes: "CNet is reporting that not only Google but other search engines are finding passwords and credit card numbers while doing their indexing. An interesting quote from the article by Google: 'We define public as anything placed on the public Internet and not blocked to search engines in any way. The primary burden falls to the people who are incorrectly exposing this information. But at the same time, we're certainly aware of the problem, and our development team is exploring different solutions behind the scenes.'" As the article outlines, this has been a problem for a long time -- and with no easy solution in sight.
Did you know that you can use "file://[address]" to find pages and directories that are NOT linked to on a server (if the server allows it)?
The search engines use robots, and the robots read your site through links... So unless the file is in the root directory or something links directly to it, it should not show up.
So create a folder called "mystuff" and keep everything in it... and don't create a link to it, just remember it and type in the url.
http://www.my-site.com/mystuff
You'll then be sent to your secret folder that no one knows about, even the robots.
So I'm not sure what all the yelling is about. Just do that, or set up the robots.txt correctly, but most people don't realize they can do that....
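For anyone wondering what "set up the robots.txt correctly" looks like, here is a minimal sketch using Python's standard urllib.robotparser. The rule set and the helper name is_crawlable are made up for illustration; the folder and host are the "mystuff" example from above. It only shows how a well-behaved robot would read the rules, not how any particular search engine actually does it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the example site; it would be served at
# http://www.my-site.com/robots.txt
ROBOTS_TXT = """\
User-agent: *
Disallow: /mystuff
"""


def is_crawlable(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if a robot honouring robots_txt may fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.my-site.com/index.html",
        "http://www.my-site.com/mystuff",
        "http://www.my-site.com/mystuff/notes.html",
    ):
        print(url, "->", "allowed" if is_crawlable(ROBOTS_TXT, url) else "blocked")
```

Keep in mind that robots.txt is only a request to polite robots, and the file itself is world-readable, so listing a folder there also tells anyone who looks that the folder exists. It is not a substitute for putting a password on the thing.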
www.slightlycrewed.com - Because aren't we all?
Then it's a pretty crappy secret. Plaintext passwords sent via GET are weaker than the 40-bit encryption in a DVD or something.
Suppose this page has some links on it, and someone (maybe me, maybe my manager) clicks them to go to another site (http://elsewhere.com/).
If the page is really truly supposed to be secret, then it won't have external links, and you'll filter it out of your web logs too. Or you could just suck.
Google doesn't kill secrets. PHBs and MCSEs kill secrets.
As my first reply to this immediately got modded to 0, I'll post it again. I'll type slower this time to make it easier to understand.
Are the moderators not understanding that my parent is just repeating the second sentence of the Slashdot article, only in a less focussed way? Go and read the second sentence of the Slashdot article. Now, how is my parent "insightful", "interesting", or "informative"? Try "redundant".
When I take the bother to read the Slashdot article, then go and actually read the referenced offsite article, I do not then want to find that the highest modded post is just parroting one of the very simple points that's already been covered. It demonstrates that both the poster and the moderators haven't even done us the courtesy of reading more than the first sentence of the Slashdot article, let alone the reference source. That's lazy and rude, and I'm going to keep shouting that at +1 until my parent gets modded down, or I drop to 26 karma. Then I'll shut up, and the lunatics can run the asylum in peace.
Mod me down (off topic, redundant, flamebait, honest), but do us all a favour and mod the parent down first please. Many thanks.
If you were blocking sigs, you wouldn't have to read this.