The Problem of Search Engines and "Sekrit" Data
Nos. writes: "CNet is reporting that not only Google but other search engines are finding password and credit card numbers while doing its indexing. An interesting quote from the article by Google: 'We define public as anything placed on the public Internet and not blocked to search engines in any way. The primary burden falls to the people who are incorrectly exposing this information. But at the same time, we're certainly aware of the problem, and our development team is exploring different solutions behind the scenes.'" As the article outlines, this has been a problem for a long time -- and with no easy solution in sight.
I don't see what's so hard about this problem. It's very simple... don't keep data of any kind on the web server. That's what firewalled, password/encryption protected DB servers are for.
The next Slashdot story will be ready soon, but subscribers can beat the rush and slashdot the links early!
The quote from that article about Google not thinking about this before the put it forward is idiotic. How can Google be responsible for documents that are in the public domain, that anyone can get to by typing a URL into a browser. It isn't insecure software, just dumb people...
D.O.U.O.S.V.A.V.V.M.
...obey the Robot Exclusion Standard. This is not a big secret, and is linked to by all major search engines. Anyone wishing to exclude a well-behaved robot (like those of major search engines) can place a small file on their site which controls the behaviour of the robot. Don't want a robot in a particular directory? Then set your robots.txt up correctly.
P.S. Anyone keeping credit card info in a web directory that's accessible to the outside world should really think long and hard about getting out of business on the internet.
It is a burden, but the responsibility does not lie on a crawling engine. You could check any 10 digit number (and expdate with a lune check if available) but with all the different formatting done on CC numbers (XXXX-XXXX-XXXX-XXXX, XXXXXXXXXXXXXXXX, etc) the algorithm could get ugly to maintain.
I don't see why Google or any other search engine has to even acknowledge this problem, it's simply Someone Else's Problem. If I was paying a web team/master/monkey any money at all and found out about this, heads would roll. It seems that even thinking of pointing a finger at google is the same tactic Microsoft is doing at those "irresponsible" individuals pointing out security flaws.
If anything Google is providing them a service by telling them about the problem.
Dacels Jewelers can't be trusted.
"Webmasters should know how to protect their files before they even start writing a Web site"
:)
That quote sums up the exact problem. It's not googles fault for finding out what an idiot the web merchant was. As a matter of fact I thank google for exposing this problem. This is nothing short of gross negligence on the part of any web merchant to have any credit card numbers publicly accessible in any way. There is no reason this kind of information should not be under strong security.
To have a search engine discover this kind of information is dispicable, unprofessional, and just plain idiotic. As others have mentioned these guys need to get a firewall, use some security, and quit being such incredible fools with such valuable information. Any merchant who exposes credit card information through the stupidity of word documents, or excel spreadsheets on their public web server, or any non-secure server of any kind deserves to get sued into oblivion. Although, people usually don't like lawyers I'm really glad we have them in the US because they help stop this kind of stuff. Too many lazy people don't think its in their best interest to protect the identity, or financial security of others. I'm glad lawyers are here to show them the light
JOhn
Campaign for Liberty
Secondly, it appears that companies are storing credit card numbers (a) in the clear and (b) in these public areas. These companies should not be allowed to trade on the internet! That is so inept when learning how to use pgp/gpg takes no time at all, and simply storing the PGP encrypted files outside the publically accessible filesystem is just changing the line of code that writes to "payments/ordernumber.asc" to "~/payments/ordernumber.asc" (or whatever). Of course, the PGP secret key is not stored on a publically accessible computer at all.
But I shouldn't be giving a basic course on how to secure website payments, etc, to you lot - you know it or could work it out (or a similar method) pretty quickly. It is those dumb administrators that don't have a clue about security that are to blame (or their PHB).
What do you mean they cut the power? How can they cut the power, man? They're animals!
Some search engines don't just check the pages linked from other pages on the server, but also look for other files in the subdirectories presented in links.
So if http://credit.com/ has a link to http://credit.com/signin/entry.html then these engines will also check http://credit.com/signin/ - which will, if directory indexes are on and there is no index.html page there, show all the files in the directory. In which case http://credit.com/signin/custlist.dat - your flatfile list including credit cards - gets indexed.
So if you're going to have directory indexing on (which there can be valid reasons for) you really need to create an empty index.html file as the very next step each time you set up a subdirectory, even if you only intend to link to files within it.
"with their freedom lost all virtue lose" - Milton