The Problem of Search Engines and "Sekrit" Data
Nos. writes: "CNet is reporting that not only Google but other search engines are finding password and credit card numbers while doing its indexing. An interesting quote from the article by Google: 'We define public as anything placed on the public Internet and not blocked to search engines in any way. The primary burden falls to the people who are incorrectly exposing this information. But at the same time, we're certainly aware of the problem, and our development team is exploring different solutions behind the scenes.'" As the article outlines, this has been a problem for a long time -- and with no easy solution in sight.
/me goes to search for "credit card"
/me buys an x-box with stuff he found by reading slashdot
the gods of irony salute!
Just in time for Christmas Shopping!!!!
All the toys and none of the debt!
Just gotta remember to buy a P.O. box first, not give my home address like the last ti....uhhhh...never mind.
Have you read the moderator guidelines? Well, have you, PUNK? (and I want a Karma: Gnarly option)
I don't see what's so hard about this problem. It's very simple... don't keep data of any kind on the web server. That's what firewalled, password/encryption protected DB servers are for.
The next Slashdot story will be ready soon, but subscribers can beat the rush and slashdot the links early!
... information wants to be free. Right?
Given this premise, the only way that Google or another search engine could find a page with credit card numbers or other 'secret' data would be if that page was linked to from another page, and so on, leading back to a 'public' area of some web site.
That is to say, the web-indexing bots used by search engines cannot find anything that an ordinary, very patient human could not find by randomly following links.
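The link-following argument can be sketched as a breadth-first crawl over a made-up link graph. The page names and the `crawl` helper are hypothetical, and a real crawler also learns URLs from other sites' pages, not only from the site's own.

```python
from collections import deque

# Hypothetical site: each page maps to the pages it links to.
# "/orders/cards.txt" exists on the server, but nothing links to it.
SITE = {
    "/": ["/about.html", "/products/"],
    "/about.html": ["/"],
    "/products/": ["/products/widget.html"],
    "/products/widget.html": [],
    "/orders/cards.txt": [],
}

def crawl(start: str, links: dict[str, list[str]]) -> set[str]:
    """Breadth-first crawl: return every page reachable by following links."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:  # visit each page once
                seen.add(target)
                queue.append(target)
    return seen

reachable = crawl("/", SITE)
print(sorted(reachable))  # ['/', '/about.html', '/products/', '/products/widget.html']
print("/orders/cards.txt" in reachable)  # False: no link leads to it
```

The unlinked file stays invisible only as long as no page anywhere links to it.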
I do not deploy Linux. Ever.
"...search engines are finding password and credit card numbers while doing its indexing."
This is very serious. Could you please post the exact search engines and query strings so I can make sure my information isn't there?
Knunov
Why do users with IDs under 100,000 or over 700,000 usually have the most worthwhile comments?
How does the Google Cache avoid legal entanglements, both for stuff like cc numbers and copyright/trademark infringement?
If I want to find lyrics to a song, the site that has them will often be down, but the cache will still have them. Why is what Google is doing 'okay' but what the original site is doing not okay? Or do they just leave Google alone?
Brant
Argle. Bargle.
how can someone be so blatantly stupid as to store anything other than their web content, never mind credit card details, in their published folders? how? they redirected my documents to c:\inetpub\wwwroot\%username%\...???
update comments set karma=-1, reason='offtopic' where sid=26315
Google does nothing more than a regular Web user. It simply follows links, and indexes the content in its database.
What's wrong with this?
Nothing. Human stupidity.
Alexis 'jeriqo' BRET
The quote from that article about Google not thinking about this before they put it forward is idiotic. How can Google be responsible for documents that are publicly accessible, that anyone can get to by typing a URL into a browser? It isn't insecure software, just dumb people...
D.O.U.O.S.V.A.V.V.M.
...obey the Robot Exclusion Standard. This is not a big secret, and is linked to by all major search engines. Anyone wishing to exclude a well-behaved robot (like those of major search engines) can place a small file on their site which controls the behaviour of the robot. Don't want a robot in a particular directory? Then set your robots.txt up correctly.
P.S. Anyone keeping credit card info in a web directory that's accessible to the outside world should really think long and hard about getting out of business on the internet.
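How a well-behaved robot applies robots.txt can be sketched with Python's standard `urllib.robotparser`. The file contents, the `/private/` directory and the robot name are hypothetical. Nothing forces a robot to make this check; it is purely voluntary.

```python
from urllib import robotparser

# A hypothetical robots.txt that keeps all robots out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse() takes the file's lines

# A well-behaved robot asks before every fetch.
for url in ("http://example.com/index.html",
            "http://example.com/private/cards.txt"):
    print(url, parser.can_fetch("ExampleBot", url))
# http://example.com/index.html True
# http://example.com/private/cards.txt False
```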
Why should Google or any other search engine do anything to save fools from their stupidity? Putting credit card numbers online where anyone can get them is just plain idiotic. Hopefully this will get a lot of publicity along with the names of companies who do stupid things like this and most people will clean up their act.
Credit card numbers follow a known format (mod10). It should be simple, but somewhat intensive as far as search engines go, to scan content, look for 16 digit numeric strings, and run a mod10 on them. If it comes back true, don't put it into the index.
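A minimal sketch of that filter in Python, assuming (as the comment does) that card numbers appear as bare runs of exactly 16 digits. The function names are hypothetical, and the sample number is the widely published Visa test number, not a real account.

```python
import re

def luhn_valid(number: str) -> bool:
    """Luhn (mod 10) check: from the right, double every second digit."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:      # every second digit, counting from the rightmost
            d *= 2
            if d > 9:       # same as adding the two digits of the product
                d -= 9
        total += d
    return total % 10 == 0

# A run of exactly 16 digits, not part of a longer run of digits.
SIXTEEN_DIGITS = re.compile(r"(?<!\d)\d{16}(?!\d)")

def should_skip_indexing(text: str) -> bool:
    """True if the text contains a 16-digit run that passes the Luhn check."""
    return any(luhn_valid(m.group()) for m in SIXTEEN_DIGITS.finditer(text))

page = "Order 1001 paid with 4111111111111111, exp 12/03"
print(should_skip_indexing(page))  # True: leave this page out of the index
```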
The truth about Scientology, Xenu, and you: Operation Clambake
% cd /var/www
% cat > robots.txt
User-agent: *
Disallow: /
^D
%
Once more unto the breach, dear friends, once more, Or close the wall up with our American dead!
From the article :
"Webmasters should know how to protect their files before they even start writing a Web site," wrote James Reno, chief executive of Amelia, Ohio-based ByteHosting Internet Services. "Standard Apache Password Protection handles most of the search engine problems--search engines can't crack it. Pretty much all that it does is use standard HTTP/1.0 Basic Authentication and checks the username based on the password stored in a MySQL Database."
And chief executives of a hosting company should know how Basic Authentication works before hosting web sites...
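For reference, the Basic scheme just base64-encodes "username:password" into the Authorization header. That does stop a crawler that lacks the password, but over plain HTTP anyone who sees the request can read the credentials. A minimal sketch with made-up credentials and hypothetical helper names:

```python
import base64

# Basic Authentication sends "username:password" base64-encoded in a header.
# Base64 is an encoding, not encryption: anyone who sees the header can undo it.
def basic_auth_header(username: str, password: str) -> str:
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token

def decode_basic_auth(header: str) -> tuple[str, str]:
    scheme, _, token = header.partition(" ")
    if scheme != "Basic":
        raise ValueError("not a Basic Authorization header")
    # Split at the first colon only; passwords may contain colons.
    username, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return username, password

header = basic_auth_header("admin", "secret")
print(header)                     # Basic YWRtaW46c2VjcmV0
print(decode_basic_auth(header))  # ('admin', 'secret')
```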
Crewd
Lowering the barrier to entry to web publishing has had a few benefits. Families can share photographs and news in a cheap, efficient manner. Novices can publish information for the benefit of their employees or others easily. However, problems like this do arise quite often, and at their source one can see that the widespread ability of people to publish documents to the web does not coexist well with existing security systems and models.
At any other time in the past few years, this would not ordinarily be a societal problem. Sure, a few people's passwords and credit card numbers will leak out. Hopefully they would have to pay for the charges to punish them for their own stupidity. (After all, as a customer of several banks, I don't want my rates to go up because somebody posted his account numbers for the entire world to see.) But now, this is a national security problem, because we are being attacked by a foreign force who might abuse leaked passwords to access critical systems and cause chaos in this country. President Bush and his staff are very concerned about a cyberwar, because it can be waged without physically having Arabs in the States to commit the terrorism. That is very dangerous indeed.
I'm not sure what the solution is, but a good first step is for companies to raise the barrier to entry to publishing web pages. Geocities and Angelfire should force users to demonstrate their competence before uploading their first page. Perhaps requiring an A+ certification number would help? And Microsoft should take away the parts of FrontPage that allow users to generate documents without writing in HTML. That would help ease the problem, I reckon.
In conclusion - if everybody does their part to help solve this problem and stop information leakage, we will be a safer, more secure society without giving up any more civil liberties.
~wally
Hmmm....Microsoft's .Net possibly helping the problem?
Microsoft said it was safe.....
Vote early. Vote often. Vote CowboyNeal.
Please change the title of this article to:
The Problem of Incompetent System Administrators
If data is 'sekrit'/sensitive/confidential - don't put it on the web. It's as simple as that. If that data is available on the web, search engines can't be blamed for finding it.
-----------------------
Moderator's essentials
I'm a web developer, and I don't know how many times I've heard people who are just getting into the scene talk about making 'hidden' pages. I'm referring to pages that are only accessible to those who click on a very tiny area of an image map, or who find that 'secret' link at the bottom of the page. Visually, these elements seem 'hidden' to a user who doesn't really understand web pages and source code. However, these 'hidden' pages look like giant 'Click Here' buttons to search engines, which is what I'm presuming some of this indexing is finding.
The search engines cannot feasibly stop this from happening; each occurrence is unique unto itself. The only prevention tools are knowledge and education, and bringing to the masses a general understanding of search engine spidering theory.
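To a spider, a 'hidden' link is just another href. A minimal link extractor built on Python's `html.parser`, run on hypothetical markup, picks up an image-map `<area>` link and a one-character link as readily as a visible one. A real crawler would also resolve relative URLs (for example with `urllib.parse.urljoin`).

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect every href a crawler would follow, visible or not."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Image-map <area> links count just as much as ordinary <a> links.
        if tag in ("a", "area"):
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

page = """
<img src="banner.gif" usemap="#nav">
<map name="nav">
  <area shape="rect" coords="0,0,2,2" href="/secret/admin.html">
</map>
<a href="/about.html"><font size="1">.</font></a>
"""

extractor = LinkExtractor()
extractor.feed(page)
print(extractor.links)  # ['/secret/admin.html', '/about.html']
```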
Just my 2 cents.
To make a pun demonstrates the highest understanding of a language
These people who store credit card numbers on the web server are the same people who don't patch for IIS worms until all hell breaks loose.
Good for them.
I recently joined an angel organisation to publicise my business in an attempt to raise funds. The information provided to the organisation is supposed to be secret, and only available to members of the organisation via a paper newsletter, which was reproduced in the secure area of the organisation's website.
A couple of months down the line, a couple of search engines, when asked about 'mycompanyname', were giving the newsletter entry in the top 5.
Alongside my details were those of several other companies, essentially laying out the essence of their respective business plans.
How did this happen? The site was put together with FP2000, and the 'secure' area was simply those files in the /secure directory.
I had no cause to view the website prior to this. The site has been fixed on my advice. How did this come about? No one in the organisation knew what security meant. They were told that /secure WAS!
It didn't do any damage to me, but a few of the other companies could have suffered if their plans were found. It's not Google's job to do anything about this; it's the webmaster's. But a word of warning: before you agree to let your info appear on a website, ask about the security measures. They may well be crap!
Brilliant, huh? ;-)
On second thought, maybe I shouldn't post this... some PHB might actually think it's a good idea.
Maybe we need to demand "approved" server-side implementation of credit-card webservers, besides SSL. How could this be verified? I don't have a clue.
Here's a really easy solution - the bank has these crazy things called "bills." Go to the bank and get some. Then go to the store and use aforementioned "bills." Voila - hax0rs go bye-bye.
"The guys at Google thought, 'How cool that we can offer this to our users' without thinking about security. If you want to do this right, you have to think about security from the beginning and have a very solid approach to software design and software development that is based on what bad guys might possibly do to cause your program grief."
The fact that this guy claims the responsibility lies with Google for allowing this type of search is just plain crazy. If you are publishing critical information on a site that is not at least secure and preferably encrypted, you are just asking for trouble. It should not be Google's responsibility in any way, shape or form to avoid finding this information. If the content providers wish, they can put the robots file out, but that is not fixing anything, merely sidestepping one super-easy hack. They still need to have a decent, or at least SOME, security design.
Oh well.
I am 31337 or something.
% cat > /var/www/html/robots.txt
User-agent: *
Disallow: /
^D
%
"As the article outlines, this has been a problem for a long time -- and with no easy solution in sight."
How about using basic mySQL passwords?
Sounds pretty simple to me
go to Google, type in "site:yourdomain.com xxxx-xxxx-xxxx-xxxx" where the x's are the credit card number of a known customer. If you get hits, your security is less than ideal.
Unfortunately, website security is not as simple as locking a door... but keeping your customer data out of the webserver's document root would be a good start.
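One way to enforce "keep it out of the document root" before writing a file, as a minimal sketch: it assumes a hypothetical document root of /var/www/html and a hypothetical helper name, and `Path.is_relative_to` needs Python 3.9 or later.

```python
from pathlib import Path

# Hypothetical: the directory the web server publishes.
DOCUMENT_ROOT = Path("/var/www/html")

def is_web_accessible(path: str, docroot: Path = DOCUMENT_ROOT) -> bool:
    """True if `path` resolves to somewhere inside the published document root."""
    # resolve() normalises ".." and follows any existing symlinks on both sides.
    return Path(path).resolve().is_relative_to(docroot.resolve())

print(is_web_accessible("/var/www/html/orders/cards.csv"))      # True: move it out
print(is_web_accessible("/var/lib/shop/orders/cards.csv"))      # False
print(is_web_accessible("/var/www/html/../private/cards.csv"))  # False
```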
Such problems have existed for quite a while. Hackings, Crackings, internet sniffing etc.
...
The real issue is not whether you can... but whether you actually use the information. Whether it is available or not, using it IS ILLEGAL. (Carding gets you rather long prison sentences, too.)
People have had the chance to steal from other people for as long as mankind has existed. This is just another form... perhaps a bit simpler, though.
Probable impossibilities are to be preferred to improbable possibilities.
Aristotle
People often wonder how their "secret" sites get into web indices. Here's a scenario that's not too obvious but is quite common:
Suppose I have a secret page, like:
http://mysite.com/cgi-bin/secret?password=administrator
Suppose this page has some links on it, and someone (maybe me, maybe my manager) clicks them to go to another site (http://elsewhere.com/).
Now suppose elsewhere.com runs analog on their web logs, and posts them in a publicly accessible location. Suppose elsewhere.com's analog setup also reports the contents of the "referer" header.
Now suppose the web logs are indexed (because of this same problem, or because the logs are just linked to from their web page somewhere). Google has the link to your secret information, even though you never explicitly linked to it anywhere.
One solution is to use proper HTTP access control (as crappy as it is), or to use POST instead of GET to supply credentials (POST doesn't transfer into a URL that might be passed as a referrer). You could also use robots.txt to deny indexing of your secret stuff, though others could still find it through web logs.
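A minimal sketch of the GET-versus-POST difference, using `urllib.request.Request` objects without sending anything. The URL and password value are hypothetical. Most current browsers trim the Referer header on cross-site requests by default, but a GET query string still ends up in server access logs and browser history.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

# Hypothetical secret page and credential, for illustration only.
SECRET_URL = "http://mysite.com/cgi-bin/secret"
credentials = urlencode({"password": "hunter2"})

# GET: the credential becomes part of the URL itself, so anything that
# records or forwards the URL (logs, history, Referer) records it too.
get_req = Request(f"{SECRET_URL}?{credentials}")

# POST: the credential travels in the request body; the URL stays clean.
post_req = Request(SECRET_URL, data=credentials.encode("ascii"))

for req in (get_req, post_req):
    leaked = parse_qs(urlsplit(req.full_url).query)
    print(req.get_method(), req.full_url, leaked)
# GET http://mysite.com/cgi-bin/secret?password=hunter2 {'password': ['hunter2']}
# POST http://mysite.com/cgi-bin/secret {}
```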
Of course, I don't think credit card info should *ever* be accessible via HTTP, even if it is password protected!
You should be writing that type of data on the backs of envelopes and leaving them scattered around your living room...
Where have you put your license to speak your mind on Slashdot? Surely, people can't go around putting anything they want to say into a public forum. They might say anything. As a matter of fact, we must revoke people's phone privileges until they can prove they're smart enough not to give out credit card numbers to telemarketers. As a matter of fact, let's just legislate intelligence. We can tack it on as a rider to that bill to make Pi = 3.
You're a nitwit. I'm revoking your speech license on Slashdot.
All Troll + "offtopic" mods are meta moderated as "Unfair", because you abused the system.
Your crawler is caching credit card numbers, you say? Simple: check the content you cache for 16-digit numbers. Any that you find, you check with a simple LUHN (mod 10) algorithm. If it passes, you replace the number with "################" or a similar masking.
There, all credit card numbers will now be filtered from your cache.
I understand the severity of the issue, and it's good to know this is happening, but the solution is simple.
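The masking step can be sketched with a regex callback. Like the comment, this assumes card numbers appear as bare 16-digit runs; numbers written with spaces or dashes, or 15-digit cards, would slip through. The function names are hypothetical.

```python
import re

def luhn_valid(number: str) -> bool:
    """Luhn (mod 10) check: double every second digit from the right."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch) * (2 if i % 2 else 1)
        total += d - 9 if d > 9 else d  # digit sum of a doubled digit
    return total % 10 == 0

SIXTEEN_DIGITS = re.compile(r"(?<!\d)\d{16}(?!\d)")

def mask_card_numbers(text: str) -> str:
    """Replace each Luhn-valid 16-digit run with '#' characters before caching."""
    def mask(match):
        digits = match.group()
        return "#" * len(digits) if luhn_valid(digits) else digits
    return SIXTEEN_DIGITS.sub(mask, text)

print(mask_card_numbers("card 4111111111111111, ref 1234567890123456"))
# card ################, ref 1234567890123456
```

The reference number survives because it fails the Luhn check.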
I run a website that pulls a lot of content from other servers. We used to have a newsfeed via ITN's RDF feed, until I got a call from their Director of New Media asking me to take it off. It seems they charge a hefty fee for such a feed (around £30,000) but hadn't made any attempt to protect it with a .htaccess file or something. How did I find it? By searching Google!
[know how Basic Authentication works before hosting web sites]
... and know that it's a wholly inadequate way of "protecting" credit card numbers!
The other day I was using Google to explore the files of an annoying spammer's site [referralware.com]. Simply searching for a few numbers with the query site:.referralware.com brought up search results in their unprotected source.referralware.com directory that included all the credit card logs for the past week. And I am just an average computer joe user... this is a problem if I can be a "hacker" with less knowledge than a script kiddie!
________________
All my sig are fjdklafjkldafjkldafdaklf
From my web logs, I see that a lot of HTTP bots don't give a crap about /robots.txt. Another thing that happens is that they read robots.txt only once and cache it for as long as they keep crawling that site, never picking up a newer robots.txt when one is available. It'd be useful for a bot to refresh what it knows of a site's /robots.txt from time to time.
HTTP bot writers should adhere to the information in /robots.txt and restrict their access accordingly. On a lot of occasions, webmasters set up /robots.txt simply to stop bots from feeding on junk information they don't need, or on things that change regularly and need not be recorded.
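A sketch of the refresh behaviour asked for above. Python's `urllib.robotparser` parses robots.txt and records when it was last read (`mtime()`), but deciding when to re-fetch is left to the crawler. Here `fetch` is a hypothetical stand-in for an HTTP GET of /robots.txt, the clock is injectable so the example runs without a network, and the one-day default TTL is an arbitrary choice.

```python
import time
from typing import Callable
from urllib import robotparser

class RobotsCache:
    """Per-host robots.txt cache that re-fetches entries older than `ttl` seconds.

    `fetch` is a hypothetical callable taking a host name and returning the lines
    of that host's robots.txt; a real crawler would GET http://<host>/robots.txt.
    """

    def __init__(self, fetch: Callable[[str], list[str]], ttl: float = 86400.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.fetch = fetch
        self.ttl = ttl
        self.clock = clock
        self._cache: dict[str, tuple[float, robotparser.RobotFileParser]] = {}

    def parser_for(self, host: str) -> robotparser.RobotFileParser:
        now = self.clock()
        cached = self._cache.get(host)
        if cached is not None and now - cached[0] < self.ttl:
            return cached[1]  # still fresh
        parser = robotparser.RobotFileParser()
        parser.parse(self.fetch(host))  # (re)load the current robots.txt
        self._cache[host] = (now, parser)
        return parser

    def can_fetch(self, agent: str, host: str, path: str) -> bool:
        return self.parser_for(host).can_fetch(agent, f"http://{host}{path}")

# Usage with a fake clock and an in-memory "server".
robots_files = {"example.com": ["User-agent: *", "Disallow:"]}
fake_now = [0.0]
cache = RobotsCache(lambda host: robots_files[host], ttl=3600.0,
                    clock=lambda: fake_now[0])

print(cache.can_fetch("ExampleBot", "example.com", "/logs/"))  # True
robots_files["example.com"] = ["User-agent: *", "Disallow: /logs/"]
fake_now[0] = 1800.0
print(cache.can_fetch("ExampleBot", "example.com", "/logs/"))  # True (cached copy)
fake_now[0] = 3600.0
print(cache.can_fetch("ExampleBot", "example.com", "/logs/"))  # False (refreshed)
```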
Banu
I do not know if this is still the case, but Microsoft's IE offline browsing page crawler (collects pages for you to read offline) ignored robots.txt last time I checked. I know many other crawlers do likewise.
I could be a rich man...
(Not, of course, that I'd ever do anything like that...)
Searching with regular expressions would be cool, though...
Most of this is coming from leaving directory listing turned on. Generally, this should only be used on HTTP front-ends to FTP boxes and on development machines. IIS has "directory browsing" turned off by default. Maybe Apache has it turned on by default? You'd be surprised how many public webservers have this on, making it exceedingly likely that search engines will find files they weren't meant to find. The situation arises when a directory has no "default" page (usually index.html, default.html, default.asp, etc.), only a file like content.html. If a search engine requests http://domain.com/directory/, it'll get the directory listing, which it can, in turn, continue to spider.
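A rough audit along these lines: walk the document root and report directories with no default page, since those are the ones that produce a listing when directory listing is enabled. The set of default names is a hypothetical example (the real list comes from server configuration, such as Apache's DirectoryIndex). The direct fix is still to turn listings off; in Apache, that is `Options -Indexes`.

```python
import os
import pathlib
import tempfile

# Hypothetical default-document names; adjust to match the server's config.
INDEX_NAMES = {"index.html", "index.htm", "default.htm", "default.html", "default.asp"}

def listable_directories(docroot: str) -> list[str]:
    """Return directories under docroot (relative paths) with no default document."""
    exposed = []
    for dirpath, _dirnames, filenames in os.walk(docroot):
        if not INDEX_NAMES.intersection(name.lower() for name in filenames):
            exposed.append(os.path.relpath(dirpath, docroot))
    return sorted(exposed)

# Usage on a throwaway tree: the root has an index page, "directory" does not.
with tempfile.TemporaryDirectory() as root:
    pathlib.Path(root, "index.html").touch()
    pathlib.Path(root, "directory").mkdir()
    pathlib.Path(root, "directory", "content.html").touch()
    print(listable_directories(root))  # ['directory']
```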
INetPub means "INetPublic" not "INetPubrobably a great place to put my credit card numbers".
Why are stupid people not to blame for anything anymore?
Let's not stir that bag of worms...
So, where might one find an 'evil' robot that looks specifically in places robots.txt tells it not to? hypothetically speaking, of course...
A while back there was a thread here about the weakness of the revenue model for search engines. Maybe we have found the answer, think about all the revenue that Google could generate with this data!
Does anybody know when Google is going public?
search for: password admin filetype:doc
My first hit is:
www.nomi.navy.mil/TriMEP/TriMEPUserGuide/WordDocs/Setup_Procedures_Release_1.0e.doc
at the bottom of the html:
UserName: TURBO and PassWord: turbo, will give you unlimited user access (passwords are case sensitive).
Username: ADMIN and PassWord: admin, will give you password and system access (passwords are case sensitive).
It is recommend that the user go to Tools, System Defaults first and change the Facility UIC to Your facility UIC.
oh dear, am I now a terrorist?
"Webmasters should know how to protect their files before they even start writing a Web site"
:)
That quote sums up the exact problem. It's not Google's fault for finding out what an idiot the web merchant was. As a matter of fact, I thank Google for exposing this problem. It is nothing short of gross negligence on the part of any web merchant to have credit card numbers publicly accessible in any way. There is no reason this kind of information should not be under strong security.
To let a search engine discover this kind of information is despicable, unprofessional, and just plain idiotic. As others have mentioned, these guys need to get a firewall, use some security, and quit being such incredible fools with such valuable information. Any merchant who exposes credit card information through the stupidity of Word documents or Excel spreadsheets on their public web server, or on any non-secure server of any kind, deserves to get sued into oblivion. Although people usually don't like lawyers, I'm really glad we have them in the US because they help stop this kind of stuff. Too many lazy people don't think it's in their best interest to protect the identity or financial security of others. I'm glad lawyers are here to show them the light.
John
Campaign for Liberty
that you can use "file://[address]" to find pages and directories that are NOT linked to on a server (if the server allows it)?
The search engines use robots, and the robots read your site through links... So unless the file is in the root directory or something links directly to it, it should not show up.
So create a folder called "mystuff" and keep everything in it... and don't create a link to it, just remember it and type in the url.
http://www.my-site.com/mystuff
You'll then be sent to your secret folder that no one knows about, even the robots.
So I'm not sure what all the yelling is about. Just do that, or set up the robots.txt correctly, but most people don't realize they can do that....
www.slightlycrewed.com - Because aren't we all?
Allow me to disagree. This fellow apparently agrees with Microsoft that people shouldn't publish code exploits and weaknesses. Sorry, but anyone who had secret information available to the external web is in exactly the same boat as someone who has an unpatched IIS server or is running SQL Server without a password.
Let's assume that Google had (a) somehow figured out from day one that people would search for passwords, credit card numbers, etc., and (b) figured out some way to recognize such data to help keep it secret. Should they have publicized this fact or kept it a secret? Publicity would just mean that every script kiddie would be out writing their own search engines, looking for the things that Google et al. were avoiding. Secrecy would mean that a very few black hats would write their own search engines, and the victims of such searches would have no idea how their secrets were being compromised.
But this assumes that there's some way of accomplishing item (b), which I claim is very difficult indeed. In fact, it would be harder to accomplish than natural language recognition. Think about it... Secrets are frequently obscure, to the point that to a computer they look like noise. Most credit cards, for example, use 16-digit numbers. Should Google not index any page containing a string of 16 consecutive digits? How about pages that contain SQL code? How would one not index those, but still index the online tutorials at MySQL, Oracle, etc.?
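The false-positive worry can be put in numbers with a back-of-the-envelope sketch, assuming a filter that flags any 16-digit run passing the Luhn (mod 10) check and treating the digits as arbitrary. Number the digits $d_1, \dots, d_{16}$ from the left, so $d_{16}$ is the check digit and the odd positions are the doubled ones. Once $d_1, \dots, d_{15}$ are fixed, their weighted sum $S$ is fixed, and exactly one $d_{16} \in \{0, \dots, 9\}$ satisfies $d_{16} \equiv -S \pmod{10}$:

```latex
\[
  d_{16} + \underbrace{\sum_{k=1}^{15} w_k(d_k)}_{S} \equiv 0 \pmod{10},
  \qquad
  w_k(d) =
  \begin{cases}
    2d     & k \text{ odd},\ 2d \le 9 \\
    2d - 9 & k \text{ odd},\ 2d > 9 \\
    d      & k \text{ even}
  \end{cases}
\]
\[
  \frac{\#\{\text{Luhn-valid 16-digit strings}\}}{\#\{\text{16-digit strings}\}}
  = \frac{10^{15}}{10^{16}} = \frac{1}{10}
\]
```

So a Luhn-based filter would still flag about one in ten arbitrary 16-digit strings, such as order or serial numbers whose digits are effectively random.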
The only "solution" is to recognize that this problem belongs in the lap of the web site's owner, and the search engine(s) have no fundamental responsibility.
Nothing for 6-digit uids?
Am I the only one scared by this? The problem is Google's, simply because they follow links? I find it hard to believe this stuff sometimes!
<rant>When will people learn that criminals don't behave? That is what makes them criminals!</rant>
As our second-year uni project we were required to write a web index bot. Guess what? It didn't "behave". It would search right through a robots.txt roadblock. It would find whatever there was to find. This stuff is so far from being rocket science it is ridiculous!
Sure, using Google might ease a tiny fraction of the bad guys' work, but if Google weren't there, the bad guys' tools would be. In fact, they still are there.
Saying that you have to write your client software to work around server/administrator flaws is like putting a "do not enter" sign on a tent. Sure, it will stop some people, but the others will just come in anyway, probably even more so just to find out what you are hiding.
It will stop the casual perusal of your data.
The way to stop the determined snooper is to not keep your data in a directory that can be accessed by your web server.
At any rate--scary it is.
I'm a nature photographer.
Well, terrorism can easily be waged without having Arabs in the States, even without resorting to cyberwar. As Oklahoma City has shown, it's enough to have Rednecks in the States. Kudos though for disguising your racist drivel well enough to get modded up to 2.
From the article:
"The guys at Google thought, 'How cool that we can offer this to our users' without thinking about security. If you want to do this right, you have to think about security from the beginning and have a very solid approach to software design and software development that is based on what bad guys might possibly do to cause your program grief."
Search and replace "Google" with "Microsoft". The lack of security is in the operating system and the applications which launch the malicious files without warning the user. Google just tells you where to get 'em, not what to do with 'em.
If it ain't broke, it doesn't have enough features yet.
Secondly, it appears that companies are storing credit card numbers (a) in the clear and (b) in these public areas. These companies should not be allowed to trade on the internet! That is so inept when learning how to use pgp/gpg takes no time at all, and storing the PGP-encrypted files outside the publicly accessible filesystem is just a matter of changing the line of code that writes to "payments/ordernumber.asc" to "~/payments/ordernumber.asc" (or whatever). Of course, the PGP secret key is not stored on a publicly accessible computer at all.
But I shouldn't be giving a basic course on how to secure website payments, etc, to you lot - you know it or could work it out (or a similar method) pretty quickly. It is those dumb administrators that don't have a clue about security that are to blame (or their PHB).
Maybe we do need some kind of accreditation. Any idiot can claim to be a security expert in the computer field. Can any convicted burglar claim to be a locksmith?
What do you mean they cut the power? How can they cut the power, man? They're animals!
Visit their website today !!!
this guy's just looking for free hype for his book. if that's the kind of advice he offers, he's doing more harm than good.
This is the voice of World Control. I bring you Peace.
all major credit card numbers follow the same patterns - it would be very easy for google (or any other bot manufacturer), acting benevolently, to write code which recognizes these patterns, excludes them from results and possibly even emails the site admin (automatically? if their META properties are set correctly) to notify them of the security problem.
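A sketch of the pattern-recognition half of that idea, using the classic issuer prefixes and lengths. The table is illustrative and not exhaustive, the helper name is hypothetical, and notifying the site admin is left out; a real filter would combine this with the Luhn check mentioned earlier in the thread. The sample numbers are widely published test numbers.

```python
import re
from typing import Optional

# Classic issuer prefix/length patterns. Not exhaustive: issuers add ranges
# over time (e.g. Mastercard's 2221-2720 series, not covered here).
ISSUER_PATTERNS = {
    "Visa": re.compile(r"4\d{12}(?:\d{3})?"),        # 13 or 16 digits, starts with 4
    "MasterCard": re.compile(r"5[1-5]\d{14}"),       # 16 digits, 51-55
    "American Express": re.compile(r"3[47]\d{13}"),  # 15 digits, 34 or 37
    "Discover": re.compile(r"6011\d{12}"),           # 16 digits, 6011
}

def issuer_of(number: str) -> Optional[str]:
    """Return the issuer whose pattern matches the whole digit string, if any."""
    digits = re.sub(r"[ -]", "", number)  # tolerate "4111 1111 1111 1111" style
    for issuer, pattern in ISSUER_PATTERNS.items():
        if pattern.fullmatch(digits):
            return issuer
    return None

print(issuer_of("4111 1111 1111 1111"))  # Visa
print(issuer_of("378282246310005"))      # American Express
print(issuer_of("1234567890123456"))     # None
```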
www.pixelectric.com
"The underlying issue is that the infrastructure of all these Web sites aren't protected."
Agreed. Such lax security via the use of Frontpage, IIS, .asp and VBS in webpages.
You might as well do an impression of Duncan in the movie Shrek: "Ooo! Ooo! Pick me! Pick me!"
Webmasters queried about the search engine problem said precautions against overzealous search bots are of fundamental concern.
Uhh...they are "bots"...they don't think, they do.
Does the bot say "Oh, look, these guys did something stupid...let's tell them about it."
No, they search, they index and they generate reports.
I've seen this problem crop up before when a coworker was looking for something totally unrelated on google.
Sad part was it was an ISP I had respect for, despite moving from them to broadband.
What killed my respect was at the very top of the pages was "Generated by Frontpage Express"...gack!
I don't recall if it was a user account or one of their admin accounts...but for modem access I kind of stopped recommending them, or pointed out my observations.
I have to parrot, and agree, with the "Human Error" but add "Computer accelerated and amplified".
It happens, but that does not mean we have to like it, much less let it keep happening.
Manufacturers Googmat today came under fire for their new transparent doormats.
An industry source who would not be named said "A number of our most important keys were compromised by this new Googmat feature. I realise they wanted to give exciting new options to their customers, but they should really give more careful thought to security before releasing a product like this."
I, for one, would like to see a complete list of those sites that are listing passwords, credit card numbers and other similar information. (By listing I am referring to those sites/admins that allow such information to be easily obtained.) If I know this, then I will NEVER do business with them, or at least not until they can prove that they realize their error (which includes the error of policy and mentality for ever letting that happen in the first place) and fix it. After all, this is yet another example of what kind of foolishness can happen when we fail to apply critical thought to our actions. Yet again, this is an example of the error of thinking that the web (and the services performed on it) are 'new ideas' when they are in fact just newer implementations.
Question to those who know the legal side of things... Hypothetically. If I find that a restaurant is leaving all of their receipts (with complete numbers, addresses, names and expiration dates) out in the open, perhaps in a special bin marked 'receipts and ledgers' that is left near the bar and is easily viewed by anyone who walks by there... and if in this hypothetical situation, I find that mine and others' credit card numbers are then used illegally for purchases, what would be the legal situation here? Would I be able to sue the company to recover the cost? (and ONLY the cost, I don't care about any 'litigation as an income' scum out there) Also, shouldn't credit card companies and the banks that provide them be responsive to such situations and police any irresponsible companies that do let such sensitive information loose?
Personally, I am getting rid of all credit cards (and debit cards as well) because of this irresponsibility. However, when making large purchases of computer equipment (well, anything really) it is nice to have the extra leveraging power of the credit card company behind you if the vendor tries any funny stuff. Please contribute with what you think... especially the legal issues
I seek not only to follow in the footsteps of the men of old, I seek the things they sought.
Well, robots.txt has a more sinister use. Its purpose is to keep 'bots' out, but it also tells ME where to snoop more. Cool.
I have yet to see any evidence that a "cyberwar" is imminent or even possible. Realistically - how many critical systems are connected to the Internet? Sure, a determined enemy might be able to take out Amazon or Yahoo, but who cares? Most Internet businesses aren't making much money anyway, so who cares if bad security puts the final nail in their coffin?
And think about other systems too. Is the phone network on the Internet? One wouldn't think so, because there's no benefit in adding the extra layer of complexity. How about the power grid? Or water supplies? There is literally zero business need to make any of these systems Internet accessible, so why would it happen? The answer is that it wouldn't, but our leaders just want an excuse to stay hysterical and keep their ratings high.
-sting3r
Try doing a search for the file WinsockFTP leaves (WS_FTP.LOG?). You'll get hundreds and hundreds of results, and you just might find unlinked files mentioned in it.
Of course, there's always good fun going into the /images/ directory (since virtual directories are on by default) on any Angelfire user's pages. Often you'll find images that the user didn't intend for the public to see.
Of course, there's the old-fashioned way. If you see an image at http://www.whatever.com/boobie3.jpg, chances are there's a boobie1.jpg and a boobie2.jpg.
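The old-fashioned way is just arithmetic on the file name, which is why sequential names are no protection. A small sketch that only generates candidate names (it makes no requests); the helper name and the example file are hypothetical, and zero-padding such as "photo03" is not preserved.

```python
import re

def neighbouring_names(filename: str, spread: int = 2) -> list[str]:
    """Guess sibling file names by varying the last number in `filename`."""
    match = re.search(r"(\d+)(?!.*\d)", filename)  # last run of digits
    if not match:
        return []
    n = int(match.group(1))
    start, end = match.span(1)
    return [filename[:start] + str(i) + filename[end:]
            for i in range(max(n - spread, 0), n + spread + 1) if i != n]

print(neighbouring_names("photo3.jpg"))
# ['photo1.jpg', 'photo2.jpg', 'photo4.jpg', 'photo5.jpg']
```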
The moron in the IBM ads vs. the moron in the DELL ads? The IBM moron in the 3rd round.
are you kidding?
they are talking about sensitive personal information - just don't store this online.
if you really need to access something (that isn't a credit card number... just don't do that!) and don't have physical access to the box, try SSH or at least make sure it's a secure directory (httpS://blah/mystuff...)
A: None. The Universe spins the bulb, and the Zen master merely stays out of the way.
If Google can find it, then a human with a web browser can find it. That's all there is to it. Have info you don't want to share? Then don't share it!
As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
If you somehow manage to post your credit card info on the web, exactly whose fault is it? The only way it *can't* be your fault is if it's a poorly-constructed e-commerce site that leaks out that kind of info.
I just don't see what the big deal here is.
- A.P.
"Remember when the U.S. had a drug problem, and then we declared a War On Drugs, and now you can't buy drugs anymore?"
I thought long ago that things like this would have been cured by robots meta tags, such as <meta name="robots" content="nofollow"> and <meta name="robots" content="noindex">, along with deleting your CGI pages once they've been understood.
Google, along with most search engines, will heed these and not index the page or follow any of its links, as requested - unless all these webmasters are lazy gits using MS Word and generating all that crap.
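The tags the poster has in mind are the robots meta tag, written `<meta name="robots" content="noindex, nofollow">`. Below is a minimal sketch that reports which directives a page declares, using only Python's standard-library `html.parser`. The function name `robots_directives` is made up for illustration. Keep in mind that these directives are a request that polite crawlers honor, not access control: anyone with a browser can still fetch the page.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # attrs is a list of (name, value) pairs; HTMLParser lowercases the names,
        # and a value can be None for attributes written without "=...".
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for token in attr_map.get("content", "").split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)


def robots_directives(html_text):
    """Return the set of robots directives (e.g. 'noindex') declared by a page."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    parser.close()
    return parser.directives


page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print(sorted(robots_directives(page)))  # ['nofollow', 'noindex']
```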
I remember when men were men and used EMACS, and girls were girls and said ``how do I get ms-windows on this'' when using a Solaris...
-JCC
Can they tell us which ones so we won't shop there? That ought to fix that problem right on up.
"But other critics said Google bears its share of the blame."
Why?!
Google is finding documents that any web browser could find. The fault belongs to the idiots who publicly posted sensitive documents in the first place. Why doesn't the article mention this anywhere? Garbage reporting if I've ever seen it.
is competition good, or is duplication of effort bad?
Yes! Good! Google finds credit card numbers which are publicly available to anyone who can find them! It should do that.
If someone's credit card number is accessible on the internet, then search engines should be finding it, because the baddies already can. Security through obscurity doesn't work; if someone's broadcasting credit card numbers over the internet, this will tell you who's doing it and which numbers are insecure. The next step is for MasterCard, Visa, et al. to start searching Google for their credit card numbers and contacting people whose numbers are compromised (oh yeah, and cancelling the accounts of people stupid enough to do that).
What are you talking about? Of course there's an easy solution in sight. Don't put your credit card number on the web! Don't give your credit card number to a website that does! Come on, how hard IS that? If it's sensitive information, it doesn't belong on a publicly-available website.
Al Qaeda has ninjas!
Is like blaming the Highway department for speeders...
Thanks to file sharing, I purchase more CDs
Thanks to the RIAA, I buy them used...
Not sure if those are all the completely correct addresses to use, but in the face of some blatant FUD, they'll probably do okay . . .
Al Qaeda has ninjas!
At my last startup, before we launched one of our services (pseuds.org), we controlled access via DNS by pointing "www.pseuds.org" to a placeholder page. Testing showed that it was indeed secure. Unfortunately, nobody noticed that the DNS entry for "pseuds.org" pointed to the unannounced site, which led it to appear on several search engines before long. Since we hadn't linked to it from anywhere public, I'm guessing that at least some of the search engines use domain registration info to find starting points for their robots.
The real irony of this was that my co-founder ran the InterNIC for a while and the employees responsible were former employees of Network Solutions. Of course, some of you will not be a bit surprised by that...
Gosh, this is fun.
cLive ;-)
-- Trinity in high heels carrying a whip: The donimatrix - there is no spoonerism
Many years ago on comp.risks somebody actually looked at the contents of a number of robots.txt files; he wondered if they could be used as a quick index into "interesting" files. At the time, erroneous use of the file was still pretty rare... but I'm sure that was a selection effect that is no longer valid.
Bottom line: that standard may be intended for one behavior (robots don't look in these directories), but there's absolutely nothing to prevent it from being used to support other behaviors (robots look in these directories first). If you don't want information indexed, don't put the content on your site. Or at a minimum, don't provide directory indexes and use non-obvious directory names.
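If you want to see what your own robots.txt hands to a curious visitor, listing its Disallow entries is enough. Here is a rough sketch; the function name is hypothetical. It only looks at Disallow lines and ignores which User-agent group each rule belongs to, so it is an audit aid, not a full robots.txt parser.

```python
def disallowed_paths(robots_txt):
    """Return every path listed on a Disallow line, in file order.

    Anyone can fetch /robots.txt, so this list is exactly what a curious
    visitor sees when deciding where to snoop.
    """
    paths = []
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        if field.strip().lower() == "disallow":
            value = value.strip()
            if value:  # an empty Disallow means "allow everything"
                paths.append(value)
    return paths


sample = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /admin/   # keep crawlers out
Disallow:
"""
print(disallowed_paths(sample))  # ['/cgi-bin/', '/admin/']
```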
For every complex problem there is an answer that is clear, simple, and wrong. -- H L Mencken
Some search engines don't just check the pages linked from other pages on the server, but also look for other files in the subdirectories presented in links.
So if http://credit.com/ has a link to http://credit.com/signin/entry.html then these engines will also check http://credit.com/signin/ - which will, if directory indexes are on and there is no index.html page there, show all the files in the directory. In which case http://credit.com/signin/custlist.dat - your flatfile list including credit cards - gets indexed.
So if you're going to have directory indexing on (which there can be valid reasons for) you really need to create an empty index.html file as the very next step each time you set up a subdirectory, even if you only intend to link to files within it.
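Here is a minimal sketch of the "empty index.html in every subdirectory" step. It assumes your server treats index.html or index.htm as the default page, so check its configuration (Apache calls this setting DirectoryIndex). The function name and the `INDEX_NAMES` tuple are illustrative assumptions. Turning directory listings off entirely (in Apache, `Options -Indexes`) is the sturdier fix; this is the belt-and-braces version.

```python
import os
from pathlib import Path

# Names the server is assumed to treat as a directory's default page.
# This tuple is an assumption; match it to your server's DirectoryIndex setting.
INDEX_NAMES = ("index.html", "index.htm")


def add_placeholder_indexes(docroot):
    """Create an empty index.html in every directory under docroot that lacks one.

    Returns the list of placeholder files created.
    """
    created = []
    for dirpath, _dirnames, filenames in os.walk(docroot):
        if any(name in filenames for name in INDEX_NAMES):
            continue  # this directory already has a default page
        placeholder = Path(dirpath) / "index.html"
        placeholder.write_text("")  # an empty page shows nothing instead of a file list
        created.append(placeholder)
    return created


# Example (hypothetical path):
# for path in add_placeholder_indexes("/var/www/html"):
#     print("created", path)
```

Rerun it whenever you add a subdirectory, or better, have your deployment step call it.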
"with their freedom lost all virtue lose" - Milton
Second, if the sensitive information is going to a select few people, consider PGP encrypting the data, and only putting the encrypted version online. Doing this makes many of the HTTP security issues less critical.
Assuming you still have to put something sensitive online, make sure of the following:
- Only use HTTPS, never use just plain HTTP.
- Use CGI, Java Servlets, or some other server-side program technology to password-protect the site. I will refer to the resulting program(s) as the security program.
- Never accept a password from a GET request, only accept them from POST requests.
- Never make the user list or password list visible from the internet, not even an encrypted password list.
- Never place the sensitive information in a directory the web server software knows how to access. Only the security program should know how to find the info.
- Review all documentation for your web server software and the platform used for the security program. Pay special attention to security issues, and make sure you aren't inadvertently opening up holes. Keep current: do this at least four times a year.
- Subscribe to the security mailing lists for your web server's operating system, for your web server software, and for the programming platform you used for the security program. If there is anything else running on this machine, subscribe to its security mailing lists too.
- Subscribe to cert-advisory and BugTraq. Read in detail all the messages that are relevant to your setup. Review your setup after each relevant message.
- Don't use IIS.
- Don't use Windows 95/98/Me. Don't use Windows XP Home Edition.
- Don't use any version of MacOS before OS X.
- Don't use website hosting services for sensitive information.
- Never connect to this webserver using telnet, ftp or FrontPage. SSH is your friend.
- Never have Front Page Extensions (or its clones or workalikes) installed on a webserver with sensitive data.
- If there is anything above that you don't understand, or if you can't afford the time for any of the above, hire a professional with security experience and recommendations from people you trust who have used his or her services. It's bad enough that amateurs are running webservers, much less running ecommerce sites and other sites with sensitive data.
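Two items in the list above translate directly into code: keep sensitive files out of the document root, and refuse passwords sent via GET. Below is a sketch of both checks, under some assumptions. The paths `/var/www/html` and `/srv/private/orders.db` and both function names are hypothetical, and a real security program would run these checks inside whatever CGI or servlet framework you use.

```python
from pathlib import Path
from urllib.parse import parse_qs

# Hypothetical layout: adjust both paths to your own server.
DOCUMENT_ROOT = Path("/var/www/html")
SECRET_STORE = Path("/srv/private/orders.db")


def is_web_reachable(path, docroot=DOCUMENT_ROOT):
    """True if `path` resolves to somewhere inside the web server's document root."""
    resolved = Path(path).resolve()
    root = Path(docroot).resolve()
    return resolved == root or root in resolved.parents


def password_from_request(method, query_string, body):
    """Return the submitted password, refusing anything that arrives in the URL.

    A password in the query string can end up in server logs, browser history,
    proxy caches and Referer headers, hence "POST only".
    """
    if method.upper() != "POST":
        raise PermissionError("passwords are only accepted via POST")
    if "password" in parse_qs(query_string):
        raise PermissionError("password must not appear in the URL")
    values = parse_qs(body).get("password")
    return values[0] if values else None


print("secret store web-reachable?", is_web_reachable(SECRET_STORE))
print(password_from_request("POST", "", "user=alice&password=s3cret"))  # s3cret
```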
The above is an incomplete list. It is primarily there to give people an idea of how much effort they should expect to put into a properly administered, secure website with sensitive information. Do you really need to distribute this via a web browser?
----
Open mind, insert foot.
"There is no limit to what can be accomplished if you don't mind who gets the credit." --Ronald Reagan
The new issue of "2600" all but gives script kiddies a script for extracting credit card numbers from the Passport database. Scary. Don't buy anything through it until they fix it.
I'm going to start offering a free service. Just send me your credit card number, and I'll make sure it's not being used maliciously.
Don't worry about my expenses. I'll cover them somehow. After all, the net is full of "Good deals".
Let's not stir that bag of worms...
I haven't looked into how the new crawlers work. I assume that they still follow links from page to page, but are there new types of crawlers that could be searching the directory structures of a site? Not that this excuses the webmasters, but it might explain some of the new search results.
THIS SPACE FOR RENT
Google's comment was:
"The primary burden falls to the people who are incorrectly exposing this information."
This is where they should have stopped. Those who find their credit card information in a search engine will learn a lesson and use services that actually take care of their customers' security and privacy. Google shouldn't have to clean up incompetent people's mess.
In the long run, these things can only lead to the ignorant (wannabe?) players in the market slowly dying because they don't know what they are doing.
I personally hope someone gets a taste of reality here, and that only the serious players survive. The MCSE crowd may finally learn that there's more to it than blind trust in their own (lacking) ability.
Clever signature text goes here.
You have to remember that how it SHOULD be done and how it IS done are two completely different things. Larger sites such as Amazon and Barnes & Noble may have elaborate systems set up, but for the average small-time e-commerce site, security is normally fairly lax. I have worked for companies that put unencrypted credit card numbers in the database and relied on database security to keep hackers out. Granted, the machine may be behind a firewall to block NetBIOS/trojans/etc., but when you open the ports to do database administration remotely, you're just asking for trouble. None of the companies have had any problems that I am aware of, but it's a time bomb waiting to happen.
Check out this link... http://www.google.com/search?q=cache:KpZEOi1W8rA:www-cgsc.army.mil/nrs/CGSOC/course_material_sy00/S510/lsnguide/lg6.doc+%22area+51%22+filetype:doc+site:.mil&hl=en
and read question 23! The proof is there! Bwahaha.
"The guys at Google thought, 'How cool that we can offer this to our users,' without thinking about security. If you want to do this right, you have to think about security from the beginning and have a very solid approach to software design and software development that is based on what bad guys might possibly do to cause your program grief." - Gary McGraw (quoted in the CNet article).
;)
*blinks*
Well, actually, Gary, it seems to me that it isn't Google that's been caused any grief here, but those webmasters who didn't "think about security from the beginning." In fact, it looks like Google runs a pretty tight ship.
This is the kind of guy who blames incidents.org for his web server getting hacked. After all, they weren't thinking about security from the beginning, were they?
Riight.
BRx
Life after capitalism? The participatory economics project
Yes you are, Mr. Bin Laden wants to arrange a meeting with you tomorrow
This is still a MAJOR screwup on the part of the admins and/or the coders!
So what if it's accidental, does that make the CC#'s any less real?
On some servers, if you make a query for http://<server>/<path>/?M=A or http://<server>/<path>/?S=D you will get a directory listing instead of the default page. This is a result of FancyIndexing in Apache and can be disabled through methods detailed in the Bugtraq discussion.
The originator of the discussion pointed out that his log files had GET requests from Google that specifically looked for these directory listings, so it's pretty clear they were (are?) doing it intentionally.
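To check your own logs for the requests the Bugtraq poster describes, look for the column-sort query strings. The sketch below assumes Common Log Format entries and the Apache 1.3-style `?N=`, `?M=`, `?S=`, `?D=` sort queries. Newer Apache releases use a `?C=...;O=...` form that this pattern does not catch. The function name and the sample log lines are made up for illustration.

```python
import re
from urllib.parse import urlsplit

# Request line inside a Common Log Format entry: "GET /path?query HTTP/1.0"
REQUEST_RE = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+"')
# Column-sort queries understood by Apache 1.3-style FancyIndexing:
# N(ame), M(odified), S(ize), D(escription), each with A(scending)/D(escending).
SORT_QUERY_RE = re.compile(r"^[NMSD]=[AD]$")


def listing_probes(log_lines):
    """Yield (client, path) for requests that ask for a sorted directory listing."""
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue
        target = urlsplit(match.group(1))
        if SORT_QUERY_RE.match(target.query):
            client = line.split(" ", 1)[0]  # first CLF field is the client host
            yield client, target.path


# Made-up sample lines for illustration.
log = [
    '203.0.113.7 - - [27/Nov/2001:10:00:00 -0500] "GET /private/?M=A HTTP/1.0" 200 812',
    '203.0.113.7 - - [27/Nov/2001:10:00:01 -0500] "GET /index.html HTTP/1.0" 200 2048',
]
print(list(listing_probes(log)))  # [('203.0.113.7', '/private/')]
```

Any path this turns up is a directory whose listing is reachable and which deserves a look.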
I agree with all of your assertions, except
"Don't use IIS."
This just isn't an option for a lot of people. I would change this to:
"If you use IIS, you need to make sure you check BugTraq/cert EVERY day."
I would also add:
"If you use IIS with COM components via ASP, make sure the DLL's are not in a publicly accessible directory."
This happens a lot, and makes DLL's lots easier to break.
Let's not stir that bag of worms...
/conf/merchant_conf
Then get hold of the cybercash API and have fun ;-)
eg
I'll never forget the day I first saw a .pdf in Google search result. Not that long ago I saw my first .ps.gz in a search result. I mean, how dope is that!? They're ungzipping the file, and then parsing the postscript! Soon they'll start uniso-ing images, untarring files, unrpming packages, .... You'll be able to search for text and have it found inside the README in an rpm in a Red Hat ISO.
Can't wait until images.google.com starts doing OCR on the pix they index...
"The guys at Google thought, 'How cool that we can offer this to our users' without thinking about security..."
Interesting that this is being pushed off onto Google. I think a more appropriate phrase would be "The guys at thought, 'How cool that this website is so easy to set up,' without thinking about security...."
RFC2119
> Or you could just suck.
Look, I was just explaining how it happens. I point out ways to avoid this in a later paragraph. Let's direct that anger towards something more productive, ok?
Could someone tell me what the hell a PHB is. I thought I used to think I was smart.
this sig is deprecated
Alright, so I admit, I was a little curious about how dumb people are with their passwords so I tried the search. It's simply amazing how careless people are with their security...
Here was a simple document found using the exact search method listed above that is just the Minutes from some board meeting. In it, they actually LISTED a website to log into as well as the password required to get in! Right in the minutes! The website is no longer available, so I'll actually post the text from the minutes...
Minutes of the Gulliver Meeting at Carlton Library 17.8.01
...
7. Assessment of database products
B_ spoke briefly about the online tool that is an outcome of work done at Monash University for Libraries online, he will make the URL available so that evaluation of the usefulness of the tool may commence.
The tool is at http://130.194.38.42
Password admin
Talk about careless. Even if you're positive that the minutes document won't be posted on the web, you certainly don't go and actually write it onto something that will be distributed to the public! A hard copy (aka paper) of access to a server is just as dangerous as it being stored online.
The problem is that people don't realize that it's not safe to distribute private information through ANY public medium.
I dropped a note on his comments
"We have a problem, and that is that people don't design software to behave itself.. etc.."
Me(typoes and all)- You honestly believe that a crawler that finds a private page is responsible fro exposing private info?
Seriously? Cmon, Under 0 circumstances should my CC information be available to anyone visiting a website, if it is, the owner of that site should be criminally liable.
The response -
Hi Sean,
I agree. I actually made that point too, but the reporter chose to focus on other things I said.
gem
PHB means "Pointy-Haired Boss". Popularized by Dilbert.
--
Runnin' around, robbin' banks all whacked on the Scooby Snacks...
I had a client who for some reason got their 404 page the highest rank on Google. So if you typed their company name into the search engine, the 404 page was number one; you can imagine the problems that caused. So I put their error pages in the robots.txt and everything worked out.
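That kind of robots.txt fix can be checked with Python's standard `urllib.robotparser`. The sketch below assumes the error pages live under a hypothetical `/errors/` path, and the `crawlable` helper is made up for illustration. Note that this only asks well-behaved crawlers to stay away, and, as pointed out elsewhere in this thread, the file also advertises the path to anyone who reads it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps crawlers away from the error pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /errors/
"""


def crawlable(url, robots_txt=ROBOTS_TXT, agent="Googlebot"):
    """Ask the standard-library parser whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


print(crawlable("http://www.example.com/errors/404.html"))  # False
print(crawlable("http://www.example.com/products.html"))    # True
```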
Hollow words will burn and hollow men will burn.
Don't use any version of MacOS before Mac OS X? Classic MacOS was pretty secure, more so than OS X. It's rather difficult to crack MacOS remotely, considering that there is no command line and it comes with no services installed (let alone enabled).
Liberty in your lifetime
A few months ago I was looking up the email address of a friend. I knew that he occasionally posted messages with his email address, so I figured I'd give Google a shot. After plugging in his email, I found a link to a text file of names, phone numbers, addresses, ages, and comments for a local Miami radio station. I sent an email to the web administrator there, and he was rude enough to suggest that I'd found the page using illicit means. So let's blame Google rather than some incompetent web administrator.
"Push the plunger and meet Allah" only works on the Arabs.
What about those Irish terrorists, assmouth? Don't they blow themselves up for jesus?
Alright, before you go flaming me for thinking of this, posting this or whatever, just realize that I belong to the school of thought that says, "If you can think of something malicious, someone else can think of it too, best to be prepared"
What would be the option to detect and protect yourself from a virus that did this:
1) Gets on your webserver through whatever means.
2) Grabs your password files (either *nix or Win*) and creates a text file with a strange, multi-character and fairly unique name that stores that data, then adds a robots.txt file in your root to allow searches for that file.
3) Now the attacker can search Google for that file name and will likely get few matches that are NOT what s/he is looking for.
*shrug* Just a thought
Xeph
>>>"You could check any 10 digit number (and expiration date, with a Luhn check if available), but with all the different formatting done on CC numbers (XXXX-XXXX-XXXX-XXXX, XXXXXXXXXXXXXXXX, etc.) the algorithm could get ugly to maintain. "
Especially since most credit card numbers are 16 digits long (American Express uses 15)...
4+4+4+4 = 16
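Normalizing the separators first keeps the matching manageable: strip spaces and dashes, then look at the length. The sketch below is meant for scanning your own pages before they go public. The 13-19 digit range and the function name are assumptions, since card numbers vary in length (15 for American Express, for example). The loose pattern can also glue a card number to a number that follows it after a single space, so treat hits as leads to check by hand. A checksum test such as Luhn would cut false positives further.

```python
import re

# A run of 13-19 digits, optionally separated by single spaces or dashes,
# e.g. "4111 1111 1111 1111", "4111-1111-1111-1111" or "4111111111111111".
CANDIDATE_RE = re.compile(r"(?<!\d)\d(?:[ -]?\d){12,18}(?!\d)")


def card_number_candidates(text):
    """Return digit strings (separators removed) whose length fits a card number."""
    return [re.sub(r"[ -]", "", m.group()) for m in CANDIDATE_RE.finditer(text)]


sample = "Card: 4111-1111-1111-1111, exp 12/03. Alt: 4111 1111 1111 1111. Phone 555-123-4567."
print(card_number_candidates(sample))  # ['4111111111111111', '4111111111111111']
```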
filetype:htpasswd htpasswd
Scary how many
-- Azaroth
I mean, like when you put data onto a publicly accessible webserver.... well, that's exactly what you're doing. Duh!
I just accidently faxed everything in my wallet to everyone in Michigan. Please, everyone turn off your fax machines!
TWW
"Encyclopedia" is to "Wikipedia" what "Library" is to "Some people at a bus stop"
Seems to me that credit card companies need to start suing websites which jeopardize the integrity of credit card numbers-- that until online merchants feel a real monetary incentive to get their security right, there will be no incentive at all. Courts, credit card companies, and insurance companies are the ones who really need to step up to the plate here. I firmly believe that a big part of the failure of online commerce has to do with the negative press which these compromises create...
And "Interesting" posts should know what they're saying, but one rarely gets everything one wants.
The point: the poster is implying that there's some mismatch between looking the password up in a MySQL database and doing HTTP/1.0 Basic Authentication. There isn't. The phrase "HTTP/1.0 Basic Authentication" refers to how the password is sent over the wire. The server can look up the password by carrier pigeon for all that matters.
It's true that the standard Apache password mechanisms look things up in flat files and not a mysql database, but that's not what the poster said.
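To make the point concrete: with Basic authentication the browser sends `user:password`, base64-encoded, in an `Authorization` header, and what the server does with it afterwards is entirely up to the server. The sketch below uses a hypothetical in-memory user table in place of a flat file or MySQL lookup, and the function names are made up. Base64 is an encoding, not encryption, which is why the HTTPS-only advice elsewhere in this thread matters.

```python
import base64
import hmac

# Hypothetical user store. Basic auth does not care where this lives:
# a flat file, a MySQL table, or a carrier pigeon.
USERS = {"alice": "s3cret"}


def basic_auth_header(user, password):
    """Build the Authorization header value a browser sends for Basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token


def check_basic_auth(header, lookup=USERS.get):
    """Decode a Basic Authorization header and verify it with any lookup function."""
    scheme, _, token = header.partition(" ")
    if scheme.lower() != "basic" or not token:
        return False
    try:
        decoded = base64.b64decode(token, validate=True).decode("utf-8")
    except (ValueError, UnicodeDecodeError):  # binascii.Error subclasses ValueError
        return False
    user, sep, password = decoded.partition(":")
    expected = lookup(user)  # the storage backend is the server's business
    if not sep or expected is None:
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), password.encode("utf-8"))


header = basic_auth_header("alice", "s3cret")
print(header)                    # Basic YWxpY2U6czNjcmV0
print(check_basic_auth(header))  # True
```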
it is quite clear, even here on slashdot. Just take a look at how many people are posting messages as "Anonymous Coward". That poor guy must have left his login somewhere.
ahrm.
If something is sensitive, it shouldn't be publicly available to everyone through HTTP. Search engines aren't causing security problems, they're exposing them.
max
I'm surprised that Jules Verne and H.G. Wells have not been mentioned. They are probably the two most influential early science fiction writers.
;)
H.G....
Island of Doctor Moreau: Predicts genetic engineering.
War of the Worlds: Aliens, flying saucers and the like...not exactly a prediction, but it does cover some modern "interests," a la X-Files and, uh, Battlefield Earth (?).
Time Machine: Again, not a prediction but a current concern of many modern minds such as Stephen Hawking and popular culture like Timecop, Back to the Future and Quantum Leap.
The World Set Free: Predicted the nuclear bomb and the resulting arms race and stalemate.
Jules...
20,000 Leagues: Deep-sea submersibles!
Around the World in Eighty Days: Rapid transit... hehe.
From the Earth to the Moon: Space travel.
Paris in the Twentieth Century: Never read it, but I heard some of the predictions are quite accurate.
> You should be writing that type of data on the backs of envelopes and leaving them scattered around your living room...
Not much worse than some "commercial-grade" encryption...
Maybe somebody should consider suing Google under the DMCA. I haven't studied the DMCA in enough detail to be sure of this (much less studied law, for that matter), but I guess Google is easily guilty of the following "crimes" against modern society:
- linking to decryption algorithms
- linking to reverse engineering tools
- linking to passwords that could be used to circumvent somebody's copyright.
- storing and distributing all the above (with google's cache)
As I understand current legislation, Google should not even have the right to define what is public or not like they're trying to do. Even the safe-harbour provisions do not immunize them from having to remove unlawful content.
Such a lawsuit would make for an interesting debate, and with a bit of luck could get us all rid of this stupid law.
C.
Anyone can generate a credit card number very easily in 2 or 3 minutes, starting with random numbers and putting them through the Luhn scheme used to verify numbers. So you can easily come up with a working credit card number, but it's the matching name that's important.
Hell, you could shluck any old (LUHN-passing) CC number into a site, with a random name from the telephone book and it would verify nine times out of ten...
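For reference, the Luhn check itself is only a few lines. In the sketch below (function name hypothetical), passing the check means only that a number is well-formed, not that the account exists or matches any name, which is the poster's point. Site owners can use the same check to cut false positives when scanning their own pages for exposed numbers.

```python
def luhn_valid(number):
    """Return True if a string of ASCII digits passes the Luhn (mod 10) checksum."""
    if not number or not all(c in "0123456789" for c in number):
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for position, char in enumerate(reversed(number)):
        digit = int(char)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0


print(luhn_valid("4111111111111111"))  # True  (a widely published test number)
print(luhn_valid("4111111111111112"))  # False
```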
We just need a tag that tells the search engine not to include the page in its output.
QUESTION 23: What national-level intelligence assets are available to you, the warfighter?
ANSWER: Area 51 -- Maintains flying saucers and keeps alien bodies in the freezer.
Okay, how did you do that?
(If you read Slashdot enough, sooner or later you see everything.)
Bush's education improvements were
The easiest way to fix this (and it works for nearly every webserver) is to put a dummy index.htm file in every directory. It doesn't have to say anything important at all. It just has to prevent the server from displaying a directory listing.
Slashdot, the site where everything's made up and the points don't matter
If you throw a cat out the window of a car, does it become kitty litter?
Bush's education improvements were
One of the ways the web is really different from printed media is that pages have an implication of currentness and ownership, even when they say otherwise. Unlike a newspaper publisher that A) cannot possibly recall all issues of a screwed up edition and B) no longer owns the physical copies of the paper that have been sold to the public, webmasters can always take down a page, and in fact must continue to provide the page in order for it to be available. That proactive providing of the page might to some people imply that archivers of any sort are responsible for the content.
I have a story similar to that. We were setting up a polling site for our organization and had slapped a quick HTTP login/password auth on the results. The only reason they had that was to keep people from playing with the input to see what happened on the output stage. The idea was to pull the passwords when the input stage was done and let everyone view the results.
Then they had a board meeting, and one of the directors gave a presentation using our results. A reporter from the local paper was paying attention, and sure enough, the next day in the paper:
The results are at (URL), which requires a login and password, both of which are "blah".
When I saw that the next morning during breakfast, I came close to spewing my cereal across the room.
The moral of the story is: temporary things never intended for public viewing frequently leak when random twits are involved.
Download the entire document from the U.S. military web site: lg6.doc
Third bullet under question 28: "If you throw a cat out the window of a car, does it become kitty litter?"
Hey, military commanders, don't be mistreating cats!!!
How U.S. government policy contributed to terrorism: What should be the Response to Violence?
Bush's education improvements were
i found a page with transaction data from some small web merchant. it utilized the unfathomably secure method of Black Text on Black Background. something any Neo who's surfing Source Code could pick up.
From the article: Recent Internet worms such as Code Red and Nimda prove that massive, automated hacking exploits have no need of search engines to find vulnerable computers.
:)
Internet worms? "Microsoft worms", rather. But of course Cnet can't say that in an article, although it can be demonstrated that the Internet is not required for those worms to propagate (i.e. a local network with tcp/ip is enough). Ok, arguably.
Your point #2 is plain wrong. It is almost always best to rely on the server's mechanisms for authentication. Whatever half-arsed, pathetic, lame attempt you implement yourself on the server side for validation will result in problems. Either that or your talents are wasted doing web sites....
Wait, you missed another funny one:
"If you throw a cat out the window of a car, does it become kitty litter?"
Most [legitimate] web servers have an option to disable directory listing. Works wonders.
but then what does this do for anyone?
I found links on various search engines to pages I didn't even know I had. I guess it doesn't take an index.html.
Though all the junk traffic has stopped thanks to port 80 blocking.. dumbassian..
-- 'The' Lord and Master Bitman On High, Master Of All
I found this link by searching Google for "index of
http://www.centurionsoft.com/password/
It's a small page that asks for your name and email before downloading the trial version of their software. Clicking "pass_down2.html" using the above link bypasses this requirement. While not a huge security risk, it does show laziness.
What does the company sell? Security software.
I still can't believe how many companies have admins who can't turn off directory listing. One of my friends found the BIOS passwords for the web server because directory listing had not been disabled; because of this oversight he got into their dev servers and all over their web servers.
So people making passwords and credit card information available isn't a surprise, considering how many "MCSEs" there are out there...
It's gotten to the point now that I don't even talk about being certified in public, because I don't want anyone to think I'm a n00b who wants to brag. Damn, I need a better job. Oh well, my rant is over.
moo.
/ME would be better off searching for a known-good credit card number.
If you find it, it might lead you to many other credit card numbers - but first cancel the one that you found, and sue the company exposing it. Ask them if they have the box their computer came in. (-: And maybe post the URL to CERT as a vulnerability
Got time? Spend some of it coding or testing
Because 28 days after you took your page offline it will disappear from the Google cache.
Google reindexes web pages, and if they 404 on the next visit, then good bye pork pie! You have to get them while they are hot, eg, when a site has JUST been Slashdotted.
Perhaps it would be a good idea after reading this article to examine publicfile.
It was written by a very security-conscious programmer who realises that your private files can easily get out onto the web. That is why publicfile has no concept of content protection (e.g. Deny from evilh4x0r.com or .htaccess) and will only serve up files that are publicly readable.
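The "serve only what is already world-readable" rule is easy to express. Below is a sketch of the idea on a POSIX system. The function name is hypothetical, and this is not publicfile's own code: a file qualifies only if it is a regular file with the "other" read bit set.

```python
import os
import stat


def world_readable(path):
    """True if `path` is a regular file that 'other' users may read.

    Publish only what the file system already treats as public, with no
    per-file access lists to get wrong.
    """
    try:
        info = os.stat(path)  # follows symlinks, like a server opening the file
    except OSError:  # missing file, permission problem, etc.
        return False
    return stat.S_ISREG(info.st_mode) and bool(info.st_mode & stat.S_IROTH)
```

For example, a page with mode 0644 qualifies and an order file with mode 0600 does not, so a `chmod` mistake fails closed instead of open.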
From the features page:
A good healthy dose of paranoia would do people good.
There's always good stuff at http://abcnews.go.com/robots.txt and http://www.cnn.com/robots.txt. I check that every so often just because I get such a kick out of viewing their logs and such.
-Waldo Jaquith
I have found web server based authentication systems limited, weak, and hard to integrate into authorization systems (to determine whether the given user is allowed to access the given information). Granted, my experience is limited to Apache and Netscape Enterprise Server.
In addition, the client-based authentication scheme that is triggered by web server based authentication doesn't allow for logging out in a manner that is consistent across browsers. Having the ability to log out is critically important for a good security system.
Your mileage may vary.
----
Open mind, insert foot.
Like probably everyone else reading this story, I did a quick little search and sure enough found a small data file of cards, names and addresses almost immediately.
Emailed them to let them know of the problem, so they can clean it up. They responded back saying they'd got a flood of emails from people who read /., which I guess is encouraging.
However, since presumably I wasn't the first to notify them, and I was still able to find the data, it got me wondering: during this cleanup, is there a process/procedure to notify Google (and other engines that cache?) that you wish them to flush/expire/remove your offending document from their cache? Pulling the page simply isn't enough; it's still there in the cache. (I assume it expires, but definitely not instantly, which in a situation like this is what you want.)
Here is a better page that includes starting digits for the different cards, and includes the checking algo used to verify cards.
QED
The customer does not get hurt by a stolen CC#; it is the merchants who get charged by the CC companies. If customers report their card stolen as soon as they realize there is a problem, then by law they face a loss of at most $50. Quite often it is free.
The only cost to consumers is the hassle of getting a new CC#, and the downtime while you don't have a CC#.
How does the Google Cache avoid legal entanglements, both for stuff like cc numbers and copyright/trademark infringement?
Huh? Maybe because it isn't illegal at all?
Think about it, where does it say it's illegal ?
echo '[q]sa[ln0=aln80~Psnlbx]16isb572CCB9AE9DB03273snlbxq' |dc
No, seriously, do it !
Print it out and hang it on the wall, then put a Post-it note on top of it saying: "The best example of 'blaming the messenger' ever!!!"
echo '[q]sa[ln0=aln80~Psnlbx]16isb572CCB9AE9DB03273snlbxq' |dc
if you trained a bunch of monkeys to recognize credit card numbers, sat them down and let them click away for hours on end, who knows what they would find?
then again, if we could train those same monkeys to fill out the stack of credit card applications I get in the mail all the time...
at least search engines don't have to deal with popups...
Thanks for the link. Now I know what I want for Christmas.
Thanks again, a kinky AC.
This reminds me of a site I once visited. I tried to search for something on that site, and the first result of the search was the site's admin page. Just to have some fun I clicked on the link, and to my surprise the admin page came up without asking for any password and allowed me to do lots of interesting site admin things.
Not to my knowledge. When I asked Google whether such a facility exists, they said no -- but they did point me to Google Zeitgeist, which gives "Search patterns, trends, and surprises according to Google". Usually published once a week and showing e.g. the top 10 gaining and losing queries of that week. So you get some interesting info, but it's not realtime by any description of the word.
Esli epei etot cumprenan, shris soa Sfaha.
I ran across this when trying to get a cgi-bin directory listing: "Unavailable Tripod Directory. Tripod does not allow the automatic listing of directory or subdirectory contents." Who'da thought Tripod would do something right..
There's no "I" in Linux.. err..