The Problem of Search Engines and "Sekrit" Data

teehee by Clay+Mitchell · 2001-11-26 04:45 · Score: 0, Offtopic

/me goes to search for "credit card"

/me buys an x-box with stuff he found by reading slashdot

the gods of irony salute!

YES! YES! by A_Non_Moose · 2001-11-26 04:47 · Score: 0, Offtopic

Just in time for Christmas Shopping!!!!

All the toys and none of the debt!

Just gotta remember to buy a P.O. box first, not give my home address like the last ti....uhhhh...never mind.

--
Have you read the moderator guidelines? Well, have you, PUNK? (and I want a Karma: Gnarly option)

Re:YES! YES! by Anonymous Coward · 2001-11-26 05:47 · Score: 0

Uh, can you actually buy a PO box without giving your real identity?
Re:YES! YES! by sam@caveman.org · 2001-11-26 05:54 · Score: 1, Offtopic

all you need is a valid form of ID.

or alternatively, an ID which appears to be valid.

-sam

--
burn the computers. go back to the abacus.
Re:YES! YES! by A_Non_Moose · 2001-11-26 07:03 · Score: 1

Sheesh, it was a joke...and ontopic, no less.

Oh, well, whatever.

Seriously, this is how some of these criminals operate...data in the form of cc#'s is captured over the wire, or from insecure IIS boxes, or wherever VISA/MC/Discover is accepted, and re-used during the busy season, when it may not be discovered or disputed.

Funny? Yes, if you made the connection.
Offtopic? Yes, if you did not make the connection.

IMO, most humor is made up of "questionable ethics/decisions"...probably the #1 reason federal, state and local govt. are constant sources of amusement.

It is getting close to the "silly season", and I am forced to wonder: I know there is a moderator FAQ, but is there a moderator "aptitude test"?
There have been several posts I've seen in the past year or so that I have bought "hook, line and sinker" as a troll or flamebait...until the "tone of voice" sets in as sarcastic, and then it is the funniest stuff I've read.

--
Have you read the moderator guidelines? Well, have you, PUNK? (and I want a Karma: Gnarly option)

A symptom of poor programming... by Bonker · 2001-11-26 04:48 · Score: 4, Insightful

I don't see what's so hard about this problem. It's very simple... don't keep data of any kind on the web server. That's what firewalled, password/encryption protected DB servers are for.

--
The next Slashdot story will be ready soon, but subscribers can beat the rush and slashdot the links early!

Re:A symptom of poor programming... by hiroko · 2001-11-26 04:56 · Score: 1

don't keep data of any kind on the web server
Erm... what about my html data? ;)

--
Just because you can't, doesn't mean you shouldn't.
Re:A symptom of poor programming... by ChazeFroy · 2001-11-26 04:58 · Score: 5, Interesting

Try the following searches on google (include the quotes) and you'll be amazed at what's out there:

"Index of /admin"
"Index of /password"
"Index of /mail"
"Index of /" +passwd
"Index of /" password.txt
Re:A symptom of poor programming... by Brainless · 2001-11-26 05:04 · Score: 4, Funny

I manage a Cold Fusion web server that we allow clients to post their own websites to. Recently, their programmer accidentally made a link to the admin section. Google found that link, proceeded into the admin section, and indexed all the "delete item" links as well. I found it quite amusing when they asked to see a copy of the logs, complaining the website was hacked, and I discovered GoogleBot had deleted every single database entry for them.
Re:A symptom of poor programming... by Bonker · 2001-11-26 05:06 · Score: 1

From a site indexed by google:

PASSWORD PROTECTION is one way to guard your stack against unauthorized access.

Unlike locking your stack which prevents others from making changes, this surprisingly simple script won't allow anyone to view your stack without the password. For you Ursula K. Le Guin fans, the password for this stack is "Antwerp".

--
The next Slashdot story will be ready soon, but subscribers can beat the rush and slashdot the links early!
Re:A symptom of poor programming... by Anonymous Coward · 2001-11-26 05:08 · Score: 1, Insightful

What ignorance of security. Security is a problem that cannot be solved with technology alone. If you think encryption and/or firewalls will prevent this sort of issue, you totally misunderstand the purpose/capabilities of these tools. In this case, privacy is better protected through people (education) and process (security policy). If I write bad code that exposes credit card numbers (regardless of whether I store data on the web server, use encryption, and use firewalls), the numbers will still be disclosed.
Re:A symptom of poor programming... by Anonymous Coward · 2001-11-26 05:10 · Score: 0

http://congress.nw.dc.us/lwv/custom/password.txt

Hmmm. Interesting.
Re:A symptom of poor programming... by ChazeFroy · 2001-11-26 05:15 · Score: 2, Informative

Something I forgot to mention in my other post:

The October 2001 issue of IEEE Computer has some articles on security, and the first article in the issue is titled "Search Engines as Security Threat" by Hernandez, Sierra, Ribagorda, Ramos.

Here's a link to it.
Re:A symptom of poor programming... by -douggy · 2001-11-26 05:18 · Score: 1

http://www.mit.edu/afs/sipb/system/config/passwd/i386_nbsd1/master.passwd

MIT fun
Re:A symptom of poor programming... by ichimunki · 2001-11-26 05:19 · Score: 5, Informative

A big part of why this is a problem is the fact that many web servers are, by default, set up to display file listings for directories if there is no "index.html" file in the directory and the user requests a URL corresponding to that directory.

Personally I like to make sure that there is an .htaccess file that prevents this (on Apache-- I'm sure IIS and others have similar config options). I like to turn off the directory listing capability if possible, and certainly assign a valid default page, even if index.html is not present.
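For anyone who wants to audit their own pages, here is a minimal sketch of how an auto-generated listing could be spotted. It assumes the Apache-style page title of the form "Index of /path" (the same string the Google searches elsewhere in this thread rely on); the function name `looks_like_directory_listing` is made up for illustration, and other servers may format their listings differently.

```python
import re

# Assumption: auto-generated listings carry a title like "Index of /admin".
# Other servers (or customised Apache setups) may use a different title.
_LISTING_TITLE = re.compile(r"<title>\s*Index of /[^<]*</title>", re.IGNORECASE)


def looks_like_directory_listing(html: str) -> bool:
    """Return True if the HTML looks like an auto-generated file listing."""
    return _LISTING_TITLE.search(html) is not None


if __name__ == "__main__":
    listing = "<html><head><title>Index of /admin</title></head><body></body></html>"
    normal = "<html><head><title>Welcome</title></head><body></body></html>"
    print(looks_like_directory_listing(listing))
    print(looks_like_directory_listing(normal))
```

Running something like this over the pages your own server returns for bare directory URLs is one way to find listings before a spider does.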

And don't forget "index of /cgi-bin" for some real fun. ;)

--
I do not have a signature
Re:A symptom of poor programming... by Legion303 · 2001-11-26 05:29 · Score: 5, Interesting

Please give credit where credit is due. Vincent Gaillot posted this list to Bugtraq on November 16.
-Legion
Re:A symptom of poor programming... by cavemanf16 · 2001-11-26 05:29 · Score: 1

Haha! This totally sucks. If you go to the very first entry on Google returned on the search for "Index of /admin" you'll be able to find a document entitled "biddings.doc" that is some sort of agenda for a meeting to discuss various ways of securing the network for that website. Funny considering their entire /admin folder is available for full usage by the internet community. ;)
Re:A symptom of poor programming... by Anonymous Coward · 2001-11-26 05:39 · Score: 0

Yeah I just saw that one. MIT! M I fucking T! The AI lab hackers would be turning over in their cots!
Re:A symptom of poor programming... by gazbo · 2001-11-26 05:40 · Score: 1

Well, on the one hand it gives us usernames.

Fortunately, their unix security is better than their http security - they're using shadow passwords, so this is not as bad as it seems.
Re:A symptom of poor programming... by Bonker · 2001-11-26 05:42 · Score: 2

Funnily enough, IIS defaults to hide directory contents where Apache doesn't. The option to display directory contents can be turned on easily enough, but an administrator does actually have to make the conscious decision to do so.

This is a good reason not to let developers have administrator access to any boxen they are developing on.

--
The next Slashdot story will be ready soon, but subscribers can beat the rush and slashdot the links early!
Re:A symptom of poor programming... by ddyer-bennet · 2001-11-26 05:49 · Score: 1

The issue isn't whether the data is stored on the web server; it's whether the data is accessible through the web server. If you intend to make the data accessible for some users or some purposes, you need to implement your own security to prevent it being accessed by ALL users for ALL purposes.
Re:A symptom of poor programming... by greenrd · 2001-11-26 05:51 · Score: 3

Do you realise that the web developer who made the admin section accessible via a GET request, without any additional authentication, is the biggest moron here, not the client? You shouldn't rely on people not knowing where your wide-open doors are - lock them!
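A rough sketch of that rule, written as a plain function rather than for any particular web framework. The names `allow_request`, `is_admin_path` and `is_destructive` are hypothetical; the point is only that admin pages need authentication and that a "delete item" action should never fire on a GET, which is all a spider ever sends.

```python
# Methods a crawler or link prefetcher will issue just by following links.
SAFE_METHODS = {"GET", "HEAD"}


def allow_request(method: str, is_admin_path: bool,
                  is_destructive: bool, is_authenticated: bool) -> bool:
    """Decide whether a request may proceed (illustrative policy only)."""
    # Lock the door: nothing in the admin section without a login.
    if is_admin_path and not is_authenticated:
        return False
    # Deleting or modifying data must not be reachable by following a link.
    if is_destructive and method.upper() in SAFE_METHODS:
        return False
    return True


if __name__ == "__main__":
    # A spider following a "delete item" link: anonymous GET on an admin URL.
    print(allow_request("GET", True, True, False))
    # A logged-in admin submitting a delete form.
    print(allow_request("POST", True, True, True))
```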

--
Female Prison Rape in NY
Re:A symptom of poor programming... by cyclist1200 · 2001-11-26 06:00 · Score: 1

Uhh...you keep cc numbers in your html? May I have the address? ;)
Re:A symptom of poor programming... by Anonymous Coward · 2001-11-26 06:01 · Score: 0

except for root *cough*
Re:A symptom of poor programming... by gazbo · 2001-11-26 06:02 · Score: 1

My mistake, I misread the * as x. I guess they are bloody idiots after all.
Re:A symptom of poor programming... by Anonymous Coward · 2001-11-26 06:14 · Score: 0

http://congress.nw.dc.us/lwv/secure/password.txt

looks interesting
Re:A symptom of poor programming... by raindr · 2001-11-26 06:54 · Score: 1

Yep, something else! a quick glance and 1st box running iis.......D

--
Things Are The Way They Are
Re:A symptom of poor programming... by subsolar2 · 2001-11-26 06:57 · Score: 2, Interesting

I've seen this myself searching for information on linksys routers about a year ago. I found somebody with a page that listed the password for their linksys router along with other systems and information. I e-mailed the guy, who seemed very surprised that the information was available there and thanked me for letting him know. The information was gone when I checked again.
It's a silly mistake, and I don't have a clue as to how google came across the link. Like with anything new it's going to take some time before this becomes "common sense" and people do not put this information on public servers.
- subsolar
P.S. It's possible to generate a url that, when clicked by somebody behind a linksys router, enables remote administration if you know the password. I've turned it in to linksys but gotten nothing but silence from them.
Re:A symptom of poor programming... by Anonymous Coward · 2001-11-26 07:33 · Score: 0

It's possible to generate a url that, when clicked by somebody behind a linksys router, enables remote administration if you know the password. I've turned it in to linksys but gotten nothing but silence from them.

Isn't this what Bugtraq is for?
Re:A symptom of poor programming... by Anonymous Coward · 2001-11-26 07:55 · Score: 1, Funny

First thing I do when bored and surfing (porn sites) is whenever I see a new link with a non-standard index page (anything other than index.html) I chop it off and see if I can get a listing. Since most porn sites run on Apache, and Apache by default does not disable directory listings, and since most porn sites are designed on Windows (can you say index.htm?), and since the default Apache index page is index.html, this leads to a great deal of free fun.
Re:A symptom of poor programming... by Anonymous Coward · 2001-11-26 08:57 · Score: 1, Interesting

I don't recommend looking at the results of these searches. Recent laws define viewing private files as terrorism, and you might end up in jail. The only question is: Whose definition of "private" is being used.
Re:A symptom of poor programming... by portnoy · 2001-11-26 09:17 · Score: 1

Well, I'm from the AI Lab. I suppose I'd be concerned if that were actually a password file from any box, either at SIPB or at MIT proper.

Just because a file is called master.passwd doesn't necessarily mean that it's a way in, people.
Re:A symptom of poor programming... by Karellan · 2001-11-26 09:33 · Score: 1

Yes, yes, YES, Bonker. It is called the Wide World Web and if you do not want the Wide World to see it, put it somewhere else.

I hope that Google comes to its senses and does not change a thing. They are doing a great, great job.
Re:A symptom of poor programming... by Anonymous Coward · 2001-11-26 11:32 · Score: 0
And for youse lazy bastards:
Re:A symptom of poor programming... by chenwah · 2001-11-26 11:54 · Score: 1

mmm... lathe of heaven. Could somebody convince me to dream myself up a new TiBook G4?

.
Re:A symptom of poor programming... by Anonymous Coward · 2001-11-26 12:21 · Score: 0

Ah! Now we know why it's not safe to leave the LAN side unsecured.
Re:A symptom of poor programming... by brad3378 · 2001-11-26 17:26 · Score: 1

Just for fun, I tried searching google for "Index of /password" and came across this website:

http://www.centurionsoft.com/password/

I then followed this link:
http://www.centurionsoft.com/password/pass_down2.html

Which allows me to download this

$20.00 Password manager software without paying!

Needless to say, I didn't download it!

ha ha ha!!

--
Re:A symptom of poor programming... by bluebomber · 2001-11-27 01:42 · Score: 1

Looks like the pr0n sites have figured this out:
'index of /password' at "expensive babes" and "britney video" are among the top 10...

--
The Daily Build
Re:A symptom of poor programming... by Anonymous Coward · 2001-11-27 05:16 · Score: 0

PeRhaps sOmeone Foolish will Run their cpu lOng enOugh To crack it.

But ... by Anonymous Coward · 2001-11-26 04:48 · Score: 0

... information wants to be free. Right?

How can this happen? by Nonesuch · 2001-11-26 04:48 · Score: 4, Redundant

To the best of my knowledge, search engines all work by indexing the web, starting with the base of web sites or submitted URLs, and following the links on each page.

Given this premise, the only way that Google or another search engine could find a page with credit card numbers or other 'secret' data, would be if that page was linked to from another page, and so on, leading back to a 'public' area of some web site.

That is to say, the web-indexing bots used by search engines cannot find anything that an ordinary, very patient human could not find by randomly following links.
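To make the premise concrete, here is a toy sketch of link-following as a breadth-first walk over an in-memory link graph. The `site` dictionary and its paths are invented for illustration, and a real spider fetches pages over HTTP and parses out the links, which this skips entirely.

```python
from collections import deque


def reachable_pages(links, start):
    """Return every page reachable from `start` by following links."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical site: each key is a page, each value the links found on it.
site = {
    "/": ["/about.html", "/files/"],
    "/files/": ["/files/report.pdf", "/"],
    "/secret/cards.xls": [],  # exists on the server, but nothing links to it
}

if __name__ == "__main__":
    print(sorted(reachable_pages(site, "/")))
```

Under this model "/secret/cards.xls" is never visited. It only turns up if some page the crawler can reach (a directory listing, a referrer log, a stray link) points at it.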

--

I do not deploy Linux. Ever.

Re:How can this happen? by Garfunkel · 2001-11-26 04:59 · Score: 1

Index pages. Index pages often have the ../ parent link and that can get you to some places people tend not to think of as being accessible. IMHO, it's their own fault for putting that stuff somewhere even remotely close to being accessible. My guess is that many of them are run off of Microsoft Personal Webservers or something that they may not even be sure they are running.

--
-jay
Re:How can this happen? by Anonymous Coward · 2001-11-26 05:04 · Score: 0

Pretty easy. Some websites (and especially hit counters) post referrers lists. These lists contain the pages visitors viewed before they came to the tracking site. They obviously might contain urls like login:password@some.url and if a search engine follows the links in the referrer list it will find secret information.

hdmx
Re:How can this happen? by Rogerborg · 2001-11-26 05:19 · Score: 0, Offtopic
- the only way that Google or another search engine could find a page with credit card numbers or other 'secret' data, would be if that page was linked to from another page, and so on, leading back to a 'public' area of some web site.
Do you always feel the need to recap the second god damn sentence in articles? And you get modded up? Has someone been handing out free beer with each bunch of moderator points today? ;-)
--
If you were blocking sigs, you wouldn't have to read this.
Re:How can this happen? by Aloekak · 2001-11-26 05:37 · Score: 1

Yes, web crawlers follow links in html webpages, but some do much more than that. New web crawlers know the difference between websites that have directory listings enabled and disabled. If listings are disabled, the crawler has to follow the html/whatever files that it knows how to index. If they are enabled, then whatever is on the website is fair game.
Re:How can this happen? by Anonymous Coward · 2001-11-26 06:04 · Score: 0

don't you mean handing out free crack?
Re:How can this happen? by Rogerborg · 2001-11-26 06:12 · Score: 1, Troll
- the only way that Google or another search engine could find a page with credit card numbers or other 'secret' data, would be if that page was linked to from another page, and so on, leading back to a 'public' area of some web site.
As my first reply to this immediately got modded to 0, I'll post it again. I'll type slower this time to make it easier to understand.

Are the moderators not understanding that my parent is just repeating the second sentence of the Slashdot article, only in a less focussed way? Go and read the second sentence of the Slashdot article. Now, how is my parent "insightful", "interesting", or "informative"? Try "redundant".

When I take the bother to read the Slashdot article, then go and actually read the referenced offsite article, I do not then want to find that the highest modded post is just parroting one of the very simple points that's already been covered. It demonstrates that both the poster and the moderators haven't even done us the courtesy of reading more than the first sentence of the Slashdot article, let alone the reference source. That's lazy and rude, and I'm going to keep shouting that at +1 until my parent gets modded down, or I drop to 26 karma. Then I'll shut up, and the lunatics can run the asylum in peace.

Mod me down (off topic, redundant, flamebait, honest), but do us all a favour and mod the parent down first please. Many thanks.
--
If you were blocking sigs, you wouldn't have to read this.
Re:How can this happen? by gazbo · 2001-11-26 06:19 · Score: 1

And just what would somebody inadvertently running Microsoft Personal Webserver be doing with a list of CC numbers in their WWW tree, hmm?
Re:How can this happen? by kilgore_47 · 2001-11-26 06:43 · Score: 2

Index pages often have the ../ parent link and that can get you to some places people tend not to think of as being accessible.

I sincerely hope that there aren't any widely used webservers that would actually let you request "../" and get something above the designated webspace. That is one of the most obvious exploits ever, and I think even microsoft is smarter than that now.

--
___
The way to see by faith is to shut the eye of reason. --Ben Franklin
Re:How can this happen? by GigsVT · 2001-11-26 07:35 · Score: 2

Actually directory traversal exploits happen all the time, but it's not likely you would put a hyperlink to exploit your own (or someone else's) site just on the web.

--
I've had enough abrasive sigs. Kittens are cute and fuzzy.
Re:How can this happen? by Nonesuch · 2001-11-26 07:41 · Score: 2

Did you not see the question mark at the end of the subject on the parent comment?
FYI, I did read both the Slashdot article and the referenced offsite article, and neither answers my question as to how Google (or any other web-crawler 'bot) finds 'secret' files that presumably are never linked to from a 'non-secret' page.
Other users here have offered constructive suggestions about how this can happen (apache bug, referer data exposed by analog, etc) , meanwhile you waste your time and karma composing rants about why my question is redundant.

--

I do not deploy Linux. Ever.
Re:How can this happen? by Anonymous Coward · 2001-11-26 08:05 · Score: 0

No numbnuts, what the parent poster means is that when you have a directory listing on http://www.xxx.com/dir1/dir2/, it may not be obviously apparent, to some people, that http://www.xxx.com/dir1/ has an actual link to it from that page, and that it would be naturally followed by a spider. Point is, sometimes interesting places are accessible by means you haven't considered.
Re:How can this happen? by Anonymous Coward · 2001-11-26 08:53 · Score: 0

Well, fuckface, it doesn't take a ../ link on xxx.com/dir1/dir2/ to view xxx.com/dir1/ and as such ../ links are pretty irrelevant to this discussion, right? Right.

Oh Yeah? by Knunov · 2001-11-26 04:49 · Score: 4, Funny

"...search engines are finding password and credit card numbers while doing its indexing."

This is very serious. Could you please post the exact search engines and query strings so I can make sure my information isn't there?

Knunov

--
Why do users with IDs under 100,000 or over 700,000 usually have the most worthwhile comments?

Re:Oh Yeah? by Karma+50 · 2001-11-26 04:51 · Score: 5, Funny

Just search for your credit card number.

By the way, does google have that realtime display of what people are searching for?

--
http://www.thehungersite.com
Re:Oh Yeah? by Anonymous Coward · 2001-11-26 04:59 · Score: 0

Could you please post the exact search engines and query strings so I can make sure my information isn't there?

hehe, yeah, we believe you ;-)
Re:Oh Yeah? by 4of12 · 2001-11-26 05:01 · Score: 2, Funny

Yeah!

I just typed in my credit card number and found 15 hits on web sites involving videos of hot young goats.

--
"Provided by the management for your protection."
Re:Oh Yeah? by NTSwerver · 2001-11-26 05:04 · Score: 1

Could searching for your own credit card number also be risky - ie: could it be intercepted?

--
-----------------------
Moderator's essentials
Re:Oh Yeah? by morcego · 2001-11-26 05:59 · Score: 2, Insightful

Yes, it could. Actually, it is very trivial to do.
I actually tried to search for my credit card number, but only searched for 8 digits, in various forms (always the same digits, mind you), like:
"XXXX XXXX"
"XXXX-XXXX"
"XXXXXXXX"

Thank god, nothing ...
This is something I suggest you people do. I would suggest using the last 8 digits, since the "last 4 digits" are commonly used, but you won't be exposing something that is probably already everywhere.

--
morcego
Re:Oh Yeah? by Hiro+Antagonist · 2001-11-26 08:24 · Score: 2

Your information is perfectly safe.

Oh, and by the way -- thanks for the christmas gift. I've always wanted a silver Ferrari.

--

--
I Hit the Karma Cap, and All I Got Was This Lousy .sig.
Re:Oh Yeah? by rkent · 2001-11-26 10:58 · Score: 2

I would suggest using the last 8 digits...

Or just search for the first 4, which identify the card manufacturer (eg Discover card is 6011), and so pose no risk to you at all. See if any of the results for "6011" have 12 more digits following...
Re:Oh Yeah? by rkent · 2001-11-26 11:01 · Score: 2

I realize this is a joke, but that works, and it's fuckin' scary! Of course I didn't search for my own credit card number, but I did search for the first 4 digits, which is just a card issuer's identity string anyway. For example, (some) visa cards start with "4128."

So if you search google for "Visa 4128"... watch out. I'd estimate about a third of the results I got actually had whole visa numbers within. Scary.
Re:Oh Yeah? by Karma+50 · 2001-11-26 17:40 · Score: 1

I'd estimate a lot less than a third.

90% are phone numbers with 4128 in. Usually (800) 754-4128, which is the phone number for reporting lost visa cards.

In fact, I didn't spot a single number that could be considered revealed in the terms this story suggests.

--
http://www.thehungersite.com
Re:Oh Yeah? by Anonymous Coward · 2001-11-26 18:28 · Score: 0

After a google search for "credit 6011" I found this site listing card numbers vs card type.

Here are the results:
CARD TYPE          PREFIX            WIDTH
American Express   34, 37            15
Diners Club        300 to 305, 36    14
Carte Blanche      38                14
Discover           6011              16
EnRoute            2014, 2149        15
JCB                3                 16
JCB                2131, 1800        15
Master Card        51 to 55          16
Visa               4                 13, 16

So just do "enroute 2149" and see what you get.
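As a sketch of how a table like that gets used, here is a small lookup that matches a number's prefix and length against the rows above. The rows are copied from the listing in this post and should be treated as unverified; the names `RULES` and `card_type` are made up, and nothing here checks whether a number is actually valid.

```python
# (card type, accepted prefixes, accepted lengths), copied from the table above.
RULES = [
    ("American Express", ("34", "37"), (15,)),
    ("Diners Club", ("300", "301", "302", "303", "304", "305", "36"), (14,)),
    ("Carte Blanche", ("38",), (14,)),
    ("Discover", ("6011",), (16,)),
    ("EnRoute", ("2014", "2149"), (15,)),
    ("JCB", ("3",), (16,)),
    ("JCB", ("2131", "1800"), (15,)),
    ("Master Card", ("51", "52", "53", "54", "55"), (16,)),
    ("Visa", ("4",), (13, 16)),
]


def card_type(number: str):
    """Return the card type for a number, or None if no row matches."""
    # Ignore spaces and dashes so "XXXX XXXX" and "XXXX-XXXX" forms both work.
    digits = "".join(ch for ch in number if ch.isdigit())
    for name, prefixes, widths in RULES:
        if len(digits) in widths and digits.startswith(prefixes):
            return name
    return None


if __name__ == "__main__":
    print(card_type("6011 0000 0000 0000"))
    print(card_type("4000-0000-0000-0000"))
```

Because every row pins down a length as well as a prefix, the overlapping "3" prefixes never collide: a 16-digit number starting with 3 can only match the JCB row.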

Tangential Google Question by banuaba · 2001-11-26 04:50 · Score: 5, Interesting

How does the Google Cache avoid legal entanglements, both for stuff like cc numbers and copyright/trademark infringement?
If I want to find lyrics to a song, the site that has them will often be down, but the cache will still have them in there. Why is what google is doing 'okay' but what the original site is doing not okay? Or do they just leave google alone?

--

Brant

Argle. Bargle.

Re:Tangential Google Question by SamBeckett · 2001-11-26 04:51 · Score: 1

Intent
Re:Tangential Google Question by CaseyB · 2001-11-26 04:59 · Score: 3, Interesting

Good question.
Given that they do have (for now) some sort of immunity, it opens a loophole for publishing illegal data. Simply set up your site with all of Metallica's lyrics / guitar scores (all 5 of them, heh). Submit it for indexing to Google, but don't otherwise attract attention to the site. When you see the spider hit, take it offline. Now the data is available to anyone who searches for it on Google, but you're not liable for anything. The process could be repeated to update the cache.
Re:Tangential Google Question by passion · 2001-11-26 04:59 · Score: 2

I doubt most prosecuting teams are savvy enough to think about google's cache.

--
- passion
Re:Tangential Google Question by Suidae · 2001-11-26 05:27 · Score: 2, Interesting

Don't bother taking it offline, just set up your web server so it only responds to the google indexing server. Cache stays up all the time, but no one else can (easily) see that you are serving it.
Re:Tangential Google Question by mikeboone · 2001-11-26 05:38 · Score: 1

Doesn't Google rank data based on how many other people link to it? If you only had Google pointing to it, wouldn't it be very low on a search list?
Re:Tangential Google Question by Xzzy · 2001-11-26 05:44 · Score: 3, Informative

> If you only had Google pointing to it, wouldn't
> it be very low on a search list?

If it's a very specific search term, Google will still return it in the list. If it's unique enough, it's very possible that it will even be the top ranked page. If you put a unique string of characters (like a password or something) on a page, and google indexed it, typing that "password" into the search engine will give you your page.

You can also type domain names into google to retrieve the cache page for that website, which would accomplish much the same thing as long as it's not geocities or something.
Re:Tangential Google Question by snake_dad · 2001-11-26 06:21 · Score: 3

Make that: were savvy enough.

--
karma capped .sig seeking available Slashdot poster for long-term relationship.
Re:Tangential Google Question by kilgore_47 · 2001-11-26 06:49 · Score: 2

You could always search for exact phrases from your site, save the resulting cache link, setup a forwarder (or frameset) on a free account somewhere to point to the google cache URL, and distribute the URL to your free website. Technically, the offending data will be served by google only.

Now if only google would start letting their spider index .mp3 files!

--
___
The way to see by faith is to shut the eye of reason. --Ben Franklin
Re:Tangential Google Question by stg · 2001-11-26 07:51 · Score: 1

Or even better, do a CGI that only returns the real page if the user-agent is GoogleBot.
Otherwise return a 404.

That way it will still remain cached on Google on the next refresh, but it still won't be accessible by normal users...
Re:Tangential Google Question by Cpyder · 2001-11-26 08:52 · Score: 1

Better filter based on the agent's ip (googlebot.com) rather than its user agent, since the latter can easily be faked (through configuration, changing source code or using a simple intermediate proxy identifying itself as "Googlebot").
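One way to sketch the IP-based check is a reverse DNS lookup followed by a forward lookup to confirm it, since a reverse record on its own can also be set to anything by whoever controls that address block. The ".googlebot.com" suffix is taken from this comment; `is_googlebot` and its injectable lookup parameters are hypothetical names, added so the logic can be tried without touching the network.

```python
import socket


def is_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if `ip` reverse-resolves under googlebot.com and the
    resulting hostname resolves back to the same address."""
    # Defaults use the system resolver; pass fakes to test offline.
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]
    try:
        host = reverse_lookup(ip)
        if not host.endswith(".googlebot.com"):
            return False
        # Forward-confirm: the claimed hostname must map back to this ip.
        return ip in forward_lookup(host)
    except OSError:
        # Lookup failures (socket.herror / socket.gaierror) count as "no".
        return False


if __name__ == "__main__":
    fake_reverse = lambda addr: "crawl-192-0-2-1.googlebot.com"
    fake_forward = lambda host: ["192.0.2.1"]
    print(is_googlebot("192.0.2.1", fake_reverse, fake_forward))
```

The addresses and hostname in the demo are placeholders from the documentation range, not real crawler addresses.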
Re:Tangential Google Question by gol64738 · 2001-11-26 09:25 · Score: 1

i think search engines are immune to this. do a search on google for DeCSS, and you will find many hits with the source. how come google isn't slapped with cease/desist order for violating the DMCA?

you know, i don't think search engines are immune, i just think that no one has brought up this issue with any legal authority.....yet.
Re:Tangential Google Question by LinuxHam · 2001-11-26 09:47 · Score: 2, Interesting

Don't bother taking it offline, just set up your web server so it only responds to the google indexing server. Cache stays up all the time, but no one else can (easily) see that you are serving it.

Oooh.. that's a particularly good one.. kinda like getting high-bandwidth web service FOC, if you build your site URLs to ride along the google cache instead of your own... (gears cranking)..

--
Intelligent Life on Earth
Re:Tangential Google Question by randomgeek · 2001-11-26 16:40 · Score: 2, Interesting

Random thought, it'd be possible to use Google as a kind of morpheus/napster-like distributed service? Make a HTML "page" that looks something like:

FileName: MyFile
Size: FileSize

Encode in Base64, rot13 it, and then call it protected under DMCA, bonus points.

Of course, your web server would only accept connections from the google spiders, and you'd effectively have a free file distribution service. Not saying this would actually work, but I think there's a chance it'd work.
Re:Tangential Google Question by Anonymous Coward · 2001-11-26 18:47 · Score: 0

Use English words as your encoding scheme. For each number between 0 and 65535 have a unique english word, so that superficially, your warez looks to be plain text. Bonus points if you use the hipcrime text generator that seems to generate grammatically valid, but meaningless english.

The only problem I see is how you get google to visit your site. They have an algorithm that selects whether a site is linked to enough to warrant a googlebot visit. I guess one way is to have a huge website, all encoded warez mind you, and then google will googlebot it because you have enough self links.
Re:Tangential Google Question by Anonymous Coward · 2001-11-28 13:35 · Score: 0

For each number between 0 and 65535 have a unique english word

methinks downloading ISO's would be a painful process.

how the FUCK is this possible? by posmon · 2001-11-26 04:50 · Score: 2, Insightful

just because google is only picking them up now doesn't mean that they haven't been there for years!

how can someone be so blatantly stupid as to store anything other than their web content, never mind credit card details, in their published folders? how? they redirected my documents to c:\inetpub\wwwroot\%username%\...???

--

update comments set karma=-1, reason='offtopic' where sid=26315

Re:how the FUCK is this possible? by Karma+50 · 2001-11-26 04:56 · Score: 2, Insightful

Google has just added the ability to index PDFs, word docs etc. So, yes, the information was there before, but now it is much easier to find.

--
http://www.thehungersite.com
Re:how the FUCK is this possible? by Neon+Spiral+Injector · 2001-11-26 04:57 · Score: 2, Insightful

In published folders? How about on machines that are on the Internet at all.

In an ideal setup the machine storing credit card information wouldn't have a network card, or speak any networking protocol. You'd have a front end secure webserver. That machine would pass the credit card information to the backend across a serial link. The backend machine would process the card and return the status. The CC data would only be a one way transfer, with no way of retrieving it back off of that machine.
Re:how the FUCK is this possible? by Anonymous Coward · 2001-11-26 05:49 · Score: 0

I want a magical serial cable that doesn't speak *any* networking protocol but is somehow able to communicate with another computer over a serial network. Mmmm, magical serial cable...
Re:how the FUCK is this possible? by Anonymous Coward · 2001-11-26 06:18 · Score: 0

A serial line doesn't have to be doing "networking" in any meaningful sense of the word. It's just running a front-end to the machine on the other end of the serial line. There is not even a getty on that line.
Re:how the FUCK is this possible? by WNight · 2001-11-26 12:33 · Score: 2

Yup. Much the same as dumping critical logs to the printer, as soon as they are generated. And if you're paranoid, dump SHA hashes of the other logs at certain points. (i.e., the apache log, as of 11/26/01 16:04:53 was 253,035 bytes, SHA 0x84BE2C9A1029A3C1(etc))

That way you've got critical logs that a hacker can't modify, and by checking every few days (with a simple script to roll the logs back to a given size) you can tell if the less-important logs have been modified. You may not ever know what they said if they have been, but the mere fact someone other than you or the server modified the logs is a big indicator of a problem.
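A minimal sketch of that size-plus-hash check, for the curious. The post only says "SHA"; SHA-256 from Python's hashlib is my assumption here, and the function names `fingerprint` and `prefix_unmodified` are invented. The idea is to record the log's length and digest at some moment (that pair is what goes to the printer), then later hash only the first `size` bytes of the grown log and compare.

```python
import hashlib


def fingerprint(data: bytes):
    """Return (size, hex digest) for a log snapshot; print this somewhere safe."""
    return len(data), hashlib.sha256(data).hexdigest()


def prefix_unmodified(current: bytes, recorded_size: int, recorded_digest: str) -> bool:
    """Roll the log back to the recorded size and check it still hashes the same."""
    if len(current) < recorded_size:
        # The log shrank, so something was removed or the file was replaced.
        return False
    prefix = current[:recorded_size]
    return hashlib.sha256(prefix).hexdigest() == recorded_digest


if __name__ == "__main__":
    log = b"GET / 200\nGET /admin 403\n"
    size, digest = fingerprint(log)
    grown = log + b"GET /index.html 200\n"
    tampered = b"GET / 200\nGET /admin 200\n" + b"GET /index.html 200\n"
    print(prefix_unmodified(grown, size, digest))
    print(prefix_unmodified(tampered, size, digest))
```

This only detects changes to entries that existed when the fingerprint was taken; anything appended since the last printout is unprotected until the next one.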

For this reason, dot-matrix printers are still fairly often found in server rooms.

The serial cable thing is an extension of this. It just has a computer logging the data. You may start sending corrupt data, but it's not going to a shell you had to log into so the most you could do is confuse the logging script, but you could never get data back out.

If you're *really* paranoid, cut the 'send' line from the secure server. Even if the truly 31337-hacker could break into the machine they could never get anything back out. (And they'd have to find a buffer-overflow in '>' and manage to exploit it, blind, on an unknown system...)

Nothing to do by jeriqo · 2001-11-26 04:50 · Score: 1

Google does nothing more than a regular Web user. It simply follows links, and indexes the content in its database.

What's wrong with this?

Nothing. Human stupidity.

--
Alexis 'jeriqo' BRET

Re:Nothing to do by Jburkholder · 2001-11-26 05:08 · Score: 2

As far as I can tell from checking out the article and then trying this myself on Google, you can now target your search to specific filetypes. If you are dumb enough to store passwords or creditcard numbers in an xls file on your website, google makes it easy to find.

I'm at a loss to explain how someone puts sensitive information on the web in an unprotected location and then points the finger at google because they made it easier to find.

"We have a problem, and that is that people don't design software to behave itself," said Gary McGraw, chief technology officer of software risk-management company Cigital, and author of a new book on writing secure software.

"The guys at Google thought, 'How cool that we can offer this to our users' without thinking about security. If you want to do this right, you have to think about security from the beginning and have a very solid approach to software design and software development that is based on what bad guys might possibly do to cause your program grief."
Re:Nothing to do by arkanes · 2001-11-26 06:07 · Score: 1

I've tried a number of methods, and wasn't able to get any lists of credit cards. Lots and lots and lots of phone numbers, tho...

Stopping Google won't stop the problem... by Kr3m3Puff · 2001-11-26 04:51 · Score: 5, Insightful

The big complaint of the article is that Google is searching for new types of files, instead of HTML. If some goofball left some link to a Word document with his passwords in it, he gets what he deserves.

The quote from that article about Google not thinking about this before they put it forward is idiotic. How can Google be responsible for documents that are in the public domain, that anyone can get to by typing a URL into a browser? It isn't insecure software, just dumb people...

--
D.O.U.O.S.V.A.V.V.M.

Re:Stopping Google won't stop the problem... by Zspdude · 2001-11-26 05:29 · Score: 3, Interesting

It's definitely very true that if there were no stupid people these things would not be an issue of controversy. However, society has struggled for a very long time to resolve the question, "Should stupid people be protected from themselves?" There will always be those who (whether they're just technologically inept or for whatever reason) will not act sensibly and not realize they are being foolish. Do they deserve protection as well, even though they don't know how to protect themselves? That's a question which is not quite as easy to answer....

--
What's in a Sig?
Re:Stopping Google won't stop the problem... by greed · 2001-11-26 05:39 · Score: 2, Interesting

So maybe the fix should be in making it harder to share things on the Web, rather than trying to have search bots guess whether someone really meant to post the file?

Web servers could ship configured to not AutoIndex, only allow specific file types (.jpeg, .html, .png, .txt), and disable all those things that I disabled in Apache without losing anything I needed for my site, and so on. Then, the burden is placed on the person who started sharing these other filetypes that have sensitive data on the public internet.

Of course, putting something in public that you don't want someone to see is just plain stupid, but apparently we need to make stupid people feel like they're allowed on the 'net.
Re:Stopping Google won't stop the problem... by nivedita · 2001-11-26 05:41 · Score: 1

The truly frightening bit is that the moron who made the statement about Google not thinking through its software design is a CTO, and author of a book on writing secure software. (shudder)

The point about the new kinds of files (Word docs, Excel sheets etc) being more vulnerable to viruses is significant, though.
Re:Stopping Google won't stop the problem... by DaoudaW · 2001-11-26 06:07 · Score: 2

If some goofball left some link to a Word document with his passwords in it, he gets what he deserves.

This seems to be the most common early response to the article and I agree up to a point. The problem is where to stop. Several times I've found stuff in Google's cache that I know were password-protected on the website. I was grateful, but wondered how they retrieved them. Did they purchase a subscription? Did the owners give them access for the benefit of having the site catalogued?

Another issue appears when they start crawling directories. It's never obvious which directories were meant to be public readable and which ones weren't, but Google undoubtedly uses techniques beyond that of the casual browser. At what point do they become crackers?

A number of years ago, I had a shell account on a Unix system. It was amazing where I could go, what I could see on the system with a little bit of ingenuity. When I pointed this out to the sysadmin, he treated me like a criminal. Okay, maybe I should have stopped when I started getting warning messages ;-), but the fact is that Google could probably get behind at least 50% of firewalls if they wanted to.

How far is too far in the search for information?
Re:Stopping Google won't stop the problem... by cballowe · 2001-11-26 06:35 · Score: 0

How can Google be responsible for documents that are in the public domain

Just a minor warning here -- be careful not to confuse "available to the public" with "public domain"

sorry, it just jumped out at me...
Re:Stopping Google won't stop the problem... by mobiGeek · 2001-11-26 06:37 · Score: 5, Funny
but Google undoubtedly uses techniques beyond that of the casual browser
Uhh...no.
HTTP is an extremely basic protocol. Google's bots simply do a series of GET requests.
It would be possible that Google's bots have a database of username/passwords for given sites, but the more likely scenario is that they have stumbled across another way to get the "protected" information:
- a link which contains a username and/or password
  /protected/show_article.pl?username=foo&password=bar&num=1
- a link to the pages which by-passes the protection scheme
  /no_one_can_find_this_cause_Im_3l33t/article1.html
- someone else posted the information elsewhere, and this is what is actually crawled
I ran robots for nearly 2 years and was harassed by many a Webmuster who could prove that my robots had hacked their site. They'd show me protected or secret data. It typically took 3 to 5 minutes to find the problem...usually the muster was the problem themself.
HERE'S A NOTE OF WARNING TO WEBMASTERS:
Black text links on black backgrounds in really small fonts are NOT secure.
Maybe I should get this posted to BugTraq...or would MS come after me??
--
...Beware the IDEs of Microsoft...
Re:Stopping Google won't stop the problem... by Suppafly · 2001-11-26 06:52 · Score: 1

obviously with google around, the whole security through obscurity thing just isn't an option anymore.
Re:Stopping Google won't stop the problem... by Webmonger · 2001-11-26 07:24 · Score: 3, Insightful

Umm, I don't think that's how it happens. I think Google indexes the page and THEN the idiots put on the password protection.

If Google accessed it via a special link, then Google would store that link, and you'd use that link, and you'd see it yourself.

(another form of not-secret link:
http://user:password@domain/path/file)
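
The two "not-secret link" shapes mentioned in this subthread (credentials in the query string, and `user:password@` in the URL itself) can be spotted mechanically. The following is only a sketch: the parameter names in `SUSPECT_PARAMS` are a guess at common spellings, not a list taken from any real crawler, and `leaks_credentials` is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical list of query parameter names that suggest an embedded login.
SUSPECT_PARAMS = {"username", "user", "password", "passwd", "pass"}


def leaks_credentials(url):
    """Return True if the URL itself carries login details a crawler would store."""
    parts = urlsplit(url)
    # Covers the http://user:password@domain/path/file form.
    if parts.username is not None or parts.password is not None:
        return True
    # Covers the ?username=foo&password=bar form.
    params = parse_qs(parts.query, keep_blank_values=True)
    return any(name.lower() in SUSPECT_PARAMS for name in params)
```

A link such as the "no one can find this" path from the earlier post passes this check cleanly, which is the point being made in the thread: the URL carries no credentials because there is no protection at all.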
Re:Stopping Google won't stop the problem... by Anonymous Coward · 2001-11-26 08:08 · Score: 0

Shouldn't be surprising.

Everyone these days is the CTO of their pathetic little company.

And I hope no one is surprised to find that a "security expert" is an idiot selling snake oil.
Re:Stopping Google won't stop the problem... by Anonymous Coward · 2001-11-26 08:12 · Score: 0

It's never obvious which directories were meant to be public readable and which ones weren't, but Google undoubtedly uses techniques beyond that of the casual browser.
Another reply to this said pretty much what half of my reply would have been, but I also want to ask you this question: since you're so certain of this, name at least one technique that Google uses that is not accessible to the casual browser. One. Just one. I dare you. I double dare you! What does Google use that a regular browser cannot?
Re:Stopping Google won't stop the problem... by Anonymous Coward · 2001-11-26 08:44 · Score: 4, Insightful

Years ago cable companies cried foul that ordinary citizens were grabbing satellite communications off the air with their fancy 6' dishes and watching whatever they wanted for free. The companies raised a big stink and tried to get people to pay for the content. The FCC said "tough luck buddy. If you put it out there then people have a perfect right to grab it." Since that time most satellite traffic has been encrypted.

If you run a web site on the public internet then you should be paying attention to this basic fact: If you put it out there then people have a perfect right to grab it, even if you don't specifically tell them it's there. (I know FCC rulings don't apply, but the principle is the same). You should encrypt EVERYTHING you don't want people to see.

Encryption is like your pants, it keeps people from seeing your privates. Hiding your URLs and hoping is like running really, really fast with no pants on - most people won't see your stuff, but there's always some bastard with a handy-cam.
Re:Stopping Google won't stop the problem... by Scoria · 2001-11-26 10:17 · Score: 1

Neither are pi symbols in the lower right corner!

:)

--
Do you like German cars?
Re:Stopping Google won't stop the problem... by garbuck · 2001-11-26 13:15 · Score: 1

Several times I've found stuff in Google's cache that I know were password-protected on the website. I was grateful, but wondered how they retrieved them. Did they purchase a subscription?
No need. I too have found stuff in Google's cache that was secured on the actual site. It's almost certainly a question of timing. I.e., the webmaster published the page to the world accidentally, and then only later realized his mistake and fixed it. Meanwhile, the googlebot stopped by and scarfed up the unsecured content.
Did the owners give them access for the benefit of having the site catalogued?
This is also possible. Some webmasters are highly devious in dealing with search engines (especially the porn meisters). But I would bet 99% of the cases are a matter of publish first and secure later.
Re:Stopping Google won't stop the problem... by nahdude812 · 2001-11-27 02:11 · Score: 2

obviously with google around, the whole security through obscurity thing just isn't an option anymore.

I assume that statement was sarcastic, because it never really was. Security through obscurity is an oxymoron.

--
Slay a dragon... over lunch!
Re:Stopping Google won't stop the problem... by Suppafly · 2001-11-27 19:23 · Score: 1

yeh.. it was meant to be funny.. apparently /. just isn't the place for humour.. oh well..

Well Behaved Crawlers by tomblackwell · 2001-11-26 04:51 · Score: 4, Insightful

...obey the Robot Exclusion Standard. This is not a big secret, and is linked to by all major search engines. Anyone wishing to exclude a well-behaved robot (like those of major search engines) can place a small file on their site which controls the behaviour of the robot. Don't want a robot in a particular directory? Then set your robots.txt up correctly.
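
For anyone who has not seen the check a well-behaved robot performs, here is a small sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the `/private/` directory, and the `ExampleBot` agent name are all invented for the example.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt contents, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_fetch(path, agent="ExampleBot"):
    """Ask the parsed rules whether this (hypothetical) robot may request path."""
    return parser.can_fetch(agent, path)


print(may_fetch("/index.html"))
print(may_fetch("/private/cards.xls"))
```

Nothing here is enforced by the server: the robot fetches the rules and then chooses to honour them.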

P.S. Anyone keeping credit card info in a web directory that's accessible to the outside world should really think long and hard about getting out of business on the internet.

Re:Well Behaved Crawlers by melvin22 · 2001-11-26 05:05 · Score: 1

Actually they should probably get the hell out as fast as they can. But that's just my opinion...
Re:Well Behaved Crawlers by Nos. · 2001-11-26 05:06 · Score: 2

This is not the way to do it, as the article mentions. This may stop Google, but suppose I'm running my own search engine that doesn't follow "robots.txt" rules?
Re:Well Behaved Crawlers by ryanvm · 2001-11-26 05:12 · Score: 5, Insightful

The Robot Exclusion Standard (e.g. robots.txt) is mainly useful for making sure that search engines don't cache dynamic data on your web site. That way users don't get a 404 error when clicking on your links in the search results.

You should not be using robots.txt to keep confidential data out of caches. In fact, most semi-intelligent crackers would actually download the robots.txt with the specific intention of finding ill-hidden sensitive data.
Re:Well Behaved Crawlers by sunking2 · 2001-11-26 05:14 · Score: 1

But I thought OPT OUT was bad and everything should be OPT IN!

Seriously tho, there are a lot of ways that this sort of information can make it onto the web other than blaming companies. For example, how many times have people bought things online and then saved the html document that was returned as the receipt. It's very easy to imagine that people could save this to a directory that is inadvertently crawled.
Re:Well Behaved Crawlers by Phroggy · 2001-11-26 06:00 · Score: 1

I don't routinely save HTML receipts in a directory under public_html, and if I did, it wouldn't be in a directory with an index (i.e., a directory without an index.html file so the server returns a list of every file in the directory).

--
$x='S24;r)>63/* h@<5+oZ)32"5cz';$me='phroggy'x$];
$x=~y+ -xz+\0-Tx+;print$_^chop$me for split'',$x;
Re:Well Behaved Crawlers by Anonymous Coward · 2001-11-26 06:04 · Score: 0

I have a small web site I use for myself and my family on my home computer (no URL). I have had web crawlers, etc. come knocking and I don't have the bandwidth to keep them fed (I have DSL).

I've made a robots.txt file and told Apache it's a script. When run, it modifies the firewall to block access to that IP address. If any malicious users stop by to see what's available, they get blocked out completely.

Is that considered cruel?
Re:Well Behaved Crawlers by Pyrosz · 2001-11-26 06:10 · Score: 1

The following is an example of robots.txt for those too lazy to look.

# robots.txt for Slashdot.org
User-agent: *
Disallow: /index.pl
Disallow: /article.pl
Disallow: /comments.pl
Disallow: /users.pl
Disallow: /search.pl
Disallow: /palm
Disallow: index.pl
Disallow: article.pl
Disallow: comments.pl
Disallow: users.pl
Disallow: search.pl

--

An optimist believes we live in the best world possible; a pessimist fears this is true.
Re:Well Behaved Crawlers by Stormin · 2001-11-26 06:17 · Score: 1

I once did sort of the opposite. I had robots.txt setup so that it would log the IP and return a bunch of directories with confidential sounding names. (Since it was on a public site for my company I obviously didn't want to lock out the real search engines.) The files in the robots.txt file were there, as a sort of honey pot. Password files? Sure, in standard unix format. Except the passwords don't go to anything and if you run crack on them, you'll find it takes a long, long time, since they aren't real passwords: we encrypted strings of letters, numbers, and symbols that aren't in any known crack dictionary. One IP downloaded the file three times! But we never saw the fake passwords being used at the honeypot linux box. I guess he got tired of waiting.
Re:Well Behaved Crawlers by Anonymous Coward · 2001-11-26 08:02 · Score: 0

We need a law requiring 133t h4x0r5 follow the robots.txt file. That's the only way to stop these terrorist acts!!!
Re:Well Behaved Crawlers by MadAhab · 2001-11-26 12:33 · Score: 2

This is true, but if it really concerns you that someone might try looking at things listed in robots.txt (I've done it), try adding an exclude to robots.txt that specifies a bogus, but tempting directory, say "/mp3/*" or "/warez/*" or "/bspears/*". Then create the directory, make index.cgi mail you immediately when someone requests the directory index.... haha.

--
Expanding a vast wasteland since 1996.
Re:Well Behaved Crawlers by kimihia · 2001-11-26 13:57 · Score: 2

The way around having crackers look at your robots.txt for a pointer to sensitive information is be guarded about what you put in it. For example, if I had the url:

/secret-stuff-here/credit-card.txt

I would sure as heck not be putting that in my robots.txt, because your robots.txt could eventually end up in the index too. Instead I would put this:

/sec

That will match and disallow the first address, and the h4x0r still has a lot to guess.
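
A quick sketch of why the short entry works, and what it costs. Under the original exclusion standard a Disallow value is treated as a path prefix, which the hypothetical `blocked` helper below models with a plain `startswith`; the `/section/` path is invented to show the side effect.

```python
def blocked(path, disallow_prefixes):
    """Simplified model of Disallow matching: a rule applies to any path it prefixes."""
    return any(path.startswith(prefix) for prefix in disallow_prefixes if prefix)


rules = ["/sec"]  # the truncated entry suggested above

print(blocked("/secret-stuff-here/credit-card.txt", rules))  # the file being hidden
print(blocked("/section/about.html", rules))                 # collateral: also excluded
print(blocked("/public/index.html", rules))                  # unaffected
```

The shorter the prefix, the less it gives away and the more unrelated pages it keeps out of the index. None of this stops a client that ignores robots.txt.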
Re:Well Behaved Crawlers by merlin_jim · 2001-11-27 06:28 · Score: 2

P.S. Anyone keeping credit card info in a web directory that's accessible to the outside world should really think long and hard about getting out of business on the internet.

Long and hard? Anyone keeping credit info in a web directory really shouldn't have to think that long about getting out of the internet.

As soon as someone points out to you what a colossal mistake that is, it's time to go back to McDonald's and hope they'll give you your old job back, because this whole Internet thing just ain't workin out for you no more.

--
I am disrespectful to dirt! Can you see that I am serious?!

Google shouldn't lift a finger by sketerpot · 2001-11-26 04:51 · Score: 2, Interesting

Why should Google or any other search engine do anything to save fools from their stupidity? Putting credit card numbers online where anyone can get them is just plain idiotic. Hopefully this will get a lot of publicity along with the names of companies who do stupid things like this and most people will shape up their act.

Re:Google shouldn't lift a finger by nomadic · 2001-11-26 05:37 · Score: 2

Yeah, but the people that suffer the most aren't the idiots posting the data, they're the people whose credit card numbers they are. Why should they suffer because the store they bought something from doesn't understand the concept of security?
Re:Google shouldn't lift a finger by Anonymous Coward · 2001-11-26 06:56 · Score: 0

Why should they suffer because the store they bought something from doesn't understand the concept of security?
Because they bought something from a store that doesn't understand the concept of security.

Simple but burdensome solution by camusflage · 2001-11-26 04:52 · Score: 4, Informative

Credit card numbers follow a known format (mod10). It should be simple, but somewhat intensive as far as search engines go, to scan content, look for 16 digit numeric strings, and run a mod10 on them. If it comes back true, don't put it into the index.
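
The "mod10" test referred to here is commonly known as the Luhn check, which as usually described works on the digits of the number alone. A minimal sketch follows; the function names are illustrative, and the 16-digit restriction simply mirrors the suggestion above (real card numbers are not all 16 digits long).

```python
def luhn_valid(number):
    """Return True if the digits in `number` pass the mod 10 (Luhn) check."""
    digits = [int(ch) for ch in number if ch in "0123456789"]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def looks_like_card(number):
    """The test proposed above: exactly 16 digits that pass mod 10."""
    digits = [ch for ch in number if ch in "0123456789"]
    return len(digits) == 16 and luhn_valid(number)


print(looks_like_card("4111 1111 1111 1111"))  # widely used test number
print(looks_like_card("4111 1111 1111 1112"))
```

Passing the check says only that the digits are self-consistent, not that an account exists, which is why the replies worry about false positives.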

--
The truth about Scientology, Xenu, and you: Operation Clambake

Re:Simple but burdensome solution by Xerithane · 2001-11-26 04:56 · Score: 5, Insightful

It is a burden, but the responsibility does not lie on a crawling engine. You could check any 10 digit number (and expdate with a Luhn check if available) but with all the different formatting done on CC numbers (XXXX-XXXX-XXXX-XXXX, XXXXXXXXXXXXXXXX, etc) the algorithm could get ugly to maintain.

I don't see why Google or any other search engine has to even acknowledge this problem, it's simply Someone Else's Problem. If I was paying a web team/master/monkey any money at all and found out about this, heads would roll. It seems that even thinking of pointing a finger at google is the same tactic Microsoft is doing at those "irresponsible" individuals pointing out security flaws.

If anything Google is providing them a service by telling them about the problem.

--
Dacels Jewelers can't be trusted.
Re:Simple but burdensome solution by new-black-hand · 2001-11-26 05:10 · Score: 1

might not be that bad, since to do the calculation on the number it will have to be XXXXXXXXXXXXXXXX format, so it has to be converted regardless
Re:Simple but burdensome solution by Codifex+Maximus · 2001-11-26 05:18 · Score: 2

While the idea is workable in a small subset of data, what about other sensitive data that is found in the public domain? Will Google and other search engines be responsible for hiding that too? Where does it end?

The burden of hiding information, that should *NOT* be there in the first place, should rest on the entity that posted the information publicly - the web-site.

Once information is *Published* can it be *UnPublished*?

--
Codifex Maximus ~ In search of... a shorter sig.
Re:Simple but burdensome solution by joe52 · 2001-11-26 05:47 · Score: 1

I agree with another poster who pointed out that this is simply someone else's problem. If you have "secret" data that your webserver will give me over the public Internet, then that is your problem.

One major downside to trying to screen out CC numbers is that they are only one type of data that should not be exposed. SSN's, passwords, bank account numbers, and all sorts of other information should also be kept private. Having a search engine censor publicly accessible documents that contain such information is a waste of time. The only real solution is to simply make those documents inaccessible to the public.

-Joe
Re:Simple but burdensome solution by Anonymous Coward · 2001-11-26 05:51 · Score: 0

The odds are that many other mod10 numbers appear on the Web beyond credit card numbers. Should Google block every scientific paper (for example) that accidentally contains such a number in its table of results?
Re:Simple but burdensome solution by camusflage · 2001-11-26 05:56 · Score: 2

I never said it wasn't web monkey's fault. Yes, anyone who would do something like this doesn't deserve even the title of web monkey. This is simply a reaction, like a provider filtering inbound port 80 to staunch code red's effects.

--
The truth about Scientology, Xenu, and you: Operation Clambake
Re:Simple but burdensome solution by The+Pim · 2001-11-26 06:28 · Score: 2

Credit card numbers follow a known format (mod10). It should be simple, but somewhat intensive as far as search engines go, to scan content, look for 16 digit numeric strings, and run a mod10 on them. If it comes back true, don't put it into the index.
Among other things, this would have the amusing effect of blacklisting most web pages about credit card number validation.

--

The evaluation of an action as 'practical' . . . depends on what it is that one wishes to practice.
Re:Simple but burdensome solution by ethereal · 2001-11-26 06:44 · Score: 1

No.

--
Your right to not believe: Americans United for Separation of Church and
Re:Simple but burdensome solution by deblau · 2001-11-26 07:28 · Score: 1

You could check any 10 digit number (and expdate with a Luhn check if available) but with all the different formatting done on CC numbers (XXXX-XXXX-XXXX-XXXX, XXXXXXXXXXXXXXXX, etc) the algorithm could get ugly to maintain.

bash$ cat creditcards.txt | perl -pe's/[^0-9]//g'

--
This post expresses my opinion, not that of my employer. And yes, IAAL.
Re:Simple but burdensome solution by Xerithane · 2001-11-26 07:42 · Score: 2

I wasn't saying you said that. More supporting your argument and putting in my own 2c.

I just hope that the proper heads roll on this one.

--
Dacels Jewelers can't be trusted.
Re:Simple but burdensome solution by Xerithane · 2001-11-26 07:47 · Score: 1

That won't work at all. For instance, you seem to think that all numbers are credit cards, and what about files that aren't named 'creditcards.txt'?

besides, that makes no sense how that would have any desired effect whatsoever..

--
Dacels Jewelers can't be trusted.
Re:Simple but burdensome solution by scrytch · 2001-11-26 11:19 · Score: 2

but with all the different formatting done on CC numbers (XXXX-XXXX-XXXX-XXXX, XXXXXXXXXXXXXXXX, etc) the algorithm could get ugly to maintain.

\d.?\d.?\d.?\d.?\d.?\d.?\d.?\d.?\d.?\d.?\d.?\d.?\d.?\d.?\d.?\d

should match >99% of cc numbers. And a lot of other dross, but you can just pipe it into a mod10 checker. Search engines shouldn't have to do this -- unless they're looking for cc numbers that is. Anyone who publishes confidential data that's crawlable without any kind of password protection should be liable for damages (hint: even if you have a CGI braindead enough to allow password=foo, you can still defeat it with sessionid's that time out.... 'course if you do that, just use the sessionid as the per-request auth in the first place)
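
The two-stage idea (a cheap pattern first, then a mod10 check on whatever survives) might look like the sketch below. Note that it does not use the pattern quoted above verbatim: it swaps in a tightened variant that assumes the only separators are a single space or hyphen and that the run is not part of a longer digit string. The names `CANDIDATE` and `find_card_like` are invented for the example.

```python
import re

# Tightened variant of the quoted pattern (an assumption, not the original):
# 16 digits, each optionally separated by one space or hyphen, not embedded
# in a longer run of digits.
CANDIDATE = re.compile(r"(?<!\d)\d(?:[ -]?\d){15}(?!\d)")


def luhn_valid(digits):
    """Mod 10 (Luhn) check over a string of decimal digits."""
    total = 0
    for index, ch in enumerate(reversed(digits)):
        digit = int(ch)
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def find_card_like(text):
    """Stage 1: regex candidates. Stage 2: keep only those passing mod 10."""
    hits = []
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits


sample = "order 4111-1111-1111-1111 shipped, ref 1234567890123456, tel 555 0100"
print(find_card_like(sample))
```

The second number in the sample matches the pattern but fails the mod10 stage, which is the "dross" the filter chain is meant to discard.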

--
I've finally had it: until slashdot gets article moderation, I am not coming back.
Re:Simple but burdensome solution by Bronster · 2001-11-26 15:38 · Score: 3, Informative

\d.?\d.?\d.?\d.?\d.?\d.?\d.?\d.?\d.?\d.?\d.?\d.?\d.?\d.?\d.?\d

should match >99% of cc numbers. And a lot of other dross, but you can just pipe it into a mod10 checker

Putting the burden on me, the poor sap who wants to have my web pages indexed, to make sure that I don't accidentally put any numbers on a web site that might be mis-interpreted as a credit card number (i.e. a tab or comma separated list of numbers would be likely to hit the above, especially if it was much longer than a CC number).

Not to mention the problem of recursive lookup on a long number (the first 2000 digits of pi are 3.1415926535.......) - it would take an age to make sure there were no CC no's in that.

All together, it would cause 'innocent' pages to not be indexed, which is distinctly sub optimal.
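
To put a rough shape on the long-number concern, here is a sketch that slides a 16-digit window along a run of digits and collects every window that passes mod10. The helper names are made up. One fact worth keeping in mind: for any 15 leading digits exactly one final digit satisfies the check, so about one window in ten of effectively random digits will pass.

```python
def luhn_valid(digits):
    """Mod 10 (Luhn) check over a string of decimal digits."""
    total = 0
    for index, ch in enumerate(reversed(digits)):
        digit = int(ch)
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def luhn_windows(digit_run, width=16):
    """Return every `width`-digit window of digit_run that passes mod 10."""
    windows = (digit_run[i:i + width] for i in range(len(digit_run) - width + 1))
    return [w for w in windows if luhn_valid(w)]


# A run of n digits has n - 15 windows to test, so the scan is linear in n.
run = "0" * 20
print(len(luhn_windows(run)))
```

So the scan of a 2000-digit number is cheap; the awkward part is that such a page would be expected to produce many accidental "card numbers", which is the innocent-pages problem raised above.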
Re:Simple but burdensome solution by Xerithane · 2001-11-27 06:03 · Score: 2

That won't match anything useful, maybe I'm just missing something but it looks ... pointless.

That would match this hex string:
3c5e2a992b3c2a151...(dont feel like finishing typing it out)

and a plethora of other valid data. The reason why this algorithm is ugly is because all numbers that are mod10 are not credit card numbers.
What does your hint mean anyway? What does that have to do with anything? Expiring sessions that remove themselves when they timeout won't matter (which is relatively easy, you just have a scrub process stat the session lock and purge session if access time is > timeout time.)

--
Dacels Jewelers can't be trusted.
Re:Simple but burdensome solution by scrytch · 2001-11-27 14:15 · Score: 2

zzzzzzzzZZZZZZZOOOOOOMMMMM

the sound of my point going over the heads of both respondents. I was demonstrating how it was far from "prohibitively expensive" to scan for cc#'s, and a little more reading would have also revealed that I was saying that search engines SHOULDN'T have to be burdened with this nonsense, but it sure would be useful to people specifically looking for a cc#.

And my hint had to do with the fact that even password protected pages (that are not using http auth) can often be defeated by a link that contains a password, a fact that was pointed out well earlier in the discussion. Sessions that time out aren't susceptible to that problem, even a link with all the authentication info would still be invalid by the time most search engines got around to indexing it.
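
A sketch of the timeout behaviour being described, assuming an in-memory store and an idle timeout; the `SessionStore` class, its method names, and the 15 minute default are all invented for illustration and not taken from any particular web framework.

```python
import secrets
import time


class SessionStore:
    """Toy in-memory session table with an idle timeout (illustrative only)."""

    def __init__(self, timeout_seconds=900, clock=time.time):
        self._timeout = timeout_seconds
        self._clock = clock          # injectable so the behaviour can be tested
        self._last_seen = {}

    def create(self):
        """Issue a fresh, unguessable session id after a successful login."""
        session_id = secrets.token_urlsafe(32)
        self._last_seen[session_id] = self._clock()
        return session_id

    def is_valid(self, session_id):
        """Accept the id only if it exists and has not sat idle past the timeout."""
        last = self._last_seen.get(session_id)
        if last is None:
            return False
        now = self._clock()
        if now - last > self._timeout:
            del self._last_seen[session_id]   # purge the stale session
            return False
        self._last_seen[session_id] = now
        return True
```

A link that embeds such an id and is crawled days later points at a session that has long since been purged, so the indexed URL opens nothing.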

I'm gonna get pissy again: doesn't anybody read anymore?

--
I've finally had it: until slashdot gets article moderation, I am not coming back.
Re:Simple but burdensome solution by Xerithane · 2001-11-28 04:47 · Score: 2

Only if you made sense. Your regex was absolute drivel and didn't address any problem, nor would it work correctly because my guess is you have no idea how credit card numbers are generated. They are mod10 numbers that have an algorithm based upon the expiration date (called the Luhn Check, go google if curious) so if you have the expiration date (any 4 number string broken up in numerous different ways) and a 16 digit number you can match them up -- also anyone searching for credit card numbers with your regex is absolutely stupid and destined for failure.

The point wasn't lost on us, it was just simply a silly and overly pointless response. Links that contain passwords are a bad idea, but most "real" sites that offer password protection with session authentication don't have problems like sessions laying around for hijacking.
Also, a search engine that crawls upon link that contains password information would initiate a login/session pair on that site (given it is a session-based site) or just a persistent login (which is a bad idea anyway)
Feel free to get pissy; you are still wrong. And if that regex is a real demonstration of your coding abilities, you are way out of your league but I won't make a judgement about that - just a statement from a developer (who, incidentally is writing a credit card processing engine at the moment) who knows the workings of credit card algorithms. And, on an unrelated side note just to vent: veriphone sucks.

--
Dacels Jewelers can't be trusted.
Re:Simple but burdensome solution by scrytch · 2001-11-28 08:17 · Score: 2

My regex would be the first of a chain of filters, using a pipe or lazy evaluation, that would still throw out 99% of pages it crawled, passing the remaining piece to a more computationally expensive algorithm. This is called a heuristic, see? This is a win when you have two machines, see? The second filter would probably try to look for something like an expiration date, which is a smaller string and thus is less optimal as a search.

I didn't realize I was going to have to teach a damn CS class about how to layer because some slashdoterati find it necessary to flaunt their coding dick size at every turn.

--
I've finally had it: until slashdot gets article moderation, I am not coming back.
Re:Simple but burdensome solution by Xerithane · 2001-11-28 08:53 · Score: 2

Well, why don't you start out by writing a decent initial layer. And, you wouldn't want to use a regex because of computational overhead that is unnecessary for a fixed-length comparison as you put it. If you understood the obvious problems with your initial filter you might be qualified to teach CS on slashdot. Although, I really would like to see you run a query like that as your primary 'non-expensive' regex... it'd be a good laugh when you could do the same test in 2 lines of code that requires about half the overhead. Hell, with the overhead you save you could do a full mod10 check too - and actually get valid results. You wouldn't throw out 99% of the invalid pages too because your regex wasn't encapsulating only 16 objects - not even bounded so any page that had any number of sequential numbers would be passed to the second layer. That's fine and dandy but why not just do a comparison check that verifies it's a mod10 number in the first go? Oh right, because you are so l33t you don't need to worry about algorithm efficiency and overhead... my bad.

Posting code on slashdot does this, especially when you follow it up with a claim that you are some cool guy who understands all these principles of CS when it ends up you really have no clue how to design a high-load low-overhead algorithm to filter based on a relative scoring algorithm (in this case, it's a mod10 16-digit number scored at either 100% or 0% in the initial algorithm thereby omitting any necessity for a pipe) the expiration date can also be computed after a CC number is found and inserted into an index query (passed to a second layer) and then using a Luhn check to calculate the proper pairing (with a score index based off of page/domain to filter the CC's to check against)

But hey.. you knew that right? This isn't coding, it's science. Big difference, go back to school.

--
Dacels Jewelers can't be trusted.

Google exploit patch for Apache by Anarchofascist · 2001-11-26 04:52 · Score: 4, Funny

% cd /var/www
% cat > robots.txt
User-agent: *
Disallow: /
^D
%

--
Once more unto the breach, dear friends, once more, Or close the wall up with our American dead!

Insert foot in mouth.... by Crewd · 2001-11-26 04:52 · Score: 2, Interesting

From the article :

"Webmasters should know how to protect their files before they even start writing a Web site," wrote James Reno, chief executive of Amelia, Ohio-based ByteHosting Internet Services. "Standard Apache Password Protection handles most of the search engine problems--search engines can't crack it. Pretty much all that it does is use standard HTTP/1.0 Basic Authentication and checks the username based on the password stored in a MySQL Database."

And chief executives of a hosting company should know how Basic Authentication works before hosting web sites...

Crewd

Re:Insert foot in mouth.... by simong · 2001-11-26 05:01 · Score: 2, Funny

Not necessarily, they are chief executives after all.
Re:Insert foot in mouth.... by Vairon · 2001-11-26 08:07 · Score: 1

You must be talking about your own foot, because apparently you're not aware of mod_auth_pgsql and mod_auth_mysql

http://freshmeat.net/projects/mod_auth_pgsql/ and http://freshmeat.net/projects/mod_auth_mysql/.

In fact a simple search of "mod_auth" on http://www.freshmeat.net returns 22 hits of software that uses HTTP Basic Authentication and checks usernames/passwords against databases, ldap, pam, kerberos, etc.
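Whatever backend the server checks against (MySQL, LDAP, PAM and so on), the wire format of HTTP Basic Authentication stays the same: the browser sends `username:password` base64-encoded in the `Authorization` header. Base64 is an encoding, not encryption, so anyone who can read the request on a non-SSL connection can recover the password. A minimal Python sketch of that point follows; the function names are made up for illustration and are not part of Apache or any mod_auth module.

```python
import base64

def basic_auth_header(username: str, password: str) -> str:
    """Build the Authorization header value a browser sends for Basic auth."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")

def decode_basic_auth(header: str):
    """Recover (username, password) from a Basic auth header value.

    No key or secret is needed: base64 is trivially reversible.
    """
    scheme, _, token = header.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic auth header")
    decoded = base64.b64decode(token).decode("utf-8")
    username, _, password = decoded.partition(":")
    return username, password
```

Calling `decode_basic_auth(basic_auth_header("user", "secret"))` gives back the original pair, which is why Basic auth is normally only considered reasonable over SSL.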

What did they expect? by Walter+Bell · 2001-11-26 04:53 · Score: 1

Lowering the barrier to entry to web publishing has had a few benefits. Families can share photographs and news in a cheap, efficient manner. Novices can publish information for the benefit of their employees or others easily. However, problems like this do arise quite often, and at their source one can see that the widespread ability of people to publish documents to the web does not coexist well with existing security systems and models.

At any other time in the past few years, this would not ordinarily be a societal problem. Sure, a few peoples' passwords and credit card numbers will leak out. Hopefully they would have to pay for the charges to punish them for their own stupidity. (After all, as a customer of several banks, I don't want my rates to go up because somebody posted his account numbers for the entire world to see.) But now, this is a national security problem, because we are being attacked by a foreign force who might abuse leaked passwords to access critical systems and cause chaos in this country. President Bush and his staff are very concerned about a cyberwar, because it can be waged without physically having Arabs in the States to commit the terrorism. That is very dangerous indeed.

I'm not sure what the solution is, but a good first step is for companies to raise the barrier to entry to publishing web pages. Geocities and Angelfire should force users to demonstrate their competence before uploading their first page. Perhaps requiring an A+ certification number would help? And Microsoft should take away the parts of FrontPage that allow users to generate documents without writing in HTML. That would help ease the problem, I reckon.

In conclusion - if everybody does their part to help solve this problem and stop information leakage, we will be a safer, more secure society without giving up any more civil liberties.

~wally

Re:What did they expect? by Anonymous Coward · 2001-11-26 14:56 · Score: 1, Interesting

However, problems like this do arise quite often, and at their source one can see that the widespread ability of people to publish documents to the web does not coexist well with existing security systems and models.

Not really, more a problem that incompetent administrators don't know, don't care, or don't think it's their problem; the stupidest of users don't know what they shouldn't do, and unscrupulous folk take advantage of that.
The existing systems and models work, it's just that badly run sites don't use them correctly, if at all. Most users have the basic knowledge (or concern/fear) not to post credit card numbers in a public location online or to an untrusted site, but some stupid ones will (perhaps deservedly) pay for their mistakes and others may get screwed over by badly secured 'trusted' sites or convincing but spurious sites.

At any other time in the past few years, this would not ordinarily be a societal problem. Sure, a few peoples' passwords and credit card numbers will leak out.... But now, this is a national security problem, because we are being attacked by a foreign force who might abuse leaked passwords to access critical systems and cause chaos in this country. President Bush and his staff are very concerned about a cyberwar, because it can be waged without physically having Arabs in the States to commit the terrorism. That is very dangerous indeed.

This is hardly now a concern related to regular people publishing on the web. It IS a concern about security, but sensitive information should be protected by a competent admin with appropriate controls. Passwords to truly critical systems are protected both through online security methods and physical requirements; you aren't going to find them published for all to read on the family page of the guy with access to the button, nor in any cache of his online activities and postings.

There is reason for concern there, but the solution to that concern is to make sure that the appropriate procedures are followed. As I say, it's completely irrelevant to 'standards' of publishing on the web or who can put up a homepage - only (if anything) to the competence and security awareness of those running the servers.

There are probably millions of "here's my cat, I've joined 500 webrings, I like icecream, here's some annoying MIDI music that a button put on the page for me, pleeeease sign my guestbook" pages on Geocities and its like (Homestead is a really bad example) but I wouldn't call them a threat to National Security.

Anything that might be called such a threat should not even be stored in an unprotected computer, let alone online. If anything the main problem might be having sensitive information on a private computer, where a cable ISP has discouraged or failed to mention appropriate firewalls (as is often the case). But for seriously sensitive stuff, this situation would never be permitted.

I'm not sure what the solution is, but a good first step is for companies to raise the barrier to entry to publishing web pages. Geocities and Angelfire should force users to demonstrate their competence before uploading their first page.

Why? Crap page design results in something people don't want to look at. All it does is waste space on Geocities' servers, you never have to see it if you don't want to look for it. The only possible gripes are if
a) they publish sensitive information on there - which only rebounds on them if it's CC numbers or passwords. An adult with access to truly sensitive information would be bound by employment/secrecy clauses in their contracts so are hardly likely to 'accidentally' reveal a government secret.
or b) they clog up the search engine results with lots of crappy listings. Which is true, but good search engines take into account how popular the page is, which tends to push the crappy pages down to the end of the listings.

Perhaps requiring an A+ certification number would help? And Microsoft should take away the parts of FrontPage that allow users to generate documents without writing in HTML. That would help ease the problem, I reckon.

No it wouldn't, because the problem is not with page designing abilities, or with posting sensitive information on the web. If someone puts on their homepage "My password is: " they take responsibility for that. If they were to put "The Nuclear Launch Code is:" they would probably be shot at dawn, but that's not going to happen with the people trusted with such information.
If the host has a password list that is world readable, then there's a serious security problem. If anything the A+ certification should be required for those hosting, not creating, the pages.

In conclusion - if everybody does their part to help solve this problem and stop information leakage, we will be a safer, more secure society without giving up any more civil liberties.

Probably, but the point is that the large-scale sensitive material you reference is secured by appropriate technology, contract agreements and competent staff; but for personally sensitive material, apart from not being at fault for doing anything incredibly stupid, the people we entrust with such information must be competent and concerned. It's not about the codes for war, it's not about the abilities of Frontpage users, the actual problem lies somewhere in the middle.

(now using Frontpage to display the codes for war on the US government homepage, there might be a problem).

Microsoft .Net.... by MrWinkey · 2001-11-26 04:54 · Score: 1

Hmmm....Microsoft's .Net possibly helping the problem?
Microsoft said it was safe.....

--
Vote early. Vote often. Vote CowboyNeal.

The Problem of Search Engines and "Sekrit" Data by NTSwerver · 2001-11-26 04:55 · Score: 4, Funny

Please change the title of this article to:

The Problem of Incompetent System Administrators

If data is 'sekrit'/sensitive/confidential - don't put it on the web. It's as simple as that. If that data is available on the web, search engines can't be blamed for finding it.

--
-----------------------
Moderator's essentials

Re:The Problem of Search Engines and "Sekrit" Data by Garfunkel · 2001-11-26 05:09 · Score: 1

It's not always "System Administrators". How many DSL/Cable subscribers do you think run Microsoft's Personal Webserver. Technically, yes, they are administering a system, but I don't think anybody would really call them a SysAdmin.

--
-jay
Re:The Problem of Search Engines and "Sekrit" Data by NTSwerver · 2001-11-26 05:21 · Score: 1

It's still their fault - how can Google be blamed for others' ineptitude? I wouldn't try and blame someone else if I left the keys in my car's ignition and someone stole it.

--
-----------------------
Moderator's essentials
Re:The Problem of Search Engines and "Sekrit" Data by Genom · 2001-11-26 06:53 · Score: 2

I wouldn't try and blame someone else if I left the keys in my car's ignition and someone stole it.

While true, that it's a bonehead move to leave your keys in the ignition, the presumption that you would solely be to blame for the theft of your car would be wrong. The person stealing the car would still be to blame for the actual stealing, you're just making it hellishly easy for them to do so.

Now, in regards to search engines, it would be similar to leaving your keys in the ignition, and having a search helicopter see your car, land, and put up a big flashing neon sign saying "Hey! Whoever left this car here, your keys are still in the ignition!"

A car is a pretty bad analogy, though, when it comes to Google's cache - because cars don't replicate =)
Re:The Problem of Search Engines and "Sekrit" Data by mosch · 2001-11-26 07:18 · Score: 2

terrible analogy. a much better one would be if you left the keys in your car, and parked it under a sign that said 'free cars, it is 100% legal and will be appreciated if you would please remove any car you'd like from this parking lot'

This is what happens when you use frontpage... by Grip3n · 2001-11-26 04:55 · Score: 5, Informative

I'm a web developer, and I don't know how many times I've heard people who are just getting into the scene talking about making 'hidden' pages. I'm referring to those that are only accessible to those who click on a very tiny area of an image map, or perhaps find that 'secret' link at the bottom of the page. Visually, these elements seem 'hidden' to a user who doesn't really understand web pages and source code. However, these 'hidden' pages look like giant 'Click Here' buttons to search engines, which is what I'm presuming some of this indexing is finding.

The search engines cannot feasibly stop this from happening, each occurrence is unique unto itself. The only prevention tool is knowledge and education, and bringing to the masses a general understanding of search engine spidering theory.
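To make the 'giant Click Here button' point concrete, here is a rough Python sketch of what a link extractor sees. It is not how any real search engine's spider works; the class name, function name and sample page are invented for illustration. The point is only that a parser reads the markup, so a 2x2 pixel image-map area or a link styled invisible is as easy to find as any other.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect every href found on <a> and <area> tags, visible or not."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag in ("a", "area"):
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(html: str) -> list:
    """Return all link targets in document order."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links

# Hypothetical page with two 'hidden' links: a tiny image-map hotspot
# and an anchor made invisible with CSS.
PAGE = """
<img src="logo.gif" usemap="#m">
<map name="m"><area shape="rect" coords="0,0,2,2" href="/hidden/admin.html"></map>
<a href="/secret/list.html" style="visibility:hidden">.</a>
"""

print(extract_links(PAGE))
```

Both 'hidden' URLs come straight out, because nothing in the parsing step knows or cares how the page is rendered.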

Just my 2 cents.

--
To make a pun demonstrates the highest understanding of a language

Re:This is what happens when you use frontpage... by onion2k · 2001-11-26 05:02 · Score: 2

Often worse than that.. the dreaded visibility:hidden CSS/DHTML that the likes of Dreamweaver are so keen on.. what the eye can't see the robot certainly can..

--
http://twitter.com/onion2k
Re:This is what happens when you use frontpage... by y86 · 2001-11-26 05:18 · Score: 0

HAHAHA..... ever see the movie "THE NET", those hidden pages work out REAL good..... lol... the process of Natural Selection strikes again... the predators prey upon the stupid..... Darwin, you ROCK!
Re:This is what happens when you use frontpage... by EccentricAnomaly · 2001-11-26 05:51 · Score: 2, Insightful

C|Net seems to think the security problem is with Google:

"The guys at Google thought, 'How cool that we can offer this to our users' without thinking about security. If you want to do this right, you have to think about security from the beginning and have a very solid approach to software design and software development that is based on what bad guys might possibly do to cause your program grief."

This is crazy. Google isn't doing anything wrong. The problem is with the idiots who don't spend five minutes to check that their secret data is really hidden.

This is like blaming a dog owner when his dog bites a burglar... er uh, never mind.

--
There are 10 types of people in this world, those who can count in binary and those who can't.
Re:This is what happens when you use frontpage... by Anonymous Coward · 2001-11-26 09:16 · Score: 0

Robots and every non-css compliant browser.
Re:This is what happens when you use frontpage... by Anonymous Coward · 2001-11-26 13:32 · Score: 0

> I've heard people who are just getting into the scene talking about making 'hidden' pages.

I wanted to test that out really quick. So I went over to google, typed in "secret hidden page", and lo and behold all kinds of things popped up!

I found one page with a kid doing all kinds of drugs (drugs are bad), and all kinds of nifty stuff. Made for an entertaining read
Re:This is what happens when you use frontpage... by Sabalon · 2001-11-26 16:02 · Score: 2

Yuh...Ford thought the same thing - we have this car that we can offer to our users without thinking about how they could be used in a crime.

Didn't the courts just say this was a bogus argument in gun crimes?

Heh by Anonymous Coward · 2001-11-26 04:55 · Score: 0

These people who store credit card numbers on the web server are the same people who don't patch for IIS worms until all hell breaks loose.
Good for them.

Example by squaretorus · 2001-11-26 04:55 · Score: 5, Informative

I recently joined an angel organisation to publicise my business in an attempt to raise funds. The information provided to the organisation is supposed to be secret, and only available to members of the organisation via a paper newsletter which was reproduced in the secure area of the organisation's website.
A couple of months down the line a couple of search engines, when asked about 'mycompanyname' were giving the newsletter entry in the top 5.

Alongside my details were those of several other companies. Essentially laying out the essence of the respective business plans.

How did this happen? The site was put together with FP2000, and the 'secure' area was simply those files in the /secure directory.

I had no cause to view the website prior to this. The site has been fixed on my advice. How did this come about? No one in the organisation knew what security meant. They were told that /secure WAS!

It didn't do any damage to myself, but a few of the other companies could have suffered if their plans were found. It's not Google's job to do anything about this, it's the webmaster's. But a word of warning - before you agree for your info to appear on a website ask about the security measures. They may well be crap!

Re:Example by Shimbo · 2001-11-27 04:25 · Score: 1

I had no cause to view the website prior to this. The site has been fixed on my advice.
Reporting security holes is a dangerous business: check this thread. Incidentally, under EU protection laws, failing to take reasonable steps to secure personal data may be an imprisonable offence.

I've got a solution! by CraigoFL · 2001-11-26 04:56 · Score: 5, Funny

Every web server should have a file in their root directory called "secret.xml" or somesuch. This file could list all the publicly-accessible URLs that have all the "secret" data such as credit card numbers, root passwords, and private keys. Search engines could parse this file and then NOT include those URLs in their search results!

Brilliant, huh? ;-)

On second thought, maybe I shouldn't post this... some PHB might actually think it's a good idea.

Re:I've got a solution! by Plutor · 2001-11-26 07:29 · Score: 2

That's a fantastic idea! Altho it should be a text file, and we should call it robots.txt or something like that.

(Note: this kind of thing already exists, and its already called robots.txt)
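For anyone who hasn't seen how a well-behaved crawler consults robots.txt, here is a small Python sketch using the standard library's `urllib.robotparser`. The rules, the bot name and the example.com URLs are placeholders. Keep in mind that robots.txt is only a request: it is itself publicly readable, and a badly behaved client is free to ignore it, so listing your secret paths there is not protection.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration.
RULES = """\
User-agent: *
Disallow: /secret/
"""

def allowed(url: str, agent: str = "ExampleBot") -> bool:
    """Return True if a compliant crawler would be permitted to fetch url."""
    parser = RobotFileParser()
    parser.parse(RULES.splitlines())
    return parser.can_fetch(agent, url)

print(allowed("http://example.com/index.html"))
print(allowed("http://example.com/secret/cards.txt"))
```

A compliant crawler skips anything under /secret/, but the file has just told every human reader where the interesting directory is.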
Re:I've got a solution! by po_boy · 2001-11-26 07:36 · Score: 2

s/secret.xml/robots.txt/g
http://www.robotstxt.org/wc/norobots.html
Re:I've got a solution! by siokaos · 2001-11-26 08:40 · Score: 1

.htaccess

'nuff said

--
http://siokaos.org/

SSL only takes you so far by imrdkl · 2001-11-26 04:56 · Score: 1

And then you are at the mercy of ridiculous temp-file and text database schemes. I've never deployed a credit-card web, but I get enough spam from people trying to sell me their own implementation for my server, that this is not surprising at all.

Maybe we need to demand "approved" server-side implementation of credit-card webservers, besides SSL. How could this be verified? I don't have a clue.

Re:SSL only takes you so far by imrdkl · 2001-11-26 11:26 · Score: 1

I am finished debating with shadows about my sig. Come out and talk to me, or go jump in the lake with the rest.
Re: SSL only takes you so far by Inthewire · 2001-11-26 13:02 · Score: 1

I've been seeing your sig around for a while, but never felt like commenting on it. No, I'm not any of the ACs who may have said anything. Just curious...what's fair in this case?

--

Writers imply. Readers infer.
Re: SSL only takes you so far by imrdkl · 2001-11-26 13:53 · Score: 1

A fair question. But I asked it first.
I can say what is not fair, but I think we may just agree on that definition.
A fair trial is our greatest and most powerful weapon against hatred and fear. How about that?
Re: SSL only takes you so far by Inthewire · 2001-11-26 18:03 · Score: 1

Well...all I know is what I'm told. I'm told Mr. bin Laden has taken credit / responsibility for the actions of 9/11.
Those were not mere criminal acts. Those were acts designed to underline demands. Those were acts of war.
I have to believe that what I am told is true, that ObL is the man most responsible, that the Taliban supported him. Again, assuming that is the case, I would love to see him paraded through the streets of America. I would forever hate myself for missing the opportunity to piss on his face.
A trial assumes there is guilt or innocence to be established. Claiming guilt removes the need for a trial. Move directly to punishment.

Now, the man may be innocent, and wish to defend his honor. This is not the story I've been told. I don't know anyone I trust to translate the tapes. So I place my (limited) faith in the US government. I'm willing to be proven wrong. I do not wish to sanction the execution of those wrongly accused. But I've seen no credible rebuttal.

And should others be guilty, let them die as well.

--

Writers imply. Readers infer.

No easy solution in sight?!?! by Anonymous Coward · 2001-11-26 04:56 · Score: 0

Here's a really easy solution - the bank has these crazy things called "bills." Go to the bank and get some. Then go to the store and use aforementioned "bills." Voila - hax0rs go bye-bye.

Re:No easy solution in sight?!?! by JatTDB · 2001-11-26 06:45 · Score: 1

Yeah, but what if you need to buy a Pony Tail Buttplug? Nobody's gonna buy that at a store...

--
"That's Tron. He fights for the Users."

Bad manager ideas by Mr+Krinkle · 2001-11-26 04:56 · Score: 1

"The guys at Google thought, 'How cool that we can offer this to our users' without thinking about security. If you want to do this right, you have to think about security from the beginning and have a very solid approach to software design and software development that is based on what bad guys might possibly do to cause your program grief."
The fact that this guy claims the responsibility lies with google for not allowing this type of search is just plain crazy. If you are publishing critical information on a site that is not at least secure and preferably encrypted you are just asking for trouble. It should not be google's responsibility in any way shape or form to not find this information. If the content providers wish it they can put the robot file out but that is not fixing anything, merely sidestepping one super easy hack. They still need to have a decent or at least SOME security design.
Oh well.

--
I am 31337 or something.

Google exploit patch 0.2 for Apache by Anarchofascist · 2001-11-26 04:57 · Score: 2, Funny

Oops! Version 0.2 already:

% cat > /var/www/html/robots.txt
User-agent: *
Disallow: /
^D
%

--
Once more unto the breach, dear friends, once more, Or close the wall up with our American dead!

Umm by Anonymous Coward · 2001-11-26 04:57 · Score: 0

"As the article outlines, this has been a problem for a long time -- and with no easy solution in sight."

How about using basic mySQL passwords?
Sounds pretty simple to me

To test your credit-card ordering site... by 5n3ak3rp1mp · 2001-11-26 04:57 · Score: 1

go to Google, type in "site:yourdomain.com xxxx-xxxx-xxxx-xxxx" where the x's are the credit card number of a known customer. If you get hits, your security is less than ideal.

Unfortunately, website security is not as simple as locking a door... but keeping your customer data out of the webserver's document root would be a good start.

Re:To test your credit-card ordering site... by Legion303 · 2001-11-26 05:02 · Score: 2

go to Google, type in "site:yourdomain.com xxxx-xxxx-xxxx-xxxx" where the x's are the credit card number of a known customer.
Then watch the fraudulent charges fly when the person who was sniffing cleartext HTTP traffic gets it in his logs.
-Legion
Re:To test your credit-card ordering site... by Anonymous Coward · 2001-11-26 10:48 · Score: 0

Actually, if you know any of your user's credit cards, your security is less than ideal. There's no reason your users' cardNumbers should exist in plaintext anywhere on your site!
Re:To test your credit-card ordering site... by DavidTC · 2001-11-26 18:01 · Score: 2, Insightful

Erm...so you're just going to magically verify them without knowing them?
Here's a big hint: Not everyone is running some sort of completely automated, completely external validation service, and, duh, if they aren't, they need to know the numbers so they can actually charge the people.
About the only reason they shouldn't be in your computers somewhere is if you're using a third party to handle all that stuff...and then they will be in their computer. They, rather obviously, have to exist somewhere to be sent to the CC companies.

--
If corporations are people, aren't stockholders guilty of slavery?

Bad.. but by boaworm · 2001-11-26 04:58 · Score: 2

Such problems have existed for quite a while. Hackings, Crackings, internet sniffing etc.

The real issue is not if you can.. but if you actually do use the information. Regardless of if it is available or not, it IS ILLEGAL. (Carding does give rather long prison times as well)

People have had the chance to steal from other people for as long as mankind has existed. This is just another form... perhaps a bit simpler though ...

--
Probable impossibilities are to be preferred to improbable possibilities.
Aristotele

Re:Bad.. but by Anonymous Coward · 2001-11-26 06:40 · Score: 0

Carding does NOT give rather long prison times, in fact 1 year probation and 200 hours of community service was a bit harsh :]

How this happens by Tom7 · 2001-11-26 04:59 · Score: 5, Informative

People often wonder how their "secret" sites get into web indices. Here's a scenario that's not too obvious but is quite common:

Suppose I have a secret page, like:
http://mysite.com/cgi-bin/secret?password=administrator

Suppose this page has some links on it, and someone (maybe me, maybe my manager) clicks them to go to another site (http://elsewhere.com/).

Now suppose elsewhere.com runs analog on their web logs, and posts them in a publicly-accessible location. Suppose elsewhere.com's analog setup also reports the contents of the "referer" header.

Now suppose the web logs are indexed (because of this same problem, or because the logs are just linked to from their web page somewhere). Google has the link to your secret information, even though you never explicitly linked to it anywhere.

One solution is to use proper HTTP access control (as crappy as it is), or to use POST instead of GET to supply credentials (POST doesn't transfer into a URL that might be passed as a referrer). You could also use robots.txt to deny indexing of your secret stuff, though others could still find it through web logs.

Of course, I don't think credit card info should *ever* be accessible via HTTP, even if it is password protected!
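As a rough illustration of the leak described above, here is a Python sketch that checks a Referer value for query parameters that look like credentials. The parameter-name list and the function name are assumptions made up for this example; a real audit would need a list tuned to the site in question.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical list of parameter names worth flagging.
SUSPICIOUS = {"password", "passwd", "pass", "pwd", "token", "key"}

def leaked_credentials(referer: str) -> dict:
    """Return credential-looking query parameters found in a Referer URL."""
    params = parse_qs(urlsplit(referer).query)
    return {name: values for name, values in params.items()
            if name.lower() in SUSPICIOUS}

print(leaked_credentials(
    "http://mysite.com/cgi-bin/secret?password=administrator"))
```

Anything this turns up in your own outbound links is also sitting in somebody else's web logs. With POST the credentials travel in the request body, so they never become part of a URL that can be handed on as a referrer.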

Re:How this happens by Garfunkel · 2001-11-26 05:05 · Score: 2, Informative

ah yes, analog's reports (and other web stat programs) are a big culprit as well. Even on local sites. If I have a /sekrit/ site that isn't linked to from anywhere on my site, but I have a bookmark that I visit often. That shows up in web logs still and usually gets indexed by a web log analyzer which can "handily" create links to all those pages when it generates the report.

--
-jay
Re:How this happens by frankie · 2001-11-26 05:20 · Score: 2, Troll

Suppose I have a secret page, like: http://mysite.com/cgi-bin/secret?password=administrator
Then it's a pretty crappy secret. Plaintext passwords sent via GET are weaker than the 4 bit encryption in a DVD or something.
Suppose this page has some links on it, and someone (maybe me, maybe my manager) clicks them to go to another site (http://elsewhere.com/).
If the page is really truly supposed to be secret, then it won't have external links, and you'll filter it out of your web logs too. Or you could just suck.
Google doesn't kill secrets. PHBs and MCSEs kill secrets.
Re:How this happens by Anonymous Coward · 2001-11-26 05:41 · Score: 0

Google doesn't kill secrets. PHBs and MCSEs kill secrets.

Damn straight! Especially when your PHB is an MCSE.

MCSEs...feh.. what a bunch of lowlives.
Re:How this happens by kilgore_47 · 2001-11-26 06:40 · Score: 1

The part of your scenario with logs being indexed doesn't even have to happen; say you are at http://mysite.com/cgi-bin/secret?password=administrator and you go to a search engine! I bet most smart search engines, upon seeing a referer URL it wasn't familiar with, would quickly run out and index the page.

Regardless of how the search engine gets the link, however, the indexing software SHOULD drop anything after the '?' character anyway (to avoid indexing the same cgi repeatedly with different arguments).

Regardless, sending passwords in GET requests is ALWAYS a bad idea, as is putting up lists of passwords on a public webserver. But this is slashdot, and we all already knew that.

--
___
The way to see by faith is to shut the eye of reason. --Ben Franklin
Re:How this happens by dse · 2001-11-26 06:51 · Score: 1

kilgore_47 wrote:

Regardless of how the search engine gets the link, however, the indexing software SHOULD drop anything after the '?' character anyway (to avoid indexing the same cgi repeatedly with different arguments).

Unfortunately there are many sites containing large numbers of articles that use URLs that contain article ID numbers in the query string to link to stories, and those would stop getting indexed. Slashdot itself is one of them. Example URL for this story:

http://slashdot.org/article.pl?sid=01/11/26/154324 5
(ignore the space above)

Yes it's a really dumb design but unfortunately nobody takes URLs into consideration when designing web sites. Hasn't anyone heard of PATH_INFO? Hack article.pl to use it and make your URLs look something like this, for example:

http://slashdot.org/article.pl/2001-11-26/1543245
(PATH_INFO would be "/2001-11-26/1543245", parse as needed)

You could even use QUERY_STRINGS in conjunction for things like "mode=nested".
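A sketch of the "parse as needed" step, in Python rather than the Perl that article.pl would actually use. The function name and the two-segment layout are assumptions taken from the example URL above, not anything Slashdot's code is known to do.

```python
def parse_article_path(path_info: str):
    """Split a PATH_INFO like '/2001-11-26/1543245' into (date, article_id)."""
    parts = [segment for segment in path_info.split("/") if segment]
    if len(parts) != 2:
        raise ValueError(f"unexpected PATH_INFO: {path_info!r}")
    date, article_id = parts
    return date, article_id

# In a CGI script the value would come from the environment, e.g.
#   import os; parse_article_path(os.environ.get("PATH_INFO", ""))
print(parse_article_path("/2001-11-26/1543245"))
```

The URL then has no query string for an indexer to throw away, and optional settings such as mode=nested can still ride along in QUERY_STRING.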

--
--
This web site will cure all your ailments.
Re:How this happens by LinuxParanoid · 2001-11-26 07:11 · Score: 1

Heck, even if the page doesn't have external links on it, but the manager clicks on his bookmark list to visit his favorite goofoff site immediately afterward, the referrer has his password.
Re:How this happens by Anonymous Coward · 2001-11-26 07:36 · Score: 0

Only in Netscape 4, which you should have replaced already. Correctly designed browsers do not send any Referrer unless the new page was an actual link (or form submission) on the previous page.
Re:How this happens by mlinksva · 2001-11-26 07:36 · Score: 2

If you do have a "sekrit" internal page with external links, you can prevent the referer from being sent by changing your links from
<a href="http://foo.com">Before Bar</a>
to
<a href="javascript:window.location='http://foo.com'" >Before Bar</a>
Re:How this happens by Anonymous Coward · 2001-11-26 07:57 · Score: 0

As a lynx user, I find this really fucking annoying. Do you never browse with javascript off?
Re:How this happens by mlinksva · 2001-11-26 08:15 · Score: 1

I would never suggest replacing links with javascript on a public page. On a "sekrit" page that only you or members of your organization will browse, I think it's quite ok to require javascript for the sake of obskuring the page as a referer.
Re:How this happens by damiam · 2001-11-26 08:22 · Score: 1

Slashdot doesn't want these dynamically generated pages indexed, which is why they're blocked by robots.txt. The archived stories do have static html pages with normal URL's.

--
It's hard to be religious when certain people are never incinerated by bolts of lightning.
Re:How this happens by raynet · 2001-11-26 11:43 · Score: 1

using javascript to change URLs is stupid and won't work with Lynx etc. (Or with my IE because I have javascript turned off)

Just use HTTP-authentication (Apache or Perl/PHP-script [or IIS]) and you have no probs.

Or if you really have a secret page use HTTPS or SSH and less.

--
- Raynet --> .
Re:How this happens by mlinksva · 2001-11-26 13:06 · Score: 1

Even if you use authentication and/or https or ssh, you still may wish to prevent referrers from going out. Let's say on your intranet you have a set of pages detailing your competition. To make these pages useful internally, you want to link to competitors' web sites. You don't want "https://intranet.foo.com/competetion/bar.html" to show up in their referer log even if they can't view the page in question.
You don't want them to know that you're tracking them, nor do you want to make it easy for them to zero in on relevant documents if you do have a security breach (imagine that).

Hell, No. by tomblackwell · 2001-11-26 04:59 · Score: 1, Funny

You should be writing that type of data on the backs of envelopes and leaving them scattered around your living room...

Where is YOUR speech license? by Unknown+Poltroon · 2001-11-26 04:59 · Score: 2

Where have you put your license to speak your mind on slashdot? Surely, people can't go around putting anything they want to say into a public forum. They might say anything. As a matter of fact, we must revoke people's phone privileges until they can prove they're smart enough not to give out credit card numbers to telemarketers. As a matter of fact, let's just legislate intelligence. We can tack it on as a rider for that bill to make Pi = 3.
You're a nitwit. I'm revoking your speech license on slashdot.

--
All Troll + "offtopic" mods are meta moderated as "Unfair", because you abused the system.

Easy solution by Arethan · 2001-11-26 05:00 · Score: 1, Redundant

Your crawler is caching credit card numbers, you say? Simple: check the content you cache for 16-digit numbers. Any that you find, you check with a simple Luhn (mod 10) algorithm. If it passes, you replace the number with "################" or a similar masking.

There, all credit card numbers will now be filtered from your cache.

I understand the severity of the issue, and it's good to know this is happening, but the solution is simple.
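
For what it's worth, here is a minimal Python sketch of the filter described above. The function names (`luhn_valid`, `mask_card_numbers`) and the regular expression are made up for illustration, and the sketch only handles unbroken 16-digit runs; numbers written with spaces or dashes, or cards with other lengths, would slip through.

```python
import re


def luhn_valid(digits):
    """Return True if the digit string passes the Luhn (mod 10) check."""
    total = 0
    # Walk from the rightmost digit and double every second one.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as adding the two digits of the product
        total += d
    return total % 10 == 0


# Exactly 16 consecutive digits that are not part of a longer digit run.
SIXTEEN_DIGITS = re.compile(r"(?<!\d)\d{16}(?!\d)")


def mask_card_numbers(text, mask="#" * 16):
    """Replace every 16-digit run that passes the Luhn check with the mask."""
    def replace(match):
        number = match.group(0)
        return mask if luhn_valid(number) else number
    return SIXTEEN_DIGITS.sub(replace, text)
```

Since the last digit is a check digit, roughly one in ten arbitrary 16-digit strings will also pass, so expect some false positives on part numbers, tracking codes and the like.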

Re:Easy solution by Cederic · 2001-11-26 05:33 · Score: 1

Meanwhile, Mr Bad Guy does a search in Google for ################ and finds all the pages with credit card numbers on them. Then goes to those pages and accesses the raw data.

Maybe the solution isn't so simple?

~Cederic
Re:Easy solution by Grit · 2001-11-26 05:38 · Score: 1

This only helps if the document is only in Google's cache. If it isn't, the attacker can just follow the link and read the original document.
Excluding entire documents that happen to match the checksum from the search results would work, but it'd be interesting to see how many false positives would result...
Re:Easy solution by Anonymous Coward · 2001-11-26 05:50 · Score: 0

ok, genius... what about passwords?

What is the cost to google to implement some work-around like this? Surely their crawler/indexer has enough to do without testing content for 16-digit numbers? Oh, and if they were to detect CC#s, would they then have the responsibility to inform the indexed site of the problem? Where would it end?

No, their stand is (and should remain) that they are indexing anything public and it is the responsibility of the webmaster to make sure that private information is not made public. duh
Re:Easy solution by Arethan · 2001-11-26 06:05 · Score: 2

My apologies. I should have been more clear in my intent. Yes, simply masking credit card numbers in pages would allow people to simply search for the mask and follow the same link Google did in order to see the unmasked result.

However, my intention was simply to remove Google's legal implication of storing credit card numbers that were not willingly given by the cardholder. They could also autonomously send an email to webmaster@offendingsite.com notifying them of the potentially vulnerable link, entirely from the kindness of their hearts. But, legal issues in the past have shown that this would result in a cease-and-desist and a lawsuit against Google claiming that the crawler/spider has been hacking their website.

Judging from the past, from a legal standpoint, the best thing they can do is simply filter their cached content. If you are worried that people are going to search for ################, then disallow searching for something so erroneous. Or simply change the mask from all # to random special characters.

It's really not that difficult of a solution. Yes, it's a little disturbing that some websites are this easily hacked, but are we really all that surprised? Get into the low-end ecommerce business sometime. You'll be surprised (frightened even) with what some people have been using for their online stores.

Not just credit cards by Jodrell · 2001-11-26 05:01 · Score: 1

I run a website that pulls a lot of content from other servers. We used to have a newsfeed via ITN's RDF feed - until I got a call from their Director of New Media asking me to take it off. Seems they charge a hefty fee for such a feed - around £30,000 - but hadn't made any attempt to protect it with a .htaccess file or something. How did I find it? By searching Google!

Re:Not just credit cards by GigsVT · 2001-11-26 09:14 · Score: 2

Why not just tell them to fuck off. If they want to control who links to their feed, then they should.

As long as it is publicly available, I seriously doubt they could successfully charge you for it.

It's like setting up a large arch in a public park, and when people walk under it, demanding $100,000 from them.

--
I've had enough abrasive sigs. Kittens are cute and fuzzy.
Re:Not just credit cards by penguinboy · 2001-11-26 14:21 · Score: 2

As long as it is publicly available, I seriously doubt they could successfully charge you for it.

If their newsfeed is copyrighted, I doubt they'd have much trouble prosecuting someone for duplicating it elsewhere without authorization (especially if elsewhere was a commercial site). Looking at unsecured data may be possible, but using it illegally is still illegal.

Basic Authentication by KyleCordes · 2001-11-26 05:01 · Score: 3, Insightful

[know how Basic Authentication works before hosting web sites]

... and know that it's a wholly inadequate way of "protecting" credit card numbers!

Re:Basic Authentication by cjpez · 2001-11-26 06:08 · Score: 1

... and know that it's a wholly inadequate way of "protecting" credit card numbers!
Well, yeah. I think the quote was more meant to imply that even something so mindlessly simple as basic HTTP auth was okay, thus illustrating the astounding triviality of keeping your sensitive data off of search engines. :)

--
Al Qaeda has ninjas!
Re:Basic Authentication by kilgore_47 · 2001-11-26 07:02 · Score: 1

The only adequate way to protect credit card numbers on a publicly accessible machine is strong encryption. If you aren't going to encrypt your sensitive data, you had better make sure it is stored somewhere physically disconnected from the net.

--
___
The way to see by faith is to shut the eye of reason. --Ben Franklin

spam by flollywebfrog · 2001-11-26 05:01 · Score: 1, Informative

The other day I was using Google to explore the files of an annoying spammer's site [referralware.com]. Simply searching for a few numbers with the query site:.referralware.com brought up search results in their unprotected source.referralware.com directory that included all the credit card logs for the past week. And I am just an average Joe computer user ... this is a problem if I can be a "hacker" with less knowledge than a script kiddie!

--

________________
All my sig are fjdklafjkldafjkldafdaklf

robots.txt by mukund · 2001-11-26 05:02 · Score: 2, Interesting

From my web logs, I see that a lot of HTTP bots don't care crap about /robots.txt. Another thing which happens is that they read robots.txt only once and cache it forever in the lifetime of accessing that site, and do not use a newer robots.txt when it's available. It'd be useful to update what a bot knows of a site's /robots.txt from time to time.

HTTP bot writers should adhere to the information in /robots.txt and restrict their access accordingly. On a lot of occasions, webmasters may set up /robots.txt to actually help stop bots from feeding on junk information which they don't require, or things which change regularly and need not be recorded.
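
As a rough illustration of the "re-read it from time to time" idea, here is a small Python sketch built on the standard library's `urllib.robotparser`. The `RefreshingRobots` class, its `fetch_lines` callback and the one-hour default are all assumptions made up for this example; a real bot would plug in its own HTTP fetch and pick its own refresh interval.

```python
import time
from urllib.robotparser import RobotFileParser


class RefreshingRobots:
    """Cache a site's robots.txt rules, but re-fetch them once they get stale."""

    def __init__(self, fetch_lines, max_age_seconds=3600.0, clock=time.time):
        # fetch_lines: hypothetical callable returning robots.txt as a list of lines.
        self._fetch_lines = fetch_lines
        self._max_age = max_age_seconds
        self._clock = clock
        self._parser = None
        self._fetched_at = 0.0

    def _refresh_if_stale(self):
        now = self._clock()
        if self._parser is None or now - self._fetched_at > self._max_age:
            parser = RobotFileParser()
            parser.parse(self._fetch_lines())  # parse() accepts an iterable of lines
            self._parser = parser
            self._fetched_at = now

    def allowed(self, user_agent, url):
        """Return True if the (possibly refreshed) rules permit fetching url."""
        self._refresh_if_stale()
        return self._parser.can_fetch(user_agent, url)
```

The bot asks `allowed(...)` before every request, so a webmaster who tightens /robots.txt is picked up after at most `max_age_seconds` rather than never.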

--
Banu

Re:robots.txt by mobiGeek · 2001-11-26 07:56 · Score: 2

Though I do concur that 'bots should respect the robots.txt protocol, one must remember that /robots.txt does not solve the problem being highlighted by this article.

--
...Beware the IDEs of Microsoft...
Re:robots.txt by WNight · 2001-11-26 18:50 · Score: 2

Why should bot authors have to follow robots.txt as if it's a law or something?

Robots.txt is as much for their benefit as it is for the benefit of the site author. It saves the bot from indexing something that might confuse it, CGIs that auto-generate an infinite number of pages for example.

If the bot author knows this, and wants to see these, why shouldn't they read them?

Bringing this up in an article about google suggests that they don't follow the robots.txt (even though you didn't say it directly). And it implies you think this would have been fixed had they.

When do site authors actually have to take responsibility for this? If you really object to someone mirroring or indexing your site, block that. Either with the user agent, or by detecting sequential accesses, or something.

Many crawlers ignore robots.txt by Ars-Fartsica · 2001-11-26 05:02 · Score: 3, Interesting

I do not know if this is still the case, but Microsoft's IE offline browsing page crawler (collects pages for you to read offline) ignored robots.txt last time I checked. I know many other crawlers do likewise.

Re:Many crawlers ignore robots.txt by Syberghost · 2001-11-26 06:33 · Score: 2

Other crawlers that do listen to robots.txt can be duped into effectively ignoring it.

For example, try this with wget sometime:

wget -r somesitethathasrobots.txt
su -
chown root:root robots.txt
cat /dev/null >robots.txt
chmod 0000 robots.txt
exit
wget -r somesitethathasrobots.txt

voila, wget now thinks it's observing robots.txt, but robots.txt is a zero-length file, and it can't overwrite it because only root can write to that file...
Re:Many crawlers ignore robots.txt by tjwhaynes · 2001-11-26 06:51 · Score: 2

That is a little extreme! Just add 'robots = off' to your .wgetrc file and wget will ignore any robots.txt on the site it is crawling.
This is extremely bad netiquette so DON'T DO IT

Cheers,

Toby Haynes

--
Anything I post is strictly my own thoughts and doesn't necessarily have anything to do with the opinions of IBM.
Re:Many crawlers ignore robots.txt by Zagadka · 2001-11-26 12:25 · Score: 1

This is much simpler:

mkdir -p example.com
touch example.com/robots.txt
wget -r -nc http://example.com/

Note the -nc ("no clobber") option, so you don't have to screw around with su. (and you don't need to download the whole site twice if you take a minute to think about where the robots.txt will go...)

Yes, I know about the .wgetrc setting, but this is good if you want to grab something off of just one site (like http://www.xml.com/axml/testaxml.htm) but don't want to have to worry about forgetting to re-enable robots.txt handling.
Re:Many crawlers ignore robots.txt by jesser · 2001-11-26 15:13 · Score: 2

Why is it bad netiquette to use wget on sites that use robots.txt? Robots.txt is aimed at search engines and is primarily used to keep search engines from downloading dynamic data or an infinite number of pages. I only use wget to avoid downloading a large number of links manually, and I'm always careful to make sure I only download what I'm trying to download.

--
The shareholder is always right.
Re:Many crawlers ignore robots.txt by Syberghost · 2001-11-26 17:44 · Score: 2

True, but my example is more extendible to the general case of ANY crawler that observes robots.txt.
Re:Many crawlers ignore robots.txt by Zagadka · 2001-11-29 11:36 · Score: 1

Not really. One would think that most reasonable crawlers would cache the parsed robots.txt in memory while crawling a site, rather than re-reading the one on disk that it just tried (and failed) to write. It's very surprising that wget behaves the way it does. Your solution relies on a quirk of wget, as does mine. Mine works for ordinary users though, while yours requires root access. Mine is also shorter. :-)
Re:Many crawlers ignore robots.txt by Syberghost · 2001-11-29 12:53 · Score: 1

Mine is also shorter. :-)

Now there's something you don't see men bragging about often. :-)

Oh, for regular expression searching in Google by EnglishTim · 2001-11-26 05:03 · Score: 5, Funny

I could be a rich man...

(Not, of course that I'd ever do anything like that...)

Searching with regular expressions would be cool, though...

Directory listings by NineNine · 2001-11-26 05:03 · Score: 2, Informative

Most of this is coming from leaving directory listing turned on. Generally, this should only be used on HTTP front-ends to FTP boxes, and for development machines. IIS has "directory browsing" turned off by default. Maybe Apache has it turned on by default? You'd be surprised to see how many public webservers have this on, making it exceedingly likely that search engines will find files they weren't meant to find. The situation arises when there's no "default" page (usually index.html or default.html, default.asp, etc.) in a directory and only a file like content.html in a directory. If a search engine tries http://domain.com/directory/, it'll get the directory listing, which it can, in turn, continue to spider.

Re:Directory listings by kilgore_47 · 2001-11-26 08:13 · Score: 1

Most of this is coming from leaving directory listing turned on.

Not really.

It's easy to blame directory listing, but having directory listing off and leaving sensitive data lying around is still very very bad. If a file CAN be viewed, you should assume it will be, and having directory listing off won't change that. And once the URL gets sent along to someone else's server in a referer, it's really not so hidden anymore. Security by obscurity is always a bad idea. Directory listing hardly makes a difference.

--
___
The way to see by faith is to shut the eye of reason. --Ben Franklin

Must... blame... someone.... by JMZero · 2001-11-26 05:04 · Score: 3, Funny

INetPub means "INetPublic" not "INetPubrobably a great place to put my credit card numbers".

Why are stupid people not to blame for anything anymore?

--
Let's not stir that bag of worms...

Re:Must... blame... someone.... by Hektor_Troy · 2001-11-26 05:25 · Score: 1

Because that would mean that liability would no longer apply, making a whole lot of lawyers unemployed.

How come the US judicial system doesn't demand that you use common sence? Should it really be necessary to write "keep out of children"(*) on a warning label for a kitchen knife?

*No, I didn't leave any words out of the sentence.

--
We do not live in the 21st century. We live in the 20 second century.
Re:Must... blame... someone.... by Anonymous Coward · 2001-11-26 07:56 · Score: 0

sense is not spelled with the letter "c"

Evil Robot? by StevenHallman76 · 2001-11-26 05:05 · Score: 1

So, where might one find an 'evil' robot that looks specifically in places robots.txt tells it not to? hypothetically speaking, of course...

Re:Evil Robot? by Nick+Arnett · 2001-11-26 06:25 · Score: 1

It would be pretty darn simple to write one, given the various public domain sources for robots, most of which support robots.txt. All you'd have to do is reverse the logic of which pages to look at.

More to the point, anybody who is using robots.txt to keep information secret totally misunderstands its purpose. It has absolutely nothing to do with secrecy. Its purpose is to keep robots out of pages that shouldn't be crawled for other reasons. Examples include pages that change so fast that by the time they roll into a search engine index, they'll already have changed, or pages generated dynamically, which can send robots into black holes of recursion.
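
To make the "nothing to do with secrecy" point concrete, here is a short Python sketch showing that anyone who can fetch /robots.txt can read off the paths it asks robots to avoid. The function name `disallowed_paths` and the sample file contents are hypothetical, and the parsing is deliberately simplified (it ignores which User-agent group a rule belongs to).

```python
def disallowed_paths(robots_txt):
    """Return every non-empty Disallow path found in a robots.txt body."""
    paths = []
    for line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow":
            value = value.strip()
            if value:  # an empty Disallow means "nothing is disallowed"
                paths.append(value)
    return paths


# Hypothetical file contents, for illustration only.
sample = """User-agent: *
Disallow: /admin/   # keep out
Disallow: /orders.csv
"""
print(disallowed_paths(sample))
```

In other words, listing a sensitive path in robots.txt advertises it to every reader of that file; it only keeps out robots that choose to cooperate.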

If you're seriously interested in robots.txt, there's a mailing list for it (I'm the owner). Send "subscribe robots" in the body of a message to "listar@mccmedia.com". Robots.txt could get smarter, but it'll never, ever, be a privacy mechanism. It is focused on publicly accessible pages, after all.

Nick

Business Model by Alomex · 2001-11-26 05:05 · Score: 5, Funny

A while back there was a thread here about the weakness of the revenue model for search engines. Maybe we have found the answer, think about all the revenue that Google could generate with this data!

Anybody knows when Google is going public?

Re:Business Model by Anonymous Coward · 2001-11-26 09:31 · Score: 0

Try blackmail...

well golly gosh, it works! by Anonymous Coward · 2001-11-26 05:05 · Score: 2, Informative

search for: password admin filetype:doc

My first hit is:

www.nomi.navy.mil/TriMEP/TriMEPUserGuide/WordDoc s/ Setup_Procedures_Release_1.0e.doc

at the bottom of the html:

UserName: TURBO and PassWord: turbo, will give you unlimited user access (passwords are case sensitive).

Username: ADMIN and PassWord: admin, will give you password and system access (passwords are case sensitive).

It is recommend that the user go to Tools, System Defaults first and change the Facility UIC to Your facility UIC.

oh dear, am I now a terrorist?

Bring out the legal eagles by Milican · 2001-11-26 05:06 · Score: 4, Insightful

"Webmasters should know how to protect their files before they even start writing a Web site"

That quote sums up the exact problem. It's not Google's fault for finding out what an idiot the web merchant was. As a matter of fact I thank Google for exposing this problem. This is nothing short of gross negligence on the part of any web merchant to have any credit card numbers publicly accessible in any way. There is no reason this kind of information should not be under strong security.

To have a search engine discover this kind of information is despicable, unprofessional, and just plain idiotic. As others have mentioned, these guys need to get a firewall, use some security, and quit being such incredible fools with such valuable information. Any merchant who exposes credit card information through the stupidity of Word documents or Excel spreadsheets on their public web server, or any non-secure server of any kind, deserves to get sued into oblivion. Although people usually don't like lawyers, I'm really glad we have them in the US because they help stop this kind of stuff. Too many lazy people don't think it's in their best interest to protect the identity or financial security of others. I'm glad lawyers are here to show them the light :)

JOhn

--
Campaign for Liberty

Re:Bring out the legal eagles by malkavian · 2001-11-26 06:52 · Score: 2

Hmmm.. All I've seen lawyers do so far (in the most part anyhow) is be employed by the people with the money (i.e. the stupid people who are likely to put insecure documents on the web) to make it illegal to look at the stuff they don't want you to see.
The credit card data, although in a public area was "Not authorized for Transmission". Which means that any access by a bot was unauthorized access to the machine in question.
This, as I understand it, is now being classified as a terrorist act. If not, at least a highly illegal action.
Thus, Search engines are now tools of terrorists/criminals. The provision of those numbers (if they are ever used) could be set up in a case as direct theft by the search engine, or complicity in the final actions. The owners of the search engine could probably end up in court if owners of 'sensitive information' ended up deciding to pass the blame on and sue.
This is going to be another license to print money by a few Lawyers who decide it's worth fighting a few cases.
I'm hoping sense prevails, but I think in the long run, some silly person is likely to sue...
Re:Bring out the legal eagles by LoonXTall · 2001-11-27 05:07 · Score: 1

"...these guys need to get a firewall..."
How can a firewall stop an HTTP request? It would still get through, even if there was a proxy, since it's indistinguishable from a request for any other page. If you know how to (and do) make your proxy filter requests for sekrit files, then you are probably also smart enough to know that those files shouldn't be in publicly accessible directories.

--
~~~LXT~~~
Life is like a computer program: anything that can't happen, will.
Re:Bring out the legal eagles by Milican · 2001-11-27 07:41 · Score: 2

That's a good point. I guess the only way a firewall could help is if you set up another server (web or other) on another port and then only allowed selective access to that directory. Of course, a better (and more secure) approach would be to only allow secret data to be accessible via the intranet or VPN. However, this isn't very likely since most small shops are hosted remotely and have multiple (100s) of web stores per server. Great point though: firewalls would not help in this case.

JOhn

--
Campaign for Liberty

Did you know by hyyx · 2001-11-26 05:06 · Score: 0, Troll

that you can use "file://[address]" to find pages and directories that are NOT linked to on a server (if the server allows it)?

Re:Did you know by smcv · 2001-11-26 05:26 · Score: 1

Only if you are running the server, or have it mapped to a network drive or in Network Neighbourhood (Windows) or mounted on your filesystem somewhere (Unixes).

file:// links to files on your local filesystem.

Examples:

file:///c:/Windows/Notepad.exe - a well-known executable on a Windows 9x box
file:///bin/bash - a well-known executable on a Linux box

If you click either of these, you will be downloading the copy of the file on *your* computer. Not anyone else's, certainly not a remote server's.

The only way you can access files located on another computer by file:// URLs is if the other computer has shared them by Windows Networking or Samba or NFS or something, and you've mounted them on your local filesystem. (Actually that's not quite true, Windows will interpret file://mywinntbox/sharename/file as the NetBIOS UNC path \\mywinntbox\sharename\file, so if the file is shared with Windows Networking or Samba you don't need to have mounted the share as a network drive)

On the other hand, typing a URL rather than following a link will retrieve any resource that URL points to, so yes, you can get a certain amount of security-through-obscurity like that. (example: many web servers have /serverstatus or /admin which isn't usually linked to, although those should be password-protected too)

Robots search by links right? by linuxrunner · 2001-11-26 05:07 · Score: 1, Troll

The search engines use robots, and the robots read your site through links... So unless the file is in the root directory or has a direct link to the information, it should not show up.

So create a folder called "mystuff" and keep everything in it... and don't create a link to it, just remember it and type in the url.
http://www.my-site.com/mystuff
You'll then be sent to your secret folder that no one knows about, even the robots.
So I'm not sure what all the yelling is about. Just do that, or set up the robots.txt correctly, but most people don't realize they can do that....

--
www.slightlycrewed.com - Because aren't we all?

Re:Robots search by links right? by amattie · 2001-11-26 07:44 · Score: 1

Idiot.... Did you not READ the article? Maybe you did and you just don't understand. Regardless, when you go to a web site, it is very easy for that web site to grab the last web site you were just at, so if you go to your secret folder and then go to some other web site that will catalog that data for whatever purpose and then publish it, well then there you go -- problem arises. Just like everyone keeps saying, DON'T PUT CONFIDENTIAL DATA ON HTTP! It's not that hard people....
Re:Robots search by links right? by mobiGeek · 2001-11-26 08:13 · Score: 2

So create a folder called "mystuff" and keep everything in it...
That might work fine for you, but when your pin-headed manager or PFY find out about this cache of documents, you can bet your bottom dollar they'll add a link off their homepage or some brilliant location like that.
You mightn't link to it...but someone eventually will.
Also, realize that many robots don't guess at URLs...but it would be trivial to create a 'bot which did just hack away.
for i in `strings /usr/lib/aspell/* | sort -u`
do
    wget "http://www.3l33tsyt.kom/$i"
done

--
...Beware the IDEs of Microsoft...

No easy solution in sight? by vrmlguy · 2001-11-26 05:08 · Score: 2

From the article: "The guys at Google thought, 'How cool that we can offer this to our users' without thinking about security." -- Gary McGraw

Allow me to disagree. This fellow apparently agrees with Microsoft that people shouldn't publish code exploits and weaknesses. Sorry, but anyone who had secret information available to the external web is in the exact same boat as someone who has an unpatched IIS server or is running SQL Server without a password.

Let's assume that Google had (a) somehow figured out from day one that people would search for passwords, credit card numbers, etc, and (b) figured out some way to recognize such data to help keep it secret. Should they have publicized this fact or kept it a secret? Publicity would just mean that every script kiddie would be out writing their own search engines, looking for the things that Google et al were avoiding. Secrecy would mean that a very few black hats would write their own search engines, and the victims of such searches would have no idea how their secrets were being compromised.

But this assumes that there's some way of accomplishing item (b), which I claim is very difficult indeed. In fact, it would be harder to accomplish than natural language recognition. Think about it... Secrets are frequently obscure, to the point that to a computer they look like noise. Most credit cards, for example, use 16-digit numbers. Should Google not index any page containing a string of 16 consecutive digits? How about pages that contain SQL code? How would one not index those, but still index the on-line tutorials at MySQL, Oracle, etc?

The only "solution" is to recognize that this problem belongs in the lap of the web site's owner, and the search engine(s) have no fundamental responsibility.

--
Nothing for 6-digit uids?

And please close the door on the way out.... by pwagland · 2001-11-26 05:09 · Score: 2

But other critics said Google bears its share of the blame.
"We have a problem, and that is that people don't design software to behave itself," said Gary McGraw, chief technology officer of software risk-management company Cigital, and author of a new book on writing secure software.
"The guys at Google thought, 'How cool that we can offer this to our users' without thinking about security. If you want to do this right, you have to think about security from the beginning and have a very solid approach to software design and software development that is based on what bad guys might possibly do to cause your program grief."

Am I the only one scared by this? The problem is Google's, simply because they follow links? I find it hard to believe this stuff sometimes!

<rant>When will people learn that criminals don't behave? That is what makes them criminals!</rant>

As our second-year uni project we were required to write a web index bot. Guess what? It didn't "behave". It would search through a robots.txt roadblock. It would find whatever there was to find. This stuff is so far from being rocket science it is ridiculous!

Sure, using Google might ease a tiny fraction of the bad guys' work, but if Google wasn't there, the bad guys' tools would be. In fact, they still are there.

Saying that you have to write your client software to work around server/administrator flaws is like putting a "do not enter" sign on a tent. Sure, it will stop some people, but the others will just come in anyway, probably even more so just to find out what you are hiding.

True. by tomblackwell · 2001-11-26 05:09 · Score: 2

It will stop the casual perusal of your data.

The way to stop the determined snooper is to not keep your data in a directory that can be accessed by your web server.

Sure enough. by Joe+Decker · 2001-11-26 05:12 · Score: 3, Interesting

Looked up the first 8 digits of one of my own CC numbers, and, while I didn't find my own CC # on the net, I did immediately find a large file full of them with names, expiration dates, etc. (Sent a message to the site manager, but this case is pretty clearly an accidental leak.)

At any rate--scary it is.

--
I'm a nature photographer.

Re:Sure enough. by Nate+Fox · 2001-11-26 09:33 · Score: 2

OMG...I did the same thing. I found a 1.3M text file with ~3500 CC#s, names, expiry dates, et al. Heck, it's a comma-delimited file with email addresses and even comments on 'how did you hear about our site?' (to which one person responded: My friend at school told me about it and then i heard it with my ears.)
Actually, it seems as if it was obtained from PimpIT.com, so I suggest to anyone who has bought anything from them online to look into it.
(thankfully my name isn't in there)
Re:Sure enough. by Joe+Decker · 2001-11-26 10:50 · Score: 1

Actually, it seems as if it was obtained from [site], so I suggest to anyone who has bought anything from them online to look into it. (thankfully my name isn't in there)
I strongly recommend that you inform a contact at the web site as soon as possible. I got a very fast response in my own case. You may find a good contact by looking at the web site at the main domain you found the file at and looking for a contact address, or alternatively by looking up the domain contact information at internic or something. For best results, tell them the URL of the file you found, that it's generally web accessible, and that you found it through a search engine. I also recommend politeness, despite the gross negligence involved--you'll get a faster response, and the person you'll be talking to is probably not the person who actually coded the web site.
Better to light a single candle than to curse the darkness.

--
I'm a nature photographer.

OKC by TheMidget · 2001-11-26 05:12 · Score: 1

President Bush and his staff are very concerned about a cyberwar, because it can be waged without physically having Arabs in the States to commit the terrorism. That is very dangerous indeed.

Well, terrorism can easily be waged without having Arabs in the States, even without resorting to cyberwar. As Oklahoma City has shown, it's enough to have Rednecks in the States. Kudos though for disguising your racist drivel well enough to get modded up to 2.

Don't know that this is Google's problem.. by sid_vicious · 2001-11-26 05:13 · Score: 2

From the article:
"The guys at Google thought, 'How cool that we can offer this to our users' without thinking about security. If you want to do this right, you have to think about security from the beginning and have a very solid approach to software design and software development that is based on what bad guys might possibly do to cause your program grief."

Search and replace "Google" with "Microsoft". The lack of security is in the operating system and the applications which launch the malicious files without warning the user. Google just tells you where to get 'em, not what to do with 'em.

--
If it ain't broke, it doesn't have enough features yet.

Web Sites are public by definition by hattig · 2001-11-26 05:14 · Score: 4, Insightful

It is a simple rule of the web - any directory or subdirectory thereof that is configured to be accessible via the internet (either html root directories, ftp root directories, gnutella shared directories, etc) should be assumed to be publicly accessible. Do not store anything that should be private in these areas.

Secondly, it appears that companies are storing credit card numbers (a) in the clear and (b) in these public areas. These companies should not be allowed to trade on the internet! That is so inept when learning how to use pgp/gpg takes no time at all, and simply storing the PGP encrypted files outside the publicly accessible filesystem is just changing the line of code that writes to "payments/ordernumber.asc" to "~/payments/ordernumber.asc" (or whatever). Of course, the PGP secret key is not stored on a publicly accessible computer at all.
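
A quick sanity check along these lines can be scripted. The Python sketch below tests whether a file path ends up inside the web server's document root; the function name `is_web_exposed` and any paths you feed it are hypothetical, and it says nothing about aliases, symlinked virtual hosts or other server-side mappings that can expose a directory by another route.

```python
from pathlib import Path


def is_web_exposed(file_path, web_root):
    """Return True if file_path resolves to a location at or below web_root."""
    resolved_file = Path(file_path).resolve()
    resolved_root = Path(web_root).resolve()
    # resolve() collapses ".." segments and symlinks before comparing.
    return resolved_file == resolved_root or resolved_root in resolved_file.parents
```

An order-processing script could refuse to write "ordernumber.asc" anywhere this returns True, which catches the "payments/" versus "~/payments/" mistake before any data lands in the public tree.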

But I shouldn't be giving a basic course on how to secure website payments, etc, to you lot - you know it or could work it out (or a similar method) pretty quickly. It is those dumb administrators that don't have a clue about security that are to blame (or their PHB).

Re:Web Sites are public by definition by Anonymous Coward · 2001-11-26 05:43 · Score: 1, Insightful

It is a simple rule of the web - any directory or subdirectory thereof that is configured to be accessible via the internet (either html root directories, ftp root directories, gnutella shared directories, etc) should be assumed to be publicly accessible. Do not store anything that should be private in these areas.

am i misreading this, or are you suggesting that no private information should ever be accessible via the web, regardless of precautions taken during implementation?

personally i think that's going a bit too far. for example, i'm fairly confident that my banking information, accessible online at my bank's website over https and protected by password, is safe. and if it's not? well, that's why the bank is insured.
Re:Web Sites are public by definition by hattig · 2001-11-26 05:58 · Score: 1

Yes, but your bank doesn't store its customers' details in a giant text file "customers.txt" that it stores in /var/httpd/secure/, does it? It doesn't store your account details in a file called "customer2947734.txt" either...
It will be a long time before search engines can log into a (secure) web based system (brute force username/password attack :) ) and then browse it as if the details being provided are actually in that public webspace.
In reality, the details are most likely in a tightly locked down Oracle database that is not accessible from the Internet (ideally, that is; I would bet that it is accessible, though, if only for remote maintenance in emergencies or whatever).
Re:Web Sites are public by definition by sxpert · 2001-11-26 07:15 · Score: 1

I checked google for "customer.txt" and you have no idea how often professional developer magazines give advice on making such files :(
Re:Web Sites are public by definition by Anonymous Coward · 2001-11-26 11:47 · Score: 0

I dunno, my user name is just my social security number with my 4 digit ATM pin number as the password for my bank's online site. :-) I think it's pretty stupid but there's no way for me to change it online without talking to a human (ick). No big deal since their web site only allows you to look at your statements and not transfer funds and such.

What a bunch of idiots, was Re:Bad manager ideas by pdqlamb · 2001-11-26 05:15 · Score: 1

Absolutely. And this is supposed to be a network security outfit. (disgusted grimace) Trouble is, these idiots fail to make clear where the responsibility for "solid software design" lies -- right on the shoulders of the people putting the information out in the open. It's like taking Martha Stewart's idea to the extreme -- collect all the credit cards in town and line the public swimming pool with them, and then put signs up saying, "Please do not copy down credit card numbers!"

Maybe we do need some kind of accreditation. Any idiot can claim to be a security expert in the computer field. Can any convicted burglar claim to be a locksmith?

Disagree With Gary McGraw by devnullkac · 2001-11-26 05:16 · Score: 4, Insightful

Near the end of the article, there's a quote from Gary McGraw:

The guys at Google thought, 'How cool that we can offer this to our users' without thinking about security. If you want to do this right, you have to think about security from the beginning and have a very solid approach to software design and software development that is based on what bad guys might possibly do to cause your program grief.

I must say I couldn't disagree more. To suggest that web site administrators can somehow entrust Google to implement the "obscurity" part of their "security through obscurity" plan is unrealistic. As an external entity, Google is really just another one of those "bad guys" and the fact that they're making your mistakes obvious without actually exploiting them is what people where I come from call a Good Thing.

--
What do you mean they cut the power? How can they cut the power, man? They're animals!

Re:Disagree With Gary McGraw by Software · 2001-11-26 06:49 · Score: 2

I agree with your disagreement. The amusing part is that, in the proper context, McGraw's second sentence in his statement makes perfect sense. However, given the context here, it's nonsense. Google is not the insecure system here. It's the silly webmasters who have secret data at publicly accessible URLs that are the problem. Nobody cracked Google to get sensitive data - it's doing what it said it would do. From the quote, it would seem like people are abusing Google; instead, it's the webmasters who are abusing the users who entrusted them with sensitive data.
I would not say, though, that Google is making the webmasters' mistakes obvious. Google doesn't notify webmasters, "Hey, you're an idiot. Fix your site". Furthermore, if I'm a webmaster who thinks there might be some sensitive info from my site in Google, how do I use Google to find it? OK, I could figure out how to search Google for pages only from my site that contain "passwords" or something like that, but that's a bit much for a clueless webmaster to do. If he thought that might reveal a problem, he should know where to look without checking Google. I'm not faulting Google; it's not Google's responsibility to hit webmasters with the clue stick.
Unless McGraw's statement was taken hopelessly out of context (which is quite likely), he's an idiot. It's not Google's responsibility to think about security of other people's sites.
Re:Disagree With Gary McGraw by gemcigital · 2001-11-30 07:13 · Score: 1

I vehemently disagree with Gary McGraw too...and I *am* Gary McGraw.
I actually made a number of reasonably salient points during the long interview, but the reporter seems to have latched on to a twisted version of what I meant. Alas, this happens all the time. It's one of the classic risks of talking to the press!
Sorry to have sounded like a buffoon! We all know that security problems are really the fault of the telecom providers...right?!
gem
Gary McGraw

Buy Dell by Anonymous Coward · 2001-11-26 05:17 · Score: 0

I'm advocating the purchase of DELL computers, don't be tricked into buying any other brand.

Visit their website today !!!

standing naked in front of the window by eddy+the+lip · 2001-11-26 05:17 · Score: 3, Interesting

But other critics said Google bears its share of the blame.

"We have a problem, and that is that people don't design software to behave itself," said Gary McGraw, chief technology officer of software risk-management company Cigital, and author of a new book on writing secure software.

also known as ostrich security...if your s00p3r s3cr37 files are just lying around waiting for idle surfers, search engines are the least of your worries. if you don't know enough to protect your files (by, say, not linking to them, or .htaccess files, or encrypting them), it's not the search engine's fault. it's your own dumb ass.

this guy's just looking for free hype for his book. if that's the kind of advice he offers, he's doing more harm than good.

--

This is the voice of World Control. I bring you Peace.

Re:standing naked in front of the window by SirSlud · 2001-11-26 05:34 · Score: 2

I agree. I was going to say the same thing as the subject of your post .. should the census guy get charged with being a peeping tom if he comes up to your house while you're buck naked in front of an open living room window?

> .. "software to behave itself."

When asked to clarify further, Gary said, "Uh .. you know, like .. uh, C3PO .. and .. uh, Data." I hate idiots like that ... I suppose he thinks Windows should 'ask' you if you want to install viruses (cause heaven forbid a user should have to know anything about protecting their computer), and your hard drive should kindly suggest you upgrade a few days before it bites the dust? Yeah .. technology .. that's the problem .. we just haven't invented anything perfectly enough yet. Sigh .. grab the O'Reilly, it's time for a good old-fashioned CTO beating.

--
"Old man yells at systemd"
Re:standing naked in front of the window by nanojath · 2001-11-26 05:56 · Score: 1

It's sort of like the twisted logic behind the DMCA... copyright holders need to encrypt to protect their intellectual property... but their encryption schemes fail if faced with any serious attempt to circumvent them... so let's make circumvention tools illegal.

Locks fail so you make lockpicks illegal. Ah, but crowbars can still break the lock so let's make crowbars illegal. It's all stupid: you can lock the door of your home with a luggage lock, if a thief breaks it to get in, the cops may laugh at you but if the thief gets caught it's still "breaking and entering." You don't need to outlaw screwdrivers because they can break a "technical" lock.

Credit card fraud is illegal. There are a million ways to get numbers. If search engines find them by merely being thorough, then anyone that wants them will be able to find them that much more easily. Should dumpster manufacturers bear the onus of creating "dive proof" dumpsters? Maybe carbon paper manufacturers need to make self-destructing carbons so you can't fish old-fashioned credit card receipt carbons out of the trash.

--
It Is the Nature of Information to Transgress Artificial Boundaries
Re:standing naked in front of the window by SirSlud · 2001-11-26 07:28 · Score: 2

Yep. Well, unfortunately, we're still stuck in the rut of thinking technology solves social problems. Fraud is a social problem, but thinking we can create technology to prevent it (beyond reasonable measures that inconvenience no one but go a long way to prevent spur-of-the-moment offences and casual fraud) is a paradox. Creating technology that cannot be used for questionable purposes is impossible. People who study the interaction between social behaviour and technology know it's pretty much the other way around; technology changes and evolves human behaviour above and beyond how it was pre-adoption of a technology, but never ever causes humans to /stop/ doing something.

For instance, while the breathalyzer that may be installed on car ignitions in the future may prevent some drinking and driving, it is far more likely to change people's behaviours surrounding the issue of drunk driving - for instance, it may form a social pattern whereby sober friends help drunk friends start their cars. Engineers (including software engineers) are unable to divine how their innovations will be used (ie, nobody can predict the future beyond reasonable assumptions). Should we be putting the inventor of the aerosol whipped cream can in jail for getting all those high school students high ... ?

More poignant than the actual way the government goes about assigning blame for misused technology is the hypocrisy of it all. Considering the pace at which we are forced to invent and deploy to drive capitalism, these types of situations should be considered the cost of doing business the way it's currently done. Now we are stuck with an infrastructure that only a MINUTE percentage of the population actually understands, and the creators of that technology are the ones being blamed for its mismanagement and misuse. Sigh. I can only hope that in 30 years, the population will be more in tune with the limitations of computer and software technology, in the same way that people have enough of an understanding of cars now so as not to immediately lay blame on the manufacturer when someone drives at 180kph into their local river.

--
"Old man yells at systemd"
Re:standing naked in front of the window by eddy+the+lip · 2001-11-26 08:15 · Score: 1

Considering the pace at which we are forced to invent and deploy to drive capitalism, these types of situations should be considered the cost of doing business the way it's currently done.

very interesting point...consumer demand (and the way that demand is enshrined in law, for example, the obligation of publicly traded companies to do their best to maximize shareholder profit) really is driving us down a very dangerous path. It's not so much the unfettered advance of technology, but the way that advances are demanded and regulated by those with, at best, a shallow understanding of the technology itself.

inevitably, this leads to poor implementation and poor laws to govern our shiny creations. and when the Bad Thing happens, fingers are pointed at the creators, and legislators are looked to for salvation.

(insert rant here about our blame oriented culture...)

madness

--
This is the voice of World Control. I bring you Peace.
Re:standing naked in front of the window by nanojath · 2001-11-26 08:52 · Score: 1

> (insert rant here about our blame oriented culture...)

I'm not so annoyed by the need to blame (hey, if someone steals my credit info and causes me ten kinds of hassle I WANT somebody to blame) as the stupid assignment of blame, which I'm tempted to say is a side-effect of our litigation-crazy culture - if you're suing someone you want as many targets as possible to increase the restitution pool, if you're being sued you want to spread the blame around to reduce your own liability. In this case, it's easy to assign blame - a person committing Fraud is number one to blame, the people responsible for making your credit info so insecure take second place. I might even go so far as to say that, considering what it costs them and hence their shareholders, employees, and customers, the credit card companies could probably be more proactive in at least getting anyone that takes credit cards for commerce to agree to a minimum data protection protocol, and provide them with information on how to ensure that.

Who shouldn't be blamed is Google. Search engines make e-commerce and damn near everything else on the internet possible and practical. For Search Engines to be possible they cannot be burdened with being culpable for the data they uncover - it's just a quagmire that makes effective searching impossible. Hell, the businesses whose business it is to block specific content can only do so to a degree. The fact that Google will stay away from anything it's told to is all they need to provide. The failure to even restrict well-behaved crawlers demonstrates that the person putting the data out there has either no idea how the internet works at all or no concern for their customers' data. Either way they shouldn't be in e-commerce. If I ran the credit card companies (he ranted grumpily from his armchair) I would cancel the merchant's account of anyone compromising data that badly. It would save everybody but fraudulent bastards money in the long run.

--
It Is the Nature of Information to Transgress Artificial Boundaries
Re:standing naked in front of the window by SirSlud · 2001-11-26 09:09 · Score: 2

> cancel the merchants account of anyone compromising data that badly

Which goes back to my point about the public understanding of a technology. You can't do this right now, because in the public's eyes, webserver admin = smart guy, web crawler = invasive nameless/screenshotless technology. Therefore, public opinion, should push come to shove, would likely fall on the shoulders of Google. I would imagine, in this case, that Joe Shmoe likens Google seeing your credit card number to hooligans breaking into your house, not as the census taker who spots you naked through your curtainless bedroom window.

As He Who Owns And Runs What becomes clearer to the average joe (usually through sitcom jokes, cliches, movies, Journalists Finally Getting It (it usually happens eventually after years of a technology existing) etc), the consumer will know where the brunt of the blame lies. And while we have too much blame in our society, it does have its legitimate place (as in, publicly accepted accountability and a fair assessment of a failure), so effort to make sure the blame is going to the proper places, and making sure the support from the public is directed at the correct entities, will enable the accountability in situations to fall where it should; in this case, the users operating/admin'ing the webserver. But until that admin isn't your uncle or brother (hopefully, it'll be 'you' as being able to offer services becomes more accessible to users), and Google is seen as more of a census taker than a peeping tom, it's unlikely that the public at large will know who to pressure for administrative mismanagement of your sensitive data, and its subsequent accessibility via search engines.

--
"Old man yells at systemd"
Re:standing naked in front of the window by eddy+the+lip · 2001-11-26 10:54 · Score: 1

I'm not so annoyed by the need to blame (hey, if someone steals my credit info and causes me ten kinds of hassle I WANT somebody to blame) as the stupid assignment of blame, which I'm tempted to say is a side-effect of our litigation-crazy culture...

fair enough, and that's an important distinction - i'm getting my rants crossed (god knows, i've got enough of them).

i agree, rampant litigation is a key problem, and the way all that litigation is rewarded - there's no incentive not to sue someone, and you might get a buck out of it. it's probably also fair to say that we're generally not trained to think deeply about a problem. that is, we follow something to its first possible solution point (in this case, google) and then stop.

at the rate we're going, we'll be knee deep in the bodies of messengers in no time.

--
This is the voice of World Control. I bring you Peace.

regular expressions to the rescue by mkbz · 2001-11-26 05:20 · Score: 1

all major credit card numbers follow the same patterns - it would be very easy for google (or any other bot manufacturer), acting benevolently, to write code which recognizes these patterns, excludes them from results and possibly even emails the site admin (automatically? if their META properties are set correctly) to notify them of the security problem.
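
A rough sketch of what such a filter might look like, in Python. The assumptions are all mine: candidate numbers are 13 to 19 digits, optionally separated by spaces or hyphens, and the Luhn checksum is used to throw out most random digit runs. The names CANDIDATE, luhn_ok and find_card_like are made up for illustration, and nothing here says this is what google or anyone else actually runs.

```python
import re
from typing import List

# 13 to 19 digits, each optionally preceded by one space or hyphen.
CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_ok(digits: str) -> bool:
    """Luhn checksum: double every second digit counting from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_like(text: str) -> List[str]:
    """Return digit strings in text that look like card numbers."""
    hits = []
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_ok(digits):
            hits.append(digits)
    return hits
```

A crawler could call find_card_like() on each fetched page and drop (or flag) any page with hits. It will still produce false positives on things like long part numbers, so treat it as a first-pass filter only.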

--
www.pixelectric.com

Re:regular expressions to the rescue by 3am · 2001-11-26 05:39 · Score: 2

or these sloppy admins could store them in encrypted form and/or in a private directory....

i'm sure google knows of a dozen ways they can do this, but why should they? it isn't prohibitively hard to write a spider, and with a 160GB HD for $300, someone with not-so-pure motives and the equivalent of an undergrad education in CS could write one, send it out (ignoring robots.txt), do the reverse of that regex search to sniff out cc#'s online, and create a database full of beer money.

ie, (as has been mentioned n+1 times already) Google changing their behavior does nothing to fix the underlying problem of sysadmins that are undertrained and/or irresponsible.

--

A: None. The Universe spins the bulb, and the Zen master merely stays out of the way.

Hint, Hint. by A_Non_Moose · 2001-11-26 05:22 · Score: 2, Insightful

"The underlying issue is that the infrastructure of all these Web sites aren't protected."

Agreed. Such lax security via the use of Frontpage, IIS, .asp and VBS in webpages.
You might as well do an impression of Donkey in the movie Shrek "Ooo! Ooo! pick me! pick me!"

Webmasters queried about the search engine problem said precautions against overzealous search bots are of fundamental concern.

Uhh...they are "bots"...they don't think, they do.
Does the bot say "Oh, look, these guys did something stupid...let's tell them about it."

No, they search, they index and they generate reports.

I've seen this problem crop up before when a coworker was looking for something totally unrelated on google.
Sad part was it was an ISP I had respect for, despite moving from them to broadband.
What killed my respect was at the very top of the pages was "Generated by Frontpage Express"...gack!

I don't recall if it was a user account or one of their admin accounts...but for modem access I kind of stopped recommending them, or pointed out my observations.

I have to parrot, and agree, with the "Human Error" but add "Computer accelerated and amplified".

It happens, but that does not mean we have to like it, much less let it keep happening.

--
Have you read the moderator guidelines? Well, have you, PUNK? (and I want a Karma: Gnarly option)

In other news by Scrymarch · 2001-11-26 05:25 · Score: 0, Offtopic

Manufacturers Googmat today came under fire for their new transparent doormats.

An industry source who would not be named said "A number of our most important keys were compromised by this new Googmat feature. I realise they wanted to give exciting new options to their customers, but they should really give more careful thought to security before releasing a product like this."

good point... but if I may add by Stalcair · 2001-11-26 05:25 · Score: 1

HTTP bot writers should adhere to using...

and there is the problem. Many will indeed stick to the 'standards', after all why not make my robots and their actions more efficient by reading special 'for robots only' signs that give instructions and meta data. However, the real point is that many will not stick to the standards. Furthermore, the ones that spam you don't care a bit about your privacy, and will circumvent simple measures like a robots.txt to get to their prize. The most effective solution is for content providers to ensure they have adequate security and a good architecture. This comes from planning out what you want to do, and then finding how that can happen (which includes, how can this not happen and how can other things I never wanted to happen... happen) By reading up on topics related to these, whether through journals, magazines, books or online, a responsible content provider will treat his job seriously. It is sad that with all the technology and promise, that there is still such a large army of people that seem to think that web=html and no other. Furthermore, that since the web IS ONLY html, then let's only focus on presentation and cuteness. Functionality has taken a backseat and with that, many have effectively refused to seriously consider the integrity of their sites.

I for one, would like to see a complete list of those sites that are listing passwords, credit card numbers and other similar information. (By listing I am referring to those sites/admins that allow such information to be easily obtained). If I know this, then I will NEVER do business with them, or at least until they can prove that they realize their error (which includes the error of policy and mentality for ever letting that happen in the first place) and fix it. After all, this is yet another example of what kind of foolishness can happen when we fail to apply critical thought to our actions. Yet again, this is an example of an error in thinking that the web (and the services performed on it) are 'new ideas' when they are in fact just newer implementations.

Question to those who know the legal side of things... Hypothetically. If I find that a restaurant is leaving all of their receipts (with complete numbers, addresses, names and expiration dates) out in the open, perhaps in a special bin marked 'receipts and ledgers' that is left near the bar and is easily viewed by anyone who walks by there... and if in this hypothetical situation, I find that mine and others' credit card numbers are then used illegally for purchases, what would be the legal situation here? Would I be able to sue the company to recover the cost? (and ONLY the cost, I don't care about any 'litigation as an income' scum out there) Also, shouldn't credit card companies and the banks that provide them be responsive to such situations and police any irresponsible companies that do let such sensitive information loose?

Personally, I am getting rid of all credit cards (and debit cards as well) because of this irresponsibility. However, when making large purchases of computer equipment (well, anything really) it is nice to have the extra leveraging power of the credit card company behind you if the vendor tries any funny stuff. Please contribute with what you think... especially the legal issues

--

I seek not only to follow in the footsteps of the men of old, I seek the things they sought.

I love ROBOTS.txt by josh+crawley · 2001-11-26 05:25 · Score: 1

Well, Robots have a more sinister usage. The purpose is to keep 'bots' out, but it also tells ME where to snoop more. Cool.

OT: Cyberwar? by sting3r · 2001-11-26 05:27 · Score: 0, Offtopic

I have yet to see any evidence that a "cyberwar" is imminent or even possible. Realistically - how many critical systems are connected to the Internet? Sure, a determined enemy might be able to take out Amazon or Yahoo, but who cares? Most Internet businesses aren't making much money anyway, so who cares if bad security puts the final nail in their coffin?

And think about other systems too. Is the phone network on the Internet? One wouldn't think so, because there's no benefit in adding the extra layer of complexity. How about the power grid? Or water supplies? There is literally zero business need to make any of these systems Internet accessible, so why would it happen? The answer is that it wouldn't, but our leaders just want an excuse to stay hysterical and keep their ratings high.

-sting3r

fun by British · 2001-11-26 05:27 · Score: 2

Try doing a search of the file WinsockFTP leaves (WS_FTP.LOG?). You'll get hundreds upon hundreds of results, and you just might find unlinked files mentioned in it.

Of course, there's always good fun going into the /images/ directory (since virtual directories are on by default) on any angelfire user pages. Often you'll find images that the user didn't intend on the public to see.

Of course, there's the old fashioned way. If you see an image at http://www.whatever.com/boobie3.jpg, chances are there's a boobie1 and boobie2.jpg.

My fearless prediction by Anonymous Coward · 2001-11-26 05:28 · Score: 0

The moron in the IBM ads vs. the moron in the DELL ads? The IBM moron in the 3rd round.

funny? by 3am · 2001-11-26 05:30 · Score: 2

are you kidding?

they are talking about sensitive personal information - just don't store this online.

if you really need to access something (that isn't a credit card number... just don't do that!) and don't have physical access to the box, try SSH or at least make sure it's a secure directory (httpS://blah/mystuff...)

--

A: None. The Universe spins the bulb, and the Zen master merely stays out of the way.

Totally unrelated to Google by Sloppy · 2001-11-26 05:33 · Score: 2

If Google can find it, then a human with a web browser can find it. That's all there is to it. Have info you don't want to share? Then don't share it!

--
As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.

It's not Google's fault you're a dipshit. by Wakko+Warner · 2001-11-26 05:38 · Score: 1, Redundant

If you somehow manage to post your credit card info on the web, exactly whose fault is it? The only way it *can't* be your fault is if it's a poorly-constructed e-commerce site that leaks out that kind of info.

I just don't see what the big deal here is.

- A.P.

--
"Remember when the U.S. had a drug problem, and then we declared a War On Drugs, and now you can't buy drugs anymore?"

I thought this has already been solved by Anonymous Coward · 2001-11-26 05:39 · Score: 0

I thought a long time ago that things like this would have been cured by meta tags, such as "robot=nofollow" and "robot=noindex", along with deleting your CGI pages once they've been understood.

Google along with most search engines will take heed of these and not index the page, or follow any of its links, as required - unless all these webmasters are lazy gits using MS Word and generating all that crap.
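
For reference, the usual spelling of those tags is a single meta element named "robots" with a comma-separated content attribute, e.g. <meta name="robots" content="noindex, nofollow">. Here is a small sketch of how a well-behaved crawler might read it, using only Python's standard html.parser. The class and function names are hypothetical, and whether a given crawler honours the tag is up to that crawler.

```python
from html.parser import HTMLParser


class RobotsMeta(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        for part in (attributes.get("content") or "").split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def robots_directives(html: str) -> set:
    """Return the set of robots directives declared in an HTML page."""
    parser = RobotsMeta()
    parser.feed(html)
    return parser.directives
```

A polite indexer would skip storing the page when "noindex" is in the returned set and skip its links when "nofollow" is. A rude one just ignores the tag, which is the whole problem.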

I remember when men were men and used EMACS, and girls were girls and said ``how do I get ms-windows on this'' when using a Solaris...

-JCC

Which ones? by jonestor · 2001-11-26 05:42 · Score: 1

Can they tell us which ones so we won't shop there? That ought to fix that problem right on up.

Why are they blaming the search engines? by eison · 2001-11-26 05:46 · Score: 2

"But other critics said Google bears its share of the blame."
Why?!

Google is finding documents that any web browser could find. The fault belongs to the idiots who publicly posted sensitive documents in the first place. Why doesn't the article mention this anywhere? Garbage reporting if I've ever seen it.

--
is competition good, or is duplication of effort bad?

This is good by else...if · 2001-11-26 05:47 · Score: 1

Yes! Good! Google finds credit card numbers which are publicly available to anyone who can find them! It should do that.
If someone's credit card number is accessible on the internet then search engines should be finding it, because the baddies already can. Security through obscurity doesn't work; if someone's broadcasting credit card numbers over the internet, this will tell you who's doing it and what numbers are insecure. The next step is for MasterCard, Visa, et al. to start searching google for their credit card numbers and contacting people whose numbers are compromised (oh yeah, and cancelling the accounts of people stupid enough to do that).

No easy solution in sight? by cjpez · 2001-11-26 05:51 · Score: 1

What are you talking about? Of course there's an easy solution in sight. Don't put your credit card number on the web! Don't give your credit card number to a website that does! Come on, how hard IS that? If it's sensitive information, it doesn't belong on a publicly-available website.

--
Al Qaeda has ninjas!

Blaming Google for this... by night_flyer · 2001-11-26 05:52 · Score: 3, Funny

Is like blaming the Highway department for speeders...

--

Thanks to file sharing, I purchase more CDs
Thanks to the RIAA, I buy them used...

Might want to speak up about this by cjpez · 2001-11-26 05:59 · Score: 1

I contacted the author, Paul Festa (paulf@cnet.com), the generic "letters" section of CNet (letters@news.com), and a Google address that seemed like they might be interested (press@google.com) about this.

Not sure if those are all the completely correct addresses to use, but in the face of some blatant FUD, they'll probably do okay . . .

--
Al Qaeda has ninjas!

DNS -based mistakes, sometimes by Nick+Arnett · 2001-11-26 05:59 · Score: 1

At my last startup, before we launched one of our services (pseuds.org), we controlled access via DNS by pointing "www.pseuds.org" to a placeholder page. Testing showed that it was indeed secure. Unfortunately, nobody noticed that the DNS entry for "pseuds.org" pointed to the unannounced site, which led it to appear on several search engines before long. Since we hadn't linked to it from anywhere public, I'm guessing that at least some of the search engines use domain registration info to find starting points for their robots.

The real irony of this was that my co-founder ran the InterNIC for a while and the employees responsible were former employees of Network Solutions. Of course, some of you will not be a bit surprised by that...

a quick search... by cliveholloway · 2001-11-26 06:00 · Score: 1

for "password" under .xls, brings up a reasonably confidential kellogg's file at No 3.

Gosh, this is fun.

cLive ;-)

--
-- Trinity in high heels carrying a whip: The donimatrix - there is no spoonerism

comp.risks by coyote-san · 2001-11-26 06:00 · Score: 2

Many years ago on comp.risks somebody actually looked at the contents of a number of robots.txt files - he wondered if they could be used as a quick index into "interesting" files. At the time, erroneous use of the file was still pretty rare... but I'm sure that was a selection effect that is no longer valid.

Bottom line: that standard may be intended for one behavior (robots don't look in these directories), but there's absolutely nothing to prevent it from being used to support other behaviors (robots look in these directories first). If you don't want information indexed, don't put the content on your site. Or at a minimum, don't provide directory indexes and use non-obvious directory names.
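
To make the two behaviors concrete, here is a small sketch using Python's standard urllib.robotparser. The sample robots.txt, the bot name and the helper names are invented for illustration. The same file that keeps a polite robot out is a ready-made list of directories for an impolite one.

```python
from typing import List
from urllib.robotparser import RobotFileParser

# Made-up example file, not taken from any real site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /orders/
"""


def polite_may_fetch(robots_txt: str, agent: str, url: str) -> bool:
    """What a well-behaved robot does: ask before fetching."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def disallowed_paths(robots_txt: str) -> List[str]:
    """What a snoop does: read the Disallow lines as a to-do list."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if line.lower().startswith("disallow:"):
            value = line.split(":", 1)[1].strip()
            if value:
                paths.append(value)
    return paths
```

Nothing in the file format enforces anything. can_fetch() is a courtesy the client chooses to perform.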

--
For every complex problem there is an answer that is clear, simple, and wrong. -- H L Mencken

Re:comp.risks by jfunk · 2001-11-26 06:38 · Score: 2

Um, robots.txt should *not* be used for security reasons. That's just stupid.

It is best used to tell crawlers not to bother with pages that are simply useless to crawl. If I ran a site containing a text dictionary in one big html file, I should use robots.txt. If I had a script that just printed random words, I should disallow that too.
Re:comp.risks by coyote-san · 2001-11-26 09:33 · Score: 2

Um, we aren't disagreeing. Not one bit.

But just because we think that this file shouldn't be used for security purposes doesn't mean that some idiot won't come up with this "bright idea." Just because the spec is intended to list directories and files that a robot shouldn't index doesn't mean that someone won't write a robot that actively seeks them out.

--
For every complex problem there is an answer that is clear, simple, and wrong. -- H L Mencken

Directory searches by wytcld · 2001-11-26 06:03 · Score: 4, Insightful

Some search engines don't just check the pages linked from other pages on the server, but also look for other files in the subdirectories presented in links.

So if http://credit.com/ has a link to http://credit.com/signin/entry.html then these engines will also check http://credit.com/signin/ - which will, if directory indexes are on and there is no index.html page there, show all the files in the directory. In which case http://credit.com/signin/custlist.dat - your flatfile list including credit cards - gets indexed.

So if you're going to have directory indexing on (which there can be valid reasons for) you really need to create an empty index.html file as the very next step each time you set up a subdirectory, even if you only intend to link to files within it.
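
If you'd rather audit than remember, something like the following sketch could list the directories that still lack an index page. It assumes the server treats `index.html` or `index.htm` as the default page; the function name and defaults are mine, so adjust them to your own setup.

```python
import os


def dirs_missing_index(doc_root: str, index_names=("index.html", "index.htm")) -> list[str]:
    """List directories under doc_root (relative paths) that have no index page."""
    missing = []
    for current_dir, _subdirs, files in os.walk(doc_root):
        if not any(name in files for name in index_names):
            missing.append(os.path.relpath(current_dir, doc_root))
    return sorted(missing)
```

Run it against the document root after adding a subdirectory; anything it reports would show a full listing if directory indexes are on.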

--
"with their freedom lost all virtue lose" - Milton

Re:Directory searches by epsalon · 2001-11-26 11:11 · Score: 2

Better still, you can simply make all dirs in your webserver except those to be indexed not world-readable, and NEVER put secret data in your public web area anyway.

--

Make even shorter URLs - 8LN.org
Re:Directory searches by Monkeyman334 · 2001-11-26 12:00 · Score: 1

If you're posting to a CGI script that checks custlist.dat, then Apache doesn't need to be able to serve it. The CGI usually runs as the Apache user, though, so you probably couldn't change the file permissions to hide it from that user. But you can either put the user list above the document root so Apache can't serve it, or add an .htaccess that denies all access to that file. The CGI will still be able to read it.

Checklist for HTTP Distribution of Sensitive Data by Gleef · 2001-11-26 06:03 · Score: 3, Informative

First, determine if you really need to distribute this via HTTP. It is far easier to secure other protocols (eg scp), so if there's another way of doing this, do it.

Second, if the sensitive information is going to a select few people, consider PGP encrypting the data, and only putting the encrypted version online. Doing this makes many of the HTTP security issues less critical.

Assuming you still have to put something sensitive online, make sure of the following:

Only use HTTPS, never use just plain HTTP.
Use CGI, Java Servlets, or some other server-side program technology to password-protect the site. I will refer to the resulting program(s) as the security program.
Never accept a password from a GET request, only accept them from POST requests.
Never make the user list or password list visible from the internet, not even an encrypted password list.
Never place the sensitive information in a directory the web server software knows how to access. Only the security program should know how to find the info.
Review all documentation for your web server software and the platform used for the security program. Pay special attention to security issues, make sure you aren't inadvertently opening up holes. Keep current, do this at minimum four times a year.
Subscribe to any security mailing lists for your web server's operating system, your web server software, and the programming platform you used for the security program. If there is anything else running on this machine, subscribe to their security mailing lists too.
Subscribe to cert-advisory and BugTraq. Read in detail all the messages that are relevant to your setup. Review your setup after each relevant message.
Don't use IIS.
Don't use Windows 95/98/Me. Don't use Windows XP Home Edition.
Don't use any version of MacOS before OS X.
Don't use website hosting services for sensitive information.
Never connect to this webserver using telnet, ftp or FrontPage. SSH is your friend.
Never have Front Page Extensions (or its clones or workalikes) installed on a webserver with sensitive data.
If there is anything above that you don't understand, or if you can't afford the time for any of the above, hire a professional with security experience and recommendations from people you trust who have used his or her services. It's bad enough that amateurs are running webservers, much less running ecommerce sites and other sites with sensitive data.

The above is an incomplete list. It is primarily there to start giving people an idea of how much effort they should expect to put into a properly administered secure website with sensitive information. Do you really need to distribute this via a web browser?
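
To illustrate just one item, the GET/POST rule: passwords in a URL tend to end up in server logs, browser history and Referer headers. Below is a minimal sketch of the check the security program might make. The function name and the `password` field name are assumptions for the example, not part of any particular framework.

```python
from urllib.parse import parse_qs


def extract_password(method: str, query_string: str, body: str):
    """Return the submitted password, or None if the request must be rejected.

    Assumes a form field called "password" (hypothetical) sent as
    application/x-www-form-urlencoded.
    """
    # Refuse anything that put a password in the URL, whatever the method.
    if "password" in parse_qs(query_string):
        return None
    if method.upper() != "POST":
        return None
    values = parse_qs(body).get("password")
    return values[0] if values else None
```

This covers only where the password travels; it still needs HTTPS underneath and a safely stored password list behind it.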

--

----
Open mind, insert foot.

Nice work, Legion303. by wideangle · 2001-11-26 06:04 · Score: 1

"There is no limit to what can be accomplished if you don't mind who gets the credit." --Ronald Reagan

Re:Nice work, Legion303. by well_jung · 2001-11-26 08:42 · Score: 3, Funny

"Trees cause more pollution than automobiles do." --Ronald Reagan '81

--
Carl G. Jung
--
"With one breath, with one flow, You will know Synchronicity" -La Policia
Re:Nice work, Legion303. by Legion303 · 2001-11-26 11:49 · Score: 1

"I like fence soup." --Ronald Reagan, '99
(Sorry, couldn't resist.)
-Legion

MicroSoft Passport Credit Card # available by peter303 · 2001-11-26 06:07 · Score: 2, Interesting

The new issue of "2600" all but gives a kiddie script for extracting credit card numbers from the Passport database. Scary. Don't buy anything through it until they fix it.

Re:MicroSoft Passport Credit Card # available by PaperTie · 2001-11-26 06:33 · Score: 2, Informative

Actually not. The article simply discussed how the Passport system uses cookies to store users' information and how you could possibly get the cookies from a user that still has them. It doesn't detail anything about accessing some magical database, nor does it mention credit cards.

I'll test your site! by JMZero · 2001-11-26 06:09 · Score: 1

I'm going to start offering a free service. Just send me your credit card number, and I'll make sure it's not being used maliciously.

Don't worry about my expenses. I'll cover them somehow. After all, the net is full of "Good deals".

--
Let's not stir that bag of worms...

Are crawlers only using links? by n-baxley · 2001-11-26 06:11 · Score: 2

I haven't looked into how the new crawlers are working. I assume that they still follow links from page to page, but are there new types of crawlers that could be searching the directory structures of a site? Not that this excuses the webmasters, but it might explain some of the new search results.

--

THIS SPACE FOR RENT

Blissful ignorance backfires again. by hkmwbz · 2001-11-26 06:12 · Score: 3, Interesting

That a search engine is able to harvest this kind of data just proves that some people don't know what they are doing. Forgive me if I seem judgmental, but these people are probably the same people who think Windows XP is the next step and that IE is the only browser in the world. But as is proven again and again, ignorance backfires. Not only are they attacked by viruses and worms and have all backdoors and security holes exploited - they are ignorant enough to leave users' data in the open, for everyone to get.

Google's comment was:

"The primary burden falls to the people who are incorrectly exposing this information."

This is where they should have stopped. Those who find their credit card information in a search engine will learn a lesson and use services that actually take care of their customers' security and privacy. Google shouldn't have to clean up incompetent people's mess.

In the long run, these things can only lead to the ignorant (wannabe?) players in the market slowly dying because they don't know what they are doing.

I personally hope someone gets a taste of reality here, and that only the serious players survive. The MCSE crowd may finally learn that there's more to it than blind trust in their own (lacking) ability.

--
Clever signature text goes here.

How it's done in the real world by Brainless · 2001-11-26 06:12 · Score: 1

You have to remember, how it SHOULD be done and how it IS done are two completely different things. Larger sites such as Amazon and Barnes n Noble may have elaborate systems set up, but for the average small time e-commerce site security is normally fairly lax. I have worked for companies that put un-encrypted credit card numbers in the database and rely on database security to keep hackers out. Granted the machine may be behind a firewall to block netbios/trojans/etc but when you open the ports to do database administration remotely, you're just asking for trouble. None of the companies have had any problems that I am aware of, but it's a timebomb waiting to happen.

Re:How it's done in the real world by gazbo · 2001-11-26 06:34 · Score: 1

There is no excuse for such lax security though. It is forgivable not to have a cast iron security system, as the only truly secure system is, of course, never to deal with CCs etc over the Internet anyway.

Our company holds CC numbers, and has no mechanism for displaying them to anybody over HTTP. Furthermore (and at no real cost in terms of difficulty) we GPG the numbers immediately as they are received, and keep the private key on a machine that runs no http/ftp/DB services (or similar), and is behind a firewall blocking external access.

I'd have thought that encryption of this sort is minimum - you can't rely on a single server not being cracked.

Of course, if the machine was compromised, the DB password discovered, the public key discovered, and the encrypted CC numbers downloaded, it is true that *eventually* they could be cracked. But as I said at the start, you've got to draw the line somewhere.

The proof! by Anonymous Coward · 2001-11-26 06:13 · Score: 0

Check out this link...
http://www.google.com/search?q=cache:KpZEOi1W8rA:www-cgsc.army.mil/nrs/CGSOC/course_material_sy00/S510/lsnguide/lg6.doc+%22area+51%22+filetype:doc+site:.mil&hl=en

and read question 23! The proof is there! Bwahaha.

Gary McGraw, super-genius. ;) by bacchusrx · 2001-11-26 06:14 · Score: 2

"The guys at Google thought, 'How cool that we can offer this to our users,' without thinking about security. If you want to do this right, you have to think about security from the beginning and have a very solid approach to software design and software development that is based on what bad guys might possibly do to cause your program grief." - Gary McGraw (quoted in the CNet article).

*blinks*

Well, actually, Gary, it seems to me that it isn't Google that's been caused any grief here, but those webmasters who didn't "think about security from the beginning." In fact, it looks like Google runs a pretty tight ship.

This is the kind of guy who blames incidents.org for his web server getting hacked. After all, they weren't thinking about security from the beginning, were they?

Riight.

BRx ;)

--
Life after capitalism? The participatory economics project

Terrorist by Anonymous Coward · 2001-11-26 06:21 · Score: 0

Yes you are, Mr. Bin Laden wants to arrange a meeting with you tomorrow

Accidental or not... by Jordan+Block · 2001-11-26 06:25 · Score: 1

This is still a MAJOR screwup on the part of the admins and/or the coders!

So what if it's accidental, does that make the CC#'s any less real?

Re:Accidental or not... by Joe+Decker · 2001-11-26 06:34 · Score: 1

> This is still a MAJOR screwup on the part of the admins and/or the coders!
> So what if it's accidental, does that make the CC#'s any less real?

Of course not. And yes, it is a MAJOR screwup.

--
I'm a nature photographer.

Re:How can this happen? Apache bug exploited... by pojo · 2001-11-26 06:32 · Score: 1

This was also brought up on bugtraq a while back. I paid special attention because I had a text file on a web site that was indexed, but a google search of "link:<my text file>" produced no results.

On some servers, if you make a query for http://<server>/<path>/?M=A or http://<server>/<path>/?S=D you will get a directory listing instead of the default page. This is a result of FancyIndexing in Apache and can be disabled through methods detailed in the Bugtraq discussion.

The originator of the discussion pointed out that his log files had get requests from Google that specifically looked for these directory listings, so it's pretty clear they were (are?) doing it intentionally.
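
If you want to check your own logs for the same thing, a sketch like this could pull out requests that ask a directory for a sorted listing. It assumes Common Log Format and the old single-letter sort keys; `?M=A` and `?S=D` are the ones mentioned above, and I am assuming `N` and `D` are the name and description keys of the same scheme. The sample log lines are invented.

```python
import re

# Matches e.g. "GET /signin/?M=A HTTP/1.0" inside a Common Log Format line.
SORT_QUERY = re.compile(r'"(?:GET|HEAD) (\S*/)\?([NMSD])=([AD]) HTTP/')


def index_probe_paths(log_lines):
    """Return the directory paths that were requested with a sort query."""
    hits = []
    for line in log_lines:
        match = SORT_QUERY.search(line)
        if match:
            hits.append(match.group(1))
    return hits


# Invented sample lines for illustration.
SAMPLE_LOG = [
    '192.0.2.10 - - [26/Nov/2001:06:32:00 +0000] "GET /signin/?M=A HTTP/1.0" 200 1234',
    '192.0.2.10 - - [26/Nov/2001:06:32:05 +0000] "GET /index.html HTTP/1.0" 200 512',
    '192.0.2.11 - - [26/Nov/2001:06:33:00 +0000] "GET /files/?S=D HTTP/1.0" 200 2048',
]

print(index_probe_paths(SAMPLE_LOG))
```

A 200 response on any of those paths suggests the listing is being served.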

For those who must use IIS by JMZero · 2001-11-26 06:33 · Score: 3, Informative

I agree with all of your assertions, except

"Don't use IIS."

This just isn't an option for a lot of people. I would change this to:

"If you use IIS, you need to make sure you check BugTraq/cert EVERY day."

I would also add:

"If you use IIS with COM components via ASP, make sure the DLL's are not in a publicly accessible directory."

This happens a lot, and makes DLL's lots easier to break.

--
Let's not stir that bag of worms...

Another good search... by Anonymous Coward · 2001-11-26 06:35 · Score: 0

is for "mck-cgi". Then goto url and append:

/conf/merchant_conf

Then get hold of the cybercash API and have fun ;-)

eg

Different file types make my day by srichman · 2001-11-26 06:39 · Score: 3, Interesting

The big complaint of the article is that Google is searching for new types of files, instead of HTML.

The only people who complain about this are obviously the folks using crossed fingers for security. The rest of us love that Google indexes different file types.

I'll never forget the day I first saw a .pdf in Google search result. Not that long ago I saw my first .ps.gz in a search result. I mean, how dope is that!? They're ungzipping the file, and then parsing the postscript! Soon they'll start uniso-ing images, untarring files, unrpming packages, .... You'll be able to search for text and have it found inside the README in an rpm in a Red Hat ISO.

Can't wait until images.google.com starts doing OCR on the pix they index...

Pushing stuff around by Helmholtz · 2001-11-26 06:44 · Score: 2

"The guys at Google thought, 'How cool that we can offer this to our users' without thinking about security..."

Interesting that this is being pushed off onto Google. I think a more appropriate phrase would be "The guys at [...] thought, 'How cool that this website is so easy to set up' without thinking about security...."

--
RFC2119

Jeesh by Tom7 · 2001-11-26 06:47 · Score: 1

> Or you could just suck.

Look, I was just explaining how it happens. I point out ways to avoid this in a later paragraph. Let's direct that anger towards something more productive, ok?

Re:Jeesh by Anonymous Coward · 2001-11-26 07:10 · Score: 0

> Look, I was just explaining how it happens.

For the official record, Tom7 does not suck, as far as I know. I meant "you" in the sense of "the idiot who would put confidential info on a public web site".

out of the loop by winse · 2001-11-26 06:50 · Score: 1

Could someone tell me what the hell a PHB is. I thought I used to think I was smart.

--
this sig is deprecated

Re:out of the loop by CraigoFL · 2001-11-26 07:06 · Score: 2

"Pointy Haired Boss". Go grab a book of Dilbert comics if you don't happen to know what that is.
Re:out of the loop by CraigoFL · 2001-11-26 07:10 · Score: 2

Dammit, my reply to your message wasn't. Sorry. Check it out here: http://slashdot.org/comments.pl?sid=24151&cid=2614769

Weakest Link of Security: Human by Pollux · 2001-11-26 06:51 · Score: 2

Alright, so I admit, I was a little curious about how dumb people are with their passwords so I tried the search. It's simply amazing how careless people are with their security...

Here was a simple document found using the exact search method listed above that is just the Minutes from some board meeting. In it, they actually LISTED a website to log into as well as the password required to get in! Right in the minutes! The website is no longer available, so I'll actually post the text from the minutes...

Minutes of the Gulliver Meeting at Carlton Library 17.8.01

...

7. Assessment of database products

B_ spoke briefly about the online tool that is an outcome of work done at Monash University for Libraries online, he will make the URL available so that evaluation of the usefulness of the tool may commence.

The tool is at http://130.194.38.42
Password admin

Talk about careless. Even if you're positive that the minutes document won't be posted on the web, you certainly don't go and actually write it onto something that will be distributed to the public! A hard copy (aka paper) of access to a server is just as dangerous as it being stored online.

The problem is that people don't realize that it's not safe to distribute private information through ANY public medium.

McGraw responds by Anonymous Coward · 2001-11-26 06:52 · Score: 1, Informative

I dropped a note on his comments

"We have a problem, and that is that people don't design software to behave itself.. etc.."

Me(typoes and all)- You honestly believe that a crawler that finds a private page is responsible fro exposing private info?

Seriously? Cmon, Under 0 circumstances should my CC information be available to anyone visiting a website, if it is, the owner of that site should be criminally liable.

The response -

Hi Sean,

I agree. I actually made that point too, but the reporter chose to focus on other things I said.

gem

[OT] Re:out of the loop by Scooby+Snacks · 2001-11-26 06:56 · Score: 1

PHB means "Pointy-Haired Boss". Popularized by Dilbert.

--

--
Runnin' around, robbin' banks all whacked on the Scooby Snacks...

It's not used for security (if you're smart) by sideshow · 2001-11-26 07:02 · Score: 1

I had a client who for some reason got their 404 page the highest rank on Google. So if you typed their company name into the search engine, the 404 page was number one, so you can imagine the problems that caused. So I put their error pages in the robots.txt and everything worked out.

--

Hollow words will burn and hollow men will burn.

Re:Checklist for HTTP Distribution of Sensitive Da by J'raxis · 2001-11-26 07:02 · Score: 1

Don't use any version of MacOS before Mac OS X? MacOS was pretty secure, more so than OS X. It is rather difficult to crack MacOS remotely, considering that there is no command line and it comes with no services installed (let alone enabled).

--
Liberty in your lifetime

Radio Station Contests by Anonymous Coward · 2001-11-26 07:04 · Score: 0

A few months ago I was looking up the email address of a friend. I knew that he occasionally posted messages with his email address, so I figured I'd give google a shot. After plugging in his email, I found a link to a text file of names, phone numbers, addresses, ages, and comments for a local Miami radio station. I sent an email to the web administrator there -- he was rude enough to suggest that I'd found the page using illicit means. So let's blame google rather than some incompetent web administrator.

Re:Not as effective by Anonymous Coward · 2001-11-26 07:07 · Score: 0

"Push the plunger and meet Allah" only works on the Arabs.

What about those Irish terrorists, assmouth? Don't they blow themselves up for jesus?

An interesting spin... by Xepherys2 · 2001-11-26 07:08 · Score: 1

Alright, before you go flaming me for thinking of this, posting this or whatever, just realize that I belong to the school of thought that says, "If you can think of something malicious, someone else can think of it too, best to be prepared"

What would be the option to detect and protect yourself from a virus that did this:

1) Gets on your webserver through whatever means.

2) Grabs your password files (either *nix or Win*) and creates a text file with a strange, multi-character and fairly unique name that stores that data, then adds a robots.txt file in your root to allow searches for that file.

3) Now the attacker can do a search on google for that file name and likely come up with few matches that are NOT what s/he is looking for.

*shrug* Just a thought
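
One possible answer to my own question, as a rough sketch only: keep a snapshot of file hashes for the web root and compare it later, so a new oddly named file or a changed robots.txt stands out. The function names are made up, and this assumes the attacker hasn't also tampered with the stored snapshot.

```python
import hashlib
import os


def snapshot(doc_root: str) -> dict[str, str]:
    """Map each file's path (relative to doc_root) to the SHA-256 of its contents."""
    digests = {}
    for current_dir, _subdirs, files in os.walk(doc_root):
        for name in files:
            path = os.path.join(current_dir, name)
            with open(path, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            digests[os.path.relpath(path, doc_root)] = digest
    return digests


def diff_snapshots(before: dict[str, str], after: dict[str, str]):
    """Return (added, changed, removed) file lists between two snapshots."""
    added = sorted(set(after) - set(before))
    removed = sorted(set(before) - set(after))
    changed = sorted(p for p in before.keys() & after.keys() if before[p] != after[p])
    return added, changed, removed
```

Store the first snapshot somewhere the web server can't write to, then diff on a schedule.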

Xeph

16 by Anonymous Coward · 2001-11-26 07:16 · Score: 0

>>>"You could check any 10 digit number (and expdate with a Luhn check if available) but with all the different formatting done on CC numbers (XXXX-XXXX-XXXX-XXXX, XXXXXXXXXXXXXXXX, etc) the algorithm could get ugly to maintain. "

Especially since credit card numbers are 16 digits long...

4+4+4+4 = 16
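
For what it's worth, the formatting part may not be that ugly. Here is a minimal sketch that finds 16-digit runs written as four groups of four, with optional spaces or hyphens, and normalizes them. The pattern is my own assumption about what a filter might look for; it ignores card types with other lengths.

```python
import re

# Four groups of four digits, optionally separated by a single space or hyphen.
CARD_LIKE = re.compile(r"\b(?:\d{4}[- ]?){3}\d{4}\b")


def card_like_numbers(text: str) -> list[str]:
    """Return every 16-digit card-shaped number in text, with separators removed."""
    return [re.sub(r"[- ]", "", match.group(0)) for match in CARD_LIKE.finditer(text)]
```

Anything it finds is only card-shaped; it says nothing about whether the number is real.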

Re:16 by Xerithane · 2001-11-26 07:44 · Score: 1

I meant 16, but the mod(10) popped into my head.

Oops.

--
Dacels Jewelers can't be trusted.

Password search by azaroth42 · 2001-11-26 07:19 · Score: 3, Interesting

Or for more fun, do a search like

filetype:htpasswd htpasswd

Scary how many .htpasswd files come up.

-- Azaroth

Re:Password search by Anonymous Coward · 2001-11-26 11:56 · Score: 0

I tried this and searched for "filetype:htpasswd htpasswd" on google... 7th one down was www.netstores.com That really instilled my faith in e-commerce.
Re:Password search by Anonymous Coward · 2001-11-26 16:05 · Score: 0

Yeah, that's pretty good but on the last page we can see www.oc.nps.navy.mil. Now that is funny.

No kidding... by Anonymous Coward · 2001-11-26 07:24 · Score: 0

I mean like when you put data onto a publicly accessible webserver.... well, that's exactly what you're doing. Duh!

There's the problem: by Anonymous Coward · 2001-11-26 07:32 · Score: 0

I just accidentally faxed everything in my wallet to everyone in Michigan. Please, everyone turn off your fax machines!

Gary McGraw's new book... by nagora · 2001-11-26 07:35 · Score: 0, Offtopic

will be really good; his grasp of security is really deep. I mean, he can spell it and everything!

TWW

--
"Encyclopedia" is to "Wikipedia" what "Library" is to "Some people at a bus stop"

Let's create some real incentive. by Ludwig668 · 2001-11-26 07:49 · Score: 1

Seems to me that credit card companies need to start suing websites which jeopardize the integrity of credit card numbers-- that until online merchants feel a real monetary incentive to get their security right, there will be no incentive at all. Courts, credit card companies, and insurance companies are the ones who really need to step up to the plate here. I firmly believe that a big part of the failure of online commerce has to do with the negative press which these compromises create...

Check your own mouth too by fizbin · 2001-11-26 07:49 · Score: 2

And "Interesting" posts should know what they're saying, but one rarely gets everything one wants.

The point: the poster is implying that there's some mismatch between looking the password up in a mysql database and doing HTTP/1.0 Basic Authentication. There isn't - the phrase "HTTP/1.0 Basic Authentication" refers to how the password is sent over the wire. The server can look up the password by carrier pigeon for all that that matters.

It's true that the standard Apache password mechanisms look things up in flat files and not a mysql database, but that's not what the poster said.
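
To show what "over the wire" means here, a small sketch: Basic Authentication just puts base64 of `username:password` in the `Authorization` header. The helper names below are mine; what the server does with the decoded pair (flat file, mysql, pigeon) is a separate matter.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build the Authorization header value a browser would send."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def parse_basic_auth(header: str):
    """Decode an Authorization header value; return (username, password) or None."""
    scheme, _, token = header.partition(" ")
    if scheme.lower() != "basic":
        return None
    username, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return username, password


print(basic_auth_header("Aladdin", "open sesame"))
```

Since base64 is an encoding and not encryption, anyone watching a plain HTTP connection can read the password.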

Re:Check your own mouth too by Crewd · 2001-11-26 08:08 · Score: 1

Touche...

it is quite clear, even here on slashdot by Bender+Unit+22 · 2001-11-26 07:53 · Score: 1

it is quite clear, even here on slashdot. Just take a look at how many people are posting messages as "Anonymous Coward". That poor guy must have left his login somewhere.

ahrm.

First rule of security by maxxon · 2001-11-26 08:02 · Score: 1

If something is sensitive, it shouldn't be publicly available to everyone through HTTP. Search engines aren't causing security problems, they're exposing them.

--
max

oldies but goodies by jahjeremy · 2001-11-26 08:05 · Score: 0, Offtopic

I'm surprised that Jules Verne and H.G. Wells have not been mentioned. They are probably the two most influential early science fiction writers.

H.G....

Island of Doctor Moreau: Predicts genetic engineering. ;)

War of the Worlds: Aliens, flying saucers and the like...not exactly a prediction, but it does cover some modern "interests," a la X-Files and, uh, Battlefield Earth (?).

Time Machine: Again, not a prediction but a current concern of many modern minds such as Stephen Hawking and popular culture like Timecop, Back to the Future and Quantum Leap.

The World Set Free: Predicted the nuclear bomb and the resulting arms race and stalemate.

Jules...

20,000 Leagues: Deep-sea submersibles!

Around the World in Eighty Days: Rapid transmit..hehe.

From the Earth to the Moon: Space travel.

Paris in the Twentieth Century: Never read it, but I heard some of the predictions are quite accurate.

DMCA by C. · 2001-11-26 08:09 · Score: 2, Interesting

> You should be writing that type of data on the backs of envelopes and leaving them scattered around your living room...

Not much worse than some "commercial-grade" encryption...

Maybe somebody should consider suing Google under the DMCA. I haven't studied the DMCA with enough detail to be sure of this (and much less studied law, for that matter), but I guess Google is easily guilty of the following "crimes" against modern society:
- linking to decryption algorithms
- linking to reverse engineering tools
- linking to passwords that could be used to circumvent somebody's copyright.
- storing and distributing all the above (with google's cache)

As I understand current legislation, Google should not even have the right to define what is public or not like they're trying to do. Even the safe-harbour provisions do not immunize them from having to remove unlawful content.

Such a lawsuit would make for an interesting debate, and with a bit of luck could get us all rid of this stupid law.

C.

--
C.

Re:DMCA by klykken · 2001-11-26 10:19 · Score: 1

> Such a lawsuit would make for an interesting
> debate, and with a bit of luck could get us all
> rid of this stupid law.

Either that, or it'll rid us of Google itself, depending on the outcome.

.

--
Looks like a fish, drives like a fish, steers like a cow.
Re:DMCA by fishebulb · 2001-11-26 10:41 · Score: 1

if you have a sign in your house window, and someone takes a picture of it, you can't sue that person when you find out your sign had stuff it shouldn't have
Re:DMCA by C. · 2001-11-26 13:22 · Score: 1

We're going to lose much more than Google alone if the DMCA stays...

Without reverse engineering tools being legal, today there would (almost) only be IBM-brand computers running IBM-supplied software, Sun-brand computer running Sun-supplied software, and Apple-brand computers running (erm...) everybody's software (but doing only what Apple wants us to).

The world would be a lot different...

C.

--
C.
Re:DMCA by Anonymous Coward · 2001-11-26 18:41 · Score: 0

aye..

but if you get a copyrighted/illegal .mp3 from a website (or other verboten materials), then provide access to those materials from your own site, well, that's quite verboten as well...

It's not the number, it's the matching name by Killio · 2001-11-26 08:10 · Score: 0

Anyone can generate a credit card number very easily in 2 or 3 minutes, starting with random numbers and putting it through the LUHN scheme used to verify numbers. So you can easily come up with a working credit card number - but it's the matching name that's important.
Hell, you could shluck any old (LUHN-passing) CC number into a site, with a random name from the telephone book and it would verify nine times out of ten...
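
For reference, the Luhn check itself is only a few lines. This is a sketch of the standard checksum, nothing specific to any card issuer or merchant.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in number pass the Luhn checksum.

    Non-digit characters such as spaces and hyphens are ignored.
    """
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0
```

It only catches typos: passing it says nothing about whether an account exists, which is why a site that checks nothing else is in trouble.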

[nosearch] by Anonymous Coward · 2001-11-26 08:15 · Score: 0

We just need a tag that tells the search engine not to include the page in its output.
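
For HTML pages something close to this already exists: the robots meta tag, e.g. `<meta name="robots" content="noindex">`. Below is a minimal sketch of how a crawler might honor it, using Python's standard `html.parser`; the class and function names are made up. It only helps with crawlers that choose to obey it, and it doesn't cover non-HTML files like .xls or .doc.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from any <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def may_index(html: str) -> bool:
    """True unless the page carries a robots meta tag containing noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" not in parser.directives
```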

Maintains flying saucers and keeps alien bodies... by Futurepower(tm) · 2001-11-26 08:18 · Score: 2

QUESTION 23: What national-level intelligence assets are available to you, the warfighter?

ANSWER: Area 51 -- Maintains flying saucers and keeps alien bodies in the freezer.

Okay, how did you do that?

(If you read Slashdot enough, sooner or later you see everything.)

--
Bush's education improvements were

Easy fix for this... by bytes256 · 2001-11-26 08:21 · Score: 1

The easiest way to fix this...and it works for nearly every webserver... is to put a dummy index.htm file in every directory. It doesn't have to say anything important at all. It just has to prevent the server from displaying a directory listing

--

Slashdot, the site where everything's made up and the points don't matter

Check out Question 28, also: by Futurepower(tm) · 2001-11-26 08:22 · Score: 2

If you throw a cat out the window of a car, does it become kitty litter?

--
Bush's education improvements were

Internet Archive by IdocsMiko · 2001-11-26 08:28 · Score: 1

An interesting issue indeed. To push the question even further, what issues does this raise for the Internet Archive? Can they be held responsible for publishing a web page as it was last year? If the web page contains slander, is IA somehow responsible for taking down that page from their archive?

One of the ways the web is really different from printed media is that pages have an implication of currentness and ownership, even when they say otherwise. Unlike a newspaper publisher that A) cannot possibly recall all issues of a screwed up edition and B) no longer owns the physical copies of the paper that have been sold to the public, webmasters can always take down a page, and in fact must continue to provide the page in order for it to be available. That proactive providing of the page might to some people imply that archivers of any sort are responsible for the content.

Re:Weakest Link of Security: Human by Anonymous Coward · 2001-11-26 08:40 · Score: 0

I have a story similar to that. We were setting up a polling site for our organization and had slapped a quick HTTP login/password auth on the results. The only reason they had that was to keep people from playing with the input to see what happened on the output stage. The idea was to pull the passwords when the input stage was done and let everyone view the results.

Then they had a board meeting, and one of the directors gave a presentation using our results. A reporter from the local paper was paying attention, and sure enough, the next day in the paper:

The results are at (URL), which requires a login and password, both of which are "blah".

When I saw that the next morning during breakfast, I came close to spewing my cereal across the room.

The moral of the story is: temporary things never intended for public viewing frequently leak when random twits are involved.

But, that's nothing. Download the entire document. by Futurepower(tm) · 2001-11-26 08:41 · Score: 2

Download the entire document from the U.S. military web site: lg6.doc

U.S. ARMY COMMAND AND GENERAL STAFF COLLEGE

S 510/0 Strategic, Operational and Joint Environments

Lesson Guide for Lesson 6
National and Theater Command and Control

Third bullet under question 28: "If you throw a cat out the window of a car, does it become kitty litter?"

Hey, military commanders, don't be mis-treating cats!!!

How U.S. government policy contributed to terrorism: What should be the Response to Violence?

--
Bush's education improvements were

More Fun with Google by davy_wavy_42 · 2001-11-26 08:45 · Score: 1

try searching Google for the first eight numbers (or fewer) of your credit card... these are the numbers that indicate the bank, and therefore can be quite common... ex: XXXX XXXX (space in between)...

i found a page with transaction data from some small web merchant. it utilized the unfathomably secure method of Black Text on Black Background. something any Neo who's surfing Source Code could pick up.

--

Re:More Fun with Google by Anonymous Coward · 2001-11-26 09:19 · Score: 0

Wow, thanks, man!

I found this using your idea!
Re:More Fun with Google by plover · 2001-11-26 10:26 · Score: 2

At least some of those numbers are fakes. I believe they're the postings of people falling for the "order me some stuff" gag web page. (and I do mean "gag".)
This is certainly across the borderline of unethical and nearing the boundaries of "illegal."
John

--
John
Re:More Fun with Google by garbuck · 2001-11-26 12:25 · Score: 1

If you post your credit card on a sign in your front window, it's perfectly legal for any window browsing passerby to copy it down or photograph it or write a letter to the editor about it. However, it's still illegal for them to buy stuff with it.
Re:More Fun with Google by hawk · 2001-11-26 12:42 · Score: 2

> i found a page with transaction data from some small web merchant. it
> utilized the unfathomably secure method of Black Text on Black
> Background. something any Neo who's surfing Source Code could pick up.

Not to mention what pages look like under Lynx . . .

hawk, lynx user

"Internet worms"? by O2n · 2001-11-26 08:52 · Score: 1

From the article: Recent Internet worms such as Code Red and Nimda prove that massive, automated hacking exploits have no need of search engines to find vulnerable computers.

Internet worms? "Microsoft worms", rather. But of course Cnet can't say that in an article, although it can be demonstrated that the Internet is not required for those worms to propagate (i.e. a local network with tcp/ip is enough). Ok, arguably. :)

Re:Checklist for HTTP Distribution of Sensitive Da by Anonymous Coward · 2001-11-26 08:59 · Score: 0

Your point #2 is plain wrong. It is almost always best to rely on the server's mechanisms for authentication. Whatever half-arsed, pathetic, lame attempt you implement yourself on the server side for validation will result in problems. Either that or your talents are wasted doing web sites....

Re:But, that's nothing. Download the entire docume by Anonymous Coward · 2001-11-26 09:44 · Score: 0

Wait, you missed another funny one:

"If you throw a cat out the window of a car, does it become kitty litter?"

Re:Easier fix for this... by Lord+Bitman · 2001-11-26 11:01 · Score: 1

Most [legitimate] web servers have an option to disable directory listing. Works wonders.

but then what does this do for anyone?
I found on various search engines links to pages I didn't even know I had. I guess it doesn't take an index.html

Though all the junk traffic has stopped thanks to port 80 blocking.. dumbassian..

--
-- 'The' Lord and Master Bitman On High, Master Of All

Would you buy anything from this company? by FrozenFrog · 2001-11-26 11:36 · Score: 1

I found this link by searching Google for "index of /password" (from a previous post in this topic):

http://www.centurionsoft.com/password/

It's a small page that asks for your name and email before downloading the trial version of their software. Clicking "pass_down2.html" using the above link bypasses this requirement. While not a huge security risk, it does show laziness.

What does the company sell? Security software. :)

Directory listing faux pas by mesach · 2001-11-26 12:26 · Score: 1

I still can't believe how many companies have admins that can't turn off directory listing. One of my friends found out the BIOS passwords for the websurfer because directory listing was not disabled; he got into their dev servers and all over their webservers because of this oversight.

So people making passwords and credit card information available isn't a surprise, considering how many "mcse's" there are out there...

It's to the point now that I don't even talk about being certified in public, because I don't want anyone to think I'm a n00b and wanna brag. Damn, I need a better job. Oh well, my rant is over.

--
moo.

Re:Directory listing faux pas by Anonymous Coward · 2001-11-27 08:35 · Score: 0

considering how many "mcse's" there are out there

In the MS webserver, IIS, directory listing is turned off by default.

Nice try, though.

No, search for the numbers instead by leonbrooks · 2001-11-26 12:49 · Score: 2

/me goes to search for "credit card"

/ME would be better off searching for a known-good credit card number.

If you find it, it might lead you to many other credit card numbers - but first cancel the one that you found, and sue the company exposing it. Ask them if they have the box their computer came in. (-: And maybe post the URL to CERT as a vulnerability :-)

--
Got time? Spend some of it coding or testing

It doesn't last by kimihia · 2001-11-26 13:55 · Score: 3, Informative

Because 28 days after you took your page offline it will disappear from the Google cache.

Google reindexes web pages, and if they 404 on the next visit, then good bye pork pie! You have to get them while they are hot, eg, when a site has JUST been Slashdotted.

An advertisement for publicfile by kimihia · 2001-11-26 14:03 · Score: 3, Informative

Perhaps it would be a good idea after reading this article to examine publicfile.

It was written by a very security-conscious programmer who realises that your private files can easily get out onto the web. That is why publicfile has no concept of content protection (eg, Deny from evilh4x0r.com or .htaccess) and will only serve up files that are publicly readable.

From the features page:

publicfile doesn't let users log in. Intruders can't use publicfile to check your usernames and passwords.
publicfile refuses to supply files that are unreadable to owner, unreadable to group, or unreadable to world.

A good healthy dose of paranoia would do people good.
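The second quoted feature boils down to a permission-bit test. What follows is a rough sketch of that rule in Python, not publicfile's actual code; the function names `readable_by_everyone` and `may_serve` are invented here, and it assumes POSIX-style mode bits.

```python
import os
import stat

# All three read bits must be set: owner, group, and world.
REQUIRED_BITS = stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH


def readable_by_everyone(mode):
    """True only if owner, group, and world can all read the file."""
    return (mode & REQUIRED_BITS) == REQUIRED_BITS


def may_serve(path):
    """Hypothetical gate: refuse anything missing or not fully readable."""
    try:
        return readable_by_everyone(os.stat(path).st_mode)
    except OSError:
        # Nonexistent or inaccessible paths are never served.
        return False


print(readable_by_everyone(0o644))  # rw-r--r--
print(readable_by_everyone(0o640))  # rw-r-----
```

Under this rule, a `chmod 640` on a file is enough to keep it off the web, with no server-side access rules to get wrong.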

abcnews.com by waldoj · 2001-11-26 14:14 · Score: 1

There's always good stuff at http://abcnews.go.com/robots.txt and http://www.cnn.com/robots.txt. I check that every so often just because I get such a kick out of viewing their logs and such.

-Waldo Jaquith
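The reason robots.txt is such good reading: it is a public file, and every Disallow line names a path the owner wanted crawlers to stay away from. A minimal sketch of pulling those paths out, with an invented sample file (the paths below are hypothetical, not taken from either site) and a made-up function name `disallowed_paths`:

```python
def disallowed_paths(robots_txt):
    """Return the non-empty Disallow values from robots.txt text."""
    paths = []
    for line in robots_txt.splitlines():
        # Strip trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        value = value.strip()
        # An empty Disallow means "nothing is off limits", so skip it.
        if field.strip().lower() == "disallow" and value:
            paths.append(value)
    return paths


# Hypothetical robots.txt content for illustration only.
SAMPLE = """\
User-agent: *
Disallow: /logs/      # hypothetical path
Disallow: /admin/
Disallow:
"""

print(disallowed_paths(SAMPLE))
```

Listing a directory in robots.txt only asks polite crawlers to skip it; it does nothing to stop a person from typing the path in.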

Re:Checklist for HTTP Distribution of Sensitive Da by Gleef · 2001-11-26 14:53 · Score: 2

I have found web server based authentication systems limited, weak, and hard to integrate into authorization systems (to determine whether the given user is allowed to access the given information). Granted, my experience is limited to Apache and Netscape Enterprise Server.

In addition, the client-based authentication scheme that is triggered by web server based authentication doesn't allow for logging out in a manner that is consistent across browsers. Having the ability to log out is critically important for a good security system.

Your mileage may vary.

--

----
Open mind, insert foot.

Googles Cache, getting it cleared? by Anonymous Coward · 2001-11-26 16:21 · Score: 0

Like probably everyone else reading this story, I did a quick little search and sure enough found a small data file of cards, names, and addresses almost immediately.

I emailed them to let them know of the problem so they can clean it up. They responded saying they'd got a flood of emails from people who read /., which I guess is encouraging.

However, since presumably I wasn't the first to notify them, and I was still able to find the data, it got me wondering: during this cleanup, is there a process or procedure to notify Google (and other engines that cache) that you wish them to flush/expire/remove the offending document from their cache? Pulling the page simply isn't enough; it's still there in the cache. (I assume it expires, but definitely not instantly, which in a situation like this is what you want.)

Better Page by Anonymous Coward · 2001-11-26 18:33 · Score: 0

Here is a better page that includes starting digits for the different cards, and includes the checking algo used to verify cards.
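The post doesn't name the checking algorithm. The checksum generally used for card numbers is the Luhn algorithm, so, assuming that is what the linked page describes, a minimal sketch looks like this. The function name `luhn_valid` is made up, and the numbers in the usage lines are well-known dummy test values, not real cards.

```python
def luhn_valid(number):
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not digits:
        return False
    total = 0
    # Walk right to left; double every second digit, starting with the
    # digit immediately left of the check digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as adding the two digits of the product
        total += d
    return total % 10 == 0


print(luhn_valid("4111 1111 1111 1111"))  # dummy test number
print(luhn_valid("4111 1111 1111 1112"))  # last digit altered
```

Passing the check only means the number is well-formed; it says nothing about whether an account behind it exists.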

Mod this up - Insightful (and funny too) by Anonymous Coward · 2001-11-26 19:12 · Score: 0

QED

Bzzt. Wrong. by Anonymous Coward · 2001-11-26 19:24 · Score: 0

The customer does not get hurt by a stolen CC#; it is the merchants that get charged by the CC companies. If a customer reports their card stolen as soon as they realize there is a problem then, by law, at max they face a loss of $50. Quite often it is free.

The only cost to consumers is the hassle of getting a new CC#, and the downtime while you don't have a CC#.

Assumptions. by AftanGustur · 2001-11-26 19:46 · Score: 2

How does the Google Cache avoid legal entanglements, both for stuff like cc numbers and copyright/trademark infringement?

Huh? Maybe because it isn't illegal at all?
Think about it: where does it say it's illegal?

--
echo '[q]sa[ln0=aln80~Psnlbx]16isb572CCB9AE9DB03273snlbxq' |dc

Re:Assumptions. by banuaba · 2001-11-27 02:39 · Score: 1

I haven't seen anything that says that the GC is illegal, but it contains copyrighted information that is reproduced without the permission of the copyright holder.

--

Brant

Argle. Bargle.

Print out this article !! by AftanGustur · 2001-11-26 19:49 · Score: 3

No, seriously, do it !
Print it out and hang it on the wall, then put a post-it note on top of it saying: "The best example of 'blaming the messenger' ever !!!"

--
echo '[q]sa[ln0=aln80~Psnlbx]16isb572CCB9AE9DB03273snlbxq' |dc

human indexing by haukex · 2001-11-26 20:58 · Score: 1

if you trained a bunch of monkeys to recognize credit card numbers, sat them down and let them click away for hours on end, who knows what they would find?

then again, if we could train those same monkeys to fill out the stack of credit card applications I get in the mail all the time...

at least search engines don't have to deal with popups...

Thanks!! by Anonymous Coward · 2001-11-26 21:47 · Score: 0

Thanks for the link. Now I know what I want for Christmas.

Thanks again, a kinky AC.

Some sites go one step further by yora · 2001-11-26 22:11 · Score: 1

This reminds me of a site I once visited. On that site I tried to search for something, and the first result of the search was the admin page for the site. Just to have some fun I clicked on the link, and to my surprise the admin page came up without asking for any passwords and allowed me to do lots of interesting site admin things.

Google realtime query info by pne · 2001-11-27 01:57 · Score: 2

By the way, does google have that realtime display of what people are searching for?

Not to my knowledge. When I asked Google whether such a facility exists, they said no -- but they did point me to Google Zeitgeist, which gives "Search patterns, trends, and surprises according to Google". Usually published once a week and showing e.g. the top 10 gaining and losing queries of that week. So you get some interesting info, but it's not realtime by any description of the word.

--
Esli epei etot cumprenan, shris soa Sfaha.

Re:What a bunch of idiots, was Re:Bad manager idea by RFC959 · 2001-11-27 02:21 · Score: 1

Can any convicted burglar claim to be a locksmith?

Actually, yes: there are many laws about owning "burglary tools", and most of them state it's OK if you are a "professional locksmith" - which is never defined anywhere. So if you /say/ you are, then you /are/... (IANAL, so don't blame me if you end up in jail anyway.)

atleast someone's doing something right.. by jglow · 2001-11-27 09:19 · Score: 1

I ran across this when trying to get a cgi-bin directory listing:

Unavailable Tripod Directory
Tripod does not allow the automatic listing of directory or subdirectory contents.

Who'da thought Tripod would do something right..

--

There's no "I" in Linux.. err..

Slashdot Mirror

411 comments