The Problem of Search Engines and "Sekrit" Data

A symptom of poor programming... by Bonker · 2001-11-26 04:48 · Score: 4, Insightful

I don't see what's so hard about this problem. It's very simple... don't keep data of any kind on the web server. That's what firewalled, password/encryption protected DB servers are for.

--
The next Slashdot story will be ready soon, but subscribers can beat the rush and slashdot the links early!

Re:A symptom of poor programming... by ChazeFroy · 2001-11-26 04:58 · Score: 5, Interesting

Try the following searches on google (include the quotes) and you'll be amazed at what's out there:

"Index of /admin"
"Index of /password"
"Index of /mail"
"Index of /" +passwd
"Index of /" password.txt
Re:A symptom of poor programming... by Brainless · 2001-11-26 05:04 · Score: 4, Funny

I manage a Cold Fusion web server that we allow clients to post their own websites to. Recently, a client's programmer accidentally made a link to the admin section. Google found that link, proceeded into the admin section, and indexed all the "delete item" links as well. I found it quite amusing when they asked to see a copy of the logs, complaining that the website had been hacked, and I discovered that GoogleBot had deleted every single database entry for them.
Re:A symptom of poor programming... by ChazeFroy · 2001-11-26 05:15 · Score: 2, Informative

Something I forgot to mention in my other post:

The October 2001 issue of IEEE Computer has some articles on security, and the first article in the issue is titled "Search Engines as Security Threat" by Hernandez, Sierra, Ribagorda, Ramos.

Here's a link to it.
Re:A symptom of poor programming... by ichimunki · 2001-11-26 05:19 · Score: 5, Informative

A big part of why this is a problem is the fact that many web servers are, by default, set up to display file listings for directories if there is no "index.html" file in the directory and the user requests a URL corresponding to that directory.

Personally I like to make sure that there is an .htaccess file that prevents this (on Apache-- I'm sure IIS and others have similar config options). I like to turn off the directory listing capability if possible, and certainly assign a valid default page, even if index.html is not present.
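
A small sketch of how one might audit for this, assuming Apache-style auto-generated listings whose page title begins with "Index of /". The function name and the sample pages are hypothetical. On Apache itself, the usual way to switch listings off is an "Options -Indexes" line in the server config or in .htaccess, where overrides are permitted.

```python
import re

# Apache-style listings are titled "Index of /some/path".
# Other servers may format listings differently; this pattern is an assumption.
LISTING_TITLE = re.compile(r"<title>\s*Index of /", re.IGNORECASE)


def looks_like_directory_listing(html: str) -> bool:
    """Return True if the page looks like an auto-generated directory index."""
    return LISTING_TITLE.search(html) is not None


# Hypothetical sample pages for illustration.
listing = "<html><head><title>Index of /admin</title></head><body>...</body></html>"
normal = "<html><head><title>Welcome</title></head><body>Hello</body></html>"
print(looks_like_directory_listing(listing), looks_like_directory_listing(normal))
```

Running a check like this against your own directories is only a heuristic; it says nothing about files that are linked directly.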

And don't forget "index of /cgi-bin" for some real fun. ;)

--
I do not have a signature
Re:A symptom of poor programming... by Legion303 · 2001-11-26 05:29 · Score: 5, Interesting

Please give credit where credit is due. Vincent Gaillot posted this list to Bugtraq on November 16.
-Legion
Re:A symptom of poor programming... by Bonker · 2001-11-26 05:42 · Score: 2

Funnily enough, IIS defaults to hide directory contents where Apache doesn't. The option to display directory contents can be turned on easily enough, but an administrator does actually have to make the conscious decision to do so.

This is a good reason not to let developers have administrator access to any boxen they are developing on.

--
The next Slashdot story will be ready soon, but subscribers can beat the rush and slashdot the links early!
Re:A symptom of poor programming... by greenrd · 2001-11-26 05:51 · Score: 3

Do you realise that the web developer who made the admin section accessible via a GET request, without any additional authentication, is the biggest moron here, not the client? You shouldn't rely on people not knowing where your wide-open doors are - lock them!

--
Female Prison Rape in NY
Re:A symptom of poor programming... by subsolar2 · 2001-11-26 06:57 · Score: 2, Interesting

I've seen this myself searching for information on linksys routers about a year ago. I found somebody with a page that listed the password for their linksys router along with other systems and information. I e-mailed the guy, who seemed very surprised that the information was available there and thanked me for letting him know. The information was gone when I checked again.
It's a silly mistake; I don't have a clue as to how google came across the link. Like with anything new it's going to take some time before this becomes "common sense" and people do not put this information on public servers.
- subsolar
P.S. It's possible to generate a url that, when clicked by somebody behind a linksys router, enables remote administration if you know the password. I've turned it in to linksys but gotten nothing but silence from them.

How can this happen? by Nonesuch · 2001-11-26 04:48 · Score: 4, Redundant

To the best of my knowledge, search engines all work by indexing the web, starting with the base of web sites or submitted URLs, and following the links on each page.

Given this premise, the only way that Google or another search engine could find a page with credit card numbers or other 'secret' data, would be if that page was linked to from another page, and so on, leading back to a 'public' area of some web site.

That is to say, the web-indexing bots used by search engines cannot find anything that an ordinary, very patient human could not find by randomly following links.
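
The premise above can be sketched as a breadth-first walk over a link graph. This is a toy model, not how any particular search engine is implemented; the site map and page names are hypothetical, and a real crawler would fetch and parse pages instead of reading a dictionary.

```python
from collections import deque


def reachable(links, start):
    """Return every page reachable from start by following links."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical site: "/secret" links out, but nothing links to it.
site = {
    "/": ["/about", "/shop"],
    "/shop": ["/admin"],   # one stray link is enough to expose /admin
    "/secret": ["/"],
}
print(sorted(reachable(site, "/")))
```

In this model "/admin" is found because a public page links to it, while "/secret" stays invisible only for as long as no page anywhere links to it.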

--

I do not deploy Linux. Ever.

Re:How can this happen? by kilgore_47 · 2001-11-26 06:43 · Score: 2

Index pages often have the ../ parent link and that can get you to some places people tend not to think of as being accessible.

I sincerely hope that there aren't any widely used webservers that would actually let you request "../" and get something above the designated webspace. That is one of the most obvious exploits ever, and I think even microsoft is smarter than that now.

--
___
The way to see by faith is to shut the eye of reason. --Ben Franklin
Re:How can this happen? by GigsVT · 2001-11-26 07:35 · Score: 2

Actually directory traversal exploits happen all the time, but it's not likely you would put a hyperlink to exploit your own (or someone else's) site just on the web.

--
I've had enough abrasive sigs. Kittens are cute and fuzzy.
Re:How can this happen? by Nonesuch · 2001-11-26 07:41 · Score: 2

Did you not see the question mark at the end of the subject on the parent comment?
FYI, I did read both the Slashdot article and the referenced offsite article, and neither answers my question as to how Google (or any other web-crawler 'bot) finds 'secret' files that presumably are never linked to from a 'non-secret' page.
Other users here have offered constructive suggestions about how this can happen (apache bug, referer data exposed by analog, etc.); meanwhile, you waste your time and karma composing rants about why my question is redundant.

--

I do not deploy Linux. Ever.

Oh Yeah? by Knunov · 2001-11-26 04:49 · Score: 4, Funny

"...search engines are finding password and credit card numbers while doing its indexing."

This is very serious. Could you please post the exact search engines and query strings so I can make sure my information isn't there?

Knunov

--
Why do users with IDs under 100,000 or over 700,000 usually have the most worthwhile comments?

Re:Oh Yeah? by Karma+50 · 2001-11-26 04:51 · Score: 5, Funny

Just search for your credit card number.

By the way, does google have that realtime display of what people are searching for?

--
http://www.thehungersite.com
Re:Oh Yeah? by 4of12 · 2001-11-26 05:01 · Score: 2, Funny

Yeah!

I just typed in my credit card number and found 15 hits on web sites involving videos of hot young goats.

--
"Provided by the management for your protection."
Re:Oh Yeah? by morcego · 2001-11-26 05:59 · Score: 2, Insightful

Yes, it could. Actually, it's very trivial to do.
I actually tried to search for my credit card number, but only searched for 8 digits, in various forms (always the same digits, mind you), like:
"XXXX XXXX"
"XXXX-XXXX"
"XXXXXXXX"

Thank god, nothing ...
This is something I suggest you people do. I would suggest using the last 8 digits, since the "last 4 digits" are commonly used, but you won't be exposing something that is probably already everywhere.

--
morcego
Re:Oh Yeah? by Hiro+Antagonist · 2001-11-26 08:24 · Score: 2

Your information is perfectly safe.

Oh, and by the way -- thanks for the christmas gift. I've always wanted a silver Ferrari.

--
I Hit the Karma Cap, and All I Got Was This Lousy .sig.
Re:Oh Yeah? by rkent · 2001-11-26 10:58 · Score: 2

I would suggest using the last 8 digits...

Or just search for the first 4, which identify the card issuer (e.g. Discover cards start with 6011), and so pose no risk to you at all. See if any of the results for "6011" have 12 more digits following...
Re:Oh Yeah? by rkent · 2001-11-26 11:01 · Score: 2

I realize this is a joke, but that works, and it's fuckin' scary! Of course I didn't search for my own credit card number, but I did search for the first 4 digits, which is just a card issuer's identity string anyway. For example, (some) visa cards start with "4128."

So if you search google for "Visa 4128"... watch out. I'd estimate about a third of the results I got actually had whole visa numbers within. Scary.

Tangential Google Question by banuaba · 2001-11-26 04:50 · Score: 5, Interesting

How does the Google Cache avoid legal entanglements, both for stuff like cc numbers and copyright/trademark infringement?
If I want to find lyrics to a song, the site that has them will often be down, but the cache will still have them in there. Why is what google is doing 'okay' but what the original site is doing not okay? Or do they just leave google alone?

--

Brant

Argle. Bargle.

Re:Tangential Google Question by CaseyB · 2001-11-26 04:59 · Score: 3, Interesting

Good question.
Given that they do have (for now) some sort of immunity, it opens a loophole for publishing illegal data. Simply set up your site with all of Metallica's lyrics / guitar scores (all 5 of them, heh). Submit it for indexing to Google, but don't otherwise attract attention to the site. When you see the spider hit, take it offline. Now the data is available to anyone who searches for it on Google, but you're not liable for anything. The process could be repeated to update the cache.
Re:Tangential Google Question by passion · 2001-11-26 04:59 · Score: 2

I doubt most prosecuting teams are savvy enough to think about google's cache.

--
- passion
Re:Tangential Google Question by Suidae · 2001-11-26 05:27 · Score: 2, Interesting

Don't bother taking it offline, just set up your web server so it only responds to the google indexing server. Cache stays up all the time, but no one else can (easily) see that you are serving it.
Re:Tangential Google Question by Xzzy · 2001-11-26 05:44 · Score: 3, Informative

> If you only had Google pointing to it, wouldn't
> it be very low on a search list?

If it's a very specific search term, Google will still return it in the list. If it's unique enough, it's very possible that it will even be the top ranked page. If you put a unique string of characters (like a password or something) on a page, and google indexed it, typing that "password" into the search engine will give you your page.

You can also type domain names into google to retrieve the cache page for that website, which would accomplish much the same thing as long as it's not geocities or something.
Re:Tangential Google Question by snake_dad · 2001-11-26 06:21 · Score: 3

Make that: were savvy enough.

--
karma capped .sig seeking available Slashdot poster for long-term relationship.
Re:Tangential Google Question by kilgore_47 · 2001-11-26 06:49 · Score: 2

You could always search for exact phrases from your site, save the resulting cache link, set up a forwarder (or frameset) on a free account somewhere to point to the google cache URL, and distribute the URL to your free website. Technically, the offending data will be served by google only.

Now if only google would start letting their spider index .mp3 files!

--
___
The way to see by faith is to shut the eye of reason. --Ben Franklin
Re:Tangential Google Question by LinuxHam · 2001-11-26 09:47 · Score: 2, Interesting

Don't bother taking it offline, just set up your web server so it only responds to the google indexing server. Cache stays up all the time, but no one else can (easily) see that you are serving it.

Oooh.. that's a particularly good one.. kinda like getting high-bandwidth web service FOC, if you build your site URLs to ride along the google cache instead of your own... (gears cranking)..

--
Intelligent Life on Earth
Re:Tangential Google Question by randomgeek · 2001-11-26 16:40 · Score: 2, Interesting

Random thought: would it be possible to use Google as a kind of morpheus/napster-like distributed service? Make an HTML "page" that looks something like:

FileName: MyFile
Size: FileSize

Encode in Base64, rot13 it, and then call it protected under DMCA, bonus points.

Of course, your web server would only accept connections from the google spiders, and you'd effectively have a free file distribution service. Not saying this would actually work, but I think there's a chance it'd work.

how the FUCK is this possible? by posmon · 2001-11-26 04:50 · Score: 2, Insightful

just because google is only picking them up now doesn't mean that they haven't been there for years!

how can someone be so blatantly stupid as to store anything other than their web content, never mind credit card details, in their published folders? how? they redirected my documents to c:\inetpub\wwwroot\%username%\...???

--

update comments set karma=-1, reason='offtopic' where sid=26315

Re:how the FUCK is this possible? by Karma+50 · 2001-11-26 04:56 · Score: 2, Insightful

Google has just added the ability to index PDFs, word docs etc. So, yes, the information was there before, but now it is much easier to find.

--
http://www.thehungersite.com
Re:how the FUCK is this possible? by Neon+Spiral+Injector · 2001-11-26 04:57 · Score: 2, Insightful

In published folders? How about on machines that are on the Internet at all.

In an ideal setup the machine storing credit card information wouldn't have a network card, or speak any networking protocol. You'd have a front end secure webserver. That machine would pass the credit card information to the backend across a serial link. The backend machine would process the card and return the status. The CC data would only be a one way transfer, with no way of retrieving it back off of that machine.
Re:how the FUCK is this possible? by WNight · 2001-11-26 12:33 · Score: 2

Yup. Much the same as dumping critical logs to the printer, as soon as they are generated. And if you're paranoid, dump SHA hashes of the other logs at certain points (i.e., the apache log, as of 11/26/01 16:04:53, was 253,035 bytes, SHA 0x84BE2C9A1029A3C1(etc)).

That way you've got critical logs that a hacker can't modify, and by checking every few days (with a simple script to roll the logs back to a given size) you can tell if the less-important logs have been modified. You may not ever know what they said if they have been, but the mere fact someone other than you or the server modified the logs is a big indicator of a problem.
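
A rough sketch of that check, assuming the log is append-only and using SHA-256 from Python's hashlib as a stand-in for whichever SHA variant is actually used. The function names and sample log lines are hypothetical; in practice the recorded size and digest would live on the printout, not on the same machine.

```python
import hashlib


def prefix_digest(data: bytes, size: int) -> str:
    """Hash only the first `size` bytes, i.e. the log as it was when recorded."""
    return hashlib.sha256(data[:size]).hexdigest()


def prefix_unchanged(data: bytes, size: int, recorded: str) -> bool:
    """True if the log still begins with the bytes that were hashed earlier."""
    return len(data) >= size and prefix_digest(data, size) == recorded


# Hypothetical log contents at the time the hash was printed.
log = b"GET /index.html 200\nGET /admin 403\n"
recorded_size = len(log)
recorded_hash = prefix_digest(log, recorded_size)

appended = log + b"GET /about 200\n"                      # normal growth
tampered = b"GET /index.html 200\nGET /other 200\n" + b"GET /about 200\n"
print(prefix_unchanged(appended, recorded_size, recorded_hash))
print(prefix_unchanged(tampered, recorded_size, recorded_hash))
```

Hashing only the recorded prefix is what lets the log keep growing between checks without raising a false alarm.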

For this reason, dot-matrix printers are still fairly often found in server rooms.

The serial cable thing is an extension of this. It just has a computer logging the data. You may start sending corrupt data, but it's not going to a shell you had to log into so the most you could do is confuse the logging script, but you could never get data back out.

If you're *really* paranoid, cut the 'send' line from the secure server. Even if the truly 31337-hacker could break into the machine they could never get anything back out. (And they'd have to find a buffer-overflow in '>' and manage to exploit it, blind, on an unknown system...)

Stopping Google won't stop the problem... by Kr3m3Puff · 2001-11-26 04:51 · Score: 5, Insightful

The big complaint of the article is that Google is searching for new types of files, instead of HTML. If some goofball left some link to a Word document with his passwords in it, he gets what he deserves.

The quote from that article about Google not thinking about this before they put it forward is idiotic. How can Google be responsible for documents that are in the public domain, that anyone can get to by typing a URL into a browser? It isn't insecure software, just dumb people...

--
D.O.U.O.S.V.A.V.V.M.

Re:Stopping Google won't stop the problem... by Zspdude · 2001-11-26 05:29 · Score: 3, Interesting

It's definitely very true that if there were no stupid people these things would not be an issue of controversy. However, society has struggled for a very long time to resolve the question, "Should stupid people be protected from themselves?" There will always be those who (whether they're just technologically inept or for whatever reason) will not act sensibly and not realize they are being foolish. Do they deserve protection as well, even though they don't know how to protect themselves? That's a question which is not quite as easy to answer....

--
What's in a Sig?
Re:Stopping Google won't stop the problem... by greed · 2001-11-26 05:39 · Score: 2, Interesting

So maybe the fix should be in making it harder to share things on the Web, rather than trying to have search bots guess whether someone really meant to post the file?

Web servers could ship configured to not AutoIndex, only allow specific file types (.jpeg, .html, .png, .txt), and disable all those things that I disabled in Apache without losing anything I needed for my site, and so on. Then, the burden is placed on the person who started sharing these other filetypes that have sensitive data on the public internet.

Of course, putting something in public that you don't want someone to see is just plain stupid, but apparently we need to make stupid people feel like they're allowed on the 'net.
Re:Stopping Google won't stop the problem... by DaoudaW · 2001-11-26 06:07 · Score: 2

If some goofball left some link to a Word document with his passwords in it, he gets what he deserves.

This seems to be the most common early response to the article and I agree up to a point. The problem is where to stop. Several times I've found stuff in Google's cache that I know was password-protected on the website. I was grateful, but wondered how they retrieved it. Did they purchase a subscription? Did the owners give them access for the benefit of having the site catalogued?

Another issue appears when they start crawling directories. It's never obvious which directories were meant to be publicly readable and which ones weren't, but Google undoubtedly uses techniques beyond that of the casual browser. At what point do they become crackers?

A number of years ago, I had a shell account on a Unix system. It was amazing where I could go, what I could see on the system with a little bit of ingenuity. When I pointed this out to the sysadmin, he treated me like a criminal. Okay, maybe I should have stopped when I started getting warning messages ;-), but the fact is that Google could probably get behind at least 50% of firewalls if they wanted to.

How far is too far in the search for information?
Re:Stopping Google won't stop the problem... by mobiGeek · 2001-11-26 06:37 · Score: 5, Funny
but Google undoubtedly uses techniques beyond that of the casual browser
Uhh...no.
HTTP is an extremely basic protocol. Google's bots simply do a series of GET requests.
It would be possible that Google's bots have a database of username/passwords for given sites, but the more likely scenario is that they have stumbled across another way to get the "protected" information:
- a link which contains a username and/or password
  /protected/show_article.pl?username=foo&password=bar&num=1
- a link to the pages which by-passes the protection scheme
  /no_one_can_find_this_cause_Im_3l33t/article1.html
- someone else posted the information elsewhere, and this is what is actually crawled
I ran robots for nearly 2 years and was harassed by many a Webmuster who could prove that my robots had hacked their site. They'd show me protected or secret data. It typically took 3 to 5 minutes to find the problem...usually the muster was the problem themself.
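
The first kind of leak in the list above can be spotted mechanically. A minimal sketch, assuming the credentials travel either as query parameters or in the user:password@host part of the URL; the set of suspicious parameter names is a guess for illustration, not a complete list.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical parameter names that suggest credentials are in the link.
SUSPECT_KEYS = {"username", "user", "password", "passwd", "pass"}


def leaks_credentials(url: str) -> bool:
    """Return True if the URL itself appears to carry login details."""
    parts = urlsplit(url)
    if parts.username or parts.password:
        return True  # http://user:password@host/... style link
    return any(key.lower() in SUSPECT_KEYS for key in parse_qs(parts.query))


for link in (
    "/protected/show_article.pl?username=foo&password=bar&num=1",
    "http://user:password@domain/path/file",
    "/articles/show_article.pl?num=1",
):
    print(link, leaks_credentials(link))
```

A crawler that follows the first two links gets in exactly the way a person clicking them would, with plain GET requests and nothing more.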
HERE'S A NOTE OF WARNING TO WEBMASTERS:
Black text links on black backgrounds in really small fonts are NOT secure.
Maybe I should get this posted to BugTraq...or would MS come after me??
--
...Beware the IDEs of Microsoft...
Re:Stopping Google won't stop the problem... by Webmonger · 2001-11-26 07:24 · Score: 3, Insightful

Umm, I don't think that's how it happens. I think Google indexes the page and THEN the idiots put on the password protection.

If Google accessed it via a special link, then Google would store that link, and you'd use that link, and you'd see it yourself.

(another form of not-secret link:
http://user:password@domain/path/file)
Re:Stopping Google won't stop the problem... by Anonymous Coward · 2001-11-26 08:44 · Score: 4, Insightful

Years ago cable companies cried foul that ordinary citizens were grabbing satellite communications off the air with their fancy 6' dishes and watching whatever they wanted for free. The companies raised a big stink and tried to get people to pay for the content. The FCC said "tough luck buddy. If you put it out there then people have a perfect right to grab it." Since that time most satellite traffic has been encrypted.

If you run a web site on the public internet then you should be paying attention to this basic fact: If you put it out there then people have a perfect right to grab it, even if you don't specifically tell them it's there. (I know FCC rulings don't apply, but the principle is the same). You should encrypt EVERYTHING you don't want people to see.

Encryption is like your pants, it keeps people from seeing your privates. Hiding your URLs and hoping is like running really, really fast with no pants on - most people won't see your stuff, but there's always some bastard with a handy-cam.
Re:Stopping Google won't stop the problem... by nahdude812 · 2001-11-27 02:11 · Score: 2

obviously with google around, the whole security through obscurity thing just isn't an option anymore.

I assume that statement was sarcastic, because it never really was. Security through obscurity is an oxymoron.

--
Slay a dragon... over lunch!

Well Behaved Crawlers by tomblackwell · 2001-11-26 04:51 · Score: 4, Insightful

...obey the Robot Exclusion Standard. This is not a big secret, and is linked to by all major search engines. Anyone wishing to exclude a well-behaved robot (like those of major search engines) can place a small file on their site which controls the behaviour of the robot. Don't want a robot in a particular directory? Then set your robots.txt up correctly.
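
For illustration, here is roughly what a well-behaved robot does before fetching a page, using Python's standard urllib.robotparser. The robots.txt contents, the "ExampleBot" agent name and the paths are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt as it would be served from the site root.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed(robots_txt: str, agent: str, path: str) -> bool:
    """Ask the parsed robots.txt whether `agent` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


print(allowed(ROBOTS_TXT, "ExampleBot", "/index.html"))
print(allowed(ROBOTS_TXT, "ExampleBot", "/private/report.html"))
```

Note that the check is entirely voluntary: the file asks robots to stay out, and nothing on the server enforces it.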

P.S. Anyone keeping credit card info in a web directory that's accessible to the outside world should really think long and hard about getting out of business on the internet.

Re:Well Behaved Crawlers by Nos. · 2001-11-26 05:06 · Score: 2

This is not the way to do it, as the article mentions. This may stop Google, but suppose I'm running my own search engine that doesn't follow "robots.txt" rules?
Re:Well Behaved Crawlers by ryanvm · 2001-11-26 05:12 · Score: 5, Insightful

The Robot Exclusion Standard (e.g. robots.txt) is mainly useful for making sure that search engines don't cache dynamic data on your web site. That way users don't get a 404 error when clicking on your links in the search results.

You should not be using robots.txt to keep confidential data out of caches. In fact, most semi-intelligent crackers would actually download the robots.txt with the specific intention of finding ill-hidden sensitive data.
Re:Well Behaved Crawlers by MadAhab · 2001-11-26 12:33 · Score: 2

This is true, but if it really concerns you that someone might try looking at things listed in robots.txt (I've done it), try adding an exclude to robots.txt that specifies a bogus, but tempting directory, say "/mp3/*" or "/warez/*" or "/bspears/*". Then create the directory, make index.cgi mail you immediately when someone requests the directory index.... haha.

--
Expanding a vast wasteland since 1996.
Re:Well Behaved Crawlers by kimihia · 2001-11-26 13:57 · Score: 2

The way around having crackers look at your robots.txt for a pointer to sensitive information is to be guarded about what you put in it. For example, if I had the url:

/secret-stuff-here/credit-card.txt

I would sure as heck not be putting that in my robots.txt, because your robots.txt could eventually end up in the index too. Instead I would put this:

/sec

That will match and disallow the first address, and the h4x0r still has a lot to guess.
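
The trick relies on Disallow values being treated as path prefixes. A tiny sketch of that matching rule, assuming plain prefix comparison with no wildcards; the function name and the extra paths are made up for the example.

```python
def disallowed(path: str, disallow_prefixes: list[str]) -> bool:
    """True if the path starts with any non-empty Disallow prefix."""
    return any(path.startswith(prefix) for prefix in disallow_prefixes if prefix)


rules = ["/sec"]
for path in (
    "/secret-stuff-here/credit-card.txt",
    "/section/news.html",       # also blocked: the short prefix is a blunt tool
    "/public/index.html",
):
    print(path, disallowed(path, rules))
```

The short prefix hides the full directory name, at the cost of also excluding every other path that happens to begin with the same letters.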
Re:Well Behaved Crawlers by merlin_jim · 2001-11-27 06:28 · Score: 2

P.S. Anyone keeping credit card info in a web directory that's accessible to the outside world should really think long and hard about getting out of business on the internet.

Long and hard? Anyone keeping credit info in a web directory really shouldn't have to think that long about getting out of the internet.

As soon as someone points out to you what a colossal mistake that is, it's time to go back to McDonald's and hope they'll give you your old job back, because this whole Internet thing just ain't workin out for you no more.

--
I am disrespectful to dirt! Can you see that I am serious?!

Google shouldn't lift a finger by sketerpot · 2001-11-26 04:51 · Score: 2, Interesting

Why should Google or any other search engine do anything to save fools from their stupidity? Putting credit card numbers online where anyone can get them is just plain idiotic. Hopefully this will get a lot of publicity along with the names of companies who do stupid things like this and most people will shape up their act.

Re:Google shouldn't lift a finger by nomadic · 2001-11-26 05:37 · Score: 2

Yeah, but the people that suffer the most aren't the idiots posting the data, they're the people whose credit card numbers they are. Why should they suffer because the store they bought something from doesn't understand the concept of security?

Simple but burdensome solution by camusflage · 2001-11-26 04:52 · Score: 4, Informative

Credit card numbers follow a known format (mod10). It should be simple, but somewhat intensive as far as search engines go, to scan content, look for 16 digit numeric strings, and run a mod10 on them. If it comes back true, don't put it into the index.
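
The mod10 test is usually called the Luhn check, and as commonly specified it works only on the digits of the number itself. A minimal sketch follows; the function name is an assumption, and 4111111111111111 is a well-known test number, not a real card.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn (mod 10) check."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0


for candidate in ("4111111111111111", "4111-1111-1111-1111", "4111111111111112"):
    print(candidate, luhn_valid(candidate))
```

Passing the check only means the string is well-formed; roughly one in ten arbitrary 16-digit strings will pass without being a card number at all.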

--
The truth about Scientology, Xenu, and you: Operation Clambake

Re:Simple but burdensome solution by Xerithane · 2001-11-26 04:56 · Score: 5, Insightful

It is a burden, but the responsibility does not lie on a crawling engine. You could check any 16 digit number (and expdate with a Luhn check if available) but with all the different formatting done on CC numbers (XXXX-XXXX-XXXX-XXXX, XXXXXXXXXXXXXXXX, etc) the algorithm could get ugly to maintain.

I don't see why Google or any other search engine has to even acknowledge this problem, it's simply Someone Else's Problem. If I was paying a web team/master/monkey any money at all and found out about this, heads would roll. It seems that even thinking of pointing a finger at google is the same tactic Microsoft is using against those "irresponsible" individuals pointing out security flaws.

If anything Google is providing them a service by telling them about the problem.

--
Dacels Jewelers can't be trusted.
Re:Simple but burdensome solution by Codifex+Maximus · 2001-11-26 05:18 · Score: 2

While the idea is workable in a small subset of data, what about other sensitive data that is found in the public domain? Will Google and other search engines be responsible for hiding that too? Where does it end?

The burden of hiding information, that should *NOT* be there in the first place, should rest on the entity that posted the information publicly - the web-site.

Once information is *Published* can it be *UnPublished*?

--
Codifex Maximus ~ In search of... a shorter sig.
Re:Simple but burdensome solution by camusflage · 2001-11-26 05:56 · Score: 2

I never said it wasn't web monkey's fault. Yes, anyone who would do something like this doesn't deserve even the title of web monkey. This is simply a reaction, like a provider filtering inbound port 80 to staunch code red's effects.

--
The truth about Scientology, Xenu, and you: Operation Clambake
Re:Simple but burdensome solution by The+Pim · 2001-11-26 06:28 · Score: 2

Credit card numbers follow a known format (mod10). It should be simple, but somewhat intensive as far as search engines go, to scan content, look for 16 digit numeric strings, and run a mod10 on them. If it comes back true, don't put it into the index.
Among other things, this would have the amusing effect of blacklisting most web pages about credit card number validation.

--

The evaluation of an action as 'practical' . . . depends on what it is that one wishes to practice.
Re:Simple but burdensome solution by Xerithane · 2001-11-26 07:42 · Score: 2

I wasn't saying you said that. More supporting your argument and putting in my own 2c.

I just hope that the proper heads roll on this one.

--
Dacels Jewelers can't be trusted.
Re:Simple but burdensome solution by scrytch · 2001-11-26 11:19 · Score: 2

but with all the different formatting done on CC numbers (XXXX-XXXX-XXXX-XXXX, XXXXXXXXXXXXXXXX, etc) the algorithm could get ugly to maintain.

\d.?\d.?\d.?\d.?\d.?\d.?\d.?\d.?\d.?\d.?\d.?\d.?\d.?\d.?\d.?\d

should match >99% of cc numbers. And a lot of other dross, but you can just pipe it into a mod10 checker. Search engines shouldn't have to do this -- unless they're looking for cc numbers that is. Anyone who publishes confidential data that's crawlable without any kind of password protection should be liable for damages (hint: even if you have a CGI braindead enough to allow password=foo, you can still defeat it with sessionid's that time out.... 'course if you do that, just use the sessionid as the per-request auth in the first place)
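
A quick sketch of that pattern in Python's re module, to show both what it catches and the "dross". The helper name and sample strings are hypothetical; the pattern is built as 15 repetitions of "\d.?" plus a final "\d", which is the same 16-digit expression as above.

```python
import re

# Sixteen digits, each (except the last) optionally followed by any one character.
CANDIDATE = re.compile(r"\d.?" * 15 + r"\d")


def candidates(text: str) -> list[str]:
    """Return the digits of every loose match, with separators stripped."""
    return [re.sub(r"\D", "", match.group()) for match in CANDIDATE.finditer(text)]


print(candidates("order 4111-1111-1111-1111 shipped"))
print(candidates("1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6"))   # a plain list of numbers also matches
print(candidates("no numbers here"))
```

Because "." also matches a digit, long digit runs can yield matches containing more than 16 digits, so whatever comes next in the pipe has to re-check the length as well as the checksum.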

--
I've finally had it: until slashdot gets article moderation, I am not coming back.
Re:Simple but burdensome solution by Bronster · 2001-11-26 15:38 · Score: 3, Informative

\d.?\d.?\d.?\d.?\d.?\d.?\d.?\d.?\d.?\d.?\d.?\d.?\d.?\d.?\d.?\d

should match >99% of cc numbers. And a lot of other dross, but you can just pipe it into a mod10 checker

That puts the burden on me, the poor sap who wants to have my web pages indexed, to make sure that I don't accidentally put any numbers on a web site that might be misinterpreted as a credit card number (e.g. a tab or comma separated list of numbers would be likely to hit the above, especially if it was much longer than a CC number).

Not to mention the problem of recursive lookup on a long number (the first 2000 digits of pi are 3.1415926535.......) - it would take an age to make sure there were no CC no's in that.

All together, it would cause 'innocent' pages to not be indexed, which is distinctly suboptimal.
Re:Simple but burdensome solution by Xerithane · 2001-11-27 06:03 · Score: 2

That won't match anything useful, maybe I'm just missing something but it looks ... pointless.

That would match this hex string:
3c5e2a992b3c2a151...(don't feel like finishing typing it out)

and a plethora of other valid data. The reason why this algorithm is ugly is because not all numbers that pass mod10 are credit card numbers.
What does your hint mean anyway? What does that have to do with anything? Expiring sessions that remove themselves when they time out won't matter (which is relatively easy; you just have a scrub process stat the session lock and purge the session if access time is > timeout time).

--
Dacels Jewelers can't be trusted.
Re:Simple but burdensome solution by scrytch · 2001-11-27 14:15 · Score: 2

zzzzzzzzZZZZZZZOOOOOOMMMMM

the sound of my point going over the heads of both respondents. I was demonstrating how it was far from "prohibitively expensive" to scan for cc#'s, and a little more reading would have also revealed that I was saying that search engines SHOULDN'T have to be burdened with this nonsense, but it sure would be useful to people specifically looking for a cc#.

And my hint had to do with the fact that even password protected pages (that are not using http auth) can often be defeated by a link that contains a password, a fact that was pointed out well earlier in the discussion. Sessions that time out aren't susceptible to that problem, even a link with all the authentication info would still be invalid by the time most search engines got around to indexing it.

I'm gonna get pissy again: doesn't anybody read anymore?

--
I've finally had it: until slashdot gets article moderation, I am not coming back.
Re:Simple but burdensome solution by Xerithane · 2001-11-28 04:47 · Score: 2

Only if you made sense. Your regex was absolute drivel and didn't address any problem, nor would it work correctly, because my guess is you have no idea how credit card numbers are generated. They are mod10 numbers that have an algorithm based upon the expiration date (called the Luhn check, go google if curious) so if you have the expiration date (any 4 number string broken up in numerous different ways) and a 16 digit number you can match them up -- also anyone searching for credit card numbers with your regex is absolutely stupid and destined for failure.
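For reference, here is a minimal Python sketch of the mod 10 check being argued about (usually spelled Luhn). The standard checksum is computed from the digits of the number alone; the expiration date is not an input. The function names are made up for illustration. The second function shows why a passing checksum proves little: for any 15-digit prefix exactly one of the ten possible final digits passes, so roughly 1 in 10 arbitrary 16-digit strings will look "valid".

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn (mod 10) checksum."""
    if not (number.isascii() and number.isdigit()):
        return False
    total = 0
    # Walk the digits right to left, doubling every second one.
    for position, char in enumerate(reversed(number)):
        digit = int(char)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0


def passing_check_digits(prefix: str) -> list:
    """List every final digit that makes prefix + digit pass the checksum."""
    return [d for d in "0123456789" if luhn_valid(prefix + d)]


if __name__ == "__main__":
    # 4111111111111111 is a widely published test number, not a real card.
    print(luhn_valid("4111111111111111"))
    # The first 15 digits of pi stand in for "any old run of digits".
    print(passing_check_digits("314159265358979"))
```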

The point wasn't lost on us; it was just simply a silly and overly pointless response. Links that contain passwords are a bad idea, but most "real" sites that offer password protection with session authentication don't have problems like sessions lying around for hijacking.
Also, a search engine that crawls a link containing password information would initiate a login/session pair on that site (given it is a session-based site) or just a persistent login (which is a bad idea anyway).
Feel free to get pissy; you are still wrong. And if that regex is a real demonstration of your coding abilities, you are way out of your league, but I won't make a judgement about that - just a statement from a developer (who, incidentally, is writing a credit card processing engine at the moment) who knows the workings of credit card algorithms. And, on an unrelated side note just to vent: VeriFone sucks.

--
Dacels Jewelers can't be trusted.
Re:Simple but burdensome solution by scrytch · 2001-11-28 08:17 · Score: 2

My regex would be the first of a chain of filters, using a pipe or lazy evaluation, that would still throw out 99% of pages it crawled, passing the remaining piece to a more computationally expensive algorithm. This is called a heuristic, see? This is a win when you have two machines, see? The second filter would probably try to look for something like an expiration date, which is a smaller string and thus is less optimal as a search.
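The layering described here could be sketched like this in Python: a cheap regex pass throws away pages with no 16-digit run at all, and only the survivors reach the costlier checksum stage. This is an illustration under my own assumptions (a bounded 16-digit pattern with optional space or dash separators, made-up function names), not the regex posted earlier in the thread.

```python
import re

# Stage 1 (cheap): four groups of four digits, optionally separated by a
# space or dash, and not embedded in a longer run of digits. Assumed pattern.
CANDIDATE = re.compile(r"(?<!\d)\d{4}(?:[ -]?\d{4}){3}(?!\d)")


def luhn_valid(digits: str) -> bool:
    """Luhn (mod 10) checksum over a string of ASCII digits."""
    total = 0
    for position, char in enumerate(reversed(digits)):
        digit = int(char)
        if position % 2 == 1:
            digit = digit * 2 - 9 if digit > 4 else digit * 2
        total += digit
    return total % 10 == 0


def cheap_filter(pages):
    """Lazily yield only the pages that contain at least one candidate."""
    for page in pages:
        if CANDIDATE.search(page):
            yield page


def expensive_filter(pages):
    """Yield (page, numbers) for pages with a checksum-passing candidate."""
    for page in pages:
        hits = []
        for match in CANDIDATE.finditer(page):
            digits = re.sub(r"[ -]", "", match.group())
            if luhn_valid(digits):
                hits.append(digits)
        if hits:
            yield page, hits


def scan(pages):
    """Chain the two stages; nothing runs until the result is iterated."""
    return expensive_filter(cheap_filter(pages))
```

Because both stages are generators, the second one only ever sees what the first lets through, which is the "pipe or lazy evaluation" part. Whether the first stage really discards 99% of pages depends on the crawl, and nothing here measures that.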

I didn't realize I was going to have to teach a damn CS class about how to layer because some slashdoterati find it necessary to flaunt their coding dick size at every turn.

--
I've finally had it: until slashdot gets article moderation, I am not coming back.
Re:Simple but burdensome solution by Xerithane · 2001-11-28 08:53 · Score: 2

Well, why don't you start out by writing a decent initial layer. And you wouldn't want to use a regex, because of computational overhead that is unnecessary for a fixed-length comparison as you put it. If you understood the obvious problems with your initial filter you might be qualified to teach CS on slashdot. Although, I really would like to see you run a query like that as your primary 'non-expensive' regex... it'd be a good laugh when you could do the same test in 2 lines of code that require about half the overhead. Hell, with the overhead you save you could do a full mod10 check too - and actually get valid results. You wouldn't throw out 99% of the invalid pages either, because your regex wasn't limited to 16 characters - it wasn't even bounded, so any page that had any run of sequential numbers would be passed to the second layer. That's fine and dandy, but why not just do a comparison check that verifies it's a mod10 number in the first go? Oh right, because you are so l33t you don't need to worry about algorithm efficiency and overhead... my bad.

Posting code on slashdot does this, especially when you follow it up with a claim that you are some cool guy who understands all these principles of CS, when it turns out you really have no clue how to design a high-load, low-overhead algorithm to filter based on a relative scoring algorithm. In this case, it's a mod10 16-digit number scored at either 100% or 0% in the initial algorithm, thereby omitting any necessity for a pipe. The expiration date can also be computed after a CC number is found and inserted into an index query (passed to a second layer), and then a Luhn check used to calculate the proper pairing (with a score index based off of page/domain to filter the CCs to check against).

But hey.. you knew that right? This isn't coding, it's science. Big difference, go back to school.

--
Dacels Jewelers can't be trusted.

Google exploit patch for Apache by Anarchofascist · 2001-11-26 04:52 · Score: 4, Funny

% cd /var/www
% cat > robots.txt
User-agent: *
Disallow: /
^D
%
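For anyone wondering what that file actually does, here is a small Python sketch using the standard library's `urllib.robotparser` to evaluate it the way a well-behaved crawler would. The `allowed` helper, the user agent string and the example.com URL are placeholders. Note that this only tells compliant robots to stay out; it protects nothing from a crawler (or a person) that ignores it.

```python
from urllib.robotparser import RobotFileParser

# The two-line file from the post above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # With "Disallow: /" every path on the site is off limits.
    print(allowed(ROBOTS_TXT, "Googlebot", "http://example.com/admin/"))
```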

--
Once more unto the breach, dear friends, once more, Or close the wall up with our American dead!

Insert foot in mouth.... by Crewd · 2001-11-26 04:52 · Score: 2, Interesting

From the article :

"Webmasters should know how to protect their files before they even start writing a Web site," wrote James Reno, chief executive of Amelia, Ohio-based ByteHosting Internet Services. "Standard Apache Password Protection handles most of the search engine problems--search engines can't crack it. Pretty much all that it does is use standard HTTP/1.0 Basic Authentication and checks the username based on the password stored in a MySQL Database."

And chief executives of a hosting company should know how Basic Authentication works before hosting web sites...

Crewd

Re:Insert foot in mouth.... by simong · 2001-11-26 05:01 · Score: 2, Funny

Not necessarily, they are chief executives after all.

The Problem of Search Engines and "Sekrit" Data by NTSwerver · 2001-11-26 04:55 · Score: 4, Funny

Please change the title of this article to:

The Problem of Incompetent System Administrators

If data is 'sekrit'/sensitive/confidential - don't put it on the web. It's as simple as that. If that data is available on the web, search engines can't be blamed for finding it.

--
-----------------------
Moderator's essentials

Re:The Problem of Search Engines and "Sekrit" Data by Genom · 2001-11-26 06:53 · Score: 2

I wouldn't try and blame someone else if I left the keys in my car's ignition and someone stole it.

While it's true that it's a bonehead move to leave your keys in the ignition, the presumption that you would solely be to blame for the theft of your car would be wrong. The person stealing the car would still be to blame for the actual stealing; you're just making it hellishly easy for them to do so.

Now, in regards to search engines, it would be similar to leaving your keys in the ignition, and having a search helicopter see your car, land, and put up a big flashing neon sign saying "Hey! Whoever left this car here, your keys are still in the ignition!"

A car is a pretty bad analogy, though, when it comes to Google's cache - because cars don't replicate =)
Re:The Problem of Search Engines and "Sekrit" Data by mosch · 2001-11-26 07:18 · Score: 2

terrible analogy. a much better one would be if you left the keys in your car, and parked it under a sign that said 'free cars, it is 100% legal and will be appreciated if you would please remove any car you'd like from this parking lot'

This is what happens when you use frontpage... by Grip3n · 2001-11-26 04:55 · Score: 5, Informative

I'm a web developer, and I don't know how many times I've heard people who are just getting into the scene talking about making 'hidden' pages. I'm referring to those that are only accessible to those who click on a very tiny area of an image map, or perhaps find that 'secret' link at the bottom of the page. Visually, these elements seem 'hidden' to a user who doesn't really understand web pages and source code. However, these 'hidden' pages look like giant 'Click Here' buttons to search engines, which is what I'm presuming some of this indexing is finding.

The search engines cannot feasibly stop this from happening; each occurrence is unique unto itself. The only prevention tool is knowledge and education, and bringing to the masses a general understanding of search engine spidering theory.

Just my 2 cents.

--
To make a pun demonstrates the highest understanding of a language

Re:This is what happens when you use frontpage... by onion2k · 2001-11-26 05:02 · Score: 2

Often worse than that.. the dreaded visibility:hidden CSS/DHTML that the likes of Dreamweaver are so keen on.. what the eye can't see the robot certainly can..

--
http://twitter.com/onion2k
Re:This is what happens when you use frontpage... by EccentricAnomaly · 2001-11-26 05:51 · Score: 2, Insightful

C|Net seems to think the security problem is with Google:

"The guys at Google thought, 'How cool that we can offer this to our users' without thinking about security. If you want to do this right, you have to think about security from the beginning and have a very solid approach to software design and software development that is based on what bad guys might possibly do to cause your program grief."

This is crazy. Google isn't doing anything wrong. The problem is with the idiots who don't spend five minutes to check that their secret data is really hidden.

This is like blaming a dog owner when his dog bites a burglar... er, uh, never mind.

--
There are 10 types of people in this world, those who can count in binary and those who can't.
Re:This is what happens when you use frontpage... by Sabalon · 2001-11-26 16:02 · Score: 2

Yuh...Ford thought the same thing - we have this car that we can offer to our users without thinking about how it could be used in a crime.

Didn't the courts just say this was a bogus argument in gun crimes?

Example by squaretorus · 2001-11-26 04:55 · Score: 5, Informative

I recently joined an angel organisation to publicise my business in an attempt to raise funds. The information provided to the organisation is supposed to be secret, and only available to members of the organisation via a paper newsletter which was reproduced in the secure area of the organisation's website.
A couple of months down the line a couple of search engines, when asked about 'mycompanyname' were giving the newsletter entry in the top 5.

Alongside my details were those of several other companies. Essentially laying out the essence of the respective business plans.

How did this happen? The site was put together with FP2000, and the 'secure' area was simply those files in the /secure directory.

I had no cause to view the website prior to this. The site has been fixed on my advice. How did this come about? No one in the organisation knew what security meant. They were told that /secure WAS!

It didn't do any damage to myself, but a few of the other companies could have suffered if their plans were found. It's not Google's job to do anything about this, it's the webmaster's. But a word of warning - before you agree for your info to appear on a website, ask about the security measures. They may well be crap!

I've got a solution! by CraigoFL · 2001-11-26 04:56 · Score: 5, Funny

Every web server should have a file in their root directory called "secret.xml" or somesuch. This file could list all the publicly-accessible URLs that have all the "secret" data such as credit card numbers, root passwords, and private keys. Search engines could parse this file and then NOT include those URLs in their search results!

Brilliant, huh? ;-)

On second thought, maybe I shouldn't post this... some PHB might actually think it's a good idea.

Re:I've got a solution! by Plutor · 2001-11-26 07:29 · Score: 2

That's a fantastic idea! Altho it should be a text file, and we should call it robots.txt or something like that.

(Note: this kind of thing already exists, and it's already called robots.txt)
Re:I've got a solution! by po_boy · 2001-11-26 07:36 · Score: 2

s/secret.xml/robots.txt/g
http://www.robotstxt.org/wc/norobots.html

Google exploit patch 0.2 for Apache by Anarchofascist · 2001-11-26 04:57 · Score: 2, Funny

Oops! Version 0.2 already:

% cat > /var/www/html/robots.txt
User-agent: *
Disallow: /
^D
%

--
Once more unto the breach, dear friends, once more, Or close the wall up with our American dead!

Bad.. but by boaworm · 2001-11-26 04:58 · Score: 2

Such problems have existed for quite a while. Hackings, Crackings, internet sniffing etc.

The real issue is not if you can.. but if you actually do use the information. Regardless of if it is available or not, it IS ILLEGAL. (Carding does give rather long prison times as well)

People have had the chance to steal from other people for as long as mankind has existed. This is just another form... perhaps a bit simpler though ...

--
Probable impossibilities are to be preferred to improbable possibilities.
Aristotele

How this happens by Tom7 · 2001-11-26 04:59 · Score: 5, Informative

People often wonder how their "secret" sites get into web indices. Here's a scenario that's not too obvious but is quite common:

Suppose I have a secret page, like:
http://mysite.com/cgi-bin/secret?password=administrator

Suppose this page has some links on it, and someone (maybe me, maybe my manager) clicks them to go to another site (http://elsewhere.com/).

Now suppose elsewhere.com runs analog on their web logs, and posts the reports in a publicly accessible location. Suppose elsewhere.com's analog setup also reports the contents of the "referer" header.

Now suppose the web logs are indexed (because of this same problem, or because the logs are just linked to from their web page somewhere). Google has the link to your secret information, even though you never explicitly linked to it anywhere.

One solution is to use proper HTTP access control (as crappy as it is), or to use POST instead of GET to supply credentials (POST doesn't transfer into a URL that might be passed as a referrer). You could also use robots.txt to deny indexing of your secret stuff, though others could still find it through web logs.
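To make the GET versus POST point concrete, here is a small Python sketch using only the standard library. No request is actually sent; it just builds both kinds of request and looks at what ends up in the URL, since the URL of the page is what travels in the Referer header. The host name `mysite.example` and the helper names are placeholders of mine.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

BASE = "http://mysite.example/cgi-bin/secret"   # hypothetical endpoint
CREDENTIALS = {"password": "administrator"}


def build_get(base, params):
    """Credentials go into the query string, i.e. into the URL itself."""
    return Request(base + "?" + urlencode(params))


def build_post(base, params):
    """Credentials go into the request body; the URL stays clean."""
    return Request(base, data=urlencode(params).encode("ascii"))


def secrets_in_url(request):
    """Return the query parameters that a Referer header would carry along."""
    return parse_qs(urlsplit(request.full_url).query)


if __name__ == "__main__":
    print(secrets_in_url(build_get(BASE, CREDENTIALS)))
    print(secrets_in_url(build_post(BASE, CREDENTIALS)))
```

POST only keeps the password out of the URL (and so out of referrer logs); over plain HTTP it still crosses the wire in the clear.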

Of course, I don't think credit card info should *ever* be accessible via HTTP, even if it is password protected!

Re:How this happens by Garfunkel · 2001-11-26 05:05 · Score: 2, Informative

ah yes, analog's reports (and other web stat programs) are a big culprit as well. Even on local sites. Say I have a /sekrit/ site that isn't linked to from anywhere on my site, but I have a bookmark that I visit often. That still shows up in the web logs and usually gets picked up by a web log analyzer, which can "handily" create links to all those pages when it generates the report.

--
-jay
Re:How this happens by frankie · 2001-11-26 05:20 · Score: 2, Troll

Suppose I have a secret page, like: http://mysite.com/cgi-bin/secret?password=administrator
Then it's a pretty crappy secret. Plaintext passwords sent via GET are weaker than the 4 bit encryption in a DVD or something.
Suppose this page has some links on it, and someone (maybe me, maybe my manager) clicks them to go to another site (http://elsewhere.com/).
If the page is really truly supposed to be secret, then it won't have external links, and you'll filter it out of your web logs too. Or you could just suck.
Google doesn't kill secrets. PHBs and MCSEs kill secrets.
Re:How this happens by mlinksva · 2001-11-26 07:36 · Score: 2

If you do have a "sekrit" internal page with external links, you can prevent the referer from being sent by changing your links from
<a href="http://foo.com">Before Bar</a>
to
<a href="javascript:window.location='http://foo.com'" >Before Bar</a>

Where is YOUR speech license? by Unknown+Poltroon · 2001-11-26 04:59 · Score: 2

Where have you put your license to speak your mind on slashdot? Surely, people can't go around putting anything they want to say into a public forum. They might say anything. As a matter of fact, we must revoke people's phone privileges until they can prove they're smart enough not to give out credit card numbers to telemarketers. As a matter of fact, let's just legislate intelligence. We can tack it on as a rider for that bill to make Pi = 3.
You're a nitwit. I'm revoking your speech license on slashdot.

--
All Troll + "offtopic" mods are meta moderated as "Unfair", because you abused the system.

Basic Authentication by KyleCordes · 2001-11-26 05:01 · Score: 3, Insightful

[know how Basic Authentication works before hosting web sites]

... and know that it's a wholly inadequate way of "protecting" credit card numbers!

robots.txt by mukund · 2001-11-26 05:02 · Score: 2, Interesting

From my web logs, I see that a lot of HTTP bots don't care crap about /robots.txt. Another thing which happens is that they read robots.txt only once and cache it forever in the lifetime of accessing that site, and do not use a newer robots.txt when it's available. It'd be useful to update what a bot knows of a site's /robots.txt from time to time.

HTTP bot writers should adhere to the information in /robots.txt and restrict their access accordingly. On a lot of occasions, webmasters may set up /robots.txt to actually help stop bots from feeding on junk information which they don't require.. or things which change regularly and need not be recorded.
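The refresh behaviour asked for above is not much code. Here is one way a bot might do it in Python, assuming the bot supplies its own `fetch` function and that a one-day maximum age is acceptable; both are my assumptions, as is the `RobotsCache` name. Parsing is left to the standard `urllib.robotparser`.

```python
import time
from urllib.robotparser import RobotFileParser


class RobotsCache:
    """Keep one parsed robots.txt per site and re-fetch it once it is stale."""

    def __init__(self, fetch, max_age=24 * 3600, clock=time.time):
        self._fetch = fetch          # callable: site -> robots.txt text
        self._max_age = max_age      # seconds before a cached copy is stale
        self._clock = clock          # injectable for testing
        self._cache = {}             # site -> (fetched_at, parser)

    def _parser_for(self, site):
        now = self._clock()
        entry = self._cache.get(site)
        if entry is None or now - entry[0] > self._max_age:
            parser = RobotFileParser()
            parser.parse(self._fetch(site).splitlines())
            entry = (now, parser)
            self._cache[site] = entry
        return entry[1]

    def can_fetch(self, user_agent, site, path):
        """Check path against the (possibly refreshed) rules for site."""
        return self._parser_for(site).can_fetch(user_agent, site + path)
```

A webmaster who adds a Disallow line then only has to wait out `max_age`, rather than the lifetime of the crawl.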

--
Banu

Re:robots.txt by mobiGeek · 2001-11-26 07:56 · Score: 2

Though I do concur that 'bots should respect the robots.txt protocol, one must remember that /robots.txt does not solve the problem being highlighted by this article.

--
...Beware the IDEs of Microsoft...
Re:robots.txt by WNight · 2001-11-26 18:50 · Score: 2

Why should bot authors have to follow robots.txt as if it's a law or something?

Robots.txt is as much for their benefit as it is for the benefit of the site author. It saves the bot from indexing something that might confuse it, CGIs that auto-generate an infinite number of pages for example.

If the bot author knows this, and wants to see these, why shouldn't they read them?

Bringing this up in an article about google suggests that they don't follow the robots.txt (even though you didn't say it directly). And it implies you think this would have been fixed had they.

When do site authors actually have to take responsibility for this? If you really object to someone mirroring or indexing your site, block that. Either with the user agent, or by detecting sequential accesses, or something.

Re:To test your credit-card ordering site... by Legion303 · 2001-11-26 05:02 · Score: 2

go to Google, type in "site:yourdomain.com xxxx-xxxx-xxxx-xxxx" where the x's are the credit card number of a known customer.

Then watch the fraudulent charges fly when the person who was sniffing cleartext HTTP traffic gets it in his logs.

-Legion

Many crawlers ignore robots.txt by Ars-Fartsica · 2001-11-26 05:02 · Score: 3, Interesting

I do not know if this is still the case, but Microsoft's IE offline browsing page crawler (collects pages for you to read offline) ignored robots.txt last time I checked. I know many other crawlers do likewise.

Re:Many crawlers ignore robots.txt by Syberghost · 2001-11-26 06:33 · Score: 2

Other crawlers that do listen to robots.txt can be duped into effectively ignoring it.

For example, try this with wget sometime:

wget -r somesitethathasrobots.txt
su -
chown root:root robots.txt
cat /dev/null >robots.txt
chmod 0000 robots.txt
exit
wget -r somesitethathasrobots.txt

voila, wget now thinks it's observing robots.txt, but robots.txt is a zero-length file, and it can't overwrite it because only root can write to that file...
Re:Many crawlers ignore robots.txt by tjwhaynes · 2001-11-26 06:51 · Score: 2

That is a little extreme! Just add 'robots = off' to your .wgetrc file and wget will ignore any robots.txt on the site it is crawling.
This is extremely bad netiquette so DON'T DO IT

Cheers,

Toby Haynes

--
Anything I post is strictly my own thoughts and doesn't necessarily have anything to do with the opinions of IBM.
Re:Many crawlers ignore robots.txt by jesser · 2001-11-26 15:13 · Score: 2

Why is it bad netiquette to use wget on sites that use robots.txt? Robots.txt is aimed at search engines and is primarily used to keep search engines from downloading dynamic data or an infinite number of pages. I only use wget to avoid downloading a large number of links manually, and I'm always careful to make sure I only download what I'm trying to download.

--
The shareholder is always right.
Re:Many crawlers ignore robots.txt by Syberghost · 2001-11-26 17:44 · Score: 2

True, but my example is more extendible to the general case of ANY crawler that observes robots.txt.

Oh, for regular expression searching in Google by EnglishTim · 2001-11-26 05:03 · Score: 5, Funny

I could be a rich man...

(Not, of course that I'd ever do anything like that...)

Searching with regular expressions would be cool, though...

Directory listings by NineNine · 2001-11-26 05:03 · Score: 2, Informative

Most of this is coming from leaving directory listing turned on. Generally, this should only be used on HTTP front-ends to FTP boxes, and for development machines. IIS has "directory browsing" turned off by default. Maybe Apache has it turned on by default? You'd be surprised to see how many public webservers have this on, making it exceedingly likely that search engines will find files they weren't meant to find. The situation arises when there's no "default" page (usually index.html or default.html, default.asp, etc.) in a directory and only a file like content.html in it. If a search engine tries http://domain.com/directory/, it'll get the directory listing, which it can, in turn, continue to spider.
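If you want to check your own server for this, one crude test is to fetch a directory URL and see whether the response looks like an auto-generated listing. The Python sketch below only does the "looks like" part, and it assumes the Apache-style page title "Index of /..." (the same string the Google searches earlier in the thread key on). Other servers title their listings differently, so treat the pattern and the function name as assumptions.

```python
import re

# Title used by Apache-style auto-generated listings; other servers differ.
LISTING_TITLE = re.compile(r"<title>\s*Index of /", re.IGNORECASE)


def looks_like_directory_listing(html: str) -> bool:
    """Heuristic: does this page look like an auto-generated file listing?"""
    return bool(LISTING_TITLE.search(html))


if __name__ == "__main__":
    sample = "<html><head><title>Index of /admin</title></head><body></body></html>"
    print(looks_like_directory_listing(sample))
```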

Must... blame... someone.... by JMZero · 2001-11-26 05:04 · Score: 3, Funny

INetPub means "INetPublic" not "INetPubrobably a great place to put my credit card numbers".

Why are stupid people not to blame for anything anymore?

--
Let's not stir that bag of worms...

Business Model by Alomex · 2001-11-26 05:05 · Score: 5, Funny

A while back there was a thread here about the weakness of the revenue model for search engines. Maybe we have found the answer, think about all the revenue that Google could generate with this data!

Anybody knows when Google is going public?

well golly gosh, it works! by Anonymous Coward · 2001-11-26 05:05 · Score: 2, Informative

search for: password admin filetype:doc

My first hit is:

www.nomi.navy.mil/TriMEP/TriMEPUserGuide/WordDocs/Setup_Procedures_Release_1.0e.doc

at the bottom of the html:

UserName: TURBO and PassWord: turbo, will give you unlimited user access (passwords are case sensitive).

Username: ADMIN and PassWord: admin, will give you password and system access (passwords are case sensitive).

It is recommend that the user go to Tools, System Defaults first and change the Facility UIC to Your facility UIC.

oh dear, am I now a terrorist?

Bring out the legal eagles by Milican · 2001-11-26 05:06 · Score: 4, Insightful

"Webmasters should know how to protect their files before they even start writing a Web site"

That quote sums up the exact problem. It's not Google's fault for finding out what an idiot the web merchant was. As a matter of fact I thank Google for exposing this problem. This is nothing short of gross negligence on the part of any web merchant to have any credit card numbers publicly accessible in any way. There is no reason this kind of information should not be under strong security.

To have a search engine discover this kind of information is despicable, unprofessional, and just plain idiotic. As others have mentioned, these guys need to get a firewall, use some security, and quit being such incredible fools with such valuable information. Any merchant who exposes credit card information through the stupidity of Word documents or Excel spreadsheets on their public web server, or any non-secure server of any kind, deserves to get sued into oblivion. Although people usually don't like lawyers, I'm really glad we have them in the US because they help stop this kind of stuff. Too many lazy people don't think it's in their best interest to protect the identity or financial security of others. I'm glad lawyers are here to show them the light :)

JOhn

--
Campaign for Liberty

Re:Bring out the legal eagles by malkavian · 2001-11-26 06:52 · Score: 2

Hmmm.. All I've seen lawyers do so far (in the most part anyhow) is be employed by the people with the money (i.e. the stupid people who are likely to put insecure documents on the web) to make it illegal to look at the stuff they don't want you to see.
The credit card data, although in a public area was "Not authorized for Transmission". Which means that any access by a bot was unauthorized access to the machine in question.
This, as I understand it, is now being classified as a terrorist act. If not, at least a highly illegal action.
Thus, Search engines are now tools of terrorists/criminals. The provision of those numbers (if they are ever used) could be set up in a case as direct theft by the search engine, or complicity in the final actions. The owners of the search engine could probably end up in court if owners of 'sensitive information' ended up deciding to pass the blame on and sue.
This is going to be another license to print money by a few Lawyers who decide it's worth fighting a few cases.
I'm hoping sense prevails, but I think in the long run, some silly person is likely to sue...
Re:Bring out the legal eagles by Milican · 2001-11-27 07:41 · Score: 2

That's a good point. I guess the only way a firewall could help is if you set up another server (web or other) on another port and then only allowed selective access to that directory. Of course, a better (and more secure) approach would be to only allow secret data to be accessible via the intranet or VPN. However, this isn't very likely since most small shops are hosted remotely and have multiple (100s) of web stores per server. Great point though: firewalls would not help in this case.

JOhn

--
Campaign for Liberty

Re:Nothing to do by Jburkholder · 2001-11-26 05:08 · Score: 2

As far as I can tell from checking out the article and then trying this myself on Google, you can now target your search to specific filetypes. If you are dumb enough to store passwords or credit card numbers in an xls file on your website, Google makes it easy to find.

I'm at a loss to explain how someone puts sensitive information on the web in an unprotected location and then points the finger at google because they made it easier to find.

"We have a problem, and that is that people don't design software to behave itself," said Gary McGraw, chief technology officer of software risk-management company Cigital, and author of a new book on writing secure software.

"The guys at Google thought, 'How cool that we can offer this to our users' without thinking about security. If you want to do this right, you have to think about security from the beginning and have a very solid approach to software design and software development that is based on what bad guys might possibly do to cause your program grief."

No easy solution in sight? by vrmlguy · 2001-11-26 05:08 · Score: 2

From the article: "The guys at Google thought, 'How cool that we can offer this to our users' without thinking about security." -- Gary McGraw

Allow me to disagree. This fellow apparently agrees with Microsoft that people shouldn't publish code exploits and weaknesses. Sorry, but anyone who had secret information available to the external web is in the exact same boat as someone who has an unpatched IIS server or is running SQL Server without a password.

Let's assume that Google had (a) somehow figured out from day one that people would search for passwords, credit card numbers, etc, and (b) figured out some way to recognize such data to help keep it secret. Should they have publicized this fact or kept it a secret? Publicity would just mean that every script kiddie would be out writing their own search engines, looking for the things that Google et al were avoiding. Secrecy would mean that a very few black hats would write their own search engines, and the victims of such searches would have no idea how their secrets were being compromised.

But this assumes that there's some way of accomplishing item (b), which I claim is very difficult indeed. In fact, it would be harder to accomplish than natural language recognition. Think about it... Secrets are frequently obscure, to the point that to a computer they look like noise. Most credit cards, for example, use 16 digit numbers. Should Google not index any page containing a string of 16 consecutive digits? How about pages that contain SQL code? How would one not index those, but still index the on-line tutorials at MySQL, Oracle, etc?

The only "solution" is to recognize that this problem belongs in the lap of the web site's owner, and the search engine(s) have no fundamental responsibility.

--
Nothing for 6-digit uids?

And please close the door on the way out.... by pwagland · 2001-11-26 05:09 · Score: 2

But other critics said Google bears its share of the blame.
"We have a problem, and that is that people don't design software to behave itself," said Gary McGraw, chief technology officer of software risk-management company Cigital, and author of a new book on writing secure software.
"The guys at Google thought, 'How cool that we can offer this to our users' without thinking about security. If you want to do this right, you have to think about security from the beginning and have a very solid approach to software design and software development that is based on what bad guys might possibly do to cause your program grief."

Am I the only one scared by this? The problem is Google's, simply because they follow links? I find it hard to believe this stuff sometimes!

<rant>When will people learn that criminals don't behave? That is what makes them criminals!</rant>

As our second year uni project we were required to write a web index bot. Guess what? It didn't "behave". It would search through a robots.txt roadblock. It would find whatever there was to find. This stuff is so far from being rocket science it is ridiculous!

Sure, using Google might ease a tiny fraction of the bad guys' work, but if Google wasn't there, the bad guys' tools would be. In fact, they still are there.

Saying that you have to write your client software to work around server/administrator flaws is like putting a "do not enter" sign on a tent. Sure, it will stop some people, but the others will just come in anyway, probably even more so just to find out what you are hiding.

True. by tomblackwell · 2001-11-26 05:09 · Score: 2

It will stop the casual perusal of your data.

The way to stop the determined snooper is to not keep your data in a directory that can be accessed by your web server.

Sure enough. by Joe+Decker · 2001-11-26 05:12 · Score: 3, Interesting

Looked up the first 8 digits of one of my own CC numbers, and, while I didn't find my own CC # on the net, I did immediately find a large file full of them with names, expiration dates, etc. (Sent a message to the site manager, but this case is pretty clearly an accidental leak.)

At any rate--scary it is.

--
I'm a nature photographer.

Re:Sure enough. by Nate+Fox · 2001-11-26 09:33 · Score: 2

OMG...I did the same thing. I found a 1.3M text file with ~3500 CC#s, names, expiry dates, et al. Heck, it's a comma delimited file with email addresses and even comments on 'how did you hear about our site?' (to which one person responded: My friend at school told me about it and then i heard it with my ears.)
Actually, it seems as if it was obtained from PimpIT.com, so I suggest that anyone who has bought anything from them online look into it.
(thankfully my name isn't in there)

Don't know that this is Google's problem.. by sid_vicious · 2001-11-26 05:13 · Score: 2

From the article:
"The guys at Google thought, 'How cool that we can offer this to our users' without thinking about security. If you want to do this right, you have to think about security from the beginning and have a very solid approach to software design and software development that is based on what bad guys might possibly do to cause your program grief."

Search and replace "Google" with "Microsoft". The lack of security is in the operating system and the applications which launch the malicious files without warning the user. Google just tells you where to get 'em, not what to do with 'em.

--
If it ain't broke, it doesn't have enough features yet.

Web Sites are public by definition by hattig · 2001-11-26 05:14 · Score: 4, Insightful

It is a simple rule of the web - any directory or subdirectory thereof that is configured to be accessible via the internet (either html root directories, ftp root directories, gnutella shared directories, etc) should be assumed to be publicly accessible. Do not store anything that should be private in these areas.

Secondly, it appears that companies are storing credit card numbers (a) in the clear and (b) in these public areas. These companies should not be allowed to trade on the internet! That is so inept when learning how to use pgp/gpg takes no time at all, and simply storing the PGP encrypted files outside the publicly accessible filesystem is just changing the line of code that writes to "payments/ordernumber.asc" to "~/payments/ordernumber.asc" (or whatever). Of course, the PGP secret key is not stored on a publicly accessible computer at all.
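
A minimal sketch of that one-line change, assuming a Python order handler. The helper names (is_under, safe_payment_path) and the example paths are hypothetical, and the PGP step itself is left out; the only thing shown is a check that the .asc file lands outside the web root.

```python
from pathlib import Path


def is_under(root: Path, candidate: Path) -> bool:
    """Return True if candidate resolves to a location inside root."""
    root = root.resolve()
    candidate = candidate.resolve()
    return candidate == root or root in candidate.parents


def safe_payment_path(docroot: Path, payments_dir: Path, order_number: str) -> Path:
    """Build the output path for an encrypted order, refusing the web root."""
    # Hypothetical rule: order numbers are plain alphanumerics, so they
    # cannot smuggle in "../" and climb back into the document root.
    if not order_number.isalnum():
        raise ValueError("order number must be alphanumeric")
    target = payments_dir / f"{order_number}.asc"
    if is_under(docroot, target):
        raise ValueError("payments directory is inside the web root")
    return target
```

With a docroot of /var/www/html, a payments directory of /home/shop/payments is accepted, while /var/www/html/payments raises an error instead of quietly publishing the file.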

But I shouldn't be giving a basic course on how to secure website payments, etc, to you lot - you know it or could work it out (or a similar method) pretty quickly. It is those dumb administrators that don't have a clue about security that are to blame (or their PHB).

Disagree With Gary McGraw by devnullkac · 2001-11-26 05:16 · Score: 4, Insightful

Near the end of the article, there's a quote from Gary McGraw:

The guys at Google thought, 'How cool that we can offer this to our users' without thinking about security. If you want to do this right, you have to think about security from the beginning and have a very solid approach to software design and software development that is based on what bad guys might possibly do to cause your program grief.

I must say I couldn't disagree more. To suggest that web site administrators can somehow entrust Google to implement the "obscurity" part of their "security through obscurity" plan is unrealistic. As an external entity, Google is really just another one of those "bad guys" and the fact that they're making your mistakes obvious without actually exploiting them is what people where I come from call a Good Thing.

--
What do you mean they cut the power? How can they cut the power, man? They're animals!

Re:Disagree With Gary McGraw by Software · 2001-11-26 06:49 · Score: 2

I agree with your disagreement. The amusing part is that, in the proper context, McGraw's second sentence in his statement makes perfect sense. However, given the context here, it's nonsense. Google is not the insecure system here. It's the silly webmasters who have secret data at publicly accessible URLs that are the problem. Nobody cracked Google to get sensitive data - it's doing what it said it would do. From the quote, it would seem like people are abusing Google; instead, it's the webmasters who are abusing the users who entrusted them with sensitive data.
I would not say, though, that Google is making the webmasters' mistakes obvious. Google doesn't notify webmasters, "Hey, you're an idiot. Fix your site". Furthermore, if I'm a webmaster who thinks there might be some sensitive info from my site in Google, how do I use Google to find it? OK, I could figure out how to search Google for pages only from my site that contain "passwords" or something like that, but that's a bit much for a clueless webmaster to do. If he thought that might reveal a problem, he should know where to look without checking Google. I'm not faulting Google; it's not Google's responsibility to hit webmasters with the clue stick.
Unless McGraw's statement was taken hopelessly out of context (which is quite likely), he's an idiot. It's not Google's responsibility to think about security of other people's sites.

standing naked in front of the window by eddy+the+lip · 2001-11-26 05:17 · Score: 3, Interesting

But other critics said Google bears its share of the blame.

"We have a problem, and that is that people don't design software to behave itself," said Gary McGraw, chief technology officer of software risk-management company Cigital, and author of a new book on writing secure software.

also known as ostrich security...if your s00p3r s3cr37 files are just lying around waiting for idle surfers, search engines are the least of your worries. if you don't know enough to protect your files (by, say, not linking to them, or .htaccess files, or encrypting them), it's not the search engine's fault. it's your own dumb ass.

this guy's just looking for free hype for his book. if that's the kind of advice he offers, he's doing more harm than good.

--

This is the voice of World Control. I bring you Peace.

Re:standing naked in front of the window by SirSlud · 2001-11-26 05:34 · Score: 2

I agree. I was going to say the same thing as the subject of your post .. should the census guy get charged with being a peeping tom if he comes up to your house while you're buck naked in front of an open living room window?

> .. "software to behave itself."

When asked to clarify further, Gary said, "Uh .. you know, like .. uh, C3PO .. and .. uh, Data." I hate idiots like that ... I suppose he thinks Windows should 'ask' you if you want to install viruses (cause heaven forbid a user should have to know anything about protecting their computer), and your hard drive should kindly suggest you upgrade a few days before it bites the dust? Yeah .. technology .. that's the problem .. we just haven't invented anything perfectly enough yet. Sigh .. grab the O'Reilly, it's time for a good old-fashioned CTO beating.

--
"Old man yells at systemd"
Re:standing naked in front of the window by SirSlud · 2001-11-26 07:28 · Score: 2

Yep. Well, unfortunately, we're still stuck in the rut of thinking technology solves social problems. Fraud is a social problem, but thinking we can create technology to prevent it (beyond reasonable measures that inconvenience no one but go a long way to prevent spur-of-the-moment offences and casual fraud) is a paradox. Creating technology that cannot be used for questionable purposes is impossible. People who study the interaction between social behaviour and technology know it's pretty much the other way around; technology changes and evolves human behaviour above and beyond how it was pre-adoption of a technology, but never ever causes humans to /stop/ doing something.

For instance, the breathalyzer that may be installed on car ignitions in the future may prevent some drinking and driving, but it is far more likely to change people's behaviours surrounding the issue of drunk driving - for instance, it may form a social pattern whereby sober friends help drunk friends start their cars. Engineers (including software engineers) are unable to divine how their innovations will be used (ie, nobody can predict the future beyond reasonable assumptions). Should we be putting the inventor of the aerosol whipped cream can in jail for getting all those high school students high ... ?

More poignant than the actual way the government goes about assigning blame for misused technology is the hypocrisy of it all. Considering the pace at which we are forced to invent and deploy to drive capitalism, these types of situations should be considered the cost of doing business the way it's currently done. Now we are stuck with an infrastructure that only a MINUTE percentage of the population actually understands, and the creators of that technology are the ones being blamed for its mismanagement and misuse. Sigh. I can only hope that in 30 years, the population will be more in tune with the limitations of computer and software technology, in the same way that people have enough of an understanding of cars now so as not to immediately lay blame on the manufacturer when someone drives at 180kph into their local river.

--
"Old man yells at systemd"
Re:standing naked in front of the window by SirSlud · 2001-11-26 09:09 · Score: 2

> cancel the merchants account of anyone compromising data that badly

Which goes back to my point about the public understanding of a technology. You can't do this right now, because in the public's eyes, webserver admin = smart guy, web crawler = invasive nameless/screenshotless technology. Therefore, public opinion, should push come to shove, would likely fall on the shoulders of Google. I would imagine, in this case, that Joe Shmoe likens Google seeing your credit card number to hooligans breaking into your house, not as the census taker who spots you naked through your curtainless bedroom window.

As He Who Owns And Runs What becomes clearer to the average joe (usually through sitcom jokes, cliches, movies, Journalists Finally Getting It (it usually happens eventually after years of a technology existing) etc), the consumer will know where the brunt of the blame lies. And while we have too much blame in our society, it does have its legitimate place (as in, publicly accepted accountability and a fair assessment of a failure), so effort to make sure the blame is going to the proper places, and making sure the support from the public is directed at the correct entities, will enable the accountability in situations to fall where it should; in this case, the users operating/admin'ing the webserver. But until that admin isn't your uncle or brother (hopefully, it'll be 'you' as being able to offer services becomes more accessible to users), and Google is seen as more of a census taker than a peeping tom, it's unlikely that the public at large will know who to pressure for administrative mismanagement of your sensitive data, and its subsequent accessibility via search engines.

--
"Old man yells at systemd"

Hint, Hint. by A_Non_Moose · 2001-11-26 05:22 · Score: 2, Insightful

"The underlying issue is that the infrastructure of all these Web sites aren't protected."

Agreed. Such lax security via the use of Frontpage, IIS, .asp and VBS in webpages.
You might as well do an impression of Donkey in the movie Shrek "Ooo! Ooo! pick me! pick me!"

Webmasters queried about the search engine problem said precautions against overzealous search bots are of fundamental concern.

Uhh...they are "bots"...they don't think, they do.
Does the bot say "Oh, look, these guys did something stupid...let's tell them about it."

No, they search, they index and they generate reports.

I've seen this problem crop up before when a coworker was looking for something totally unrelated on google.
Sad part was it was an ISP I had respect for, despite moving from them to broadband.
What killed my respect was at the very top of the pages was "Generated by Frontpage Express"...gack!

I don't recall if it was a user account or one of their admin accounts...but for modem access I kind of stopped recommending them, or pointed out my observations.

I have to parrot, and agree, with the "Human Error" but add "Computer accelerated and amplified".

It happens, but that does not mean we have to like it, much less let it keep happening.

--
Have you read the moderator guidelines? Well, have you, PUNK? (and I want a Karma: Gnarly option)

fun by British · 2001-11-26 05:27 · Score: 2

Try doing a search for the file WinsockFTP leaves (WS_FTP.LOG?). You'll get hundreds upon hundreds of results, and you just might find unlinked files mentioned in it.

Of course, there's always good fun going into the /images/ directory (since virtual directories are on by default) on any angelfire user pages. Often you'll find images that the user didn't intend for the public to see.

Of course, there's the old-fashioned way. If you see an image at http://www.whatever.com/boobie3.jpg, chances are there's a boobie1 and boobie2.jpg.

funny? by 3am · 2001-11-26 05:30 · Score: 2

are you kidding?

they are talking about sensitive personal information - just don't store this online.

if you really need to access something (that isn't a credit card number... just don't do that!) and don't have physical access to the box, try SSH or at least make sure it's a secure directory (httpS://blah/mystuff...)

--

A: None. The Universe spins the bulb, and the Zen master merely stays out of the way.

Totally unrelated to Google by Sloppy · 2001-11-26 05:33 · Score: 2

If Google can find it, then a human with a web browser can find it. That's all there is to it. Have info you don't want to share? Then don't share it!

--
As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.

Re:regular expressions to the rescue by 3am · 2001-11-26 05:39 · Score: 2

or these sloppy admins could store them in encrypted form and/or in a private directory....

i'm sure google knows of a dozen ways they can do this, but why should they? it isn't prohibitively hard to write a spider, and with a 160GB HD for $300, someone with not-so-pure motives and the equivalent of an undergrad education in CS could write one, send it out (ignoring robots.txt), do the reverse of that regex search to sniff out cc#'s online, and create a database full of beer money.

ie, (as has been mentioned n+1 times already) Google changing their behavior does nothing to fix the underlying problem of sysadmins that are undertrained and/or irresponsible.
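
Turned around, the same kind of pattern match is something an admin could run over their own web root before a spider does. A minimal sketch, assuming card numbers of 13 to 19 digits with optional spaces or dashes; the names (CARD_PATTERN, luhn_ok, find_card_like) are made up for this example, and the Luhn checksum is only there to cut down on false hits from other long numbers.

```python
import re

# Runs of 13-19 digits, optionally separated by single spaces or dashes.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_ok(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_like(text: str) -> list:
    """Return digit strings in text that look like payment card numbers."""
    hits = []
    for match in CARD_PATTERN.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if 13 <= len(digits) <= 19 and luhn_ok(digits):
            hits.append(digits)
    return hits
```

Anything this flags inside a directory the web server can reach is a file that should not be there in the first place.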

--

A: None. The Universe spins the bulb, and the Zen master merely stays out of the way.

Why are they blaming the search engines? by eison · 2001-11-26 05:46 · Score: 2

"But other critics said Google bears its share of the blame."
Why?!

Google is finding documents that any web browser could find. The fault belongs to the idiots who publicly posted sensitive documents in the first place. Why doesn't the article mention this anywhere? Garbage reporting if I've ever seen it.

--
is competition good, or is duplication of effort bad?

Blaming Google for this... by night_flyer · 2001-11-26 05:52 · Score: 3, Funny

Is like blaming the Highway department for speeders...

--

Thanks to file sharing, I purchase more CDs
Thanks to the RIAA, I buy them used...

comp.risks by coyote-san · 2001-11-26 06:00 · Score: 2

Many years ago on comp.risks somebody actually looked at the contents of a number of robots.txt files - he wondered if they could be used as a quick index into "interesting" files. At the time, erroneous use of the file was still pretty rare... but I'm sure that was a selection effect that is no longer valid.

Bottom line: that standard may be intended for one behavior (robots don't look in these directories), but there's absolutely nothing to prevent it from being used to support other behaviors (robots look in these directories first). If you don't want information indexed, don't put the content on your site. Or at a minimum, don't provide directory indexes and use non-obvious directory names.
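
Both readings of the file can be sketched in a few lines of Python. The robots.txt content and the bot name below are invented for illustration; urllib.robotparser is the standard-library parser a polite crawler might use, and disallowed_paths is a hypothetical helper showing what an impolite one could pull from the very same text.

```python
from urllib.robotparser import RobotFileParser

# Invented example file; a real one would be fetched from /robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /payments/
"""


def disallowed_paths(robots_txt: str) -> list:
    """List every Disallow path: the 'look here first' reading of the file."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments
        if line.lower().startswith("disallow:"):
            value = line.split(":", 1)[1].strip()
            if value:
                paths.append(value)
    return paths


# The intended reading: a well-behaved robot asks before fetching.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())
polite_skips_admin = not parser.can_fetch("PoliteBot", "http://example.com/admin/")
```

Nothing in the file is enforced by the server; honouring it is entirely up to whoever wrote the robot.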

--
For every complex problem there is an answer that is clear, simple, and wrong. -- H L Mencken

Re:comp.risks by jfunk · 2001-11-26 06:38 · Score: 2

Um, robots.txt should *not* be used for security reasons. That's just stupid.

It is best used to tell crawlers not to bother with pages that are simply useless to crawl. If I ran a site containing a text dictionary in one big html file, I should use robots.txt. If I had a script that just printed random words, I should disallow that too.
Re:comp.risks by coyote-san · 2001-11-26 09:33 · Score: 2

Um, we aren't disagreeing. Not one bit.

But just because we think that this file shouldn't be used for security purposes doesn't mean that some idiot won't come up with this "bright idea." Just because the spec is intended to list directories and files that a robot shouldn't index doesn't mean that someone won't write a robot that actively seeks them out.

--
For every complex problem there is an answer that is clear, simple, and wrong. -- H L Mencken

Directory searches by wytcld · 2001-11-26 06:03 · Score: 4, Insightful

Some search engines don't just check the pages linked from other pages on the server, but also look for other files in the subdirectories presented in links.

So if http://credit.com/ has a link to http://credit.com/signin/entry.html then these engines will also check http://credit.com/signin/ - which will, if directory indexes are on and there is no index.html page there, show all the files in the directory. In which case http://credit.com/signin/custlist.dat - your flatfile list including credit cards - gets indexed.

So if you're going to have directory indexing on (which there can be valid reasons for) you really need to create an empty index.html file as the very next step each time you set up a subdirectory, even if you only intend to link to files within it.
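
A minimal sketch of the directory-walking step, assuming a crawler written in Python. The function name parent_directories is hypothetical; it only derives the directory URLs a spider could request from one link it has already seen, and does no fetching.

```python
from urllib.parse import urlsplit, urlunsplit


def parent_directories(url: str) -> list:
    """Return the directory URLs implied by a link, deepest first."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    if not parts.path.endswith("/"):
        segments = segments[:-1]          # drop the file name itself
    dirs = []
    for depth in range(len(segments), 0, -1):
        path = "/" + "/".join(segments[:depth]) + "/"
        dirs.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    dirs.append(urlunsplit((parts.scheme, parts.netloc, "/", "", "")))
    return dirs
```

For the link http://credit.com/signin/entry.html this yields http://credit.com/signin/ and then http://credit.com/, which is exactly where an auto-generated index would give away custlist.dat.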

--
"with their freedom lost all virtue lose" - Milton

Re:Directory searches by epsalon · 2001-11-26 11:11 · Score: 2

Better still, you can simply make all dirs in your webserver except those to be indexed not world-readable, and NEVER put secret data in your public web area anyway.

--

Make even shorter URLs - 8LN.org

Checklist for HTTP Distribution of Sensitive Data by Gleef · 2001-11-26 06:03 · Score: 3, Informative

First, determine if you really need to distribute this via HTTP. It is far easier to secure other protocols (eg scp), so if there's another way of doing this, do it.

Second, if the sensitive information is going to a select few people, consider PGP encrypting the data, and only putting the encrypted version online. Doing this makes many of the HTTP security issues less critical.

Assuming you still have to put something sensitive online, make sure of the following:

Only use HTTPS, never use just plain HTTP.
Use CGI, Java Servlets, or some other server-side program technology to password-protect the site. I will refer to the resulting program(s) as the security program.
Never accept a password from a GET request, only accept them from POST requests.
Never make the user list or password list visible from the internet, not even an encrypted password list.
Never place the sensitive information in a directory the web server software knows how to access. Only the security program should know how to find the info.
Review all documentation for your web server software and the platform used for the security program. Pay special attention to security issues, make sure you aren't inadvertently opening up holes. Keep current, do this at minimum four times a year.
Subscribe to any security mailing lists for your web server platform, operating system, web server software, and for the programming platform you used for the security program. If there is anything else running on this machine, subscribe to their security mailing lists too.
Subscribe to cert-advisory and BugTraq. Read in detail all the messages that are relevant to your setup. Review your setup after each relevant message.
Don't use IIS.
Don't use Windows 95/98/Me. Don't use Windows XP Home Edition.
Don't use any version of MacOS before OS X.
Don't use website hosting services for sensitive information.
Never connect to this webserver using telnet, ftp or FrontPage. SSH is your friend.
Never have Front Page Extensions (or its clones or workalikes) installed on a webserver with sensitive data.
If there is anything above that you don't understand, or if you can't afford the time for any of the above, hire a professional with security experience and recommendations from people you trust who have used his or her services. It's bad enough that amateurs are running webservers, much less running ecommerce sites and other sites with sensitive data.

The above is an incomplete list. It is primarily there to start giving people an idea of how much effort they should expect to put into a properly administered secure website with sensitive information. Do you really need to distribute this via a web browser?
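
As one small illustration of the GET/POST item, here is a rough sketch of the check a security program might make before looking at a login request. The function name and the field names in SENSITIVE_FIELDS are assumptions for the example, not part of any particular server. The reasoning is that anything in a query string ends up in access logs, browser history and referrer headers.

```python
from urllib.parse import parse_qs

# Hypothetical field names a login form might use.
SENSITIVE_FIELDS = {"password", "passwd", "pw"}


def credentials_allowed(method: str, query_string: str) -> bool:
    """Accept a login attempt only if it is a POST with no secrets in the URL."""
    fields = {name.lower() for name in parse_qs(query_string, keep_blank_values=True)}
    if fields & SENSITIVE_FIELDS:
        return False                      # password travelled in the URL
    return method.upper() == "POST"
```

A request that fails this check should be rejected outright rather than processed, so a password sent in a URL never gets treated as a valid login.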

--

----
Open mind, insert foot.

Re:Easy solution by Arethan · 2001-11-26 06:05 · Score: 2

My apologies. I should have been more clear in my intent. Yes, simply masking credit card numbers in pages would allow people to simply search for the mask and follow the same link google did in order to see the unmasked result.

However, my intention was simply to remove Google's legal implication of storing credit card numbers that were not willingly given by the cardholder. They could also autonomously send an email to webmaster@offendingsite.com notifying them of the potentially vulnerable link, entirely from the kindness of their hearts. But, legal issues in the past have shown that this would result in a cease-and-desist and a lawsuit against Google claiming that the crawler/spider has been hacking their website.

Judging from the past, from a legal standpoint, the best thing they can do is simply filter their cached content. If you are worried that people are going to search for ################, then disallow searching for something so erroneous. Or simply change the mask from all # to random special characters.
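
The filtering step could be as small as the sketch below. It assumes card numbers appear as 13 to 19 digits with optional spaces or dashes; the names CARD_RUN and mask_card_numbers are invented here, and nothing about it reflects how Google's cache actually works.

```python
import re

# 13-19 digits, optionally separated by single spaces or dashes.
CARD_RUN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def mask_card_numbers(text: str, mask_char: str = "#") -> str:
    """Replace each digit of a card-like run, keeping the separators."""
    return CARD_RUN.sub(lambda m: re.sub(r"\d", mask_char, m.group()), text)
```

Short digit groups such as an expiry date are left alone, so "4111-1111-1111-1111 exp 12/02" comes out as "####-####-####-#### exp 12/02".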

It's really not that difficult of a solution. Yes, it's a little disturbing that some websites are this easily hacked, but are we really all that surprised? Get into the low-end ecommerce business sometime. You'll be surprised (frightened even) with what some people have been using for their online stores.

MicroSoft Passport Credit Card # available by peter303 · 2001-11-26 06:07 · Score: 2, Interesting

The new issue of "2600" all but gives a kiddie script for extracting credit card numbers from the Passport database. Scary. Don't buy anything through it until they fix it.

Re:MicroSoft Passport Credit Card # available by PaperTie · 2001-11-26 06:33 · Score: 2, Informative

Actually not. The article simply discussed how the Passport system uses cookies to store users' information and how you could possibly get the cookies from a user that still has them. It doesn't detail anything about accessing some magical database, nor does it mention credit cards.

Are crawlers only using links? by n-baxley · 2001-11-26 06:11 · Score: 2

I haven't looked into how the new crawlers are working. I assume that they still follow links from page to page, but are there new types of crawlers that could be searching the directory structures of a site? Not that this excuses the webmasters, but it might explain some of the new search results.

--

THIS SPACE FOR RENT

Blissful ignorance backfires again. by hkmwbz · 2001-11-26 06:12 · Score: 3, Interesting

That a search engine is able to harvest this kind of data just proves that some people don't know what they are doing. Forgive me if I seem judgmental, but these people are probably the same people who think Windows XP is the next step and that IE is the only browser in the world. But as is proven again and again, ignorance backfires. Not only are they attacked by viruses and worms and have all backdoors and security holes exploited - they are ignorant enough to leave users' data in the open, for everyone to get.

Google's comment was:

"The primary burden falls to the people who are incorrectly exposing this information."

This is where they should have stopped. Those who find their credit card information in a search engine will learn a lesson and use services that actually take care of their customers' security and privacy. Google shouldn't have to clean up incompetent people's mess.

In the long run, these things can only lead to the ignorant (wannabe?) players in the market slowly dying because they don't know what they are doing.

I personally hope someone gets a taste of reality here, and that only the serious players survive. The MCSE crowd may finally learn that there's more to it than blind trust in their own (lacking) ability.

--
Clever signature text goes here.

Gary McGraw, super-genius. ;) by bacchusrx · 2001-11-26 06:14 · Score: 2

"The guys at Google thought, 'How cool that we can offer this to our users,' without thinking about security. If you want to do this right, you have to think about security from the beginning and have a very solid approach to software design and software development that is based on what bad guys might possibly do to cause your program grief." - Gary McGraw (quoted in the CNet article).

*blinks*

Well, actually, Gary, it seems to me that it isn't Google that's been caused any grief here, but those webmasters who didn't "think about security from the beginning." In fact, it looks like Google runs a pretty tight ship.

This is the kind of guy who blames incidents.org for his web server getting hacked. After all, they weren't thinking about security from the beginning, were they?

Riight.

BRx ;)

--
Life after capitalism? The participatory economics project

For those who must use IIS by JMZero · 2001-11-26 06:33 · Score: 3, Informative

I agree with all of your assertions, except

"Don't use IIS."

This just isn't an option for a lot of people. I would change this to:

"If you use IIS, you need to make sure you check BugTraq/cert EVERY day."

I would also add:

"If you use IIS with COM components via ASP, make sure the DLL's are not in a publicly accessible directory."

This happens a lot, and makes DLL's lots easier to break.

--
Let's not stir that bag of worms...

Different file types make my day by srichman · 2001-11-26 06:39 · Score: 3, Interesting

The big complaint of the article is that Google is searching for new types of files, instead of HTML.

The only people who complain about this are obviously the folks using crossed fingers for security. The rest of us love that Google indexes different file types.

I'll never forget the day I first saw a .pdf in a Google search result. Not that long ago I saw my first .ps.gz in a search result. I mean, how dope is that!? They're ungzipping the file, and then parsing the postscript! Soon they'll start uniso-ing images, untarring files, unrpming packages, .... You'll be able to search for text and have it found inside the README in an rpm in a Red Hat ISO.

Can't wait until images.google.com starts doing OCR on the pix they index...

Pushing stuff around by Helmholtz · 2001-11-26 06:44 · Score: 2

"The guys at Google thought, 'How cool that we can offer this to our users' without thinking about security..."

Interesting that this is being pushed off onto Google. I think a more appropriate phrase would be "The guys at thought, 'How cool that this website is so easy to set up' without thinking about security...."

--
RFC2119

Weakest Link of Security: Human by Pollux · 2001-11-26 06:51 · Score: 2

Alright, so I admit, I was a little curious about how dumb people are with their passwords so I tried the search. It's simply amazing how careless people are with their security...

Here was a simple document found using the exact search method listed above that is just the Minutes from some board meeting. In it, they actually LISTED a website to log into as well as the password required to get in! Right in the minutes! The website is no longer available, so I'll actually post the text from the minutes...

Minutes of the Gulliver Meeting at Carlton Library 17.8.01

...

7. Assessment of database products

B_ spoke briefly about the online tool that is an outcome of work done at Monash University for Libraries online, he will make the URL available so that evaluation of the usefulness of the tool may commence.

The tool is at http://130.194.38.42
Password admin

Talk about careless. Even if you're positive that the minutes document won't be posted on the web, you certainly don't go and actually write it onto something that will be distributed to the public! A hard copy (aka paper) of access to a server is just as dangerous as it being stored online.

The problem is that people don't realize that it's not safe to distribute private information through ANY public medium.

Re:out of the loop by CraigoFL · 2001-11-26 07:06 · Score: 2

"Pointy Haired Boss". Go grab a book of Dilbert comics if you don't happen to know what that is.

Re:out of the loop by CraigoFL · 2001-11-26 07:10 · Score: 2

Dammit, my reply to your message wasn't. Sorry. Check it out here: http://slashdot.org/comments.pl?sid=24151&cid=2614769

Password search by azaroth42 · 2001-11-26 07:19 · Score: 3, Interesting

Or for more fun, do a search like

filetype:htpasswd htpasswd

Scary how many .htpasswd files come up.

-- Azaroth

Check your own mouth too by fizbin · 2001-11-26 07:49 · Score: 2

And "Interesting" posts should know what they're saying, but one rarely gets everything one wants.

The point: the poster is implying that there's some mismatch between looking the password up in a mysql database and doing HTTP/1.0 Basic Authentication. There isn't - the phrase "HTTP/1.0 Basic Authentication" refers to how the password is sent over the wire. The server can look up the password by carrier pigeon for all that that matters.

It's true that the standard Apache password mechanisms look things up in flat files and not a mysql database, but that's not what the poster said.
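
To make the split concrete, here is a rough Python sketch. The wire format (base64 of "user:password" after the word "Basic") is what the HTTP spec defines; the function names and the lookup callable are made up for illustration, and the lookup could just as well wrap a flat file or a mysql query. Comparing plain-text passwords is only there to keep the sketch short.

```python
import base64


def encode_basic(user: str, password: str) -> str:
    """Build the Authorization header value a browser would send."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def decode_basic(header: str):
    """Split an Authorization header value back into (user, password)."""
    scheme, _, token = header.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic Authorization header")
    user, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return user, password


def check(header: str, lookup) -> bool:
    """Verify credentials; lookup(user) may hit a file, a database, anything."""
    user, password = decode_basic(header)
    return lookup(user) == password
```

Swapping a dict's .get for a database call changes nothing on the wire, which is the whole point; a real store would also hold hashes rather than the passwords themselves.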

DMCA by C. · 2001-11-26 08:09 · Score: 2, Interesting

> You should be writing that type of data on the backs of envelopes and leaving them scattered around your living room...

Not much worse than some "commercial-grade" encryption...

Maybe somebody should consider suing Google under the DMCA. I haven't studied the DMCA with enough detail to be sure of this (and much less studied law, for that matter), but i guess Google is easily guilty of the following "crimes" against modern society:
- linking to decryption algorithms
- linking to reverse engineering tools
- linking to passwords that could be used to circumvent somebody's copyright.
- storing and distributing all the above (with google's cache)

As I understand current legislation, Google should not even have the right to define what is public or not like they're trying to do. Even the safe-harbour provisions do not immunize them from having to remove unlawful content.

Such a lawsuit would make for an interesting debate, and with a bit of luck could get us all rid of this stupid law.

C.

--
C.

Re:Robots search by links right? by mobiGeek · 2001-11-26 08:13 · Score: 2

So create a folder called "mystuff" and keep everything in it...

That might work fine for you, but when your pin-headed manager or PFY finds out about this cache of documents, you can bet your bottom dollar they'll add a link off their homepage or some brilliant location like that.

You mightn't link to it...but someone eventually will.

Also, realize that many robots don't guess at URLs...but it would be trivial to create a 'bot which did just hack away.

for i in `strings /usr/lib/aspell/* | sort -u`; do wget "http://www.3l33tsyt.kom/$i"; done

--

...Beware the IDEs of Microsoft...

Maintains flying saucers and keeps alien bodies... by Futurepower(tm) · 2001-11-26 08:18 · Score: 2

QUESTION 23: What national-level intelligence assets are available to you, the warfighter?

ANSWER: Area 51 -- Maintains flying saucers and keeps alien bodies in the freezer.

Okay, how did you do that?

(If you read Slashdot enough, sooner or later you see everything.)

--
Bush's education improvements were

Check out Question 28, also: by Futurepower(tm) · 2001-11-26 08:22 · Score: 2

If you throw a cat out the window of a car, does it become kitty litter?

--
Bush's education improvements were

But, that's nothing. Download the entire document. by Futurepower(tm) · 2001-11-26 08:41 · Score: 2

Download the entire document from the U.S. military web site: lg6.doc

U.S. ARMY COMMAND AND GENERAL STAFF COLLEGE

S 510/0 Strategic, Operational and Joint Environments

Lesson Guide for Lesson 6
National and Theater Command and Control

Third bullet under question 28: "If you throw a cat out the window of a car, does it become kitty litter?"

Hey, military commanders, don't be mis-treating cats!!!

How U.S. government policy contributed to terrorism: What should be the Response to Violence?

--
Bush's education improvements were

Re:Nice work, Legion303. by well_jung · 2001-11-26 08:42 · Score: 3, Funny

"Trees cause more pollution than automobiles do." --Ronald Reagan '81

--
Carl G. Jung
--
"With one breath, with one flow, You will know Synchronicity" -La Policia

Re:Not just credit cards by GigsVT · 2001-11-26 09:14 · Score: 2

Why not just tell them to fuck off? If they want to control who links to their feed, then they should.

As long as it is publicly available, I seriously doubt they could successfully charge you for it.

It's like setting up a large arch in a public park, and when people walk under it, demanding $100,000 from them.

--
I've had enough abrasive sigs. Kittens are cute and fuzzy.

Re:More Fun with Google by plover · 2001-11-26 10:26 · Score: 2

At least some of those numbers are fakes. I believe they're the postings of people falling for the "order me some stuff" gag web page. (and I do mean "gag".)

This is certainly across the borderline of unethical and nearing the boundaries of "illegal."

John

--
John

Re:More Fun with Google by hawk · 2001-11-26 12:42 · Score: 2

> i found a page with transaction data from some small web merchant. it
> utilized the unfathomably secure method of Black Text on Black
> Background. something any Neo who's surfing Source Code could pick up.

Not to mention what pages look like under Lynx . . .
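To make the point concrete, here is a minimal sketch of what any text-mode reader effectively does: it pulls the character data out of the markup and never consults the colours at all. The sample page and the names `TextOnly` and `visible_text` are made up for illustration, and this is not how Lynx itself is implemented.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collects character data and ignores every tag and attribute."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Colour attributes never reach this method; only the text does.
        if data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Return the text of an HTML fragment, whatever its styling."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page "protected" by black text on a black background.
page = '<body bgcolor="black"><font color="black">order 1001: 2 widgets</font></body>'
print(visible_text(page))
```

The "hidden" text comes straight out, which is all a search engine's crawler sees as well.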

hawk, lynx user

No, search for the numbers instead by leonbrooks · 2001-11-26 12:49 · Score: 2

/me goes to search for "credit card"

/ME would be better off searching for a known-good credit card number.

If you find it, it might lead you to many other credit card numbers - but first cancel the one that you found, and sue the company exposing it. Ask them if they have the box their computer came in. (-: And maybe post the URL to CERT as a vulnerability :-)

--
Got time? Spend some of it coding or testing

It doesn't last by kimihia · 2001-11-26 13:55 · Score: 3, Informative

Because 28 days after you took your page offline it will disappear from the Google cache.

Google reindexes web pages, and if they 404 on the next visit, then good bye pork pie! You have to get them while they are hot, e.g., when a site has JUST been Slashdotted.

An advertisement for publicfile by kimihia · 2001-11-26 14:03 · Score: 3, Informative

Perhaps it would be a good idea after reading this article to examine publicfile.

It was written by a very security-conscious programmer who realises that your private files can easily get out onto the web. That is why publicfile has no concept of content protection (e.g., Deny from evilh4x0r.com or .htaccess) and will only serve up files that are publicly readable.

From the features page:

publicfile doesn't let users log in. Intruders can't use publicfile to check your usernames and passwords.
publicfile refuses to supply files that are unreadable to owner, unreadable to group, or unreadable to world.
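The second rule can be sketched in a few lines. This is only an illustration of the permission test described above, not publicfile's actual source; the function name `is_world_servable` is hypothetical, the regular-file check is an added assumption, and POSIX permission bits are assumed.

```python
import os
import stat


def is_world_servable(path):
    """True only if path is a regular file readable by owner, group, and world."""
    mode = os.stat(path).st_mode
    required = stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH
    # Assumption: directories and special files are refused outright.
    return stat.S_ISREG(mode) and (mode & required) == required
```

Under this rule a file with mode 0644 would be served, while 0640 or 0600 would not, so an ordinary `chmod o-r` is enough to keep a file off the web.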

A good healthy dose of paranoia would do people good.

Re:Not just credit cards by penguinboy · 2001-11-26 14:21 · Score: 2

As long as it is publicly available, I seriously doubt they could successfully charge you for it.

If their newsfeed is copyrighted, I doubt they'd have much trouble prosecuting someone for duplicating it elsewhere without authorization (especially if elsewhere was a commercial site). Looking at unsecured data may be possible, but using it illegally is still illegal.

Re:Checklist for HTTP Distribution of Sensitive Da by Gleef · 2001-11-26 14:53 · Score: 2

I have found web server based authentication systems limited, weak, and hard to integrate into authorization systems (to determine whether the given user is allowed to access the given information). Granted, my experience is limited to Apache and Netscape Enterprise Server.

In addition, the client-based authentication scheme that is triggered by web server based authentication doesn't allow for logging out in a manner that is consistent across browsers. Having the ability to log out is critically important for a good security system.
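For contrast, here is a minimal sketch of what application-level sessions make possible: the server hands out a token at login and can forget it again at logout, independent of what the browser caches. The class `SessionStore` and its methods are hypothetical, password checking is omitted, and this is one possible approach rather than a description of any particular server's mechanism.

```python
import secrets


class SessionStore:
    """In-memory map of session token -> username (illustrative only)."""

    def __init__(self):
        self._sessions = {}

    def login(self, username):
        # Assumes the caller has already verified the user's credentials.
        token = secrets.token_hex(16)
        self._sessions[token] = username
        return token

    def user_for(self, token):
        # Returns None for unknown or logged-out tokens.
        return self._sessions.get(token)

    def logout(self, token):
        # Server-side invalidation: the token stops working immediately.
        self._sessions.pop(token, None)
```

After `logout(token)`, `user_for(token)` returns `None` no matter which browser is still holding the token.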

Your mileage may vary.

--

----
Open mind, insert foot.

Re:To test your credit-card ordering site... by DavidTC · 2001-11-26 18:01 · Score: 2, Insightful

Erm...so you're just going to magically verify them without knowing them?

Here's a big hint: Not everyone is running some sort of completely automated, completely external validation service, and, duh, if they aren't, they need to know the numbers so they can actually charge the people.

About the only reason they shouldn't be in your computers somewhere is if you're using a third party to handle all that stuff...and then they will be in their computer. They, rather obviously, have to exist somewhere to be sent to the CC companies.

--
If corporations are people, aren't stockholders guilty of slavery?

Assumptions. by AftanGustur · 2001-11-26 19:46 · Score: 2

How does the Google Cache avoid legal entanglements, both for stuff like cc numbers and copyright/trademark infringement?

Huh? Maybe because it isn't illegal at all?
Think about it, where does it say it's illegal?

--
echo '[q]sa[ln0=aln80~Psnlbx]16isb572CCB9AE9DB03273snlbxq' |dc

Print out this article !! by AftanGustur · 2001-11-26 19:49 · Score: 3

No, seriously, do it!
Print it out and hang it on the wall, then put a post-it note on top of it saying: "The best example of 'blaming the messenger' ever!!!"

--
echo '[q]sa[ln0=aln80~Psnlbx]16isb572CCB9AE9DB03273snlbxq' |dc

Google realtime query info by pne · 2001-11-27 01:57 · Score: 2

By the way, does google have that realtime display of what people are searching for?

Not to my knowledge. When I asked Google whether such a facility exists, they said no -- but they did point me to Google Zeitgeist, which gives "Search patterns, trends, and surprises according to Google". Usually published once a week and showing e.g. the top 10 gaining and losing queries of that week. So you get some interesting info, but it's not realtime by any description of the word.

--
Esli epei etot cumprenan, shris soa Sfaha.
