Googling Your Way Into Hacking
knifee writes "New Scientist is running an article explaining how hackers can use Google's cache to quickly hunt down sensitive pages, for example by searching for the terms 'bash history', 'temporary' and 'password'.
Might be worth looking at this tutorial about robots.txt if you think you might be at risk." That's pretty amusing.
They should mention that disallowing a URI in robots.txt tells crackers which URIs on your site have sensitive information. What I do is create a top-level /unpub/ URI, and everything sensitive goes underneath it with hard-to-guess names. In robots.txt I disallow /unpub only.
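For anyone who wants to see the /unpub trick in action, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt body and the file name q7x2-notes.html are made up for illustration. The point is that a well-behaved crawler is told to stay out of everything under /unpub, while the file itself never names any of the hard-to-guess pages.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: only the top-level directory is listed,
# so the hard-to-guess names underneath it are never disclosed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /unpub
"""


def build_parser(text):
    """Parse robots.txt content held in a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Made-up page name, standing in for a hard-to-guess URI under /unpub/.
hidden_ok = parser.can_fetch("*", "/unpub/q7x2-notes.html")
public_ok = parser.can_fetch("*", "/index.html")

print("crawler may fetch hidden page:", hidden_ok)
print("crawler may fetch public page:", public_ok)
```

This only models a crawler that chooses to honor robots.txt. Anything that ignores the file can still fetch the page if it learns the name, so the obscure names are doing the real work here.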
Robots.txt only stops well-behaved search engines from indexing certain portions of your site. You're still going to be vulnerable until you take the sensitive pages off-line completely. But even then, if a passwords list has been indexed by Google, updating your robots.txt file won't remove it from Google's cache until Google spiders your site again, at which time Google will discover that the passwords list no longer exists and remove it from the cache.
At least that's how it should work. Is anyone aware of Google requesting robots.txt more often than they spider pages? And then proactively removing pages from their cache based on new robots.txt entries?
While the article deals with Google specifically, lots of non-well-behaved spiders go through common locations looking for password files regardless of what you've blocked out with robots.txt. The only way to completely protect your data is to remove it from your site.
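If you want to check your own site before a spider does, something along these lines might help. It is a rough sketch, not a real audit tool: it walks a document root and flags file names containing a few hints taken from the search terms in the article. The function name find_sensitive, the hint list, and the demo files are all assumptions for illustration, and you would point it at your own web root.

```python
import os
import tempfile

# Hypothetical hint list, loosely based on the search terms in the article.
SENSITIVE_HINTS = (".bash_history", "password", "passwd")


def find_sensitive(doc_root, hints=SENSITIVE_HINTS):
    """Return paths (relative to doc_root) whose file name contains a hint."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(doc_root):
        for name in filenames:
            lowered = name.lower()
            if any(hint in lowered for hint in hints):
                full_path = os.path.join(dirpath, name)
                matches.append(os.path.relpath(full_path, doc_root))
    return sorted(matches)


if __name__ == "__main__":
    # Demo against a throwaway directory instead of a real web root.
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "home"))
        for rel in ("index.html", "Passwords.txt",
                    os.path.join("home", ".bash_history")):
            with open(os.path.join(root, rel), "w") as handle:
                handle.write("demo\n")
        for path in find_sensitive(root):
            print("move off the web server:", path)
```

Matching on names alone will miss plenty and flag some harmless files, so treat the output as a list of things to look at by hand.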
A friend of mine actually used this to steal ICQ numbers. He wrote a Perl script which googles from "00000001.idx 00000001.dat" to "99999999.idx 99999999.dat" and writes the result links to a text file whenever it gets a full match.
;)
The ICQ password is stored in one of those two data files, and there are dozens of free decryption programs for it out there.
But if you think about it... how or why does someone put his ICQ directory on a webserver?!
On the other hand... some people are hosting pr0n sites and don't even know about it
--
One by one the penguins steal my sanity...
I posted earlier regarding the ability to use Google as a warez search machine. That article was about Google censorship, and the one response to my post pinpointed almost exactly the point that I brought up, which is the point discussed in this article.
Google has a nice long list of directory lists containing warez (remember the days of l33t FTP searching for filenames? Google for something like, in my last article: "xwin32*.exe * * * * *" "listing of"), serial numbers (Oh, I've found XP's serial number several times in Google's cache) and other "sensitive" information. My question is if other commercial sites are being constantly shut down due to these links (intentional or not), why aren't people targeting Google as well?
In fact, if I'm *cough*too cheap to buy software*cough* or just want to evaluate some crippleware or such before I buy it, I often skip astalavista and cracks.am and just Google it up. Saves me the porn and pop ups, and I don't have to cripple my browser for this (yes I know it's possible to do in other ways, yes I enjoy javascript, no thanks, I don't want comments about how I'm retarded because I don't do it the right way).
The same goes for sites such as the Internet Archive's Wayback Machine, which contain other sensitive information.
Because of the academic merit of both of these search mechanisms, I doubt either one will be shut down. Indeed, I highly doubt restrictions will be placed. They're valuable tools for finding more valuable tools. For more information about this sort of stuff, I suggest searching on Fravia+'s web-searching lore. Other information on there relates to "reality cracking", reverse engineering, and other taboo topics. Google's got it all cached. Interested? Just search for (insert topic here) site:searchlores.org.
Sometimes I don't think the comparison of Google to God is that far off. Pardon my heresy.
Kind regards, Devon H. O'Dell
A programmer is a machine for converting coffee into code.
You can probably use this to set up "honeypots" which may be legal in States where traditional fake services would be considered illegal as entrapment.
Simply set up a virtual machine (user-mode linux is a good one for this). Have the root account publicly read/write and somehow "accidentally" visible to httpd.
Have the login shell a program which acts as your honeypot, logging activity, tracing back to the user, etc. All the stuff honeypots do so well.
Next is to ensure that the root password is visible, plain-text, and in a file that is visible to search engines. Your average skript kiddie is not going to question the apparent generosity of the admin. To get the engine to find the account, you probably want to have your main web page link into your virtual machine's root account - say via an FTP.
Now, none of this is entrapment, in the sense that the person must pro-actively attempt to present a false identity before the service is accessed. There can be no question that the identity of any user logging in is fake, that the user logging in knows that it is fake, and that there has been a deliberate, pre-meditated attempt to compromise an account.
If you want to go one step further, have the login shell transfer some goodies, such as cpuburn. Now, these have to have a "legit" use by a "legit" user, as anyone who gets burned is likely to complain. You have to be able to stand your ground and say "hey, I use this service as a convenient way to do hardware tests on remote machines - I locked that account against intruders, so if an intruder gets in, it's not my fault if they get burned."
(If you leave something dangerous "just lying around", you could probably be held accountable if someone gets hurt, even if they were stupid or malicious. But if you make a "reasonable" attempt to deny access, then it's not your problem.)
In fact, if you do any freelance tech stuff, you might very well use the service for real as a way of fetching over stress-testing software. It would make it a lot harder for "victims" of your root snare to complain, as you could then prove a legitimate use by legitimate users - the victim not being one of them.
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)