Slashdot Mirror


Googling Your Way Into Hacking

knifee writes "New Scientist is running an article explaining how hackers can use Google's cache to quickly hunt down sensitive pages, for example by searching for the terms "bash history", "temporary" and "password". It might be worth looking at this tutorial about robots.txt if you think you might be at risk." That's pretty amusing.

28 of 431 comments

  1. Even better than Google by Anonymous Coward · · Score: 3, Interesting
    I tried this a while back - it isn't as easy as it looks with Google. I recently discovered WhittleBit and it is pretty good at narrowing down what you are searching for because it lets you indicate which search results are good and which aren't, and re-search on that basis.

    This is particularly useful for this type of thing since it isn't always obvious what the criteria are for what you want to search for - with WhittleBit you don't need to know, it figures it out for itself.

    1. Re:Even better than Google by lightcycle · · Score: 2, Interesting

      The bottom of the page has a "send feedback to Ian Clarke" mailto link. Would that be the Ian Clarke who is behind Freenet?

      --

      The stars that shine and the stars that shrink
      in the face of stagnation the water runs before your eyes
  2. aha! by Frymaster · · Score: 2, Interesting
    This explains the tremendous number of visitors who arrive at my site from Google searches for "index of /scripts"...

    Of course, I have a section on my site for bash scripts... and it has an index page. Looks like someone got disappointed.

  3. problem with robots.txt tutorial by brlewis · · Score: 5, Interesting

    They should mention that disallowing a URI in robots.txt tells crackers which URIs on your site have sensitive information. What I do is create a top-level /unpub/ URI, and everything sensitive goes underneath it with hard-to-guess names. In robots.txt I disallow /unpub only.
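A minimal sketch of the /unpub/ scheme above, using Python's standard urllib.robotparser to confirm that a single broad Disallow line keeps well-behaved crawlers out of everything beneath it without naming any individual file. The robots.txt content and the example.com URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt following the /unpub/ scheme: one broad rule,
# no individual sensitive paths listed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /unpub
"""

def build_parser(robots_text: str) -> RobotFileParser:
    """Parse robots.txt content held in memory (no network fetch)."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser

parser = build_parser(ROBOTS_TXT)
# Anything under /unpub is off limits to crawlers that honor robots.txt...
print(parser.can_fetch("Googlebot", "https://example.com/unpub/x7k2q-photos/"))  # False
# ...while the rest of the site stays crawlable.
print(parser.can_fetch("Googlebot", "https://example.com/index.html"))  # True
```

Note that this only affects crawlers that choose to obey the file; the hard-to-guess names underneath /unpub are what keep casual visitors out.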

    1. Re:problem with robots.txt tutorial by brlewis · · Score: 3, Interesting

      Password-protected directories wouldn't need to be in robots.txt. Using robots.txt + security by obscurity is for things like family photos, where I don't want to maintain usernames and passwords for my entire extended family, but it isn't absolutely critical that no unauthorized person ever see them. I doubt I could trust my entire extended family to keep passwords secure anyway.

      Yeah, cheap shared hosting is largely insecure. I wonder how tough it would be to set up shared hosting using squid as an http accelerator, and let users run web servers under their own UID on different ports, while squid forwards from port 80.

  4. robots.txt? by Karma+Sucks · · Score: 4, Interesting

    You're kidding, right? Putting stuff in robots.txt is the best way to *guarantee* that badly behaved robots will go straight for the files and directories you deny.

    Don't be naive about robots.txt... expect to have to do some relatively fancy hacking to actually enforce it.

    --
    (Please browse at -1 to read this comment.)
    1. Re:robots.txt? by pclminion · · Score: 2, Interesting
      "And that's why I have a disallow for a trap directory. Accessing it gets you added to a MySQL database and you are blocked with iptables."

      Awesome! I'll post a link to that location on my web page. Everyone who clicks on it will be banned from your site, even though they aren't a spider!

      Oh, the fun I'll have...
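A minimal sketch of the detection half of such a trap, assuming Apache-style access logs and a hypothetical /trap/ directory that appears only as a Disallow line in robots.txt. The MySQL and iptables steps are left out. Because it looks only at the requested path, it has exactly the weakness described in the reply above: anyone who follows a link to /trap/ gets flagged, whether or not they are a spider.

```python
import re
from typing import Iterable, Set

# Start of a Common/Combined Log Format line: host ident user [time] "METHOD path ..."
REQUEST = re.compile(r'^(?P<host>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>[A-Z]+) (?P<path>\S+)')

TRAP_PREFIX = "/trap/"  # hypothetical path listed only as a Disallow in robots.txt

def trapped_hosts(log_lines: Iterable[str]) -> Set[str]:
    """Return every host that requested anything under TRAP_PREFIX."""
    hosts = set()
    for line in log_lines:
        m = REQUEST.match(line)
        if m and m.group("path").startswith(TRAP_PREFIX):
            hosts.add(m.group("host"))
    return hosts
```

The returned set is what you would then feed to whatever blocking mechanism you use.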

  5. robots.txt by panaceaa · · Score: 5, Interesting

    Robots.txt only keeps well-behaved search engines from indexing certain portions of your site. You're still going to be vulnerable until you take the sensitive pages offline completely. Even then, if a password list has been indexed by Google, updating your robots.txt file won't remove it from Google's cache until Google spiders your site again. At that point, Google will discover the password list no longer exists and remove it from the cache.

    At least that's how it should work. Is anyone aware of Google requesting robots.txt more often than they spider pages? And then proactively removing pages from their cache based on new robots.txt entries?

    While the article deals with Google specifically, lots of badly behaved spiders probe common locations looking for password files regardless of what you've blocked with robots.txt. The only way to completely protect your data is to remove it from your site.

    1. Re:robots.txt by KenSeymour · · Score: 2, Interesting

      I think you have to do more than that to get it out of the cache.

      I once had family phone numbers on a web page. Upon reflection, I decided that was no good and deleted the web page.

      It remained in the google cache until I replaced the file with a blank one with the same URL.

      --
      "We can't solve problems by using the same kind of thinking we used when we created them." -- Albert Einstein
    2. Re:robots.txt by frodo+from+middle+ea · · Score: 5, Interesting
      Check out Sun's robots.txt

      The part I like best:

      # If you do actually go to the trouble of figuring out how to download
      # the files without registering, what you'll end up with is 1 or 2MB of
      # stuff that is meaningless to you unless you have purchased an
      # Ultra AX board from Sun. So, please do purchase an Ultra AX board,
      # but then you might as well use the URL you'll be given along with it.

      --
      for the last time people, I am "frodo from middle eaRTH", not "middle eaST".
  6. robots.txt by zero-one · · Score: 4, Interesting

    Having a robots.txt is a good idea, but it always amuses me when web sites use robots.txt to list all the areas of their site that they don't want people to look at. When robots.txt contains entries like "Disallow: /admin.asp" or "Disallow: /backdoor.asp", it stops being a way of controlling search engines and becomes a site map of all the places hackers might be interested in.
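To show how little effort that "site map" takes to read, here is a minimal sketch that pulls every Disallow path out of a robots.txt file. The example file is hypothetical, built from the two entries quoted above.

```python
def disallowed_paths(robots_text: str) -> list[str]:
    """List every non-empty Disallow path in a robots.txt file, in order."""
    paths = []
    for raw in robots_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths

EXAMPLE = """\
User-agent: *
Disallow: /admin.asp
Disallow: /backdoor.asp   # oops
Disallow:
"""

print(disallowed_paths(EXAMPLE))  # ['/admin.asp', '/backdoor.asp']
```

An empty "Disallow:" line means "allow everything", so it is skipped.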

  7. ICQ by bazik · · Score: 5, Interesting

    A friend of mine actually used this to steal ICQ numbers. He wrote a Perl script which searches Google for every pair from "00000001.idx 00000001.dat" to "99999999.idx 99999999.dat" and writes the result links to a text file whenever it gets a full match.

    The ICQ password is stored in one of those two data files, and there are dozens of free decryption programs for that out there.

    But if you think about it... how or why does someone put his ICQ directory on a web server?!

    On the other hand... some people are hosting pr0n sites and don't even know about it ;)

    --
    One by one the penguins steal my sanity...
  8. BZZZZZZZT! Wrong! by Entropy248 · · Score: 2, Interesting

    I don't think so.

    I went through all 6 pages of results and found nothing. Ditto for searches on any of the terms individually. I imagine that searches on individual sites might be what the author is actually talking about, but I have no independent means of verifying this. This FUD detected by Entropy248. Wow. I just RTFA and tried it at home...

  9. One word about the google cache... by presroi · · Score: 2, Interesting
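    Some people think that viewing a page through the Google cache does not reveal their host name to the original HTTP server. It does: the cached copy still loads its images from the original site, so the server logs the visitor's host along with the Google cache URL as the referer.

    The result looks like this in my access log: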
    proxy1.health.magwien.gv.at - - [29/Jul/2003:22:27:14 +0200] "GET /hfaq/icons/linki.png HTTP/1.0" 200 278 "http://www.google.at/search?q=cache:QIq92lU3jkUJ:www.presroi.de/hfaq/+heroin&hl=de&lr=lang_de&ie=UTF-8" "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0; ENR 2.0 emb)"
    proxy1.health.magwien.gv.at - - [29/Jul/2003:22:27:14 +0200] "GET /hfaq/icons/bt3.gif HTTP/1.0" 200 3170 "http://www.google.at/search?q=cache:QIq92lU3jkUJ:www.presroi.de/hfaq/+heroin&hl=de&lr=lang_de&ie=UTF-8" "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0; ENR 2.0 emb)"
    proxy3.health.magwien.gv.at - - [29/Jul/2003:22:27:43 +0200] "GET /hfaq/stats.html HTTP/1.0" 200 5231 "http://www.google.at/search?q=cache:QIq92lU3jkUJ:www.presroi.de/hfaq/+heroin&hl=de&lr=lang_de&ie=UTF-8" "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0; ENR 2.0 emb)"
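A minimal sketch for spotting such visits in your own logs, assuming the Apache "combined" log format shown above. It flags any request whose referer is a /search URL on a www.google.* host with a q parameter starting with "cache:". The function name is hypothetical.

```python
import re
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlsplit

# Apache "combined" log format:
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

def google_cache_visit(line: str) -> Optional[Tuple[str, str]]:
    """Return (visitor host, cache query) if the referer is a Google cache page, else None."""
    m = COMBINED.match(line.strip())
    if m is None:
        return None
    referer = urlsplit(m.group("referer"))
    # Any country domain (google.at, google.de, ...) serving /search.
    if not referer.netloc.startswith("www.google.") or referer.path != "/search":
        return None
    query = parse_qs(referer.query).get("q", [""])[0]
    if not query.startswith("cache:"):
        return None
    return m.group("host"), query
```

Since every embedded image produces its own log line, grouping the results by host gives a rough list of who read the cached copy.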
  10. phpmyadmin same thing by joeldg · · Score: 4, Interesting

    I have seen more phpMyAdmin pages wide open on Google than anything else. Not putting things like that behind .htaccess protection at the very least is pure laziness and stupidity.

    It also seems people put MySQL dumps on their web servers.
    Search for '"SELECT * FROM credit" + "###"' and you will see.

    This has been going on since Google introduced the site cache.

  11. Re:Google is good for free money by anthony_dipierro · · Score: 2, Interesting

    Better to search for the first 8 digits of a known credit card number. The last time Slashdot had a story about a site that was publishing credit card numbers on the internet, I googled the first 8 digits of my credit card number and found the site.

  12. My favorite: access_log by shoppa · · Score: 2, Interesting

    As far back as five years ago, it was fairly common knowledge that if you found any web server's access_log, you would get some juicy URLs. The method still works...

  13. For more h4x0r fun . . by scarolan · · Score: 3, Interesting

    Try searching Google for _vti_pvt and service.pwd. Lots of people are still using FrontPage 4.0 or whatever, with their FrontPage password file in plain view. I won't tell you what to do with that file, if you don't know already.

  14. Google Warez Machine by dhodell · · Score: 5, Interesting

    I previously posted about the ability to use Google as a warez search machine. That article was about Google censorship, and the one response to my post pinpointed almost exactly the point I brought up, which is the point discussed in this article.

    Google has a nice long list of directory listings containing warez (remember the days of l33t FTP searching for filenames? Google for something like this, from my last article: "xwin32*.exe * * * * *" "listing of"), serial numbers (oh, I've found XP's serial number several times in Google's cache) and other "sensitive" information. My question is: if other commercial sites are constantly being shut down because of these links (intentional or not), why aren't people targeting Google as well?

    In fact, if I'm *cough*too cheap to buy software*cough* or just want to evaluate some crippleware or such before I buy it, I often skip astalavista and cracks.am and just Google it up. Saves me the porn and pop ups, and I don't have to cripple my browser for this (yes I know it's possible to do in other ways, yes I enjoy javascript, no thanks, I don't want comments about how I'm retarded because I don't do it the right way).

    The same goes for sites such as the Internet Archive's Wayback Machine, which also hold sensitive information.

    Because of the academic merit of both of these search mechanisms, I doubt either one will be shut down. Indeed, I highly doubt restrictions will be placed. They're valuable tools for finding more valuable tools. For more information about this sort of stuff, I suggest searching on Fravia+'s web-searching lore. Other information on there relates to "reality cracking", reverse engineering, and other taboo topics. Google's got it all cached. Interested? Just search for (insert topic here) site:searchlores.org.

    Sometimes I don't think the comparison of Google to God is that far off. Pardon my heresy.

    --
    Kind regards, Devon H. O'Dell
  15. Re:This happens because of dumb admins, not google by inertia187 · · Score: 5, Interesting
    It's happened to me. My .bash_history has contained passwords. Why? Because I'd type too fast and not look at the screen. For example:
    bash-2.05a$ ssh inertia@whatevre
    ssh: whatevre: no address associated with hostname.
    bash-2.05a$ f33lokihum
    Oops.
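A minimal sketch of a cleanup check for that kind of slip, under a loudly stated assumption: a history entry that is a single word, neither a common shell builtin nor a program found on PATH, might be a password typed at the prompt. It is only a heuristic (aliases and shell functions will show up as false positives), and the function name and builtin list are hypothetical.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# A few common bash builtins that never live on PATH (not an exhaustive list).
SHELL_BUILTINS = frozenset({"cd", "exit", "logout", "history", "export", "alias", "source"})

def suspicious_history_lines(
    lines: Iterable[str],
    is_command: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Flag one-word history entries that are neither a builtin nor on PATH."""
    flagged = []
    for raw in lines:
        words = raw.split()
        if len(words) != 1:
            continue  # this heuristic only looks at lone tokens
        word = words[0]
        if word in SHELL_BUILTINS or is_command(word):
            continue
        flagged.append(word)
    return flagged
```

Usage: `with open(os.path.expanduser("~/.bash_history")) as f: print(suspicious_history_lines(f))`, then delete anything that really is a password (and change that password).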
    --
    A programmer is a machine for converting coffee into code.
  16. Re:This happens because of dumb admins, not google by Cramer · · Score: 2, Interesting

    And on Linux, /bin/sh is usually bash. You'd be very surprised to see how many "hackers" fail to clear out the history. In my experience, most of the nuts breaking into systems are idiots simply running stuff someone else designed.

    I've never run into a real hacker... they know how to cover their tracks so they aren't noticed. And I don't have any systems containing information of any value from which a real hacker could profit (thus, I'm left alone).

  17. Google file searching.... by Rahga · · Score: 4, Interesting

    I honestly know of nobody else who uses this technique; I just figured I would try it back when I was hunting down upgrades for old games like Quake 2 while places like FilePlanet were getting hammered:

    At google, type "index of", followed by the precise name of the file you are looking for.

    I'd say this gives me good results on a fast server 95% of the time.
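The technique above amounts to a single search query. A minimal sketch that builds the search URL, assuming the public https://www.google.com/search?q=... form (an assumption, not a documented API) and a made-up file name:

```python
from urllib.parse import urlencode

def index_of_search_url(filename: str) -> str:
    """Build a Google search URL for the phrase "index of" plus an exact file name."""
    # Assumption: the public https://www.google.com/search?q=... URL form.
    query = f'"index of" {filename}'
    return "https://www.google.com/search?" + urlencode({"q": query})

# "quake2-patch.zip" is a made-up file name for illustration.
print(index_of_search_url("quake2-patch.zip"))
# https://www.google.com/search?q=%22index+of%22+quake2-patch.zip
```

Quoting "index of" as a phrase matches the title of the auto-generated directory listings that Apache and similar servers produce, which is also why searches like "index of /scripts" turn up in site owners' referer logs.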

  18. Not always dumb... depends on what's there by jd · · Score: 5, Interesting
    #include "IANAL.h"


    You can probably use this to set up "honeypots" which may be legal in States where traditional fake services would be considered illegal as entrapment.


    Simply set up a virtual machine (User-mode Linux is a good one for this). Make the root account publicly read/write and somehow "accidentally" visible to httpd.


    Make the login shell a program that acts as your honeypot: logging activity, tracing back to the user, etc. All the stuff honeypots do so well.


    Next, ensure that the root password is visible, in plain text, in a file that is visible to search engines. Your average skript kiddie is not going to question the apparent generosity of the admin. To get the engine to find the account, you probably want to have your main web page link into your virtual machine's root account, say via FTP.


    Now, none of this is entrapment, in the sense that the person must proactively attempt to present a false identity before the service is accessed. There can be no question that the identity of any user logging in is fake, that the user logging in knows it is fake, and that there has been a deliberate, premeditated attempt to compromise an account.


    If you want to go one step further, have the login shell transfer some goodies, such as cpuburn. Now, these have to have a "legit" use by a "legit" user, as anyone who gets burned is likely to complain. You have to be able to stand your ground and say "hey, I use this service as a convenient way to do hardware tests on remote machines - I locked that account against intruders, so if an intruder gets in, it's not my fault if they get burned."


    (If you leave something dangerous "just lying around", you could probably be held accountable if someone gets hurt, even if they were stupid or malicious. But if you make a "reasonable" attempt to deny access, then it's not your problem.)


    In fact, if you do any freelance tech stuff, you might very well use the service for real as a way of fetching stress-testing software. That would make it a lot harder for "victims" of your root snare to complain, as you could then prove legitimate use by legitimate users, the victim not being one of them.
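The "login shell that acts as your honeypot" can start very small. A minimal sketch, assuming it is installed as the account's login shell inside the virtual machine and called with sys.stdin, sys.stdout and a log file of your choosing: it records every line typed with a UTC timestamp and never executes anything. Tracing back to the user and the cpuburn idea are out of scope here, and the function name is hypothetical.

```python
import io
from datetime import datetime, timezone
from typing import TextIO

def fake_shell(stdin: TextIO, stdout: TextIO, log: TextIO, prompt: str = "# ") -> None:
    """Honeypot login shell sketch: record every line typed, execute nothing."""
    while True:
        stdout.write(prompt)
        stdout.flush()
        line = stdin.readline()
        if not line:  # EOF: the session was closed
            break
        command = line.rstrip("\n")
        log.write(f"{datetime.now(timezone.utc).isoformat()}\t{command}\n")
        log.flush()
        words = command.split()
        if not words:
            continue
        if words[0] == "exit":
            break
        # Pretend nothing is installed, so no real program ever runs.
        stdout.write(f"sh: {words[0]}: command not found\n")

if __name__ == "__main__":
    # Simulated session using in-memory streams instead of a real terminal.
    session_log = io.StringIO()
    fake_shell(io.StringIO("uname -a\ncat /etc/shadow\nexit\n"), io.StringIO(), session_log)
    print(session_log.getvalue())
```

Answering "command not found" to everything is the simplest safe choice; a more convincing honeypot would fake plausible output, which is where the real work lies.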

    --
    It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
  19. "/_vti_pvt" by domenic+v1.0 · · Score: 1, Interesting

    That was my favorite Google search back in the good old days... finding the "service.pwd" or "admin.pwd" files, then cracking them with John the Ripper. Too bad that exploit is patched and next to nonexistent now :(

  20. Re:This is news? by karlandtanya · · Score: 2, Interesting
    Hmmmm... reply seems to have failed earlier...


    It is a consequence of living in an open society that information which "should not" be available is available.


    This has nothing to do with google and cracking.


    Exactly the same situation was demonstrated in the '70s by Princeton student John Aristotle Phillips, better known as "The A-Bomb Kid". For him, it was the telephone, university and public libraries, and fission weapons instead of Google and cracking.


    Again, news it ain't.

    --
    "Reality is that which, when you stop believing in it, it doesn't go away." - Philip K. Dick
  21. Re:This happens because of dumb admins, not google by drinkypoo · · Score: 2, Interesting

    You have to have execute permission on each intermediate directory between / and public_html (or whatever you have it set to on your server). This is because the directory execute bit is the "change to this directory" bit. A lot of users fuck this up and just make their home directories world readable, or even writable. Just another reason to separate the user from his data whenever possible. The trick is to do it in a way that won't make them feel left out. Obviously some people are more willing than others to put in the time to learn the intricacies of an obfuscated system like Unix.
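A minimal sketch of checking that rule, assuming the web server runs as a user who is neither the owner nor in the directory's group, so only the "other" permission bits apply to it. It walks from the target directory up to / and reports every directory lacking the world-execute (o+x) bit. The function name is hypothetical.

```python
import os
import stat
import tempfile

def dirs_missing_other_exec(path: str) -> list[str]:
    """Return the directories from / down to `path` that lack the o+x bit."""
    current = os.path.abspath(path)
    missing = []
    while True:
        mode = os.stat(current).st_mode
        if stat.S_ISDIR(mode) and not mode & stat.S_IXOTH:
            missing.append(current)
        parent = os.path.dirname(current)
        if parent == current:  # reached the root directory
            break
        current = parent
    return missing[::-1]  # report top-down, / first

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        home = os.path.join(base, "home")
        public_html = os.path.join(home, "public_html")
        os.makedirs(public_html)
        os.chmod(home, 0o700)  # the usual mistake: no o+x on the home directory
        print(dirs_missing_other_exec(public_html))
```

Usage: run it against `os.path.expanduser("~/public_html")` and `chmod o+x` each directory it reports. That grants traversal only, without the world-readable home directory the comment warns about.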

    --
    "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
  22. Re:Entrapment by fizbin · · Score: 4, Interesting

    Probably not, but his statement of the situation squares with my experience when I talked to an FBI agent after having discovered (and logged) some IRC kiddies who were constructing a DDOS network out of sub7-infected machines.

    I'd created a sub7 honeypot on my linux box with a little perl script; after that collected the IRC server ip and channel name, I connected with a random username (pretending to be a bot) and just logged the conversation.

    The FBI agent interviewed me very carefully to make certain that my setting up monitoring, etc., was not in any way instigated by a law enforcement officer. (No, I'd just gotten annoyed at random SYN packets.) After that, he had no trouble with it. I don't know if this makes the evidence I provided usable legally, but it never came to that. As he explained it, the question was whether I was acting as an agent of the state when setting up the honeypot. Committing entrapment is not anything that non-state actors ever need worry about.

    Not that this lets you off the hook entirely - there may be charges of wiretapping involved; monitoring your own machine should be safe legal ground, but connecting to the IRC network (as I did) is a slight bit more dicey legally, and shouldn't be done if you have any reason to believe that the relevant prosecutor would like to hang something on you as well.
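The honeypot described above was a small Perl script. A much simpler Python sketch of the same starting point: listen on a port, record who connects and the first bytes they send, then hang up. It does not emulate the Sub7 protocol, so it would not collect IRC server details the way the original did. The port constant reflects SubSeven's commonly cited default and is an assumption, as are the function names. Monitoring your own machine is the legally safer part, per the comment above.

```python
import socket
import threading
from datetime import datetime, timezone

SUB7_PORT = 27374  # commonly cited SubSeven default; an assumption, adjust as needed

def make_listener(host: str = "0.0.0.0", port: int = SUB7_PORT) -> socket.socket:
    """Open a TCP socket so that something appears to be listening on the port."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen()
    return server

def log_connections(server: socket.socket, log: list, max_conns: int = 1,
                    timeout: float = 2.0) -> None:
    """Accept connections; record peer address and first bytes sent, then hang up."""
    for _ in range(max_conns):
        conn, (peer_ip, peer_port) = server.accept()
        with conn:
            conn.settimeout(timeout)
            try:
                first_bytes = conn.recv(1024)
            except socket.timeout:
                first_bytes = b""
            log.append({
                "time": datetime.now(timezone.utc).isoformat(),
                "peer": f"{peer_ip}:{peer_port}",
                "data": first_bytes,
            })

if __name__ == "__main__":
    # Self-test on an ephemeral localhost port instead of the real one.
    srv = make_listener("127.0.0.1", 0)
    entries: list = []
    worker = threading.Thread(target=log_connections, args=(srv, entries))
    worker.start()
    with socket.create_connection(srv.getsockname()) as client:
        client.sendall(b"PING")
    worker.join(5)
    srv.close()
    print(entries)
```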

  23. Scary, very scary by Hatta · · Score: 2, Interesting
    --
    Give me Classic Slashdot or give me death!