Malicious Injection — It's Not Just For SQL Anymore
nywanna writes "When most people think of malicious injection, they think of SQL injection. The fact is, if you are using XML documents or an LDAP directory, you are just as vulnerable to a malicious injection as you would be using SQL. Bryan Sullivan looks at the different types of malicious code injections and examines the very basics of preventing these injections."
When most people think of malicious injection, they think of SQL injection.
Come on now, considering your audience, you might want to re-think that statement.
Push Button, Receive Bacon
Shell scripts have been vulnerable to similar "injection" exploits since the invention of CGI.
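For anyone who hasn't seen the shell version of this, here is a minimal Python sketch. The helper name echo_argument and the payload are made up for illustration. The point is only that an argument list handed straight to the child process is never parsed by a shell, so metacharacters such as ";" stay plain data.

```python
import subprocess
import sys


def echo_argument(user_input: str) -> str:
    """Pass user input as a single argv element; no shell ever parses it."""
    result = subprocess.run(
        # A list of arguments (not one command string) bypasses the shell.
        [sys.executable, "-c", "import sys; print(sys.argv[1])", user_input],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# The ';' would start a second command if a shell parsed this string.
payload = "hello; echo INJECTED"
```

Building one command string and running it through a shell is the pattern the old CGI scripts got bitten by.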
From TFA:
Seems simple enough, but it's amazing how often this is ignored or implemented badly.
I hear there's rumors on the Slashdots
...is to replace database storage, xml, and ldap with comma-delimited text files on anonymous ftp. In fact, my last job fired me for gross incompetence because the other programmers were jealous of the simplicity of my solution.
"including LDAP injection and XPath injection. While these may not be as well-known to developers, they are already in the hands of hackers, and they should be of concern."
How come they are not well known to developers? Last time I checked, if I don't use LDAP anywhere in my code, I'm not at risk of an LDAP injection. Know your systems and check your inputs! God damn!
Overuse of the Pumping Lemma causes blindness
In his XML example with XPath injection he states that running a certain query can return the entire order history of all customers. That may be true, but if the application is returning an XML document containing the entire order history of all customers for each customer request before running an XPath query, then I think the application has more problems than being vulnerable to XPath injection.
Bob
Listen to my latest album here
RE: validating input fields...
I can't help but feel that most developers have at least a little common sense and do something along those lines anyway.
I often write a little validate_input(char *string, char *format) function that checks every input string from a user. Such functions are simple, but more often than not very effective. How is this any different from using whitelists and blacklists? Any coder worth their salt would do something to stop malicious input, but no one is completely infallible.
Security of anything in this world is near on impossible. Hackers will get around anything given time.
Signature v3.0, now with 42% less memory usage.
I blame Microsoft for a lot of these vulnerabilities.
I recently attended a Microsoft-sponsored seminar on web site security at the DeVry Institute in Decatur, GA.
One of the speakers was a man from SPI Dynamics (sorry, forgot his name). He demonstrated how Microsoft's tools make it very easy to expose a database to the web, but how the same tools make exploiting the database very easy. He demonstrated an application that used SQL injection to first reverse-engineer the schema of an exposed database, and then the data in the database. It was quite a revelation.
668: Neighbour of the Beast
Heh, remember when we had binary file formats and protocols, fixed-length fields (didn't need delimiters), and there was no parsing or worrying about "escaping" data? We didn't have these problems.
Anyway, I like this article because it admits that whitelists are better than blacklists. You have to validate data: make sure it is known to be non-harmful, rather than looking for whatever problems that you have imagined so far. You'll never guess all the things that can go wrong; you just know what is right.
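A rough sketch of the difference, in Python. The username rule and both function names are assumptions made up for this example, not anything from the article. The whitelist accepts only what is known to be right; the blacklist rejects only the problems someone has thought of so far.

```python
import re

# Hypothetical rule: usernames are 3-16 ASCII letters, digits, or underscores.
USERNAME_RE = re.compile(r"[A-Za-z0-9_]{3,16}")


def is_valid_username(value: str) -> bool:
    """Whitelist: accept only input that matches the known-good pattern in full."""
    return USERNAME_RE.fullmatch(value) is not None


def passes_blacklist(value: str) -> bool:
    """Blacklist, for comparison: rejects only the SQL-style tokens listed here."""
    return not any(bad in value for bad in ("'", ";", "--"))


# An LDAP-style payload that the SQL-minded blacklist never imagined.
ldap_payload = "*)(uid=*"
```

Here passes_blacklist(ldap_payload) comes back True while is_valid_username(ldap_payload) comes back False, which is the whole argument in two calls.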
As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
Any user input should be scrubbed, sanitized, and checked before using it
This has been true since the dawn of programming. NEVER trust the user. Before, it was just entering text when the program expected an integer, or a negative value when it expected a positive, etc. Now we don't get "? Redo from start" errors that crash the BASIC programs. But it's essentially the same thing. Never expect the user will cooperate with the program. Especially a program that is available to potentially malicious people out on the internet.
Seven puppies were harmed during the making of this post.
Phishers have been known to use frame injections to insert their content into framesets, allowing them to grab login info from within the bank's own web site. It's not nearly as fancy as an SQL injection, but it's sure malicious and quite difficult for victims to recognize.
RichM
Data Center Knowledge
"It's not just for breakfast anymore."
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
A webmaster hosts a contact form on his website that allows users to fill out a form to contact him. He allows the user to specify a subject and a message but the recipient is hard coded to webmaster@example.com.
The message ends up looking like this:
Where $subject and $message are captured from the user on the website. If the $subject is not properly sanitized, a bot could submit it with a new line in it and be able to start a new line in the headers of the email. That new line could be, for example, a large CC list of people to spam with his message:
Which is why I would suggest using a contact form such as the one that I have written that has already thought about this sort of thing.
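The header-injection scenario above can be sketched in Python. This is an illustration under assumptions, not the poster's contact form: the function names and the victim address are made up. The naive version pastes the subject straight into the header text; the safer one refuses any subject containing a line break before building the message with the standard email package.

```python
from email.message import EmailMessage

RECIPIENT = "webmaster@example.com"  # hard-coded, as in the scenario above


def naive_message(subject: str, message: str) -> str:
    """Vulnerable: a newline in subject starts a new header line."""
    return f"To: {RECIPIENT}\nSubject: {subject}\n\n{message}"


def build_contact_message(subject: str, message: str) -> EmailMessage:
    """Reject CR/LF in the header value, then let the library build the headers."""
    if "\r" in subject or "\n" in subject:
        raise ValueError("line break in subject")
    msg = EmailMessage()
    msg["To"] = RECIPIENT
    msg["Subject"] = subject
    msg.set_content(message)
    return msg


# Hypothetical bot submission: the newline smuggles in a Cc header.
evil_subject = "Hi\nCc: victim1@example.com, victim2@example.com"
```

With evil_subject, naive_message produces text containing a separate "Cc:" line, while build_contact_message raises ValueError.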
It's a simple matter of hygiene:
Wash it before you eat it.
All data read from external sources must be validated before being used. In some languages/frameworks this is as hard as nails (i.e., I programmed a pretty large web application with only straight CGI programs written in pure Unix/C), in some you have help (Perl with taint), in some it's kinda-sorta-almost not an issue (PHP with Agavi and Creole).
If I had to choose, I would have to say that the middle way, the Perl way, is the best. It does not pretend to solve all your problems for you, even when it can't really. Rather it brings the problems at hand to your attention. Problems surface, fix problems, code gets better.
I choose to remain celibate, like my father and his father before him.
Then your program needs to be aware of LDAP, SQL, XML and XPATH syntax. Validating user input, as in using regular expressions, will protect you from "FutureML" injection attacks without the need of knowing how "FutureML" will work. In my opinion validating user input IS the correct way of doing it.
moi
I think you're right - as long as you are sure that you know what's going to be done with the data after it's been written away to your database. You might have your escaping/quoting routine solidly implemented for all inputs to your system, but the trainee down the hall who writes the reporting application that parses the table once a month might not be so savvy - the cunningly crafted SQL injection attack that your quoting has preserved and saved away into the db could wreak havoc when it gets read out again at the other end. The same goes for any HTML/XML that has been saved away, and then gets blindly written out by a web developer on the Order Summary page, or merged into some larger XML document without proper checks.
I suppose an apt analogy would be saying that it's ok to allow infectious material into a building as long as it is first correctly sealed in a bio-safe container - well that's true as long as you're sure the janitor isn't going to open it up later that evening and use it for a cookie jar.
Each value was put in "quotes"...
Comment removed based on user account deletion
If you develop software that follows the usual layered model (web, business, persistence), you have code in place to isolate the web bit from the database bit. MS tools sort of shortcut the web/database bit, making it easier to exploit what's in the database.
Hope this helps.
668: Neighbour of the Beast
It is well known amongst the more experienced software developers out there that all user input to ANY software system should be considered suspect and therefore must be checked for invalid inputs, boundary, and special cases. The solution has been around for decades, but it is really surprising how many developers out there have NOT heard of regular expressions or do not know how to properly use them. There are some cases, usually when widely variant free-form input is required, that are difficult to handle with regular expressions, but for the most part they have proven to be remarkably effective in my own experience and I use them regularly (pun intended) in my website and application development. If you have not gotten in on the regular expression game then consider picking up the O'Reilly Mastering Regular Expressions book or visiting Regular-Expressions.info before building your next project. The project you save might be your own!
There is a solution to injection vulnerabilities, but it's not validation. Sure, if you validate everything properly, you won't suffer from injection vulnerabilities. However, writing the code for that is cumbersome and error-prone, and thus, in practice, we see that values are either not validated at all or not validated properly.
The solution I've been championing is structured composition. Instead of verifying that the input won't alter the structure of whatever you're composing, you use APIs that ensure that this won't happen. Some examples of this, as well as other bug-eliminating language/library features, can be found in my essay Better Languages for Better Software.
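One widely available instance of this idea is the bind parameter. The sketch below uses Python's sqlite3 module; the orders table, its rows, and the function names are invented for the example and are not taken from the essay. The value is handed over separately from the query text, so it cannot change the query's structure.

```python
import sqlite3

# Throwaway in-memory database with made-up data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, item TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", "book"), ("bob", "lamp")],
)


def orders_by_concatenation(customer: str) -> list:
    """Vulnerable: the input becomes part of the SQL text itself."""
    sql = "SELECT item FROM orders WHERE customer = '" + customer + "'"
    return conn.execute(sql).fetchall()


def orders_by_binding(customer: str) -> list:
    """Structured: the '?' placeholder keeps the input a plain value."""
    sql = "SELECT item FROM orders WHERE customer = ?"
    return conn.execute(sql, (customer,)).fetchall()


payload = "x' OR '1'='1"
```

With this payload the concatenated query returns every customer's rows, while the bound query looks for a customer literally named x' OR '1'='1 and finds nothing.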
Please correct me if I got my facts wrong.
So far everyone seems to be focusing on "input" and forgetting about "output", or even mixing the two.
Anyway, my suggestion has always been to do something like the following:
Inputs to your program
|||
Corresponding Input filters
|||
Your program
|||
Corresponding Output filters
|||
Outputs from your program
|||
Stuff receiving the outputs
You have a different "input filter" for each class of input so that your program can handle those inputs correctly.
Then you have a different output filter (e.g. SQL bind vars, HTML, XML) so that the stuff receiving your outputs (browser, database, viewer, etc) will handle them correctly.
NEVER do stuff like magic quotes (PHP is one of the worst and most braindead languages in popular use) - mixing input and output filtering is so wrong it isn't funny (there are so many other things PHP does wrong that it's almost criminal).
Depending on the circumstances your program could output a single quote ' differently e.g. %27 for a cgi parameter, '' for Oracle data and \' for MySQL data (BTW MySQL is the PHP of databases). So it should be obvious that "one size fits all" doesn't work.
By filtering I mean quoting/encoding sanity checking etc - whatever it takes to get the data in a suitable form (with hopefully minimal data loss/corruption).
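As a small illustration of one output filter per destination, here is a Python sketch. The three function names are invented for this example, and the SQL one is only there to show the quote-doubling form mentioned above; bind vars are the better choice whenever the database API offers them.

```python
import html
from urllib.parse import quote


def for_url(value: str) -> str:
    """Percent-encode a value for use as a CGI/URL parameter."""
    return quote(value, safe="")


def for_html(value: str) -> str:
    """Escape a value for placement in HTML text or attribute values."""
    return html.escape(value, quote=True)


def for_sql_literal(value: str) -> str:
    """Double single quotes, the standard SQL string-literal form.

    Shown only to illustrate the differing encodings; prefer bind variables.
    """
    return "'" + value.replace("'", "''") + "'"


name = "O'Brien"
```

The same input comes out as O%27Brien for the URL, O&#x27;Brien for HTML, and 'O''Brien' for the SQL literal, so no single filter applied at input time could serve all three.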