Secure Programmer: Keep an Eye on Inputs
An anonymous reader writes "This article discusses various ways data gets into your program, emphasizing how to deal with each of them appropriately; you might not even know about them all! It first discusses how to design your program to limit the ways data can get in, and how your design influences what counts as an input. It then discusses various input channels and what to do about them, including environment variables, files, file descriptors, the command line, the graphical user interface (GUI), network data, and miscellaneous inputs."
If you know your inputs, then you know your outputs. This is an important lesson to learn, but most of us already know this...
on your input!
Written by a contractor for sure!
Jamey Kirby
Doesn't Windows throw random crap around?
This unpredictable behavior seems to screw with programs when RAM is low.
Also, the Microsoft security holes we have seen over the past year (2003) inspire no confidence in M$ security.
Lastly, Belkin routers are no good for security. After all, they hijack your HTTP requests and redirect you somewhere you didn't want to go!
Grump.
Is it true that more people vote for the winner of American Idol, than vote for the president? -Ali G.
That goes against my ideas that you should allow any type of data to enter your program anywhere, anytime.
Microsoft products such as Outlook make the news? Garbage In Garbage Out.
You'd be wise to add Cross Site Scripting attacks to your list of things to protect against.
I believe code reviews with a large enough group of people are extremely useful. Yeah, it takes time, and you get some irritating comments from a few people about a missing space here or a comma there, but when multiple eyes look at the code, someone always catches something you didn't. A few hours of extra pain for programmers can prevent pain for millions in the form of Blaster-style worms, etc.
The article's worth reading, and really does justify its "Level: Intermediate" label. Unlike when I was learning to program, there are lots of sources of input beyond your deck of punch cards (:-), and the author does a good job of explaining many of them, such as the evil things environment variables and file descriptors can be used for.
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
Yes, they did (for those working for Microsoft).
A truly "secure" program would have no inputs, but that program would be useless.
Not necessarily. What about a program which calculates pi or runs some kind of simulation? The 'input' is in the form of constants compiled into the executable. Technically there is no input, but the program is hardly useless.
NO fscking SHIT!
jesus christ on a pogo stick... somebody had to put this in an fsking security magazine?
Given that every single way to compromise security involves bad input, it's not surprising that it's in a security magazine.
"They redundantly repeated themselves over and over again incessantly without end ad infinitum" -- ibid.
The Perl language has built-in "taint checking," enabled via the -T command-line switch, which causes Perl to automatically keep track of all data that may have come from user input and refuse to let any of it do anything harmful (basically, end up on a command line or in a file name).
We don't see the world as it is, we see it as we are.
-- Anais Nin
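Python has no equivalent of Perl's built-in -T mode, but the core idea (mark outside data, let the mark spread, refuse it at dangerous sinks until a pattern match launders it) can be sketched. The `Tainted` class, `untaint()` and `run_command()` below are hypothetical illustrations, not a real library, and the sketch is much weaker than Perl's: f-strings, `%` formatting and `str.join` all silently drop the mark.

```python
import re


class Tainted(str):
    """Hypothetical marker for a string that came from outside the program."""

    def __add__(self, other):
        if not isinstance(other, str):
            return NotImplemented
        # Tainted + str stays tainted.
        return Tainted(str.__add__(self, other))

    def __radd__(self, other):
        if not isinstance(other, str):
            return NotImplemented
        # str + Tainted stays tainted.
        return Tainted(str.__add__(other, self))


def untaint(value: str, pattern: str) -> str:
    """Return a plain str only if the whole value matches a whitelist pattern."""
    match = re.fullmatch(pattern, value)
    if match is None:
        raise ValueError(f"rejected input: {value!r}")
    return str(match.group(0))  # str() yields a plain str, dropping the mark


def run_command(command: str) -> str:
    """Stand-in for system(): refuses tainted strings and executes nothing."""
    if isinstance(command, Tainted):
        raise PermissionError("Insecure dependency in run_command")
    return command


filename = Tainted("report.txt")           # pretend this came from sys.argv
print(type("cat " + filename).__name__)   # Tainted: the mark spread
safe = untaint(filename, r"[A-Za-z0-9_.-]+")
print(run_command("cat " + safe))         # cat report.txt
```

Like Perl's untainting, the regex is only as good as the pattern you write; `.*` would launder anything.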
So don't trust the user, double-check the data, and make sure that an error in the entry will not damage the program (buffer overflow, anyone?). This all sounds like common stuff that has been rehashed over and over again.
I don't program much, but all I can think is "well.. ya."
Not to say it wouldn't be a nice intro, but by itself that article didn't have any substance. Go into more sophisticated problems or something...
I'm inclined to agree with that; actually, it seems like there hasn't been much the past few weeks. --- Anonymous Coward: Preventing Karma loss since 1997
It is a widely accepted engineering maxim that systems should be designed so that it is difficult to use them improperly. This is why (for example) a 110 volt plug will not fit in a 220 volt outlet. Developers who are concerned about the quality of the software they make would do well to follow this rule, and not just for security reasons. You should verify input data as early and as rigorously as possible wherever you can. Take advantage of things like XML validation and text box constraints to make it hard for users to enter bad data. And always follow the Fail-Fast principle... if something goes wrong: Complain! Loudly! Don't let the user continue working if something has gone wrong. It's better to crash than to produce an erroneous result.
Just a little advice from a developer who's made enough mistakes to know better.
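A minimal sketch of validating early and failing fast, assuming a hypothetical command-line tool that takes an order quantity. `parse_quantity()`, `main()` and the 1..1000 range are invented for illustration; the point is that bad input stops the program with a clear message instead of flowing onward.

```python
import re
import sys

QUANTITY = re.compile(r"[0-9]{1,4}")  # hypothetical rule: 1 to 4 ASCII digits


def parse_quantity(raw: str) -> int:
    """Validate a hypothetical order-quantity field the moment it arrives."""
    text = raw.strip()
    if not QUANTITY.fullmatch(text):
        raise ValueError(f"quantity must be 1-4 digits, got {raw!r}")
    value = int(text)
    if not 1 <= value <= 1000:
        raise ValueError(f"quantity out of range 1..1000: {value}")
    return value


def main(argv: list[str]) -> int:
    try:
        quantity = parse_quantity(argv[1] if len(argv) > 1 else "")
    except ValueError as exc:
        # Fail fast: complain loudly and stop instead of guessing a default.
        print(f"error: {exc}", file=sys.stderr)
        return 2
    print(f"ordering {quantity} item(s)")
    return 0
```

A real script would end with `sys.exit(main(sys.argv))` so the non-zero status reaches the caller.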
And why should anyone be surprised, in this "I read a book on VB last week and now I'm a software engineer!" era?
I am not surprised that simple things like this are rehashed over and over. This is more suited to the programmer group of people who will sort data based on string comparisons, instead of learning how to use a real algorithm to do it, or keep writing static forms, instead of learning how to use a loop with a db backend - because they don't understand true programming concepts. In other words, about 80% of the current crop of overpaid, undereducated programmers that built corporate apps.
- Eric
You make it sound like the article only says that. Well, you might be an expert, but speaking as an experienced userspace programmer with only limited security knowledge, I can tell you that to me it's of great interest. The (historical) IFS vulnerability, for example - I would never have thought of checking for something like that.
Of course, you probably didn't bother to read the article before trashing it... this is Slashdot, after all.
Guess who's going to be working at that N. American office?
Hint: Not Americans
Conformity is the jailer of freedom and enemy of growth. -JFK
What is so interesting about this article?
I just wrote a document about secure programming and I found dozens of better articles about the exact same things like: here
Perl programmers interested in writing secure scripts should *definitely* know about the -T (taint checking) flag.
From the FAQ:
As we've seen, one of the most frequent security problems in CGI scripts is inadvertently passing unchecked user variables to the shell. Perl provides a "taint" checking mechanism that prevents you from doing this. Any variable that is set using data from outside the program (including data from the environment, from standard input, and from the command line) is considered tainted and cannot be used to affect anything else outside your program. The taint can spread. If you use a tainted variable to set the value of another variable, the second variable also becomes tainted. Tainted variables cannot be used in eval(), system(), exec() or piped open() calls. If you try to do so, Perl exits with a warning message. Perl will also exit if you attempt to call an external program without explicitly setting the PATH environment variable.
I'm a bloodsucking fiend! Look at my outfit!
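The FAQ's advice carries over to other languages roughly as: never hand user data to a shell, and set PATH yourself. A minimal Python sketch follows, assuming a hypothetical tool that greps a user-named file; `build_grep_argv()`, `run_grep()` and the file-name whitelist are illustrative, and `/usr/bin:/bin` is an assumed safe PATH for a typical Unix system.

```python
import re
import subprocess

# Explicit PATH instead of whatever the caller's environment says.
SAFE_ENV = {"PATH": "/usr/bin:/bin"}
# Hypothetical whitelist for file names: no slashes, no spaces, no shell characters.
FILENAME = re.compile(r"[A-Za-z0-9_.-]{1,64}")


def build_grep_argv(pattern: str, filename: str) -> list[str]:
    """Return an argument vector; nothing here is ever parsed by a shell."""
    if not FILENAME.fullmatch(filename):
        raise ValueError(f"unacceptable file name: {filename!r}")
    # "--" stops option parsing, so a pattern like "-r" is treated as data.
    return ["grep", "-F", "--", pattern, filename]


def run_grep(pattern: str, filename: str) -> subprocess.CompletedProcess:
    argv = build_grep_argv(pattern, filename)
    # shell=False (the default): ; | $( ) inside `pattern` have no special meaning.
    # check=False because grep exits with status 1 when nothing matches.
    return subprocess.run(argv, env=SAFE_ENV, capture_output=True, text=True, check=False)
```

Passing a list rather than a string is what keeps the shell out of the picture; the whitelist then limits which files can be named at all.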
Before someone says "but your link is about a book, not an online article!" please see here
But you don't understand!
This is IBMs ON DEMAND Strategy!
:wq!
Michael dropped the soap again. Be gentle.
Ya, you can talk about inputs to programs and how miscellaneous and unwanted data get in there, but watch for buffer overruns, because that's what can really kill your program.
There is or can be built a machine that can simulate any physical object. -Church-Turing principle
Is news to others. Many "Programmers" out there write code that does no error checking or exception catching, and the result is all the crapware that we see today. We were all warned in our programming classes about memory leaks and buffer overflows, but they are still very prevalent in today's software. Perhaps we should all look harder at our code before selling it off as a final product.
The recommendations on dividing the program into unsecure and secure binaries to handle setuid access in GUIs can very properly be extrapolated to non-graphical programs. This is a very good strategy for allowing relatively wild programs access to important facilities, and it can involve many types of IPC, including memory-mapped files (with proper protection) and sockets. To really secure a client program that needs access to critical resources, put it in a chroot jail and have it communicate with an outside process through (e.g.) a socket. Separating programs into safe and unsafe sections and applying different security techniques to each is far more effective, imo, than trying to secure a single, large application. It can also provide many other benefits of encapsulation, etc. The security onus shifts to handling client requests in the secure section, which is usually much easier to do.
Hacking articles at http://www.geocities.com/chroo
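The split into a small trusted part and an untrusted client can be sketched with a socket pair. This is a simplified illustration, assuming a hypothetical protocol of one JSON request per line and a whitelist containing a single `read_config` operation. So that the example runs on its own, both halves live in one process (the broker on a thread); a real design would run the broker as a separate, more privileged process and the client with fewer privileges, possibly chrooted, none of which this sketch does.

```python
import json
import socket
import threading

# Hypothetical stand-ins for what the trusted side protects and permits.
CONFIG = {"theme": "dark", "language": "en"}
ALLOWED_OPS = {"read_config"}


def handle(line: str) -> dict:
    """Check one request; anything unexpected gets a refusal, not a crash."""
    try:
        request = json.loads(line)
        op, key = request["op"], request["key"]
        if op not in ALLOWED_OPS or not isinstance(key, str):
            raise ValueError("operation not allowed")
        return {"ok": True, "value": CONFIG.get(key)}
    except (ValueError, KeyError, TypeError) as exc:
        return {"ok": False, "error": str(exc)}


def broker(sock: socket.socket) -> None:
    """Trusted side: one JSON request per line in, one JSON reply per line out."""
    with sock, sock.makefile("r", encoding="utf-8", errors="replace") as rfile, \
            sock.makefile("w", encoding="utf-8") as wfile:
        for line in rfile:
            wfile.write(json.dumps(handle(line)) + "\n")
            wfile.flush()


def run_session(requests: list[dict]) -> list[dict]:
    """Untrusted side: talk to the broker over one end of a socket pair."""
    ours, theirs = socket.socketpair()
    worker = threading.Thread(target=broker, args=(theirs,), daemon=True)
    worker.start()
    replies = []
    with ours, ours.makefile("r", encoding="utf-8") as rfile, \
            ours.makefile("w", encoding="utf-8") as wfile:
        for request in requests:
            wfile.write(json.dumps(request) + "\n")
            wfile.flush()
            replies.append(json.loads(rfile.readline()))
    worker.join(timeout=5)  # closing our end gives the broker EOF
    return replies
```

The security-relevant code is all in `handle()`, which is small enough to audit; that is the payoff of the split.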
Hint: Not Americans
There will probably be a number of Americans employed at TCS Buffalo--as administrative assistants and salespeople. There will also be a need for some janitorial types.
These job gains, however, will be offset by many multiples of job losses as TCS offshores IT, engineering, and back-office functions for American companies!
Hooray for Hillary!
You're on, foolio.
Props to GNAA
This will go on and on till the end of time.
Nothing is truly secure.
There will always be a hole somewhere.
It is only meant as a deterrent.
There is time-to-market (TTM) pressure, which will screw up code reviews.
There are insiders, there are spies, etc., etc.
Take all that time to write a program and it is obsolete before it is released.
"Given that every single way to compromise security involves bad input, it's not surprising that it's in a security magazine."
What about program bugs that are not input-related? For example, a program that breaks when an internal timer overflows, or when it accesses a section of memory that has been deallocated. Such bugs can easily cause security breaches as well as general system failure, all without any human intervention. It reminds me of the blackout that Sterling mentions:
http://www.lysator.liu.se/etexts/hacker/crashing.
Hacking articles at http://www.geocities.com/chroo
Java
.NET
XML
Basic point, I suppose, is that if you insist on using a U*ix-family OS in such an environment then you must ensure that the U*ix environment is clean at the beginning, which may well be more a matter of the procedures and quality control of the platform and the application deployment than of the individual apps.
Oh, and btw, I thought the head of the thread was a kneejerk reaction, but - flamebait? Shame on whoever moderated it that way.
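Cleaning the environment at startup usually means building a new one from a short whitelist rather than trying to strip known-bad variables. A sketch, assuming the variables in `KEEP` and the fixed `SAFE_PATH` suit your platform (both are illustrative choices, as is the name `clean_environment`):

```python
from collections.abc import Mapping

# Hypothetical whitelist: everything else, e.g. LD_PRELOAD, IFS or BASH_ENV, is dropped.
KEEP = ("LANG", "LC_ALL", "TZ", "TERM")
SAFE_PATH = "/usr/bin:/bin"


def clean_environment(env: Mapping[str, str]) -> dict[str, str]:
    """Build a fresh environment from a whitelist instead of trusting the inherited one."""
    clean = {name: env[name] for name in KEEP if name in env}
    clean["PATH"] = SAFE_PATH  # never inherit PATH from the caller
    return clean
```

Pass the result as `env=` to `subprocess.run`, or replace the current environment with `fresh = clean_environment(os.environ)`, then `os.environ.clear()` and `os.environ.update(fresh)`.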
How about when the application is on the web? JavaScript? Server roundtrips?
:: Andrea
Anime Wallpapers
How is this moving business offshore? It looks like she is bringing offshore businesses HERE.
Use your "critical thinking skills"! Why the fuck would a company that competes on price open an office in an "expensive" labor market? Do you really think they're in business to lose money?
Answer: The office is a sales office, not a design center. Tata gains a sales foothold in middle America where it can quietly win business from American firms desperate to reduce costs. Hillary can "claim victory" by appearing to encourage job creation in sales and sales support while the reality is she's increasing job losses in the high-tech sector.
Try actually reading the article.
It's a press release straight from the Tata public relations department, you dipshit.
Another excellent article by David... oddly enough, I was reading his Program Library HOWTO (http://www.dwheeler.com/program-library/) just the other day to learn about dynamic loading libraries in Linux.
Adding Ad Absurdium, Ad Nauseum, or Ad Infinitum to your .sig would create another round of Adding Ad Absurdium, Ad Nauseum, or Ad Infinitum to your .sig would create another round of Adding Ad Absurdium, Ad Nauseum, or Ad Infinitum to your .sig would create another round of [/snip]
The preceding comment has been documented as containing no EPHI and is certifiable as HIPAA Phase II Compliant.
inputs want to be free, as in beer and sex with hookers
The proliferation of proprietary formats that all do basically the same thing, like sending sound files over the net or viewing video clips, is encouraging mass downloads of programs from third-party providers. These programs may well do what they say they will do, but with all this DMCA crap going on, it's getting harder and harder to see whether they are doing a little extra that wasn't in the bargain, like doing zombie work on the side to assist in little capers the originating author needs to pull off.
What firewall or systems programming can stop a deliberately malicious program installed by an ignorant user? Say the program "demands" access to the internet for "verification/auto-update"; then you have to set the firewall to allow this program access to the net. Now what happens? It's like giving your car keys to a valet parking attendant: you just have to trust he's only going to do what he says he will do. To add insult to injury, you have generally signed away any recourse when you clicked that "I agree" button confirming you have read and understood the EULA.
What irritates me so about these "plug-ins", "macros", and "scripts" is that they are indeed executable. Nothing says the malicious person coding these things is gonna follow the rules. He is free to code some real nasties in assembler if he so desires. I find the state of music file distribution really disturbing. We have an MP3 format which is generally well understood, yet everybody jumping on the bandwagon seems to want proprietary formats that are not generally understood, leaving us all open to the risks resulting from ignorance.
As a public, we aren't helping much. We agree to any damn thing they print in the EULA. As a public, we should INSIST that if the law is going to keep us ignorant of how something works, and that something does something malicious, then its maker should bear full responsibility for the problems it causes.
Basically I am proposing a trade. If you want the protection of law to keep the public ignorant, then you waive indemnity.
We have a patent system and a copyright system in place. Both were implemented on the concept that the work was to be in the open. Why isn't encrypted work treated as a "trade secret" and denied protection by copyright or patent? Basically, any encrypted work would be considered a "trade secret," not in the open, and hence not eligible for protection by the patent or copyright system at all. But to make this happen, it's gonna take the will of a lot of people to pressure the legislators to enact this. Pressure as in "if you do not do this, start polishing your resume."
"Prove all things; hold fast that which is good." [KJV: I Thessalonians 5:21]
I wrote a similar article recently for SysAdmin magazine, although the focus is more about Perl.
The Kernighan & Plauger book "The Elements of Programming Style," dated 1979, talked extensively about the need to validate all inputs, both to subroutines and from the user. This is *not* new; it is just that few programmers have the discipline to follow the rules.
The issue is making *no* assumptions about anything. The programmer *thinks* the file will be written by another piece of code that a team member is writing. But that program has a bug. Or three years from now, other programs are creating the file and don't know about some verbal discussion about field data. It takes great diligence and paranoia, and management that allows you the time in the schedule to do this.
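One way to avoid relying on a verbal agreement about field data is to encode the agreement in the parser and reject anything that does not match it exactly. A sketch for a hypothetical `id,sku,quantity` file; the `Order` type and the field patterns are assumptions made for illustration:

```python
import csv
import io
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    order_id: int
    sku: str
    quantity: int


# Hypothetical field rules, written down instead of agreed on verbally.
ID_RE = re.compile(r"[0-9]{1,9}")
SKU_RE = re.compile(r"[A-Z0-9-]{1,20}")
QTY_RE = re.compile(r"[0-9]{1,4}")


def parse_orders(text: str) -> list[Order]:
    """Parse 'id,sku,quantity' records, rejecting anything that does not fit exactly."""
    orders = []
    for number, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if len(row) != 3:
            raise ValueError(f"record {number}: expected 3 fields, got {len(row)}")
        raw_id, sku, raw_qty = row
        if not (ID_RE.fullmatch(raw_id) and SKU_RE.fullmatch(sku) and QTY_RE.fullmatch(raw_qty)):
            raise ValueError(f"record {number}: malformed field in {row!r}")
        orders.append(Order(int(raw_id), sku, int(raw_qty)))
    return orders
```

When another program starts writing the file three years later, a mismatch shows up as a loud error at the first bad record rather than as quietly wrong data.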
The article is interesting, and they are right to point out the many dangers of relying on environment variables. Where I work (unidentified to protect the incompetent), programmers are not allowed access to the unix command line. Instead, all user exits are trapped, and programmers are forced to navigate through a homegrown menu system.
This menu system relies on an environment variable ${WHATCANIDO} to store a list of permissions available to that user. Of course, I changed my .profile to add my own extension to the permission list. I even nicely dated, initialed, and described my change. ;)
export WHATCANIDO=world_domination:$WHATCANIDO # 2000/10/31 tw Too easy
So now when I get frustrated with the absurdity of this arrangement, I just echo the environment variable to remind myself why I'm right and they're wrong.
> echo $WHATCANIDO
world_domination: [deleted]
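The underlying flaw is that the permission list lives somewhere the user controls. A minimal sketch of the alternative: decide from the numeric uid the operating system reports and look permissions up in a table the user cannot edit. `PERMISSIONS`, `is_allowed()` and `current_user_may()` are hypothetical, and `os.getuid()` is Unix-only.

```python
import os

# Hypothetical permission table. In a real system this would live somewhere the
# user cannot write (a root-owned file, a database, a separate service).
PERMISSIONS = {
    1000: frozenset({"view_reports", "run_backup"}),
}


def is_allowed(action: str, uid: int) -> bool:
    """Decide from the uid, not from anything in os.environ."""
    return action in PERMISSIONS.get(uid, frozenset())


def current_user_may(action: str) -> bool:
    # WHATCANIDO (or any other environment variable) is deliberately never consulted.
    return is_allowed(action, os.getuid())
```

Editing `.profile` changes nothing here, because the environment is not an input to the decision.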
Somewhere along the line every application must trust something. At the very least, BIOS settings and environment variables that are owned by deeper layers of the OS must be trusted because they are inaccessible or indecipherable at the application layer. Reaching too far would break encapsulation and create brittle dependencies. An application can only check the variables and direct inputs that it has access to.
I don't argue against validating inputs. Certainly all of the direct inputs to an application should be assumed to be untrustworthy unless a secure checksum validates that the inputs are identical to some previously validated inputs. Checking inputs (or environment variables) of immediately adjacent processes is probably also warranted (as a redundant "brother's keeper" policy).
The real problem comes if the OS has a faulty validation method. (And I won't get into the necessity of trusting the hardware, or bugs such as those that plagued the early Intel 586.00001 processors.) If I check the validity of a user, filename, or geographically localized data format (e.g., a date), then my application is dependent on the quality of the OS's validator (and a lack of intervening malware).
Two wrongs don't make a right, but three lefts do.
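The "secure checksum" idea can be sketched with Python's standard `hmac` module: record a tag when an input is first validated, then accept the input later only if it is byte-for-byte unchanged. A plain SHA-256 digest would do if the stored digest is itself protected; a keyed HMAC is used here so the tag can sit next to the data. `SECRET_KEY`, `tag()` and `still_validated()` are assumptions for illustration.

```python
import hashlib
import hmac

# Hypothetical key; it must be stored where the input's supplier cannot read it,
# otherwise an attacker can recompute the tag for tampered data.
SECRET_KEY = b"replace-with-a-real-secret"


def tag(data: bytes) -> str:
    """Compute and record this when the input is first validated."""
    return hmac.new(SECRET_KEY, data, hashlib.sha256).hexdigest()


def still_validated(data: bytes, recorded_tag: str) -> bool:
    """True only if data is byte-for-byte what was validated earlier."""
    # compare_digest avoids leaking, via timing, how many characters matched.
    return hmac.compare_digest(tag(data), recorded_tag)
```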
>> overpaid, undereducated programmers that built corporate apps.
Why the f*** does somebody have to tell you that you need to validate input? Why the f*** isn't that obvious??? If that sort of thing isn't painfully obvious to you, you probably should be in a different line of work, educated or not.
What about addslashes()?
Since addslashes() is only truly useful when magic_quotes are off, I wrap it all in a function that checks the status of magic_quotes. Defined as something like:
dbprep($string, $length=-1);
So that if $length>0 the input cannot be longer than $length.
I do the same for database output, e.g., dbout().
Combine addslashes with a few others, such as htmlentities(), and perhaps a regex or something to check for [a-zA-Z0-9], and you have *very* powerful input validation.
Am I missing something?
I have no problem with your religion until you decide it's reason to deprive others of the truth.
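A rough Python counterpart of the length-limit-plus-whitelist part of `dbprep()`, with `html.escape()` standing in for `htmlentities()` on output. The names `validate_field()` and `html_out()` are hypothetical. Note that a `[a-zA-Z0-9]` whitelist also rejects legitimate values such as O'Brien; for the database side, the placeholder approach argued for further down the thread is generally safer than escaping.

```python
import html
import re

ALNUM = re.compile(r"[a-zA-Z0-9]+")


def validate_field(value: str, max_length: int = -1) -> str:
    """Loosely mirrors dbprep($string, $length=-1): optional length limit plus whitelist."""
    if max_length > 0 and len(value) > max_length:
        raise ValueError(f"field longer than {max_length} characters")
    if not ALNUM.fullmatch(value):
        raise ValueError("field may only contain ASCII letters and digits")
    return value


def html_out(value: str) -> str:
    """Escape for HTML output: & < > " and ' become entities."""
    return html.escape(value, quote=True)
```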
Almost everything in this article only applies because of hacker languages like C and C++, which Linux and FreeBSD use for virtually everything. It is so easy to forget to double-check bounds, input format, pointers, and all the other usual suspects. It's bizarre how programmers will use these error-prone languages for marginal performance gains just because their ego and haxor status is on the line. Sure, the kernel and drivers need to be in C. Sure, a Java VM needs to be in C. Sure, C++ is a good language for game engines. But almost nothing else should be written in C/C++.
Command-line type programs can be written in Java and statically compiled into small, low-memory, fairly fast programs. And the JVM overhead has almost no effect on larger programs. But you have to work really hard to put a security problem into a Java app, instead of working really hard not to. And you get garbage collection, an awesome API, security, faster compiling, dynamic classloading/linking, easier coding, etc. People think Java takes a lot of memory, is slow, and is ugly. But that's almost entirely because of the Swing GUI, which is not actually all bad. Replace it with IBM's SWT and you'll see a dramatic difference.
Of course there are other languages besides Java that protect against security problems, but few that do so as completely and easily. If half the effort put into GTK/GTK+ had gone into implementing the Java APIs in open source, then Linux/BSD could do nothing for ten years and still be ahead of the rest.
The problem is that too many people aren't sufficiently careful, including the people who wrote the gets() I/O subroutine, so their Internet implementations typically resemble large quantities of foot-sized bright-colored bull's-eye signs marked "YOURS ARE HERE" and large numbers of guns and bullets of various sizes distributed to lots of people along with README files about attaching a sign to your foot to make sure it gets shot at.
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
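The fix for `gets()` is to bound every read (`fgets()` in C; `gets()` was removed from the language in C11). Python strings cannot overflow a buffer, but bounding input length still protects memory and downstream fields. A sketch that reads one line and rejects, rather than truncates, anything longer than a hypothetical limit; `read_bounded_line()` and the default of 256 are illustrative:

```python
import io


def read_bounded_line(stream: io.TextIOBase, limit: int = 256) -> str:
    """Read one line, refusing more than `limit` characters before the newline."""
    line = stream.readline(limit + 1)
    # limit+1 characters without a trailing newline means the line is too long.
    if len(line) > limit and not line.endswith("\n"):
        raise ValueError(f"input line exceeds {limit} characters")
    return line.rstrip("\n")


sample = io.StringIO("hello\n" + "x" * 300 + "\n")
print(read_bounded_line(sample))   # hello
```

A second call on `sample` would raise `ValueError`, because the 300-character line exceeds the limit.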
Ok, every last subroutine validates every last input. Then what do you do? Suppose an input is invalid: do you halt? Throw an exception? Patch the input and keep going? Keep going but make an entry in a log file?
It is excellent policy to be ultra-paranoid about user input and to put "firewalls" between major program modules. But should every last subroutine have its own error checks? What if you have a top-level subroutine that performs error checks and then passes validated results to helper subroutines? Do the helper subroutines need to repeat the checks?
I think there has to be some analysis of the data flows and designation of raw and filtered data flows, who does the filtering, and what assumptions or assertions can be made about filtered flows, and assignment of responsibility to do the checking.
In summary 1) defensive programming is not a substitute for good overall design, 2) there is a place for delegating responsibility for error checking and not chronically worrying about checked data.
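One way to get the "filtered data flow" described above without repeating checks in every helper is to make validated data a distinct type that can only be created by passing the check. A sketch with a hypothetical `Username` type and naming rule:

```python
import re
from dataclasses import dataclass

# Hypothetical rule for this example: lowercase, starts with a letter or underscore, max 32.
USERNAME_RE = re.compile(r"[a-z_][a-z0-9_-]{0,31}")


@dataclass(frozen=True)
class Username:
    """A value of this type has already passed validation at the boundary."""

    value: str

    def __post_init__(self) -> None:
        if not USERNAME_RE.fullmatch(self.value):
            raise ValueError(f"invalid username: {self.value!r}")


def home_directory(user: Username) -> str:
    # Helper does not re-check: the type says the check already happened.
    return f"/home/{user.value}"


def handle_request(raw_name: str) -> str:
    # Boundary: raw data becomes a Username here, or the request fails.
    return home_directory(Username(raw_name))
```

Responsibility for checking then sits in one place, and a helper that accepts a plain `str` instead of a `Username` stands out in review.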
You can certainly trust most system calls
What if I'm writing a program to display a work under copyright or trade secret restriction? How can I be sure that the system calls haven't been patched with the cooperation of the owner of the machine to leak the work to a third party?
I'm not an experienced software engineer, but I think it would be good practice to first give the HTML pages a "development" mode where no input validation is done at that stage. Even better, all input should be entered from text boxes, so that you can give your program arbitrary input without handwriting URLs much. Front-end checking (such as client-side javascript) can be added at the same time or after the core engine is debugged, but it is still better to preserve the "development" mode in the code (for testers and developers only, of course).
A monitor can be very useful without ever getting input from another process
Where to write output is itself input, which comes ultimately from another process, through redirection of file descriptors or through chroot() of the file system.
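A concrete case of file descriptors acting as input: if a program starts with descriptor 0, 1 or 2 closed, the next file it opens can be assigned that number, and anything later written to "standard output" or "standard error" (by the program or by a child that inherits the descriptors) lands in that file. A minimal sketch of the usual defence, assuming a Unix-like system; `ensure_standard_fds()` is a hypothetical helper meant to run first thing at startup:

```python
import errno
import os


def ensure_standard_fds() -> None:
    """Make sure fds 0, 1 and 2 are open, pointing any closed one at /dev/null."""
    for fd in (0, 1, 2):
        try:
            os.fstat(fd)
        except OSError as exc:
            if exc.errno != errno.EBADF:
                raise
            # open() returns the lowest free descriptor, normally `fd` itself.
            new_fd = os.open(os.devnull, os.O_RDWR)
            if new_fd != fd:
                os.dup2(new_fd, fd)
                os.close(new_fd)
```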
For example, when talking about being aware of \0 characters, you could mention something like this: a friend of mine once wrote a jukebox-like web application that allowed people to queue requests on his machine. There was a certain input parameter that was restricted to being the name of a subdirectory of his music collection. So, here's how he validated the input (more or less, since this is from memory):
QUEUESCRIPT was then a shell script that was queued up and ready to go.
The consequences of passingto the script are left as an exercise for the reader.
Now talk about the hazardous effects of \0 characters. Instead of saying just "system calls aren't fine with \0 characters", you can say "many Perl functions that interact with the underlying operating system, such as open, opendir, or the file tests (-r, -x, etc.), will consider a string only up to the first \0. Therefore, if you are relying on calls to the operating system for validation of any parameter, be certain to strip out \0 characters first. Better yet, allow in only what you expect, and no more. A simple s/[^ -~]//g (assuming only US ASCII input is allowed) will go a long way."
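The "allow in only what you expect" advice, applied to the jukebox's subdirectory parameter, might look like this in Python. `MUSIC_ROOT`, the name pattern and `music_subdir()` are hypothetical. Python's `os` functions raise `ValueError` on an embedded NUL rather than truncating, but the value may still be handed to a shell script or C code, so the sketch rejects it explicitly and then confirms the resolved path stays under the root.

```python
import os
import re

MUSIC_ROOT = "/srv/music"  # hypothetical location of the collection
# Allow only what we expect: starts with a letter or digit, then letters, digits, space, _ . -
SUBDIR_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9 _.-]{0,63}")


def music_subdir(name: str, root: str = MUSIC_ROOT) -> str:
    """Map a user-supplied name to a directory strictly inside root."""
    if "\0" in name:  # redundant with the whitelist, but states the rule explicitly
        raise ValueError("NUL character in input")
    if not SUBDIR_RE.fullmatch(name):
        raise ValueError(f"unacceptable directory name: {name!r}")
    real_root = os.path.realpath(root)
    candidate = os.path.realpath(os.path.join(real_root, name))
    # Follow symlinks, then confirm the result is still under root.
    if os.path.commonpath([real_root, candidate]) != real_root:
        raise ValueError("name resolves outside the music root")
    return candidate
```

Requiring a letter or digit first also rules out `.`, `..` and names beginning with `-` that a downstream script might mistake for options.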
A similar example of halting a buffer-overflow exploit might also be a good idea.
Also, having a typo in the "Know the Language" section (= instead of =~) really obscures your point.
What part of "Not wishing to start a flame war" was it that went rocketing over your head? All of it, along with the rest of the post, apparently.
:-)
DB quoting/filtering should be left to the Database API.
What, like using separate commands for different databases? You can look the rest up yourself; I'm bored of reading the PHP manual for other people.
Register globals is there to allow backwards compatibility. Everyone, especially php.net, shouts from the hills about how insecure it is and how it shouldn't be used.
Magically sticking backslashes in front of everything is stupid for nontrivial apps and is likely to corrupt data.
You must be in management, arguing with me by repeating what I said reworded. As I said, if you do filter your input, then magic quotes GPC gets annoying; good job it's simple to turn off, really.
The fact that PHP programmers are commenting here about the joys of addslashes vs SQL injection
Well, pedantically they should be talking about mysql_escape_string or whichever function matches the database they're using. Unpedantically, nobody apart from you is under the illusion that addslashes is the only way to escape SQL strings. Coming from an ODBC background by any chance?
You do seem to be suggesting that escaping SQL strings is a bad thing (tm), care to explain why?
Also, please can you actually specify some preference of alternative language so we can get a proper flame war going instead of just flirting about like this
"What, like using separate commands for different databases"
That's not what I'd call a database API. That's pretty silly and error prone. Take a leaf from JDBC or Perl DBI, etc, or come up with something better, not worse.
PHP should at least have a DB API that supports bind vars or placeholders. Or some other API so that data is automatically escaped the way it should be, and the risk of data being treated as commands is low/nil and it is easier to audit and find uncompliant DB related code. If the PHP developers can come up with an even better way to do it rather than bind vars and placeholders or whatever, then that'll be great.
Think about it. Is it really such a good idea or design to have "mysql_escape_string", "pg_escape_string" and "xxxx_escape_string"? What happens if you change databases? What happens if you forget to use it, how do you check?
In contrast, if you require the use of prepared statements and bindvar stuff, the DB quoting is "included" with the DB call, so it is harder to leave out, and in most cases things become easier for the lazy programmer. Also, detecting programmers NOT doing the right thing is easier: any SQL statement with variables stuck in the string can be treated as not doing the right thing.
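Python's DB-API (PEP 249) is roughly the kind of common database interface being asked for here: drivers share one calling convention and take parameters separately from the SQL text. A sketch using the standard `sqlite3` module, which uses `?` placeholders (other drivers may use `%s`); the `users` table and both function names are hypothetical:

```python
import sqlite3


def demo_connection() -> sqlite3.Connection:
    """In-memory database with a hypothetical users table, just for the example."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    return conn


def find_user(conn: sqlite3.Connection, name: str) -> list[tuple]:
    # The ? placeholder sends `name` as a bound value; it is never spliced into
    # the SQL text, so quotes inside it cannot change the statement.
    return conn.execute("SELECT id, name FROM users WHERE name = ?", (name,)).fetchall()


conn = demo_connection()
print(find_user(conn, "alice"))          # [(1, 'alice')]
print(find_user(conn, "x' OR '1'='1"))   # []
```

This also makes the audit described above mechanical: a query string built with `+` or formatting instead of placeholders is the thing to flag.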
The more people using xxx_escape_string means a lot more legacy code to deal with later on when the PHP folk figure out how to do things better. I
"Register globals is there to allow backwards compatibility"
Wow. Great design putting register globals and other good old PHPisms there in the first place.
PHP deserves to be flamed till it's a bit better cooked. Right now while it is tasty it isn't really that safe for mass consumption.
I'm sure I'm not the only one who has tried getting the PHP people to come up with a proper DB API. Maybe if enough people make enough noise someone who knows PHP enough would fix things.
There is the pear DB module, but I see your point. A "PDBC" or similar would be a good move.
Personally I use a tiny SQL class that can be changed to allow different db to be used. Seems a tidier solution since the class is only about 20 lines.
Can't argue that register globals was a bad idea in the first place, but allowing users to still run older software needing it is a good idea. Shame they keep breaking other major functions in minor version increments though really, kinda makes keeping the stupid functions in redundant.
Pear DB looks decent enough. Actually it looks rather like Perl DBI. Getting more people to standardize on Pear DB and deprecating mysql_escape_string, addslashes etc. would be a good idea.
Breaking major functions in minor version increments is another sign of how seriously the PHP devs take PHP. Not very reassuring, esp. if you need to update PHP for, say, a security issue (we had to do a few of those). Only a few people will have tests for their entire code, and even then you never really know and have to go on faith...
Oh well. For some reason it seems like Java breaks lots of stuff every version anyway (1.1, 1.2, 1.3). Looks like Perl will start breaking major stuff in the next major version. Still, it's not often I have to bump up a minor increment with Perl. PHP and Java do look a bit "beta" in comparison (heck, PHP looks alpha sometimes; at one point in 4.0 history there was a release every 2 months for security probs).
Well, I guess that's 'coz almost all the functions are bundled into PHP. Sure, a security prob may not affect your code if it doesn't use that feature, but try explaining to a customer that you don't have to upgrade, and worse, that upgrading could break things. Esp. when they're told, truthfully, by an independent auditor that they are using a version of PHP that has security vulns. Gack.