How Do You Know Your Code is Secure?
bvc writes "Marucs Ranum notes that 'It's really hard to tell the difference between a program that works and one that just appears to work.' He explains that he just recently found a buffer overflow in Firewall Toolkit (FWTK), code that he wrote back in 1994. How do you go about making sure your code is secure? Especially if you have to write in a language like C or C++?"
Just get others to formally review it so if anything is found, there's collective responsibilty
I get mine verified by microsoft
I hit it with a shovel. If the code doesn't fall apart, I know it's pretty securely attached to my computer. If not, I add more epoxy glue.
... and then they built the supercollider.
Modern C++ provides a very nice and functional Standard Library which provides a lot of functionality and data structures such as strings, vectors, lists, maps, sets. While using these available classes does not completely rule out making programming mistakes related to buffer overflows and such, it at least minimizes the risk of producing stupid buffer overflow through badly done string handling. At least that's what my experience is.
Actually, the best thing would be not to use C or C++ at all, but that's where reality comes into play. Most developers don't even have the choice which language they should use, but that is predetermined by the employer and/or supervisor.
A monkey is doing the real work for me.
You introduce buffer overflows when you deal with buffers directly. In conventional C with its standard library you're encouraged to do this rather a lot, for example many of the string functions expect you to allocate a char buffer of big enough size and pass it in. The language's arrays are just syntactic sugar for accessing raw memory, with no bounds checks.
However you don't have to do it like this, especially not in C++ which has a safe string class (for example) as part of its standard library. Unfortunately C++'s vector type still doesn't do bounds checking with the usual [] dereferencing - you have to call the at() method if you want to be safe. But the general principle is: don't do memory management yourself, use some higher-level library (which exist for C too) and let someone else do the memory management for you.
You can write a C++ program and be pretty confident it doesn't have buffer overruns simply because it doesn't use pointers or fixed-size buffers, but relies on the resizable standard library containers.
-- Ed Avis ed@membled.com
0) Don't "roll your own" security unless absolutely necessary. Find someone else's implementations and work with those.
1) Design the code for security, code to that design. I've seen of security bugs creep into code because it was never designed to be secure.
2) Use static code checkers--such as Splint for C/C++ and FindBugs for Java--that look for security vulnerabilities.
3) Peer reviews/code audits. Sit down with your code (and have others who know how to look for security vulnerabilities sit down with your code) and do a full review.
Nothing is foolproof, but every little bit helps. It should be noted that all of the above also improve the overall quality of the code and reduce the number of overall bugs: Finding existent implementations of features that can be used can reduce maintenance and reduce bugs; Designing the code and putting it through a proper design review can catch a lot of logic problems and ensure that the code fits the requirements list--I've seen a huge number of synchronization bugs in Java simply because the author didn't know how to use synchronization properly; static code checkers find a lot more than just security bugs; and Peer Reviews/Code Audits can help isolate a variety of problems.
Integrate Keynote and LaTeX
Anyone who develops software knows the axiom - the number of bugs discovered in any piece of software is directly proportional to the amount of testing you perform on that software. From this, it follows that you can keep testing forever and at best only asymptotically approach bug-free code. Sounds hyperbolic, but I've observed it to be true in my experience. And as long as there are bugs, there are bound to be security bugs.
You can only minimize the risk that security issues will be found with any software. The best way to do this is to perform a rigorous code audit, preferably by security professionals. And if you can, make the software open source. You get a lot more eyes staring at it for free that way.
It's not that C/C++ is so insecure by itself, the problem is that programmers may not have used the best programming practices. There are plenty of libraries for handling strings and memory allocation in C, in C++ there are string and storage classes that do as much or as little checking as you need.
When you are an expert programmer there are places where you need more efficiency than the super-safe string routines can give you. It's the job of the expert to determine exactly how to balance efficiency against security, and only C/C++ can give you this balance.
By using valgrind. It's a virtual machine of sorts that runs your code and checks for any memory problems at all, including use of uninitialized memory. Combine that with thorough test cases, and you can be virtually assured that you have no memory errors in your C/C++ code.
However, security is a lot more than buffer overflows... but at least it brings you up to the relative security of Java, with speed to boot.
Open Your Mind. Open Your Source.
Every function should be designed with the assumption that its input is faulty, and should have safe failure modes for every possible value and all possible content. Any unsafe external libraries must be wrapped in handlers which verify the data being passed to them with a similar mindset. Do not EVER presume data will be of a certain form, or that a function will be used a certain way. If sequential routines are becoming long such that you cannot verify the accurate function or the absence of a buffer overflow immediately in your head, then stop and look for a way to break it down into smaller abstract pieces.
Combine this mentality with the usage of safe classes as datatypes whenever possible, so that you can wrap your input verification into the functionality of the classes. If prudent, wrap external library routines in classes which manage the interaction with them, and which verify the data content being passed.
Use test suites to test every component of your program, and be sure to include invalid and pathologically insane input in your test suites.
Do not trade security for efficiency. And don't forget to cross your fingers.
Grammar tip: "Effect" is a verb. "Affect" is a noun.
Um, how's that?
Your poor grammar has a chilling effect on me. If I were you, I'd find a way to effect an improvement in your knowledge. Luckily it affects me only a little. But the fact that so few seem to understand that these two words are both verb and noun leaves me of sad affect.
In the words of the great Donald Knuth, "Beware of bugs in the above code; I have only proved it correct, not tried it."
I let my code have evident, gaping security flaws and make them well known. This way people will never use it in situations where security matters.
regards,
The author of sendmail
You would sacrifice the flexibility and usefulness of the STL to get a class of bugs that are old and well-known? Hardly seems like a fair trade-off to me.
If you program using strictly functional programming, you can not only verify that your code is 100% secure, but you can even automate the process. (Preferably in a functional programming language such as Scheme, caml, Haskel, LISP, or Erlang; imperative languages make it very difficult/slow to do with functions what functional languages do very naturally and easily.) Purely functional code can be subjected to automated code auditing easily, whereas code auditing imperative code cannot be guaranteed to catch every bug and unintentionally available abuse.
Here's why, and why just about any computational problem can be solved using FP (functional programming):
Functional languages conform to lambda calculus, which has been shown to be Turing equivalent, which means that any program that can be computed on a Turing machine can be solved using Lambda calculus. So long as you program using strictly functions, your program can be verified according to the rules of lambda calculus, and the verification would be as sure as a mathematical proof. This is the only sure way I know of really knowing with mathematical certainty that your application is secure.
Pure functional programming has no assignment statements; there are no state changes for you to keep track of in your program, and in many cases abuses resulting unintended changes of state are the root of security problems. This is not to say that there is no state in functional programming; the state is maintained through function call parameters. (For example, in an imperative programming language, iteration loops keep track of a state variable that guides the running of the loop, whereas a functional program never actually keeps track of state with a variable that changes value; a functional program would carry out iteration by recursion, and the state is simply kept as a parameter passed to each call of the function. No variable with changing state is ever coded.)
Since functional programs lack assignment statements, and assignment statements make up a large fraction of the code in imperative programs, functional programs tend to be a lot shorter for the same problem solved. (I can't give you a hard ratio, but depending on the problem, your code can be up to 90% shorter when described functionally.) Shorter code is easier to debug, which helps in securing code. The reason functional code is so much shorter is that functional programing describes the problem in terms of functions and composition of functions, whereas imperative code describes a step by step solution to the problem. Descriptions of problems in terms of functions tend to be far shorter than algorithmic descriptions of solving them, which is required in imperative code.
Here's the biggest benefit of managing complexity with functional programming: as a coder, you NEVER have to worry about state being messed with. The outcome of each function is always the same so long as the function is called with the same parameters. In imperative programming as done in OOP, you can't depend on that. Unit testing each part doesn't guarantee that your code is bug free and secure because bugs can arise from the interaction of the parts even if every part is tested and passed. In functional programming, however, you never have to deal with that kind of problem because if you test that the range of each function is correct given the proper domain, and pre-screen the parameters being passed to each function to reject any out-of-domain parameters, you can know with certainty where your bugs come from by unit testing each function.
If you need to guarantee the order of evaluation (something that critics of FP advocates sometimes use to dismiss FP advocacy), you can still use FP and benefit: in functional programming, order of evaluation can be enforced using monads. Explaining how is beyond the scope of a mere comment though, but in any case, if you need really reliable code, consider using a functional programming style.
I can't do justice to the matter here; for more information, see th
To summarize, here's how you verify with mathematical certainty that a functional program is secure:
That's the gist of it. Anything more on this topic, such as automatic code auditing with the certainty of mathematical proofs (by means of lambda calculus proofs) is beyond my expertise. I just know that it's possible to truly secure functional code with mathematical certainty, whereas with imperative code, you can only be sure that your code has not yet failed or exposed a rare bug or failure condition.
Yes, sure, if you use STL, you need not worry about getting the buffer size wrong. And that's about it - container indexing is not bound-checked (unless you use at() instead of operator[] - and that's about the only instance of run-time safety check I remember seeing in STL!), iterators can go outside their container without notice, or can suddenly become invalid depending on what their container is and what was done to it. Even leaving library issues aside, there are some nasty things about the language itself - it's just way too easy to get an uninitialized variable or a class member, or to mess up with the order of field initializers in constructor.
This is not to say that C++ is not a good language. All of the above are features in a sense they are there for a reason - but they certainly don't make writing secure software easier.
We all know the answer if we've studied computer science. The problem is that the answer is boring, lots of work and totally non-hip.
It's specifications, pre- and post-conditions, all that "theoretical bullshit" we learned in university. It's just that writing code that way is very un-exciting, and that's a vast understatement.
Assorted stuff I do sometimes: Lemuria.org
Make it part of the critical path in music DRM. Then you know it's not secure.
Not sure about the flip-side, though.
For high-integrity stuff, we use SPARK (http://www.sparkada.com/) - a design-by-contract subset of Ada95 that is entirely designed-from-scratch for verification purposes. :-) )
The verification system implements Hoare-logic and is supported by a theorem prover. Buffer Overflow is only one of many basic correctness properties that can be verified. Properties that can be verified are only limited to what can be expressed as an assertion in first-order logic.
SPARK is a small language (compared to C++ or Java...) but the depth and soundness of verification is unmatched by anything like FindBugs, SPLINT, ESC/Java or any of the other tools for the "popular" languages.
(If you don't know or care what soundness is in the context of static analysis, then you've probably missed the point of this post...
- Rod Chapman, Praxis
today is spelling optional day.
Once you go outside of a container, you already have a fatal error and the appropriate response is to crash (albeit gracefully if possible). The problem isn't so much that the program crashes, but rather that the program may consider data outside of bounds as valid memory, thus allowing buffer overflows and undefined behavior to occur.
The difference between pure C/C++ and the STL is that something like strcmp can create a rather subtle sort of buffer overflow error, whereas buffer overflows involving STL containers are generally easier to avoid and detect. For that matter, if you use the STL algorithms library to its full potential, you may find that you hardly ever need to use explicit indexing or iterators other than begin() and end().
Those bugs aren't harder to track down than "old-style" bugs, in fact I think they're vastly easier to track down than, say, a wild pointer. The difference is that you're less experienced at dealing with the new problems, so they seem harder to you. With time and practice, you'll see through copy/reference errors quickly. In the meantime, a little discipline can cover your lack of experience -- never store raw pointers in collections, always "objects". If you don't want to create copies, then store objects of a smart pointer class. In fact, avoid ever using raw pointers at all. *Always* assign the result of a 'new' operation to a smart pointer (auto_ptr works for a surprisingly large set of cases, but you may have to get a reference counted pointer type or similar for others -- the BOOST library has some good options if you haven't already rolled your own).
If you really run into different behavior with different compilers, then at least one of the compilers is buggy. That does happen, but it's a lot rarer today than it was a few years ago. When you find that situation, wrap the tricky bit behind another abstraction layer and implement compiler-specific workarounds so that your application code can just use the abstraction and get consistent behavior. In most cases, someone else has already done this work for you. Again, look into BOOST.
Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
And that's exactly why so many things are "implementation defined" or "undefined". Many real-world users of C++ demand that, for instance, vector::iterator be a typedef for a raw pointer for efficiency reasons. Other equally-important users would prefer an iterator type that guarantees sensible behavior in the face of real errors. The ISO standard allows for both behaviors by conforming C++ implementations.
There's something attractive about the Java and C# languages having all constructs so well-defined. But both of those languages could afford not to support real hardware. Both target abstract machines and are happy with the results. C++ can afford no such conceit: it thrives in high-performance, customized, and otherwise exotic environments.
It's always a long day... 86400 doesn't fit into a short.
Let's not forget their wonderful documentation! Complete and accurate API documentation is absolutely necessary for writing secure and reliable software. And of course the programmers should actually read the documentation and check all the details of the API calls they are using (return values, etc...)!