Is Code Verification Finally Good Enough?
Jonathan asks: "As someone who has been following the development of software verification technology, a recent trend has intrigued me. It seems that the formal method people have finally come down off their high horse and are offering code verification as just another tool. This approach shows up in recent Java and C# based code verification tools that aren't aimed at 'proving correctness' so much as finding potential errors. Now it seems that such an approach is beginning to find its way into XP methods[pdf] as another verification tool to supplement unit tests. Given the current speed and effectiveness of tools like the Spec# verifier, is code verification via automated theorem proving finally going to make its way into the mainstream?"
What does "good enough" mean? Good enough for it to be included as a general-use software development tool?Good enough to guarantee that released code is bug-free? Good enough to guarantee that released code has no glaring security-affecting errors?
It makes a difference which "good enough" you're asking for.
(The answers to those questions, to channel one Urban Chronotis, are yes, no, and maybe. Not necessarily in that order)
Reality has a conservative bias: it conserves mass, energy, momentum...
I think it's legitimate to expect verification tools to make their way into mainstream software development, but not as any sort of "catch all".
Developers will have to analyze the specific tools and decide if it is a good supplement to current verification efforts or if it can replace some steps. I think though once it worms its way into mainstream (if only light) use, the tools will continue to improve and give us more options, and become a more integral part of verification.
I know from work I've done, that the next revision of the FAA's safety standard for software (software for commercial aviation) may include new provisions to allow formal methods tools to replace some steps of the old verification process.
Mankind's ability to write software is far in advance of mankind's ability to figure out what it does or does not do.
If Microsoft ship 100 million copies of Windows Vista to 100 million people on public Internet, where the people have mutually-incompatible agendas, then at least some of the people will end up unhappy.
I dunno, as more tools to verify code become available, businesses will hire cheaper and sloppier programmers and just force all the code through the verifiers and hope for the best.
These aren't tools to make the best code better, they are tools to make the worst code okay.
Finding errors /= Proving correctness. Part of proving a program, imho, should include making sure the program only does what it claims to do. No more. (Say goodbye to tag-a-long malware!)
Of course, this is a pretty idealistic outlook. We could all just program in ML, instead.
Good enough to yield some benefits in chasing down certain classes of bugs on large application bases? Sure. See coverity, and their open source static analysis project. See Microsoft's reports on the effectivity of such tools in their work, at this year's FLOC, for example. Spec# is a great concept, and fast enough in execution for small demos, at least. I'd love to be able to program that way in a C-ish language, and have my IDE tell me I'm about to screw up some invariant if I don't fix some newly added code. But I've got my doubts how well that would scale to good old messy C code.
Good enough for practical use? Not yet. Questionable if it will ever be: on a psychological level many developers hate being told their code is a pile of crap by other people, nevermind being told that by buggy/incomplete tools spitting out pages of warnings. See mini-msft's blog for Microsoft employees ranting about the PREfast fetish on management levels, the buggy static verification tools causing more work to everyone, etc. Heck, see most build logs of C/C++ projects on Debian buildds and ask yourself who's ever going to bother fixing those ever mounting numbers of gcc compiler warnings, and who's going to pay for that work, and, most importantly, who's going to pay to fix the bugs consequently introduced by fixing those warnings.
On the other hand, if one can manage to strike a balance between usefulness of the tools, and their usability, they can be very nice. See FindBugs, and see PMD, for example. But that's all for Java, and static analysis of Java is somewhat nicer task than dealing with the many fun issues of C or C++.
cheers,
dalibor topic
The problem with any verification tool is that it can't know what you expect the program to do unless you tell it. Why should this be any more correct than the program itself? In other words, if you can write a description of what the program should do that is more accurate than the program, then that description should replace the source code.
Intron: the portion of DNA which expresses nothing useful.
Since when did this move from the realm of undecidable problems into solvable with a Turing Machine?
The paper you linked to -- I just finished reading it. The "methodology" they provide has more comedic value than anything else. The steps in his methodology are counter-intuitive and therefore stupid. He wants us to iterate over each requirement, re-writing a function as the most simple program which passes the tests? What possible reason for this would we have other than an exercise in futility? Writing code we *KNOW* has no value with the intent of re-writing it later? Futile.
Perhaps you could accept this premise if the methodology scaled for larger problems -- but clearly it does not and can not. At this point I suspect you are the author of this paper.
Religion is a gateway psychosis. -- Dave Foley
Before tools can check the code (eg, static analysis) we need to know what constructs they need to flag. The formal verification people will claim that checking a program's source against its specification is the solution. However, that is a very expensive and time consuming process. There is an ISO group looking at vulnerabilities in programming languages.
- Do theorem provers prove that what the user told you they wanted made sense?
- ... that what they wanted would solve the problem they were trying to solve?
- ... that the problem they thought they were trying to solve was really the problem they *needed* to have solved?
... that what they wanted when you started was still what they wanted when you were done?
Uh... no.But they can probably be considered to be a newer and better version of lint, and that's a good thing.
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
The main reason program verification didn't catch on was that it was hopeless for C and C++. The semantics of those languages were so messy that formalizing them was nearly hopeless.
Java and C#, however, are good enough. (So were Pascal, Modula, and Ada.)
Here's the manual for the Pascal-F verifier, a system written by a team I headed back in the early 1980s. This was a proprietary system done internally for Ford Motor Company. Take a look at the example real time engine control program beginning on page 155. It was painfully slow back then; it took 45 minutes of VAX 11/780 time (1 MIPS) to verify that program from a cold start. Today, it would take about a second.
What's being proved in that example? First, that there are no subscripts out of range or arithmetic overflows. Second, that all loops terminate. (Yes, you can prove that for most useful programs; the halting problem applies only to pathological programs.) Third, that the following constraints hold:
Useful stuff, the conditions needed to keep the engine running.
This is "design by contract" with teeth. Each function is checked to insure that it always satisfies its exit conditions if its entry conditions are satisfied by the caller, and that the function doesn't overflow, subscript out of range, or fail to terminate. Each call is checked to insure that its entry conditions are always satisfied. The end result is a guarantee that those properties hold for the whole program.
This is a very valuable check. It insures that caller and callee are in agreement on how to call each function. That's the cause of a huge number of software bugs - the caller made some incorrect assumption about the function being called, or the function didn't check for something which it needed to check. Both of those can be statically machine checked.
It's not easy to get a program through formal verification with a verifier like that one. The verifier does almost all the work on easy sections of code, but where correctness depends on anything non-trivial, you have to work with the theorem prover to get the proofs through. This isn't easy. The DEC Java checker and Microsoft's Spec# checker aren't as hard-line.
Non-Linux Penguins ?
How could a program ever find this bug:
if (vote == "democrat")
{
republican++;
}
There some programming lanugages (Ada, SPARK and Eiffel spring to my mind) which have the verification build in (to some degree). Just in case you wonder how that works here my favorite Ada example:
.. 31;
type Day_Of_Month is range 1
Neither Java nor C# allow that kind of controll over integer types. Mind you there is A# (Ada for CLI).
Martin
Well at least Ada still is: http://www.praxis-his.com/sparkada and very successfull as well.
And Ada isn't dead, it just got another rejuvenation in the form of Ada 2005: http://www.adaic.com/standards/ada05.html.
So the use of past tense was not appropiate.
Martin
Who is using Ada?
If you don't fail at least 90 percent of the time, you're not aiming high enough. (Alan Kay)
Verification is speeding along because of managed languages. Verification for managed languages like Java and C# is easier to implement. Not so much for C/C++ where pointers are raw. (Note: C# allows unmanaged code but by default it's managed and I assume most developers work in C# in managed mode, except where performance or interopability is required)
http://www.adacore.com/home/ada_answers/lookwho
;-) and for our "mission critical" applications we have no intention to change.
I work for one of the named companies
But Ada does not expose the use of Semaphores - The task model of Ada is based around Rendezvous.
There are some attempts of Rendezvous based threading for other languages but it proves difficult since you need some inter-task parameter passing mechanism. Remember that a Rendezvous is a bit like a function/procedure call only the parameters are passed from one task to another.
And parameter passing is a language feature and quite difficult to implement as a library. I know because I tried using by using Observer/Notifier pattern and adding the parameters to the message. But it was never as elegant or worked as flawless an Ada Rendezvous.
Martin
``Is Code Verification Finally Good Enough?''
Please verify that
(define (foo n)
(cond
((= n 1) 1)
((even? n) (foo (/ n 2)))
(else (foo (+ (* n 3) 1)))))
returns 1 for any positive integer n.
Please correct me if I got my facts wrong.
Anyone can come up with a list of algorithms for which the answer to all inputs is unknown, of which the 3n+1 problem is one (so far). Even though we can't prove that the algorithm halts we can still prove that it's well typed, which in practical terms is much more important because it prevents buffer overflows and any number of other programming errors. In addition, we can also prove that the algorithm actually does solve the 3n+1 problem and not some similar but slightly different one. For instance, if you weren't using a LISP variant the algorithm would have to handle numbers larger than machine integers appropriately, especially because inputs within the machine integer range can easily grow much larger over the course of the algorithm.
Most software today only uses algorithms that halt with known time and space complexity, which makes software verification tractible in practice. The problem is that most programmers do not have the necessary training and experience to write proofs for complex programs, and are unable to simplify their programs into small provable functions and modules. Modern operating systems also impose a lower bound on the usefulness of correctness proofs because they have not been proved correct. Even if an operating system was proved correct, all the underlying hardware would need formal verification, and even then there would only be a statistical guarantee of correctness due to random uncorrectable hardware errors. At least it's always possible to increase the reliability of hardware with redundancy and error correcting techniques, but software needs to be formally verified at some point.
``Even though we can't prove that the algorithm halts we can still prove that it's well typed, which in practical terms is much more important because it prevents buffer overflows and any number of other programming errors.''
For preventing buffer overflows, you will have to prove that some value being used as an array index is an integer in the range from 0 to the size of the buffer minus 1. In this case, the problem is to prove that the result is an integer in the range 1 to 1. Most type systems aren't expressive enough for these kinds of proofs, and those that are suffer from the halting problem. Either way, type checking doesn't cut it.
Please correct me if I got my facts wrong.
For preventing buffer overflows, you will have to prove that some value being used as an array index is an integer in the range from 0 to the size of the buffer minus 1. In this case, the problem is to prove that the result is an integer in the range 1 to 1.
Generally, proofs for this are expressed as loop invariants inside of functions, and as predicates on the entry and exits of functions. If a function takes an array index as a parameter, there is a predicate that 0<=i<n, and it is the responsibility of the caller to prove this condition. If an array index is returned, the function must prove 0<=i<n before it returns. This can usually be done at compile/proof time, but if necessary dynamic runtime checks can be added to simplify the proofs. Proving that the 3n+1 problem halts or not is quite different from proving memory-safe array access.
Most type systems aren't expressive enough for these kinds of proofs, and those that are suffer from the halting problem. Either way, type checking doesn't cut it.
That depends on the proof system, I suppose. Proofs in LF for instance are always decidable via typechecking of the proof itself, so if a proof can be represented in a logic using LF, the halting problem will not hinder proof checking. LF can encode second order logic just fine, so I don't think there are any big limitations on what algorithms can be proved. Generating a proof may prove difficult or impossible for certain algorithms, but that is strictly due to our limited knowledge, not a limit in the logical framework.
Ok, I have a program that, if it halts, causes a buffer overflow.
Otherwise, it doesn't.
Can your type system prove that there is no buffer overflow?
"Any connection between your reality and mine is purely coincidental." -Slashdot
Formal Verification has become a staple of hardware design. It basically comes in two forms: equivalence checking (used a LOT to compare two similar designs) and model checking (which is slowly making inroads; used to compare a design against a set of rules, another model, etc).
I have often wondered why more formal techniques aren't deployed in the software world.
As has been mentioned here, the thing about formal is that you will often "stump it" and so you may need to "help it" in complex, real world designs. But it does find many silly mistakes, not necessarily (or even usually) design mistakes, but version mismatches, tool bugs/config errors, etc.
Equivalence Checking is VERY useful for checking last minute changes. For example, when adding an option to the design, it is quick and easy to prove that the design is unchanged when the option is not enabled. We also use it to check compiled (gates) versus source (RTL) versions of designs.
I could imagine both models being useful for software developers. For example, formal equivalency tools could help prove that the "golden version" of a function is functionally equivalent to the "performance tuned" version (hand-tuning the code, compiling it, etc.)
Formal Model Checking is harder because writing formal rules is a painstaking process. You'll never realize how incomplete "human" specifications are until you try to constrain a formal model checker. It will find a boatload of failure cases that are "obviously not going to happen" (so obvious that they weren't included, and need to be added, usually iteratively).
Model checking has seen an uptick lately through shipping tools with lots of default rules included (making it a form of "super LINT"). At which point adding your own rules/specifications is something you do at your leisure.
It's these kinds of point uses, versus a theoretical complete verification framework, that formal methods can deliver today.
-netchipguy
Because too many software "craftsmen" think "methodologies ... are all bullshit". They believe they are artists, not scientists, and reject rigor as untried academic exercises that stifle creativity.