Microsoft Research Touts Its 'Checked C' Extension For 'Making C Safe' (microsoft.com)
Microsoft Research has pre-published a new paper to be presented at the IEEE Cybersecurity Development Conference 2018 describing their progress on Checked C, "an extension to C designed to support spatial safety, implemented in Clang and LLVM."
From "Checked C: Making C Safe By Extension": Checked C's design is distinguished by its focus on backward-compatibility, incremental conversion, developer control, and enabling highly performant code... Any part of a program may contain, and benefit from, checked pointers. Such pointers are binary-compatible with legacy, unchecked pointers but have explicitly annotated and enforced bounds. Code units annotated as checked regions provide guaranteed safety: The code within may not use unchecked pointers or unsafe casts that could result in spatial safety violations.
Checked C's bounds-safe interfaces provide checked types to unchecked code, which is useful for retrofitting third party and standard libraries. Together, these features permit incrementally adding safety to a legacy program, rather than making it an all-or-nothing proposition. Our implementation of Checked C as an LLVM extension enjoys good performance, with relatively low run-time and compilation overheads. It is freely available at https://github.com/Microsoft/checkedc and continues to be actively developed.
The extension is enabled as a flag passed to Clang -- the average run-time overhead introduced by adding dynamic checks was 8.6%, though in more than half of the benchmarks the overhead was less than 1%. They also note that from 2012 to 2018, buffer overruns were the leading single cause of CVEs.
Microsoft Research says they're now evaluating Checked C, formalizing a proof of its safety guarantee -- and developing a tool to semi-automatically rewrite legacy C programs.
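For readers wondering what "explicitly annotated and enforced bounds" amounts to at run time, here is a minimal sketch in plain C. It is not Checked C syntax: the names `bounded_ints` and `bounded_get` are made up for illustration, and the real extension attaches bounds as annotations on the pointer type and has the compiler insert the check, which is how its pointers stay binary-compatible with ordinary ones (this struct is not).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a pointer that carries its bounds. */
typedef struct {
    const int *base;  /* first element */
    size_t count;     /* number of valid elements */
} bounded_ints;

/* Read element i only if it lies inside [0, count); report failure otherwise. */
static bool bounded_get(bounded_ints p, size_t i, int *out)
{
    if (p.base == NULL || i >= p.count)
        return false;  /* the kind of dynamic check a compiler could insert */
    *out = p.base[i];
    return true;
}
```

An out-of-range index is rejected here instead of reading past the array, which is the spatial-safety property the summary describes.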
Clang/LLVM was developed in tandem with, and practically for, a project for making C code safer in the first place: SAFECode.
"We mustn't be caught by surprise by our own advancing technology" -- Aldous Huxley
There's a difference. CompCert/Verified C is concerned with formally verifiable source code and provably correct compilation, which means pointers are bad.
CheckedC doesn't do any of the above, it is only a secure pointer system. Microsoft's Z3 handles formal verification.
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
How many errors are due to C syntax, e.g. "=" vs "=="?
At what point do we finally decide that C just wasn't the best choice for large scale long lived systems?
(And don't tell me about "experts don't make those mistakes". See, for instance https://www.researchgate.net/p... )
What's needed is not independent comparison (well, that's needed, but that's not the problem). What's needed is a license that guarantees that there's no copyright or patented code in the result. I.e., a guarantee that the generated code can be used under any license of your choice without legal danger from either Microsoft or from any company with which they have or have had a business relationship unless the source code compiled by a standard C compiler would have the same problem.
I think we've pushed this "anyone can grow up to be president" thing too far.
Ok, I would agree with that. So, license check then a benchtest. IANAL, but I can do the latter adequately even if I can only do a cursory skim for the former.
Can anyone compare this to what Embedded has been doing for a while in functional safety?
https://en.wikipedia.org/wiki/...
It's why Mathworks makes stupid money off of Polyspace Static Analyzer.
https://www.mathworks.com/prod...
https://www.mathworks.com/prod...
On top of that there's also the Barr Group's Embedded C Coding Standard.
https://barrgroup.com/Embedded...
It is MS Research. MS proper ignores them routinely.
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
Pretty major error right in the introduction:
> Legacy programs would need to be ported wholesale to take advantage of these languages,
Not true for Rust. C libraries and applications can be ported to Rust incrementally and, in fact, some examples have already been done and shipped! See Federico's work on librsvg for example: https://people.gnome.org/~fede...
CompCert provides a formally verified subset of C and has existed since 2008
AT&T's Cyclone is a bit older than CompCert but appears to have fizzled out 12 years ago
https://en.wikipedia.org/wiki/...
Pain is merely failure leaving the body
It's M$. No matter what you check, if pumping up this quarter's profits according to some dick spreadsheet means changing whatever they choose to change, they will change it, even if that makes it more insecure. They have pretty much zero reliability: touting stuff, then dumping it when the profitability is not there or there is greater profitability elsewhere, leaving users in the lurch, not a few times but a whole lot of times. Whatever it is they are pushing today will be different in a year's time, and most often worse for the user. Specifically, the more users that pick it up, the worse it becomes, and the more M$ dick spreadsheeters think they can leverage it to increase profits and basically bugger the end users. Often end users scream and abandon the product, and then M$ abandons it, looking for the next lock-in scam to leverage.
They have not been a part of the solution for years; they and their ilk largely are the problem. They will sell your privacy, security and copyright the very second, well, actually before they are fully certain they can get away with it. Greed pushes them to strike early and, as it turns out, normally fail. They are an extremely unreliable supplier and not to be trusted.
Chaos - everything, everywhere, everywhen
Do you know how many compilers Microsoft have written over the years? Can you name a single instance of any one of them producing code that has ever placed a user in legal trouble due to licenses, copyrights or patents?
Twenty years ago people claimed that Microsoft were going to use submarine patents to slap infringements on people who used their compilers, and yet not one single time has this happened.
C is already safe
Is it? Let's have a look at a security analysis of applications written in C on FreeRTOS. It seems like they're riddled with flaws. Saying "just write better code" lacks real world perspective.
If I'm reading this right, it's a "smart pointer" and so of course you're going to take a run-time penalty. Languages like Haskell abstract things like array access and are immune from the kind of problem due to a "what you want" as opposed to "how you do it" mentality that C has. I think smart pointers, or anything that imposes a needless run-time penalty are a non-starter. We've got better approaches.
Someone not capable of writing a totally secure C program isn't going to be able to guarantee a totally secure Python program either. It may have fewer errors.
> Why wouldn't function_that_uses_pointer() protect itself by doing the pointer check internally?
Because
for ( x=0; x++; x That way function_that_uses_pointer() wouldn't have to worry about someone remembering to do the check
If you want to hack together quick scripts without ever thinking about the possibility of either errors occurring, or some item simply not being present, perhaps VBA, Python, or Pascal is for you. C is for systems programmers who already need to be aware that they can't just make assumptions that every system and every situation will be just like the test they ran.
Friggin Slashdot ate my post.
if ( (thegamma = get_gamma()) ) {  /* null check happens once, here */
    for ( y=0; y < pixelheight; y++ ) {
        for ( x=0; x < pixelwidth; x++ ) {
            do_gamma( x, y, thegamma );
        }
    }
}
It's kinda silly to check a million times that thegamma isn't null. Checking once is quite enough.
I know loops might be a little bit too advanced of a concept for some people, but advanced programmers use them once in a while. :)
Here's a fun function for you:
void strcpy(const char *src, char *dest) {
while (*dest++ = *src++)
;
}
Yep, that copies a string.
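For contrast, here is a rough sketch of a bounded variant, keeping the same (src, dest) argument order as the function above. The name `copy_bounded` and its return convention are assumptions for illustration, not anything from a standard library.

```c
#include <stddef.h>

/*
 * Hypothetical bounded copy: writes at most dest_size bytes, always
 * NUL-terminates dest, and returns 0 if the whole string fit or -1 if
 * it was truncated (or dest_size is 0, in which case nothing is written).
 */
static int copy_bounded(const char *src, char *dest, size_t dest_size)
{
    size_t i = 0;

    if (dest_size == 0)
        return -1;

    /* Leave room for the terminating NUL. */
    while (i + 1 < dest_size && src[i] != '\0') {
        dest[i] = src[i];
        i++;
    }
    dest[i] = '\0';

    return src[i] == '\0' ? 0 : -1;
}
```

The price is one extra comparison per character plus a size the caller has to get right, which is roughly the trade-off being argued over in this thread.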
I've been writing C programs for 3 decades, and I have made plenty of mistakes along the way. Occasionally because of using the wrong pointer, but most of them were simply because I got the algorithm wrong. None of these "safe" languages would have prevented the 2nd kind of error.
Well, this isn't quite the same comment, but if the language is compatible with C, or some subset of C, couldn't you compile the "safe version", run your tests, and then, when you were satisfied, compile with standard C? Surely the answers ought to be guaranteed to be the same if there's no error.
Only for real world inputs that match your test inputs. If you compile with standard C you lose the run time checks, array bounds for example. If these checks only have a 1% penalty then for many apps that might be quite acceptable.
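To make the "you lose the run time checks" point concrete, here is a small sketch in plain C. The `UNCHECKED_BUILD` switch, the `BOUNDS_CHECK` macro and `element_at` are hypothetical names, and this is only an analogy for what happens when checked code is rebuilt with a compiler that ignores the checks.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical build switch: define UNCHECKED_BUILD to compile the check away. */
#ifndef UNCHECKED_BUILD
#define BOUNDS_CHECK(i, n)                                  \
    do {                                                    \
        if ((i) >= (n)) {                                   \
            fprintf(stderr, "index out of bounds\n");       \
            abort();                                        \
        }                                                   \
    } while (0)
#else
#define BOUNDS_CHECK(i, n) ((void)0)
#endif

/* Return a[i]; in a checked build an out-of-range i aborts instead of reading. */
static int element_at(const int *a, size_t n, size_t i)
{
    BOUNDS_CHECK(i, n);
    return a[i];
}
```

Both builds give the same answers for every in-range index, so a passing test suite cannot tell them apart. They only differ on the out-of-range inputs the tests never tried.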
No one's arguing a language solves your algorithm problems. But why have security problems in addition to your algorithm problems? As discussed in the presentation, the flaws in these applications were all the "usual suspects". When you keep seeing the same problems again and again it's time to use better tools, including better languages.
Bugs are not always coding errors. Bugs may also be in the design, and correct code in any language can manifest such design bugs.
C is fast, but old and "unsafe" (better: difficult-to-master or non-intuitive or attention-demanding). Many other languages are slower, but newer and "safer". Practically speaking, these are the two extreme points of the performance vs. safeness trade-off. You can choose what aspect is more important for you and choose a language accordingly, but you cannot have everything.
Personally, I think that all these efforts to come up with a new, safer version of C should focus on accepting reality (do you want C-like speed? Use C) rather than aspiring to what doesn't seem possible. What about coming up with ways to ease the communication between C and the given programming language, such that anyone could eventually rely on pure C (or C++) when required? Or even coming up with a friendly wrapper that lets you write C by way of whatever programming language you prefer (e.g., you create the algorithm in language X and it generates the closest, "safer" version in C).
Custom Solvers 2.0 = Alvaro Carballo Garcia = varocarbas.
it's convenient for functions whose result you want to check before going further.
e.g. with file operations:
if (NULL == (in = fopen(filename, "r"))) {
    fprintf(stderr, "cannot open input %s\n", filename); exit(2);
}
/* process input from in */
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
to a whole ms ....
> People are so obsessed with wrangling the last ounce of performance out of application programs
You made coffee come out my nose!
Have you actually *used* a major application lately? I'd say performance is far down the list.
> Then you (clearly a C fanboi) writes code like this:
> while (*dest++ = *src++);
I actually didn't write the C library. I've written several Perl libraries; you'll find my code in Apache and Solaris, but Roland McGrath wrote glibc.
> Yes, which is why it's important for the language to mitigate that as much as possible.
What's important very much depends on what software is being written. In a typical Excel macro, sure go ahead and check the domain of the value each time it is accessed. It'll be ten times as slow as not checking, but one shouldn't expect the project manager to manually check domains in his VBA. It's good and right for VBA to "mitigate it as much as possible".
In a graphics driver, speed is top priority. It would be a mistake to take ten times as long to execute to "mitigate as much as possible".
Not everything is a shell script.
> Not everything is a shell script.
That's right. Some things, for example, are Pascal programs, which is a better option than C. You should take the time to learn some Pascal. You'll come around to the same opinion.
I learned Pascal 15 years ago. It's an okay language.
At the time, Pascal was competing with Visual Basic. VB won.
The world could have chosen Pascal over VB, but they chose VB. In the 1970s, Pascal competed with C. The world chose C.
Now the industry is going through a phase in which people aren't distinguishing between beginner languages that are designed to be easy vs professional, enterprise-grade tools. Legos are easy, and a good way to learn some basics. You shouldn't build your house out of Legos. The same is true when building information systems. The simplest tools may not be the best things to build your enterprise with.
You don't know that. If you had more brain left for the algorithm, or did the algorithm first in Python or on paper, you might have avoided the mistake.
The problem, basically, is that you can only shuffle 7 to 9 topics in your short-term memory. Those are basically your brain's registers. If three of them are occupied by useless low-level stuff, only the rest can be used for the algorithm.
Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
About 30 years ago I once remarked to my boss that COBOL would have been a better choice than C for our application. He replied that was probably correct but he would then have to cut our pay in half.
If you ask specifically about compilers, I can only think of a couple of instances, and they didn't really end up in court. (In one case that was because Sun sued MS over J++ before the event.)
If you ask more generally about code produced by MS, there have been lots of instances where the code could only legally be used linked with certain licenses...and it wasn't always clear which ones.
The communication happens when you CALL strcpy. Inside of glibc, efficiency rules.
Okay, I gave you the link, so when you grow up you can read how Pascal is implementing string copy (in Pascal). "Intrinsic" in this respect just means it's included in the built-in library - it still has to be written, silly. The CPU doesn't have a "copy a Pascal string" opcode, so someone has to write it. That would be guys like me.
> I've been writing C programs for 3 decades, and I have made plenty of mistakes along the way. Occasionally because of using the wrong pointer, but most of them were simply because I got the algorithm wrong.
Security bugs are typically exploited, and not found in ordinary usage. When you get the algorithm wrong, it is usually obvious right away.
We all know how to CALL a string copy function.
The discussion is how you would IMPLEMENT a simple function.
I put the implementation from Free Pascal in another post if you want to see it. It's about 15 lines.
The operator is syntactic sugar around a function call.
You can see the actual function in the source zip above.
> an assignment operation is not a function call
What looks like an assignment isn't. An assignment should mean both variables point to the same string.
It's creating a new string, copying all the data from the source string to the destination, then finally assigning the new string to the destination variable.
There's a string copy function which accepts the source string as an argument and returns a new string with the data copied from the source.
That's a fair point, to the caller it appears analogous.
To the CPU, they are vastly different. The CPU assigns integers with a single instruction. Copying a Pascal string is most often hundreds of CPU instructions, and there is a function that does that. The String type in Pascal is far more complex than the x bits that comprise an integer. Perhaps most importantly, the variable doesn't hold the value; it's a pointer to memory that is allocated elsewhere later. The function takes the Pascal String type and dereferences it to the actual value in memory, after performing various checks, THEN it asks the CPU to copy a little bit at a time. Contrast that with a pure integer assignment, which translates directly to a CPU instruction.
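A rough C sketch of that difference, assuming a simplified length-plus-pointer string. The `counted_string` type and `counted_string_copy` are made up for illustration and are not Free Pascal's actual string layout or runtime routine.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical counted string: the variable holds a length and a pointer, not the characters. */
typedef struct {
    size_t length;
    char *data;  /* heap-allocated, not NUL-terminated */
} counted_string;

/*
 * What a "string assignment" with copy semantics has to do behind the
 * scenes: allocate, then copy every byte. Returns 0 on success, -1 if
 * the allocation fails.
 */
static int counted_string_copy(const counted_string *src, counted_string *dest)
{
    char *buf = NULL;

    if (src->length > 0) {
        buf = malloc(src->length);
        if (buf == NULL)
            return -1;
        memcpy(buf, src->data, src->length);  /* cost grows with the length */
    }
    dest->data = buf;
    dest->length = src->length;
    return 0;
}
```

By comparison, `y = x;` for two `int` variables needs no call, no allocation and no loop. The caller owns the copied buffer and has to `free(dest->data)` eventually.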