Microsoft Research Touts Its 'Checked C' Extension For 'Making C Safe' (microsoft.com)
Microsoft Research has pre-published a new paper to be presented at the IEEE Cybersecurity Development Conference 2018 describing their progress on Checked C, "an extension to C designed to support spatial safety, implemented in Clang and LLVM."
From "Checked C: Making C Safe By Extension": Checked C's design is distinguished by its focus on backward-compatibility, incremental conversion, developer control, and enabling highly performant code... Any part of a program may contain, and benefit from, checked pointers. Such pointers are binary-compatible with legacy, unchecked pointers but have explicitly annotated and enforced bounds. Code units annotated as checked regions provide guaranteed safety: The code within may not use unchecked pointers or unsafe casts that could result in spatial safety violations.
Checked C's bounds-safe interfaces provide checked types to unchecked code, which is useful for retrofitting third party and standard libraries. Together, these features permit incrementally adding safety to a legacy program, rather than making it an all-or-nothing proposition. Our implementation of Checked C as an LLVM extension enjoys good performance, with relatively low run-time and compilation overheads. It is freely available at https://github.com/Microsoft/checkedc and continues to be actively developed.
The extension is enabled as a flag passed to Clang -- the average run-time overhead introduced by adding dynamic checks was 8.6%, though in more than half of the benchmarks the overhead was less than 1%. They also note that from 2012 to 2018, buffer overruns were the leading single cause of CVEs.
Microsoft Research says they're now evaluating Checked C, formalizing a proof of its safety guarantee -- and developing a tool to semi-automatically rewrite legacy C programs.
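The sketch below makes "explicitly annotated and enforced bounds" concrete. It is written in standard C, not Checked C syntax.

Checked C keeps its checked pointers binary-compatible with ordinary pointers, so the bounds live in declarations rather than inside the pointer. This sketch instead bundles the bounds into a struct (a so-called fat pointer), only to make the dynamic check visible. The names `bounded_int_ptr`, `make_bounded`, and `bounded_get` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical "fat pointer": base address plus element count.
   This only emulates the idea of a bounds-annotated pointer;
   Checked C itself does not change the pointer's representation. */
typedef struct {
    int   *base;
    size_t count;
} bounded_int_ptr;

/* Wrap an existing array together with its length. */
static bounded_int_ptr make_bounded(int *base, size_t count) {
    assert(base != NULL || count == 0);
    bounded_int_ptr p = { base, count };
    return p;
}

/* Dynamic check before every read: an out-of-range index is reported
   (return false) instead of reading memory outside the array. */
static bool bounded_get(bounded_int_ptr p, size_t i, int *out) {
    if (i >= p.count)
        return false;          /* the spatial-safety check */
    *out = p.base[i];
    return true;
}
```

The comparison in `bounded_get` is the kind of per-access dynamic check whose cost the overhead figures above describe.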
Clang/LLVM was developed in tandem with, practically for, a project to make C code safer in the first place: SAFECode.
"We mustn't be caught by surprise by our own advancing technology" -- Aldous Huxley
CompCert provides a formally verified compiler for a subset of C and has existed since 2008.
A Microsoft extension that isn't evil. I'm suspicious. For pointers, Hoard and DieHard are pretty decent malloc alternatives, and smalloc and secmalloc have their fans, but provable security and robustness would help a lot.
I'd want an independent comparison before I'd trust Microsoft on this, but if it is shown to be good, I'm willing to concede that, like Smeagol, there's a tiny corner of good in that vast blackness of Gollum.
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
How many errors are due to C syntax, e.g. "=" vs "=="?
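Here is a minimal C sketch of how much damage that single character can do. The function names are hypothetical.

GCC and Clang can both warn about an assignment used as a condition (for example with `-Wall`). Some style guides also put the constant first, as in `0 == a[i]`, so that the typo `0 = a[i]` fails to compile.

```c
#include <assert.h>
#include <stddef.h>

/* Intended behavior: count the elements equal to zero. */
static size_t count_zeros(const int *a, size_t n) {
    assert(a != NULL || n == 0);
    size_t zeros = 0;
    for (size_t i = 0; i < n; i++)
        if (a[i] == 0)         /* comparison */
            zeros++;
    return zeros;
}

/* The one-character typo: '=' stores 0 into a[i], then the condition
   tests the stored value, which is always 0 (false). */
static size_t count_zeros_typo(int *a, size_t n) {
    assert(a != NULL || n == 0);
    size_t zeros = 0;
    for (size_t i = 0; i < n; i++)
        if (a[i] = 0)          /* assignment, not comparison */
            zeros++;
    return zeros;
}
```

The typo version still compiles. It returns 0 and silently zeroes the whole array as a side effect.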
At what point do we finally decide that C just wasn't the best choice for large scale long lived systems?
(And don't tell me about "experts don't make those mistakes". See, for instance https://www.researchgate.net/p... )
CPU? Or QuickBasic?
Can anyone compare this to what the embedded world has been doing for a while in functional safety?
https://en.wikipedia.org/wiki/...
It's why MathWorks makes stupid money off of the Polyspace static analyzer.
https://www.mathworks.com/prod...
https://www.mathworks.com/prod...
On top of that there's also the Barr Group's Embedded C Coding Standard.
https://barrgroup.com/Embedded...
Will this cause another Debian/Valgrind/OpenSSL crisis?
Pretty major error right in the introduction:
> Legacy programs would need to be ported wholesale to take advantage of these languages,
Not true for Rust. C libraries and applications can be ported to Rust incrementally and, in fact, some examples have already been done and shipped! See Federico's work on librsvg for example: https://people.gnome.org/~fede...
In 40 years maybe they'll finish the language projection for WinRT.
C is already safe
Is it? Let's have a look at a security analysis of applications written in C on FreeRTOS. It seems like they're riddled with flaws. Saying "just write better code" lacks real world perspective.
If I'm reading this right, it's a "smart pointer", so of course you're going to take a run-time penalty. Languages like Haskell abstract away things like array access and are immune to this kind of problem thanks to a "what you want" rather than "how you do it" mentality, the opposite of C's. I think smart pointers, or anything else that imposes a needless run-time penalty, are a non-starter. We've got better approaches.
Someone not capable of writing a totally secure C program isn't going to be able to guarantee a totally secure Python program either. It may have fewer errors.
> Why wouldn't function_that_uses_pointer() protect itself by doing the pointer check internally?
Because
for ( x=0; x++; x ...
> That way function_that_uses_pointer() wouldn't have to worry about someone remembering to do the check
If you want to hack together quick scripts without ever thinking about the possibility of errors occurring, or of some item simply not being present, perhaps VBA, Python, or Pascal is for you. C is for systems programmers who already need to be aware that they can't just assume every system and every situation will be just like the test they ran.
Friggin Slashdot ate my post.
if ( ( thegamma = get_gamma() ) ) {    /* assignment intended: test the result once */
    for ( y=0; y < pixelheight; y++ ) {
        for ( x=0; x < pixelwidth; x++ ) {
            do_gamma( x, y, thegamma );
        }
    }
}
It's kinda silly to check a million times that thegamma isn't null. Checking once is quite enough.
I know loops might be a little bit too advanced of a concept for some people, but advanced programmers use them once in a while. :)
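The disagreement above can be sketched in C: check inside the callee on every call, or check once in the caller. This is an illustration under stated assumptions:

- `gamma_t`, `do_gamma`, and `apply_gamma` are hypothetical stand-ins.
- The pixels are a flat `int` array.
- "Applying gamma" is reduced to a multiplication.

The caller tests the pointer once before the loop. The callee only documents its precondition with `assert`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for whatever get_gamma() returns. */
typedef struct {
    int scale;
} gamma_t;

/* Callee relies on a documented precondition instead of re-testing
   g on every pixel; assert checks it unless NDEBUG is defined. */
static void do_gamma(int *pixel, const gamma_t *g) {
    assert(pixel != NULL && g != NULL);
    *pixel *= g->scale;   /* simplified "gamma": a plain multiplication */
}

/* Caller validates g once, outside the hot loop. Returns -1 and leaves
   the pixels untouched if g is NULL. */
static int apply_gamma(int *pixels, size_t width, size_t height,
                       const gamma_t *g) {
    if (g == NULL)
        return -1;
    for (size_t y = 0; y < height; y++)
        for (size_t x = 0; x < width; x++)
            do_gamma(&pixels[y * width + x], g);
    return 0;
}
```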
Here's a fun function for you:
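void my_strcpy(char *dest, const char *src) {   /* renamed: defining strcpy itself clashes with <string.h> */
    while ((*dest++ = *src++))   /* copies every byte, including the final '\0' */
        ;
}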
Yep, that copies a string.
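The unbounded loop above keeps writing until it reaches the source's terminating `'\0'`, however large `dest` is. For contrast, here is a sketch of a copy that is told the destination size. It is modeled on the semantics of BSD `strlcpy`, which is not part of ISO C. The name `bounded_copy` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy at most size-1 bytes of src into dest and always NUL-terminate
   when size > 0. Returns strlen(src), so a result >= size tells the
   caller the copy was truncated (strlcpy-style semantics). */
static size_t bounded_copy(char *dest, size_t size, const char *src) {
    assert(src != NULL && (dest != NULL || size == 0));
    size_t i = 0;
    if (size > 0) {
        for (; i + 1 < size && src[i] != '\0'; i++)
            dest[i] = src[i];
        dest[i] = '\0';
    }
    return i + strlen(src + i);   /* full length of the source string */
}
```

A return value greater than or equal to `size` signals that the source did not fit and was truncated.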
I've been writing C programs for 3 decades, and I have made plenty of mistakes along the way. Occasionally because of using the wrong pointer, but most of them were simply because I got the algorithm wrong. None of these "safe" languages would have prevented the 2nd kind of error.
Well, this isn't quite the same comment, but if the language is compatible with C, or some subset of C, couldn't you compile the "safe version", run your tests, and then, when you were satisfied, compile with standard C? Surely the answers ought to be guaranteed to be the same if there's no error.
Only for real-world inputs that match your test inputs. If you compile with standard C you lose the run-time checks, array bounds checks for example. If these checks only have a 1% penalty, then for many apps that might be quite acceptable.
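Standard C already has a crude version of this build-time switch: `assert` is active unless `NDEBUG` is defined where `<assert.h>` is included. A minimal sketch of a bounds-checked read shows the trade-off described above. The name `checked_at` is hypothetical.

In a test build, a bad index aborts the program. A `-DNDEBUG` build drops the check, and an out-of-range index is undefined behavior again.

```c
#include <assert.h>
#include <stddef.h>

/* Bounds-checked read. The assert fires on a bad index in checked builds;
   compiling with -DNDEBUG removes it, leaving an unchecked a[i]. */
static int checked_at(const int *a, size_t n, size_t i) {
    assert(a != NULL && i < n);
    return a[i];
}
```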
No one's arguing a language solves your algorithm problems. But why have security problems in addition to your algorithm problems? As discussed in the presentation, the flaws in these applications were all the "usual suspects". When you keep seeing the same problems again and again it's time to use better tools, including better languages.
Bugs are not always coding errors; they may also be in the design, and correct code in any language can manifest such design bugs.
If I distrust anyone in general, then it's Microsoft. But Microsoft and _security_? In one breath? You kidding?
"Just Fucking Trust Us" --Satya Nadella
C is fast, but old and "unsafe" (better: difficult-to-master or non-intuitive or attention-demanding). Many other languages are slower, but newer and "safer". Practically speaking, these are the two extreme points of the performance vs. safety trade-off. You can choose which aspect is more important for you and choose a language accordingly, but you cannot have everything.
Personally, I think that all these efforts to come up with a new, safer version of C should focus on accepting reality (do you want C-like speed? Use C) rather than aspiring to what doesn't seem possible. What about coming up with ways to ease communication between C and a given programming language, so that anyone could rely on pure C (or C++) when required? Or even a friendly wrapper that lets you write C from whatever language you prefer (e.g., you create the algorithm in language X and it generates the closest "safer" version of it in C)?
Custom Solvers 2.0 = Alvaro Carballo Garcia = varocarbas.
Does the bad C get run through the Microsoft rehab gulag? Will my next update in 3 months include this for my safety? Screw all this, I'm gonna fire up my old Zilog and just run stuff off that for my own sake.
it's convenient for functions whose result you want to check before going further.
e.g. with file operations:
if (NULL == (in = fopen(filename, "r"))) {
    fprintf(stderr, "cannot open input %s\n", filename);
    exit(2);
}
/* process input from in */
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
C is a systems programming language, useful for making operating systems and device drivers. That's why it contains all this insecure stuff, like no index checking and pointer arithmetic.
The problem is that people are using systems programming languages to make application programs. They should be using application languages for that.
People are so obsessed with wringing the last ounce of performance out of application programs that using systems programming languages such as C and C++ has become the norm. This is the reason our entire world is built on quicksand.
to a whole ms ....
> Then you (clearly a C fanboi) writes code like this:
> while (*dest++ = *src++);
I actually didn't write the C library. I've written several Perl libraries; you'll find my code in Apache and Solaris, but Roland McGrath wrote glibc.
> Yes, which is why it's important for the language to mitigate that as much as possible.
What's important very much depends on what software is being written. In a typical Excel macro, sure go ahead and check the domain of the value each time it is accessed. It'll be ten times as slow as not checking, but one shouldn't expect the project manager to manually check domains in his VBA. It's good and right for VBA to "mitigate it as much as possible".
In a graphics driver, speed is top priority. It would be a mistake to take ten times as long to execute to "mitigate as much as possible".
Not everything is a shell script.
> Not everything is a shell script.
That's right. Some things, for example, are Pascal programs, which is a better option than C. You should take the time to learn some Pascal. You'll come around to the same opinion.
I learned Pascal 15 years ago. It's an okay language.
At the time, Pascal was competing with Visual Basic. VB won.
The world could have chosen Pascal over VB, but they chose VB. In the 1970s, Pascal competed with C. The world chose C.
Now the industry is going through a phase in which people aren't distinguishing between beginner languages that are designed to be easy and professional, enterprise-grade tools. Legos are easy, and a good way to learn some basics. You shouldn't build your house out of Legos. The same is true when building information systems. The simplest tools may not be the best things to build your enterprise with.
You don't know that. If you had more brain left over for the algorithm, or had worked out the algorithm first in Python or on paper, you might have avoided the mistake.
The problem, basically, is that you can only juggle 7 to 9 topics in your short-term memory. Those are basically your brain's registers. If three of them are occupied by useless low-level stuff, only the rest can be used for the algorithm.
Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
The power of C is with its pointers, where you can pass an argument to another function by reference. But isn't BASIC passing arguments by reference too? Hence making BASIC as fast as C in a function call.
The communication happens when you CALL strcpy. Inside of glibc, efficiency rules.
Okay, I gave you the link, so when you grow up you can read how Pascal implements string copy (in Pascal). "Intrinsic" in this respect just means it's included in the built-in library; it still has to be written, silly. The CPU doesn't have a "copy a Pascal string" opcode, so someone has to write it. That would be guys like me.
Pfah! Depends on the PR behind it, not merit: Pascal kicked the SHIT out of C++ performance-wise in, of ALL places, a competing trade rag, VB Programmer's Journal Sept./Oct. 1997 issue "Inside the VB5 Compiler", in 4/6 tests & tied 1 w/ it (BOTH oddly lost to VB in ActiveX form loads) & it HAS NO NULL-TERMINATED STRING BUFFER OVERFLOW POSSIBILITIES like BOTH C/C++ do (& no, not all C++ compilers FULLY implement the C11 or better std. that tries to compensate for it; easy enough looking for NULL in a char array/string OR sending 2 pointers in for length (one double size of other & on err or larger you get len)) but, then you remember that, don't you raymorris, as I SPLATTERED YOU all over this site on that note here https://tech.slashdot.org/comm... to the point YOU didn't DARE answer (some douchebag "ZIP" did & I tore him up too - so, is "ZIP" your alter ego SOCKPUPPET defender or what?).
* Apple chose Pascal over C so you know, initially, also... & ANYTIME you want to talk "information systems"? I'm game - it's almost ALL I DID for 24++ yrs. professionally.
APK
P.S.=> See subject & see "Unforgiven" w/ Clint Eastwood - You're just "ENGLISH BOB" to me - nothing more... apk
That sounds like a limitation of your brain; don't assume the same limitations apply to others.
> I've been writing C programs for 3 decades, and I have made plenty of mistakes along the way. Occasionally because of using the wrong pointer, but most of them were simply because I got the algorithm wrong.
Security bugs are typically exploited, not found in ordinary usage. When you get the algorithm wrong, it's usually obvious right away.
We all know how to CALL a string copy function.
The discussion is how you would IMPLEMENT a simple function.
I put the implementation from Free Pascal in another post if you want to see it. It's about 15 lines.
The operator is syntactic sugar around a function call.
You can see the actual function in the source zip above.
> an assignment operation is not a function call
What looks like an assignment isn't. An assignment should mean both variables point to the same string.
It's creating a new string, copying all the data from the source string to the destination, then finally assigning the new string to the destination variable.
There's a string copy function which accepts the source string as an argument and returns a new string with the data copied from the source.
Sounds like an assignment to me.
When you assign an int variable the value of another int variable, it creates a new variable, and copies the data from the source to the destination. Same semantics.
That's a fair point, to the caller it appears analogous.
To the CPU, they are vastly different. The CPU assigns an integer with a single instruction. Copying a Pascal string is most often hundreds of CPU instructions; there is a function that does that. The String type in Pascal is far more complex than the x bits that make up an integer. Perhaps most importantly, the variable doesn't hold the value; it's a pointer to memory allocated elsewhere. The function takes the Pascal String and dereferences it to the actual value in memory, performs various checks, and THEN asks the CPU to copy a little bit at a time. Contrast that with a pure integer assignment, which translates directly to a CPU instruction.
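In C terms, the distinction being argued is between copying a pointer and copying the characters it points to. Plain pointer assignment just gives the same bytes a second name. The helper below allocates a new buffer and copies every byte instead. `copy_string` is hypothetical and does roughly what POSIX `strdup` does.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Deep copy: allocate a new buffer and copy every byte, including the
   terminating '\0'. Returns NULL if allocation fails; the caller must
   free() the result. */
static char *copy_string(const char *src) {
    assert(src != NULL);
    size_t len = strlen(src) + 1;
    char *dst = malloc(len);
    if (dst != NULL)
        memcpy(dst, src, len);
    return dst;
}
```

After `char *alias = original;`, a change made through `original` shows up through `alias`. A buffer returned by `copy_string` is unaffected.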