How Linux's Kernel Developers 'Make C Less Dangerous' (hpe.com)

Posted by EditorDavid on Saturday September 1, 2018 @01:34PM from the language-barriers dept.

Hewlett Packard Enterprise's blog summarizes a talk by Linux kernel developer Kees Cook at the North America edition of the 2018 Linux Security Summit. Its title? "Making C Less Dangerous." "C is a fancy assembler. It's almost machine code," said Cook, speaking to an audience of several hundred peers, who understood and appreciated the application speed resulting from C... Over time, Cook and the people he worked with discovered numerous native C problems. To deal with these weaknesses, the Kernel Self Protection Project has worked slowly and steadily on protecting the Linux kernel from attack. In the process, it has worked to remove troublesome code from Linux....

With its operational baggage and weak standard libraries, C contains a great deal of undefined behavior. Cook cited -- and agreed with -- Raph Levien's blog post "With Undefined Behavior, Anything Is Possible." Cook gave concrete examples. "What are the contents of 'uninitialized' variables? Whatever was in memory from before! Void pointers have no type, yet we can call typed functions through them? Sure! Assembly doesn't care: Everything can be an address to call! Why does memcpy() have no 'max destination length' argument? Just do what I say; memory areas are all the same!" Some of these idiosyncrasies are relatively easy to deal with. Cook commented, "Linus [Torvalds] likes the idea of always initializing local variables. So, you should 'just do it....'"
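
As a rough illustration of the "always initialize local variables" advice, here is a minimal C sketch. The struct stats type and the sum_positive() function are hypothetical names invented for this example; they do not come from Cook's talk or from the kernel.

```c
#include <stddef.h>

/* Hypothetical result type, used only for this illustration. */
struct stats {
    int count;
    long total;
};

/*
 * Every local is given a value at its declaration, so no code path
 * can ever read whatever happened to be on the stack before.
 */
struct stats sum_positive(const int *values, size_t n)
{
    struct stats s = { 0 };  /* all members start at zero */
    size_t i = 0;

    for (i = 0; i < n; i++) {
        if (values[i] > 0) {
            s.count++;
            s.total += values[i];
        }
    }
    return s;
}
```

Calling sum_positive() with an empty array returns a struct whose members are both zero, which would not be guaranteed if s were left uninitialized. Compilers have since gained options such as -ftrivial-auto-var-init that apply this kind of initialization automatically.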

The long-term solution? More security-savvy open source developers... While at times, the idea of coming up with a Linux C dialect has been attractive, that's not going to happen. The real issue behind the problem of dangerous code is "people don't want to do the work to clean up code -- not just bad code, but C itself," he said. As with all open source projects, "we need more dedicated developers, reviewers, testers, and backporters."
LWN.net has its own run-down of Cook's talk, as well as a link to a PDF file of his slides.

"Sound good," posted one of their commenters, "though ultimately I'd like kernel devs to adopt Rust as their main Linux kernel development language. Beats the crap out of C and C++ combined."

Re:Optimisations by Anonymous Coward · 2018-09-01 14:59 · Score: 1, Informative

Yes. They added defects to the language in the name of "undefined behavior" (aka UB). It's basically impossible to write a program in C or C++ that's 100% free of UB. The industry average is about 15-50 bugs per KLOC, and even the best companies openly admit that their code probably contains at least 7 bugs / KLOC.
Segfaults and signed integer overflow are the most common UBs infecting almost all C and C++ programs. But on the one in a million chance that your program doesn't contain one of those UBs, then the odds are good that you still have UB caused by type punning / pointer aliasing that "works on my computer," but will eventually cause your program to emit nasal demons when a new compiler assumes you didn't alias.
Personally I think optimizations around null pointer dereferences are symptoms of the worst form of brain damaged stupidity and/or intentional malice in compiler developers: basically, the compiler is allowed to remove branches testing for segfaults if it can prove that you will unconditionally dereference the same pointer later. And instead of emitting an error and telling you at compile time that it has statically proven that you messed up, the compiler writer is happy to just silently emit code he knows contains UB, and that he knows will result in nasal demons.
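
The null-check pattern described in this comment can be sketched in C as follows. struct packet and both function names are hypothetical, made up for this example. Whether a given compiler actually removes the check depends on the compiler, its version, and its flags; the Linux kernel, for one, is built with -fno-delete-null-pointer-checks to keep such checks in place.

```c
#include <stddef.h>

/* Hypothetical record type, used only for this illustration. */
struct packet {
    int length;
};

/*
 * Risky ordering: p is dereferenced before it is tested. Dereferencing
 * a null pointer is undefined behavior, so an optimizing compiler may
 * assume p is non-null here and treat the later check as dead code.
 */
int packet_length_risky(const struct packet *p)
{
    int len = p->length;  /* dereference happens first */

    if (p == NULL)        /* may be optimized away */
        return -1;
    return len;
}

/* Safer ordering: test the pointer before any dereference. */
int packet_length_checked(const struct packet *p)
{
    if (p == NULL)
        return -1;
    return p->length;
}
```

Only packet_length_checked() is safe to call with a null pointer; calling packet_length_risky() with NULL is itself undefined behavior, whatever the compiler decides to do with the check.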
If only... by technosaurus · 2018-09-01 16:29 · Score: 1, Informative

C could be a lot better at being closer to the machine. It actually lacks several features in order to placate defunct architectures.

* stdint-style types for size_t, ptrdiff_t, etc...
. - allows better portability for embedded
. . 1. ex. size8_t for indices that won't exceed 255
* lacks largest integer register type
* vector types and extensions - use arm's naming (optional?)
* only single word tokens (ex. unsigned long long double complex)
* ASCII only (EBCDIC et al. are defunct) => 'z'-'a'==25
* 2's complement
* IEEE 754
* big/little endian
* define , >>>, >> and to remove undefinedness
. -- differentiate between logical and arithmetic shift
* standardize functions for common ops
. - ror, rol, ctz, clz, parity, popcount, etc...
* switch for "strings"
. - use ~strcmp and ~bsearch (or if tree for small number of cases)
. . 1. *simplified versions of bsearch + strcmp
. . 2. compiler internally sorts strings
. . 3. could be extended to other non-integer types
* ability to define bit size of parameters and enums.
* function pointer types are inside out ... unless you typedef them - WTF
* function pointers != lambda/block/etc... could be more efficient (ex. qsort)
* _Generic provides 10% functionality with 90% of the work (it sucks)

To name a few. Anyone else know of a "saner C" standard that makes most of the undefined-ness go away?
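
For the "standardize functions for common ops" item in the list above, here is a minimal sketch of what portable versions can look like in plain C. The names rotl32, rotr32, popcount32 and ctz32 are hypothetical, chosen for this example, and the code assumes unsigned int is no wider than 32 bits. Many compilers recognize the rotate idiom and emit a single instruction where the target has one, but that is not guaranteed.

```c
#include <stdint.h>

/* Rotate left; the masking keeps every shift count in 0..31. */
uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31u;
    return (x << n) | (x >> ((32u - n) & 31u));
}

/* Rotate right, written the same way. */
uint32_t rotr32(uint32_t x, unsigned n)
{
    n &= 31u;
    return (x >> n) | (x << ((32u - n) & 31u));
}

/* Population count: each iteration clears the lowest set bit. */
unsigned popcount32(uint32_t x)
{
    unsigned count = 0;

    while (x != 0) {
        x &= x - 1;
        count++;
    }
    return count;
}

/* Count trailing zeros; returns 32 for an input of 0 (a choice made here). */
unsigned ctz32(uint32_t x)
{
    unsigned n = 0;

    if (x == 0)
        return 32;
    while ((x & 1u) == 0) {
        x >>= 1;
        n++;
    }
    return n;
}
```

C23 later added a <stdbit.h> header covering bit counting, which addresses part of this wish list.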
1. Re:If only... by Anonymous Coward · 2018-09-01 20:43 · Score: 1, Informative
  
  Uhm, C has like half of those, and a fourth of them just don't make sense.
  Like, size_t isn't an index, so why would you want a size8_t? Use uint8_t instead. It is exactly what you want.
  You have single-word tokens if you use the stdint.h types like you should.
  The logical/arithmetic shift distinction isn't there specifically to cater to older architectures that don't have a barrel shifter.
  If you want arithmetic operations you should use division and let the compiler replace it with an arithmetic shift.
  If it doesn't, then the compiler isn't doing its job, and that has nothing to do with the language.
  Like really. Many of your points are addressed in the C standard and the C99 rationale. You should read them.
  Also, claiming to want C to be close to the machine while suggesting switch for strings and a plethora of features that would cause function calls with multiple loops in them on a bunch of architectures is more than just a little bit inconsistent.
  Many features are omitted specifically because they would be hard to implement on many architectures, like the parity one.
  It would be great for the Z80 maybe, but since the 8080 doesn't treat the parity bit the same way and many processors don't have one, using it would force the compiler to insert a function call that loops through your data to get the parity. (Or use a folding algorithm.)
  Since it isn't portable anyway, you could just use inline assembly for that part.
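
The "folding algorithm" mentioned in this reply can be sketched in portable C like this. parity32 is a hypothetical name chosen for the example; the idea is to XOR the upper half of the word into the lower half repeatedly until a single bit is left.

```c
#include <stdint.h>

/*
 * Returns 1 if x has an odd number of set bits, 0 otherwise.
 * Each step folds the upper half of the remaining bits onto the
 * lower half with XOR, so no loop over the individual bits is needed.
 */
unsigned parity32(uint32_t x)
{
    x ^= x >> 16;
    x ^= x >> 8;
    x ^= x >> 4;
    x ^= x >> 2;
    x ^= x >> 1;
    return (unsigned)(x & 1u);
}
```

This takes five shift-and-XOR steps for a 32-bit value, independent of how many bits are set.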
Re:Why not use Rust? by Anonymous Coward · 2018-09-01 16:46 · Score: 1, Informative

"Even embedded is mostly C++ now."

(start rant)

I wish it was mostly C++. It may be C++ in the higher-power MPU world, where more software complexity brings in people with more opinions, but in the smaller world (sub 512k RAM, 1M flash) it's still mostly C. In the even smaller embedded world (sub 16k RAM, 32k flash) you'll find it is all done in assembly code. And that world is inhabited with programmers who have their heads stuck up their asses so much they wouldn't know an object if it slapped them in the face.

I help with a framework for 32-bit controllers. The sub 200 MHz, sub 2 MB flash type, and it's all done in C. Why? 'Our customers are afraid of C++'. So we try to write complex code that works like OOP in C. You can do it, but it's soul crushing.

Namespaces in C++ are worth switching from C to C++ alone, even if you don't use objects. If you're talking about a mildly complex system, the name mangling that comes with C++ and namespaces help to reduce collisions in names. Or else you start doing what we're forced to do, build the namespace into the function and variable names.

The syntactic sugar of objects, overloading and polymorphic code is just gravy when you have 40+ developers working on code and colliding namespaces. And when you start to deal with the requirements of supporting n-number of hardware devices and drivers, using interface classes is better than naming the function pointer tables that you use under the hood something something vtable.

No, customers are afraid of C++. So we jump through hoops writing objects in C to satisfy the fears, and take 10 times as long to write code.
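
For readers who have not seen "objects in C", here is a minimal sketch of the kind of code described above: a hand-written function pointer table standing in for a vtable, with the namespace built into every identifier. All names (uart_ops, mock_uart and so on) are hypothetical and invented for this example; they are not taken from the commenter's framework.

```c
#include <stddef.h>

/* The "interface": a table of function pointers, i.e. a manual vtable. */
struct uart_ops {
    int (*write)(void *self, const char *buf, size_t len);
};

/* The "base class": just a pointer to the table. */
struct uart {
    const struct uart_ops *ops;
};

/* One concrete driver. The mock_uart_ prefix is the manual namespace. */
struct mock_uart {
    struct uart base;      /* must stay the first member for the cast below */
    size_t bytes_written;
};

static int mock_uart_write(void *self, const char *buf, size_t len)
{
    struct mock_uart *u = self;  /* valid because base is the first member */

    (void)buf;                   /* this mock only counts bytes */
    u->bytes_written += len;
    return (int)len;
}

static const struct uart_ops mock_uart_ops = {
    .write = mock_uart_write,
};

void mock_uart_init(struct mock_uart *u)
{
    u->base.ops = &mock_uart_ops;
    u->bytes_written = 0;
}

/* "Virtual call": dispatch through the table. */
int uart_write(struct uart *u, const char *buf, size_t len)
{
    return u->ops->write(u, buf, len);
}
```

A caller initializes a struct mock_uart with mock_uart_init() and then passes &m.base to uart_write(). In C++ the table, the cast and the prefixes would all be generated by the compiler from a class with a virtual function inside a namespace.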

I'll agree there are some aspects of C++ that should never come near an embedded project, especially with, let's say, less experienced engineers (templates), but all this can be solved with proper code discipline and with people who know what they're doing.

Even our tools don't support C++. Well, I lie, they do support C++ officially, but in my mind the support is so broken that they'd be better off not supporting it at all. When you're debugging a child class, you want to see all the members, including the members of the parent classes. But no, the debugger is so broken it can't do that. Stepping through a virtual function causes the debugger to get lost. How can you even say you support C++ in your debugger/IDE if it can't even do that? And it's supposed to be a premier embedded IDE?

(end rant).

IoT is still expanding, so maybe more complex things will become most of the embedded world, but there will still be a lot of really small embedded areas that are needed. I really do hope that things progress and C++ starts to handle most of embedded coding, at least for the 32-bit embedded world, I really do. Used right, C++ is just as fast as well-optimized C code. It's a tool like any other, and requires skilled people to use it well.
Memcpy max length by Anonymous Coward · 2018-09-01 19:20 · Score: 2, Informative

Yep. The source of numerous buffer overruns.
Many, many years ago, I wrote a function that solved that problem. Internally, it called 'memcpy' but with a max number of items that was supplied in the call OR if it was missing, it used an application wide declaration which was usually '80'.
Worked well for me for 20+ years.
Other solutions are available and acceptable to me.
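
A wrapper along the lines described in this comment might look like the sketch below. The commenter's actual function is not shown, so everything here is an assumption: the name bounded_copy, the DEFAULT_MAX_COPY constant, and the convention that a dst_size of 0 means "no size supplied" (C has no optional arguments).

```c
#include <stddef.h>
#include <string.h>

/* Application-wide fallback limit, mirroring the '80' in the comment. */
#define DEFAULT_MAX_COPY 80

/*
 * Copies at most dst_size bytes from src to dst and returns the number
 * of bytes actually copied. A dst_size of 0 is treated as "not supplied"
 * and falls back to DEFAULT_MAX_COPY (an assumption of this sketch).
 */
size_t bounded_copy(void *dst, size_t dst_size, const void *src, size_t n)
{
    if (dst == NULL || src == NULL)
        return 0;
    if (dst_size == 0)
        dst_size = DEFAULT_MAX_COPY;
    if (n > dst_size)
        n = dst_size;   /* truncate instead of overrunning dst */
    memcpy(dst, src, n);
    return n;
}
```

The fallback is only safe if every destination buffer in the application holds at least DEFAULT_MAX_COPY bytes, so passing the real size, for example sizeof buf, is the more robust habit. Callers can compare the return value with n to detect truncation.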
Local variable initialisation by lyakh · 2018-09-01 19:49 · Score: 3, Informative

"Linus [Torvalds] likes the idea of always initializing local variables." That's new to me. I've seen, and often requested myself, many cases of redundant local automatic variable initialisation, and don't remember seeing any backlash against them.