The Linux Kernel Is Now VLA-Free: A Win For Security, Less Overhead and Better For Clang (phoronix.com)


Posted by msmash on Monday October 29, 2018 @08:45AM from the for-the-record dept.

With the in-development Linux 4.20 kernel, the kernel is now effectively VLA-free. From a report: Variable-length arrays (VLAs) can be convenient and are part of the C99 standard, but they can have unintended consequences. VLAs allow array lengths to be determined at run time rather than compile time. The Linux kernel has long relied upon VLAs in different parts of the kernel, including within structures, but an effort going on for months now (and years if counting the kernel Clang'ing efforts) has been removing the usage of variable-length arrays within the kernel. The problems with them are:
1. Using variable-length arrays can add some minor run-time overhead to the code, because the size of the array has to be determined at run time.
2. VLAs within structures are not supported by the LLVM Clang compiler and are thus an issue for those wanting to build the kernel with something other than GCC; Clang only supports the C99-style VLAs.
3. Arguably most important, there can be security implications from VLAs around the kernel's stack usage.
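
As a rough illustration of the trade-off the summary describes, the sketch below contrasts a function that uses a VLA with one that uses a fixed-size buffer and an explicit bound check. The names `checksum_vla`, `checksum_fixed`, and `MAX_KEY_LEN` are made up for this example; none of this is kernel code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical compile-time upper bound, chosen only for this example. */
#define MAX_KEY_LEN 64

/* VLA version: the stack frame grows with whatever len the caller passes. */
int checksum_vla(const unsigned char *key, size_t len)
{
    if (len == 0)
        return 0;               /* a zero-length VLA is undefined behaviour */

    unsigned char tmp[len];     /* size computed at run time */
    int sum = 0;

    memcpy(tmp, key, len);
    for (size_t i = 0; i < len; i++)
        sum += tmp[i];
    return sum;
}

/* Fixed-size version: worst-case stack usage is known at compile time,
 * and oversized requests are rejected instead of growing the frame. */
int checksum_fixed(const unsigned char *key, size_t len)
{
    unsigned char tmp[MAX_KEY_LEN];
    int sum = 0;

    if (len > MAX_KEY_LEN)
        return -1;              /* caller must handle the error */

    memcpy(tmp, key, len);
    for (size_t i = 0; i < len; i++)
        sum += tmp[i];
    return sum;
}
```

In the fixed version the cost is a possibly oversized buffer and an extra error path; in the VLA version the cost is a stack frame whose size depends on the caller.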

Re:"VLAs within structures" not part of C by Knuckles · 2018-10-29 09:05 · Score: 1

VLAs within structures ... are not supported
Maybe read the whole sentence?

--
"When I first heard Daydream Nation it quite frankly scared the living shit out of me." -- Matthew Stearns
Re:"VLAs within structures" not part of C by aardvarkjoe · 2018-10-29 09:08 · Score: 4, Informative

This is what they are referring to. Code like (from that link):

void foo (int n) { struct S { int x[n]; }; }

--

How can we continue to believe in a just universe and freedom to eat crackers if we have no ale?
Bummer. by demon+driver · 2018-10-29 09:08 · Score: 1

Vla
Re:"VLAs within structures" not part of C by Anonymous Coward · 2018-10-29 09:30 · Score: 2, Informative

Looks like you are lost, buddy. This is C.
And how does your vector BS solve the problem? Is its storage allocated on stack entirely?
VLA-free does not resolve the problem. by Anonymous Coward · 2018-10-29 09:39 · Score: 1

The advantage of a VLA is pushing items to it without knowing the maximum capacity.
The risk of going VLA-free is running out of capacity when the maximum capacity is unknown.
The stack is like a VLA, but there is only one stack, whereas there can be many VLAs.
Another example, the number of open files. I should not limit it to a small constant, for example a maximum of 1024 open files, when I need 1 million open files (for P2P). With a VLA it is more flexible under demand or need.
Many current PCs are 64-bit and have a lot of memory, 32 GB for example, to prevent bottlenecks or running out of memory.
1. Re:VLA-free does not resolve the problem. by drnb · 2018-10-29 10:29 · Score: 1
  
  Another example, the number of open files. I should not limit it to a small constant
  There is no such limitation. You can re-allocate arrays as needed. VLA is just automating that for you.
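
A minimal sketch of the "re-allocate as needed" approach, assuming heap storage and a doubling growth policy. Note that a C99 VLA keeps the size it was created with for its whole lifetime, so growing on demand needs something like `realloc`. The `int_vec` type and `int_vec_push` function are hypothetical names for this example.

```c
#include <stdlib.h>

/* Hypothetical growable array of int; zero-initialise before first use. */
struct int_vec {
    int *data;
    size_t len;   /* elements in use */
    size_t cap;   /* elements allocated */
};

/* Append value, doubling the capacity when full.
 * Returns 0 on success, -1 if the allocation fails (vector is left intact). */
int int_vec_push(struct int_vec *v, int value)
{
    if (v->len == v->cap) {
        size_t new_cap = v->cap ? v->cap * 2 : 8;
        int *p = realloc(v->data, new_cap * sizeof *p);

        if (p == NULL)
            return -1;
        v->data = p;
        v->cap = new_cap;
    }
    v->data[v->len++] = value;
    return 0;
}
```

The caller releases the storage with `free(v.data)` when done.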
GCCisms by jd · 2018-10-29 09:40 · Score: 4, Informative

The first problem is that they can be dropped from future versions of GCC. They're not part of any standard, after all.
The second problem is that there are situations in which GCC isn't the most suitable compiler. You want to minimize hacks for each different compiler supported.
Security is a big thing, too. It's hard to audit fundamentally unpredictable code.
A major step forward.

--
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
1. Re:GCCisms by The+Evil+Atheist · 2018-10-29 10:41 · Score: 3, Informative
  
  VLAs are part of the C99 standard. It says so right in the summary, and you can look up the standard itself.
  
  --
  Those who do not learn from commit history are doomed to regress it.
2. Re:GCCisms by The+Evil+Atheist · 2018-10-29 10:41 · Score: 1
  
  GCC already supports many of Clang's sanitizers.
  
  --
  Those who do not learn from commit history are doomed to regress it.
3. Re:GCCisms by Anonymous Coward · 2018-10-29 11:15 · Score: 1
  
  A major step forward.
  How the fuck is this a step forward?
  You previously needed an array of variable size.
  You had a language feature that automatically allocated storage for it statically on the stack in a virtually seamless way.
  "Oh no! Advanced features!! Memory stuff!! Scary, scary, go back to K+R!"
  Great. Now you still need the same functionality, but you've now gone BACK to playing around with malloc/free, adding additional function parameters, wasting memory "just in case", or whatever other ad hoc solution a developer now has to use as a workaround.
  New features are not just for convenience. They are often a direct response to serious, chronic deficiencies in the existing standard. Memory management is an eternal source of error, and removing variable-length arrays just leads to more of it.
  "I'll just allocate all my arrays statically anyway. I know the right size in advance".
  Great. Good for you. Master design. Except until the day a kernel module won't know the right size to allocate, and will call a function with enough instances that someone will go reaching for kmalloc instead. How many steps forward will you be then?
4. Re:GCCisms by Uecker · 2018-10-29 18:16 · Score: 1
  
  VLAs are part of the standard, and it is a myth that removing them helps security, as it removes precise information about run-time bounds from the compiler's view. In my opinion, not using VLAs is a major step back for security. (Yes, I contributed to this overall effort myself, but only where they were fake VLAs which really had a constant size that the compiler couldn't know.)
  For GNUisms in general, we will certainly try to standardize some of them (those which are useful, well-defined, and supported by other compilers).
5. Re:GCCisms by jd · 2018-10-29 21:54 · Score: 1
  
  The estimated defect density is around 0.014 (ie: about 14 issues per million lines of code). That gives you an upper threshold on exploitable defects.
  However, if we can reduce that to 0.001, through Clang, which implies 325 defects found with Clang, I'm not going to complain. I'm going to cheer.
  
  --
  It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
6. Re:GCCisms by jd · 2018-10-29 21:58 · Score: 1
  
  There's lots of stuff in the Linux kernel that uses a GNU variant of a concept rather than the ISO variant. Often because GNU was there first. If you switch from the vendor-specific form, the isms, to the standard form, you don't change the concepts involved but you do make it more portable.
  
  --
  It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
7. Re:GCCisms by jd · 2018-10-29 22:05 · Score: 1
  
  1. Calm down
  2. You're assuming GNU's method is the only method and thus the standard method
  3. Plenty of people staple together blocks to create virtual arrays. Some are called filing systems, some are called Linux memory managers, and there's one called GMP. It's the method underlying any potentially fragmented workspace if you don't want to keep copying. Because it's required in a lot of Linux, a standard, portable, form in a helper library would be nice. It might mean we can get rid of the umpteen queue and stack implementations, too.
  
  --
  It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
8. Re:GCCisms by jd · 2018-10-29 22:41 · Score: 2
  
  So you're aware that GNU introduced features often way in advance of any standard and that the GNU syntax/semantics don't always match the ISO version.
  Let's see what ISO says about VLAs:
  C99 adds a new array type called a variable length array type. The inability to declare arrays whose size is known only at execution time was often cited as a primary deterrent to using C as a numerical computing language. Adoption of some standard notion of execution time arrays was considered crucial for C’s acceptance in the numerical computing world.
  Does this match your experience?
  Would discontiguous pools of contiguous memory, giving you the ability to make anything flexible size, be that much worse as that's what the compiler will be using anyway?
  
  --
  It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
9. Re:GCCisms by vbdasc · 2018-10-30 01:07 · Score: 1
  
  The first problem is that they can be dropped from future versions of GCC. They're not part of any standard, after all.
  IMHO, GCC rarely drops its proprietary extensions, once introduced and documented. Correct me if I'm wrong.
10. Re:GCCisms by Uecker · 2018-10-30 03:14 · Score: 2
  
  So you're aware that GNU introduced features often way in advance of any standard and that the GNU syntax/semantics don't always match the ISO version.
  Yes, of course. In fact, I added a GNU extension myself. I am also participating in WG14.
  
  Let's see what ISO says about VLAs:
  C99 adds a new array type called a variable length array type. The inability to declare arrays whose size is known only at execution time was often cited as a primary deterrent to using C as a numerical computing language. Adoption of some standard notion of execution time arrays was considered crucial for C’s acceptance in the numerical computing world.
  Does this match your experience?
  Absolutely. I use C for numerical computing and VLAs are very important.
  
  Would discontiguous pools of contiguous memory, giving you the ability to make anything flexible size, be that much worse as that's what the compiler will be using anyway?
  I don't understand what you are trying to say. The VLA will live on the stack or the heap depending on where one allocates it. In both cases, there is no way to resize it. Making it resizable is much harder and no compiler does this as it would require a level of indirection which reduces performance and would require some kind of automatic memory management (automatically running destructors). Of course, you can always add your own abstraction for resizable arrays.
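
To make the stack-or-heap point concrete, here is a small sketch of a variably modified type used with heap storage, which is the style often seen in numerical code. The function names `scale` and `scaled_identity_trace` are invented for this example.

```c
#include <stddef.h>
#include <stdlib.h>

/* Multiply every element of a rows x cols matrix by k.
 * The dimensions come first so they are in scope for the array type. */
void scale(size_t rows, size_t cols, double m[rows][cols], double k)
{
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            m[i][j] *= k;
}

/* Build an n x n identity matrix on the heap, scale it by k,
 * and return its trace. Returns -1.0 if n is zero or allocation fails. */
double scaled_identity_trace(size_t n, double k)
{
    if (n == 0)
        return -1.0;

    /* Pointer to rows of n doubles: VLA typing, but malloc'd storage. */
    double (*a)[n] = malloc(sizeof(double[n][n]));
    if (a == NULL)
        return -1.0;

    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            a[i][j] = (i == j) ? 1.0 : 0.0;

    scale(n, n, a, k);

    double trace = 0.0;
    for (size_t i = 0; i < n; i++)
        trace += a[i][i];

    free(a);
    return trace;
}
```

Nothing here is resizable: the dimensions are fixed once the pointer type is formed, and only the location of the storage differs from a stack VLA.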
11. Re:GCCisms by Uecker · 2018-10-30 03:28 · Score: 1
  
  If you don't want to allocate on the stack you can also not call a function as the stack frame is allocated on the stack...
High vs low languages. by HeckRuler · 2018-10-29 09:44 · Score: 3, Interesting

VLAs are an example of C becoming ever so slightly higher level. When the language does things under the hood without telling you it's just an invitation to bite you in the ass. Good purge.
1. Re:High vs low languages. by MobyDisk · 2018-10-29 09:51 · Score: 1
  
  What was it doing under the hood without telling you? Isn't a VLA basically just a call to alloca()?
2. Re:High vs low languages. by HeckRuler · 2018-10-29 10:47 · Score: 5, Interesting
  
  Yes. Exactly that. It's allocating space for you. It figures out at run-time the length of your array rather than you having to do it by hand at compile-time. I didn't actually know of any security flaws this would lead to, but it stops debuggers from knowing details about calls so it obscured some information from me and pissed me off once.
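
One visible sign of the run-time sizing mentioned above is that `sizeof` applied to a VLA is evaluated when the program runs, while for an ordinary array it is a compile-time constant. A small sketch, with function names made up for the example:

```c
#include <stddef.h>

/* Returns the size in bytes of a VLA of n ints; n must be greater than zero. */
size_t vla_bytes(size_t n)
{
    int buf[n];          /* length fixed when this declaration is reached */

    buf[0] = 0;          /* touch the array so it is not optimised away entirely */
    return sizeof buf;   /* computed at run time: n * sizeof(int) */
}

/* Same idea with a fixed-size array: the result is a compile-time constant. */
size_t fixed_bytes(void)
{
    int buf[16];

    buf[0] = 0;
    return sizeof buf;   /* always 16 * sizeof(int) */
}
```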
3. Re:High vs low languages. by Anonymous Coward · 2018-10-29 10:52 · Score: 1
  
  Calls to alloca() are explicit, which makes them less hidden to human review. VLAs encourage writing code without putting more thought into it. They also encourage the use of variable-length things when fixed-length things make more sense, and they discourage thinking about when variable lengths may come into play and how to handle them efficiently.
  In general, all of these are bad things for code that runs at all times and with [near] unlimited power to do harm, especially when those properties are ripe for abuse by adversaries.
4. Re: High vs low languages. by The+Evil+Atheist · 2018-10-29 18:34 · Score: 1
  
  Stacks are an abstraction already. There's nothing fundamental about a stack.
  
  --
  Those who do not learn from commit history are doomed to regress it.
Re:Finally! by slickwillie · 2018-10-29 10:07 · Score: 3, Funny

But it is not TLA (Three Letter Acronym) free.

I think the Linux community should join CAT - the Campaign to Abolish TLAs.
Not me. I think of Klaang by mschaffer · 2018-10-29 10:29 · Score: 1

Klaang from Star Trek: http://memory-alpha.wikia.com/...
Good for debugging too by drnb · 2018-10-29 10:32 · Score: 3, Interesting

It helps with debugging too. Build with two unrelated compiler systems and bugs that don't stand out in one may stand out in the other. And I am talking run-time errors not compiler warnings.
GNU'isms by drnb · 2018-10-29 10:35 · Score: 2

Once we get rid of all the GNU'isms can we go back to simply calling it "Linux". ;-)
1. Re:GNU'isms by jd · 2018-10-29 21:56 · Score: 2
  
  The GNU over Linux refers to GNU userspace over Linux kernelspace. So a GNU userspace over OpenBSD would be GNU/OpenBSD. BSD/BSD is 1, since you're dividing by itself.
  
  --
  It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
Re:Finally! by Spacelem · 2018-10-29 10:36 · Score: 2, Informative

Acronyms are words that you pronounce, like laser (Light Amplification by Stimulated Emission of Radiation), scuba, radar, or PIN (Personal Identification Number number).
Initialisms are words you spell out, like FBI, CIA, DNR, ECG, MRI, DVLA etc.
A TLA is an initialism, not an acronym, so really it's not a TLA, it's a TLI. Not sure which one CAT is supposed to be though!
Re:"VLAs within structures" not part of C by The+Evil+Atheist · 2018-10-29 10:43 · Score: 1, Informative

std::vector uses the heap. VLAs are supposed to be on the stack, or at least not require a separate allocation. You can't use the heap to solve all your problems when it comes to a kernel.

--
Those who do not learn from commit history are doomed to regress it.
Re:Non-standard language extensions by arth1 · 2018-10-29 11:35 · Score: 2

When the Linux kernel depends on non-standard language extensions that only GCC implements, that's OK.
Except that VLAs are part of the C99 standard, and there's nothing in the standard that says they can't be used in a struct - it's just difficult for the compilers. gcc has chosen to technically implement it as an extension, while Clang/LLVM doesn't support it (nor the floating point pragmas of C99, which has also been an issue for some kernel code).
Oh, god, the kernel is already falling apart! by shess · 2018-10-29 11:54 · Score: 1, Funny

See, adopting a code of conduct is already undermining the foundations of the kernel.
C does not really have arrays by aberglas · 2018-10-29 12:00 · Score: 4, Interesting

Just memory addresses. *Foo could be one or a few or many. Pointer arithmetic.
So variable-length arrays feel odd.
If you did not like chasing down weird memory corruption problems then you would not be using C (or C++) in the first place.
It would have been trivial to add a little bit of sanity with syntax like
void foo(char buf[blen], int blen)
so a compiler could, in debug mode, check. But no, that would not be a hero's C. Nor are variable-length arrays.
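
For what it is worth, C99 does accept something close to this syntax, provided the length parameter is declared before the array that uses it. The sketch below shows that form plus the `[static N]` form; the function names are invented for the example. The standard does not require any bounds checking here, since the parameter is still adjusted to a plain pointer, although some compilers and analysis tools may use the annotation for diagnostics.

```c
#include <stddef.h>

/* blen must come first so that it is in scope when buf[blen] is declared.
 * buf is still just a char * inside the function. */
void fill(size_t blen, char buf[blen], char value)
{
    for (size_t i = 0; i < blen; i++)
        buf[i] = value;
}

/* [static 4] is a promise by the caller that v points to at least 4 ints. */
int sum4(const int v[static 4])
{
    return v[0] + v[1] + v[2] + v[3];
}
```

A call then looks like `fill(sizeof b, b, 'x')` for a local `char b[5]`.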
Incidentally, C's lack of arrays is not efficient. E.g. it is the reason we need 64 bit pointers, namely that C can only address 4 gig in 32 bit pointers. Java can access 32 gig of memory with 32 bit pointers because mallocs are aligned, and 32 gig is more than enough for the vast majority of current applications, and likely to remain so for a long time to come. Doubling your pointer size with lots of zeros is expensive, it clogs caches etc.
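
The 32 gig figure follows from simple arithmetic: if every object is aligned to 8 bytes, the low 3 bits of its offset are always zero, so a 32-bit reference can be shifted left by 3 and cover 2^32 x 8 bytes = 32 GiB. The sketch below only illustrates that arithmetic; the names and the fixed `ALIGN_SHIFT` are assumptions for the example, not how any particular JVM is implemented.

```c
#include <stdint.h>

/* Assumption for this example: objects are aligned to 8 bytes (2^3). */
#define ALIGN_SHIFT 3u

/* Pack an 8-byte-aligned byte offset into a 32-bit reference. */
uint32_t compress_offset(uint64_t byte_offset)
{
    return (uint32_t)(byte_offset >> ALIGN_SHIFT);
}

/* Recover the byte offset from a 32-bit reference. */
uint64_t expand_ref(uint32_t ref)
{
    return (uint64_t)ref << ALIGN_SHIFT;
}

/* Total bytes reachable: 2^32 references, each naming an 8-byte slot. */
uint64_t addressable_bytes(void)
{
    return (UINT64_C(1) << 32) << ALIGN_SHIFT;
}
```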
1. Re:C does not really have arrays by Aighearach · 2018-10-29 16:50 · Score: 1
  
  If you did not like chasing down weird memory corruption problems then you would not be using C (or C++) in the first place.
  Well, perhaps but OTOH many of us avoid malloc like the plague.
2. Re: C does not really have arrays by Anonymous Coward · 2018-10-29 18:31 · Score: 1
  
  why the fuck are you coding in C then?
3. Re:C does not really have arrays by jd · 2018-10-29 22:21 · Score: 1
  
  Well, most machines probably do treat a machine-level 64-bit pointer as a 64-bit pointer if the opcode says to. It's an atomic operation to load into a register, after all.
  In practice, the compiler won't use a 64-bit opcode if a smaller operation is faster and will work. One reason strongly-typed languages are good for optimizers - you can place things precisely and thus work out how large pointers need to be.
  That's the compiler. The machine just runs the opcode as provided.
  
  --
  It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
Re:"VLAs within structures" not part of C by Darinbob · 2018-10-29 12:36 · Score: 1

Sadly I knew someone who preferred std::map over std::vector. Including when the key range was tiny and it was guaranteed to only have a single element at a time (I so wish I was making this up). If the only tool in someone's toolbox is a nail gun, don't be surprised if their project has a lot of nails in it.
Re:"VLAs within structures" not part of C by Darinbob · 2018-10-29 12:41 · Score: 3, Informative

Generally, you can get the tricky parts of the kernel done in C, then layer C++ on top of it. That's what a lot of embedded RTOS systems do. The biggest snag is the tendency of getting bloated code from developers not aware of what C++ does behind the scenes.
Re:"VLAs within structures" not part of C by epine · 2018-10-29 13:50 · Score: 1

I just read a few things about VLAs in C99, and my god, it makes Stroustrup look like a rocket scientist.
Convenient while it works, then brutally unsafe the moment it doesn't work (recompile for a new platform, whole new stack-size ballgame—you do the math, except you can't, because the C standard is deaf-mute on the existence of the primary stack, and hence, perforce, also its size limits).
Of course, when you're compiling the Linux kernel, you are compiling the platform itself, so internally it can certainly sort things in a way that an ordinary C program probably couldn't.
But still, I can't recall C++ violating the type system / allocation sanity this badly since vector<bool> was originally defined as a specialization that didn't actually meet the vector<> container class conceptual requirements, or maybe some early, misguided implementations of smart pointers (whose precise misfeatures I've now blissfully forgotten).
Re:"VLAs within structures" not part of C by Uecker · 2018-10-29 18:12 · Score: 1

Huh? There is nothing unsafe about VLAs. They also tend to *reduce* stack usage, as the alternative is oversized fixed-size arrays on the stack. If you care about a dynamically sized stack, then you also can't have function calls in a conditional path.
Re: Non-standard language extensions by The+Evil+Atheist · 2018-10-29 18:32 · Score: 1

It's not undefined. It's portable.

--
Those who do not learn from commit history are doomed to regress it.
Re: Non-standard language extensions by Anonymous Coward · 2018-10-29 21:10 · Score: 1

Used sanely Java isn't terrible. Even with realtime stuff - I had no problems. Just don't create objects in the critical path. Given the choice, I wouldn't choose to write the lowest levels of a kernel in Java (or C++) though.
The merit of variable length C99 arrays is a good question. My conservative side says just allocate fixed sized stuff for the smaller cases and for the others malloc and deal with it. If pointers to lists of structs scare you, C ain't for you. But FFS it's 2018 - Is it too much to ask for a C compiler with enough brains to handle VLA's well? I guess so!
I'd love to see some microbenchmarks of VLA's vs doing it the old way.
Re:All = array variants? No by jd · 2018-10-29 22:10 · Score: 1

Contiguous memory is the correct solution, yes. But nothing stops you having an index that tells you where the base is for a given offset. That lets you have a discontinuous set of arrays where each one is accessed as a contiguous array.
Examples: Memory pages in Linux.
If you go onto a different page, you have a different base address. But each page is contiguous. It works, we know it works, and except on crossing boundaries, it's the fastest method as you point out.
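
A toy sketch of that idea, assuming a table of equally sized chunks where the high bits of an index pick the chunk and the low bits pick the slot inside it. The `chunked` type, its functions, and the 16-element chunk size are all invented for this example; real page tables are considerably more involved.

```c
#include <stdlib.h>

#define CHUNK_SHIFT 4u                       /* 16 ints per chunk (example value) */
#define CHUNK_SIZE  (1u << CHUNK_SHIFT)
#define CHUNK_MASK  (CHUNK_SIZE - 1u)

/* A discontiguous array: each chunk is contiguous, the chunks need not be adjacent. */
struct chunked {
    int **chunks;     /* index table: base address of each chunk */
    size_t nchunks;
};

void chunked_free(struct chunked *c)
{
    for (size_t i = 0; i < c->nchunks; i++)
        free(c->chunks[i]);
    free(c->chunks);
    c->chunks = NULL;
    c->nchunks = 0;
}

/* Allocate nchunks zero-filled chunks. Returns 0 on success, -1 on failure. */
int chunked_init(struct chunked *c, size_t nchunks)
{
    c->chunks = NULL;
    c->nchunks = 0;
    if (nchunks == 0)
        return -1;

    c->chunks = calloc(nchunks, sizeof *c->chunks);
    if (c->chunks == NULL)
        return -1;

    for (size_t i = 0; i < nchunks; i++) {
        c->chunks[i] = calloc(CHUNK_SIZE, sizeof **c->chunks);
        if (c->chunks[i] == NULL) {
            chunked_free(c);      /* frees the chunks allocated so far */
            return -1;
        }
        c->nchunks = i + 1;
    }
    return 0;
}

/* Translate a flat index into chunk base + offset; NULL if out of range. */
int *chunked_at(struct chunked *c, size_t index)
{
    size_t chunk = index >> CHUNK_SHIFT;

    if (chunk >= c->nchunks)
        return NULL;
    return &c->chunks[chunk][index & CHUNK_MASK];
}
```

Access within a chunk is plain contiguous indexing; only the lookup in `chunked_at` pays for the extra level of indirection.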

--
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
Re: Clang by joao.cordeiro · 2018-10-29 22:29 · Score: 1

Remember what the doctor said!
Re:"VLAs within structures" not part of C by jeremyp · 2018-10-29 22:31 · Score: 1

The flexible array member of a struct is not the same as a variable length array.
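
For comparison, a flexible array member looks like the sketch below: the struct type itself has a fixed, compile-time layout, and the trailing array's length is decided only by how much memory is allocated for each object. The `packet` type and `packet_new` function are hypothetical names for this example.

```c
#include <stdlib.h>

struct packet {
    size_t len;    /* number of elements stored in data[] */
    int data[];    /* flexible array member: must be the last member */
};

/* Allocate a packet with room for n zeroed ints. Returns NULL on failure. */
struct packet *packet_new(size_t n)
{
    struct packet *p = malloc(sizeof *p + n * sizeof p->data[0]);

    if (p == NULL)
        return NULL;
    p->len = n;
    for (size_t i = 0; i < n; i++)
        p->data[i] = 0;
    return p;
}
```

Unlike `struct S { int x[n]; }` declared inside a function, this is standard C99 and does not make the struct's type depend on a run-time value.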

--
All I want is a secure system where it's easy to do anything I want. Is that too much to ask ~~ Randall Munroe
Re:"VLAs within structures" not part of C by religionofpeas · 2018-10-29 22:43 · Score: 1

They reduce average stack size. They don't reduce the worst case stack size, which is what you care about.
Re:"VLAs within structures" not part of C by Joce640k · 2018-10-29 23:20 · Score: 2

It's almost as if you don't know that std::vector can use a custom memory allocator, eg alloca().

--
No sig today...
Re:"VLAs within structures" not part of C by Joce640k · 2018-10-29 23:23 · Score: 1

The biggest snag is the tendency of getting bloated code from developers not aware of what C++ does behind the scenes.
Doesn't the Linux kernel have a whole army of people approving code changes? They could be aware of what C++ does behind the scenes even if the developers aren't.
(But seriously: Do you think a Linux core developer is incapable of using C++ properly or knowing what it does behind the scenes?)

--
No sig today...
Re:"VLAs within structures" not part of C by Uecker · 2018-10-30 03:19 · Score: 1

They can reduce max stack size in some cases. E.g. if you split one array into two smaller arrays but you don't know how big the smaller arrays are. In any case, they never increase stack size when compared to static arrays on the stack.
Re:MORE's coming that's cool... apk by Aphranius · 2018-10-30 04:17 · Score: 1

What the hell is APK? You're not talking about android app packages, right?
Re:"VLAs within structures" not part of C by Darinbob · 2018-10-30 05:31 · Score: 1

I was referring to kernel and OS work in general, not Linux specifically.
Re:Finally! by Spacelem · 2018-10-30 06:01 · Score: 1

That was clearly intentional and meant as a joke, since everyone says "PIN number" all the time instead of just "PIN".
It's a case of RAS syndrome (or is it RIS syndrome?)
Re:"VLAs within structures" not part of C by The+Evil+Atheist · 2018-10-30 10:29 · Score: 1

It's almost as if you don't know that alloca isn't safe. Especially with vector, resizing is very dodgy with alloca. You may as well just use a std::array.

--
Those who do not learn from commit history are doomed to regress it.