The Linux Kernel Is Now VLA-Free: A Win For Security, Less Overhead and Better For Clang (phoronix.com)
The in-development Linux 4.20 kernel is now effectively VLA-free. From a report: Variable-length arrays (VLAs) can be convenient and are part of the C99 standard, but they can have unintended consequences. VLAs allow array lengths to be determined at run time rather than compile time. The Linux kernel has long relied upon VLAs in different parts of the kernel -- including within structures -- but for months now (and years, if counting the kernel Clang'ing efforts) there has been an effort to remove the usage of variable-length arrays within the kernel. The problems with them are:
1. Using variable-length arrays can add some minor run-time overhead to the code due to needing to determine the size of the array at run-time.
2. VLAs within structures are not supported by the LLVM Clang compiler and are thus an issue for those wanting to build the kernel outside of GCC; Clang only supports the C99-style VLAs.
3. Arguably most important, there can be security implications from VLAs around the kernel's stack usage.
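To make the first point in the list above concrete, here is a minimal sketch. The helper names and the `SCRATCH_MAX` bound are illustrative assumptions, not kernel code. It contrasts a VLA, whose `sizeof` has to be computed at run time, with a fixed-size array whose stack footprint is known at compile time:

```c
#include <assert.h>
#include <stddef.h>

#define SCRATCH_MAX 64  /* hypothetical compile-time upper bound */

/* VLA version: the amount of stack reserved depends on the caller's n. */
size_t vla_bytes(size_t n)
{
    assert(n > 0);           /* a zero-length VLA is undefined behavior */
    int scratch[n];          /* size determined at run time */
    scratch[0] = 0;
    return sizeof scratch;   /* sizeof is evaluated at run time for a VLA */
}

/* Fixed-size version: always reserves SCRATCH_MAX ints, whatever n is. */
size_t fixed_bytes(size_t n)
{
    assert(n > 0 && n <= SCRATCH_MAX);
    int scratch[SCRATCH_MAX];  /* size known at compile time */
    scratch[n - 1] = 0;
    return sizeof scratch;     /* a compile-time constant */
}
```

With the fixed version, the worst-case stack use can be read straight off the source; with the VLA version, it depends on whatever value reaches `n`.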
VLAs within structures ... are not supported
Maybe read the whole sentence?
"When I first heard Daydream Nation it quite frankly scared the living shit out of me." -- Matthew Stearns
This is what they are referring to. Code like (from that link):
How can we continue to believe in a just universe and freedom to eat crackers if we have no ale?
Vla
Looks like you are lost, buddy. This is C.
And how does your vector BS solve the problem? Is its storage allocated on stack entirely?
The advantage of a VLA is that you can push items onto it without knowing the maximum capacity.
The risk of going VLA-free is running out of capacity when the maximum capacity is unknown.
The stack is like a VLA, except that there is only one stack, whereas there can be many VLAs.
Another example is the number of open files. I should not have to limit it to a small constant, for example a maximum of 1024 open files, when I need 1 million open files (for P2P). A VLA is more flexible and adapts to demand.
Many current PCs are 64-bit and have plenty of memory, for example 32 GB, which helps avoid bottlenecks and out-of-memory conditions.
The first problem is that they can be dropped from future versions of GCC. They're not part of any standard, after all.
The second problem is that there are situations in which GCC isn't the most suitable compiler. You want to minimize hacks for each different compiler supported.
Security is a big thing, too. It's hard to audit fundamentally unpredictable code.
A major step forward.
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
VLAs are an example of C becoming ever so slightly higher level. When the language does things under the hood without telling you it's just an invitation to bite you in the ass. Good purge.
But it is not TLA (Three Letter Acronym) free.
I think the Linux community should join CAT - the Campaign to Abolish TLAs.
Klaang from Star Trek: http://memory-alpha.wikia.com/...
It helps with debugging too. Build with two unrelated compiler systems and bugs that don't stand out in one may stand out in the other. And I am talking run-time errors not compiler warnings.
Once we get rid of all the GNU'isms can we go back to simply calling it "Linux". ;-)
Acronyms are words that you pronounce, like laser (Light Amplification by Stimulated Emission of Radiation), scuba, radar, or PIN (Personal Identification Number number).
Initialisms are words you spell out, like FBI, CIA, DNR, ECG, MRI, DVLA etc.
A TLA is an initialism, not an acronym, so really it's not a TLA, it's a TLI. Not sure which one CAT is supposed to be though!
std::vector uses the heap. VLAs are supposed to be on the stack, or at least not require a separate allocation. You can't use the heap to solve all your problems when it comes to a kernel.
Those who do not learn from commit history are doomed to regress it.
When the Linux kernel depends on non-standard language extensions that only GCC implements, that's OK.
Except that VLAs are part of the C99 standard, and there's nothing in the standard that says they can't be used in a struct - it's just difficult for the compilers. gcc has chosen to technically implement it as an extension, while Clang/LLVM doesn't support it (nor the floating point pragmas of C99, which has also been an issue for some kernel code).
See, adopting a code of conduct is already undermining the foundations of the kernel.
Just memory addresses. *Foo could be one or a few or many. Pointer arithmetic.
So variable-length arrays feel odd.
If you did not like chasing down weird memory corruption problems then you would not be using C (or C++) in the first place.
It would have been trivial to add a little bit of sanity with syntax like
void foo(char buf[blen], int blen)
so a compiler could, in debug mode, check. But no, that would not be a hero's C. Nor are variable-length arrays.
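For what it's worth, C99 does accept something close to that syntax, provided the length parameter is declared before the array that refers to it. A minimal sketch follows; `count_nonzero` is a hypothetical function made up for illustration. Whether a compiler actually checks the bound depends on the compiler and its flags, and by default the declaration mostly documents intent:

```c
#include <assert.h>
#include <stddef.h>

/*
 * C99 variably modified parameter: blen must appear before buf so that
 * it is in scope when the array declarator is parsed. The parameter is
 * still passed as a plain pointer.
 */
size_t count_nonzero(size_t blen, const char buf[blen])
{
    size_t n = 0;

    assert(buf != NULL);
    for (size_t i = 0; i < blen; i++)
        if (buf[i] != 0)
            n++;
    return n;
}
```

A caller would write `count_nonzero(sizeof data, data)` for a `char data[...]` array.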
Incidentally, C's lack of arrays is not efficient. E.g. it is the reason we need 64 bit pointers, namely that C can only address 4 gig in 32 bit pointers. Java can access 32 gig of memory with 32 bit pointers because mallocs are aligned, and 32 gig is more than enough for the vast majority of current applications, and likely to remain so for a long time to come. Doubling your pointer size with lots of zeros is expensive, it clogs caches etc.
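The 32 GB figure comes from simple arithmetic: if every object is 8-byte aligned, the low 3 bits of its address are always zero, so a 32-bit reference can store the address shifted right by 3 and reach 2^32 * 8 bytes. Here is a rough sketch of that encoding, loosely modeled on the JVM's compressed object pointers. The function names and the choice to work on heap offsets rather than real addresses are assumptions for illustration:

```c
#include <assert.h>
#include <stdint.h>

#define ALIGN_SHIFT 3  /* 8-byte alignment: the low 3 bits are always zero */

/* Pack an 8-byte-aligned heap offset into a 32-bit reference. */
uint32_t compress_ref(uint64_t heap_offset)
{
    assert((heap_offset & ((1u << ALIGN_SHIFT) - 1)) == 0); /* must be aligned */
    assert((heap_offset >> ALIGN_SHIFT) <= UINT32_MAX);     /* must fit in 32 bits */
    return (uint32_t)(heap_offset >> ALIGN_SHIFT);
}

/* Recover the full offset by shifting the dropped zero bits back in. */
uint64_t decompress_ref(uint32_t ref)
{
    return (uint64_t)ref << ALIGN_SHIFT;
}
```

The largest encodable offset is 8 bytes short of 32 GiB, at the cost of a shift on every dereference.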
Sadly I knew someone who preferred std::map over std::vector. Including when the key range was tiny and it was guaranteed to only have a single element at a time (I so wish I was making this up). If someone's only tool in their tool box is a nail gun, don't be surprised if their project has a lot of nails in it.
Generally, you can get the tricky parts of the kernel done in C, then layer C++ on top of it. That's what a lot of embedded RTOS systems do. The biggest snag is the tendency of getting bloated code from developers not aware of what C++ does behind the scenes.
I just read a few things about VLAs in C99, and my god, it makes Stroustrup look like a rocket scientist.
Convenient while it works, then brutally unsafe the moment it doesn't work (recompile for a new platform, whole new stack-size ballgame—you do the math, except you can't, because the C standard is deaf-mute on the existence of the primary stack, and hence, perforce, also its size limits).
Of course, when you're compiling the Linux kernel, you are compiling the platform itself, so internally it can certainly sort things in a way that an ordinary C program probably couldn't.
But still, I can't recall C++ violating the type system / allocation sanity this badly since vector<bool> was originally defined as a specialization that didn't actually meet the vector<> container class conceptual requirements, or maybe some early, misguided implementations of smart pointers (whose precise misfeatures I've now blissfully forgotten).
Huh? There is nothing unsafe with VLAs. They also tend to *reduce* stack usage, as the alternative is oversized fixed-size arrays on the stack. If you care about dynamically sized stack usage, then you also can't have function calls in a conditional path.
It's not undefined. It's portable.
Used sanely Java isn't terrible. Even with realtime stuff - I had no problems. Just don't create objects in the critical path. Given the choice, I wouldn't choose to write the lowest levels of a kernel in Java (or C++) though.
The merit of variable length C99 arrays is a good question. My conservative side says just allocate fixed sized stuff for the smaller cases and for the others malloc and deal with it. If pointers to lists of structs scare you, C ain't for you. But FFS it's 2018 - Is it too much to ask for a C compiler with enough brains to handle VLA's well? I guess so!
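The "fixed-size for the small cases, malloc for the rest" approach can be sketched roughly as below. `SMALL_MAX` and `checksum_copy` are hypothetical names chosen for illustration, and in kernel code the allocator would be something like `kmalloc` rather than `malloc`:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SMALL_MAX 128  /* hypothetical threshold for the on-stack buffer */

/*
 * Copies len bytes into a scratch buffer and sums them.
 * Returns 0 on success, -1 if the heap allocation fails.
 */
int checksum_copy(const unsigned char *src, size_t len, unsigned long *out)
{
    unsigned char small[SMALL_MAX];  /* fixed, compile-time stack cost */
    unsigned char *buf = small;
    unsigned long sum = 0;

    assert(src != NULL && out != NULL);
    if (len > SMALL_MAX) {           /* large case: fall back to the heap */
        buf = malloc(len);
        if (buf == NULL)
            return -1;
    }
    memcpy(buf, src, len);
    for (size_t i = 0; i < len; i++)
        sum += buf[i];
    if (buf != small)
        free(buf);
    *out = sum;
    return 0;
}
```

The stack cost is always `SMALL_MAX` bytes, so the worst case is known up front, and the large case gets an explicit failure path instead of a silent stack overflow.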
I'd love to see some microbenchmarks of VLA's vs doing it the old way.
Contiguous memory is the correct solution, yes. But nothing stops you having an index that tells you where the base is for a given offset. That lets you have a discontinuous set of arrays where each one is accessed as a contiguous array.
Examples: Memory pages in Linux.
If you go onto a different page, you have a different base address. But each page is contiguous. It works, we know it works, and except on crossing boundaries, it's the fastest method as you point out.
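A toy version of that idea, as a sketch only: the page size, the page count, and all names here are made up for illustration and have nothing to do with how Linux actually manages pages. An index of base addresses maps the high bits of an offset to a page, and the low bits index into that contiguous page:

```c
#include <assert.h>
#include <stdlib.h>

#define PAGE_SHIFT 4                   /* hypothetical: 16 ints per page */
#define PAGE_SIZE  (1u << PAGE_SHIFT)
#define MAX_PAGES  8                   /* hypothetical index size */

struct paged_array {
    int *pages[MAX_PAGES];  /* base address per page; NULL = not allocated */
};

/* Returns a pointer to element i, allocating its page on first use. */
int *paged_slot(struct paged_array *pa, size_t i)
{
    size_t page = i >> PAGE_SHIFT;        /* which page */
    size_t off  = i & (PAGE_SIZE - 1);    /* offset inside that page */

    assert(pa != NULL);
    if (page >= MAX_PAGES)
        return NULL;                      /* beyond what the index covers */
    if (pa->pages[page] == NULL) {
        pa->pages[page] = calloc(PAGE_SIZE, sizeof(int));
        if (pa->pages[page] == NULL)
            return NULL;
    }
    return &pa->pages[page][off];
}

/* Releases every allocated page and resets the index. */
void paged_free(struct paged_array *pa)
{
    for (size_t p = 0; p < MAX_PAGES; p++) {
        free(pa->pages[p]);
        pa->pages[p] = NULL;
    }
}
```

Elements within one page are adjacent in memory; only a lookup that crosses a page boundary pays for the extra indirection.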
Remember what the doctor said!
The flexible array member of a struct is not the same as a variable length array.
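To illustrate the distinction, here is a minimal sketch of a flexible array member (`struct packet` and `packet_new` are hypothetical names). The trailing array has no size in the type, and the storage is sized once at allocation time on the heap, so nothing variable-length ever lands on the stack:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct packet {
    size_t len;
    unsigned char data[];  /* flexible array member: standard C99, not a VLA */
};

/* Allocates a packet with room for len payload bytes and copies them in. */
struct packet *packet_new(const unsigned char *src, size_t len)
{
    struct packet *p;

    assert(src != NULL || len == 0);
    p = malloc(sizeof *p + len);  /* header plus payload in one block */
    if (p == NULL)
        return NULL;
    p->len = len;
    if (len > 0)
        memcpy(p->data, src, len);
    return p;
}
```

The caller releases the whole thing with a single `free(p)`. The struct-embedded VLA that Clang rejects is a different construct: there the array size inside a locally declared struct depends on a run-time value.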
All I want is a secure system where it's easy to do anything I want. Is that too much to ask ~~ Randall Munroe
They reduce average stack size. They don't reduce the worst case stack size, which is what you care about.
It's almost as if you don't know that std::vector can use a custom memory allocator, eg alloca().
No sig today...
The biggest snag is the tendency of getting bloated code from developers not aware of what C++ does behind the scenes.
Doesn't the Linux kernel have a whole army of people approving code changes? They could be aware of what C++ does behind the scenes even if the developers aren't.
(But seriously: Do you think a Linux core developer is incapable of using C++ properly or knowing what it does behind the scenes?)
They can reduce max stack size in some cases. E.g. if you split one array into two smaller arrays but you don't know how big the smaller arrays are. In any case, they never increase stack size when compared to static arrays on the stack.
What the hell is APK? You're not talking about android app packages, right?
I was referring to kernel and OS work in general, not Linux specifically.
That was clearly intentional and meant as a joke, since everyone says "PIN number" all the time instead of just "PIN".
It's a case of RAS syndrome (or is it RIS syndrome?)
It's almost as if you don't know that alloca isn't safe. Especially with vector, resizing is very dodgy with alloca. You may as well just use a std::array.