Microsoft Open-Sources 'Checked C,' A Safer C Version (softpedia.com)
An anonymous reader writes from a report via Softpedia: Microsoft has open-sourced Checked C, an extension to the C programming language that brings new features to address a series of security-related issues. As its name hints, Checked C will add checking to C, and more specifically pointer bounds checking. The company hopes to curb the high number of security bugs such as buffer overruns, out-of-bounds memory accesses, and incorrect type casts, all of which would be easier to catch in Checked C. Despite tangible benefits to security, the problem of porting code to Checked C still exists, just like it did when C# or Rust came out, both C alternatives.
This won't gain traction until it's compatible with the C99 and C11 standards.
strcpy_s is part of the C11 standard, and it was a library addition, not a language change.
What a joke. If Microsoft was serious about this they should have done it...oh I don't know, maybe 25 years ago. You know, back when people were still writing applications in C, maybe?
Morons.
Seriously, watching things compile, I see the CC complain about that all the damned time.
As far as I can tell, developers consider the conservation of their precious time much more important than properly checking their variable types, or assuring type casts are content appropriate.
Given this simple fact, a language that would DARE to enforce sanity on such a target demographic is sure to be short lived.
I mean, the GALL! Forcing poor developers to write better quality code and avoid bad practices! Shame on you Microsoft!
There's also a Generalized Checked C - GCC for short.
#DeleteChrome
I seem to recall long long ago someone else made a C compiler that did this.
Non sequitur: Your facts are uncoordinated.
My ism, it's full of beliefs.
Long long time ago (~2000?) I used GCC's bounds checking feature.
If I recall correctly, I had to compile my own GCC because it was the only way to enable it.
That's it, I've had enough. I'm going back to Turbo Pascal.
They allow for both by introducing a "bounded pointer" type in addition to the old style pointer. If you want to, you can still create a buffer overflow using a char *. Or you can choose to use a char 256*, and an access through it won't run past 256; it'll generate an error instead.
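For anyone who wants to see the idea in plain C: a bounded pointer is roughly a pointer that carries its own length. The sketch below is NOT Checked C syntax; `bounded_char`, `bc_set` and `bc_get` are hypothetical names made up for illustration, and the check is done by hand rather than by the compiler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical "bounded pointer": a base address plus an element count. */
typedef struct {
    char  *base;
    size_t count;
} bounded_char;

/* Store v at index i; returns false instead of writing out of bounds. */
static bool bc_set(bounded_char p, size_t i, char v)
{
    if (p.base == NULL || i >= p.count)
        return false;
    p.base[i] = v;
    return true;
}

/* Load index i into *out; returns false on an out-of-bounds index. */
static bool bc_get(bounded_char p, size_t i, char *out)
{
    if (p.base == NULL || out == NULL || i >= p.count)
        return false;
    *out = p.base[i];
    return true;
}
```

With a 256-byte buffer wrapped this way, index 255 works and index 256 or 312 is refused, while a raw char * to the same buffer stays as unchecked as ever.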
I'm confused - is Microsoft reinventing C++ or reinventing Java? I can't tell which decade from the last millennium they're trying to drag us back to. (Maybe as far back as when they were considered a reputable software company?)
"Microsoft has open-sourced Checked C, an extension to the C programming language that brings new features to address a series of security-related issues"
Bounds checking for C and C++ Nov 2004
The CPU just needs to set aside an area of memory exclusively for return addresses, and make that protected. No more security issues, buffer overruns, execution of arbitrary code. The real problem is that return addresses are mingled with other data. This should be solved at the hardware level, and AFAICS, it could be done totally transparently to code, even binaries.
Please fix Visual Studio C99 & C11 compliance first before extending the language.
The funny little known fact is: C99 already has a bounded pointer type: A pointer to a variable-length array.
void foo(int N, char (*ptr)[N])
{
    (*ptr)[N + 3] = 10; // undefined behaviour
}
Using the undefined-behaviour sanitizer, you can also have the compiler add automatic checks.
These are runtime checks and therefore expensive. C is intended to be a language for writing low level code that should not be encumbered with the expense of additional runtime checks. Use a different language if you need training wheels.
That declares that ptr is a pointer to a character, and reserves memory at that address for 256 characters. It does NOT prevent one from accessing ptr[312]. (Though a compiler MIGHT catch it if 312 were a literal.) By the C standard, the behavior when attempting to access ptr[312] is undefined, but it would normally access whatever happens to be at a memory address 56 characters past the end of the ptr array.
The Microsoft extension adds a type with run-time bounds checking, so trying to access ptr[312] will always give an array-out-of-bounds error.
That's the right direction. Apple already has a pretty good version of it. (See below.)
Bounds checking C like this now is weak and very, very late:
https://gcc.gnu.org/ml/gcc/199...
https://www.lrde.epita.fr/~aki...
http://blog.qt.io/blog/2013/04...
http://valgrind.org/docs/manua...
https://en.wikipedia.org/wiki/...
But the grand champion memory debugger is the Mac OS X standard malloc libraries. You can simply set environment variables and instantly get better debugging than most methods on all other platforms. I presume this is because Objective C/C++ is such a pain to debug that they just built in features to always be available, even for production apps.
http://www.cocoawithlove.com/2...
Those libraries are clever because when debugging array bounds corruption and used/free, all mallocs get their own mmapped memory block surrounded by unmapped memory. Plus writing patterns into free / allocated memory to detect writing to freed memory, etc. This is great because it triggers a system signal that debuggers can catch deterministically.
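The guard-page trick described above can be sketched on any POSIX system with mmap and mprotect. This is only an illustration of the technique, not Apple's implementation; `guarded_alloc` and `guarded_free` are hypothetical names, and the sketch ignores alignment (the returned pointer is only suitable for bytes) and size overflow.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS on strict glibc builds */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: allocate `size` bytes so that the byte just past the
 * end of the block falls on an inaccessible guard page. */
static void *guarded_alloc(size_t size)
{
    long pg = sysconf(_SC_PAGESIZE);
    if (pg <= 0 || size == 0)
        return NULL;
    size_t page = (size_t)pg;
    size_t data_pages = (size + page - 1) / page;
    size_t total = (data_pages + 1) * page;      /* +1 for the guard page */

    unsigned char *region = mmap(NULL, total, PROT_READ | PROT_WRITE,
                                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED)
        return NULL;

    unsigned char *guard = region + data_pages * page;
    if (mprotect(guard, page, PROT_NONE) != 0) {
        munmap(region, total);
        return NULL;
    }
    return guard - size;   /* block ends exactly where the guard page starts */
}

/* Release a block from guarded_alloc; the caller must pass the same size. */
static void guarded_free(void *ptr, size_t size)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t data_pages = (size + page - 1) / page;
    unsigned char *region = (unsigned char *)ptr + size - data_pages * page;
    munmap(region, (data_pages + 1) * page);
}
```

Writing one byte past the end of such a block touches the PROT_NONE page, so the process gets a signal at the exact faulting instruction, which is what makes this kind of bug deterministic under a debugger.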
I found and used those techniques on my last big project a couple years ago. The Windows desktop app and imaging C++ libraries were full of errors, memory corruption, struct and 32bit/64bit problems, etc. I had to do a lot of debugging and rewriting to port to Mac OS X, then a lot to solve corruption and threading issues. And found out, the hard way, what a mess the "standard" pthreads API / libraries were. Just spurred me on to switch to C++11 to have standard threads. This Mac OS X built-in debugging along with gdb made it a snap to find all of those kinds of errors, even for code meant for Android, Linux, and Windows.
Stephen D. Williams
What's wrong with the standard pthreads API?
"First they came for the slanderers and i said nothing."
As yet, nobody has made an OS that isn't C at the bottom. There's a reason for that. Although there are projects that claim it's now possible, not one major operating system uses them for kernel programming.
And wrappers like this have existed since the first day of C. You can always wrap your own memory and pointer management functions and structures and just expect people to use them. They come with a performance cost, and wrapping C means people can only use your wrappers. Even this, which claims compatibility, basically just introduces two new pointer "types" which can't be dereferenced in the normal way.
It's not that this has been impossible forever and people are only just going "Oh, maybe we should do something". It's been this way because there are things that you need to take account of still. And though security is certainly a high-priority, a system that runs dog-slow, isn't compatible with other APIs, has to have tricks all over it to make it work, and ultimately still has to end up with hardware pointers where the bounds are set by the programmer (as here) means that it won't get used at all.
There's a reason that even "theoretical" OS like MINIX still use C and pointers. At the OS level, hardware access needs unbounded pointers or pointers that only the programmer knows the bounds of. Basically, bang, security problem if they use them wrong.
Even ordinary applications made in pointer-managed languages have to - by definition - include more checks and code than those that don't. I'm not saying those checks aren't worthwhile, or don't stop security problems, but there is still yet to be a serious OS or even low-level drivers written in anything other than C.
And people speak as if, if we were to all just move to Rust or whatever (which also includes its own pointer types including a special insecure "unbounded" pointer - wonder why that is even in there, hmm?) that all the security problems would magically disappear. Unfortunately it's not like that.
It's about eliminating human error and there's a lot that can go wrong with pointer arithmetic and lack of checks. But that human error is present whether or not a pointer is used. Most of the time the problem is lack of bounds checking - that, in any language, can lead to serious problems like crashes, acting on incorrect data, getting into infinite loops, etc.
The problem is that the one part you NEED that kind of balance, deep in the kernel rings where you're using drivers and low-level memory access outside of the normal protections, you don't have it available as the hardware needs real pointers to be manipulated in order to operate.
So Checked C is similar to Pascal?
May I direct your attention to this?
C# isn't anywhere near a C alternative. No-one sensible is going to write an OS or low-level driver in C#.
There are many (?) compilers that place boundary spaces above and below arrays, so that access to ptr[256] or ptr[-1] will trigger a (manageable) runtime error, so it will catch runaway loops. But it won't catch things like your example. And C is for speed and this adds a check or two for every array access, so even if enabled in debug mode, it's usually removed in release mode.
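A hand-rolled version of those boundary spaces, as a rough sketch: guard bytes sit directly below and above the array and are checked afterwards. The names (`guarded_buf`, `guarded_init`, `guarded_intact`) and the 8-byte guard size are assumptions for illustration; real compilers and debug allocators do this differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define GUARD_BYTE 0xA5
#define GUARD_LEN  8

/* Hypothetical layout: guard zones directly below and above the array. */
typedef struct {
    unsigned char below[GUARD_LEN];
    char          data[256];
    unsigned char above[GUARD_LEN];
} guarded_buf;

/* Fill both guard zones with the pattern and clear the payload. */
static void guarded_init(guarded_buf *b)
{
    memset(b->below, GUARD_BYTE, sizeof b->below);
    memset(b->data, 0, sizeof b->data);
    memset(b->above, GUARD_BYTE, sizeof b->above);
}

/* Returns true while both guard zones still hold the fill pattern. */
static bool guarded_intact(const guarded_buf *b)
{
    for (size_t i = 0; i < GUARD_LEN; i++) {
        if (b->below[i] != GUARD_BYTE || b->above[i] != GUARD_BYTE)
            return false;
    }
    return true;
}
```

A runaway loop that walks off either end tramples the pattern and gets noticed at the next check. A single stray write at index 312 lands 56 bytes past the array, well beyond an 8-byte guard, so it slips through, which is the limitation mentioned above.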
Non-Linux Penguins ?
Actually, it declares that ptr is a pointer to an array. So to get at the data you would actually have to do (*ptr)[20] or ptr[0][20]. Provided you have assigned something meaningful to the pointer first. The only memory reserved by the declaration is enough to hold a pointer.
My interest in writing in C is knowing the machine code that is likely to be produced by the compiler. How does this apply to "Nim"?
Change is certain; progress is not obligatory.
Pretty sure a lot of operating systems used Assembler at the bottom in the early 2000s. Now, I think they're pretty flexible, like:
https://github.com/CosmosOS/Co...
Change is certain; progress is not obligatory.
It is absolutely astonishing that it has taken _this_long_ for someone to make these basic fixes to C.
Is the Rust language low level enough to know what the machine code will be produced from the language at a glance?
Change is certain; progress is not obligatory.
Technically, of course you're correct.
However, are we suggesting that assembler is a better memory manager? Because it's literally the bottom of the pack, with C only just slightly above it because it actually includes a malloc function.
And Cosmos pretty much only works in VMWare and doesn't even have a single hardware video driver (my exact comment - when you need to start interfacing with hardware, it becomes almost impossible to make any guarantees about the pointers you need to play with) - strange that.
And it uses syslinux as the bootloader.
And critical parts are written in assembler (back to square one!) and literally the assembler code is just tacked into the final OS in place of C# functions (like a macro replacement).
Again, ZERO MEMORY POINTER GUARANTEES in assembler.
All they've done is written the OS frontend and maybe a filesystem etc. in C# while assembler (potentially insecure and able to do and access any memory or mess up stacks as normal) is used for all the critical parts - booting up, interfacing with hardware, and god knows what else.
It's an amazing achievement but it's NOT an OS written in C#.
You apparently haven't looked at the output of C compilers recently. The output is less and less predictable from looking at the C code.
The biggest issue with C at the moment isn't actually bounds-checking (although that would be nice) -- it's the fact that it's a minefield of constructs which look perfectly sensible but are in fact "undefined", in which case the compiler is authorized to do absolutely anything it wants. For instance, the C standard explicitly states that all pointers point to valid memory, and that having a pointer that points into non-valid memory is "undefined". This means on super-high-performance loops the compiler can make simplifying assumptions to get 5% speed increases; but it also makes it very difficult to write security checks that the compiler won't just optimize out without telling you.
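A small sketch of the kind of security check that can vanish, and one way to write it that doesn't rely on undefined behaviour. `range_ok` is a hypothetical helper name; it assumes buf <= end and that both point into (or one past) the same array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Fragile: if len is too large, forming buf + len is already undefined
 * behaviour, so an optimizer may assume the wrap-around never happens and
 * drop the first test entirely:
 *
 *     if (buf + len < buf || buf + len > end) return false;
 */

/* Sturdier: compare lengths and never form an out-of-range pointer. */
static bool range_ok(const char *buf, const char *end, size_t len)
{
    return len <= (size_t)(end - buf);
}
```

The second form only subtracts two valid pointers, so there is nothing for the compiler to reason away.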
TCP: Why the Internet is full of SYN.
As yet, nobody has made an OS that isn't C at the bottom.
Riiight.
Ezekiel 23:20
actually (ptr + 20) and *(ptr + 20) can be used .. and is what is most likely translated to by the compiler. Just as printf("%c", "this is a string"[8]); will output the letter 'a'
*("this is a string" + 8) will also produce the letter 'a' as "this is a string" is used as the base address.
As yet, nobody has made an OS that isn't C at the bottom.
What nonsense.
Perhaps you'd like to google a bit or read Wikipedia? Mac OS, for example, was written in Pascal. Other OSes are written in Forth, Java, Oberon, Modula II. There are plenty of OSes you never heard about written in languages you never heard about.
Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
it becomes almost impossible to make any guarantees about the pointers you need to play with) - strange that.
That is nonsense. To access hardware in 99.9% of all cases you have a fixed pointer pointing to the "I/O port" of that hardware. The pointer never changes, you don't do any arithmetic with it etc.
And all languages that don't run in a VM support such pointers, e.g. in Pascal you write:
aLongWord : long [$000EF0002], some Pascals have an "absolute" keyword that takes an address, and basically all Pascals can use variant Records. Modula 2 and Oberon use System libraries to access raw memory.
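The C equivalent of such a fixed hardware pointer is a volatile pointer to a constant address. A minimal sketch follows; the register address in the comment is made up, and the helpers (`reg_set_bits`, `reg_test_bits`, hypothetical names) take the address as a parameter so the code can be tried against ordinary memory.

```c
#include <assert.h>
#include <stdint.h>

/* On real hardware the register address is a constant from the datasheet, e.g.
 *     #define STATUS_REG ((volatile uint32_t *)0x40001000u)   // hypothetical
 * Here the address is passed in so the sketch can run against plain memory. */

/* Set the bits in mask; volatile forces a real load and a real store. */
static void reg_set_bits(volatile uint32_t *reg, uint32_t mask)
{
    *reg |= mask;
}

/* Returns nonzero when every bit in mask is currently set. */
static int reg_test_bits(const volatile uint32_t *reg, uint32_t mask)
{
    return (*reg & mask) == mask;
}
```

No arithmetic is ever done on the pointer itself, which is the point being made: the address is fixed and only the value behind it changes.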
Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
I wonder if some kind of AI (Watson or similar) is going to deprecate a lot of these programming languages, extensions, and tools. Imagine if your AI could analyze your millions of lines of code in a few minutes, then aggressively optimize it and patch all of the bugs and vulnerabilities, while still giving the same intended output 100% of the time. I wonder what programming would look like at that point.
I imagine at that point everybody would be using a programming language like Go or Python that emphasizes readability and understandability, since all of the manual stuff you used to have to do with C or C++ or asm won't be necessary anymore.
Another indication that M$ recognizes that .NET and C# have failed.
You can simply set environment variables and instantly get better debugging than most methods on all other platforms.
... except ... it's the same as on FreeBSD ... and every other application that uses that malloc allocator instead of a crappy one. (It's not OS specific, it's just the default allocator in OSX, iOS and FreeBSD, presumably others but I don't really follow the others.)
Cue the idiots screaming that Apple stole it from FreeBSD now ... (original code was written to make the FBSD allocator not suck ass years ago)
And found out, the hard way, what a mess the "standard" pthreads API / libraries were.
Sigh, you just told us you're a shitty developer. Best yet, you don't realize that what you switched to ... still uses pthreads under the hood on every platform that matters except Windows.
pthreads requires you to know WTF you are doing and is extremely efficient, but I'm guessing you prefer something that does all the work for you and wastes a fuckton of CPU time because 'math is hard' Barbie.
Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
I wasn't actually aware of syslinux being used for the bootloader.
I thought the critical low level parts were written in X# (which is very assembler-like)?
I definitely ran Cosmos off a flash drive before on actual hardware. As for video, I wonder if your reasoning is true or just another matter like, lack of interest.
Change is certain; progress is not obligatory.
I tend to avoid C++ compilers that are branded as 'C compilers' for C and the C compilers I play with are pretty sane with a couple of switches.
Change is certain; progress is not obligatory.
All these noobs preoccupied with the "security" of the libc string functions... nobody uses the libc string functions, not because they are insecure, but because they are bad. They are also trivial to implement if you really need them. Anyone that needs to process text will build a proper parser.
Except nearly all parsers will use them in some form.
Strings are the least important aspect of any program... except helloworld.c, which is all about strings and accomplishes nothing.
If you think string parsing, manipulation is the least important aspect, then it is very obvious you do not actually do any programming any more. Inputs are more and more becoming linked with string parsing, gaining larger and larger influences over RPC APIs since XML and now JSON.
Truth is like the sun. You can shut it out for a time, but it ain't goin' away. - Elvis Presley (source: imdb.com)
For instance, the C standard explicitly states that all pointers point to valid memory, and that having a pointer that points into non-valid memory is "undefined".
Define an "invalid pointer" value then?
Among the many uses of C is OS and Boot Loader programming, and in those cases the value "0" is still a valid memory address to use. IOW, there are some areas of programming where there is no invalid value. therefore you can't define what is non-valid and what is valid, therefore it is undefined. These areas are also a primary target of operation for the C Programming Language.
For the C language itself, there probably isn't really anything that could be improved in that manner. Most of what people can validly complain about as being "undefined" is not part of the C language but of the libC library, and in that case it basically came down to history and vendors being unable to agree upon a standard result because they already had their implementations, didn't want to significantly change them, and saw some value in how they were doing it that they didn't want to give up or change. Some string and memory functions fall exactly into this position.
Now, what would be extremely helpful is the incorporation of functionality that could help with tracking memory chunks and determining whether they were allocated by a given program or library instance, on the stack or the heap of said instance, and be able to report the size of the allocation. This would enable programs to be able to make simple queries in order to avoid buffer overflows.
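Part of that wish, the "report the size of the allocation" bit, can be approximated today with a thin wrapper that stores the size in front of each block. A sketch under stated assumptions: `tracked_malloc`, `tracked_size` and `tracked_free` are hypothetical names, the query is only valid for pointers that came from `tracked_malloc`, and it says nothing about stack memory or about which library owns a block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical header stored in front of every tracked block. The union
 * keeps the payload aligned for any ordinary type (max_align_t is C11). */
typedef union {
    size_t      size;
    max_align_t align;
} track_hdr;

/* Allocate size bytes and remember the size just before the payload. */
static void *tracked_malloc(size_t size)
{
    if (size > SIZE_MAX - sizeof(track_hdr))
        return NULL;
    track_hdr *h = malloc(sizeof *h + size);
    if (h == NULL)
        return NULL;
    h->size = size;
    return h + 1;
}

/* Report the size that was requested for a tracked block. */
static size_t tracked_size(const void *p)
{
    return ((const track_hdr *)p - 1)->size;
}

/* Free a tracked block; NULL is accepted like free(). */
static void tracked_free(void *p)
{
    if (p != NULL)
        free((track_hdr *)p - 1);
}
```

A copy routine can then ask `tracked_size(dst)` before writing, at the cost of one header per allocation.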
And calls like the strcpy_s() are stupid when the solution really is strncpy(), which includes a parameter to define how large the output buffer is.
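One caveat worth showing in code: strncpy() does not write a terminating '\0' when the source is as long as or longer than the count, so the caller has to terminate by hand. A minimal sketch, with `copy_bounded` as a hypothetical helper name:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst (capacity dst_size) and always
 * terminate. Returns 0 on success, -1 if src was truncated or args are bad. */
static int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    if (dst == NULL || src == NULL || dst_size == 0)
        return -1;
    strncpy(dst, src, dst_size - 1);   /* leaves dst unterminated if src is long */
    dst[dst_size - 1] = '\0';          /* so terminate explicitly */
    return strlen(src) < dst_size ? 0 : -1;
}
```

The return value makes truncation visible to the caller instead of silently cutting the string.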
Truth is like the sun. You can shut it out for a time, but it ain't goin' away. - Elvis Presley (source: imdb.com)
*lol* First thing that popped into my mind was magic number warning for every pointer. "It says here you just added 4, and it worked?" Well yes, I just reserved that many bytes for that. "Why don't you first give that number a number in the memory, so you can call your numbers from that number?" *turns off magic number for pointers*
It produces C code, so you can know the machine code that is likely to be produced by the C compiler. If you know what machine code your C code will become, which I find quite hard, due to modern optimizations. That shifts the problem to knowing what C code your nim code will be turned into. It would take some learning and experience, and one would have to weigh this against the advantages nim offers, of which the checking described here is only one.
P.S.: sry for re-posting this while logged in, my anon reply was invisible even when selecting all posts to be displayed. Hope I don't go to /. noob hell for this.
Up pops Clippy - "It looks like you're voiding a pointer!"
#DeleteChrome
Of course there were the lisp machines: https://en.wikipedia.org/wiki/...
Umm, no. *(ptr + 20) is different from those above. Because ptr points to a whole array of 256 chars, ptr + 20 steps over 20 arrays, so that would translate to *(char (*)[256]) (((char*) ptr) + sizeof(char[256]) * 20). Whilst ptr[0][20] and (*ptr)[20] translate to *(char*) (((char*) (*ptr)) + sizeof(char) * 20).
And (ptr + 20) is even worse since it's completely missing dereferences.
Your printf example is correct though, but not relevant to the current case.
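For a declaration such as char (*ptr)[256], the two strides can be measured directly. A small sketch; `rows`, `byte_offset`, `stride_of_ptr_plus` and `stride_of_element` are hypothetical names, and n is assumed to stay within the 32 x 256 backing array.

```c
#include <assert.h>
#include <stddef.h>

/* Backing storage so that ptr + n stays inside a real object. */
static char rows[32][256];

/* Byte distance between two addresses inside the same object. */
static ptrdiff_t byte_offset(const void *from, const void *to)
{
    return (const char *)to - (const char *)from;
}

/* ptr + n steps over n whole arrays of 256 chars. */
static ptrdiff_t stride_of_ptr_plus(int n)
{
    char (*ptr)[256] = rows;
    return byte_offset(ptr, ptr + n);
}

/* (*ptr)[n] and ptr[0][n] step over n single chars. */
static ptrdiff_t stride_of_element(int n)
{
    char (*ptr)[256] = rows;
    return byte_offset(ptr, &(*ptr)[n]);
}
```

So ptr + 20 lands 5120 bytes in, while (*ptr)[20] is only 20 bytes in.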
Yeah, but I'm not going to write 100 different variations in nim to get it to produce the C code I want to get it translated to machine code.
I'm writing low level C code, which doesn't have much room for optimization honestly and I like to do my own micro optimizations (and trust me, a compiler can't optimize some things the same way a human can, take a look at path walking in the Linux kernel for an epic example).
Sadly, things like bounds checking and malloc handling aren't always the most efficient and sacrifice performance; which isn't always in the interest of the project one works on.
Change is certain; progress is not obligatory.
Well, let's be honest, assembler does not have the nice fancy IDEs you can get with C (even if you're not using their fancy compilers), the language isn't so simple that you have to write out entire loop code. It is low level enough that I can write micro-optimizations and understand more or less what the compiler is going to produce.
So yes, my preference for using C is for this particular scenario typically.
Change is certain; progress is not obligatory.
Up pops Clippy - "It looks like you're voiding a pointer!"
Hey, John, It looks like you want to write some spaghetti code dereferencing Null pointers. Would you like help?
Telemetry calls embedded via the compiler? Check!
Cue the idiots screaming that Apple stole it from FreeBSD now
Is it "stole" as much as "used under license"? Darwin is FreeBSD's userland on a Mach-derived kernel.
That and if you need to port your code to other architectures, such as x86 vs. x86-64 vs. ARM vs. MIPS, you can recompile C with only a minor loss in runtime speed due to loss of effectiveness of micro-optimizations.
No-one sensible is going to write an OS or low-level driver in C#.
Singularity was written in a variant of C#, using the .NET CLR's safety model.
"Singularity authors aren't sensible"? No true Scotsman.
And, so, how is this different from lint?
For that matter, none of all the C I used to write ever had buffer overflows, or memory out of bounds, etc. Of course, I wasn't right out of school when I wrote it, nor was I lazy... so, for example, I almost *never* used while/wend - overwhelmingly, it was for/next, with *LIMITS*. After some early C programming bugs that I fixed, I started, more and more, using strncpy.. And I *checked* the return from malloc.
mark "remember, half of the programmers out there are the bottom half"
I have no idea how you would try to program a PMMU in C.
Any program requiring a processor feature not exposed by the definition of the C language can be written in a language that's a superset of C with either intrinsics for that feature or inline assembly. Intrinsics are more involved to implement but might be safer in the sense that more things can be statically proved about their behavior.
This does not reserve memory. This declares ptr to be a pointer to an array of length 256. This is exactly what a bounded pointer is. And yes, compilers can sometimes detect out-of-bounds accesses at compile time (not only for literals) and because out-of-bounds accesses are undefined behavior, they are free to add run-time bounds checking when the pointer is de-referenced. And this is exactly what the undefined-behavior sanitizer for clang and gcc does if you use it. I know, because I fixed this for gcc because it wasn't working.
void foo(int n, char (*buf)[n])
{
    (*buf)[n] = 1;
}
int main(int c, char* argv[])
{
    char buf[10];
    foo(10, &buf);
}
$ clang-3.5 -fsanitize=undefined -O3 c.c
$ ./a.out
c.c:4:2: runtime error: index 10 out of bounds for type 'char [n]'
In the example I gave, the size is encoded in the type. This makes it simple and cheap for the compiler to add bounds checking (which can also often be optimized away if it can prove that the access is not out-of-bounds).
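Because the size is part of the type, the same bound is also available to hand-written checks via sizeof. A minimal sketch, assuming a compiler with C99 variably modified types and n > 0; `checked_store` is a hypothetical name, and this is a manual check, not what the sanitizer emits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* buf points to an array of n chars, so sizeof *buf evaluates to n at run
 * time. Returns false instead of storing outside the array. */
static bool checked_store(int n, char (*buf)[n], int i, char v)
{
    if (i < 0 || (size_t)i >= sizeof *buf)
        return false;
    (*buf)[i] = v;
    return true;
}
```

Called as checked_store(10, &buf, 10, 1) on a char buf[10], the store is refused rather than landing one past the end.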
That's very interesting, thank you. Sometime I might use -fsanitize=undefined. Since most software is network-bound or IO-bound, I suppose the cost would be minimal?
> I know, because I fixed this for gcc because it wasn't working.
It's always good to hear from someone who definitely knows.
Meh, I wrote a functional kernel-mode XML parser used in production for a job a few years back. Didn't use a single library string function, which was the point here. As the GPP said: anyone that needs to process text will build a proper parser.
memcpy is different: it's full of surprisingly complex optimizations. But memcpy_s is fucking stupid.
Socialism: a lie told by totalitarians and believed by fools.
Ahh.. I never noticed the ptr[0][20] !!!
No spaghetti in the code! Also after doing years of Java and checkstyle, dereferencing sounds weird. I mean I get it, it's the value from the memory, not the number to its start, but it just feels like "referring" to a null pointer doesn't it? Maybe it's just me. I'm referring to 50, ohh nothing there which fits the bill, I was dereferencing... maybe it's just the language, you don't delocate a gift box do you? You locate and pick it up. It's also delocated, okay I'm done with this.
--Call me when you can successfully compile the Linux kernel on this thing...
.
== WolfriderV6 == I'm willing to admit that *I just might* be wrong... Are you??
C... why do you even variable bro?
Meh, I wrote a functional kernel-mode XML parser used in production for a job a few years back. Didn't use a single library string function, which was the point here. As the GPP said: anyone that needs to process text will build a proper parser.
Well, kernel-mode would prevent you from using something from libc any way...so yeah you don't have to but it's still better to do so when you can; kernel-mode stuff typically provides its own series of functions instead of using standard library stuff so as not to get unwanted dependencies - though that all depends on what you're willing to accept as a kernel writer (some may like to just import libc and provide the necessary hooks to just run it).
memcpy is different: it's full of surprisingly complex optimizations. But memcpy_s is fucking stupid.
All the *_s() functions are stupid. Every time I have to write code for Windows software I have to map numerous functions from *_s() to *(); the *_s() just uses it (or something very much like it) behind the scenes anyway.
Truth is like the sun. You can shut it out for a time, but it ain't goin' away. - Elvis Presley (source: imdb.com)