This is exactly the sort of negativity which shows how the open source community is abusive and unable to cope when a great new idea comes along that throws away all those bad concepts in Unix, just because we're right and you're wrong.
Nonsense. Compare the reaction to systemd with the reaction to launchd (XNU) or SMF (Solaris). Most people who have had contact with either of the latter regard them as imperfect but significant improvements on what was there previously and, if they're using systems that don't ship with them wish that they did (or, ideally, something taking the good ideas from them each and combining them, leaving the bad ideas behind). No one is complaining about replacing traditional UNIX tools with something better, they're complaining about replacing stuff that mostly works with something that throws all of the last few decades of software engineering away.
Uh, isn't that true of any native library? Most C and C++ libraries don't do anything you couldn't do yourself
You've selectively quoted him. The full complaint was:
many people use jQuery (which is a large CPU-heavy library) to do things that can be done in fewer lines of straight javascript
Why use a library and 10 lines of library calls to do something that you could do in 5 lines of code? You should use libraries when the cost of reimplementing the functionality is higher than the cost of using the library.
There's a lot of ongoing work. Some of it is in formal models of programming languages: we had a paper in PLDI last year on a model for C (which has to be parameterised, because it turns out the C standards committee, C programmers, and C compiler writers have almost totally disjoint ideas about what C means). We're looking at combining this with formal models of CPUs to allow black-box testing of compilers (if we can prove that the output corresponds to the input then we don't need to verify a few million lines of compiler). There's white-box stuff, such as the Alive project from Utah and MSR looking at proving correctness of a number of the peephole optimisations in LLVM (although, unfortunately, it's so far managed to prove incorrectness of a depressing number of them).
Beyond that, there's a lot of ongoing work improving the proof tools. Coq, Isabelle and HOL4 are still in active development and AGDA has gained an increasingly large following in recent years. The more that you can automate the proofs, the more feasible this kind of thing becomes and, in particular, the easier you can make it to write modular proofs the easier it is to maintain the code. The CEO of Intel just after the Pentium FDIV bug started to pour money into verification and made quite an insightful comment in this regard: He didn't care how much it cost to verify something the first time (it had to be cheaper than doing a full recall), but he needed it to be cheap to re-verify after some changes. Centaur (formerly VIA's x86 chip division) incorporates a lot of verification into their CI system: they're a long way away from full verification, but their verification tools help catch and pinpoint a lot of things that would otherwise (probably) show up in testing in the simulator as 'something is broken in the design'.
There's also a growing push towards proof-carrying code. The F* language (MSR and INRIA) is probably the best example in this space. It's a version of F# that allows you to write proofs in the same language that you write the code and only compiles when the code matches the proofs. Unfortunately, F* still has a very long way to go before simple changes are easy. For example, they have a talk where they walk through verifying a quicksort implementation. If you want to change the quicksort to a heapsort, then you have to throw away most of the proof (and the proof is more complex than the implementation, though at least the input / output specifications are simple and so you can be sure that the proof is correct if it compiles). They're also using an SMT solver (Z3) on the back end, so there's no guarantee that it will actually terminate: you will either get a counterexample, a success message, or a timeout in the solver and a timeout doesn't necessarily mean that the proof is correct or incorrect, just that it's too complex to machine check.
The toy project is the most complex piece of software that has been formally verified. The fact that you consider it to be a toy project tells you everything that you need to know about the current state of the art in formal verification.
Formal verification proves that the program corresponds to the specification. It doesn't prove that the specification is correct. It's really hard not to run into Goedel's incompleteness theorem and discover that the specification is actually more complex (and therefore error prone) than the program.
I can think of two ways. Way one: you contract out the development to a company that has a paid version and an ad-supported version. They accidentally give you the ad-supported apk to push to your customers instead of the paid one. Way two: you have an internal project to see if you can push down the up-front cost of a phone by pushing ads in various places (as the Kindle Fire does on the lock screen) and using that to subsidise the cost. You have the same people working on it as the normal system and one of them accidentally pushes to the wrong repo.
The difference is a lot lower at the higher end. My partner has just replaced her ageing Nokia 1020 (really nice phone and the hardware is still pretty solid, but Windows Mobile 8.1 doesn't get security updates and has almost no third-party apps) with a second-hand Samsung. A similar iPhone is only 20-30% more expensive, but the killer is the lack of third-party OS support. When an iPhone stops getting security updates, it's basically dead. When a moderately popular Android device stops getting security updates, you can install LineageOS on it and it remains useable (my first-gen Moto G is older than her Nokia and a lot slower, but it's still getting fortnightly updates from LineageOS and I expect to keep using it until it physically dies).
What if the pointer points beyond the end of some chunk of malloc'd memory?
Then you will get a trap. The bounds are set at malloc() time and cannot be extended (except by the code in malloc, which holds a pointer to a larger area which it can subset).
Merely checking array bounds is insufficient for any non-trivial C program.
Checking the array bounds is sufficient. Checking the bounds of the static type, is not.
Forcing the companies to provide the means to unencrypt all the data passing through it's services provides very little benefit
It provides no benefit, because the bad people will not use the backdoor'd encryption, they will use something else (if they buy a copy of Applied Cryptography second hand then they can just type in the cost listings for some secure algorithms and use their own version). On the other hand, the existence of a backdoor intrinsically makes a system insecure, so everyone else suffers from making it easier for criminals to gain access to their messages.
I am the author of the CHERI C/C++ implementation (and a bunch of the instructions and key parts of the abstract model). A few corrections:
does pointer bounds and 'const' qualifier checking and some protection of C++ private member variables in hardware
We don't enforce const in hardware, because it broke too much code. For example, the C standard library function strstr takes a const char* argument and returns a char* derived from it. Our hardware doesn't permit you to add permissions to a pointer (or increase its bounds), and so the returned pointer also lacks the store permissions. Instead, we add __input and __output qualifiers that allow this to work: if you cast a pointer to an __input-qualified one then the compiler removes store permissions and no pointer derived from that pointer can be used to modify an object.
sufficient to replace the security properties of virtual memory (with the caveat you can't unmap a page without killing the process)
CHERI doesn't require that you throw away the MMU and some things work a lot better when you compose the two. The MMU is good for coarse-grained isolation, CHERI is good for fine-grained sharing. You can use the MMU to revoke objects (unmap the pages), without having to find and invalidate all pointers (which we can do - a couple of my students and I have added accurate garbage collection to C).
It's implemented in FPGA and only requires a few gates over a standard MIPS and is working with comparable performance without any exotic kinds of cache memory.
Note that we do rely on tagged memory, though we are able to efficiently implement this in commodity DRAM via a tag cache (some of my colleagues have done some great work on improving the efficiency here). We need one tag bit per 128 bits of memory (the tag bit tells you whether an aligned 128-bit value contains a pointer or normal data), so 256 bits per page. You can read 256 bits from a single DRAM read, so anything with vaguely good cache locality rarely needs to pull the tags from DRAM.
It's only "approaching" because I don't think it can solve use-after-free
We can implement an accurate garbage collector in C with CHERI. Some of my current work is attempting to push the performance of this up to where C programmers will be willing to just leave it on, because they won't be able to measure a slowdown vs malloc / free. For the Java interop work that I published at ASPLOS this year, we did very coarse-grained revocation, allowing the JVM to invalidate pointers that the native code held for longer than it claimed that it would, thus preventing any spatial or temporal memory safety violations in C code from affecting the Java heap in any way.
It's just way way way too easy to write buffer overflows in C
Technically: it's too easy to write buffer overflows in most implementations of C. The C specification makes accessing beyond the end of a buffer undefined behaviour. That means that an implementation is allowed to do whatever it wants (and doesn't have to do the same thing consistently). Most implementations implement this as 'trample an arbitrary memory location', but it's completely valid to transform it into a recoverable trap. My research group has proposed a small set of CPU extensions that we're talking to vendors about adopting that allows an implementation of C (which I've created and which we're able to build complex C programs with) that provides these guarantees. Any out-of-bounds array access for us is a CPU trap, which the OS turns into a signal. You can then either kill the process, unwind the stack and kill the thread, or recover in some other way.
That kind of bug also wouldn't appear in C++ with modern (automatically checkable) coding conventions. Additionally, the C++ version would easily have permitted the memory to be reclaimed by RCU, whereas doing this in Rust without overhead would require compiling everything in unsafe mode.
Image processing routines are exactly the sort where compilers are good at eliding bounds checks. The input size is fixed, the filter size is fixed, and scalar evolution lets you calculate up-front the maximum index using well-known analysis techniques. It's then easy to hoist the checks out of the loop and leave a single 'will this loop ever overflow' check, with no bounds checks during the calculation.
Except that Rust is kinda a bit like C except with a formal verifier built in so you can prove you don't have memory errors
Please stop repeating this. Rust has a type checker. Most languages have a type checker. The type system in Rust is stricter than that in C (though it is possible to implement the same thing in the library in C++), but it is not a formally verified type system and the implementation of the type checker (which is not a formal verifier) is also not verified (and can't be until the type system itself is verified). If you want a language with a formally verified type system, look at Pony. If you want a language that integrates formal verification, look at F*. If you want to use Rust, that's fine, but stop claiming that it has features that it doesn't.
I work with a group that does formal verification and you seem also to be talking from a position of ignorance. Currently, the record for low-cost formally verified software is held by the NICTA team behind seL4. Their number is around 30 times the cost of using best practices for normal software development. A few caveats for this number:
The baseline is assuming the effort involved in creating a test suite with full coverage of the specification and a detailed specification. Few projects actually have this level of QA, so add another factor of 2-4 to the cost relative to most projects.
They're employing mathematicians on fairly low (academic-level) salaries for the verification. In an industrial setting, with skills in so much demand, you'd find it difficult to pay them less than double this.
Their entire software stack is around 10,000 lines of code and they've not yet shown that their verification effort scales linearly with the complexity of the software.
The cost of refactoring is close to the cost of initial development, as the proofs are not often reusable after modifications to the code.
It was a whole 6 hours between the first public release of their formally verified microkernel and someone finding the first exploitable security vulnerability.
There's a lot of ongoing research in this area (I quite like F*, though it has some significant issues with proof reuse and usability of its error messages), but the tools for formal verification are currently as appropriate for large-scale modern software development as punch cards.
you instead use a language with some form of god damn formal verification
I think you're confusing Rust with Pony (which does have formal verification of its type system). Rust's type system has not been formally verified (let alone any of the implementation, though someone did verify its binary search implementation).
Before he became executive producer, his episodes were the best of the new set. I suspect that he had a decent editor who was willing to tell him when he was going over the top, and who didn't continue this when he was in charge. Episodes written by the executive producer have been the worst since the show came back, irrespective of who it was.
I'd argue that there aren't many good reasons to favour C over C++ these days and there's a much simpler migration path as most C code is valid (though not idiomatic) C++ code. Replacing bare arrays with std::array, collections implemented in macros with ones implemented in templates, and raw pointers with std::unique_ptr / std::shared_ptr will go a long way to catching bugs. You retain the ability to easily sidestep the type system if you need to, but a lot of the things that require you to in C can be simply implemented with C++.
I always forget how archaic the banking system in the US is. In the UK, to pay for anything under £40 with my card, I take it out of my wallet, tap it on the terminal for about 3 seconds, and then go. Chip and pin takes slightly longer, but getting the response typically takes about 2 seconds: a lot less time to pop the card in the machine and type the pin than it typically takes someone to either find exact change or for the cashier to make change for them. No ATMs in the UK, irrespective of bank, charge me to withdraw cash: UK banks stopped doing that in the '80s. In contrast, my US bank has a small network of ATMs in three states and charges me if I use anyone else's ATM, and the ATM operator also often charges me as well: that double dipping is illegal in the EU.
How much are you paying to withdraw cash? I have a US bank account, but I rarely use it because the currency conversion fees on my UK credit card are lower than the cost of withdrawing cash in the US (where apparently it's legal to double dip, so the bank charges me for using an ATM off their network and the ATM operator charges me for using their ATM).
This is exactly the sort of negativity which shows how the open source community is abusive and unable to cope when a great new idea comes along that throws away all those bad concepts in Unix, just because we're right and you're wrong.
Nonsense. Compare the reaction to systemd with the reaction to launchd (XNU) or SMF (Solaris). Most people who have had contact with either of the latter regard them as imperfect but significant improvements on what was there previously and, if they're using systems that don't ship with them wish that they did (or, ideally, something taking the good ideas from them each and combining them, leaving the bad ideas behind). No one is complaining about replacing traditional UNIX tools with something better, they're complaining about replacing stuff that mostly works with something that throws all of the last few decades of software engineering away.
Uh, isn't that true of any native library? Most C and C++ libraries don't do anything you couldn't do yourself
You've selectively quoted him. The full complaint was:
many people use jQuery (which is a large CPU-heavy library) to do things that can be done in fewer lines of straight javascript
Why use a library and 10 lines of library calls to do something that you could do in 5 lines of code? You should use libraries when the cost of reimplementing the functionality is higher than the cost of using the library.
There's a lot of ongoing work. Some of it is in formal models of programming languages: we had a paper in PLDI last year on a model for C (which has to be parameterised, because it turns out the C standards committee, C programmers, and C compiler writers have almost totally disjoint ideas about what C means). We're looking at combining this with formal models of CPUs to allow black-box testing of compilers (if we can prove that the output corresponds to the input then we don't need to verify a few million lines of compiler). There's white-box stuff, such as the Alive project from Utah and MSR looking at proving correctness of a number of the peephole optimisations in LLVM (although, unfortunately, it's so far managed to prove incorrectness of a depressing number of them).
Beyond that, there's a lot of ongoing work improving the proof tools. Coq, Isabelle and HOL4 are still in active development and AGDA has gained an increasingly large following in recent years. The more that you can automate the proofs, the more feasible this kind of thing becomes and, in particular, the easier you can make it to write modular proofs the easier it is to maintain the code. The CEO of Intel just after the Pentium FDIV bug started to pour money into verification and made quite an insightful comment in this regard: He didn't care how much it cost to verify something the first time (it had to be cheaper than doing a full recall), but he needed it to be cheap to re-verify after some changes. Centaur (formerly VIA's x86 chip division) incorporates a lot of verification into their CI system: they're a long way away from full verification, but their verification tools help catch and pinpoint a lot of things that would otherwise (probably) show up in testing in the simulator as 'something is broken in the design'.
There's also a growing push towards proof-carrying code. The F* language (MSR and INRIA) is probably the best example in this space. It's a version of F# that allows you to write proofs in the same language that you write the code and only compiles when the code matches the proofs. Unfortunately, F* still has a very long way to go before simple changes are easy. For example, they have a talk where they walk through verifying a quicksort implementation. If you want to change the quicksort to a heapsort, then you have to throw away most of the proof (and the proof is more complex than the implementation, though at least the input / output specifications are simple and so you can be sure that the proof is correct if it compiles). They're also using an SMT solver (Z3) on the back end, so there's no guarantee that it will actually terminate: you will either get a counterexample, a success message, or a timeout in the solver and a timeout doesn't necessarily mean that the proof is correct or incorrect, just that it's too complex to machine check.
The toy project is the most complex piece of software that has been formally verified. The fact that you consider it to be a toy project tells you everything that you need to know about the current state of the art in formal verification.
Yes, verbs would have been nice. Usually, they are considered "required" in an English sentence.
Nonsense.
Usually. Not always.
Bravo!
Formal verification proves that the program corresponds to the specification. It doesn't prove that the specification is correct. It's really hard not to run into Goedel's incompleteness theorem and discover that the specification is actually more complex (and therefore error prone) than the program.
I can think of two ways. Way one: you contract out the development to a company that has a paid version and an ad-supported version. They accidentally give you the ad-supported apk to push to your customers instead of the paid one. Way two: you have an internal project to see if you can push down the up-front cost of a phone by pushing ads in various places (as the Kindle Fire does on the lock screen) and using that to subsidise the cost. You have the same people working on it as the normal system and one of them accidentally pushes to the wrong repo.
Yes, verbs would have been nice. Usually, they are considered "required" in an English sentence.
Nonsense.
The difference is a lot lower at the higher end. My partner has just replaced her ageing Nokia 1020 (really nice phone and the hardware is still pretty solid, but Windows Mobile 8.1 doesn't get security updates and has almost no third-party apps) with a second-hand Samsung. A similar iPhone is only 20-30% more expensive, but the killer is the lack of third-party OS support. When an iPhone stops getting security updates, it's basically dead. When a moderately popular Android device stops getting security updates, you can install LineageOS on it and it remains useable (my first-gen Moto G is older than her Nokia and a lot slower, but it's still getting fortnightly updates from LineageOS and I expect to keep using it until it physically dies).
What if the pointer points beyond the end of some chunk of malloc'd memory?
Then you will get a trap. The bounds are set at malloc() time and cannot be extended (except by the code in malloc, which holds a pointer to a larger area which it can subset).
Merely checking array bounds is insufficient for any non-trivial C program.
Checking the array bounds is sufficient. Checking the bounds of the static type, is not.
To summarise the summary of the summary: people are a problem.
Forcing the companies to provide the means to unencrypt all the data passing through it's services provides very little benefit
It provides no benefit, because the bad people will not use the backdoor'd encryption, they will use something else (if they buy a copy of Applied Cryptography second hand then they can just type in the cost listings for some secure algorithms and use their own version). On the other hand, the existence of a backdoor intrinsically makes a system insecure, so everyone else suffers from making it easier for criminals to gain access to their messages.
does pointer bounds and 'const' qualifier checking and some protection of C++ private member variables in hardware
We don't enforce const in hardware, because it broke too much code. For example, the C standard library function strstr takes a const char* argument and returns a char* derived from it. Our hardware doesn't permit you to add permissions to a pointer (or increase its bounds), and so the returned pointer also lacks the store permissions. Instead, we add __input and __output qualifiers that allow this to work: if you cast a pointer to an __input-qualified one then the compiler removes store permissions and no pointer derived from that pointer can be used to modify an object.
sufficient to replace the security properties of virtual memory (with the caveat you can't unmap a page without killing the process)
CHERI doesn't require that you throw away the MMU and some things work a lot better when you compose the two. The MMU is good for coarse-grained isolation, CHERI is good for fine-grained sharing. You can use the MMU to revoke objects (unmap the pages), without having to find and invalidate all pointers (which we can do - a couple of my students and I have added accurate garbage collection to C).
It's implemented in FPGA and only requires a few gates over a standard MIPS and is working with comparable performance without any exotic kinds of cache memory.
Note that we do rely on tagged memory, though we are able to efficiently implement this in commodity DRAM via a tag cache (some of my colleagues have done some great work on improving the efficiency here). We need one tag bit per 128 bits of memory (the tag bit tells you whether an aligned 128-bit value contains a pointer or normal data), so 256 bits per page. You can read 256 bits from a single DRAM read, so anything with vaguely good cache locality rarely needs to pull the tags from DRAM.
It's only "approaching" because I don't think it can solve use-after-free
We can implement an accurate garbage collector in C with CHERI. Some of my current work is attempting to push the performance of this up to where C programmers will be willing to just leave it on, because they won't be able to measure a slowdown vs malloc / free. For the Java interop work that I published at ASPLOS this year, we did very coarse-grained revocation, allowing the JVM to invalidate pointers that the native code held for longer than it claimed that it would, thus preventing any spatial or temporal memory safety violations in C code from affecting the Java heap in any way.
It's just way way way too easy to write buffer overflows in C
Technically: it's too easy to write buffer overflows in most implementations of C. The C specification makes accessing beyond the end of a buffer undefined behaviour. That means that an implementation is allowed to do whatever it wants (and doesn't have to do the same thing consistently). Most implementations implement this as 'trample an arbitrary memory location', but it's completely valid to transform it into a recoverable trap. My research group has proposed a small set of CPU extensions that we're talking to vendors about adopting that allows an implementation of C (which I've created and which we're able to build complex C programs with) that provides these guarantees. Any out-of-bounds array access for us is a CPU trap, which the OS turns into a signal. You can then either kill the process, unwind the stack and kill the thread, or recover in some other way.
That kind of bug also wouldn't appear in C++ with modern (automatically checkable) coding conventions. Additionally, the C++ version would easily have permitted the memory to be reclaimed by RCU, whereas doing this in Rust without overhead would require compiling everything in unsafe mode.
Image processing routines are exactly the sort where compilers are good at eliding bounds checks. The input size is fixed, the filter size is fixed, and scalar evolution lets you calculate up-front the maximum index using well-known analysis techniques. It's then easy to hoist the checks out of the loop and leave a single 'will this loop ever overflow' check, with no bounds checks during the calculation.
Except that Rust is kinda a bit like C except with a formal verifier built in so you can prove you don't have memory errors
Please stop repeating this. Rust has a type checker. Most languages have a type checker. The type system in Rust is stricter than that in C (though it is possible to implement the same thing in the library in C++), but it is not a formally verified type system and the implementation of the type checker (which is not a formal verifier) is also not verified (and can't be until the type system itself is verified). If you want a language with a formally verified type system, look at Pony. If you want a language that integrates formal verification, look at F*. If you want to use Rust, that's fine, but stop claiming that it has features that it doesn't.
The idea with Rust is that the Rust compiler does a significant amount of formal verification so that human beings don't have to.
Please don't confuse type checking, using a non-verified type system, with formal verification.
I work with a group that does formal verification and you seem also to be talking from a position of ignorance. Currently, the record for low-cost formally verified software is held by the NICTA team behind seL4. Their number is around 30 times the cost of using best practices for normal software development. A few caveats for this number:
There's a lot of ongoing research in this area (I quite like F*, though it has some significant issues with proof reuse and usability of its error messages), but the tools for formal verification are currently as appropriate for large-scale modern software development as punch cards.
you instead use a language with some form of god damn formal verification
I think you're confusing Rust with Pony (which does have formal verification of its type system). Rust's type system has not been formally verified (let alone any of the implementation, though someone did verify its binary search implementation).
I'm sad that Sue Perkins didn't get the job - she was rumoured to be in the running and would have been perfect for the role.
Before he became executive producer, his episodes were the best of the new set. I suspect that he had a decent editor who was willing to tell him when he was going over the top, and who didn't continue this when he was in charge. Episodes written by the executive producer have been the worst since the show came back, irrespective of who it was.
I'd argue that there aren't many good reasons to favour C over C++ these days and there's a much simpler migration path as most C code is valid (though not idiomatic) C++ code. Replacing bare arrays with std::array, collections implemented in macros with ones implemented in templates, and raw pointers with std::unique_ptr / std::shared_ptr will go a long way to catching bugs. You retain the ability to easily sidestep the type system if you need to, but a lot of the things that require you to in C can be simply implemented with C++.
I always forget how archaic the banking system in the US is. In the UK, to pay for anything under £40 with my card, I take it out of my wallet, tap it on the terminal for about 3 seconds, and then go. Chip and pin takes slightly longer, but getting the response typically takes about 2 seconds: a lot less time to pop the card in the machine and type the pin than it typically takes someone to either find exact change or for the cashier to make change for them. No ATMs in the UK, irrespective of bank, charge me to withdraw cash: UK banks stopped doing that in the '80s. In contrast, my US bank has a small network of ATMs in three states and charges me if I use anyone else's ATM, and the ATM operator also often charges me as well: that double dipping is illegal in the EU.
How much are you paying to withdraw cash? I have a US bank account, but I rarely use it because the currency conversion fees on my UK credit card are lower than the cost of withdrawing cash in the US (where apparently it's legal to double dip, so the bank charges me for using an ATM off their network and the ATM operator charges me for using their ATM).