GCC Compiler Finally Supplanted by PCC?
Sunnz writes "The leaner, lighter, faster, and most importantly, BSD Licensed, Compiler PCC has been imported into OpenBSD's CVS and NetBSD's pkgsrc. The compiler is based on the original Portable C Compiler by S. C. Johnson, written in the late 70's. Even though much of the compiler has been rewritten, some of the basics still remain. It is currently not bug-free, but it compiles on x86 platform, and work is being done on it to take on GCC's job."
I notice that TFS doesn't say that anyone is actually able to compile anything (other than PCC) with it. The BSD folks would love to have a BSD-licensed drop-in replacement for GCC; but it doesn't sound like this is it. Not yet at least.
./build.sh or whatever).
Wake me up when you're able to use PCC instead of GCC to do a 'make world' (or
pcc will take YEARS to get the functionality and optimizations that gcc has. Even if it compiles slowly and sometimes generates dumb code.
Either way, they'd much, much better off if they imported LLVM and redirected their compiler brain power to clang.
PCC is interesting, but it's based on technology from the 70's, doesn't support a lot of interesting architectures, and has no optimizer to speak of.
If you're interested in advanced compiler technology, check out LLVM, which is an ground up redesign of an optimizer and retargettable code generator. LLVM supports interprocedural cross-file optimizations, can be used for jit compilation (or not, at your choice) and has many other capabilities. The LLVM optimizer/code generator can already beat the performance of GCC compiled code in many cases, sometimes substantially.
For front-ends, LLVM supports two major ones for C family of languages: 1) llvm-gcc, which uses the GCC front-end to compile C/C++/ObjC code. This gives LLVM full compatibility with a broad range of crazy GNU extensions as well as full support for C++ and ObjC. 2) clang, which is a ground-up rewrite of a C/ObjC frontend (C++ will come later) that provides many advantages over GCC, including dramatically faster compilation and better warning/error information.
While LLVM is technologically ahead of both PCC and GCC, the biggest thing it has going is both size of community and the commercial contributors that are sponsoring work on the project.
-Chris
Let me get this straight. A compiler that has been production-quality for over 15 years, compiles everything on every architecture, and has been continuously improved every minute of its existence needs to be replaced by ... Son of pcc? Because of a license?
Sure, I prefer BSD-style licenses, and so do some other people, but what drives gcc development is the GNU license. I think I'll stick to the compiler that's debugged. Oh, that's right, I forgot, it comes with a debugger too. If you like that sort of thing.
It has less to do with the license and more to do with GCC's increasingly spotty support for some of the hardware platforms that NetBSD and OpenBSD run on. That and GCC internals are a maintenance nightmare, and its development process is getting even less commmunity-driven than it was before (which was never that much). Asking for a new compiler warning might take anywhere from a day to years just to get a response. The license is definitely gravy though.
The BSD license that PCC is under, I understand, is actually a problem even to the BSD folks: PCC is actually extremely old (it was originally written for the PDP11!) and apparently it still carries the advertising clause.
I believe it was de Raadt that once mentioned he'd prefer a non-optimizing compiler that produced simple, bullet-proof, bug-free code, i.e., in terms of the OS and its base tools, he prefers correct to fast.
Well that explains a lot. And here I was thinking that all modern compilers were designed correctly with a front-end and back-end. So much for academics.
Actually, the post you're replying to is total bollocks. GCC has had a clear divide between front and back end (not to mention a source-language independent middle layer for performing optimizations) since I first looked at it in about 1996. Each layer is hideously complex, but they are all there.
Auctually if I could write in C as well as him, I would do so more often. The problem is not him writing in C, its other people writing in C that are not as good as him. Do to the scope of his work, him writing in C does not lead to more bad C being written. So I'm auctually thankful he is coding in C.
That being said, he should encourage lesser programmers (including myself) to specifically not code in C.
--- Justin Dearing http://www.justaprogrammer.net/ We're just programmers.
If you read Linus' early posts about the Linux kernel you will notice that he had originally licensed it under its own license. He didn't switch to the GPL until later. It is hard to say whether or not Linux would have taken off without the GPL. It probably would have been fine using a BSD userland even if a userland wasn't quite available yet. In fact, if the kernel wasn't now so dependant on the GNU userland (binutils, glibc, etc.) it would be pretty easy to usethe BSD userland. (I can't recall if there was a BSD lisenced userland in existance in 1991)
--
(not the GPP, incidentally)
Even then, the problem is too many wetware cycles wasted on programming C "well", no matter who the programmer is -- If TdR didn't have to be so good at writing good C code (avoiding common pitfalls), he could reoptimize toward greater _creative_ productivity instead of ensured correctness. Isn't this why we use computers in the first place?*
(* to do things which require correctness and repetitive tasks, i.e. which can be programmed to be done automatically rather than manually)
Actually... the BSD license is GPLv2 or GPLv3-compatible, because it doesn't impose any restrictions beyond those included in GPLv2 or GPLv3. So BSD code can be incorporated into a GPL program (Theo de Radt's recent rants notwithstanding).
My bicyles
That's a somewhat more difficult goal, and not one that I'm really interested in attempting here. We'd have to get into references and all kinds of other stuff that really isn't worth it for a Slashdot flamewar.
The only credit that they didn't get is the specific credit they claim they deserved: The use of their name for the operating system they wrote. People argue three positions on this:
My question to you is this: Are you arguing position #2 or position #3?
Position #2 is 100% valid (but RMS gets to keep complaining that you're being a jerk). Position #3 is worthy of argument, but position #2 is not an argument in support of it.
-- The act of censorship is always worse than whatever is being censored. Always.
Far too true, however for working with flexible or hazy requirements or looking to make code that is fast, C is very hard to beat. Also, just because this is true today, doesn't make it something that will be true forever.
F... "hey, you know what? You're ALL right". I'm being facetious, and the C standard has done a great job in promoting C, but the C standard has really not evolved very far in terms of guaranteeing semantics.Once again, totally on the money. The standards group was too concerned to fix things like bitfields to make them useful, or standardize the method of determining the size of an "int". I think the standard evolved more to making the compiler writers happy than to make any real effort at fixing vague semantics that are quite prevalent and cause any number of problems.
But, if you're trying to verify code that's already been written, either by hand or via some automated tool like a static analyzer, it is painful.This is so true that it is painful to hear! I worked a couple of years with a tool to analyze code for customers. The variability in the compilers, environments, implementation details, user hacks, or compiler switches that affect code is dizzying. Has anyone else enjoyed the declaration "char myvar[0]" construct?
However, before you condemn all of the analysis tools, check out tools that perform "semantic analysis" (that what it was called when I was there) of C code. You may be pleasantly surprised. But it is still a challenge to wholly analyze any project. C++ is hideously complex, as is Ada (even the more recent revision, Ada95 I think).
Once again, no panacea of code correctness tools, but with the body of work that has been placed in C; it would be foolish to just "walk away". I've worked with a number of languages; both object oriented and procedural. Many arcane assembly languages as well. In my work, I've come to the simple conclusion that *ALL* languages have issues in many areas. Something like the old adage "all dogs have fleas"...
cheers!
No, the RMS position is not that, since Linux is not the OS developed by the GNU project. It is an OS that's common feature is the Linux kernel; as usually distributed, it includes various tools from the GNU project. The RMS position is, roughly, "The GNU System is an OS developed by the GNU Project, and therefore every project that incorporates any components from that system is also morally obligated to include 'GNU' in the name of there product, even though they have express written permission to distribute the product without doing so."
But even that's not exactly right, because this argument is only applied to Linux, not to other products that incorporate GNU tools.
The GNU project's own OS doesn't have a general release yet, though its been coming Real Soon Now for my entire adult life.
Actually, that is, properly speaking, "the GPL position", since the GPL does not have any provision requiring adhering to upstream naming conventions or request. If the GNU project had wanted to make that a condition of distribution, they ought to have incorporated it into the license under which they authorized people to redistribute their code and create derivative works.
Complaining about people doing something you've given them express, written legal permission to do is somewhat childish, especially when you've (since you began complaining) revised the legal terms offered more than once without addressing the thing you keep complaining about.
The GNU project may or may not have written an operating system, but Linux isn't it. An operating system is more than a kernel, but a kernel is an indispensable portion of the operating system. Using some OS components GNU developed and some other OS components doesn't make the result the same OS GNU developed. Its something else. Distributors are free to call it things that highlight the kernel by including its name ("Red Hat Enterprise Linux"), things that evoke the name of the kernel without actually using it exactly ("Lindows"), or things that don't mention the kernel at all ("Knoppix"). They could do the same thing with the GNU toolchain. The fact that none of the distributors wan't to label the OS in a way that highlights the GNU toolchain may be disappointing to the members of the GNU project, but since they've expressly (and repeatedly, given the revisions to the GPL) given distributors permission to do exactly what they are doing, they don't really have any leg to stand on in complaining about it.
My point was more that if you might just be better off in the long run by starting from scratch, instead of taking on the maintenance nightmare that is gcc.
Unfortunately, Microsoft made a decision not to use the boundary protection in their new operating system called "Windows". They ignored most of the work that Intel did providing support in the silicon for a decent micro operating system. The boundary protection could have been built into the programming language and runtime. Things would have been much better in the long term.
16 bit Windows did use it, but just not for protection.
Originally 8086s had a simple segmentation mode without protection. Each address was built up of seg<<16|offset. Since both segment and offset were 16 bit, this limited the address space to 1MB. Famously, the designers of the IBM PC reservered the upper 384K for IO, and this is where the 640K limit came from.
Later on the 80286, protected mode was supported, where the value you loaded into a segment register was a selector, and index into a table of segments. The CPU supported different privilege levels called Rings with only the highly privileged ones allowed to create entries in this table.
16 bit Windows did used 286 protected mode - it had to to get access to memory above 1MB. When it loaded it would install a DOS extender which would switch the PC into protected mode and allow 16 bit protected mode tasks to run on top of DOS. Since the 286 didn't allow you to switch back, each call into DOS used a triple fault to reset the processor and some Bios code to jump back into Windows.
It was even possible to allocate buffers bigger than 64K. In that case, windows would set up an array of selectors for each 64K chunk. If C code wanted to access an arbitrary location in the buffer, the compiler would work out which 64K chunk it was in, load a segment register with the corrrect selector (this was a very slow operation since microcode in the 286 had to check permissions), calculate the offset and load it into one of the normal registers and then do the segment read. This was an incredibly slow process.
It's also worth pointing out that 16 bit Windows used protected mode to get access to more memory, the like Dos the OS didn't protect itself from being damaged by third party applications. And it didn't stop third party applications damaging each other. As Walter Oney put it the philosphy was that it's a personal computer after all - If you're a programmer you can do what you like to it, just like you're free to run your car without oil until it seizes up.
Once the 386 came out and allowed offsets into segments to be bigger than 64K Windows would even set the limit on the first selector to allow you to access the whole buffer with operand size overides. The 386 also supported Virtual 8086 mode, so protected mode could jump into DOS and segments would work like a 8086. But loading segment registers was still a very slow operation, and Windows NT and Linux which are both designed to stop applications corrupting each other or the system both used page tables to do it instead. But page tables don't protect against buffer overruns.
Mind you segmentation only protects against buffer overruns if you malloc the buffers. Automatic variables on the stack are not protected. And allocing a stack variable is just a subtract instruction - it is orders of magnitude faster than calling into the OS, switching to Ring 0, allocating the memory, filling in the segment table and the returning to the caller who would load the selector into a segment register. Worse, there are very few segment registers, an each time you access any buffer you need to reload one. On a 286 there is CS,DS,ES.CS and DS are needed for near code and data, so only ES is free for far pointers. The 386 has FS and GS too, but three registers with very slow loads is not a recipe for a speedy machine.
So Microsoft tried it and it was slow. If they'd have used it enough to avoid buffer overruns - i.e. malloc every buffer rather than allocating some on the stack it would have been really slow. And so all modern OSs rely on the page table for protection instead of segments so they can run on multiple processors. In x64 mode, segment limits aren't even checked by hardware anymore.
echo -e 'global _start\n _start:\n mov eax, 2\n int 80h\n jmp _start' > a.asm; nasm a.asm -f elf; ld a.o -o a;