Any "Pretty" Code Out There?
andhow writes "Practically any time I hear a large software system discussed I hear "X is a #%@!in mess," or "Y is unmanageable and really should be rewritten." Some of this I know is just fresh programmers seeing their first big hunk o' code and having the natural reaction. In other cases I've heard it from main developers, so I'll take their word for it. Over time, it paints a bleak picture, and I'd really like to know of a counterexample. Getting to know a piece of software well enough to ascertain its quality takes a long time, so I submit to the experience of the readership: what projects have you worked on which you felt had admirable code, both high-level architecture and in-the-trenches implementation? In particular I am interested in large user applications using modern C++ libraries and techniques like exception handling and RAII."
It is my experience that reading and understanding code is dramatically more difficult than writing code. It gets even more difficult if it isn't your own code. Commenting, design, layout, good structure, and documentation all reduce this difficulty but never remove it. I've seen plenty of good programmers declare code "ugly" because it had a few warts, when in reality they just couldn't understand it.
I find code can be exceptionally well presented but only if you look on a file by file basis.
Most projects have nice, clean, stable blocks where you can tell just by looking that they're right.
Other parts resemble a jungle and have no logical flow and are horrid.
Whenever I am building an algorithm, it goes through numerous rebuilds. After I initially get it working, each rebuild has more and more order, until it looks like it will win a race.
If the boss comes in and sees working code, though, they don't understand this prettiness and will expect it to be shipped.
liqbase
IMHO, postfix takes the cake for the most elegant and readable code I've ever looked at. At one point I found a screenshot of qmail vs. postfix code in similar areas for handling some condition. The qmail code was hardcoded, had nasty loops and was just plain unbearable. The postfix version, however, was exceedingly elegant and I knew right away what the code was doing.
I only wish firefox was 10% as elegant and cruft free as postfix.
The source for Tcl is widely considered by those who have worked with it to be unusually clean and clear.
Amarok looks quite horrible by comparison with what its UI is built on. Though they have their gnarly parts, on the whole I am always impressed with the KDE libraries.
I am trolling
The Independent JPEG Group's libjpeg is pretty well written in terms of style and design.
Can't say from personal experience, but I hear that the TeX source is a truly enlightening experience. Knuth is all for literate programming, you see.
I think you meant the cruftiness of source code is directly proportional to the number of people working on it DIVIDED BY the amount of time spent working on it.
This explains why commercial source code produced by large teams of programmers under tight arbitrary deadlines tends to be sloppy. Source code produced by passionate hobbyists under the "we'll release it when it's done" deadline perspective tends to be cleaner.
My first large project I ever attempted (HERMES, now abandoned, http://hermesweb.sourceforge.net/) had, I believe, reasonably pretty code. Architecturally, there were some pretty parts too. But overall, the architecture was a mess simply because I didn't know better. I eventually abandoned it because I realized it was going to be impossible to fix the initial design mistakes without entirely replacing a large percentage of the code.
My current large project is LedgerSMB. This deals with an entirely different magnitude of mess. Essentially we forked from a codebase which we have come to understand is nearly unmaintainable and yet we *have* to replace all the code because we have lots of users on the software who rely on it. Hence we are refactoring with an axe.
The older codebase (SQL-Ledger/LedgerSMB 1.0/LedgerSMB 1.2) has a number of architectural limitations and issues, as well as a lot of evidence of an overall lack of architecture. If that weren't enough, the code is pretty problematic too. It could be worse (at least the codebase is reasonably readable if you put enough effort into it).
I think it hits about 75% of the software programming antipatterns mentioned on Wikipedia, and extends some of them in weird ways. For example, instead of just magic strings, we have magic comments (comments which are actually part of the program code and whose deletion causes problems). And we have function calls which pass by "reference-to-dereferenced-reference." In Perl terms, \%$ref.
Hence we are moving everything to a new and *cleaner* architecture.
LedgerSMB: Open source Accounting/ERP
Much of the code in the kernel could win IOCCC hands down.
Which parts?
At least you admit to being uninformed.
I haven't looked either, but I happen to know that BOOST::Python often does NOT work. It has thread-related problems.
As for the rest of BOOST, I've looked at a good chunk. BOOST makes decent programmers cry. The other follow-up post by the Anonymous Coward Xbox developer has it all correct.
I'll add:
BOOST is full of butt-ugly hacks. Check out the, uh, template things, named _1 through _9, being used as stand-in dummy arguments. Eeeeeew!!!
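For anyone who hasn't run into the stand-in arguments being complained about, here is a minimal sketch of the idea. It uses the standard library's `std::bind` and `std::placeholders` (the C++11 descendants of the Boost facility) rather than Boost itself, and the function names `subtract`, `make_reversed_subtract`, `make_subtract_from_100` and `demo` are made up for illustration.

```cpp
#include <cassert>
#include <functional>

// Plain function whose arguments we want to reorder or partially fix.
int subtract(int a, int b) { return a - b; }

// Returns a callable computing (second - first): _1 and _2 stand in for the
// first and second arguments supplied when the callable is invoked.
std::function<int(int, int)> make_reversed_subtract() {
    using namespace std::placeholders;
    return std::bind(subtract, _2, _1);
}

// Returns a callable computing (100 - x): the first argument is fixed and
// only _1 is left open.
std::function<int(int)> make_subtract_from_100() {
    using namespace std::placeholders;
    return std::bind(subtract, 100, _1);
}

// Usage: the placeholders are filled in at call time.
void demo() {
    assert(make_reversed_subtract()(3, 10) == 7);  // subtract(10, 3)
    assert(make_subtract_from_100()(1) == 99);     // subtract(100, 1)
}
```

Whether this reads as elegant or as a hack is exactly the matter of taste under dispute; a lambda would express the same thing without the numbered names.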
BOOST looks easy to dumb-ass programmers, but these programmers leave bugs that are difficult for expert programmers to find.
BOOST makes compilers run very very slow, and often breaks the optimizer anyway.
I wrote a Perl filter that took C code as input, and applied all kinds of "unprettifications" to it (removing comments, collapsing variable declarations, introducing random curly-brace and indentation styles, removing whitespace or adding strange whitespace). The output looked like it had been written at 3am by a hung-over ex-FORTRAN engineer who had just discovered FORTH.
Then I demonstrated that a bunch of code checked into our system looked like it had *already* been run through this tool. After the public shaming, a couple of the offenders cleaned up their acts for a while, but they're back to their old tricks.
These days I'm working on a project where all the devs are really, really serious about the formatting and naming conventions. Some of the rules suck, in my opinion, but there's a lot to be said for consistency.
[In the 80s, HyperCard team at Apple used to regularly run their sources through a Pascal formatter. The code, in a friend's words, "looked ironed." Unfortunately I haven't run across any good C++ formatters.]
Any sufficiently advanced technology is insufficiently documented.
Yes, I understand that. I just think it is somewhat ironic that the implementation of TeX is much prettier than the language itself.
cruftness = people * time is a reasonable approximation; I can confirm the same kind of stories. Even if the operating system and the project's software were trustworthy, the consultants would probably %$^& the customer's database too. Hurray for corporate politics.
The Bourne Shell must get some kind of mention here. What do you do if you prefer ALGOL to C? Why, #define your own syntax, and thus turn boring old C code into a thing of beauty.
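A rough sketch of the trick, for those who have never seen it. The macro set below is written for illustration in the spirit of that style and is not copied from the Bourne shell sources; `sum_to`, `clamp_non_negative` and `demo` are hypothetical names.

```cpp
#include <cassert>

// ALGOL-flavoured keywords layered over C syntax by the preprocessor.
#define IF    if (
#define THEN  ) {
#define ELSE  } else {
#define FI    }
#define WHILE while (
#define DO    ) {
#define OD    }

// Sums the integers 1..n, written in the macro dialect.
// After preprocessing the loop is: while ( i <= n ) { ... }
int sum_to(int n) {
    int total = 0;
    int i = 1;
    WHILE i <= n DO
        total += i;
        i += 1;
    OD
    return total;
}

// Returns x, or 0 if x is negative.
// After preprocessing: if ( x < 0 ) { return 0; } else { return x; }
int clamp_non_negative(int x) {
    IF x < 0 THEN
        return 0;
    ELSE
        return x;
    FI
}

void demo() {
    assert(sum_to(4) == 10);
    assert(clamp_non_negative(-5) == 0);
    assert(clamp_non_negative(3) == 3);
}
```

The compiler sees ordinary C; the cost is that every later reader (and every editor, formatter and debugger) has to learn the private dialect first.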
Repton.
They say that only an experienced wizard can do the tengu shuffle.
Wow! That's impressive- I feel guilty now for carving so many minutes out of someone's life. Although if there were javadocs, I'd imagine that most of these disparaging comments would be within the HelloWorld class itself. The library code javadocs should always have lofty descriptions of themselves as if they're going to do brain surgery. Especially if they have empty implementations.
If I wrote this code in 2007 I would have used "setPayload()" instead of "configure()" so that MessageBody would follow standard JavaBean conventions. That would let me easily wire one up in a Spring XML file. Maybe I could even insert AOP pointcuts somewhere. After all Hello World is the sort of application that practically screams for aspect oriented programming.
Many pieces of old code aren't pretty for a fairly defined set of reasons:
1. a) Debugging: Ensure you actually have an appropriate way of debugging the code. The systems I work with are embedded and run 7x24. People will say: it failed last week on Wednesday at 3:00 A.M., we got it working, but can you fix the problem? The problem may not actually be your code; it could be another piece of equipment. In any case, you need to figure this out from the logs. In my experience, many "pretty" programs are too small to justify extensive logging. After logging is included, the programs become less "pretty" but much more maintainable.
1. b) Refactoring after Debug: Sometimes the results of the debug will show a major design error in the program. You now need to implement a major architectural change that really was not originally intended. You have good modular code when it can withstand these major design changes in a relatively smooth manner.
2. Failure to handle common areas of problems well. These include:
2. a) Strings: Does your program have the ability to smoothly handle unicode/UTF/HTML/locale specific strings? Every different language you port your application to, and every different program you talk with, will have a differing definition of what a string is. My favorite test case is CNC (Computer Numerically Controlled) machinery. Some CNC machines expect embedded nulls inside the strings. The embedded null requirement affects a surprisingly large number of string libraries.
2. b) MessageBox(): Invariably in a big program it will be unacceptable to allow it to hang on a modal dialog box like MessageBox(). How are you going to handle it? What if a library call executes a modal dialog box?
2. c) Handling Exceptions: For a simple prototype program, handling exceptions is not a big deal. In a production application, all the exceptions must be handled appropriately and the program must be able to continue when exceptions occur. The error handling code often exceeds the size of the original program.
2. d) Third Party Libraries / Operating Systems (Windows): The amount of code devoted to covering up mistakes in other code is amazing. Unfortunately, unless coding on an open platform, one must accept the costs of the additional code. When starting a new project, I recommend thoroughly stress testing any new libraries that will be used. Thus one can find the killer bugs that significantly affect design decisions.
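To make the embedded-null point from 2. a) concrete, here is a small sketch in C++. The payload bytes are invented for illustration and do not represent any real CNC protocol; `make_payload`, `c_style_length` and `demo` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Builds a 6-byte payload with a NUL in the middle. Passing the length
// explicitly is what keeps the bytes after the NUL.
std::string make_payload() {
    const char raw[] = {'G', '0', '1', '\0', 'X', '5'};
    return std::string(raw, sizeof(raw));
}

// Length as any NUL-terminated C routine sees it: counting stops at the
// first '\0', so everything after it is silently ignored.
std::size_t c_style_length(const std::string& s) {
    return std::strlen(s.c_str());
}

void demo() {
    const std::string payload = make_payload();
    assert(payload.size() == 6);          // length-carrying string keeps all bytes
    assert(c_style_length(payload) == 3); // C view is truncated at the NUL
}
```

Any layer in the program that falls back to the C view (logging, a third-party API taking `const char*`, a copy through `strcpy`) will quietly drop the tail, which is why this makes a good stress test for a string library.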
I would appreciate any feedback/additions to the items on this list.
...On another note, I'm willing to bet that the person asking
I probably spend an equal chunk of time looking at code as I do writing it (then again, being an intern consultant/admin I'm always looking for a reason to write code and can never justify scratching an itch someone else, who is smarter than I, has already scratched sufficiently), I think I once spent a good chunk of time, that I should have been studying for my data structures final, reading the 2.6 kernel - and I probably take a peek at samba every 2 weeks or so... and I'm a software development major.
This wasn't made to sound like an attack, although it probably does; I'm really curious what prompted you to say what you did, and if you know something I don't (which is currently going off at 50/50 odds - the things I don't know could fill volumes, and the things I do, a small pamphlet, with large text).
If I mod you up, it doesn't necessarily mean I agree with what you've said, sorry.
that he thought Bill Atkinson's MacPaint was the most beautiful program ever written. Hearing this, Andy Hertzfeld made it a priority to recover the source code from an old Macintosh diskette. He contacted me because he was a bit worried about Apple's reaction if he just released it on the net (since it was Apple property), and I advised him to get the Computer History Museum involved if he didn't want to take the risk. I believe that he donated the code, but I'm not sure what the Museum did to have it made available.
Tim O'Reilly @ O'Reilly Media, Inc. 1005 Gravenstein Highway North, Sebastopol, CA 95472 http://www.oreilly.com
So this post is perfectly timed. It's a collection of essays by leading software engineers about code they find especially beautiful.
Andy Oram, the editor, thought it would be poor form to make a post himself, but heck, I thought: this is very relevant. The table of contents for the book can be found at http://www.oreilly.com/catalog/9780596510046/toc.
It includes essays by Brian Kernighan, Jon Bentley, Tim Bray, Yukihiro Matsumoto, Simon Peyton-Jones, and many others. The code is intended not only to be beautiful but also instructive and in many cases re-usable.
We're hoping to build an ongoing site around the book so additional examples would be very welcome.
If you are about to mod me down, keep in mind that this post was most likely sarcastic.
As my page (to which you link) notes, these bugs are likely exploitable only in theory.
And I've been hired (and paid well) to modify qmail code, including patching it to fix bugs as well as extending it, for years now, but nobody has even inquired as to what it'd take to fix the "Guninski" bugs that might theoretically be exploitable — at least, not so far.
I think that's a pretty sure indication that the qmail user base does not consider those bugs to be sufficiently worrisome to fix. (I did publish a simple fix to one of the first bugs Guninski found; that fix was incorporated into netqmail. But I did that gratis.)
I don't know offhand whether DJB has ever acknowledged any bugs in qmail. But, just as code doesn't lie while comments can, code that is reasonably well-specified, as qmail's components' interfaces are, cannot pretend bugs don't exist in it, even if authors or fanboys do, just as it can't pretend it has bugs even when claimed otherwise[*]. So I don't particularly miss djb's opinions and pronouncements on such issues, since I can read the code and decide for myself.
[*] There's a web page out there that claims "qmail-smtpd does not detect CR LF properly on packet boundaries", which strikes me as complete and utter — as well as easily demonstrable, by simply looking at the code — nonsense. Not that it can't happen, but it'd almost certainly be due to an OS, networking, or (non-qmail) library bug. Tellingly, despite the high likelihood such a bug would result in huge numbers of legitimate emails being rejected by many qmail servers worldwide, there's no information on this alleged bug beyond somebody supposedly reporting it. That's only marginally more persuasive than saying "qmail-smtpd dropped every third email on every server running it on March 17, 2001, between 11:45 and 12:15 UTC, according to a guy I overheard in a bar the other day." Color me unimpressed.
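For readers wondering what "detecting CR LF on packet boundaries" involves, here is a minimal sketch of the general technique. It is not qmail's code and makes no claim about how qmail-smtpd is written; `CrlfCounter` and `demo` are hypothetical names. The point is only that a scanner which carries one bit of state between reads does not care where the network happened to split the stream.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts CR LF pairs in a byte stream delivered in arbitrary chunks.
// The single flag (was the previous byte a CR?) survives between calls,
// so a pair split across two chunks is still counted.
class CrlfCounter {
public:
    void feed(const std::string& chunk) {
        for (char c : chunk) {
            if (c == '\n' && last_was_cr_) {
                ++count_;
            }
            last_was_cr_ = (c == '\r');
        }
    }

    std::size_t count() const { return count_; }

private:
    bool last_was_cr_ = false;
    std::size_t count_ = 0;
};

void demo() {
    CrlfCounter counter;
    counter.feed("HELO x\r");    // chunk ends between CR and LF
    counter.feed("\nMAIL\r\n");  // LF completes the first pair; a second pair follows
    assert(counter.count() == 2);
}
```

A scanner that instead searched each buffer separately for the two-byte sequence would miss the first pair here, which is the class of bug the quoted claim describes.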
Practice random senselessness and act kind of beautiful.
Are there any operating systems out there that use random numbering of PIDs? Windows and Linux both number them sequentially and I would not expect it to happen otherwise.
PIDs are not random for any reasonable value of random. For low-grade random numbers use something like /dev/urandom (on UNIX) instead. For high-grade random numbers, use /dev/random and note it may take a while to build the entropy.
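A minimal sketch of the /dev/urandom suggestion in C++, assuming a UNIX-like system where that device exists. The helper names `urandom_bytes` and `demo` are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <vector>

// Reads n bytes from /dev/urandom. Throws if the device cannot be opened
// or returns fewer bytes than requested.
std::vector<unsigned char> urandom_bytes(std::size_t n) {
    std::ifstream dev("/dev/urandom", std::ios::in | std::ios::binary);
    if (!dev) {
        throw std::runtime_error("cannot open /dev/urandom");
    }
    std::vector<unsigned char> buf(n);
    dev.read(reinterpret_cast<char*>(buf.data()),
             static_cast<std::streamsize>(n));
    if (static_cast<std::size_t>(dev.gcount()) != n) {
        throw std::runtime_error("short read from /dev/urandom");
    }
    return buf;
}

void demo() {
    const std::vector<unsigned char> seed = urandom_bytes(16);
    assert(seed.size() == 16);
}
```

Portable code would more likely reach for `std::random_device`; reading the device directly is shown only because that is what the post recommends.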
It's just occurred to me you are Tim O'Reilly. Wow, there are still some important folks that still post on
I've been writing C++ code for about a decade. I consider myself competent with almost every weird nook in C++ - I have extensive template metaprogramming in some projects, I've used and abused multiple virtual inheritance, and about the only thing I avoid are exceptions because I feel they're a non-solution.
And I think you're dead right. C++ is a hideously complex bitch of a language. Anyone trying to use all the C++ features will quickly drive themselves insane. I rarely use inheritance, I rarely make my own templates, I never do operator overloading unless it's absolutely clear what the operators mean (number classes, geometry classes, and string classes, basically.) In many ways, my code looks like C code, albeit C code with obsessive typesafety and extensive use of the STL.
I've programmed in other languages quite a bit. I honestly feel C++ is the single best language out there. But it isn't for everyone, and it's certainly not for people who can't sit down and say "okay, we need to make this damn program simple."
Breaking Into the Industry - A development log about starting a game studio.
I'm not 100% sure if cruft is a layman's term for Design Debt, or if Design Debt is just one type of cruft, but they're definitely related.
We apologize for the preceding message. All those responsible have been sacked.
Guninski's code works for any default qmail installation figuring the right arch and memory. Depending on your org, most currently shipping machines are capable of fostering said environment. An AMD64 with 8GB+ of ram is not uncommon. I have 4 of them and work for a small company. My previous company had several dozen (Sun 4100s).
These exploits can be performed by any user who owns such hardware, and can read. They are not theoretical. Many bugs have existed in linux kernels that only manifest themselves under extreme circumstances, and I don't see Linus or anyone else of respectable programmer status that attempts such dismissals with a handwave. Maybe it's because pride doesn't get in their way?
Peter Norvig, now Director of Research at Google, agrees with you. Coding, like writing, is best improved by an alternating diet of writing and reading good works. He collected a few of the best he'd found in a book called Paradigms of Artificial Intelligence Programming, available from his web site or from Amazon: http://norvig.com/paip.html
It talks about AI because it was the 80s (92 by the time it hit shelves) and AI was cool---but the applications involved are now just what we call computing. It's not perfect: fifteen years have passed since it was written. In that time, C++'s STL and Boost have caught up with many features of Common Lisp. Java's come along and done well. Other interactive dynamic languages than Lisp exist: Python, for example. So you'll have to do some translating in your head---but for the same reason that Cicero is read by students of English rhetoric, Norvig should be read by C++ and Java programmers seeking mastery.
-- Brian T. Sniffen
http://www.smk.co.za/
I know practically nothing about OS programming and my C / C++ has been rusting since University, but this guy writes code that even I can follow.
I haven't, honestly. I'm looking at the Wikipedia article and it looks like it's got a lot of stuff that I'm not so interested in, like more reliance on runtime tests and lack of templates.
While I don't write my own templates, I do use them extensively in the STL, and I'd really miss the typesafety of having them. It looks like Objective C is not nearly as obsessive about typesafety as I am.
You know, you could just code the thing in Java and eliminate this issue outright, as well as all possibility of buffer overruns... C is the worst possible language for Internet-facing servers.
Forget magic. Any technology distinguishable from divine power is insufficiently advanced.
Take Gled (http://www.gled.org/ - a recent CVS snapshot is preferable), a distributed C++ application builder with OpenGL/OpenAL/FLTK interfaces, object persistence and excellent extensibility.
It certainly is not pretty the first time you look at it (that is probably true for any unique project), but if you look harder, you will see a strange tangle using ROOT, CINT the C++ interpreter, built-in C++ object dictionaries, an elegant and fast network stack for object streaming and synchronization, and a strangely effective remote procedure call interface. But my favourite is the auto-building FLTK GUI.
While remotely involved, I do enjoy this code immensely.
Try building a new library for it and enjoy GUI-enabled objects in minutes... (There is even a scratch for a TA-like game in one of the demos, not yet playable.)
-Kvorg
Please correct me if I got my facts wrong.