Any "Pretty" Code Out There?
andhow writes "Practically any time I hear a large software system discussed I hear "X is a #%@!in mess," or "Y is unmanageable and really should be rewritten." Some of this I know is just fresh programmers seeing their first big hunk o' code and having the natural reaction. In other cases I've heard it from main developers, so I'll take their word for it. Over time, it paints a bleak picture, and I'd really like to know of a counterexample. Getting to know a piece of software well enough to ascertain its quality takes a long time, so I submit to the experience of the readership: what projects have you worked on which you felt had admirable code, both high-level architecture and in-the-trenches implementation? In particular I am interested in large user applications using modern C++ libraries and techniques like exception handling and RAII."
It is my experience that reading and understanding code is dramatically more difficult than writing code. It gets even more difficult if it isn't your own code. Commenting, design, layout, good structure, and documentation all reduce this difficulty but never remove it. I've seen plenty of good programmers declare code "ugly" because it had a few warts, when in reality they just couldn't understand it.
I find code can be exceptionally well presented but only if you look on a file by file basis.
Most projects have nice, clean, stable blocks that you can look at and just know are right.
Other parts resemble a jungle: no logical flow, and horrid to read.
Whenever I am building an algorithm, it goes through numerous rebuilds. After I initially get it working, each rebuild has more and more order, until it looks like it will win a race.
If the boss comes in and sees working code, though, they don't understand this prettiness and will expect it to be shipped.
liqbase
IMHO, postfix takes the cake for the most elegant and readable code I've ever looked at. At one point I found a screenshot of qmail vs. postfix code in similar areas for handling some condition. The qmail code was hardcoded, had nasty loops and was just plain unbearable. The postfix version, however, was exceedingly elegant and I knew right away what the code was doing.
I only wish firefox was 10% as elegant and cruft free as postfix.
The source for Tcl is widely considered by those who have worked with it to be unusually clean and clear.
Amarok looks quite horrible by comparison with what its UI is built on. Though they have their gnarly parts, on the whole I am always impressed with the KDE libraries.
I am trolling
The Independent JPEG Group's libjpeg is pretty well written in terms of style and design.
Can't say from personal experience, but I hear that the TeX source is a truly enlightening experience. Knuth is all for literate programming, you see.
I think you meant the cruftiness of source code is directly proportional to the number of people working on it DIVIDED BY the amount of time spent working on it.
This explains why commercial source code produced by large teams of programmers under tight arbitrary deadlines tends to be sloppy. Source code produced by passionate hobbyists under the "we'll release it when it's done" deadline perspective tends to be cleaner.
The first large project I ever attempted (HERMES, now abandoned, http://hermesweb.sourceforge.net/) had, I believe, reasonably pretty code. Architecturally, there were some pretty parts too. But overall, the architecture was a mess simply because I didn't know better. I eventually abandoned it because I realized it was going to be impossible to fix the initial design mistakes without entirely replacing a large percentage of the code.
My current large project is LedgerSMB. This deals with an entirely different magnitude of mess. Essentially we forked from a codebase which we have come to understand is nearly unmaintainable, and yet we *have* to replace all the code because we have lots of users on the software who rely on it. Hence we are refactoring with an axe.
The older codebase (SQL-Ledger/LedgerSMB 1.0/LedgerSMB 1.2) has a number of architectural limitations and issues, as well as a lot of evidence of an overall lack of architecture. If that weren't enough, the code is pretty problematic too. It could be worse (at least the codebase is reasonably readable if you put enough effort into it).
I think it hits about 75% of the software programming antipatterns mentioned on Wikipedia, and extends some of them in weird ways. For example, instead of just magic strings, we have magic comments (comments which are actually part of the program code and whose deletion causes problems). And we have function calls which pass by "reference-to-dereferenced-reference." In Perl terms, \%$ref.
Hence we are moving everything to a new and *cleaner* architecture.
LedgerSMB: Open source Accounting/ERP
Much of the code in the kernel could win IOCCC hands down.
Which parts?
At least you admit to being uninformed.
I haven't looked either, but I happen to know that Boost.Python often does NOT work. It has thread-related problems.
As for the rest of BOOST, I've looked at a good chunk. BOOST makes decent programmers cry. The other follow-up post by the Anonymous Coward Xbox developer has it all correct.
I'll add:
BOOST is full of butt-ugly hacks. Check out the, uh, template things, named _1 through _9 being used as stand-in dummy arguments. Eeeeeew!!!
BOOST looks easy to dumb-ass programmers, but these programmers leave bugs that are difficult for expert programmers to find.
BOOST makes compilers run very very slow, and often breaks the optimizer anyway.
I wrote a Perl filter that took C code as input, and applied all kinds of "unprettifications" to it (removing comments, collapsing variable declarations, introducing random curly-brace and indentation styles, removing whitespace or adding strange whitespace). The output looked like it had been written at 3am by a hung-over ex-FORTRAN engineer who had just discovered FORTH.
Then I demonstrated that a bunch of code checked into our system looked like it had *already* been run through this tool. After the public shaming, a couple of the offenders cleaned up their acts for a while, but they're back to their old tricks.
These days I'm working on a project where all the devs are really, really serious about the formatting and naming conventions. Some of the rules suck, in my opinion, but there's a lot to be said for consistency.
[In the 80s, HyperCard team at Apple used to regularly run their sources through a Pascal formatter. The code, in a friend's words, "looked ironed." Unfortunately I haven't run across any good C++ formatters.]
Any sufficiently advanced technology is insufficiently documented.
Yes, I understand that. I just think it is somewhat ironic that the implementation of TeX is much prettier than the language itself.
cruftness = people * time is a reasonable approximation; I can confirm the same kind of stories. Even if the operating system and the project's software were trustworthy, the consultants would probably %$^& the customer's database too. Hurray for corporate politics.
REM grab http://download.microsoft.com/download/win95upg/tool_s/1.0/W95/EN-US/olddos.exe
REM for the qbasic.exe
REM Mandelbrot set: iterate z = z^2 + (x + yi) per pixel, color by escape time
SCREEN 13
WINDOW (-2, -2)-(2, 2)
FOR x = -2 TO 2 STEP 4 / 320
  FOR y = -2 TO 2 STEP 4 / 200
    u = 0
    v = 0
    FOR i = 0 TO 256
      REM (u+vi)^2=u*u-v*v +2uvi
      ut = u * u - v * v + x
      vt = 2 * u * v + y
      u = ut
      v = vt
      c = i
      REM once |z|^2 > 4 the point has escaped, so force the loop to end
      IF (u * u + v * v) > 4 THEN i = 256
    NEXT i
    PSET (x, y), c
  NEXT y
NEXT x
First, compare the way each author handled security disclosures.
Found nothing shameful? Did you see this about Postfix? http://cr.yp.to/maildisasters/postfix.html
Next, compare the number of false public statements made by each author.
Found nothing? Did you see this about Postfix's author? http://cr.yp.to/qmail/venema.html
And finally, compare the number of security flaws and their severity in Postfix compared to qmail.
Ultimately, do you support or condone the behavior documented at http://cr.yp.to/maildisasters/postfix.html
-- begin quote http://cr.yp.to/maildisasters/postfix.html --
IBM released Postfix with massive hype in mid-December 1998. ``IBM software to shield email from hackers,'' blared the Reuters headline. ``This will make IBM's and everyone's Internet activities more secure,'' IBM's network security research manager said in a prepared statement.
A few days later I glanced at the Postfix security documentation. ``No Postfix program is set-uid,'' the Postfix author wrote. ``Introducing the concept was the biggest mistake made in UNIX history. Set-uid (and its weaker cousin, set-gid) causes more trouble than it is worth.''
This set off alarm bells in my head. ``Does postfix really use a world-writable directory for people to drop off mail?'' I wrote in a 19981217 email message to another security expert. ``Is there anything that stops a user from making a hard link to another user's message, preventing postfix from delivering the message?''
In fact, when Postfix saw an extra hard link, not only would it fail to deliver the message, but it would actively remove the hard link. Any local attacker could trivially exploit this to anonymously destroy other users' incoming or outgoing messages. There was no way for the system administrator to find the culprit, and no way to recover the messages.
The Postfix author's reaction to my first public comments was outright denial. ``Bernstein is wrong on all points,'' he said in a public statement in response to a summary of the problems. ``Bogus. ... Bogus. ... Bogus. ... Bogus.'' He continued by giving an example of how an incompetent attacker might fail to destroy a file.
Several people pointed out his mistake, but he continued to deny the problem. ``In my opinion, no-one has brought forward a vulnerability worth mentioning,'' he said in a bugtraq message titled ``Claimed Postfix Vulnerabilities.''
I sent a detailed description of the vulnerability to bugtraq.
The Postfix author finally admitted that the attack would destroy mail. However, he didn't post a security alert on the Postfix web pages. Instead he added a brief note to the middle of the ``Postfix Errata'' page:
A local user can hard link a maildrop queue file to another
directory within the same file system, causing the mail to not be
delivered. Workaround: chmod 1733
/var/spool/postfix/maildrop, and edit
/etc/postfix/postfix-script, replacing 1777 by 1733.
When I saw this, I posted a note to comp.security.unix, explaining that this ``workaround'' simply didn't work. Any user could still anonymously destroy messages.
The Postfix author followed up, using the subject line of ``DAN BERNSTEIN'S CLAIM'' without admitting that my claims were correct, summarizing the problems as ``local users [can] play games with hard links'' without mentioning that these games
The Bourne Shell must get some kind of mention here. What do you do if you prefer ALGOL to C? Why, #define your own syntax, and thus turn boring old C code into a thing of beauty.
Repton.
They say that only an experienced wizard can do the tengu shuffle.
Wow! That's impressive- I feel guilty now for carving so many minutes out of someone's life. Although if there were javadocs, I'd imagine that most of these disparaging comments would be within the HelloWorld class itself. The library code javadocs should always have lofty descriptions of themselves as if they're going to do brain surgery. Especially if they have empty implementations.
If I wrote this code in 2007 I would have used "setPayload()" instead of "configure()" so that MessageBody would follow standard JavaBean conventions. That would let me easily wire one up in a Spring XML file. Maybe I could even insert AOP pointcuts somewhere. After all Hello World is the sort of application that practically screams for aspect oriented programming.
Many pieces of old code aren't pretty for a fairly defined set of reasons:
1. a) Debugging Ensure you actually have an appropriate way of debugging the code. The systems I work with are embedded and run 7x24. People will say: it failed last week on Wednesday at 3:00 A.M., we got it working, but can you fix the problem? The problem may not actually be your code, it could be another piece of equipment. In any case, you need to figure this out from the logs. In my experience, many "pretty" programs are too small to justify extensive logging. After logging is included, the programs become less "pretty" but much more maintainable.
1. b) Refactoring after Debug Sometimes the results of the debug will show a major design error in the program. You now need to implement a major architectural change that really was not originally intended. You have good modular code when it can withstand these major design changes in a relatively smooth manner.
2. Failure to handle common areas of problems well These include:
2. a) Strings Does your program have the ability to smoothly handle Unicode/UTF/HTML/locale-specific strings? Every different language you port your application to, and every different program you talk with, will have a differing definition of what a string is. My favorite test case is CNC (Computer Numerically Controlled) machinery. Some CNC machines expect embedded nulls inside the strings. The embedded null requirement affects a surprisingly large number of string libraries.
2. b) MessageBox() Invariably in a big program it will be unacceptable to allow it to hang on a modal dialog box like MessageBox(). How are you going to handle it? What if a library call executes a modal dialog box?
2. c) Handling Exceptions For a simple prototype program, handling exceptions is not a big deal. In a production application, all the exceptions must be handled appropriately and the program must be able to continue when exceptions occur. The error handling code often exceeds the size of the original program.
2. d) Third Party Libraries / Operating Systems (Windows) The amount of code devoted to covering up mistakes in other code is amazing. Unfortunately, unless coding on an open platform, one must accept the costs of the additional code. When starting a new project, I recommend thoroughly stress testing any new libraries that will be used. Thus one can find the killer bugs that significantly affect design decisions.
I would appreciate any feedback/additions to the items on this list.
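To make items 1. a) and 2. c) a bit more concrete, here is a minimal sketch in Python of a long-running loop that logs each failure with enough context to reconstruct it later and then carries on. The names `process_all`, the `"service"` logger, and the sample jobs are all made up for illustration; this is not taken from any system described above.

```python
import logging

logger = logging.getLogger("service")  # hypothetical logger name


def process_all(jobs):
    """Run every (name, callable) job; log failures and keep going."""
    results, failures = [], []
    for name, job in jobs:
        try:
            results.append((name, job()))
        except Exception:
            # logger.exception records the message plus the full traceback,
            # so the 3:00 A.M. failure can be diagnosed from the log alone.
            logger.exception("job %r failed; continuing", name)
            failures.append(name)
    return results, failures


if __name__ == "__main__":
    # The formatter adds the timestamp to every log line.
    logging.basicConfig(format="%(asctime)s %(levelname)s %(message)s")
    jobs = [("ok", lambda: 42), ("bad", lambda: 1 / 0), ("late", lambda: "done")]
    print(process_all(jobs))
```

Even in this toy version the error handling is about as long as the actual work, which is roughly the point made in 2. c).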
...On another note, I'm willing to bet that the person asking
I probably spend an equal chunk of time looking at code as I do writing it (then again, being an intern consultant/admin I'm always looking for a reason to write code and can never justify scratching an itch someone else, who is smarter than I, has already scratched sufficiently). I think I once spent a good chunk of time, that I should have been studying for my data structures final, reading the 2.6 kernel - and I probably take a peek at samba every 2 weeks or so... and I'm a software development major.
This wasn't made to sound like an attack, although it probably does; I'm really curious what prompted you to say what you did, and if you know something I don't (which is currently going off at 50/50 odds - the things I don't know could fill volumes, and the things I do, a small pamphlet, with large text).
If I mod you up, it doesn't necessarily mean I agree with what you've said, sorry.
that he thought Bill Atkinson's MacPaint was the most beautiful program ever written. Hearing this, Andy Hertzfeld made it a priority to recover the source code from an old Macintosh diskette. He contacted me because he was a bit worried about Apple's reaction if he just released it on the net (since it was Apple property), and I advised him to get the Computer History Museum involved if he didn't want to take the risk. I believe that he donated the code, but I'm not sure what the Museum did to have it made available.
Tim O'Reilly @ O'Reilly Media, Inc. 1005 Gravenstein Highway North, Sebastopol, CA 95472 http://www.oreilly.com
So this post is perfectly timed. It's a collection of essays by leading software engineers about code they find especially beautiful.
Andy Oram, the editor, thought it would be poor form to make a post himself, but heck, I thought: this is very relevant. The table of contents for the book can be found at http://www.oreilly.com/catalog/9780596510046/toc.html
It includes essays by Brian Kernighan, Jon Bentley, Tim Bray, Yukihiro Matsumoto, Simon Peyton Jones, and many others. The code is intended not only to be beautiful but also instructive and in many cases re-usable.
We're hoping to build an ongoing site around the book so additional examples would be very welcome.
Tim O'Reilly @ O'Reilly Media, Inc. 1005 Gravenstein Highway North, Sebastopol, CA 95472 http://www.oreilly.com
If you are about to mod me down, keep in mind that this post was most likely sarcastic.
As my page (to which you link) notes, these bugs are likely exploitable only in theory.
And I've been hired (and paid well) to modify qmail code, including patching it to fix bugs as well as extending it, for years now, but nobody has even inquired as to what it'd take to fix the "Guninski" bugs that might theoretically be exploitable — at least, not so far.
I think that's a pretty sure indication that the qmail user base does not consider those bugs to be sufficiently worrisome to fix. (I did publish a simple fix to one of the first bugs Guninski found; that fix was incorporated into netqmail. But I did that gratis.)
I don't know offhand whether DJB has ever acknowledged any bugs in qmail. But, just as code doesn't lie while comments can, code that is reasonably well-specified, as qmail's components' interfaces are, cannot pretend bugs don't exist in it, even if authors or fanboys do, just as it can't pretend it has bugs even when claimed otherwise[*]. So I don't particularly miss djb's opinions and pronouncements on such issues, since I can read the code and decide for myself.
[*] There's a web page out there that claims "qmail-smtpd does not detect CR LF properly on packet boundaries", which strikes me as complete and utter — as well as easily demonstrable, by simply looking at the code — nonsense. Not that it can't happen, but it'd almost certainly be due to an OS, networking, or (non-qmail) library bug. Tellingly, despite the high likelihood such a bug would result in huge numbers of legitimate emails being rejected by many qmail servers worldwide, there's no information on this alleged bug beyond somebody supposedly reporting it. That's only marginally more persuasive than saying "qmail-smtpd dropped every third email on every server running it on March 17, 2001, between 11:45 and 12:15 UTC, according to a guy I overheard in a bar the other day." Color me unimpressed.
Practice random senselessness and act kind of beautiful.
Are there any operating systems out there that use random numbering of PIDs? Windows and Linux both number them sequentially and I would not expect it to happen otherwise.
PIDs are not random for any reasonable value of random. For low-grade random numbers use something like /dev/urandom (on UNIX) instead. For high-grade random numbers, use /dev/random and note it may take a while to build the entropy.
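For what it's worth, a small Python sketch of the difference. `os.urandom` and the `secrets` module both ask the operating system's random generator for bytes (on UNIX, broadly the same source that backs /dev/urandom); the function names `pid_token`, `random_token` and `random_bytes` are invented for this example.

```python
import os
import secrets


def pid_token():
    # Anti-pattern: a PID is a small integer that other users can usually
    # see or guess, so it is a poor stand-in for a random value.
    return os.getpid()


def random_token(nbytes=16):
    # Hex string built from nbytes of OS-provided randomness.
    return secrets.token_hex(nbytes)


def random_bytes(nbytes=16):
    # Raw bytes straight from the operating system's generator.
    return os.urandom(nbytes)


if __name__ == "__main__":
    print(pid_token(), random_token(), random_bytes(8).hex())
```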
LedgerSMB: Open source Accounting/ERP
It's just occurred to me you are Tim O'Reilly. Wow, there are still some important folks that still post on
Now I wish I knew postfix with that recommendation.
Well, here goes...anecdotes from various years & code...bear with me...up front...the number of lines isn't indicative of the complexity if it's done correctly, including "vertical coding".
I crafted a nice little VB 6 system which had about six or seven discrete processes to deal with an inbox...and finally to a GDG on a mainframe. Notification of an email message, analyze for format, take the enclosed encryption key and deal with the key ring, updated keys & other garbage, yadda yadda, then put it into the mainframe's queue.
I ran into someone about two years later and they said it was running without a hitch. They thought about adding some extra, major features but were afraid they'd break it -- despite how everything was stitched together (separate processes for each step - pick up what was in the incoming queue, process it, and send it to the next process's queue). A template for each step - all I had to insert was the brain cells of functionality. If something failed, things would at least go as far as possible and wait until things were running again. Simple, fast, reliable.
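The queue-per-step layout described here can be sketched roughly as follows. This is Python rather than VB 6, it runs in one process instead of several, and `run_stage`, `run_pipeline` and the sample steps are hypothetical names, so treat it as an illustration of the idea and not as the original system.

```python
from collections import deque


def run_stage(step, inbox, outbox):
    """Drain inbox through step; on failure, stall and keep the item queued."""
    while inbox:
        item = inbox[0]  # peek first, so a failing item is never lost
        try:
            result = step(item)
        except Exception:
            return False  # stage waits here until someone retries it
        inbox.popleft()
        outbox.append(result)
    return True


def run_pipeline(steps, queues):
    """queues holds len(steps) + 1 deques; queues[i] feeds steps[i]."""
    return [run_stage(step, queues[i], queues[i + 1])
            for i, step in enumerate(steps)]


if __name__ == "__main__":
    def flaky(text):
        # Stand-in for a step whose downstream system is unavailable.
        if text == "b":
            raise RuntimeError("downstream unavailable")
        return text.upper()

    queues = [deque([" a ", " b "]), deque(), deque()]
    print(run_pipeline([str.strip, flaky], queues), queues)
```

Each step only knows its own inbox and outbox, so work proceeds as far as it can and the unprocessed item simply waits in its queue until the stage is run again.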
I've got my first (and only) PL/I program with 5k lines with lots of documented procedure calls. It was supposed to convert DG's CEO word processing documents to IBM's PROFS. (CEO's internal format was a sealed box. You couldn't even buy the formats, let alone get them for free. This is even more so when you're trying to push their box out the door.) It processed more than 100'000 documents and had no problems when I turned it over. They'd taken bids from outsourced groups and the best they could get was 5-8 people, 5-8 months minimum, and a $50'000 retainer. I wrote it in less than two months and didn't have to offload my usual tasks or work overtime.
I wrote an ASP [1] project which was almost 5'000 lines because my code is so vertical for readability. As a startup, they wanted to ensure everything would scale when things hit the real world. Someone challenged the necessity of the length until I let them have a crack at it. Defeat. It was to process the transfer of CD and DVD information between an (external) browser (like a Mac or PC) and a collection of CDs and DVDs (see? no need for apostrophes: ala DVD's) on various devices. Some of them were a few thousand discs in testing. No one could figure out how it worked so well. I used what I call active style instead of the usual passive pachinko machine's drop-thru style. (personally, I think those who code that way should be taken out back with a switch and find out if whipping their little p**p** motivates them a bit.)
For a bit, I had to clean up these messes at client sites and thought I'd seen it all. Not. I went in to deal with an ASP/VBscript mess which took some bozo 4-6 months to write and had redundant/cascading #include files. Imagine an #include file which has lots of code and another #include references that file. It was a major clusterf***. (If you don't know what that means, wait until you've got a little more time in the saddle.) The way it was written meant 20'000 unique lines of code for every trip to the server(s). It's no wonder they were able to muster ten or eleven active connections, then placed another order for additional high-end servers. One day, a newspaper in Missouri had a city where 1/3 of the largest employer's workforce was being laid off, a sheriff's deputy was brought in for pedophile charges, and a ten car pileup on the interstate was on CNN and a reference was made to the web site. Their servers practically melted.
For some reason, people seem to think vertical coding is a bad thing because pretty coding with comments makes it look so long. If it's compiled, then what is their beef? If it's something like ASP/VBScript, it's tokenized, loaded, and reused.
It's not that tough. Vertical code with a liberal use of a consistent format and empty lines, document it with s
I've been writing C++ code for about a decade. I consider myself competent with almost every weird nook in C++ - I have extensive template metaprogramming in some projects, I've used and abused multiple virtual inheritance, and about the only thing I avoid are exceptions because I feel they're a non-solution.
And I think you're dead right. C++ is a hideously complex bitch of a language. Anyone trying to use all the C++ features will quickly drive themselves insane. I rarely use inheritance, I rarely make my own templates, I never do operator overloading unless it's absolutely clear what the operators mean (number classes, geometry classes, and string classes, basically.) In many ways, my code looks like C code, albeit C code with obsessive typesafety and extensive use of the STL.
I've programmed in other languages quite a bit. I honestly feel C++ is the single best language out there. But it isn't for everyone, and it's certainly not for people who can't sit down and say "okay, we need to make this damn program simple."
Breaking Into the Industry - A development log about starting a game studio.
I'm not 100% sure if cruft is a layman's term for Design Debt, or if Design Debt is just one type of cruft, but they're definitely related.
We apologize for the preceding message. All those responsible have been sacked.
Guninski's code works for any default qmail installation figuring the right arch and memory. Depending on your org, most currently shipping machines are capable of fostering said environment. An AMD64 with 8GB+ of RAM is not uncommon. I have 4 of them and work for a small company. My previous company had several dozen (Sun 4100s).
These exploits can be performed by any user who owns such hardware, and can read. They are not theoretical. Many bugs have existed in Linux kernels that only manifest themselves under extreme circumstances, and I don't see Linus or anyone else of respectable programmer status attempting such dismissals with a handwave. Maybe it's because pride doesn't get in their way?
Peter Norvig, now Director of Research at Google, agrees with you. Coding, like writing, is best improved by an alternating diet of writing and reading good works. He collected a few of the best he'd found in a book called Paradigms of Artificial Intelligence Programming, available from his web site or from Amazon: http://norvig.com/paip.html
It talks about AI because it was the 80s (92 by the time it hit shelves) and AI was cool---but the applications involved are now just what we call computing. It's not perfect: fifteen years have passed since it was written. In that time, C++'s STL and Boost have caught up with many features of Common Lisp. Java's come along and done well. Other interactive dynamic languages than Lisp exist: Python, for example. So you'll have to do some translating in your head---but for the same reason that Cicero is read by students of English rhetoric, Norvig should be read by C++ and Java programmers seeking mastery.
-- Brian T. Sniffen
http://www.smk.co.za/
I know practically nothing about OS programming and my C / C++ has been rusting since University, but this guy writes code that even I can follow.
I haven't, honestly. I'm looking at the Wikipedia article and it looks like it's got a lot of stuff that I'm not so interested in, like more reliance on runtime tests and lack of templates.
While I don't write my own templates, I do use them extensively in the STL, and I'd really miss the typesafety of having them. It looks like Objective C is not nearly as obsessive about typesafety as I am.
Breaking Into the Industry - A development log about starting a game studio.
You know, you could just code the thing in Java and eliminate this issue outright, as well as all possibility of buffer overruns... C is the worst possible language for Internet-facing servers.
Forget magic. Any technology distinguishable from divine power is insufficiently advanced.
Actually that's an example of a bad coding practice: notice the magic number 10000. In the above example it would be much better to use a named constant like BUSLOGIC_TIMEOUT_LOOPS_PER_SECOND, or maybe better yet BusLogic_GlobalOptions.TimeoutLoopsPerSecond. Incidentally, doing so would also remove the stupid "here's what 10000 means" comments.
And while I'm ranting about comments, I'll point out that they didn't feel the need to explain WHY those particular values are given 60 second timeout while the default is only one second.
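A rough sketch of what that might look like, in Python for brevity (the code under discussion is C). The value 10000 and the one-second and 60-second timeouts come from the comments above; every identifier here is a hypothetical name chosen for the example, not one from the actual driver.

```python
# Hypothetical names; only the numbers come from the discussion above.
TIMEOUT_LOOPS_PER_SECOND = 10000  # polling-loop iterations that take ~1 second

DEFAULT_TIMEOUT_SECONDS = 1

# A comment here should say WHY these commands get a longer timeout.
# The reason is not given in the code being discussed, so it is left open.
SLOW_COMMAND_TIMEOUT_SECONDS = 60


def timeout_loops(seconds):
    """Convert a timeout in seconds into a polling-loop iteration count."""
    return seconds * TIMEOUT_LOOPS_PER_SECOND


if __name__ == "__main__":
    print(timeout_loops(DEFAULT_TIMEOUT_SECONDS))
    print(timeout_loops(SLOW_COMMAND_TIMEOUT_SECONDS))
```

With the unit carried in the name, the "here's what 10000 means" comments become unnecessary and the remaining comment space can go to the why.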
Take Gled (http://www.gled.org/ - a recent CVS snapshot is preferable), a distributed C++ application builder with OpenGL/OpenAL/FLTK interfaces, object persistence and excellent extensibility.
It certainly is not pretty the first time you look at it, that is probably true for any unique project, but if you look harder, you will see a strange tangle using ROOT, CINT the C++ interpreter, built-in C++ object dictionaries, elegant and fast network stack for object streaming and synchronization, and strangely effective remote procedure call interface. But my favourite is the auto-building FLTK gui.
While remotely involved, I do enjoy this code immensely.
Try building a new library for it and enjoy GUI-enabled objects in minutes... (There is even a scratch for a TA-like game in one of the demos, not yet playable.)
-Kvorg
Please correct me if I got my facts wrong.