Old-School Coding Techniques You May Not Miss
CWmike writes "Despite its complexity, the software development process has gotten better over the years. 'Mature' programmers remember manual intervention and hand-tuning. Today's dev tools automatically perform complex functions that once had to be written explicitly. And most developers are glad of it. Yet, young whippersnappers may not even be aware that we old fogies had to do these things manually. Esther Schindler asked several longtime developers for their top old-school programming headaches and added many of her own to boot. Working with punch cards? Hungarian notation?"
Some of those are obnoxious and good to see them gone. Others, not so much. For instance, sorting/searching algorithms, data structures, etc. Don't they still make you code these things in school? Isn't it good to know how they work and why?
On the other hand, yeah... fuck punch cards.
Documentation!
First of all, most actual practices mentioned are well alive today -- it's just most programmers don't have to care about them because someone else already did it. And some (systems and libraries developers) actually specialize on doing just those things. Just recently I had a project that almost entirely consisted of x86 assembly (though at least 80% of it was in assembly because it was based on very old code -- similar projects started now would be mostly in C).
Second, things like spaghetti code and Hungarian notation are not "old", they were just as stupid 20 years ago as they are now. There never was a shortage of stupidity, and I don't expect it any soon.
Contrary to the popular belief, there indeed is no God.
Hollerith constants
Equivalences
Computed Gotos
Arithmetic Ifs
Common blocks
There were worse things, horrible things... dirty tricks you could play to get the most out of limited memory, or to bypass Fortran's historical lack of pointers and data structures. Fortran-90 and its successors have done away with most of that cruft while also significantly modernizing the language.
They used to say that real men programmed in Fortran (or should I say FORTRAN). That was really before my time, but I've seen the handiwork of real men: impressive, awe-inspiring, crazy, scary. Stuff that worked, somehow, while appearing to be complete gibberish -- beautiful, compact, and disgustingly ingenious gibberish.
Long live Fortran! ('cause you know it's never going to go away)
Getting tired of Slashdot... moving to Usenet comp.misc for a while.
Actually, the worst spaghetti code I have ever seen (in 30+ years most of it in life-critical systems) is OO C++. It doesn't have to be that way, but I have seen examples that would embarrass the most hackish FORTRAN programmers.
I am alarmed at the religious fervor and non-functional dogma associated with modern programming practices. Even GOTOs have good applications - yes, you can always come up with some other way of doing it, by why and with how much extra futzing? But it's heresy.
Brett
Hungarian notation is bad because you are encoding type and scope information into the name, which makes it harder to change things later.
The fact that it is also one of the ugliest naming conventions is merely a secondary issue.
Dunx
Converting caffeine into code since 1982
Really it has nothing to do with IDEs, but more compilers, good coding practice and OO principles. A few cons:
For a succinct summary: Hungarian Notation Considered Harmful
Be careful. People in masks cannot be trusted.
For some reason the article says that only variables beginning with I,J,and K were implicitly integers in Fortran. Actually, it was I-N.
Good Hungarian notation does exactly that, actually. Check out Apps Hungarian, which encodes the semantic type of the data, rather than the language-level data type.
Of course stupid Hungarian notation is stupid. Stupid anything is stupid. Problem is, most people don't hear about the right approach.
Duff's Device. Pre-ANSI C-language means of unrolling an arbitrary-length loop. We had an Evans and Sutherland Picture System II at the NYIT Computer Graphics Lab, and Tom wrote this to feed it IO as quickly as possible.
Bruce Perens.
First off, most of the things on the list haven't gone away, they've just moved to libraries. It's not that we don't need to understand them, it's just that not everyone needs to implement them (especially the data structures one- having a pre-written one i good, but if you don't understand them thoroughly you're going to have really bad code)..
On top of that, some of their items
*Memory management- still needs to be considered about in C and C++, which are still top 5 languages. You can't even totally ignore it in Java- you get far better results from the garbage collector if you null out your references properly, which does matter if your app needs to scale.
I'd even go so far as to say ignoring memory management is not a good thing. When you think about memory management, you end up with better designs. If you see that memory ownership isn't clearcut, it's usually the first sign that your architecture isn't correct. And it really doesn't cause that many errors with decent programmers(if any- memory errors are pretty damn rare even in C code). As for those coders who just don't get it- I really don't want them on my project even if the language doesn't need it. If you can't understand the request/use/release paradigm you aren't fit to program.
*C style strings
While I won't argue that it would be a good choice for a language today (heck even in C if it wasn't for compatibility I'd use a library with a separate pointer and length), its used in hundreds o thousands of existing C and C++ library and programs. The need to understand it isn't going to go away anytime soon. And anyone doing file parsing or network IO needs to understand the idea of terminated data fields.
I still have more fans than freaks. WTF is wrong with you people?
* Sorting algorithms
If you don't know them, you're not a programmer. If you don't ever implement them, you're likely shipping more library code than application code.
* Creating your own GUIs
Umm.. well actually..
* GO TO and spaghetti code
goto is considered harmful, but it doesn't mean it isn't useful. Spaghetti code, yeah, that's the norm.
* Manual multithreading
All the time. select() is your friend, learn it.
* Self-modifying code
Yup, I actually write asm code.. plus he mentions "modifying the code while it's running".. if you can't do that, you shouldn't be wielding a debugger, edit and continue, my ass.
* Memory management
Yeah, garbage collection is cheap and ubiquitous, and I'm one of the few people that has used C++ garbage collection libraries in serious projects.. that said, I've written my own implementations of malloc/free/realloc and gotten better memory performance. It's what real programmers do to make 64 gig of RAM enough for anyone.
* Working with punch cards
Meh, I'm not that old. But when I was a kid I wrote a lot of:
100 DATA 96,72,34,87,232,37,49,82,35,47,236,71,231,234,207,102,37,85,43,78,45,26,58,35,3
110 DATA 32,154,136,72,131,134,207,102,37,185,43,78,45,26,58,35,3,82,207,34,78,23,68,127
on the C64.
* Math and date conversions
Every day.
* Hungarian notation
Every day. How about we throw in some reverse polish notation too.. get a Polka going.
* Making code run faster
Every fucking day. If you don't do this then you're a dweeb who might as well be coding in php.
* Being patient
"Hey, we had a crash 42 hours into the run, can you take a look?"
"Sure, it'll take me about 120 hours to get to it with a debug build."
How we know is more important than what we know.
Jeez. You must have taken the same course that I did. (Probably not actually.) In my case it was a programming class emphasizing statistics taught by someone in the business school who actually wanted card decks turned in. (This was probably no later than, maybe, '80/'81.) I did the same thing you did. I wrote all the software at a terminal (one of those venerable bluish-green ADM 3As) and when it was working I left the code in my virtual card punch. When I sent a message to the operator asking to have the contents sent off to a physical card punch, his message back was "Seriously?
CUR ALLOC 20195.....5804M
Circa 1984, when I did summer programming jobs at Digital Research (purveyors of CP/M), one of the programmers there showed me how you could put a transistor radio inside the case of your computer. You could tell what the computer was doing by listing to the sounds it picked up via the RF emissions from the computer. For instance, it would go into a certain loop, and you could tell because the radio would buzz like a fly.
Documentation was a lot harder to come by. If you wanted the documentation for X11, you could go to a big bookstore like Cody's in Berkeley, and they would have it in multiple hardcover volumes. Each volume was very expensive. The BSD documentation was available in the computer labs at UC Berkeley in the form of 6-foot-wide trays of printouts. (Unix man pages existed too, but since you were using an ADM3A terminal, it was often more convenient to walk over to the hardcopy.)
On the early microcomputers, there was no toolchain for programming other than MS BASIC in ROM. Assemblers and compilers didn't exist. Since BASIC was slow, if you wanted to write a fast program, you had to code it on paper in assembler and translate it by hand into machine code. But then in order to run your machine code, you were stuck because there was no actual operating system that would allow you to load it into memory from a peripheral such as a cassette tape drive. So you would first convert the machine code to a string of bytes expressed in decimal, and then write a BASIC program that would do a dummy assignment into a string variable like 10 A$="xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx". Then you would write self-modifying code in BASIC that would find the location where the string literal "xxx...." was stored, and overwrite it with your machine code. So now if you gave the LIST command, it would display the program on the screen, with the string literal displayed as goofy unprintable characters. Then you would code the program so it would execute the machine code stored at the address of the variable A$. Finally you'd save the program onto cassette.
Find free books.
If you're going to talk about old school, you gotta mention Mel.
The determined Real Programmer can write Fortran programs in any language.
Self-modifying code
Yup, I actually write asm code.. plus he mentions "modifying the code while it's running".. if you can't do that, you shouldn't be wielding a debugger.
Code that generates code is occasionally necessary, but code that actually modifies itself locally, to "improve performance", has been obsolete for a decade.
IA-32 CPUs still support self-modifying code for backwards compatibility. (On most RISC machines, it's disallowed, and code is read-only, to simplify cache operations.) Superscalar IA-32 CPUs still support self-modifying code. But the performance is awful. Here's what self-modifying code looks like on a modern CPU:
Execution is going along, with maybe 10-20 instructions pre-fetched and a few operations running concurrently in the integer, floating point, and jump units. Alternate executions paths may be executing simultaneously, until the jump unit decides which path is being taken and cancels the speculative execution. The retirement unit looks at what's coming out of the various execution pipelines and commits the results back to memory, checking for conflicts.
Then the code stores into an instruction in the neighborhood of execution. The retirement unit detects a memory modification at the same address as a pre-fetched instruction. This triggers an event which looks much like an interrupt and has comparable overhead. The CPU stops loading new instructions. The pipelines are allowed to finish what they're doing, but the results are discarded. The execution units all go idle. The prefetched code registers are cleared. Only then is the store into the code is allowed to take place.
Then the CPU starts up, as if returning from an interrupt. Code is re-fetched. The pipelines refill. The execution units become busy again. Normal execution resumes.
Self-modifying code hasn't been a win for performance since the Intel 286 (PC-AT era, 1985) or so. It might not have hurt on a 386. Anything later, it's a lose.
x = x xor y
y = x xor y
x = x xor y
Now you know!
Correct. I worked for Charles at Xerox on the BravoX project and I initially fought Hungarian. One day I had an epiphany about what it was really about and then I didn't have any problems with it. Properly done it can reduce "name choosing time" to almost zero and it makes walking into other people's code almost completely painless. The key is that you encode semantics, not machine-level details.
Let me tell you a true story to illustrate why I think people should still learn that stuff.
ACT I
So at one point I'm in a room with what looks like two particularly unproductive Wallys. Though it's probably unfair to call both Wally, since at least one looks like the hard working kind... he just makes as much progress as a courier on a treadmill.
So Wally 1 keeps clicking and staring at the screen all week and spewing things like "Unbelievable!" every 5 minutes. My curiosity gets the better of me and I ask what's happening.
"Look at this," goes Wally 1, and I indeed move over to see him toiling in the debugger through a Hashtable with String keys. He's looking at its bucket array, to be precise. "Java is broken! I added a new value with the same hash value for the key, and it just replaced my old one! Look, my old value was here, and now it's the new one!"
"Oh yes, we had that bug too at the former company I worked for," chimes in Wally 2. "We had to set the capacity manually to avoid it."
I clench my teeth to stop myself from screaming.
"Hmm," I play along, "expand that 'next' node, please."
"No, you don't understand, my value was here and now there's this other key there."
"Yes, but I want to see what's in that 'next' node, please."
So he clicks on it and goes, "Oh... There it is..."
Turns out that neither of them had the faintest fucking clue what a hash table is, or for that matter what a linked list is. They looked at its hash bucket and expected nothing deeper than that. And, I'm told, at least one of them had been in a project where they actually coded workarounds (that can't possibly do any difference, too!) for its normal operation.
ACT II
So I'm consulting at another project and essentially they use a HashMap with string keys too. Except they created their own key objects, nothing more than wrappers around a String, and with their own convoluted and IMHO suboptimal hash value calculation too. Hmm, they must have had a good reason, but I ask someone.
"Oh," he goes, "we ran into a Java bug. You can see it in the debugger. You'd add a new value whose key has the same hash value and it replaces yours in the array. So Ted came up with an own hash value, so it doesn't happen any more."
Ted was their architect, btw. There were easily over 20 of them merry retards in that project, including an architect, and neither of them understood:
A) that that's the way a hash table works, and more importantly
B) that it still worked that way even with Ted's idiotic workaround. It's mathematically impossible to code a hash there which doesn't cause the same collisions anyway, and sure enough Ted's produced them too.
ACT III
I'm talking to yet another project's architect, this time a framework, and, sure enough...
"Oh yeah, that's the workaround for a bug they found in project XYZ. See, Java's HashMap has a bug. It replaces your old value when you have a hash collision in the key."
AAARGH!
So I'm guessing it would still be useful if more people understood these things. We're not just talking abstract complaints about cargo-cult programming without understanding it. We're talking people and sometimes whole teams who ended up debugging into it when they had some completely unrelated bug, and spent time on it. And then spent more time coding "workarounds" which can't possibly even make any difference. And then spent more time fixing the actual bug they had in the first place.
A polar bear is a cartesian bear after a coordinate transform.
There is no practical difference between OO code and structured code. The article assumed structured code means goto and gosub, but any Real Programmer knows that procedures (which are just gosubs by name rather than address) are still structured programming.
So what's OO? Each class is just a bunch of functions and procedures, with one entry point and one exit point for each - your standard structured programming methodology. The fact that there are different classes makes no difference. Calls between classes don't change the nature of a class any more than pipes between programs change the nature of programs.
I wasn't impressed by other claims, either. Garbage collection is still a major headache in coding, which is why there are so many debugging mallocs and so many re-implementations of malloc() for specialist purposes. Memory leaks are still far, far too common - indeed they're probably the number 1 cause of crashes these days.
Pointer arithmetic? Still very very common. If you want to access data in an internal database quickly, you don't use SQL. You use a hash lookup and offset your pointer.
Sorts? Who the hell uses a sort library? Sort libraries are necessarily generic, but applications often need to be efficient. Particularly if they're real-time or HPC. Even mundane programmers would not dream of using a generic library that includes sorts they'll never refer to in, say, an e-mail client or a game. They'll write their own.
One of the reasons people will choose a malloc() like hoard, or an accelerated library like liboil is that the standard stuff is crappy for anything but doing standard stuff. This isn't the fault of the glibc folks, it's the fault of computers for not being infinitely fast and the fault of code not being absolutely identical between tasks.
The reason a lot of these rules were developed was that you needed to be able to write reusable code that also had a high degree of correctness. Today, you STILL need to be able to write reusable code that also has a high degree of correctness. If anything, the need for correctness has increased as security flaws become all the more easily exploited, and the need for reusability has increased as code bases are often just too large to be refactored on every version. (Reusability is just as important between versions as it is between programs - a thing coders often forget, forcing horrible API and ABI breakages.)
The reason that software today is really no better, stability-wise, than it was 15-30 years ago is that new coders think they can ignore the old lessons because they're "doing something different", only to learn later on that really they aren't.
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
And a few more examples of cargo-cultism, from people who were untrained to understand what they're doing, but someone thought it was ok because the Java standard library does it for them anyway.
1. The same Wally 1 from the previous story had written basically this method:
Then he called it like this, to try to get around an immutable field in an object. Let's say we have an object called Name, which has an immutable String. So you create it with that string and can't change it afterwards. You have a getName() but not a setName() on it. So he tried to do essentially:
I understand that he worked a week on trying to debug into why it doesn't work, until he asked for help.
2. From Ted's aforementioned project:
So they used the wrapper classes like Integer or Character all over the place instead of int or char. This was back in Java 1.3 times too, so there was no automatic boxing and unboxing. The whole code was a mess of getting the values boxed as parameters, unboxing them, doing some maths, boxing the result. Lather, rinse, repeat.
I ask what that's all about.
"Oh, that's a clever optimization Ted came up with. See, if you have the normal int as a parameter, Java copies the whole integer on the stack. But if you use Integer it only copies a pointer to it."
AAARGH!
A polar bear is a cartesian bear after a coordinate transform.
You don't need long division in normal life. Regardless of if you are in a math heavy career or not, you aren't going to waste your time doing it by hand, you'll use a calculator which is faster and more accurate. However, you need to learn it. You need to understand how division works, how it's done. Once you learn it, you can leave it behind and automate it, but it is still important to learn. An understand of higher level math will likely be flawed if basic concepts aren't learned properly.
This is one of my favourite quotes:
"The First Rule of Program Optimization: Don't do it. The Second Rule of Program Optimization (for experts only!): Don't do it yet." - Michael A. Jackson
That being said, when I hit the experts only situation I can usually get 2 orders of magnitude improvement in speed. I just then have to spend the time to document the hell out of it so that the next poor bastard who maintains the code can understand what on earth I've done. Especially given that all too often I am this poor bastard.
Um, I'm pretty sure quicksort is still the go-to sort simply because it's the implementation that's built into almost every single programming environment. Then again honestly, I'd say that from the point of view of a pragmatic programmer... it doesn't matter. There's a built-in fuction (whether it's qsort() in the C standard library, or Arrays.Sort() in Java, or whatever) that will take your array and return it, sorted. If your app runs too slow and you profile it and it turns out the speed problem is in the sorting AND you can't find a better algorithm that doesn't depend so much on sorting... THEN you look at optimising it. Never forget the two cardinal rules of optimising:
1) Don't optimise.
2) (Experts only:) Optimise later.
Or as I once read it eloquently expressed:
1) Make it work.
2) Make it work right.
3) Make it work fast.
Rampant carbon sequestration destroyed the Dinosaurs' tropical paradise. I'm here to help repair the damage.