Slashdot Mirror


The Law of Leaky Abstractions

Joel Spolsky has a nice essay on the leaky abstractions which underlie all high-level programming. Good reading, even for non-programmers.

18 of 437 comments (clear)

  1. Informative by Omkar · · Score: 5, Insightful

    Although I used to program as a hobby, my eyes bugged out when I saw this article. It's actually quite interesting; I finally realize why the hell people program in lower level languages.

    One point that I think could be addressed is backward compatibilty. I really know nothing about this, but don't the versions of the abstractions have to be fairly compatible with each other, especially on a large, distributed system? This extra abstraction of an abstraction has to be orders of magnitude more leaky. The best example I can think of is Windows.

    1. Re:Informative by CynicTheHedgehog · · Score: 5, Insightful

      Exactly. The only way to do something more easily or more efficiently is to restrict your scope. If you know something about a particular operation, or if you can make a few assumptions about it, your life because much easier. Take sorting, for example. Comparison sorts run (at best) in Omega(n log n) time. However, if you know the maximum range of numbers k in a set of length n, and k is much smaller than n, you can use a counting sort and do it in Theta(n) time. But what happens if you put a k+1 number in there? Well, all hell breaks loose.

      Another example: Java provides a pretty nifty mail API that you can use to create any kind of E-mail you can dream up in 20 lines of code or so. But you only ever want to send E-mail with a text/plain bodypart and a few attachments. So you make a class that does just that, and save yourself 15 lines of code every time you send mail. But suppose you want to send HTML E-mail, or you want to do something crazy with embedded bodyparts? Well it's not in the scope, so it's back to the old way.

      In order to abstract you have to reduce your scope somehow, and you have to ensure that certain parameters are within your scope (which adds overhead). And sometimes there's just nothing you can do about that overhead (like in TCP). And occasionally (if you abstract too much) you limit your scope to the point where your code can't be re-used.

      And as you abstract you tend to pile up a list of dependencies. Every library you abstract from needs to be included in addition to your library (assuming you use DLLs). So yes, there are maintenance and versioning headaches involved.

      Bottom line: non-trivial abstraction saves time up front, but costs later, mostly in the maintenance phase. There's probably some fixed kharmic limit to how much can be simplified beyond which any effort spent simply in displaces the problem.

  2. The underlying problem with programming by Jack+Wagner · · Score: 5, Insightful

    I'm of the idea that the whole premise that high-level tools and high level abstraction coupled with encasulation are the biggest bane of the software industry. We have these high level tools which most programmers really don't understand and are taught that they don't need to understand in order to build these sophisticated products.

    Yet, when something goes wrong with the underlying technology they are unable to properly fix their product because all they know is some basic java or VB and they don't understand anything about sockets or big-endian/little endian byte alignment issues. It's no wonder todays software is huge and slow and doesn't work as advertised.

    The one shining example of this is FreeBSD, which is based totally on low level C programs and they stress using legacy program methodologies in place of the fancy schmancy new ones which are faulty. The proof is in the pudding, as they say, when you look at the speed and quality if FreeBSD, as opposed to some of the slow ponderous OS's like Windows XP or Mac OSX.

    Warmest regards,
    --Jack

    --


    Wagner LLC Consulting Co. - Getting it right the first time
    1. Re:The underlying problem with programming by binaryDigit · · Score: 4, Insightful

      Well I'd agree up to a point. The fact is that FreeBSD is trying to solve a different problem/attract a different audience than XP/OSX. If FreeBSD was forced to add all the "features" of the other two in an attempt to compete in that space, then it would suffer mightily. You also have to take into account the level/type of programmers working on the these projects. While FreeBSD might have a core group of seasoned programmers working on it, the other two have a great range of programming experience working on it. A few guys who know what they're doing working on a smaller featureset would always produce better stuff than a large group of loosely coupled and widely differing talents working on a monsterous feature set.

    2. Re:The underlying problem with programming by jorleif · · Score: 5, Insightful

      The real problem is not the existance of high-level abstractions, but the fact that many programmers are unwilling or unable to understand the abstraction.

      So you say "let's get rid of encapsulation". But that doesn't solve this problem, because this problem is one of laziness or incompetence rather than not being allowed to touch what's inside the box. Encapsulation solves an entirely different problem, that is the one of modularity. If we abolish encapsulation the same clueless programmers will just produce code that is totally dependent on some obscure property in a specific version of a library. They still won't understand what the library does, so we're in a worse position than when we started.

    3. Re:The underlying problem with programming by Yokaze · · Score: 5, Insightful

      Don't blame the tools.

      High level languages and abstractions aren't the problem, neither are pointers in low level languages. It's the people, who can't use them.

      Abstraction does mean that you should not have to care about the underlying mechanisms, not that you should not understand them.

      --
      "Between strong and weak, between rich and poor [...], it is freedom which oppresses and the law which sets free"
    4. Re:The underlying problem with programming by radish · · Score: 5, Insightful

      And inevitably, at some point in a programmer's career, they'll come across a system in which the only available development tool is an assembler

      Do you REALLY believe that? Are you mad? I can be pretty sure that in my career I will never be required to develop in assembler. And even if I do, I just have to brush up on my asm - big deal. To be honest, if I was asked to do that I'd probably quit anyway, it's not something I enjoy.

      Sure it's important to understand what's going on under the hood, but you have to use the right tools for the right job. No one would cut a lawn with scissors, or someones hair with a mower. Likewise I wouldn't write a FPS game in prolog or a web application in asm.

      The real point is that people have to get out of the "one language to code them all" mentality - you need to pick the right language and environment for the task at hand. From a personal point of view that means haveing a solid enough grasp of the fundamentals AT ALL LEVELS (i.e. including high and low level languages) to be able to learn the skills you inevitably won't have when you need them.

      Oh, and asm is just an abstraction of machine code. If you're coding in anything except 1's and 0's you're using a high(er) level language. Get over it.

      --

      ---- Den ene knappen er powerknapp, den andre er Bender voice knapp "Bite My Shiny Metal Ass"

    5. Re:The underlying problem with programming by Junks+Jerzey · · Score: 5, Insightful

      After 5 years of programming, my favorite language has become assembler - not because I hate HLL's, but rather, because you get exactly what you code in assembler. There are no "Leaky Abstractions" in assembly.

      Ah, but you are wrong, and I'm speaking as someone who has written over 100,000 lines of assembly code. The great majority of the time, when you're faced with a programming problem, you don't want to think about that problem in terms of bits and and bytes and machine instructions and so on. You want to think about the problem in a more abstract way. After all, programming can be extremely difficult, and if you focus on the minute then you may never come up with a solution. And many high level abstractions simply do not exist in assembly language.

      What does a closure look like in assembly? It doesn't exist as a concept. Even if you write code using closures in Lisp, compile to assembly language, and then look at the assembly language, the concept of a closure will not exist in the assembly listing. Period. Because it's a higher level concept. It's like talking about a piece of lumber when you're working on a molecular level. There's no such thing when you're viewing things in such a primitive way. "Lumber" only becomes a concept when you have a macroscopic view. Would you want to build a house using individual molecules or would you build a house out of lumber or brick?

    6. Re:The underlying problem with programming by Chris+Mattern · · Score: 5, Insightful

      > There are no "Leaky Abstractions" in assembly.

      At this point, may I whisper the word "microcode" in your ear?

      Chris Mattern

  3. The ultimate leaky abstraction by nounderscores · · Score: 5, Insightful

    Is our own bodies.

    I'm studying to be a bioinformatics guy with the university of melbourne and have just had the misfortune of looking into the enzymatic reactions that control oxygen based metabolism in the human body.

    I tried to do a worst case complexity analysis and gave up about half way through the krebs cycle.

    When you think about it, most of basic science, some religeon and all of medicine has been about removing layers of abstraction to try and fix things when they go wrong.

  4. TCP for the bored by mekkab · · Score: 5, Insightful

    FIne it's relaibale becasue of acks, timeouts, adaptive re-transmit timeouts that take statistical averages of RTT times, exponential back-off and slow start, window acks which keep track of what bytes are received, etc.

    So in your case of timing out N, re-tx'ing N, and then getting the repsonse to the first N back after sending the second N, you do two things:
    1) Good! You got yr packet!
    2) keep track of how many bytes you have received thsu far (TCP is not sending messages, it is sending a stream)
    3) when you get the response from your second request, discard it, becuase you already received those bytes from the stream.
    4) since you timed out, DON'T use the Round TRip Time for that reponse: slow down your expected RTT time, and THEN start measuring.

    And guess what? If I unplug the NIC of the other machine, there is no reliable way of transmitting that data (assuming your destination machine isn't dual homed)- so I keep streaming bytes to a TCP socket and I don't find out my peer is gone for approx. 2 minutes.
    WOW. There's nothing reliable about that boundary condition!

    my point is TCP is reliable ENOUGH. But I wouldn't equate it with a Maytag warranty. It is not a panacea. Infact, for a closed homogenous network I wouldn't even consider it the best option. But if the boundary conditions fall within the acceptible fudge range (remember Real Time human grade systems are not 100% reliable, only 99.99999% and much of that is achieved through redundancy) your leaks are ok.

    --
    In the future, I would want to not be isolated from my friends in the Space Station.
  5. Time to market is the factor, not elegance by Ars-Fartsica · · Score: 5, Insightful
    This argument is so tired. The downfall of programming is now due to people who can't/don't write C. Twenty years before that the downfall of programming was C programmers who couldn't/wouldn't write assembler.

    The market rewards abstractions because they help create high level tools that get products on the market faster. Classic case in point is WordPerfect. They couldn't get their early assembler-based product out on a competitive schedule with Word or other C based programs.

    1. Re:Time to market is the factor, not elegance by ChaosDiscord · · Score: 5, Insightful
      The market rewards...

      I'd suggest stearing clear of that phrase if your intention is to indicate that something is "good". It's also completes with things like "The market rewards skilled con men who disappear before you realize you've been rooked" and "The market rewards CEOs who destroy a company's long term future to boost short term stock value so he can cash out and retire."

      I'm all in favor of good abstractions, good abstractions will help make us more efficient. But even the best abstractions occasionally fail, and when they fail a programmer needs to be able to look beneath the abstraction. If you're unable to work below and without the abstraction, you'll be forced to call in external help which may cost you any of time, money, showing people you don't entirely trust your proprietary code, and being at the mercy of an external source. Sometimes this trade off is acceptable (I don't really have the foggest idea how my car works, when it breaks I put myself at the mercy of my auto shop). Perhaps we're even moving to a world where you have high level programmers that occasionally call in low level programmers for help. But you can't say that it's always best to live at the highest level of abstraction possible. You need to evaluate the benefits for each case individually.

      You point out that many people complain that some new programmers can't program C, while twenty years ago the complaint was the some new programmers can't program assembly. Interestingly both are right. If you're going to be skilled programmer you should have at least a general understanding of how a processor works and assembly. Without this knowledge you're going to be hard pressed to understand certain optimizations and cope with catastrophic failure. If you're going to write in Java or Python, knowing how the layer below (almost always C) works will help you appreciate the benefits of your higher level abstraction. You can't really judge the benefits of one language over another if you don't understand the improvements each tries to make over a lower level language. To be a skilled generalist programmer, you really need at least familiarity with every layer below the one you're using (this is why many Computer Science desgrees include at least one simple assembly class and one introductory electronics class).

  6. This is an overrated rant about bad coding by PureFiction · · Score: 5, Insightful

    Proper abstractions avoid unintended side-effects by presenting a clean view of the intent and function of a given interface, and not just a collection of methods or structures.

    When I read what Joel wrote about "leaky abstractions" i saw a peice complaining about "unintended side-effects". I don't think the problem is with abstractions themselves, but rather the implementation.

    He lists some examples:

    1. TCP - This is a common one. Not only does TCP itself have peculiar behavior in less than ideal conditions, but it is also interfaced with via sockets, which compound the problem with an overly complex API.

    If you were to improve on this and present a clean reliable stream transport abstraction is would likely have a simple connection establishment interface and some simple read/write functionality. Errors would be propagated up to a user via exceptions or event handlers. But the point I want to make is that This problem can be solved with a cleaner abstraction.

    2. SQL - This example is a straw man. The problem with SQL is not the abstraction it provides, but the complexity of dealing with unknown table sizes when you are trying to write fast generic queries. There is no way to ensure that a query runs fastest on all systems. Every system and environment is going to have different amounts and types of data. The amount of data in a table, the way it is indexed, and the relationship between records is what determines a queries speed. There will always be manual performance tweaking of truly complex SQL simply because every scenario is different and the best solution will vary.

    3. C++ string classes. I think this is another straw man. Templates and pointers in C++ are hard. That is all there is too it. Most Visual Basic only coders will not be able to wrap their minds around the logic that is required to write complex c++ template code. No matter how good the abstractions get in C++, you will always have pointers, templates, and complexity. Sorry Joel, your VB coders are going to have to avoid c++ forever. There is simply no way around it. This abstraction was never meant to make things simple enough for Joe Programmer, but rather to provide an extensible, flexible tool for the programmer to use when dealing with string data. Most of the time this is simpler, sometimes it is more complex (try writing your own derived string class - there are a number of required constructors you must implement which are far from obvious) but the end result is that you have a flexible tool, not a leaky abstraction.

    There are some other examples, but you see the point. I think Joel has a good idea brewing regarding abstractions, complexity, and managing dependencies and unintended side-effects, but I do not think the problem is anywhere near as clear cut as he presents. As a discipline software engineering has a horrible track record of implementing arcane and overly complex abstractions for network programming (sockets and XTI) generic programming (templates, ref counting, custom allocators) and even operating systems API's (POSIX).

    Until we can leave behind all of the cruft and failed experiments of the past, start new with complete and simple abstractions that do not mask behavior, but rather recognize it and provide a mechansim to handle it gracefully, we will run into these problems.

    Luckily, such problems are fixable - just write the code. If joel were right and complex abstractions were fundamentally flawed, that would be a dark picture indeed for the future of software engineering (it is only going to grow ever more complex from here kids - make no mistake about it).

  7. Pessimism gone rampant by jneemidge · · Score: 5, Insightful
    This article reminds me of what I hated most about Jurassic Park (the novel -- the movie blessly omits the worst of it) -- Ian Malcolm's runaway pessimism. The arguments boil down to be very similar. Ian Malcolm says that complex systems are so complex we can't ever understand them all, so they're doomed to fail. Joel Spolsky says that our high-level abstractions will fail and because of that we're doomed to need to understand the lower-level stuff. I have problems with both -- they're a sort of technopessimism that I find particularly offensive, because they make the future sound bleak and hopeless despite volumes of evidence that, in fact, we've been dealing successfully with these issues for decades and they're just not all that bad.

    We have examples of massively complex systems that work very reliably day-in and day-out. Jet airplanes, for one; the national communications infrastructure, for another. Airplanes are, on the whole, amazingly reliable. The communications infrastructure, on the other hand, suffers numerous small faults, but they're quickly corrected and we go on. Both have some obvious leaky abstractions.

    The argument works out to be pessimism, pure and simple -- and unwarrented pessimism to boot. If it were true that things were all that bad, programmers would all _need_ to understand, in gruesome detail, the microarchitectures they're coding to, how instructions are executed, the full intricacies of the compiler, etc. All of these are leaky abstractions from time to time. They'd also need to understand every line of libc, the entire design of X11 top to bottom, and how their disk device driver works. For almost everyone, this simply isn't true. How many web designers, or even communications applications writers, know -- to the specification level -- how TCP/IP works? How many non-commo programmers?

    The point is that sometimes you need to know a _little bit_ about the place where the abstraction can leak. You don't need to know the lower layer exhaustively. A truly competant C programmer may need to know a bit about the architecture of their platform (or not -- it's better to write portable code) but they surely do not need to be a competant assembly programmer. A competant web designer may need to know something about HTML, but not the full intricacies of it. And so forth.

    Yes, the abstractions leak. Sometimes you get around this by having one person who knows the lower layer inside and out. Sometimes you delve down into the abstraction yourself. And sometimes, you say that, if the form fails because it needs JavaScript and the user turned off JavaScript, it's the user's fault and mandate JavaScript be turned on -- in fact, a _good_ high-level tool would generate defensive code to put a message on the user's screen telling them that, in the absence of JavaScript, things will fail (i.e. the tool itself can save the programmer from the leaky abstraction).

    What Ian Malcolm says, when you boil it all down, is that complex systems simply can't work in a sustained fashion. We have numerous examples which disprove the theory. That doesn't mean that we don't need to worry about failure cases, it means we overengineer and build in failsafes and error-correcting logic and so forth. What Joel Spolsky says is that you can't abstract away complexity because the abstractions leak. Again, there are numerous examples where we've done exactly that, and the abstraction has performed perfectly adequately for the vast majority of users. Someone needs to understand the complex part and maintain the abstraction -- the rest of us can get on with what we're doing, which may be just as complex, one layer up. We can, and do, stand on the shoulders of giants all the time -- we don't need to fully understand the giants to make use of their work.

  8. Re:even for non-programmers by jaredcoleman · · Score: 5, Insightful
    Very funny! I agree that the average Joe is still going to be lost with the technical aspects of this article, but the author does generalize...

    And you can't drive as fast when it's raining, even though your car has windshield wipers and headlights and a roof and a heater, all of which protect you from caring about the fact that it's raining (they abstract away the weather), but lo, you have to worry about hydroplaning (or aquaplaning in England) and sometimes the rain is so strong you can't see very far ahead so you go slower in the rain, because the weather can never be completely abstracted away, because of the law of leaky abstractions


    I've heard a lot of people say that they can't believe how many homes, schools, and other buildings were destroyed by the huge thunderstorms that hit the states this past weekend, or that many people died. Hello, we haven't yet figured out how to control everything! American (middle to upper-class) life is a leaky abstaction. We find this out when we have a hard time coping with natural things that shake up our perfect (abstacted) world. That is what we all need to understand.

  9. Argh. by be-fan · · Score: 4, Insightful

    While I usually like Joel's work, I'm pissed about the random jab at C++. For those he didn't read the article, he says something along the lines of

    "A lot of the stuff the C++ committe added to the language was to support a string class. Why didn't they just add a built-in string type?"

    It's good that a string class wasn't added, because that lead to templates being added! And templates are the greatest thing, ever!

    The comment shows a total lack of understanding of post-template, modern C++. People are free not to like C++ (or aspects of it) and to disagree with me about templates, of course, and in that case I'm fine with them taking stabs at it. But I get peeved when people who have just given the language a cursory glance try to fault it. If you haven't used stuff like Loki or Boost, or taken a look at some of the fascinating new design techniques that C++ has enabled, then you're in no place to comment about the language. At least read something like the newer editions of D&E or "The C++ Programming Language" then read "Modern C++" before spouting off.

    PS> Of course, I'm not accusing the author of being unknowledgable about C++ or anything of the sort. I'm just saying that this particular comment sounded rather n00b'ish, so to speak.

    --
    A deep unwavering belief is a sure sign you're missing something...
  10. Re:Here goes.... by Junks+Jerzey · · Score: 4, Insightful

    Please tell me what you think of this - I would honestly like to know.

    I've worked in a way similar to you, and I might still if it were as mindlessly simple to write assembly language programs under Windows as it was back in the day of smaller machines (i.e. no linker, no ugly DLL calling conventions, smaller instruction set, etc.). In addition to being fun, I agree in that assembly language is very useful when you need to develop your own abstractions that are very different from other languages, but it's a fine line. First, you have to really gain something substantial, not just a few microseconds of execution time and an executable that's ten kilobytes smaller. And second, sometimes you *think* you're developing a simpler abstraction, but by the time you're done you really haven't gained anything. It's like the classic newbie mistake of thinking that it's trivial to write a faster memcpy.

    These days, I prefer to work the opposite way in these situations. Rather than writing directly in assembly, I try to come up with a workable abstraction. Then I write a simple interpreter for that abstraction in as high a level language as I can (e.g. Lisp, Prolog). Then I work on ways of mechanically optimizing that symbolic representation, and eventually generate code (whether for a virtual machine or an existing assembly language). This is the best of both worlds: You get your own abstraction, you can work with assembly language, but you can mechanically handle the niggling details. If I come up with an optimization, then I can implement it, re-convert my symbolic code, and there it is. This assumes you're comfortable with the kind of programming promoted in books the _Structure and Interpretation of Computer Programs_ (maybe the best programming book ever written). To some extent, this is what you are doing with your macros, but you're working on a much lower level.