Slashdot Mirror


Knuth Got It Wrong

davecb writes "Think you've mastered the art of server performance? Think again. Poul-Henning Kamp, in an article at ACM Queue, finds an off-by-ten error in btrees, because they fail to take virtual memory into account. And he solves the problem in the open source 'Varnish' HTTP accelerator, for all of us to see and use."

8 of 298 comments (clear)

  1. Nope, he didn't by siddesu · · Score: 5, Informative

    Knuth's analysis is valid in the framework of his assumptions, and what is described in the linked article has been known as "cache oblivious b-tree" for not so short time.

  2. Credit where credit is due... by Anonymous Coward · · Score: 5, Informative
  3. Re:Theoretical performance vs real-world performan by DragonWriter · · Score: 5, Informative

    Yes it's true. In some real-world applications an algorithm encounters it's worst case running time more than the predicted theoretical average case running time.

    That's actually not the issue.

    The issue is that in the real world, the assumptions underlying the calculation of algorithmic complexity may not hold. For instance, Knuth's analysis that the author of the article here holds to be misleading (not, as the Slashdot title suggests, "wrong") calculates the complexity based on the assumption of ideal random access memory, that is, memory for which all accesses are equal cost.

    In real-world computers using caches (as pretty much all real-world computers do, often at many different levels) that assumption does not hold -- access to some parts of the memory address space are much more expensive than accesses to other parts of the address space, and which parts of the address space are expensive changes over time (and how it changes over time potentially depends on the caching strategy used at every level lower than the level at which the algorithm is implemented.)

    This means that algorithms in the real world can scale worse than their theoretical "worst-case", if that theoretical worst-case scaling is based on the assumption of constant memory access cost, since that assumption does not hold in the real world.

  4. This headline is silly by MadAndy · · Score: 4, Informative
    Knuth's stuff assumes everything is RAM-resident - as long as you don't violate that what he wrote is as valid as ever. I'm quite certain that he'll have different suggestions for tasks involving storing data on disk. Even though the disk writes are implicit because the OS is doing them, they're still there, just as they would be had he coded them explicitly. So of course you're going to get poor performance using a RAM-resident algorithm for a disk-resident application.

    The RAM resident stuff is still useful, both at a lower level, but also for those of us creating applications that can live entirely in memory. A web site I did recently is careful to ensure that the entire primary data set can fit in memory, and for that site everything he wrote is still perfectly valid.

    In fact, for very high performing websites you try to ensure that at least most of your requests come from memory rather than disk, which makes Knuth's stuff more important than ever. If you can't do it in RAM then you'd better have a lot of spindles!

  5. Re:Knuth didn't get it wrong by ImABanker · · Score: 5, Informative

    How does one get from, "I have not found a systematic flaw in the quality of Knuth et al.'s reasoning" to "Knuth Got it Wrong"?

  6. Many servers by tepples · · Score: 4, Informative

    Unless you have many servers

    The article does in fact mention many servers, specifically replacing twelve Squid servers with three Varnish servers.

    it is cheaper to throw money at the problem in the form of physical RAM before you start thinking about problems like this.

    No matter how much physical RAM you have, you're still limited by the speed of the interface between RAM and the CPU. If a set of closely related nodes can fit in a cache line, this improves locality so that the CPU can do more work instead of twiddling its thumbs.

  7. Re:Careful there... by Rakishi · · Score: 4, Informative

    No, you're missing the point. And wrong. And also missing a lot knowledge about what O() means.

    It doesn't matter how many orders of magnitude slower an operation is. O() is about SCALING for input size n. That's all. Constant factors do not matter. The GP already said this, please pay bloody attention next time. The running time of each operation is just a constant and has no impact on the O() performance. An order of magnitude or fifty is all the same. You talk about 1/100 speed and compare n to n^2. Hahahaha. Take n = 1000000. You know what the difference between n and n^2 is then? 1000000. Your puny factor of 100 is irrelevant against six orders of magnitude performance difference.

    The worst case performance does not change. It's still O(n) or O (n^2). In fact it's quite possible for an O(n^2) algorithm to be faster than an O(log(n)) algorithm for small n under certain conditions. O is all about n being so bloody large that constant factors don't matter anymore.

    This is all basic introductory algorithms stuff, please read up on it before trying to chime in on any arguments related it it in the future, okay?

  8. The smell of slashdot in the morning... by phkamp · · Score: 4, Informative

    What a misleading title, it is not even in the same continent as the article.

    A large number of people obviously didn't read the actual article.

    And I guess Knuth has quite a fanboi community on slashdot. I wonder if he really appreciates that ?

    Some of those who did read the article, does not seem to know the difference between a binary heap and a binary tree, and even the pretty strong clue to the difference in the text, did not make them go check wikipedia. 10 out of 10 for selfesteem, but 0 out of 10 for clue.

    Those who think CS should be unsullied by actual computers should make sure to note this belief on their resume. (Trust me, everybody you send your resume to will appreciate that.)

    Those who advocate getting rid of Virtual Memory must have much more money for RAM than is sensible. I wish I could afford that.

    About five comments tries, in vain it seems, to explain the point of my article to the unwashed masses (kudos!, but really, what are you doing here ?)

    Not one comment rises to a level where I feel a need to answer it specifically. On Reddit over the weekend there were about a handful.

    Is that really par for the course on slashdot these days ?

    Sic transit gloria mundi...

    Poul-Henning

    --
    Poul-Henning Kamp -- FreeBSD since before it was called that...