Slashdot Mirror


Knuth Got It Wrong

davecb writes "Think you've mastered the art of server performance? Think again. Poul-Henning Kamp, in an article at ACM Queue, finds an off-by-ten error in btrees, because they fail to take virtual memory into account. And he solves the problem in the open source 'Varnish' HTTP accelerator, for all of us to see and use."

25 of 298 comments (clear)

  1. as Knuth told me when I was at his house by commodoresloat · · Score: 5, Funny

    "Who the hell are you and what are you doing in my house?"

    1. Re:as Knuth told me when I was at his house by interval1066 · · Score: 4, Funny

      I wrote Knuth an email once. He never wrote me back.

      --
      Python: 'And then suddenly you have a language which says "we're all stuck with whatever the whiniest coder wants".'
    2. Re:as Knuth told me when I was at his house by TheGratefulNet · · Score: 5, Funny

      I called Wirth, once. but I think I called him by name and not by value.

      perhaps I made the wrong judgement call; my phone overflowed.

      --

      --
      "It is now safe to switch off your computer."
  2. Crank it to 11 by MrEricSir · · Score: 4, Funny

    10 times faster? Yawn. Wake me up when it's 11 times faster.

    --
    There's no -1 for "I don't get it."
    1. Re:Crank it to 11 by PatPending · · Score: 5, Funny

      10 times faster? Yawn. Wake me up when it's 11 times faster.

      And wake me up when it's 1010 times faster.

      --
      What one fool can do, another can. (Ancient Simian Proverb)
    2. Re:Crank it to 11 by Nadaka · · Score: 5, Funny

      That has to be the one of the better binary jokes around.

  3. Nope, he didn't by siddesu · · Score: 5, Informative

    Knuth's analysis is valid in the framework of his assumptions, and what is described in the linked article has been known as "cache oblivious b-tree" for not so short time.

    1. Re:Nope, he didn't by SashaMan · · Score: 5, Insightful

      Mod parent up. There was a comment on the article that referenced cache oblivious algorithms, which was a new concept to me and very interesting. Basically, this set of algorithms assumes a memory hierarchy (e.g. fast ram vs slow disk) that is optimized to limit the number of times the slower memory is accessed. Importantly, cache oblivious algorithms are optimal REGARDLESS of the size of the cache. That's opposed to a cache aware algorithm, like a normal b-tree, where the size of each node is set according to the page size of the machine.

      A very helpful overview here from MIT Opencourseware: http://www.catonmat.net/blog/mit-introduction-to-algorithms-part-fourteen/

    2. Re:Nope, he didn't by Darinbob · · Score: 4, Insightful

      Agreed. Very often these results get misused and misunderstood. Every operation has a cost, and you need to know what those costs are before you decide which one of those gets measured.

      For instance, it is common with sorting algorithms to measure (in Big-Oh notation) how many comparison operations there are. However these may not necessarily be very expensive relative to other operations; such as the cost of swapping two elements. If the elements are large structures and the comparison keys are integers, then you should be concerned about the number of copies. But if the elements are pointers to structures and the keys are long strings, then the concern will be with the number of comparisons.

      Almost every text book on algorithms you see is going to assume some sort of idealized computing device. In the real world there are multiple levels of memory speeds; caches up to virtual memory to disk. There are also other external factors that must be taken into account as well; for instance if you don't have enough memory, virtual or not, to hold all the data at once. Say you're sorting a database selection; or you're on a mainframe with limited memory space but reading customer data from tapes.

    3. Re:Nope, he didn't by yuhong · · Score: 5, Insightful

      Especially when virtual memory did not exist 46 years ago.

  4. Knuth didn't get it wrong by gnasher719 · · Score: 4, Insightful

    You should have read a bit further to the bit with B-trees and B*-trees.

    1. Re:Knuth didn't get it wrong by ImABanker · · Score: 5, Informative

      How does one get from, "I have not found a systematic flaw in the quality of Knuth et al.'s reasoning" to "Knuth Got it Wrong"?

    2. Re:Knuth didn't get it wrong by Bigjeff5 · · Score: 4, Funny

      Laziness?

      Come on, it's Slashdot!

      --
      Security is mostly a superstition... Avoiding danger is no safer in the long run than outright exposure. - Helen Keller
  5. don't use swap, doofs by conspirator57 · · Score: 4, Insightful

    article summary:

    "disk is slower than RAM. some doofs don't realize their system is swapping. ergo algorithm is bad."

    throw in 'Knuth is wrong' to generate page hits.
    ???
    profit.

    non-sensationalized takeaway: "remember swap is slow; try not to use it."

    --
    "If still these truths be held to be
    Self evident."
    -Edna St. Vincent Millay
  6. Credit where credit is due... by Anonymous Coward · · Score: 5, Informative
  7. Agreed by eldavojohn · · Score: 5, Insightful

    Knuth's analysis is valid in the framework of his assumptions, and what is described in the linked article has been known as "cache oblivious b-tree" for not so short time.

    Yeah, using this logic, once quantum computers get to the point of solving general problems there's going to be an awful lot of people who "got it wrong" because their algorithms do not apply in a quantum computer. Advancements in technology causing algorithms & data structures to be tweaked means the original person who thought them up "got it wrong"? Ridiculous.

    "Oh, RSA was brilliant. But they got it wrong. You see, they failed to account for xanxinglydiumapping in 2056 computers. Poor stupid wrong bastards."

    --
    My work here is dung.
  8. Knuth didn't get anything wrong by jfengel · · Score: 5, Insightful

    There isn't any "off by ten error", and this isn't telling us anything we don't already know (in CS terms): implementation on an actual computer can be different in performance from an abstract machine.

    What the author is saying (quite well) is that the virtual memory performance amounts to cache misses, which cause extra performance overhead. He found a case where it was significant and got a 10x speedup in his particular application.

    The article is a little over-zealous its characterization, though it's careful to note that this is not actually a theoretical novelty. The summary, on the other hand, bastardizes and exaggerates it.

    The article is interesting, and worth reading, but if you RTFS without RTFA you'll be dumber than you were before. Thanks, kdawson.

  9. Re:Theoretical performance vs real-world performan by DragonWriter · · Score: 5, Informative

    Yes it's true. In some real-world applications an algorithm encounters it's worst case running time more than the predicted theoretical average case running time.

    That's actually not the issue.

    The issue is that in the real world, the assumptions underlying the calculation of algorithmic complexity may not hold. For instance, Knuth's analysis that the author of the article here holds to be misleading (not, as the Slashdot title suggests, "wrong") calculates the complexity based on the assumption of ideal random access memory, that is, memory for which all accesses are equal cost.

    In real-world computers using caches (as pretty much all real-world computers do, often at many different levels) that assumption does not hold -- access to some parts of the memory address space are much more expensive than accesses to other parts of the address space, and which parts of the address space are expensive changes over time (and how it changes over time potentially depends on the caching strategy used at every level lower than the level at which the algorithm is implemented.)

    This means that algorithms in the real world can scale worse than their theoretical "worst-case", if that theoretical worst-case scaling is based on the assumption of constant memory access cost, since that assumption does not hold in the real world.

  10. Why trust the OS? by michaelmalak · · Score: 4, Interesting

    I'm not sure what the advantages are of dancing around with the OS's virtual memory system. If it were me, I would detect the amount of physical memory available, allocate 80% of it, lock it into physical memory and manage the paging myself. It would be portable, not subject to changes in the VM algorithm by the OS authors, and easier to directly monitor and improve performance. But that's just me.

  11. Re:Are you serious? Do you even know who phk is? by McNihil · · Score: 4, Insightful

    To place algorithm theory in the same space as implementation is just plain wrong... and the article does say that in a better way than kdawson's sensationalist header.

    The implementation of a particular algorithm on specific hardware is more inline with resource economy than anything else and to subject comparison to the theoretical implementation is just ludicrous.

    For instance one could with simple means device a unit where only bubble-sort would be possible because of the massive overhead of qsort on said machine. This is trivial... for instance Casio Fx180 calculator.

  12. This headline is silly by MadAndy · · Score: 4, Informative
    Knuth's stuff assumes everything is RAM-resident - as long as you don't violate that what he wrote is as valid as ever. I'm quite certain that he'll have different suggestions for tasks involving storing data on disk. Even though the disk writes are implicit because the OS is doing them, they're still there, just as they would be had he coded them explicitly. So of course you're going to get poor performance using a RAM-resident algorithm for a disk-resident application.

    The RAM resident stuff is still useful, both at a lower level, but also for those of us creating applications that can live entirely in memory. A web site I did recently is careful to ensure that the entire primary data set can fit in memory, and for that site everything he wrote is still perfectly valid.

    In fact, for very high performing websites you try to ensure that at least most of your requests come from memory rather than disk, which makes Knuth's stuff more important than ever. If you can't do it in RAM then you'd better have a lot of spindles!

  13. Many servers by tepples · · Score: 4, Informative

    Unless you have many servers

    The article does in fact mention many servers, specifically replacing twelve Squid servers with three Varnish servers.

    it is cheaper to throw money at the problem in the form of physical RAM before you start thinking about problems like this.

    No matter how much physical RAM you have, you're still limited by the speed of the interface between RAM and the CPU. If a set of closely related nodes can fit in a cache line, this improves locality so that the CPU can do more work instead of twiddling its thumbs.

  14. Alas, no more by l00sr · · Score: 4, Interesting

    Unfortunately, he no longer gives out reward checks for finding bugs in his texts. This seems to be mostly because proud bug-finders inevitably post images of the checks online, which of course, contain Knuth's bank account numbers. More discussion here.

  15. Re:Careful there... by Rakishi · · Score: 4, Informative

    No, you're missing the point. And wrong. And also missing a lot knowledge about what O() means.

    It doesn't matter how many orders of magnitude slower an operation is. O() is about SCALING for input size n. That's all. Constant factors do not matter. The GP already said this, please pay bloody attention next time. The running time of each operation is just a constant and has no impact on the O() performance. An order of magnitude or fifty is all the same. You talk about 1/100 speed and compare n to n^2. Hahahaha. Take n = 1000000. You know what the difference between n and n^2 is then? 1000000. Your puny factor of 100 is irrelevant against six orders of magnitude performance difference.

    The worst case performance does not change. It's still O(n) or O (n^2). In fact it's quite possible for an O(n^2) algorithm to be faster than an O(log(n)) algorithm for small n under certain conditions. O is all about n being so bloody large that constant factors don't matter anymore.

    This is all basic introductory algorithms stuff, please read up on it before trying to chime in on any arguments related it it in the future, okay?

  16. The smell of slashdot in the morning... by phkamp · · Score: 4, Informative

    What a misleading title, it is not even in the same continent as the article.

    A large number of people obviously didn't read the actual article.

    And I guess Knuth has quite a fanboi community on slashdot. I wonder if he really appreciates that ?

    Some of those who did read the article, does not seem to know the difference between a binary heap and a binary tree, and even the pretty strong clue to the difference in the text, did not make them go check wikipedia. 10 out of 10 for selfesteem, but 0 out of 10 for clue.

    Those who think CS should be unsullied by actual computers should make sure to note this belief on their resume. (Trust me, everybody you send your resume to will appreciate that.)

    Those who advocate getting rid of Virtual Memory must have much more money for RAM than is sensible. I wish I could afford that.

    About five comments tries, in vain it seems, to explain the point of my article to the unwashed masses (kudos!, but really, what are you doing here ?)

    Not one comment rises to a level where I feel a need to answer it specifically. On Reddit over the weekend there were about a handful.

    Is that really par for the course on slashdot these days ?

    Sic transit gloria mundi...

    Poul-Henning

    --
    Poul-Henning Kamp -- FreeBSD since before it was called that...