Knuth Got It Wrong
davecb writes "Think you've mastered the art of server performance? Think again. Poul-Henning Kamp, in an article at ACM Queue, finds an off-by-ten error in btrees, because they fail to take virtual memory into account. And he solves the problem in the open source 'Varnish' HTTP accelerator, for all of us to see and use."
If were talking about swap then nevermind, nothing you can do there. The filesystem is the swap.
I'm not sure what the advantages are of dancing around with the OS's virtual memory system. If it were me, I would detect the amount of physical memory available, allocate 80% of it, lock it into physical memory and manage the paging myself. It would be portable, not subject to changes in the VM algorithm by the OS authors, and easier to directly monitor and improve performance. But that's just me.
I wrote Knuth an email once. He never wrote me back.
As with the xkdc.org reference in the grandparent, this is something of an in joke. Don doesn't do email. B-)
Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way
Squid is his use case to beat? Has he been in any high performance app stack in the past five years?
Squid... really?
Considering Varnish was written due primarily to the lacking performance of Squid, I don't see why this is so bad?
Other then varnish and squid what caching reverse proxy software is there? It looks like Nginx added caching to their core recently, although I'm not exactly sure if its intended to act in the same capacity as squid and varnish or to be more like mod_cache for lighttpd. I guess you could use apache and mod_proxy but that's not exactly high performance. I know my employer looked at the various offerings and we ended up writing our own on top of a webserver called OKWS.
I forget the exact amount, but it was like PI or E dollars for every typo. I am not sure what the payment is for an algorithmic error.
Yes, you are correct. I think the author could have just showed figures 5 & 6 and said obviously figure 6 is better and I would have gotten the point.
Instead I had to read a bunch of text and timing charts before he simply showed what his improvement was. Yes, cache oblivious is worse, but that wasn't the problem Knuth was trying to solve. You could make further arguments that the paging system should group links by location AND popularity. You could move more popular links to the top of the tree so you don't have to traverse past the first page to find them. Also, I would think different applications would have different potential improvements. Algorithms for hosting links to news articles (where newer articles are more popular) might not work well with this algorithm when every newly inserted link ends up at the bottom of the tree.
On the flip side, most people don't care. Unless you have many servers, it is cheaper to throw money at the problem in the form of physical RAM before you start thinking about problems like this.
Much of this also depends on the relative size of the data elements compared to sizes of page or cache lines.
For instance if the b-tree node is roughly the size of a page, the entire node must be read into memory at once from disk. Even reading a single byte of the record may necessitate reading the entire page. However this size will likely be much larger than a cache line. So you could have quite a lot of variance in performance based on how much of the structure you access; say scanning through the whole node versus just reading a few words at the top.
Unfortunately, he no longer gives out reward checks for finding bugs in his texts. This seems to be mostly because proud bug-finders inevitably post images of the checks online, which of course, contain Knuth's bank account numbers. More discussion here.
He uses email; his secretary prints them out (after some selection) and forwards them. And he reacts to them. Well, at least, to some, if he knows you. Don is the most humble genius that I know, and I have always found him approachable in a way that is rare for other famous people.
He also uses email (typically via his secretary, again) to organize trips. I also have direct emails from him when some new TeX developments irks him.
Joachim
People don't write Manifestos any more -- what's going on in this world? [Frank Zappa]
I think "got it wrong" in the past tense is a little harsh, but I would say that a LOT of stuff in Knuth is wrong today, or at least terribly obsolete. It's not useless to be aware of, but you don't want to assume that everything is today still exactly the way Knuth described it in the digital stone age.
I mean, just for example, Knuth spends whole chapters talking about sorting algorithms. Almost all modern programmers don't have any use for that stuff, because re-implementing sort in your application code is pointless and counterproductive. The built-in sort provided by any modern language is optimized at a lower level and will handily beat the pants off anything you can do. Heck, even the Schwartzian Transform is built in to most modern languages these days.
Okay, sure, the guys who build low-level tools (compilers and system libraries and such) still need to know about sorting routines. What percentage of the programmer population is that? 0.1%? And even there, Knuth is somewhat obsolete, because (among other things) the stuff you're sorting is almost certainly stored in high-level cross-platform data structures, not some flat array of machine integers. I mean, it's still true that you probably don't want to code a bubble sort, but that's not exactly a profound revelation.
Cut that out, or I will ship you to Norilsk in a box.