Knuth Got It Wrong
davecb writes "Think you've mastered the art of server performance? Think again. Poul-Henning Kamp, in an article at ACM Queue, finds an off-by-ten error in btrees, because they fail to take virtual memory into account. And he solves the problem in the open source 'Varnish' HTTP accelerator, for all of us to see and use."
"Who the hell are you and what are you doing in my house?"
Knuth's analysis is valid in the framework of his assumptions, and what is described in the linked article has been known as "cache oblivious b-tree" for not so short time.
10 times faster? Yawn. Wake me up when it's 11 times faster.
And wake me up when it's 1010 times faster.
What one fool can do, another can. (Ancient Simian Proverb)
link
That has to be the one of the better binary jokes around.
Knuth's analysis is valid in the framework of his assumptions, and what is described in the linked article has been known as "cache oblivious b-tree" for not so short time.
Yeah, using this logic, once quantum computers get to the point of solving general problems there's going to be an awful lot of people who "got it wrong" because their algorithms do not apply in a quantum computer. Advancements in technology causing algorithms & data structures to be tweaked means the original person who thought them up "got it wrong"? Ridiculous.
"Oh, RSA was brilliant. But they got it wrong. You see, they failed to account for xanxinglydiumapping in 2056 computers. Poor stupid wrong bastards."
My work here is dung.
There isn't any "off by ten error", and this isn't telling us anything we don't already know (in CS terms): implementation on an actual computer can be different in performance from an abstract machine.
What the author is saying (quite well) is that the virtual memory performance amounts to cache misses, which cause extra performance overhead. He found a case where it was significant and got a 10x speedup in his particular application.
The article is a little over-zealous its characterization, though it's careful to note that this is not actually a theoretical novelty. The summary, on the other hand, bastardizes and exaggerates it.
The article is interesting, and worth reading, but if you RTFS without RTFA you'll be dumber than you were before. Thanks, kdawson.
That's actually not the issue.
The issue is that in the real world, the assumptions underlying the calculation of algorithmic complexity may not hold. For instance, Knuth's analysis that the author of the article here holds to be misleading (not, as the Slashdot title suggests, "wrong") calculates the complexity based on the assumption of ideal random access memory, that is, memory for which all accesses are equal cost.
In real-world computers using caches (as pretty much all real-world computers do, often at many different levels) that assumption does not hold -- access to some parts of the memory address space are much more expensive than accesses to other parts of the address space, and which parts of the address space are expensive changes over time (and how it changes over time potentially depends on the caching strategy used at every level lower than the level at which the algorithm is implemented.)
This means that algorithms in the real world can scale worse than their theoretical "worst-case", if that theoretical worst-case scaling is based on the assumption of constant memory access cost, since that assumption does not hold in the real world.
How does one get from, "I have not found a systematic flaw in the quality of Knuth et al.'s reasoning" to "Knuth Got it Wrong"?