Knuth Got It Wrong
davecb writes "Think you've mastered the art of server performance? Think again. Poul-Henning Kamp, in an article at ACM Queue, finds an off-by-ten error in btrees, because they fail to take virtual memory into account. And he solves the problem in the open source 'Varnish' HTTP accelerator, for all of us to see and use."
"Who the hell are you and what are you doing in my house?"
wow, my mind just got blown
10 times faster? Yawn. Wake me up when it's 11 times faster.
There's no -1 for "I don't get it."
Knuth's analysis is valid in the framework of his assumptions, and what is described in the linked article has been known as "cache oblivious b-tree" for not so short time.
You should have read a bit further to the bit with B-trees and B*-trees.
article summary:
"disk is slower than RAM. some doofs don't realize their system is swapping. ergo algorithm is bad."
throw in 'Knuth is wrong' to generate page hits.
???
profit.
non-sensationalized takeaway: "remember swap is slow; try not to use it."
"If still these truths be held to be
Self evident."
-Edna St. Vincent Millay
You mean people are implementing heap like that ?
That would indeed be stupid. People have been using B-tree http://en.wikipedia.org/wiki/B-tree instead of Binary search trees for the cache access (or sequential disk IO) reason forever.
More recently we have been talking about cache oblivious and cache aware algorithm : http://en.wikipedia.org/wiki/Cache-oblivious_algorithm
I should write an article about Z-order and be on the first page of slashdot...
link
Don't Journaled filesystems themselves use b-trees? Does this mean they could be much faster in theory?
Knuth's analysis is valid in the framework of his assumptions, and what is described in the linked article has been known as "cache oblivious b-tree" for not so short time.
Yeah, using this logic, once quantum computers get to the point of solving general problems there's going to be an awful lot of people who "got it wrong" because their algorithms do not apply in a quantum computer. Advancements in technology causing algorithms & data structures to be tweaked means the original person who thought them up "got it wrong"? Ridiculous.
"Oh, RSA was brilliant. But they got it wrong. You see, they failed to account for xanxinglydiumapping in 2056 computers. Poor stupid wrong bastards."
My work here is dung.
Yes it's true. In some real-world applications an algorithm encounters it's worst case running time more than the predicted theoretical average case running time. This is where case by case optimizations come into play.
Knuth never claimed the algorithm was the best choice in YOUR particular case. Don't drag his name through your sensational mud for the sake of your slashvertisement.
There isn't any "off by ten error", and this isn't telling us anything we don't already know (in CS terms): implementation on an actual computer can be different in performance from an abstract machine.
What the author is saying (quite well) is that the virtual memory performance amounts to cache misses, which cause extra performance overhead. He found a case where it was significant and got a 10x speedup in his particular application.
The article is a little over-zealous its characterization, though it's careful to note that this is not actually a theoretical novelty. The summary, on the other hand, bastardizes and exaggerates it.
The article is interesting, and worth reading, but if you RTFS without RTFA you'll be dumber than you were before. Thanks, kdawson.
I am sympathetic with his argument, but trying to figure out the best algorithm for a given OS and hardware combo is very difficult and usually not portable.
The bitter lessons of a veteran coder: http://bitterprogrammer.blogspot.com
I think you only get it if you find an actual bug in Knuth's software though :)
That's actually not the issue.
The issue is that in the real world, the assumptions underlying the calculation of algorithmic complexity may not hold. For instance, Knuth's analysis that the author of the article here holds to be misleading (not, as the Slashdot title suggests, "wrong") calculates the complexity based on the assumption of ideal random access memory, that is, memory for which all accesses are equal cost.
In real-world computers using caches (as pretty much all real-world computers do, often at many different levels) that assumption does not hold -- access to some parts of the memory address space are much more expensive than accesses to other parts of the address space, and which parts of the address space are expensive changes over time (and how it changes over time potentially depends on the caching strategy used at every level lower than the level at which the algorithm is implemented.)
This means that algorithms in the real world can scale worse than their theoretical "worst-case", if that theoretical worst-case scaling is based on the assumption of constant memory access cost, since that assumption does not hold in the real world.
Algorithm revised in light of real-world performance constraints! Read all about it!
Seriously, we just rewrote a tree (that runs in a high traffic serving environment) this month at work because it wasn't streamlined just right to take full advantage of the underlying architecture. No one will write a paper about it.
Also, hey kids, profile your code.
R-G colorblindness makes it near impossible to distinguish the graphs from each other.
Poul-Henning is definitely not a "doof". He's single-handedly responsible for a huge amount of the FreeBSD kernel. Yes, this is the same FreeBSD that powers Yahoo!, that provided a significant portion of Mac OS X, and runs hundreds of thousands of businesses world-wide.
To suggest that phk doesn't know what he's talking about is absurd. He's one of the top three UNIX gurus in the entire world. In fact, the Internet today is what it is thanks to his hard work and dedication.
I'm not sure what the advantages are of dancing around with the OS's virtual memory system. If it were me, I would detect the amount of physical memory available, allocate 80% of it, lock it into physical memory and manage the paging myself. It would be portable, not subject to changes in the VM algorithm by the OS authors, and easier to directly monitor and improve performance. But that's just me.
I don't have time to read through the article and verify the numbers (at least right now) but anyone who's even paged through TAOCP knows it was written for computers where "swap" was what the operator did when the tape was full. (Ok, they also had drum memory).
Do you even lift?
These aren't the 'roids you're looking for.
If you meet him some day, and you think this stuff is worth it, buy him a beer.
If I have seen further it is by stealing the Intellectual Property of giants.
Good article (increase your locality to get fewer page faults). Stupidly wrong Slashdot headline and summary. "Off by ten error?" Please!
kdawson, next time, RTFA before you post someone's lame summary.
As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
The summary is wrong when it talks about "an off by ten error in btrees". In fact, the article talks about how normal binary heap implementations are slow when virtual memory is taken into account.
In fact, b-trees ARE cache aware and ARE optimized to limit paging on disk. PHK's algorithm is essentially a cache-aware version of a binary heap.
That is, binary tree is to b-tree as binary heap is to PHK's b-heap.
The RAM resident stuff is still useful, both at a lower level, but also for those of us creating applications that can live entirely in memory. A web site I did recently is careful to ensure that the entire primary data set can fit in memory, and for that site everything he wrote is still perfectly valid.
In fact, for very high performing websites you try to ensure that at least most of your requests come from memory rather than disk, which makes Knuth's stuff more important than ever. If you can't do it in RAM then you'd better have a lot of spindles!
Continuing a discussion...
Seems to be that this bheap just reduces the number of pages probably needed from log(n) to log(n)/C where C is the number of levels that are stored in the same page. And for removing a node it may need to access log(n)/C random pages. So this is just a constant factor improvement... it's just that the constant is number of pages so has a large real world cost.
I'd like to get people's thoughts on using timer wheels instead, like the linux kernel uses for timers. Say you take say 8 wheels of increasing spans of time where each wheel is just an unordered list:
insert: one of the 8 wheel list-ending pages must be resident (certainly will be)
delete: just zero out the list entry (probably 1 page-in).
remove min: 1 page to pop the list head at wheel 0, or a bunch of sequential accesses to cascade the expire timers down.
Could some data structure expert please comment on the relative merits of bheap vs timer wheels? Seems to me that a small fixed set of pages and sequential access should be far better in terms of swapping. The OS should be designed to recognize in-order access, especially if each list is a mmap'd file, and the thread blocked does not need to have any locks (you can replace the list with a new one while it is being processed).
I forget the exact amount, but it was like PI or E dollars for every typo. I am not sure what the payment is for an algorithmic error.
"Virtual memory leads to virtual performance."
Just what I always wanted, a B-tree implementation that is guaranteed to swap.
The author of the article doesn't seem to understand the difference.
In a virtual world it lies: I suspect that most Operating Systems lie about what physical hardware:
VMware allows for over-commitment of most hardware. (CPU, Memory, and Hard Disk). Windows allows for over-commitment of Memory
Since this is making assumptions about memory management: (Flash: Various algorithms may be tuned for optimized use in your specific use-case).
In your case of grabbing 80% of memory: This works if you really need the memory - in which case you have to have it: If this forces Windows (linux, whatever) to swap some "system" memory it make slow down system processes (disk write, responding to network requests, running tomcat...) and cause your system to slow down. Perhaps it works for you.
If your "system" resides in VMware, your memory may be a swap file depending on over-commitment. Say some Joe-Rent-a-Network has connected you a server in a "secure" location. In a Co-Lo. On VMware. Now if each system has an application (or two) that grabs 80% of available memory...
Lets say Your 80% memory grab meets the swap file, across a network link to the cheapo SAN...
If I'm Joe Rent-a-Network and you make me have to go figure out why performance sucks for everyone I'm gonna kick your ass.
Now get off my lawn
Parser error.
No folly is more costly than the folly of intolerant idealism. - Winston Churchill
But everyone already knows that (or should). Knuth knows that as well, caches and paging systems are not unknown things to him. He was writing a text book for students, and simplified a complicated problem to the point where it can be studied and analyzed more easily. Similar to teaching students Newtonian physics. It is sensationalist and naive to call Knuth wrong here.
Extra, Extra!
GOTO Successfully Used Without Harm!
Funny how often I wind up using those tags.
Hail Eris, full of mischief...
E pluribus sanguinem
I agree with all of your post except this part. Algorithmic complexity theory is about orders of magnitude, not about precise numbers. That's why we have O(n) as a complexity class but not O(2n), O(3n) as separate classes; saying that an algorithm has O(n) worst-case time performance is saying that the time it takes to run it is approximated by some linear function of the problem size. We don't particularly care which linear function it is, because that depends too closely on your assumptions about the computational model and/or hardware. What we really care about when we call it O(n) is that that's better than something like O(n^2) (a.k.a. polynomial time or just "P") or O(log n), no matter what computational model you assume.
As long as the slow memory accesses in the physical hardware still respect some bound, you can treat that upper bound as the constant worst-case memory access cost, and use the algorithmic complexity analysis to calculate an upper bound on the algorithmic complexity. If we turn your argument on its head, we'd say instead that the actual physical cost of running algorithms is often faster than the complexity class indicates because many memory accesses are much faster than the constant upper bound, but that doesn't make the linear upper bound invalid. The best you might be able to do is to assume a finer-grained computational model with variable cost memory access, and prove that you can get a lower upper bound there, but the original higher upper bound is still an upper bound for the computational model chosen, and can still be useful when reasoning about a system.
Are you adequate?
I'm interested to see if this counts as a true error and thus deserving a check.
Come play free flash games on Kongregate!
"Knuth's analysis is valid in the framework of his assumptions, and what is described in the linked article has been known as "cache oblivious b-tree" for not so short time." - by siddesu (698447) on Monday June 14, @07:29PM (#32572714)
The entire discussion in the article really reminded me of 2 classes from academia: Discrete Math (where you see the Set Theory, Probability Theory, Logic, & the P-NP type material discussed in this article), & DataStructures (where you learn about Binary Trees and implementations via computer algorithms of them (which leads into things like Tri-E etc.)).
I'd absolutely HIGHLY recommend both courses to any CSC student in fact (they BOTH make you *THINK*, & much more than other classes I felt @ least, did...).
Why?
Heh - it was the ONLY things that allowed me to "keep up" with some of the article's material is why... lol! Being "familiar beforehand" tends to help that way, you know?
APK
P.S.=> The MORE DIFFICULT of the two subject though, imo @ least? Discrete Math (for me @ least) - As I felt that you don't just "learn it by rote only" & just "doing your homework" only either, as many other maths are like... this one?? You have to have your "thinking cap on", 24x7 in it... still, it's GOOD stuff! apk
Unless you have many servers
The article does in fact mention many servers, specifically replacing twelve Squid servers with three Varnish servers.
it is cheaper to throw money at the problem in the form of physical RAM before you start thinking about problems like this.
No matter how much physical RAM you have, you're still limited by the speed of the interface between RAM and the CPU. If a set of closely related nodes can fit in a cache line, this improves locality so that the CPU can do more work instead of twiddling its thumbs.
For instance, Knuth's analysis that the author of the article here holds to be misleading (not, as the Slashdot title suggests, "wrong") calculates the complexity based on the assumption of ideal random access memory, that is, memory for which all accesses are equal cost.
No, it assumes memory for which all access are *at worst* a certain equal cost - in other words, memory for which accesses have an upper bound. Knuth's analysis still holds if certain pieces of memory are FASTER than the norm. If we take memory access going through swap as the normal, worst case, upper-bound cost of memory access, and RAM and cache hits as special better cases, the analysis still applies to the real world.
This means that algorithms in the real world can scale worse than their theoretical "worst-case", if that theoretical worst-case scaling is based on the assumption of constant memory access cost, since that assumption does not hold in the real world.
Doesn't it? The theoretical worst case corresponds nicely to the real-world worst case of all memory coming from swap. The (typical) case in which there is also a bunch of fast RAM and even faster CPU caches is just not the worst case.
The lesson here is that a sufficiently large corporation is indistinguishable from government. --ultranova
Just don't quote Monty Python or Munroe will mock you. http://xkcd.com/16/
Unfortunately, he no longer gives out reward checks for finding bugs in his texts. This seems to be mostly because proud bug-finders inevitably post images of the checks online, which of course, contain Knuth's bank account numbers. More discussion here.
...not dead yet, to quote Monty Python.
He's one of the top three UNIX gurus in the entire world. In fact, the Internet today is what it is thanks to his hard work and dedication.
Still, I'd trust Don Knuth over Poul-Henning any day--at least Knuth can spell his own first name correctly.
One assumes Knuth the KDE clone of Gnuth.
Come on!! Anyone who has read last 30 years of database research knows data structures which take into account locality of reference perform better. Any commercial database has lots of similar tricks to provide better performance than home-grown datastructures. The article is technically correct, just not something that is new. Anyone implementing large database systems knows this. Most of this work is done within commercial systems so open sources programming wake up one day and think they discovered something know. Go and read ant SIGMOD or VLDB paper and you will find similar claims.
"No, and no. The data structure described by Henning-Kamp is not a B-tree, but a heap" - by jemfinch (94833) on Monday June 14, @09:13PM (#32573608) Homepage
It does appear from "Figure 5" though that it's a typical "Binary-Tree" construct though... I mean, look at HOW it's structured in its node elements!
I.E.->
---
1.) Start with midpoint of dataset as top most parent node since its a BINARY SEARCH involved (easy enough to determine using two pointers (or array indices really even) & doubling the 2nd pointer/array index as to it's size/position for the 2nd one, & keep on doing it, until it "errs-out"/abends trying to double said 2nd pointer/array index location (vs. the first one), which then you can use a Try-Catch type err handler to yield you the midpoint of said dataset which the first "1/2 sized pointer/array index" will then have the value of (you can do this w/out array bound checking on this way even & NOT "Crash", or with it on (overriding std. structured err-handler of array index out of bounds errs too)))...
You can do this "puny trick" above, in order to find said "midpoints" of datasets, even w/out its TOTAL items count being known beforehand.
2.) Then, send each potential child node off to where they belong comparing to nodes as you go, & placing "smaller-than-compared-to-potential parent" child node candidate item to left of said "potential parent/compared-to" node, & larger child node candidate off to right of said "compared-to potential parent" node possible, in order to place child nodes off parents, etc./et al though.
3.) AND, "off you go", to "the land F A S T binary seeks" of items!
---
(Yes - Oversimplified maybe, but that's as best as I know how to "put it on paper" fast/offhand @ least)
Tell me if I am "off" here, as it's been awhile since I visited this type of material... it seriously re-reminds me of DataStructures class in combination with Discrete Math class materials, which is why I posted this here, originally -> http://developers.slashdot.org/comments.pl?sid=1686094&cid=32573724 !
This type of article?? It's the "WHY" of why I frequent this website's pages...
(As every once in a while, there is a "WILD" article about CSC "breakthroughs & ideas", & what-not, that I like reading! This WAS "one of those times"... good stuff!)
APK
P.S.=> Anyhow/anyways: It was good to see they are into using "SSD's" too, to lessen latencies (been a HUGE fan of those here for decades in software ramdisks, & later with "True SSD's" (that don't use FLASH RAM, which is slower on writes than DDR or PC-1xx RAM is for instance, though caching delay writes can help FLASH one SOME) like the CENATEK RocketDrive &/or the Gigabyte IRAM for example (they're awesome to serve up websites OR DB's for them mind you, in fact, because of such read speeds & LOW latencies + on the type I use (non FLASH RAM based)? F A S T Write too)) & they seem to "lend themselves" to this article's concepts well also (as things like Binary Trees ARE about increasing search speeds too & databases' indexes, iirc, also use them LIKE MAD also and? THEY WORK FOR SPEED!)... apk
Pick your deriding descriptive term.
How in the hell did he ever get promoted or appointed or how ever the hell it is you become one of the anointed few who decide what gets posted on /. and what the headline looks like? I mean really, is he related to someone? Is he the owners kid, nephew, love child?
Hey KID! Yeah you, get the fuck off my lawn!
For a well understood generalization of this "problem" and solutions to it, see http://en.wikipedia.org/wiki/Cache-oblivious_algorithm
For a simpler model which still captures the behaviour that the author talks about see "the external-memory model"
It "models a two-level memory system. The first level of memory (the cache) of size M is partitioned into blocks of size B. The second level (main memory or disk ) is limitless but requires us to pay a charge for each access (memory/block transfer). In this model, we measure performance by the number of memory transfers (cache misses)." [ From http://theory.csail.mit.edu/classes/6.897/spring03/scribe_notes/L15/lecture15.pdf ]
There are well known improvements to data structures in both of these models.
He was writing a text book for students, and simplified a complicated problem to the point where it can be studied and analyzed more easily. Similar to teaching students Newtonian physics.
I don't find it all that similar. Newtonian mechanics describes behavior of the objects that people see within their daily lives to within measurable noise level. Ideal random-access memory does not. It's as if aerospace engineers were taught the version of gravity with g (Earth sea level acceleration) instead of the version with G*m/(r + h)^2 (acceleration given arbitrary planetary mass, planetary radius, and altitude). g works for land vehicles but not spacecraft, and as a lead-in to discussion of G*m/(r + h)^2, but not as the final word on useful applications of the subject.
A 300-GB backing store, memory mapped on a machine with no more than 16 GB of RAM, is quite typical. The user paid for 64 bits of address space, and I am not afraid to use it.
As an undergraduate in the early 1990s, I am sure I was told about "out of core" algorithms within my first two years in school, including how people were starting to apply these to "out of cache" scenarios. In other words, everyone knows and takes for granted that you do not run a RAM-oriented algorithm such that its working set expands beyond the RAM space that has the access metrics you require. And B-trees/B*-trees are the classic data structure to organize data that is too large for your main working RAM. And the HPC community has been tuning hybrid data structures to live in multi-level RAM hierarchies for a long long time. I don't care how much of a "guru" the author is. If he wants to believe virtual memory is a transparent abstraction (a typical unix acolyte mistake), he is ignoring so much computer science; virtual memory is a slightly more graceful failure handler for temporary resource crises. If your average working set ever meets swapping, you are doing it wrong.
This really isn't anything new. Knuth didn't get it "wrong". He based his analysis of the algorithms assuming a system that had dedicated memory and where each instruction of code ran uninterrupted and in a consistent fashion.
Certain memory access patterns are "bad" under a system that uses virtual memory, especially when the base system is memory constrained. This has been a well known fact for decades. In fact one of the maybe lost arts of programming was ensuring reference locality, not only of data, but also of code. It was a common practice to ensure that often called subroutines or functions where either located in same page of memory as the calling code, or to group all the often called functions into as few pages of memory as possible.
Basically, every address space has what is sometimes called a working set, a set of pages that have been recently referenced. There are three things that can happen with a working set. It can remain the same size, it can grow and it can shrink. If it remains the same, there is no additional load to the operating system. If it shrinks, there is no additional load to the operating system, in fact this can help a memory constrained system. A growing working set however an lead to a thrashing system. Some operating systems will monitor working set sizes and can adjust dispatch priorities and execution classes depending on what the recent working set size history is. An application with a growing working set may very will find itself at the end of the queue way behind applications that have a static working set size.
Take for an example the following very simple program
Here the working set of this program will be very small. Ignoring the file i/o routines, all the code and data references will be limited to basically a fixed section of memory. From a virtual memory stand point, this is a "well behaved" application.
Now take the following
Functionally the same program, however the data reference pattern here is all over the place. The working set will be large, since many of the buffer pages will be referenced. The program never stays long on the same memory location.
Finally take the following example
Here there will be an initially huge working set as the data is read in. However, the working set will shrink to a reasonable size once the numbercrunching phase starts since the data references will all be localized to a small block of memory.
What's with this retarded, inflammatory and incorrect summary? Oh right, it was kdawson.
Dear slashdot,
Please fire kdawson and hire someone who's less of a muck-raker to edit your summaries.
Thanks.
No, it assumes memory for which all access are *at worst* a certain equal cost - in other words, memory for which accesses have an upper bound. Knuth's analysis still holds if certain pieces of memory are FASTER than the norm. If we take memory access going through swap as the normal, worst case, upper-bound cost of memory access, and RAM and cache hits as special better cases, the analysis still applies to the real world.
Well, both atan() and += are "operations". But clearly a loop of N atans, while of the same polynomial complexity, will be orders of magnitude slower than a loop of N additions. This is because += is O(1) while atan() is O(nlogn) for some accuracy n iirc. Similarly, there's a complexity difference between accessing L1/L2 cache and accessing main memory over a shared bus. The two will never converge or cross; there is no data set size where atan() will outperform +=. To say that the former is a worst-case cost estimate for the latter is only correct in a discrete mathematical sense, for all practical purposes it's meaningless. An algorithm that can rely on addition will always be superior to one relying on atans, even if of the same complexity. In fact, even an O(nlogn) addition-based algorithm will be superior to an O(n) atan based one - the two will converge at infinity (assuming I'm correct about the complexity of atan), so for any finite (real-world) data set the former will be faster.
I know he's a legend and all (and most deservedly so!), but it's not like he's dead or anything. He's only 72. We could ask him.
Before you ask your questions though, thank him - even if you don't know why. You owe him more than you know.
Help stamp out iliturcy.
I can't wait to read the comments on this article. You can be sure of learning a lot. This is the kind of article I miss being featured on Slashdot more frequently.
The issue is that in the real world, the assumptions underlying the calculation of algorithmic complexity may not hold.
Oh, finally reach chapter 2 in your CS 101 textbook, did you?
This means that algorithms in the real world can scale worse than their theoretical "worst-case", if that theoretical worst-case scaling is based on the assumption of constant memory access cost, since that assumption does not hold in the real world.
Hmmm, yes. Probability and statistics only tend to hold true as the sample size increases. This means that in any given single situation, it is nearly impossible to determine if the statistical predictions are going to be valid.
And when your assumptions you base your analysis are wrong, then your predictions mean squat.
Film at 11.
Timer wheels don't scale
This is somewhat counter-intuitive, but it is nonetheless true, if you are wanting to have a very large number of timers, such as if you wanted a large number of 2 MSL timers because you want to support a huge number of TCP connections.
The problem is that given a number of buckets on the wheel, in order to cancel a timer, you have to either keep them as a doubly-linked list, or you have to run on average 50% of the buckets in the list. For timer lbolts, when the wheel turns to the next slot, you have to run the whole list and check each element to see if its expired, since you can't order the elements in the bucket in time-order to preclude the need to continue running when you hit the first non-expired timer. This is because the elements are (effectively) hashed onto the wheel slot. Even if you have buckets-by-magnitude, it's actually not good enough to represent every resolution equally.
For things like the TCP 2 MSL timer wheels that are used in BSD4.4 and later, for scalability, you are better off going back to the BSD 4.3 tailq-list implementation so that you don't have to run all the entries; as previously noted, most timers are cancelled before they expire, so if the lists are ordered by increasing expiration interval (which they can be, for fixed interval values), your overhead then doesn't go up especially badly as your number of timers outstanding goes up.
I've made this change to the TCP 2 MSL timer implementation at several companies, and got good scalability wins every time. One of those companies was Array Networks, and we sold a commercial reverse proxy cache similar to Varnish, although we did massive kernel work in order to scale it better (as in handling ~36,000 TCP connections a second).
-- Terry
I'm fairly certain that past a certain volume, keeping a significant fraction of requests resident in RAM is not actually possible. Consider sites serving up large media files that are constantly changing, and even with a fairly large budget having RAM to cache all of that content is not reasonable. It's not much larger when having sufficient spindles for naïve algorithms is also not an option.
At this point having intelligent algorithms that understand non-uniform memory access patterns is essential, whether it's L1 - L2 cache, cache to SDRAM, RAM to disk, and even local disks to network storage. This is where algorithms like B*-tree and the B*-heap start to perform much better, since they're designed around such non-uniform memory access. Database engines also have been designing indexes around these principles for decades, although in a very specific way that's much more difficult to use generally.
So while it may be comforting to throw out the RAM card and give up when the reality of budgets and billion-dollar-RAM-caching systems are unobtainable and give up, actually trying to solve the underlying problem is much more interesting.
There might be something to do with the Knuth bounty. The dollar value reward is not particularly high,
but the Knuth reward check is some collector's item worth many time more than the face value.
What a misleading title, it is not even in the same continent as the article.
A large number of people obviously didn't read the actual article.
And I guess Knuth has quite a fanboi community on slashdot. I wonder if he really appreciates that ?
Some of those who did read the article, does not seem to know the difference between a binary heap and a binary tree, and even the pretty strong clue to the difference in the text, did not make them go check wikipedia. 10 out of 10 for selfesteem, but 0 out of 10 for clue.
Those who think CS should be unsullied by actual computers should make sure to note this belief on their resume. (Trust me, everybody you send your resume to will appreciate that.)
Those who advocate getting rid of Virtual Memory must have much more money for RAM than is sensible. I wish I could afford that.
About five comments tries, in vain it seems, to explain the point of my article to the unwashed masses (kudos!, but really, what are you doing here ?)
Not one comment rises to a level where I feel a need to answer it specifically. On Reddit over the weekend there were about a handful.
Is that really par for the course on slashdot these days ?
Sic transit gloria mundi...
Poul-Henning
Poul-Henning Kamp -- FreeBSD since before it was called that...
http://research.microsoft.com/en-us/um/people/trishulc/papers/cache-conscious.pdf
This needs a "small bit of correction" (or does it?)
"2.) Then, send each potential child node off to where they belong comparing to nodes as you go, & placing "smaller-than-compared-to-potential parent" child node candidate item to left of said "potential parent/compared-to" node, & larger child node candidate off to right of said "compared-to potential parent" node possible, in order to place child nodes off parents, etc./et al though." - by Anonymous Coward on Monday June 14, @10:18PM (#32574034)
The bolded portion quoted from my original post actually SHOULD go the opposite, as follows:
smaller-than-compared-to-potential parent" child node candidate item to RIGHT of said "potential parent/compared-to" node, & larger child node candidate off to LEFT of said "compared-to potential parent" node possible, in order to place child nodes off parents
(There! Corrected my own "mistake", before anyone else did...)
APK
P.S.=> Thing is though - do you HAVE to put the smaller child elements to the RIGHT of any parent node item and the larger child elements to the LEFT of any parent node item? I have always wondered IF you could do the opposite and get away with it, and also have it work just fine?? It makes sense to me you actually COULD! apk
Allocating too much memory in userspace so the system starts swapping.
And then he comes up with a more efficient way to use the often swapped in and out memory pages.
Better programming is: keep all the junk you need quick in real memory, leave enough space for filesystem caching as well and prevent the OS from most swap activity.
3 machines with 16GB RAM is much cheaper than 1 machine with 300GB RAM. There's nothing wrong with using swap if you do so wisely.
3x16 = 48, not 300.
But I understand what you want to say.
But if the performance of your app is based upon having a lot of swap activity, you should do it wisely.
Allocating too much and resorting to 'let the OS sort it out' is not the best approach, which leaves space for optimisation. This is such an optimisation.
But better programming would be to ensure the system doesn't swap at all or at a very low swapin swapout rate.
A cache miss isn't just "slower" it flushes the cache and affects subsequent memory access speeds. Thus O(n) can become O(n^2).
Well actually 2 copies would lock 96% of starting available memory. I believe the key word is available,
unless of course you're trying to make a funny and it somewhat overshot the mods of course.
"Ok Kids, come a bit closer! Here you can see the effects of long-term Perl misuse. Horrible, isn't it?
No need to be afraid, it's strong glass he's behind. - by Anonymous Coward on Tuesday June 15, @05:23AM (#32575746)
Per my subject-line above, I actually dislike PERL for "web-work", AND, for system scripting as well (though I do think that CGI bins are safer than javascript work, while doing webpages @ least, but also much slower too typically). Regular expressions work is difficult imo, even with a book helping you is why...
APK
P.S.=> If you thought that the post of mine you had responded to was "bad" or "too techno"? Try my other one on for size here -> http://developers.slashdot.org/comments.pl?sid=1686094&cid=32574034 ... apk
3x16 = 48, not 300. But I understand what you want to say.
I suspect you don't. The point is that TFA mentions 300GB VM, on a machine with 16GB physical RAM.
Three such machines might match one machine with 300GB physical RAM, at a much lower price.
3x16 is irrelevant.
"Unless you have many servers, it is cheaper to throw money at the problem in the form of physical RAM before you start thinking about problems like this."
You are thinking just about a single instance. You may be right for a single instance but that is not the right way to think about writing code.
If his performance claims are true then this is well worth it. He used this in a web proxy. The first place that used this web proxy saved no less then 75% of the resources needed for the task.
Multiply that by however many people can use that proxy and you get a lot of savings in power used and machines needed to get the same task done.
This is one of the great but often under achieved potentials in FOSS. Hopefully this can be applied to other tasks. Maybe even in the STL.
Once that happens you have a classic case of "a rising tide lifts all boats."
This kind of optimistically is exactly what should be done. The enter idea behind first structured and now OOP is code reuse. That ideal really is or should be at the heart of FOSS and frankly of CS in general.
See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
46 years is a long time in CS.
During which time Knuth could have put out volume 4 a little quicker, explaining the effects that memory hierarchy has on the algorithms of previous volumes.
Or is it a stupid Knuth?
These posts express my own personal views, not those of my employer
I agree. In GP I was clarifying what the issue was, not defending the idea in the TFS headline that "Knuth was wrong". A more accurate (though less headline-worthy) title would be "A naive reading of Knuth may be more misleading than most people are aware."
It's not a very useful heap if it has the middle value at the top.
http://en.wikipedia.org/wiki/Heap_(data_structure)
As usual, Hacker News had it 1) first 2) correct and 3) full of comments from people who obviously RTFA.
So perhaps someone can tell me why I still read Slashdot?
According to the article Slashdot uses an HTTP cache called Varnish, which uses the B-heap. Can any /. ed/admin comment on how effective it is?
Also, this idea seems similar to the locality optimizations proposed in algorithms like merge sort http://en.wikipedia.org/wiki/Merge_sort#Optimizing_merge_sort.
See this:
http://en.wikipedia.org/wiki/Binary_heap
"It's not a very useful heap if it has the middle value at the top." - by rakslice (90330) on Tuesday June 15, @01:09PM (#32580468) Homepage
See that URL at the top, because what's noted in this article is NOT a "std. heap", but instead, a BINARY heap... & it IS built up from a BINARY TREE structure!
Additionally, those ARE basically constructed in the manner I noted doing comparisons of each "potential parent node" to it's "potential children node(s)" beneath it, usually using "Greater than" or "Less than" type comparisons & putting items to the left & right of said "potential parent node(s)". IN fact, see this from that same URL on BINARY HEAPS:
http://en.wikipedia.org/wiki/Binary_heap
"The heap property: each node is greater than or equal to each of its children according to some comparison predicate which is fixed for the entire data structure."
(Just like a Binary Tree is constructed & as I noted in my orig. post)
The "pertinent quote" from said Wiki is this:
http://en.wikipedia.org/wiki/Binary_heap
A binary heap is a heap data structure created using a binary tree.
(And, what do BINARY trees need? A pivot, like many sorts do, and that is usually the "mid-most value", because "binary search patterns", depend on that...)
Hence, why I noted to use my "puny trick" IF NEEDED (not always needed, as for instance in DB work, rowcounts in datasets ARE available), to find said PIVOT midpoint value IF you do not know the total # of elements in your dataset beforehand (or, if it changes, such as values being eliminated during say, deduplications of elements/records of said dataset & say, DB style indexing + reindexing is NOT in place in your app)...
APK
P.S.=> However, I did state "correct me if I am 'off'" in my first post, so "have @ it" guys (as rakslice has attempted to do, but, he used a std. heap NOT what this article's about in BINARY heaps usage)!
(Here? I am PRETTY SURE I am correct, but, I can stand new ideas & corrections as much as the next guy can @ times & though I have worked with Binary Trees more than a few times? Binary Heaps I have not, so... keep those "critiques" coming if you wish!)
So again: "have @ it" with the critiques, especially in material of this nature & the fact I have not "visited it" in QUITE SOME TIME (it's the FUN stuff imo!)... apk
You actually have it backwards. With a finite n, it's completely up in the air which one is faster, since constant factors may outweigh the O() complexity. It's only as n approaches infinity that you can say for certain that O(n) will be faster than O(nlogn). That is the point the GP is making.
Also assuming you have operations that are O(1) and O(nlogn), and algorithms that are O(nlogn) and O(n) respectively, the "addition-based" algorithm is asymptotically faster (they will NOT converge), but that isn't the case in your example, it's actually O(1) and O(mlogm) (an independent input m), so for a bounded value of m, the O(nmlogm) may very well be faster than O(nlogn).
The fundamental problem here is a mixing of concerns, the memory access speed has no affect whatsoever on the computational complexity of the algorithm, which is all that classical big-Oh notation addresses. I wouldn't be at all surprised to find that modern notation has a standard way to denote special costs of operations, but if someone says "algorithm x is O(nlogn)", it is understood to mean idealized computational complexity based on the number of operations performed. If you feel like modeling things like number of IO operations as part of the complexity analysis, that is perfectly valid, but it doesn't invalidate the idealized representation of it's complexity.
Put another way, "Quicksort" has a complexity of O(nlogn) no matter what its set size is or what the operations cost, it doesn't matter that for most values of n the function is not computable, or that a particular comparison operation may take 5.6 millennia to compute, the complexity is still O(nlogn).
"I'd say the GP is referring to your use of the English language, rather than the Perl language.." - by Anonymous Coward on Tuesday June 15, @05:15PM (#32583570)
Per my subject-line above also, IS THERE SUCH A SECTION OF THIS FORUMS, and do YOU have a PHD to your name/credit in English? No?? Get back to us when you do have one, because you & yours are obviosly illiterate or you forgot to take your Dyslexia meds/treatments/therapies (whatever) or ADD/ADHD meds!
Plus, your lack of a PHD in English to your credit/name doesn't exactly qualify you & your kind ("writing style critique slinging trolls") as experts in English yourselves.
(OR, alternately? Well - Perhaps there ought to be so "wannabe PHD's in English" (minus the PHD to their credit/name of course) have a "raison d'etre" around here, other than being the trolls you clearly are)...
APK
P.S.=> Of course, there IS the alternate, & that would be to get them a "hooked on phonics" remedial reading course for you & yours, so you're able to read properly as well!
As to your opinions? Opinions on my "writing style" clearly vary!
I.E.-> I have a truckload of upwards mods around here, shown below (AND, as an AC no less, which is FAR MORE DIFFICULT to get those for we AC's, because /.'s default javascript &/or IE model is to list AC posts as "hidden", to bury them as some "inducement" to register... no thanks, that just makes folks EASILY tracked sheep for trollers like you "wannabe PHD" types, lol!).
Oh, the list of my upmods? Ok sure (these are those who disagreed with the "likes of your kind"):
+5 'modded up' posts by "yours truly" (4):
http://it.slashdot.org/comments.pl?sid=1139485&cid=26975021
http://it.slashdot.org/comments.pl?sid=1139485&cid=26974507
http://it.slashdot.org/comments.pl?sid=170545&cid=14210206
http://hardware.slashdot.org/comments.pl?sid=175774&cid=14610147
----
+4 'modded up' posts by "yours truly" (4):
http://slashdot.org/comments.pl?sid=161862&cid=13531817
http://developers.slashdot.org/comments.pl?sid=167071&cid=13931198
http://tech.slashdot.org/comments.pl?sid=1290967&cid=28571315
http://tech.slashdot.org/comments.pl?sid=1461288&cid=30273506
----
+3 'modded up' posts by "yours truly" (5):
http://developers.slashdot.org/comments.pl?sid=155172&cid=13007974
http://it.slashdot.org/comments.pl?sid=166850&cid=13914137
http://slashdot.org/comments.pl?sid=175857&cid=14615222
http://slashdot.org/comments.pl?sid=273931&threshold=1&commentsort=0&mode=thread&cid=20291847
http://it.slashdot.org/comments.pl?sid=1021873&cid=25681261
----
+2 'modded up' posts by "yours truly" (25):
http://it.slashdot.or
"Nobody cares about your shitty posts..." - by Jizzbug (101250) on Wednesday June 16, @11:06AM (#32590744)
Hmmm, well, "opinions vary", & quite clearly, per this list from /. posters alone rating up my posts here many, Many, MANY times, see below...
That's only a SMALL PARTIAL LIST below also, no less, & with myself as an AC poster no less, which does make it more difficult to get "mod ups" really, because /.'s default format is to make AC posts "hidden posts", burying them (or making them be ignored)).
So, does being an "almighty (not, lol), 'registered user' elitist wannabes" who're part of the 'in-crowd' here or elsewhere any smarter/more intelligent & knowledgeable than us AC's?
No.
What it DOES do, however & by way of contrast/comparison, is make YOU all the more "trollable" & trackable via your posting history here (far better than a GOOGLE search for instance would, & that means you are 2x as trackable as us AC's are). Plus, I "blow by" the unfair/unjust "10 posts per 24 hours" limit imposed on AC's, in seconds (literally, without loss of speed like proxies or TOR nail you with)...
The list I noted above? "Here 'tis:"
"My Name is Ozymandias: King of Kings - Look upon my works, ye mighty, & DESPAIR..."
----
Windows NT Magazine (now Windows IT Pro) April 1997 "BACK OFFICE PERFORMANCE" issue, page 61
(&, for work done for EEC Systems/SuperSpeed.com on PAID CONTRACT (writing portions of their SuperCache program increasing its performance by up to 40% via my work) albeit, for their SuperDisk & HOW TO APPLY IT, took them to a finalist position @ MS Tech Ed, two years in a row 2000-2002, in its HARDEST CATEGORY: SQLServer Performance Enhancement).
WINDOWS MAGAZINE, 1997, "Top Freeware & Shareware of the Year" issue page 210, #1/first entry in fact (my work is there)
PC-WELT FEB 1998 - page 84, again, my work is featured there
WINDOWS MAGAZINE, WINTER 1998 - page 92, insert section, MUST HAVE WARES, my work is again, there
PC-WELT FEB 1999 - page 83, again, my work is featured there
CHIP Magazine 7/99 - page 100, my work is there
GERMAN PC BOOK, Data Becker publisher "PC Aufrusten und Repairen" 2000, where my work is contained in it
HOT SHAREWARE Numero 46 issue, pg. 54 (PC ware mag from Spain), 2001 my work is there, first one featured, yet again!
Also, a British PC Mag in 2002 for many utilities I wrote, saw it @ BORDERS BOOKS but didn't buy it... by that point, I had moved onto other areas in this field besides coding only...
Lastly, being paid for an article that made me money over @ PCPitstop in 2008 for writing up a guide that has people showing NO VIRUSES/SPYWARES & other screwups, via following its point, such as THRONKA sees here -> http://www.xtremepccentral.com/forums/showthread.php?s=ee926d913b81bf6d63c3c7372fd2a24c&t=28430&page=3
----
What do I have to say about that much above? I can't say it any better, than this was stated already (from the greatest book of all time, the "tech manual for life" imo):
"But by the grace of God I am what I am: and his grace which was bestowed upon me was not in vain; but I labored more abundantly than they all: yet not I, but the grace of God which was with me." - Corinthians Chapter 10, Verse 10
(And, because I got LUCKY to have been exposed to some really GREAT classmates, professors, & colleagues on the job over time as well)
====
+5 'modded up' posts by "yours truly" (4):
http://it.slashdot.org/comments.pl?sid=1139485&cid=26975021
article summary:
"disk is slower than RAM. some doofs don't realize their system is swapping. ergo algorithm is bad."
Poul-Henning is definitely not a "doof"...To suggest that phk doesn't know what he's talking about is absurd.
Dude, chill. Nobody called Poul-Henning a doof. They only paraphrased him as calling other people doofs, namely those people who blindly follow Knuth's algorithm without checking whether its assumptions apply to their modern situations. If such people are doing professional software development, I would also call them doofs. Everybody acknowledges phk's all-knowingness, but some people were criticizing the headline by kdawson. Nothing to see here, move along.
"I don't care about the Constitution!" --Bill O'Reilly, November 17, 2009
http://developers.slashdot.org/comments.pl?sid=1686094&cid=32581292