Revisiting the Five-Minute Rule
In 1987, a study published by Jim Gray and Gianfranco Putzolu evaluated the trade-offs between holding data in memory and storing it on a disk. Known widely as the "five-minute rule," their research was updated and expanded 10 years later. Now, as jamie points out, Communications of the ACM is running an article by Goetz Graefe with another decennial update, evaluating the rule using hardware and software typical of 2007, with an eye toward how flash memory will affect the situation. An excerpt from Graefe's conclusion:
"The 20-year-old five-minute rule for RAM and disks still holds, but for ever-larger disk pages. Moreover, it should be augmented by two new five-minute rules: one for small pages moving between RAM and flash memory and one for large pages moving between flash memory and traditional disks. For small pages moving between RAM and disk, Gray and Putzolu were amazingly accurate in predicting a five-hour break-even point two decades into the future. Research into flash memory and its place in system architectures is urgent and important. Within a few years, flash memory will be used to fill the gap between traditional RAM and traditional disk drives in many operating systems, file systems, and database systems."
http://en.wikipedia.org/wiki/Five-minute_rule
"The 5-minute random rule: cache randomly accessed disk pages that are re-used every 5 minutes."
The more useful 5 second rule.
I couldn't quite figure out if the article willfully ignored the advent of SSDs or was written before they were available and not updated to include them (but it appears the article was updated to include other current technology).
Given that SSDs are likely going to replace rotational media for most applications in the future, this article is basically meaningless, at least insofar as flash memory and the disk are or will be synonymous. Since the article is predicated on the idea that flash memory will change the 5 minute rule to a degree, that invalidates the entire article.
To be relevant, the article really needs to include the current state of SSDs and a likely projection (10 year) of where the state of the art in SSDs will be.
I do, however, suspect we may see a shift away from drives altogether at some point (perhaps more than 10 years, but perhaps not) and the computer will just have persistent storage for everything in MRAM or some other technology that obliterates the line between RAM (for speed) and drives (for storage) - it's just one big pool that's hyper fast and persistent.
So really, I don't think this article has held up in even the intervening two years since 2007, and it certainly won't hold up for another 10 years.
Isn't it still the case that flash drive speed slowly degrades as the drive fills up and deletes blocks, since it marks blocks as used even though they are half full, etc.? And that Windows 7 is going to somewhat address this issue? Also, are the claims that you can now get millions of writes still holding water? I'm not really convinced yet. The speed is there, but there still seem to be fundamental issues. For instance, this PC I'm using right now is a backup machine, and its old 40 GB drive is really slow. I can boot Linux off a USB flash drive. Would that be any faster, and more importantly, how long would the USB drive last from swapping? Theoretically it should be faster and throughput should be higher with USB 2. I'd only need like a 16 GB stick or something...
zosxavius photography
The article I read spent a good deal of time talking about flash memory. What article are YOU referring to?
five minutes is an awful long time for food to remain on the floor before you pick it up to eat it...
Ask Me About... The 80's!
How clean is the floor and how sticky is the food?
Stickier food may be better to eat off of the floor, as the part that touched the floor is somewhat more likely to stick to the floor, rather than some of the floor sticking to the food that you eat.
Nerd rage is the funniest rage.
Not as long as you scrape all the dog hair off first.
Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
Started reading and found stuff like "every 400 seconds, which they rounded to five minutes" - 5 minutes = 300 seconds, so their rounding knocked 25% off - great accuracy.
Then "The break-even interval is about inversely proportional to the record size. ... and two minutes for 4KB pages." followed by "Nonetheless, the breakeven interval for 4KB pages was still around five minutes." - so it was 2 minutes and is now 5 minutes, and a 150% increase is ignored.
Glad these people don't work out my utility bills.
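For anyone who wants to check the arithmetic being argued about, here is a minimal sketch of the break-even calculation. It assumes the usual form of the rule from the Gray papers: interval = (pages per MB of RAM / disk accesses per second) x (price per disk / price per MB of RAM). The function name and every input number below are illustrative placeholders, not figures taken from the article.

```python
def break_even_seconds(page_kb, accesses_per_sec_per_disk,
                       price_per_disk, price_per_mb_ram):
    """Re-use interval at which caching a page in RAM costs the same as
    re-reading it from disk each time it is needed."""
    pages_per_mb_ram = 1024 / page_kb
    return (pages_per_mb_ram / accesses_per_sec_per_disk) * (
        price_per_disk / price_per_mb_ram)


# Assumed, made-up prices and disk speed; only the shape of the result matters.
for page_kb in (1, 4, 16, 64):
    secs = break_even_seconds(page_kb,
                              accesses_per_sec_per_disk=100,
                              price_per_disk=100.0,
                              price_per_mb_ram=0.05)
    print(f"{page_kb:>3} KB pages: break-even about every {secs:,.0f} s "
          f"({secs / 60:,.1f} min)")
```

With a fixed access rate the interval is exactly inversely proportional to page size, which is the "about inversely proportional" behavior quoted above. Real disks deliver fewer accesses per second for larger pages, so the real curve is only roughly inverse.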
Fucking same thing for me haha. I thought they were going to show evidence that the food one drops on the floor is in fact *not* safe within 5 minutes, only within 2.
that's my word, holla...
Aha, finally someone else who gets this concept. I have tried (and failed, so far) to explain this to a few people.
10 FILL MUG WITH COFFEE
20 DRINK COFFEE
30 GOTO 10
These days, the database is not what it used to be. Local clients use shared memory. JVMs and entire web servers are incorporated directly into the database executables. The old concept of the separation of the database from its clients no longer applies.
When you are running a database, what business does the OS have, deciding what data is to be paged in or not? The database is in a far better position to make these decisions, and it can be based on much better rules than "5-minute" heuristics.
I am waiting for a true hybrid system to be built. One that has the OS installed in read-only flash and applications on a separate drive. You might ask why, but then stop to realize what would happen if viruses couldn't overwrite the system settings: to clean up a virus, all you would have to do is reboot.
How would such a hybrid system correct a discovered defect in the operating system?
This article indicates that flash memory (AKA SSDs) is only going to be an intermediary between rotational media and RAM.
If your handheld device or subnotebook PC has only an internal SSD and no internal hard drive, then you will store any data that doesn't fit on your SSD on a hard drive plugged into a Hi-Speed USB port, copying it to the SSD when it is needed. For example, you'd keep the video footage that you are editing on the SSD and other projects on the hard drive. That sounds to me like a memory hierarchy, albeit one that occasionally requires manual intervention to connect the (offline) long-term mass storage to the machine.
It's just a bit dirty, it's still good, it's still good!
We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
I don't know if I agree with their use of flash as extended RAM. Why not just make it a super fast drive that sits close to the CPU and give it the illusion of being an IDE drive so us normal users can just make it a swap partition?
A swap partition has a lot more erase traffic than a data partition, and access patterns high in erases are thought to wear out SSDs faster than hard drives.
If you'd RTFA in the last article about the 5-second rule, then you'd know that it is a modern formulation of Genghis Khan's 12-hour rule, stating that you should not eat meat that had been on the floor for more than 12 hours. You'd also have learned that the amount of bacteria transmitted from a floor in five seconds and in five minutes is about the same.
I am TheRaven on Soylent News
Does swap really erase? I was always just under the impression that if a block was no longer used it was just demapped. If the OS decides to write to that block again it just merely overwrites that block.
A flash block must be erased before it is overwritten, and each sector is guaranteed for only about 100,000 erases. Lots of writes are fine on larger SSDs because the wear leveling program in the drive's controller will move rarely-rewritten logical sectors (such as free space and read-only files) to more-worn physical sectors. But if you're devoting an entire device to swap, there won't be a lot of rarely-rewritten logical sectors for the controller to make use of.
The article I read spent a good deal of time talking about flash memory. What article are YOU referring to?
The article treats flash as something you place in between hard drives and memory. This turned out not to happen (with a few exceptions). SSDs simply replace hard drives. Hybrid systems are rare, and it doesn't look like they will become more common -- either you can live with the slowness of hard drives, or you can't. The mainstream will switch to SSDs for everything except backup applications.
There are some hybrid SANs, but they're damn expensive. At that price they have a hard time competing with simpler pure-flash SANs.
Except if you're using ZFS. You can put a (SLC or MLC) SSD drive into just about any system and tell it to act as a (write or read) cache.
The way I see it, the advent of SSD storage gives us the ability to extend the cache layering abstraction we already use into a smooth continuum between the CPU's L1 cache (L2, L3, DRAM SIMMs, etc...) and traditional HD-based disk cache. SSD doesn't quite close the gap but it actually fills a major portion of it.
The article makes the mistake of assuming fairly small SSDs, but is otherwise spot-on. It isn't possible to use tiny SSDs in the 8G range as a paging medium for caching memory; it simply isn't enough storage for wear levels to be acceptable. On the other hand, a 64G-256G SSD provides a better basis both in wear leveling and in traditional cache metrics. This is particularly true as hard drives exceed the 2TB/unit mark.
There is a general fallacy here where some people seem to believe that SSDs will replace hard drives. That is not going to happen any time soon or possibly even ever, not unless flash densities can be increased by two orders of magnitude. The difference in cost per byte of storage between the two is immense. What *IS* happening is that SSDs are now capable of replacing HDs in particular circumstances where the absolute quantity of storage is not the primary need. At the same time traditional HDs themselves have replaced a large portion of the spectrum that used to be held by archival media, and the need for terabyte-sized storage systems has increased drastically as high resolution digital cameras and video become more important to the general consumer.
I think most people still have the 'paging is bad' mindset, because traditional paging is a fairly inefficient operation on a traditional hard drive. That isn't the case when it comes to SSDs as the technology matures. Paging, in fact, could become the tour de force that allows systems to more fully utilize all of their resources. SSDs have not quite gotten to this point yet but it is obvious that they will quickly get there, probably in as little as two years.
-Matt
I expect that SSDs will in fact replace HDDs. But probably not with FLASH. Other solid state non-volatile random access memory technologies are likely to come around, hopefully with better wear and density characteristics. There is a general fallacy here where some people seem to believe that SSDs can only ever be composed of FLASH.
SIGSEGV caught, terminating
wait... not that kind of sig.
What I love about slashdot is its scalability. The discussion ranges anywhere from the design of a Google data center in 2015 to some guy's psychological stance toward his next netbook purchase in 2009. Sometimes it's unclear which end of the spectrum is under debate, but the discussion happily progresses in a state of astral superposition. When this gets too confusing, even for slashdot, the moderation system helps to sort things out. For example, if the comment
Flash memory is set to replace rotational media.
is moderated +1 insightful, then we know we're talking about some guy's future netbook purchase. Or if the same comment is moderated -1 troll, then we know we're talking about Google data centers in 2015.
Flash memory begins to fade - ZDNet.co.uk from 2005
"The scaling laws are not favourable to flash," said Tom Lee, an associate professor of electrical engineering at Stanford University and a founder of Matrix Semiconductor, which makes a 3D memory chip that performs flash-like functions. "The noises are getting louder now, so it looks like manufacturers are already in that new age of diminished gains."
Numonyx Breakthrough Delivers First 45nm NOR Flash Memory Chips from Jan 2009
"Numonyx engineers overcame major scaling limitations by developing new process techniques to produce the 7th generation MLC NOR flash on the industry's most advanced 45nm technology, and to be the first to bring the cost and performance benefits to our customers."
...
"At a time when the entire industry grapples with the scalability of all flash memory technologies,
Brewster Kahle
I think Brewster Kahle is going to jump off a bridge when he learns that Seagate is exiting the disk drive business in 2010. If you think CERN or EOS cost a lot of money, try updating the budget with SSD specified as the primary storage layer.
A useful way to view this transition is the long tail on steroids. 99% of the world's stored information will be held by a few hundred mega-scale institutions (NASA, Google, CERN, GenBank) on rotating hard drives, while 99% of the world's gadgets have no hard drive at all.
The same thing happened in software. The C language represents a tiny sliver of source code written over the last ten years, but if you could measure the number of machine instructions executed by language of origin, C would continue to represent a very large slice of the pie. A major factor in the success of scripting languages is that the problems these languages don't handle well can be off-loaded to a well established compiled language. If you cherry pick your niche, it's amazing how much more convenient it looks compared to the ancestral technology which didn't.
I thought the paper was quite good, and more relevant than 99% of what I read these days. I'm always interested in analysis of hybrid solutions. In the engineering world, there is a de facto allergy to hybrid solutions. We tend to achieve the best result by scaling a single virtue to the max, rather than engaging in the jello-like trade-offs involved in balancing complementary virtues. I first began to think about this when ethernet trounced ATM by the simple measure of vastly over-provisioning bandwidth.
The exception to this is on the large scale where operational costs exceed all other costs, such as major data centers.
This is one of the reasons why progress in ecology is so painfully achieved: ecological systems almost always demand hybrid solutions, and we're not terribly comfortable with this. Engineers prefer monarchy. In ecological systems, life is complicated, and you can't just sit there and
Since this is slashdot, I'd bet most will pick bacteria over carpet fuzz any day ... after all, if it doesn't look fuzzy ...
or this ...
....don't make slow ass websites. Cache everything. That is the new rule. Google does it. Facebook does it.
The lesser known 5 minutes 5 second rule combines the two: It states that if the case is left off a desktop computer for more than 5 minutes and 5 seconds Pizza and coke will spontaneously migrate from a computer lab desk and contaminate your RAM, CPU and motherboard.
These posts express my own personal views, not those of my employer
I'm not sure where people get this acceptable wear levels for SSDs thing.
All the math I've seen indicates that, presuming reasonable wear algorithms, if you write the volume of your data to disk every day your drive will last for about 30 years, and that lifetime scales linearly with shifts up or down in that amount. An ordinary hard disk lasts about 5 years, so for your 8 GB disk you'd have to be writing approximately 48 GB of data to it every single day in order for it to not last longer than a current HD, and the SSD is still readable at the end (though this isn't super important for caching). That's certainly not an implausible amount of data, even for a home desktop user, but it is fairly high, and since it's fairly linear, you'd reach a feasible number for pretty much any ordinary data set well before 64 GB, let alone 256 GB.
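The wear arithmetic above is easy to sanity-check. This is a rough sketch, assuming ideal wear leveling and no write amplification (both optimistic), with the rated cycle count back-derived from the "30 years at one full drive write per day" figure rather than taken from any datasheet. The function name is made up for the example.

```python
def ssd_lifetime_years(capacity_gb, daily_writes_gb, rated_cycles):
    """Years until every cell has used up its rated cycles,
    assuming writes are spread perfectly evenly across the drive."""
    total_writable_gb = capacity_gb * rated_cycles
    return total_writable_gb / daily_writes_gb / 365


# Assumption: the cycle count implied by 30 years of one drive-volume per day.
IMPLIED_CYCLES = 30 * 365

print(ssd_lifetime_years(8, 8, IMPLIED_CYCLES))    # one drive-volume per day
print(ssd_lifetime_years(8, 48, IMPLIED_CYCLES))   # 48 GB/day onto an 8 GB drive
print(ssd_lifetime_years(64, 48, IMPLIED_CYCLES))  # same traffic, 64 GB drive
```

Lifetime scales linearly with capacity and inversely with daily write volume, which is the whole argument: the same 48 GB/day that wears out an 8 GB drive in hard-disk time is a much lighter load on a 64 GB one.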
I think that it's probably also inaccurate to say that SSDs will never replace HDDs. Hard drive sizes are certainly increasing, but the need for capacity isn't increasing anywhere near as fast. Under present usage patterns, most users will never fill a 1 TB drive within the lifetime of the drive, certainly not with anything that can't be archived. If you can get the price point down to 25 cents per GB it would probably be worthwhile to use them in pretty much any circumstance excluding applications involving really heavy writes or a lot of long linear reads. Data center storage is already substantially more expensive than that, and the extra speed would make up for a lot of space issues, even if you had to use 4 disks instead of 1.
That price point probably won't be met in the next year or so, but it's certainly achievable within the next couple. True, by then HDDs may be 4 TB for the same price or more, but a couple of TB you don't need vs a faster system isn't much of a choice.
Isn't ReadyBoost essentially a hybrid system?
Also I remember that one of the main disadvantages of Btrfs compared to ZFS was that it doesn't support using an SSD to speed up overall access, while ZFS does.
Readyboost and Superfetch are really just hacks to get around the 3GB or so ceiling in 32 bit Vista due to incomplete support of the Pentium Pro and later processors (PAE extension). With the 64 bit versions (or the server 32 bit versions, or any OS produced by anyone other than Microsoft in the last decade) you can use real memory instead for improved performance. Consider that you are grabbing all that stuff from disk and doing the relatively slow write to flash to save time when it needs to go into memory later. The far better answer is to have enough memory and only handle the stuff once.
The point of the original story was that SSD is cheaper than RAM. Clearly moving everything from SSD into RAM is going to make things faster, but so would moving everything from HDD to RAM. It's just a matter of cost vs benefit.
I am not too sure how the cost/benefit of ReadyBoost stacks up, but I'd guess that plugging in a decent flash drive would be cheaper than trying to find obsolete laptop RAM.
The Wikipedia entry for Superfetch doesn't mention anything about only being 32-bit, and it sounds like it would work better the more RAM it has to play with. In any case, hack or not, ReadyBoost is an example of a hybrid system.
Stickier food may be better to eat off of the floor, as the part that touched the floor is somewhat more likely to stick to the floor, rather than some of the floor sticking to the food that you eat.
It's an interesting subject that probably needs more investigation.
Sticky foods often leave a layer on the floor and as a result the part you pick up has only been in contact with the floor. However, around the edges of the contact the sticky food is very good at picking up loose dust and the like.
Runny liquid-covered foods may stay on the floor at the slightest contact, avoiding the sticky problem, but what about dirt that gets mixed into the top layer of the dropped food during the impact?
Hard dry foods have surprisingly little contact area with the floor. No citation but I do recall seeing studies where even slight moistness leads to a very large increase in bacteria picked up.
In theory, there's no difference between theory and practice; in practice there is.
Indeed I am waiting for a true hybrid system to be built. One that has the OS installed in read-only flash and applications on a separate drive. You might ask why, but then stop to realize what would happen if viruses couldn't overwrite the system settings: to clean up a virus, all you would have to do is reboot.
You mean like the way I run Windows under VMware and roll back to a snapshot?
Advanced "universal" memory technologies (fast, non-volatile) such as MRAM, FRAM, maybe RRAM will alter this landscape significantly. While some are available now, we'll have to wait a few more technology generations before they have the density to realistically compete with hard drives or even Flash.
Lurking in the desert
I don't quite see how it would allow you to more fully utilize resources. This is usually the argument given by people who think paging will somehow increase your system's performance because it makes more optimal use of all the memory.
Unfortunately, this is only true if you have the perfect paging algorithm. It basically requires a paging algorithm that is almost psychic in its ability to predict what future data may or may not be needed. Current paging algorithms, although no doubt complex, are nowhere near that smart. So given a non-optimal algorithm, there's no guarantee that system performance will actually benefit.
For example, some pagers think that anything read from disk is worth caching, even if the data is being consumed at a rate SLOWER than the underlying media can supply it -- a video stream is a good example, why bother caching such a stream at all when it is consumed at speeds much lower than today's HD throughput? The same goes for unpredictable access patterns -- if the underlying data is much larger than your memory, any caching of such accesses is likely to result in no gains whatsoever. Another great example is a nightly scan of all files -- none of these files is likely to be touched twice in a row (it's a scan after all), but it results in huge amounts of data that may need to be cached.
However, all these examples of where caching actually gains you very little do result in pressures to swap out data that wasn't used for even longer periods of time (despite the fact that it is more likely to be needed again). This results in applications getting swapped out that are dormant. The amount of memory these applications take is nowadays often a tiny fraction of total memory, yet they get sacrificed so that the pager can devote even more memory for what can often be pointless caching -- it accomplishes NOTHING by doing this.
The end result is that after a night of sporadic and slow network activity, maybe some kind of backup or indexing activity, all your applications are swapped out and unresponsive.
Now I realize this is most likely more a discussion intended for server systems, but even those systems suffer from this -- workers arriving in the morning find that their log-in attempts take longer than usual, opening certain applications takes longer than usual, simply because the processes that handle these actions have been dormant too long and got swapped out overnight.
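The nightly-scan problem described above can be shown with a toy model. This is only a sketch: it assumes a plain least-recently-used page cache, which is far simpler than what real kernels do, and the cache size, page names, and scan length are all made up.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny least-recently-used page cache that counts hits and misses."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()
        self.hits = 0
        self.misses = 0

    def access(self, page):
        if page in self.pages:
            self.pages.move_to_end(page)        # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            self.pages[page] = True
            if len(self.pages) > self.capacity:
                self.pages.popitem(last=False)  # evict least recently used


def morning_hit_rate(with_nightly_scan):
    cache = LRUCache(capacity=100)
    working_set = [f"app-{i}" for i in range(50)]
    for _ in range(10):                  # daytime: applications touched repeatedly
        for page in working_set:
            cache.access(page)
    if with_nightly_scan:
        for i in range(1000):            # overnight: every page read exactly once
            cache.access(f"scan-{i}")
    cache.hits = cache.misses = 0        # only count the morning accesses
    for page in working_set:
        cache.access(page)
    return cache.hits / len(working_set)


print("no scan:  ", morning_hit_rate(False))
print("with scan:", morning_hit_rate(True))
```

In this model the scan pages are never touched twice, yet they push out every application page, so the morning accesses all miss. Scan-resistant policies exist for exactly this reason, but the sketch does not try to model them.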
Readyboost and Superfetch are really just hacks to get around the 3GB or so ceiling in 32 bit Vista due to incomplete support of the Pentium Pro and later processors (PAE extension). With the 64 bit versions (or the server 32 bit versions, or any OS produced by anyone other than Microsoft in the last decade) you can use real memory instead for improved performance. Consider that you are grabbing all that stuff from disk and doing the relatively slow write to flash to save time when it needs to go into memory later. The far better answer is to have enough memory and only handle the stuff once.
There's never enough memory for that, which is the whole point of why it's needed. To say nothing of RAM being volatile.
"Free" would mean that Dell, Compaq, and the rest sold Windows machines at the same price that I can buy a comparable (key word, comparable) No-OS or Linux machine. That doesn't happen, nor will it happen any time soon. The fact is, I can almost always build a No-OS machine that is superior to the OEM's offerings for the same, or a lesser cash outlay.
Where can I get a No-OS laptop kit? But even if you're only talking about desktops, your No-OS machine comes as a kit, and end users don't want to have to put together a kit.
I don't have to pay Linux a couple hundred dollars for a license every time I set up a new machine.
You don't have to pay when you build a new machine, but you may have to pay hardware makers to replace Linux-incompatible hardware when you convert a machine from a no-longer-supported version of Windows to Linux. It happened a few times when I converted sub-500 MHz PCs that had run Windows 98 or Windows 2000 to Linux. For instance, I tried Mandriva a few years ago back when it was Mandrake, and it wouldn't run X with even 2D acceleration on my Radeon card. The ATI situation has improved since then, but the Microtek flatbed scanner model that I happen to own is still listed as unsupported in SANE years later.
There is, and always has been, a plethora of vendors willing to sell you a computer without an OS.
Not in my home town. Best Buy and Target recently discontinued their Linux subnotebooks, and locally-owned computer shops tell me they don't do Linux.*
* I define "doing Linux" as either shipping a Linux operating system on the PC or warranting that Ubuntu or Mandriva will work out of the box.