Mass Storage Leaves Microchips in the Dust
Roland Piquepaille writes "This article from Wired Magazine looks at storage from a new angle. 'Right now I am sitting in front of a whirring 60-gigabyte hard disk that cost less than $100. Do the math: If back then 10 megabytes cost $1,000, then 60 gigabytes would have cost x, where x = $6,000,000 and "back then" = 18 years ago. I'm sitting in front of $6,000,000 worth of mass storage, measured at mid-1980s prices. We have Moore's law for microprocessors. But who's coined a law for hard disks? In mass storage we have seen a 60,000-fold fall in price -- more than a dozen times the force of Moore's law.' DeLong also looks at a not-so-distant future when a $100 mass storage device will hold a full terabyte. He also thinks that with disk space becoming cheaper and cheaper, we'll be tempted to archive everything about ourselves, including pictures and videos. This is in fact the goal of Gordon Bell's project, MyLifeBits. You can learn more about the MyLifeBits project by reading this NewsFactor Network article. Check this column for more details."
Moore's law says nothing about price though. If you are going to compare hard disks to processors in the same general terms using Moore's law, shouldn't you compare increase in storage size to increase in processing power?
"I turn away with fright and horror from the lamentable evil of functions which do not have derivatives."
18 years ago? When was this written?
2003 - 18 = 1985.
I bought a 20 MB MFM hard disk in 1985 for $400.
Looked at another way, hard drive capacities have just been doubling faster than processor speeds.
If 10MB back then cost $1k, then 1MB cost $100, so we just do 60G/1M and get a 60,000-fold increase in storage capacity for the same price. The number of doublings would then be log2(60,000) = 15.9 or so, or about one doubling every 1.1 years over 18 years. Contrast this with Moore's law as it is usually quoted, with processor speeds doubling every 1.5 years.
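The doubling arithmetic above can be checked with a few lines of Python. This is only a sketch of the commenter's calculation; the function name `doubling_time_years` is made up for illustration.

```python
import math

def doubling_time_years(growth_factor, years):
    """Years per doubling, given total growth over a span of years."""
    doublings = math.log2(growth_factor)
    return years / doublings

# 60 GB per $100 today versus 1 MB per $100 back then: a 60,000-fold increase.
growth = 60_000
print(round(math.log2(growth), 1))                 # number of doublings
print(round(doubling_time_years(growth, 18), 2))   # years per doubling
```

With the thread's numbers this comes out to roughly 15.9 doublings, or one about every 1.1 years.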
The downside is that access times have tracked closer to a linear function.
I don't agree.
The physical drive passes by the head at a certain rate, depending on the speed of rotation of the platter and the distance the head is from the center of the platter.
Let's say 1 inch from the center, going 7200 rpm.
This means that the disk will be passing under the head at about
2 * pi * 1 inch * 7200 rpm = inches per minute
/ 60 = inches per second
/ 12 = feet per second
((2 * pi * 7200) / 60) / 12 = about 62.8 feet per second 1 inch from the center of the platter.
Now, let's pretend that there is some amount of data in this 6.28-inch circle that the head travels. Let's say 1,000 bytes. At 7200 rpm the platter turns 120 times per second, so 120,000 bytes pass under the head in one second. Now, let's make the data more dense, which is how hard drives hold more data. Let's say there are 1,000,000 bytes. This means that there are 120,000,000 bytes passing under the head in one second. So really, the data has gotten faster. That's why you don't really need a high-RPM drive when it has extremely high data density (like 150 GB drives), because rotation speed won't be the limiting factor.
So actually, hard drives have gotten faster. At least the data passing by has gotten faster.
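A rough sketch of that reasoning in Python, assuming a single track at a fixed radius and ignoring seeks, sector overhead and zoned recording. The function names are hypothetical. Throughput at one track is just bytes per revolution times revolutions per second (120 at 7200 rpm).

```python
import math

def linear_speed_ft_per_s(radius_in, rpm):
    """Speed of the platter surface under the head, in feet per second."""
    inches_per_minute = 2 * math.pi * radius_in * rpm
    return inches_per_minute / 60 / 12

def bytes_per_second(bytes_per_track, rpm):
    """Bytes passing under the head per second on one track."""
    revolutions_per_second = rpm / 60
    return bytes_per_track * revolutions_per_second

print(round(linear_speed_ft_per_s(1, 7200), 1))  # surface speed at 1 inch
print(bytes_per_second(1_000, 7200))             # sparse track
print(bytes_per_second(1_000_000, 7200))         # 1000x denser track
```

The surface speed stays the same in both cases; only the density changes, and sequential throughput scales with it.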
By "process it all", he probably means being able to address the area on the disk (think extremes).
As for going over into negative integers: an integer is an allocated space in memory that holds a number. If the number is bigger than the allocated space, what does it do!? 11111111 + 1 = 00000000 (keeping 8 bits of data). Look up signed integers. Since it's just binary, how can you represent a negative number? Well, you can't directly; you do it with little tricks that everyone agrees on. Look it up...you obviously need to.
Dude, seriously, what the fuck are you talking about?
Seriously, dude, it might be nice to know what you're talking about and speak English, instead of using phrases like "to clown on you".
Your CPU doesn't "process it all" now. It talks with what it needs to.
But the more and more data you have, the more likely you are to try and handle large quantities. Search every text file on your system, or merely scan and process a file at 600 DPI instead of 300.
I'm also pretty damn confused as to what you mean by negative integers? Hopefully that was some weak attempt at a buffer-overflow joke or a stack dump or something because the logical part of my brain
The logical part of your brain obviously never studied computers very much. In assembly, if you continue adding to a signed integer value, it will overflow to negative. In 16 bits, 32767 + 1 = -32768, IIRC. If you program in C or Fortran or any other language that doesn't check overflow, the same thing will happen. I've seen reports that I had transferred -2 GB this session, because the program's signed 32-bit counter overflowed once it passed 2 GB. Same principle.
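Python integers never overflow, so the wraparound has to be simulated with masks, but the sketch below reproduces the fixed-width behavior described in this subthread, assuming two's complement (the usual "trick that everyone agrees on"). The helper names are invented for illustration.

```python
def wrap_unsigned(value, bits):
    """Keep only the low `bits` bits, as a fixed-width unsigned register would."""
    return value & ((1 << bits) - 1)

def wrap_signed(value, bits):
    """Interpret the low `bits` bits as a two's-complement signed integer."""
    value = wrap_unsigned(value, bits)
    if value >= 1 << (bits - 1):   # top bit set means negative
        value -= 1 << bits
    return value

print(wrap_unsigned(0b11111111 + 1, 8))   # 8-bit register rolls over to 0
print(wrap_signed(32767 + 1, 16))         # 16-bit signed wraps to -32768
print(wrap_signed(2**31, 32) / 2**30)     # 32-bit byte counter just past 2 GiB, in GiB
```

The last line shows where a "-2 GB transferred" report can come from: a signed 32-bit byte counter that has just passed 2 GiB.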
:)
thanks for explaining my joke. it's sad when people flame from ignorance...
muerte
That only really works well for you if you access your disk drives like a magnetic tape (sequentially). Otherwise, you're going to be seeking all over the place and accessing data in a manner that may be too random to exploit disk caching mechanisms.
If you want to get to a particular block on the disk (rather than what happens to be under the read head), HD seek times still blow.
A Pirate and a Puritan look the same on a balance sheet.
Why is it that everybody continues to equate [M|G]Hz with CPU speed? It's only one component. Those old 4.77MHz boxes took several clock cycles to complete a single instruction. A modern Intel or AMD chip runs several instructions per cycle. So a current top-of-the-line system actually runs 3-5 thousand times faster than the original PC/XT. (But it still takes two to three minutes or so to boot up.)
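To see how clock rate and work per cycle combine, here is a back-of-the-envelope sketch. The per-cycle figures are assumptions picked purely for illustration (4 cycles per instruction on the old machine, a 3 GHz clock and 2 sustained instructions per cycle on the new one); the post only says "several" in both cases, and real values vary a lot by workload.

```python
def instructions_per_second(clock_hz, instructions_per_cycle):
    """Rough throughput: clock rate times average instructions per cycle."""
    return clock_hz * instructions_per_cycle

# Assumed figures, not measurements:
old_pc = instructions_per_second(4.77e6, 1 / 4)  # 4 cycles per instruction
modern = instructions_per_second(3.0e9, 2.0)     # 2 instructions per cycle

clock_ratio = 3.0e9 / 4.77e6
speedup = modern / old_pc
print(round(clock_ratio))  # what the MHz numbers alone suggest
print(round(speedup))      # with work per cycle factored in
```

Under these assumptions the clock ratio alone is in the hundreds, while the throughput ratio lands in the low thousands.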
The problem is "the last mile". Connection to residential homes just hasn't kept pace. It may be that wireless, not the telcos, will manage to deliver the bandwidth that you seek.
Opinions my own, statements of fact may contain errors
If you want reliability, just use SCSI. It is more expensive, and more reliable. You don't need the newest Ultra320 equipment; plain Ultra Wide drives and cards are plenty fast for most tasks.
You've got some problems with your facts. But, since you're playing the math games that I like to play, I'll cut you some slack. And then I'll expound.
First off, disk access time is dominated by actuator movement (seek time). Rotational latency on a 15,000rpm disk is 2ms, not 4. The fastest 15K drives have 3.5-4ms seek time. Slower drives have slower actuators, meaning the ratio of seek time to rotational latency is about the same, 2:1.
Seek time on large drives is of no importance. Seek time on small drives is of supreme importance. Small drives should be used to store the OS, applications, and small data files. Rapid access to disparate regions of the disk is important since these drives are primarily limited by IO/sec. Large drives are used for mass data storage. Large data storage (media, in my case) is dominated naturally enough by large files whereas applications and user data tend to be tiny. My media drive, for example has about 11,000 files in 95GB, or about 110 seeks/GB. My OS/apps drive, on the other hand, has over 89,000 files in 5.75GB, or 16000 seeks/GB.
Consider that a high-end drive can handle perhaps 600 IO/sec, and a large IDE drive can handle perhaps 150. Clearly then we have a problem: usage patterns differing by 150:1 in terms of number of seeks are not matched well to drives differing by 4:1 in seek performance. As you've demonstrated, physics cannot allow us to increase SCSI's seek performance to 150X that of bulk IDE drives.
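The mismatch can be put in numbers with a small sketch. Like the post, it uses file count as a rough stand-in for seeks; the function name is hypothetical. Exact division gives slightly different figures from the rounded ones quoted above, but the same order of magnitude.

```python
def files_per_gb(file_count, size_gb):
    """Files per gigabyte, used here as a crude proxy for seeks per gigabyte."""
    return file_count / size_gb

media = files_per_gb(11_000, 95)     # media drive from the post
boot = files_per_gb(89_000, 5.75)    # OS/apps drive from the post

workload_ratio = boot / media        # how much "seekier" the boot drive is
drive_ratio = 600 / 150              # high-end IO/sec versus bulk IDE IO/sec

print(round(media), round(boot))
print(round(workload_ratio), drive_ratio)
```

The workload differs by a factor of well over a hundred, while the drives differ by a factor of four.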
The only way to achieve that sort of performance is with solid state storage. RAM costs about $150/GB - let's see someone mass-produce consumer-grade SSDs. Call it the "drive accelerator" and build it into a removable HDD bay. I guarantee that 1GB of RAM caching the most-used files on a hard drive would see performance skyrocket. Sure, it would be expensive, but it would be cheaper than the 15k SCSI boot disk I have, and a whole lot faster.
High-speed Road Trip (18.000KPH)
First off, disk access time is dominated by actuator movement (seek time). Rotational latency on a 15,000rpm disk is 2ms, not 4. The fastest 15K drives have 3.5-4ms seek time.
:)
Example: Seagate 15krpm drive: average seek time 3.6ms. You are correct that the *average* rotational latency will be 2ms, since the full rotational latency is 4ms. However, 2ms out of 3.6ms is more than half, meaning rotational latency dominates (even though you were right about the average rotational latency being important, not the full rotational latency).
As for slower drives, a Seagate 7200rpm disk has a full rotational latency of 8.3ms, meaning the average is 4.2ms, and the average seek time is 8.5ms. In this case you are correct that actuator movement time dominates. Thus, part of your statement is proven by example: on lower-end disks they use crappier actuators. Nice one.
Seek time on large drives is of no importance. Seek time on small drives is of supreme importance.
I don't know why you think that seek time on large drives is of no importance... Filesystems (and databases) do fragment large "sequential" data files (or tables) - so streaming a 10GiB file will cost you a lot more seeks than just the one needed to go to the beginning of the file. Two (or more) concurrent streams, and you have seek-nightmare. Secondly, not everyone fills their drives with huge files. On one of my data drives, I have more than 100GiB in more than 1 million files.
I agree that some form of hierarchical storage (probably built into the drive) would be a way to go. It's done already, to some degree; cache (RAM) -> disk (and optionally -> tape). The RAM will have to be battery backed, if you want to see write improvements as well - while writes are usually more forgiving than reads, with non-battery-backed RAM you still need to seek+write in order to flush a write to the disk.
Moore's law causes a problem for manufacturers - how to keep up the profit margins. I suspect this is the major driving force behind many new technologies.
For example, once disk drives are cheap enough to give everyone 100+GB local storage, we get much more expensive SANs, NAS servers and network caches.
Once complete PCs began to cost under $200, we get blade servers and micro cases to keep the price (and total profit) up.
Before the flames start, I'm perfectly aware that some people's requirements will dictate the more expensive solution. But in many cases you can go for multiply redundant cheaper devices, with higher total reliability, and still get change from the price of the "enterprise" products.
Andrew Yeomans
Hmmm,
it seems those years were the inflection point for physical disk size; production disks circa 1983-4 were still 5-20MB in a cabinet the size of a small desk, and 20 times heavier. Within a year or two, they fell in size and new technology came online. I (we) had over 250 spindles of disk, some removable, that we were responsible for, including manually repairing the drives, and replacing and realigning all 20 heads by hand (no, really). They were simply larger versions of today's technology (barring magnetoresistive heads and size advantages), with none of the benefits of the smaller versions: they had much more moving mass, high power requirements, weaker magnetic fields from "rare earth magnets", ran hot, etc. The actuators had magnets and shunts the size of an (American) football, and the seek time had to have been so much slower, though they would still move so fast as to be a complete blur. They would nicely remove your fingers if given a chance. In those days, I also had a chance to go to Memorex Canada's facility in Winnipeg and do hands-on "work" for a couple of days or so on a line that built drives. Sidenote: many did not like Americans; some were rude because my president decided to test cruise missiles on northern Canadian moose, or something like that; I was too young to care. I guess I don't blame them.
Roughly 1985, 1986 I was IMPRESSED with a 5 1/4" Maxtor 40MB drive; I had died and gone to heaven.
Later, there were medium-sized ESDI disks; they worked, but they seemed to have the worst characteristics of both(?), in that they failed for us often (maybe it was just the 80s and everything was junk), heat was always a problem, and the controller(s) and cabling were large and could be problematic (could not/would not relocate bad or failing data, for example; we had Exabyte cont. and Toshiba Falcon(?) drives); on to SCSI and to Fibre Channel from SASI/Shugart, and I cannot look back.
Great memories.
Example: Seagate 15krpm drive: average seek time 3.6ms. You are correct that the *average* rotational latency will be 2ms, since the full rotational latency is 4ms. However, 2ms out of 3.6ms is more than half, meaning rotational latency dominates (even though you were right about the average rotational latency being important, not the full rotational latency).
Seek time is 3.6ms. Access time is 5.6ms. The seek time is the time it takes for the heads to seek to the proper location. This is followed by (correct, an average of) 2ms of rotational latency, for a total (again, average) 5.6ms access time.
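The access-time figures traded in this subthread follow from the spindle speed and the quoted average seek time. A minimal sketch, assuming average rotational latency is half a revolution and ignoring command overhead and transfer time; the function names are made up for illustration.

```python
def full_rotation_ms(rpm):
    """Time for one full platter revolution, in milliseconds."""
    return 60_000 / rpm

def average_access_ms(seek_ms, rpm):
    """Average seek plus average rotational latency (half a revolution)."""
    return seek_ms + full_rotation_ms(rpm) / 2

# Seek times as quoted in the thread for the two Seagate drives.
for name, seek_ms, rpm in [("15k", 3.6, 15_000), ("7200", 8.5, 7_200)]:
    print(name,
          round(full_rotation_ms(rpm), 1),
          round(full_rotation_ms(rpm) / 2, 1),
          round(average_access_ms(seek_ms, rpm), 1))
```

For the 15k drive this gives a 4 ms rotation, 2 ms average latency and 5.6 ms access time; for the 7200rpm drive, about 8.3 ms, 4.2 ms and 12.7 ms.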
I don't know why you think that seek time on large drives is of no importance...
Because for the majority of users, it is.
You're correct that large files can be fragmented, but the fact is that most users' large files (movies, audio, etc.) are never edited, meaning no excess fragments are created. Then the only source of fragmentation is deletions, which produces relatively little fragmentation. Your million files in 100G is still 100K/file, rather larger than my boot drive (which I assume for lack of further evidence to be "normal").
Taking the time to check fragments, I have 13602 fragments in 90G (~6.6M/fragment) for data and 112594 fragments in 5.75G (~50K/fragment) for boot/apps. The ratio is still about the same.
As for concurrent reads, this is a problem of firmware optimization. Two applications making full-out reads of two separate files should be served by firmware in the following manner: the drive reads a buffer-full of one file and then seeks to the other. In an STR-bound application like this, the drive seeks as little as possible. In a situation where applications are making repeated small (but sequential) reads, the firmware should seek to maintain a buffer half-full of one file and half-full of the other, performing read-ahead caching to allow bits of each file to be sent to the host with a minimum of seeking on the drive's part.
There are few circumstances where firmware optimization cannot mask seek performance, and these typically involve small datasets or are not suitable for large drives for other reasons.
High-speed Road Trip (18.000KPH)