Hard Drive Capacity Confusion, Lucidly Explained
mrklin writes "James Wiebe of wiebetech.com has written a clear example of how hard drive capacity is calculated (PDF file) by hard drive manufacturers (base 10) and OS (base 2). He failed to name how the capacity should be described, though."
With storage prices falling through the floor, does it matter to anyone except whiny nerds whether the byte counts are done in base 10 or base 2?
In the words of William Shatner, "Get a life!"
This one will hold 30 days of Porn
Now, this one here will hold 45 days of Porn
Break it down to something Everyone understands
In the grand scheme of things, drive capacity issues seem to revolve around lawyers more than consumers.
I wish that the major manufacturers would stop putting 1 BIG drive in the system, and instead put in 2 normal-sized ones, MIRRORED.
As somebody who gets blasted by customers when they fail to do their backups, I think an out-of-the-box, pre-mirrored system would be far better for the consumer than properly labelling those lost 200 MB.
Sorry, that's my partially related rant for this evening.
My mom says I'm cool.
As far as ordinary users (i.e. anyone who doesn't have to deal with TLBs, memory pages, disk sectors and the like) are concerned, there's really no reason left to use binary units; 2^9 bytes per sector, 8 sectors per filesystem block, etc. are all low-level conveniences that the user shouldn't have to even notice. Though I personally am too used to the binary units to switch easily, the vast majority of users probably wouldn't even notice the difference, aside from their computers finally reporting the right size for their hard disks. Granted, overcoming the huge momentum for binary units will be difficult, but one could always consider it practice for getting the USA to accept metric.
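To make the low-level numbers in that comment concrete, here is a minimal sketch of how 2^9-byte sectors and 8-sector blocks combine. The constant and function names are made up for illustration, and real filesystems may use other block sizes.

```python
# Illustrative values taken from the comment above; not tied to any
# particular filesystem.
SECTOR_BYTES = 2 ** 9                            # 512 bytes per sector
SECTORS_PER_BLOCK = 8                            # sectors per filesystem block
BLOCK_BYTES = SECTOR_BYTES * SECTORS_PER_BLOCK   # 4096 bytes per block


def blocks_needed(file_bytes: int) -> int:
    """Number of whole blocks needed to hold file_bytes (rounds up)."""
    return -(-file_bytes // BLOCK_BYTES)


if __name__ == "__main__":
    print(BLOCK_BYTES)
    print(blocks_needed(4097))
```

None of this has to leak into what the user sees; the block size is a power of two whichever prefix convention the OS uses when it prints a capacity.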
I think it's a little odd that he claimed that Hard drive makers have "Always" done this. I very specifically remember advertisements for hard drives being "One Billion Bytes" (with like a 14 point small print letting us know that it was indeed 1000000000 bytes). After that "billion bytes" became gigabytes and the font became smaller.
I've also heard that for some drive makers "gigabyte" means 2^20*10^3 (i.e. one thousand binary megabytes) and things like that.
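For comparison, a quick sketch of three readings of "gigabyte" that come up in this thread. The "hybrid" one assumes the comment means a thousand megabytes of 2^20 bytes each; the constant names are invented for illustration.

```python
DECIMAL_GB = 10 ** 9            # drive-maker reading: one billion bytes
HYBRID_GB = 2 ** 20 * 10 ** 3   # assumed "thousand binary megabytes" reading
BINARY_GB = 2 ** 30             # traditional OS reading (a gibibyte)

for name, size in (
    ("decimal", DECIMAL_GB),
    ("hybrid", HYBRID_GB),
    ("binary", BINARY_GB),
):
    # Show each definition and how it compares with 10^9 bytes.
    print(f"{name:8s} {size:>13,d} bytes  ({size / DECIMAL_GB:.4f} x 10^9)")
```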
autopr0n is like, down and stuff.
This is not a matter of base-10 vs base-2... a base-10 number is written as "2875" for example. A base-2 number is written as "10100110". A base-16 number is written as "8A3F0"...
This is a matter of UNITS used - like inches vs. feet, or in this case GiB vs GB.
Geez, get the terminology right...
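A small sketch of the distinction being drawn here: changing the base only changes how a number is written, while changing the unit changes the number you report. The variable names are just for illustration.

```python
# One quantity written in three bases: the value itself never changes.
n = 2875
as_binary = format(n, "b")   # base-2 digits
as_hex = format(n, "X")      # base-16 digits
assert int(as_binary, 2) == int(as_hex, 16) == n

# One byte count reported in two different units: the numbers differ.
n_bytes = 1_000_000
decimal_kb = n_bytes / 1000   # kilobytes of 1000 bytes (kB)
binary_kib = n_bytes / 1024   # kilobytes of 1024 bytes (KiB)

print(as_binary, as_hex)
print(decimal_kb, binary_kib)
```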
Ah, and therein lies the crux of the matter. The problem is that *everywhere else*, when kilo-, mega-, etc. prefix units (to stop the megapolis argument), they denote powers of 10. A megavolt is a million volts. A kilometer is 1000 meters. A gigahertz is a billion hertz. Only in computer science have people redefined the prefixes to refer to anything other than powers of 10. *That* is what the debate revolves around, and that is what is IMO the mistake of people early on. The solution is to make kilobytes officially be 1000 bytes (as the IEC has) and use a different unit for the powers of two.
About two years ago there was a debate about this. Can't remember the details of that debate. Maybe it was when those "mebibytes" were introduced. I still say now what I said then.
I think there should be "short megabytes" and "long megabytes", and the same for gigabytes. Like this:
Then all we need is to get hard drive manufacturers and OS vendors to state whether they are using short or long tons, er, gigabytes.
As to abbreviations, take Donald Knuth's suggestion. Use the capital letter twice to suggest binaryness. 1 MMB = one long megabyte; 1 GGB = one long gigabyte. I like this much better than the now-standardized MiB men-in-black abbreviation for long megabytes (which are still not called long megabytes in the standard, they are called mebibytes, which sounds silly and no one uses it).
Who's with me?
Sunlit World Scheme. Weird and different.
Those are too hard to pronounce. Why not just distinguish them by prefixing the metric ones with the word "metric", as we do with tons and metric tons?
kilobyte = 1024 bytes
metric kilobyte = 1000 bytes
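The two definitions above could be sketched as a single conversion with a flag. The function name and the `metric` parameter are hypothetical, chosen only to mirror the proposed naming.

```python
def to_kilobytes(n_bytes: int, metric: bool = False) -> float:
    """Convert a byte count to kilobytes.

    metric=False uses the 1024-byte kilobyte; metric=True uses the
    1000-byte "metric kilobyte" proposed in the comment above.
    """
    divisor = 1000 if metric else 1024
    return n_bytes / divisor


if __name__ == "__main__":
    print(to_kilobytes(1024))               # one kilobyte
    print(to_kilobytes(1024, metric=True))  # slightly more than one metric kilobyte
```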
hear hear!
a CDR 650/700 MB
a DVD[+-]R: 4.7 salesman GB
= 4.7*1000*1000*1000/1024 = 4589843 KB (= 4.37 GB)
AFAIK base-10 is just plain cheating.
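The DVD arithmetic above can be checked with a few lines, assuming the advertised 4.7 means exactly 4.7 * 10^9 bytes. The variable names are illustrative.

```python
# Advertised ("salesman") capacity, taken as 4.7 * 10^9 bytes.
dvd_bytes = 4_700_000_000

binary_kb = dvd_bytes / 1024        # 1024-byte kilobytes
binary_gb = dvd_bytes / 1024 ** 3   # 2^30-byte gigabytes

print(f"{binary_kb:,.2f} binary KB")
print(f"{binary_gb:.4f} binary GB")
```

The result lands between 4.37 and 4.38; the figure quoted in the post is the truncated value.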
Genius doesn't work on an assembly line basis. You can't simply say, "Today I will be brilliant."
The only relevant issue is the meaning of words like kilobyte, megabyte, and gigabyte. Wiebe describes how you can arrive at two different answers for drive capacity depending on how you define the word "gigabyte," but does so completely uncritically. For example, he describes the drive manufacturer logic and writes that "the drive's claim of 123.5 GB is verified with this simple mathematical formula." But the issue is what the word "gigabyte" means, and the formula presented sheds no light on the word's conventional usage or etymology. I personally was raised to use these terms to correspond to numbers that are powers of two. Wiebe doesn't give me any point of reference to shed light on whether it's reasonable to use the meanings drive manufacturers do. (Of course I already know the answer, but that's beside the point.)
Wiebe uses some other odd logic, exemplified in point 3.7. He writes that the consumer was never cheated, because a drive advertised as having a capacity of 123.5GB had just that in "decimal based" capacity. This is a bizarre way to characterize the complaints. Consumers who believe they were cheated aren't claiming they didn't get 123.5GB for any definition of the word gigabyte. They're claiming they didn't get 123.5GB by the conventional definition of the word as commonly used in connection with computers. In my view, they're right, although I don't personally get too upset about it.
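As a rough illustration of the gap being argued about, here is the 123.5 GB figure worked through under both definitions. This is only a sketch: the exact byte count of Wiebe's drive is not given in this thread, so the advertised number is taken at face value.

```python
ADVERTISED_GB = 123.5   # figure quoted from Wiebe's paper

# Assumption: the label means exactly 123.5 * 10^9 bytes.
total_bytes = int(ADVERTISED_GB * 10 ** 9)

# What an OS that divides by 2^30 would report for the same bytes.
reported = total_bytes / 2 ** 30

# Relative shortfall between the two definitions of "giga".
shortfall_pct = (1 - 10 ** 9 / 2 ** 30) * 100

print(f"{total_bytes:,d} bytes")
print(f"{reported:.2f} binary GB, {shortfall_pct:.2f}% below the label")
```

The shortfall is the same percentage for any drive size, which is why the absolute "missing" space grows as drives get larger.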
I don't know what he's talking about; my Pentium 66 insists that 1024 x 1024 x 1024 = 1,000,000 exactly.
Am I the only one who heard Roxette to sing "I'm gonna get blitzed for some sex"?
...he ignores the fact that HD manufacturers are happy using bytes which are 8 bits, all the while flouting the established convention that MB/GB refers to binary megabytes and binary gigabytes. Why don't they specify the size of their HDs in bits?
Even better-- pay in Canadian currency. That way it really is smaller. "I paid you 299 dollars and ninety-nine cents, just like we agreed upon. The fact is, you never specified the American dollar or the Canadian dollar, so I just used the unit more convenient for me."
"Why Subscribe?" Good question...
But personally I strongly reject this "kibibytes" attempt at CS revisionist history. Stick with what CS people have been using as measurements for decades, I say,
Why shouldn't CS people stick to what the rest of the sciences have been using for decades, that "kilo" means 1000? This CS thing of making "kilo" stand for 1024 is an attempt at revisionist history.
There's always another perspective.
Or we could just beat the hard-disk manufacturers with a stick until they understand that most people expect 1 kilobyte to be 1024 bytes :P
You are out of touch. If you conducted a scientific survey of 100 random adults who own PCs and asked them:
"How many bytes are in a kilobyte?" you really think that more than 50 would answer "1024"?
I'd be surprised if more than 10 did, personally.
100% of the non-geek population equates kilo with base 10, not base 2.
- It uses terms like "binary math" versus "decimal math". Last I checked, they were both equally viable ways of doing math, and as any viable method of doing math should be, they both always get the same answer! See section 3.5 if you want to get really mad! It isn't that the math is different that is causing a problem, it is that the algorithm is different. It just so happens that the algorithm was inspired by a number which is convenient when dealing with binary because it is an even power of 2.
- There is no discussion of why HDD makers use normal math while OS makers use "computer-ese". It isn't wholly discountable that HDD makers are interested in making their drives look as big as possible against the competition, and if one manufacturer says a Gigabyte is 10^9 bytes then they all have to. And he paints the 1024-byte KiloByte basically as a stupid idea, which it isn't (albeit confusing).
- The explanation (such as it is) for how much data is lost to OS overhead is inaccurate at best. He got his info for the Mac from the Drive Utility (akin to Disk Management or fdisk in MS-land), but got his WinXP info probably from the explorer. Fdisk will not report any filesystem size considerations, just the partition sizes, so neither should the Drive Utility. I'm betting the 1026 "lost" bytes are the partition table. This makes it look like the Mac loses 1026 bytes, while Windows tosses about 11 MB out the door. While I'm not trying to advocate for Windows, that simply isn't fair. He goes on to say that he has "no explanation for these variations", which brings me to my next point.
- He can't explain the size variations between OSes, yet he makes this statement:
So now he's trying to explain it, and not doing a very good job. First of all, the FS overhead will vary roughly proportionally to the size of the partition, so giving out a number like 70 MB and saying that a "typical drive" loses this much is careless at best. Secondly, I'm not convinced that he doesn't actually have 70 MB of data on that drive. There's no accounting for the 11 MB that aren't showing up as "used", which sounds like FS metadata to me. I don't have a drive handy to format, so I don't know if Windows shows "0 used" on a clean NTFS drive or not (oh, is he using NTFS or FAT32... the world may never know). The bottom line: he should have used the Disk Management tool to compare apples to apples (no pun intended).
- And the bottom bottom line is that he's in the storage business, and shouldn't be so ignorant. He's got a degree in mathematics for crying out loud!
I appreciate that this needs to be explained, and I know all too well that the average computer user (read average American) can hardly count, much less do it in binary, so a simple explanation is good. But I never think things should be simplified to the point of gross inaccuracy. This is just further compounded with the obvious lack of a clue. Someone write a better (and perhaps shorter) account for this, please!
I have to agree here. I've been using computers since the early 80's and the "kilo" or "mega" notation was well understood then to be an approximation, at least in my circles, to their decimal prefix equivalents.
... use it; point out to the rest of the world that MB is inaccurate and should mean 1000*1000 bytes, that MiB in fact *means* 1024*1024 bytes and this will solve our confusions within a generation.
"A kilobyte in a computer is 1024 bytes only because in base-2 it is simpler to count in 1024's than in 1000's"
That said, and everyone learned that back when people had to learn about computers (instead of growing up with them), this approximation is *still* just an approximation.
Just because you grew up thinking a kilo meant 1024 because you're in a non-metric country doesn't mean a kilo means 1024. It means your predecessors didn't bother using a different name for a different number (back when "the world will never need more than maybe 10 computers").
Mebi is available now
- Michael T. Babcock (Yes, I blog)
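A minimal sketch of what using both prefix families side by side might look like in a size formatter. The function name, the `binary` flag, and the two-decimal output format are all assumptions made for illustration, not any OS's actual behaviour.

```python
def format_bytes(n_bytes: int, binary: bool = True) -> str:
    """Format a byte count with IEC (KiB, MiB, ...) or SI (kB, MB, ...) prefixes."""
    base = 1024 if binary else 1000
    units = (
        ["B", "KiB", "MiB", "GiB", "TiB"]
        if binary
        else ["B", "kB", "MB", "GB", "TB"]
    )
    value = float(n_bytes)
    # Step up one prefix at a time until the value fits below the base.
    for unit in units[:-1]:
        if value < base:
            return f"{value:.2f} {unit}"
        value /= base
    return f"{value:.2f} {units[-1]}"


if __name__ == "__main__":
    size = 1024 * 1024
    print(format_bytes(size))                # IEC prefixes
    print(format_bytes(size, binary=False))  # SI prefixes
```

Printing the unit that matches the divisor is the whole point: the same byte count gets two different numbers, and the label says which one you are looking at.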
This issue reminds me of a practice used in another industry. The auto industry commonly reports horsepower and torque for their cars as measured at the engine's crank/flywheel rather than at the wheels. While the measurements themselves are an accurate reflection of an engine's general performance alone, you typically do not just buy an engine; you buy a system, which is the car. When the engine's performance is measured within the context of the car (meaning at the wheels), the truth is revealed. That revelation shows, on average, a loss of 10-20% when power is measured at the wheels vs the crank. Which spec do you think a manufacturer is going to release?