Hard Drive Capacity Confusion, Lucidly Explained
mrklin writes "James Wiebe of wiebetech.com has written a clear example of how hard drive capacity is calculated (PDF file) by hard drive manufacturers (base 10) and OS (base 2). He failed to name how the capacity should be described, though."
With storage prices falling through the floor, does it matter to anyone except whiny nerds whether the byte counts are done in base 10 or base 2?
In the words of William Shatner, "Get a life!"
This one will hold 30 days of Porn
Now, this one here will hold 45 days of Porn
Break it down to something Everyone understands
The real units joke is starting to get old...
In the grand scheme of things, drive capacity issues seem to revolve around lawyers more than consumers.
I wish that the major manufacturers would stop putting 1 BIG drive in the system, and put 2 normal sized ones in and MIRRORED.
As somebody who gets blasted by customers when they failed to do their backup, an out of the box, pre mirrored system would be far better for the consumer than properly labelling those lost 200 MB.
Sorry, that's my partially related rant for this evening.
My mom says I'm cool.
Why are we seeing another article on this very issue?
Everyone understands HD manufacturers' measuring systems. Failing that, we could just have billy fix up windows to overstate drive capacity to all windows users and they would never know any better.
No more Micro$oft bashing from me. It's like bashing at the special olympics.
As far as ordinary users (i.e. anyone who doesn't have to deal with TLBs, memory pages, disk sectors and the like) are concerned, there's really no reason left to use binary units; 2^9 bytes per sector, 8 sectors per filesystem block, etc. are all low-level conveniences that the user shouldn't have to even notice. Though I personally am too used to the binary units to switch easily, the vast majority of users probably wouldn't even notice the difference, aside from their computers finally reporting the right size for their hard disks. Granted, overcoming the huge momentum for binary units will be difficult, but one could always consider it practice for getting the USA to accept metric.
I think it's a little odd that he claimed that Hard drive makers have "Always" done this. I very specifically remember advertisements for hard drives being "One Billion Bytes" (with like a 14 point small print letting us know that it was indeed 1000000000 bytes). After that "billion bytes" became gigabytes and the font became smaller.
I've also heard that for some drive makers "gigabyte" means 2^20*10^3 (i.e. one thousand megabytes) and things like that.
autopr0n is like, down and stuff.
This is not a matter of base-10 vs base-2... a base-10 number is written as "2875" for example. A base-2 number is written as "10100110". A base-16 number is written as "8A3F0"...
This is a matter of UNITS used - like inches vs. feet, or in this case GiB vs GB.
Geez, get the terminology right...
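To make the base-versus-units distinction concrete, here is a minimal Python sketch. It writes the 2875 from the post in three bases, then divides one byte count by two different unit sizes. The 123.5 * 10^9 byte figure is an assumption, taken from the "123.5 GB" drive mentioned elsewhere in the thread and treated as exact for illustration.

```python
# Same quantity, three notations: changing the base never changes the value.
n = 2875
in_binary = format(n, "b")
in_hex = format(n, "X")
assert int(in_binary, 2) == int(in_hex, 16) == n

# Same quantity, two units: this is where the reported numbers diverge.
GB = 10**9    # decimal gigabyte, as used on the drive label
GIB = 2**30   # binary gibibyte, as many OSes report under the name "GB"

capacity_bytes = 123_500_000_000  # assumed exact size of a "123.5 GB" drive
size_in_gb = capacity_bytes / GB
size_in_gib = capacity_bytes / GIB

print(f"{n} = 0b{in_binary} = 0x{in_hex}")
print(f"{capacity_bytes:,} bytes = {size_in_gb:.1f} GB = {size_in_gib:.2f} GiB")
```

The byte count is the same on both sides; only the size of the unit it is divided by differs.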
I've noticed that some companies tend to go a little over the hard drive's specified size, most notably Maxtor. My 160GB and 200GB hard drives are actually 163.9GB and 203.9GB. On the other hand, I've found that Western Digital seems to have drives slightly smaller than their advertised capacity (59.8GB for a 60GB drive and 79.97GB for an 80GB drive).
All misspellings and grammatical errors in the above post are intentional and part of my artistic expression.
Ah, and therein lies the crux of the matter. The problem is that *everywhere else* that kilo-, mega-, etc. prefix units (to stop the megapolis argument), they denote powers of 10. A megavolt is a million volts. A kilometer is 1000 meters. A gigahertz is a billion hertz. Only in computer science have people redefined the prefixes to refer to anything other than powers of 10. *That* is what the debate revolves around, and that is IMO the mistake people made early on. The solution is to make kilobytes officially be 1000 bytes (as the IEC has) and use a different unit for the powers of two.
About two years ago there was a debate about this. Can't remember the details of that debate. Maybe it was when those "mebibytes" were introduced. I still say now what I said then.
I think there should be "short megabytes" and "long megabytes", and the same for gigabytes. Like this:
Then all we need is to get hard drive manufacturers and OS vendors to state whether they are using short or long tons, er, gigabytes.
As to abbreviations, take Donald Knuth's suggestion. Use the capital letter twice to suggest binaryness. 1 MMB = one long megabyte; 1 GGB = one long gigabyte. I like this much better than the now-standardized MiB men-in-black abbreviation for long megabytes (which are still not called long megabytes in the standard, they are called mebibytes, which sounds silly and no one uses it).
Who's with me?
Sunlit World Scheme. Weird and different.
hear hear!
a CDR: 650/700 MB
a DVD[+-]R: 4.7 salesman GB
= 4.7*1000*1000*1000/1024 = 4589843 KB (= 4.37 GB)
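The DVD arithmetic above can be checked with a few lines of Python. This is only a sketch: it assumes the disc holds exactly 4.7 * 10^9 bytes (real media vary slightly), and the function name is made up for illustration.

```python
DVD_BYTES = 4_700_000_000  # assumption: exactly 4.7 decimal ("salesman") gigabytes


def to_binary_units(n_bytes):
    """Return the byte count expressed in 1024-based KB, MB and GB."""
    kib = n_bytes / 1024
    mib = kib / 1024
    gib = mib / 1024
    return kib, mib, gib


kib, mib, gib = to_binary_units(DVD_BYTES)
print(f"{kib:,.2f} KiB")
print(f"{mib:,.2f} MiB")
print(f"{gib:.4f} GiB")
```

The last figure is about 4.377, so it reads 4.38 when rounded to two places and 4.37 when truncated.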
AFAIK base-10 is just plain cheating.
Genius doesn't work on an assembly line basis. You can't simply say, "Today I will be brilliant."
Well, he does say this:
And this:
But personally I strongly reject this "kibibytes" attempt at CS revisionist history. Stick with what CS people have been using as measurements for decades, I say, and not submit to what the drive manufacturers want to use for inflated advertising.
We know where leadership by an anti-intellectual "strongman" who scapegoats minorities and likes boisterous rallies goes
It depends on how many times you've watched "Back to the Future."
Oh man, that just brought back memories. A bunch of geeks sitting at Round Table pizza for a BBS party all trying to get the highest in base 2 decimals. 512, 1024, 2048, 4096, 8192, etc. all shouting to be heard.
No wonder you'd never see a woman at those parties, must have scared them off. of course, nowadays, you see women geeks much more often, thank God.
1f u c4n r34d th1s u r34lly n33d t0 g37 l41d Capitalization really works: i helped my uncle jack off a horse
What's next? Monitor sizes? I love my 19" (18" viewable) monitor!
Monitors are measured by the diameter of the actual physical glass tube inside the monitor. It's a clear and non-ambiguous way to measure things, not perfect, but it's no trickery.
But when Joe Windows formats his new 120 gig HD and finds it only holds 112 GB he's going to feel cheated on those "missing" 8 GB.
.: Max Romantschuk
The only relevant issue is the meaning of words like kilobyte, megabyte, and gigabyte. Wiebe describes how you can arrive at two different answers for drive capacity depending on how you define the word "gigabyte," but does so completely uncritically. For example, he describes the drive manufacturer logic and writes that "the drive's claim of 123.5 GB is verified with this simple mathematical formula." But the issue is what the word "gigabyte" means, and the formula presented sheds no light on the word's conventional usage or etymology. I personally was raised to use these terms to correspond to numbers that are powers of two. Wiebe doesn't give me any point of reference to shed light on whether it's reasonable to use the meanings drive manufacturers do. (Of course I already know the answer, but that's beside the point.)
Wiebe uses some other odd logic, exemplified in point 3.7. He writes that the consumer was never cheated, because a drive advertised as having a capacity of 123.5GB had just that in "decimal based" capacity. This is a bizarre way to characterize the complaints. Consumers who believe they were cheated aren't claiming they didn't get 123.5GB for any definition of the word gigabyte. They're claiming they didn't get 123.5GB by the conventional definition of the word as commonly used in connection with computers. In my view, they're right, although I don't personally get too upset about it.
I need to clear the 101th bit of that byte.
1 4m l33t. 1 4m d4 b0mb. 1 hax0red da 0xF49B5Cth byte 0f dat file.
0o1232, the number of the beast. (a song by Iron Maiden)
Good morning, kids. Please open your history books in the CDXXXVIIth page.
-
Roses are #FF0000, Violets are #0000FF, find / -name '*base*' |xargs chown -R us && mv zig greatjustice
I think Wikipedia's entry on gigabyte should make this crap appear really stupid. Here's a clip from the entry:
Since most people who buy computers are not in "computer science or computer programming", I would argue the value used by storage manufacturers is perfectly applicable when selling computers in the mainstream.
Sadly, it appears lawsuits rather than education on a minor issue will be used to settle this matter, which will lead to a precedent that will be yet another aggravation for the computer industry. Damnit, if you're a lay person, it's safe to say that 1,000 Megabytes is roughly 1 Gigabyte.
Join Tor today!
I don't know what he's talking about; my Pentium 66 insists that 1024 x 1024 x 1024 = 1,000,000 exactly.
Am I the only one who heard Roxette sing "I'm gonna get blitzed for some sex"?
Hey Jimmy, assuming you're using FAT32 as your XP filesystem, which uses 73.8 MB of space for every gigabyte, not just 73.8 megs one time, that adds up to roughly 8,856MB of space used for the filesystem. Which on a labeled 123.5 GB drive, leaves you with roughly 115GB of space! Wow! The HD manufacturers were right!
The OS *does* use a negligible amount of drive space in these days of 100+ GB hard drives. And you're confusing file systems with operating systems. Just because an OS allows you to use a file system that wastes resources doesn't mean the OS itself uses a lot of drive space.
Beware: In C++, your friends can see your privates!
...he ignores the fact that HD manufacturers are happy using bytes which are 8 bits, all the while flouting the established convention that MB/GB refers to binary megabytes and binary gigabytes. Why don't they specify the size of their HDs in bits?
Even better-- pay in Canadian currency. That way it really is smaller. "I paid you 299 dollars and ninety-nine cents, just like we agreed upon. The fact is, you never specified the American dollar or the Canadian dollar, so I just used the unit more convenient for me."
"Why Subscribe?" Good question...
"Our computers are binary, so the hard drives that we put in them should be measured using the binary (Base-2) representation."
Eh, no. Binary is interesting to computers, not to humans. Humans care about numbers multipliable by 10.
A human can understand the concept of a byte, a single letter. However, a human, unless he's really into computers, doesn't care much about how many bits are in a byte. It may be 8-bits per byte, but what about error correction etc?
A human can easily multiply 1000 by 1000 and know what the answer is, but ask him to do 1024 by 1024 and he's going to scratch his head. But if he knows that he's got 1,000 useful bytes/characters, then he doesn't need to know about how many bits are in a byte, and the powers of 2, etc.
So no, I don't agree with you. Human readability is at issue here. If somebody really wants to know how many bits are on an HD, they're wanting to know more than most people who'll plunk down money for a drive.
(note: I realize you didn't necessarily mean bits, but I did kind of need to make that point so the rest of my statement made more sense. Hope I didn't sound like I was misrepresenting what you said.)
"Derp de derp."
Every time this issue comes up, and somebody proposes to abandon the use of K = 1024, people react saying that would be bad, because K = 1000 doesn't fit with power-of-2-based storage in computers.
These '/. religious wars' should be stopped. And I think they will be: ifconfig, for example, already reports bytes transmitted/received as GiB, MiB, ...
That may very well be, but that's not the point. We can still use binary prefixes. But it doesn't make any sense to use the same prefix with different meanings. It makes perfect sense to use different prefixes, and to use different symbols for them. That way there is no confusion.
I never understood this really. CS is, of all fields, a field where it is important to be unambiguous. One byte wrong, or even one bit, and the computer doesn't understand it, or misunderstands it. Yet where it comes to defining storage units, we hijack the established 'kilo' and make it mean 1024 instead of 1000. Not even always: 1kbps is never 1024 bits per second, always 1000 bits per second.
I say, where it makes sense to use binary prefixes, let's do it. And let's be clear about it. The current 'Look ma, I'm using binary prefixes but I made them look exactly like the usual decimal prefixes so I can create confusion and
This sig under construction. Please check back later.
> describe the size in terms of number of songs. (of course,
I forget... is it 1.7 threesomes per song, or is it the other way around?
Sheesh, evil *and* a jerk. -- Jade
No, dummy. He is talking notation, not numbers. We have to change the words we use to describe numbers in computer science, not the numbers themselves.
The "kilo" in kilobytes is an abuse of SI metric notation. "kilo", "mega" and "giga" mean 1000, 1 000 000, and 1 000 000 000 to physicists, engineers, chemists, and the general scientific community. How arrogant or short-sighted were computer scientists to think that they could simply re-define these prefixes to mean 1024, 1024 * 1024, and 1024 * 1024 * 1024?
The real solution is to stop abusing widely accepted terminology and switch to the suggested "kibibytes", "mebibytes" and "gibibytes". Yes, it sounds stupid, but that's only because it's unfamiliar. It's not as stupid as using one set of prefixes for two different purposes. In fact, it's that very usage that led to this stupid conflict between "hard drive manufacturer gigabytes" and "operating system gigabytes".
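For what it's worth, keeping the two prefix families apart in software takes very little code. Below is a minimal Python sketch of a formatter that labels decimal sizes with SI prefixes and binary sizes with the IEC ones. The function name, the two-decimal precision and the 120 * 10^9 byte test value are arbitrary choices for illustration, not taken from any particular OS or tool.

```python
SI_PREFIXES = ["", "k", "M", "G", "T"]        # steps of 1000
IEC_PREFIXES = ["", "Ki", "Mi", "Gi", "Ti"]   # steps of 1024


def format_bytes(n_bytes, binary=False):
    """Format a byte count with an unambiguous SI or IEC prefix."""
    base = 1024 if binary else 1000
    prefixes = IEC_PREFIXES if binary else SI_PREFIXES
    value = float(n_bytes)
    index = 0
    # Step up one prefix at a time until the value fits or prefixes run out.
    while value >= base and index < len(prefixes) - 1:
        value /= base
        index += 1
    return f"{value:.2f} {prefixes[index]}B"


drive = 120_000_000_000  # hypothetical drive labelled "120 GB"
print(format_bytes(drive))               # decimal prefixes
print(format_bytes(drive, binary=True))  # binary prefixes
```

With distinct symbols on the two outputs, the reader can tell which convention produced each number without guessing.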
From a consumer standpoint it makes sense to make 1K = 1000 bytes and so on, but from a computer viewpoint, it's best to leave it as is. All in all, people should research what a kilobyte is (in terms of how many bytes it is) before they become experts in storage capacities for computers.
Geez. Repeat after me: Computers are intended to be used by PEOPLE, not the other way around. Nobody, I repeat, nobody, outside of the CS community uses kilo, mega, or giga to mean anything but 10^3 (10 to the power of 3), 10^6 or 10^9. Why should Joe Sixpack on the street, or even a Physics professor with no CS knowledge, have to "research" what "gigabyte" means in the context of computer science? It should mean 1 000 000 000 bytes, plain and simple. If someone wants to express the number 1024^3, they should make up a new word such as "gibi-" instead of using existing terminology.
Of course, this will NEVER happen, because in any given community, the majority of people would rather stick with widely accepted and entrenched mistakes than bother to change their behaviour or ideas. Just witness the ridiculous C notation for assignment:
a = a + 1
In many other programming languages and mathematics "=" means "equality" NOT assignment; saner languages use ":=" for assignment. Yet, because of C's popularity, we will be stuck with this abuse of notation forever, especially since any new languages (such as Java) will try to cater to C programmers.
If you can't see why this is a mistake, consider this. In a language with "=" for equality and ":=" for assignment, you only have to learn one new thing: that ":=" means assignment. In C, you have to learn two things: "=" means assignment, NOT equality, and "==" means equality. How stupid is that? Everyone already knows that "=" means equality; why change that? Everyone already knows that "kilo" means 1000; why change that?
Now, thanks to the "grandfathers of CS" or whoever, I have to remember my standard SI prefixes (okay, that's no problem), I need to know that in most CS applications kilo, mega, giga, etc. mean 1024, 1024^2, 1024^3, etc. and I need to remember that in CERTAIN CS applications kilo, mega, giga, etc. have their standard meanings.
Oh sorry, but what was I thinking? It's the hard drive manufacturers who are stupid.... (sarcasm). Did you ever think that one of the reasons they use the standard definition of "giga-" to calculate drive sizes is that most NORMAL PEOPLE (i.e. the majority of computer users) don't know that giga means 1024^3? More to the point, how many ordinary people care to calculate (or memorize) the exact value of a gigabyte? (Of course, I'm sure another reason is that they get to "inflate" their hard drive sizes).
To summarize my overly long post, one of the main reasons computer consumers are constantly being ripped off, misled and confused is that CS geeks like us keep forgetting or never cared that computers are nothing more than tools for people. Maybe you need to take a Human-Computer Interaction course or something, if you can't understand that.
- It uses terms like "binary math" versus "decimal math". Last I checked, they were both equally viable ways of doing math, and as any viable method of doing math should be, they both always get the same answer! See section 3.5 if you want to get really mad! It isn't that the math is different that is causing a problem, it is that the algorithm is different. It just so happens that the algorithm was inspired by a number which is convenient when dealing with binary because it is an even power of 2.
- There is no discussion of why HDD makers use normal math while OS makers use "computer-ese". It isn't wholly discountable that HDD makers are interested in making their drives look as big as possible against the competition, and if one manufacturer says a Gigabyte is 10^9 bytes then they all have to. And he paints the 1024-byte KiloByte basically as a stupid idea, which it isn't (albeit confusing).
- The explanation (such as it is) for how much data is lost to OS overhead is inaccurate at best. He got his info for the Mac from the Drive Utility (akin to Disk Management or fdisk in MS-land), but got his WinXP info probably from the explorer. Fdisk will not report any filesystem size considerations, just the partition sizes, so neither should the Drive Utility. I'm betting the 1026 "lost" bytes are the partition table. This makes it look like the Mac loses 1026 bytes, while Windows tosses about 11 MB out the door. While I'm not trying to advocate for Windows, that simply isn't fair. He goes on to say that he has "no explanation for these variations", which brings me to my next point.
- He can't explain the size variations between OSes, yet he makes this statement:
So now he's trying to explain it, and not doing a very good job. First of all, the FS overhead will vary roughly proportionally to the size of the partition, so giving out a number like 70 MB and saying that a "typical drive" loses this much is careless at best. Secondly, I'm not convinced that he doesn't actually have 70 MB of data on that drive. There's no accounting for the 11 MB that aren't showing up as "used", which sounds like FS metadata to me. I don't have a drive handy to format, so I don't know if Windows shows "0 used" on a clean NTFS drive or not (oh, is he using NTFS or FAT32... the world may never know). The bottom line: he should have used the Disk Management tool to compare apples to apples (no pun intended).
- And the bottom bottom line is that he's in the storage business, and shouldn't be so ignorant. He's got a degree in mathematics for crying out loud!
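One way to sidestep the tool-versus-tool comparison problem raised in the list above is to ask the OS for raw byte counts and do the unit conversion yourself. The Python sketch below uses the standard library's shutil.disk_usage, which reports the total, used and free bytes of the filesystem containing a path. Note that these are filesystem-level figures, so they are not the same thing as the partition sizes a tool like fdisk or Disk Management shows, and "used" plus "free" need not add up to "total" on filesystems that reserve blocks. The function name and the choice of "." as the path are illustrative assumptions.

```python
import shutil


def describe_volume(path="."):
    """Report one filesystem's byte counts in both decimal GB and binary GiB."""
    usage = shutil.disk_usage(path)  # named tuple with total, used, free (bytes)
    rows = (("total", usage.total), ("used", usage.used), ("free", usage.free))
    lines = []
    for label, n_bytes in rows:
        lines.append(
            f"{label:>5}: {n_bytes:>18,} bytes"
            f" = {n_bytes / 10**9:10.2f} GB"
            f" = {n_bytes / 2**30:10.2f} GiB"
        )
    return "\n".join(lines)


print(describe_volume("."))
```

Because every row starts from the same byte count, any remaining difference between two machines comes from the filesystems themselves rather than from the units the reporting tool happens to use.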
I appreciate that this needs to be explained, and I know all too well that the average computer user (read average American) can hardly count, much less do it in binary, so a simple explanation is good. But I never think things should be simplified to the point of gross inaccuracy. This is just further compounded with the obvious lack of a clue. Someone write a better (and perhaps shorter) account for this, please!
Kids today are tyrants. They contradict their parent, gobble their food, and tyrannize their teachers. - Socrates 400 BC
Somehow I get the feeling that it's mostly Americans who refuse to accept that kilo=10^3, mega=10^6, giga=10^9. (Please read on before moderating as troll/fb.)
I guess that's because you aren't used to kilo, mega and giga, except to the (incorrect) power-of-2 definitions. To someone who lives pretty much anywhere else in the world (ie. where metric units are used), kilo has always been 10^3 and mega has always been 10^6. Well, except in most fields of CS (but not telecommunications or HDD capacity).
What's happening is that several different fields of science are slowly starting to overlap, and suddenly there's real confusion: for someone, kilo=10^3, for someone else it's 10^3 EXCEPT in some cases it's 2^10.
This source of confusion should be fixed now when it's still possible. It may seem to this audience that Computer Science == Life (and most of you probably don't need to think about data in terms of telecommunications) and therefore you think kilo=2^10 is standard, but for a huge majority of people it simply is not so.
Kilo has always been 1000 and will always be 1000. It's us the computer people who have made a mess of it, and we're also responsible for cleaning it up.
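The size of the mess depends on the prefix: the gap between the power-of-10 and power-of-2 readings grows with every step up. A small Python sketch of that arithmetic follows; the helper name is made up for illustration.

```python
# (decimal prefix, binary prefix) pairs, one per power of 1000 / 1024
PREFIXES = [("kilo", "kibi"), ("mega", "mebi"), ("giga", "gibi"), ("tera", "tebi")]


def gap_percent(power):
    """How much larger 1024**power is than 1000**power, in percent."""
    return (1024**power / 1000**power - 1) * 100


for power, (si_name, iec_name) in enumerate(PREFIXES, start=1):
    print(f"{si_name} vs {iec_name}: {gap_percent(power):.2f}% apart")
```

At kilo the two meanings differ by about 2.4%, which was easy to ignore; at giga it is over 7%, and at tera it is close to 10%.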
"If anyone needs me, I'm in the angry dome."
Now if the drive manufacturers really wanted to go decimal, they'd use a 10 bit byte...but that would mean they had to give you a bigger drive for your money!
Oh yeah, and did anyone else laugh like a drain when the author used "IBM", "hard drive" and "reliable" in the same paragraph? ;-)
When I am king, you will be first against the wall.
This issue reminds me of a practice used in another industry. The auto industry commonly reports horsepower and torque for their cars as measured at the engine's crank/flywheel vs at the wheels. While the measurements themselves are an accurate reflection of an engine's general performance alone, you typically do not just buy an engine, you buy a system, which is the car. When the engine's performance is measured within the context of the car (meaning at the wheels), the truth is revealed. That revelation shows, on average, a loss of 10-20% when power is measured at the wheels vs the crank. Which spec do you think a manufacturer is going to release?