Slashdot Mirror


Hard Drive Capacity Confusion, Lucidly Explained

mrklin writes "James Wiebe of wiebetech.com has written a clear example of how hard drive capacity is calculated (PDF file) by hard drive manufacturers (base 10) and OS (base 2). He failed to name how the capacity should be described, though."

36 of 482 comments

  1. Does it matter anymore? by Dancin_Santa · · Score: 5, Insightful

    With storage prices falling through the floor, does it matter to anyone except whiny nerds whether the byte counts are done in base 10 or base 2?

    In the words of William Shatner, "Get a life!"

    1. Re:Does it matter anymore? by dtfinch · · Score: 4, Informative

      I'm a whiny nerd, and it doesn't matter much to me whether hard disk manufacturers define sizes in multiples of base 10 or base 1010.

      But I want to know how each drive handles error correction. A sector isn't REALLY 100000000 bytes when stored on disk, but has extra information to help it detect and correct most small errors. Some manufacturer could skimp on the error correction to increase storage capacity or reduce cost, but the drive would likely crap out sooner than others on the market.

    2. Re:Does it matter anymore? by |deity| · · Score: 5, Interesting
      Even the article states that you are losing 10% of the capacity you would expect. I think 10% is significant enough to complain about.

      The author at one point in the article says that operating systems have historically not documented how size is counted. Like the engineers at a drive manufacturing company aren't smart enough to know that if you calculate a kilobyte in base 2 you are going to calculate a megabyte, or gigabyte in base 2.

      Yes, if you are smarter than your average computer user, which is to say smarter than a really dumb rock, you should know that what's reported on a drive is not the actual size.

      It still hacks me off. It's like a soda manufacturer deciding it's ok to redefine an ounce so that they can claim that their drink is larger than it is, or just use a smaller container and claim it's still the same size.

      Does it matter? Yes, and it will matter more as storage capacity increases.

      If you use a computer it does all calculations in binary, it only makes sense for the capacity of the drive to be calculated in binary.

      --
      Environmentalists are their own worst enemy. ~tricklenews.com
    3. Re:Does it matter anymore? by OverlordQ · · Score: 4, Interesting

      Like the engineers at a drive manufacturing company aren't smart enough to know that if you calculate a kilobyte in base 2 you are going to calculate a megabyte, or gigabyte in base 2.

      That's where the standard argument fails, because mega, kilo, giga, tera, et al. are base 10 prefixes, not base 2.

      --
      Your hair look like poop, Bob! - Wanker.
    4. Re:Does it matter anymore? by Bi()hazard · · Score: 4, Interesting

      This is a big issue for those who use RAID arrays based on interchangeable hard drives. This is a common practice among large corporations, and drive manufacturers' nonstandard descriptions of sizes make it very difficult to mix manufacturers within an array.

      Buying from company A gives you 120GB=120 billion bytes, and buying from B gives you 120GB=128,762,169,664 bytes. If we have an array of 10 disks at the larger size and swap one out for the smaller size, the disks cannot be treated as interchangeable anymore, and the array loses much of its efficiency, or is forced to waste the extra space on the larger drives.

      The bottom line is that this costs money. Companies are locked into using one supplier and must pass up opportunities for good deals. The lack of flexibility and occasional screw ups by interns who don't check which drive is which uses up the IT department's time.

      Nobody really cares whether a GB is 1 billion or a funny number that comes from base 2, but a lot of people with a lot of money care whether 1 GB from company A equals 1 GB from company B. One of these days the industry will have to standardize.

      It's just as bad as monitor sizes: they measure those at funny angles and have different sized black margins around the viewable area. Just a couple months ago a manager here ordered a new 19 inch monitor and was so annoyed by the margins that he sent it back to be replaced. We gave him an old, lower quality monitor with the settings adjusted to minimize the margin. Some guy in IT took the new one home with him, and wrote it off as trashed defective equipment.

    5. Re:Does it matter anymore? by kryonD · · Score: 3, Informative

      Please take note that the amount of free space on an empty, but FORMATTED hard drive will always be a noticeable chunk less than full capacity as the OS requires storage space overhead for the file system.

      I just finished explaining this to someone who was whining about their 128MB USB keychain drive only having 123MB of space.

      Your directory structure has to be kept somewhere.

      --
      I've dirtied my hands writing poetry, for the sake of seduction; that is, for the sake of a useful cause. --Dostoevsky
    6. Re:Does it matter anymore? by vrt3 · · Score: 3, Informative
      if I'm going to buy a 120 GB hard drive, i expect there to be 120 * 2^30 = 128,849,018,880 bytes on the drive.

      if I'm going to buy a 120 GB hard drive, I expect there to be 120 * 10^9 = 120,000,000,000 bytes on the drive.

      The hard drive I got had 113 GB (113*2^30 = 121,332,826,112 bytes).

      The hard drive I got had 113 GiB (113 * 2^30 = 121,332,826,112 bytes).

      That is a difference of 7,516,192,768 bytes (7 GB).

      That is a difference of - 1,332,826,112 bytes... actually there were more bytes than you should have expected.
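      For anyone who wants to check the figures in this comment, here is a minimal Python sketch. The variable names are made up for illustration; the numbers are the ones quoted above.

```python
# Sanity check of the byte counts quoted in this comment.
GB = 10**9    # decimal gigabyte (the drive-label convention)
GIB = 2**30   # binary gigabyte, a.k.a. gibibyte (what the OS reported)

advertised = 120 * GB   # bytes promised on the box
if_binary = 120 * GIB   # bytes the parent poster expected
reported = 113 * GIB    # bytes behind the OS's "113 GB"

print(f"{advertised:,}")             # 120,000,000,000
print(f"{if_binary:,}")              # 128,849,018,880
print(f"{reported:,}")               # 121,332,826,112
print(f"{if_binary - reported:,}")   # 7,516,192,768
print(f"{advertised - reported:,}")  # -1,332,826,112
```

      The last line comes out negative because the drive held more bytes than the decimal label promised.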

      --
      This sig under construction. Please check back later.
    7. Re:Does it matter anymore? by Fweeky · · Score: 3, Insightful

      Um; if your drive's reporting a lot of reallocated sectors you should RMA it -- even with top-end 80G platters, sector remapping happens seldom.

      There are plenty of failure modes which will result in lots of remapped sectors, but that's a side-effect of the drive having difficulty reading/writing in general due to component failure, which to be honest is probably less common now than it has been.. uh.. ever (cooked and/or shocked to death drives excepted).

    8. Re:Does it matter anymore? by Theatetus · · Score: 4, Insightful
      But the point remains that the HD sellers are using the wrong count and the question that comes to the person who knows is "why?". The answer is simple - to mislead

      Maybe I'm being a naive optimist here, but there seems to be a much more sensible reason:

      The way memory is addressed makes it convenient to use the base-2 units.

      Storage is not addressed in a way that makes it particularly convenient to use base-2 units.

      Got that? That's why we use them on memory. Storage is not addressed that way, so like everything else we tend to use base 10 to describe it.

      --
      All's true that is mistrusted
    9. Re:Does it matter anymore? by ergo98 · · Score: 5, Insightful

      "In reality it seems that they want to sell product with decimal G capacities but have customers believe they are buying disk with conventionally calculated capacity and hoping that no one would notice."

      This is all so absolutely ridiculous. Firstly, about 99% of people on the streets, including most computer users, aren't mentally calculating the power of 2 capacities when you say that a hard-drive has 40GB, or a memory module has 512MB -- Instead they mentally have an awareness that 40GB is "big, but 80GB is better", and "512MB is good". I highly doubt they're going to get their shiny new drive, and DRATS! - they have 42949672960 bytes of virus filled emails to fit in there, but instead they only got 40000000000.

      Secondly, hard drive manufacturers, as a general rule, have used the power of 10 rule since before I first became interested in computers about 18 years ago - this is the standard, and if you haven't read the byline "GB refers to 1,000,000,000 bytes" then you just haven't been looking.

      This whole campaign is just contrived and attention seeking nonsense. I suspect that someone just finished their "Computers 101" course, and they think they've discovered an amazing fraud being perpetrated upon the public by those dastardly harddrive manufacturers.

    10. Re:Does it matter anymore? by dotgain · · Score: 3, Funny

      Ah, the 32 gig limit. But that's the 32 x 2^30 B and not 30 x 10^6, so you could comfortably fit, oh, say /afiftygigdrive/ in it no worries.

    11. Re:Does it matter anymore? by miyoo · · Score: 3, Insightful
      It's not really that hard to figure out. AFAIK, ALL hard disk manufacturers report their drive sizes in terms of 10^9 bytes. Because of some grand conspiracy to deceive? No. Simply because statistically speaking a person who walks down the aisle of his local electronics store is more likely to buy the drive with the big number "120" on it than the one that has a "113". Anybody who used the 'binary' system would be giving up a lot of sales because people would simply choose the one with the bigger number.

      AMD started calling their processors names like "XP2000" rather than advertising the clock speed. AMD was getting killed because most people measure the value of their computer by how many GHz it is (AMD being behind Intel), not by how well it actually runs their applications (AMD being comparable). Misleading? Maybe, but I think they pretty much had to do this to stay competitive.

      In other words, they're not lying about hard disk sizes, they're marketing. They don't actually want to deliberately deceive people because that would make their customers angry and give them a bad name. But they do want to influence their customers' perception of the value they are getting from a particular product. Why do you think you're paying $199.99 for that hard disk instead of $200.00?

  2. Gigi? Nah Gibi? Nah by l810c · · Score: 5, Funny
    How much Porn will it hold?

    This one will hold 30 days of Porn

    Now, this one here will hold 45 days of Porn

    Break it down to something Everyone understands

    1. Re:Gigi? Nah Gibi? Nah by orthogonal · · Score: 4, Funny

      How much Porn will it hold?

      This one will hold 30 days of Porn


      Now, now, now, this is just wrong!

      Everybody knows you don't measure porn in days.

      True porn aficionados know that you measure porn in terms of the amount of keyboard cleaning required.

    2. Re:Gigi? Nah Gibi? Nah by Trogre · · Score: 3, Funny

      True porn aficionados know that you measure porn in terms of the amount of keyboard cleaning required.

      Hmmmm

      Help me live longer! ...

      No.... I don't think I'll be doing that.

      --
      "Nine times out of ten, starting a fire is not the best way to solve the problem." - my wife
    3. Re:Gigi? Nah Gibi? Nah by darkov · · Score: 5, Funny

      Your idea is good, but it needs a unit since "days of porn" is clumsy. I propose the "ejac" which is one day's worth of porn. Larger units are derived using the usual base 10 system:

      decaejac
      kiloejac
      megaejac
      gigaejac ... and so on

      This is a handy unit since it can be converted into time (1 ejac = 20 minutes), liquid volume (1 ejac = 10cc), sound volume (1 ejac = 90dB) and distance (1 ejac = 75cm).

      If we all pull together, with this as our common goal, we can make the ejac a truly universal unit.

    4. Re:Gigi? Nah Gibi? Nah by Johnso · · Score: 3, Funny
      This is a handy unit...

      Literally.

      If we all pull together...

      Then we'd just have a mess on our hands... and keyboards...

      --
      I'm a signature virus. Please copy me to your signature so I can replicate.
  3. Big whoop by attemptedgoalie · · Score: 3, Interesting

    In the grand scheme of things, drive capacity issues seem to revolve around lawyers more than consumers.

    I wish that the major manufacturers would stop putting 1 BIG drive in the system, and put 2 normal sized ones in and MIRRORED.

    As somebody who gets blasted by customers when they failed to do their backup, an out of the box, pre mirrored system would be far better for the consumer than properly labelling those lost 200 MB.

    Sorry, that's my partially related rant for this evening.

    --
    My mom says I'm cool.
  4. Ditch binary units by achurch · · Score: 4, Insightful

    As far as ordinary users (i.e. anyone who doesn't have to deal with TLBs, memory pages, disk sectors and the like) are concerned, there's really no reason left to use binary units; 2^9 bytes per sector, 8 sectors per filesystem block, etc. are all low-level conveniences that the user shouldn't have to even notice. Though I personally am too used to the binary units to switch easily, the vast majority of users probably wouldn't even notice the difference, aside from their computers finally reporting the right size for their hard disks. Granted, overcoming the huge momentum for binary units will be difficult, but one could always consider it practice for getting the USA to accept metric.

    1. Re:Ditch binary units by Monkelectric · · Score: 4, Informative
      Huh? no reason to use binary units? What are you smoking and can I have some? :)

      The reason we use binary units is for engineering reasons ... Back in the way back time there was no such thing as a disk drive, and there was only RAM. RAM had/has to be made in a power of two because it has to completely fill its address space so the NEXT RAM chip begins where the other ends. Otherwise you'd have holes in your address space.
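      To make the "holes" point concrete, here is a rough Python sketch. It assumes a simplified decoder in which the low 10 address bits pick a byte within a chip and the remaining high bits pick the chip; the names `CHIP_BITS`, `decode` and `holes` are invented for this illustration and are not taken from any real hardware.

```python
CHIP_BITS = 10            # assumed: each chip decodes 10 address lines
WINDOW = 1 << CHIP_BITS   # 1024 addresses per chip-select window


def decode(addr: int) -> tuple[int, int]:
    """Split an address into (chip number, offset within chip) with bit operations."""
    return addr >> CHIP_BITS, addr & (WINDOW - 1)


def holes(chip_size: int, chips: int) -> int:
    """Count addresses that select a chip but fall past the end of its storage."""
    return sum(1 for a in range(chips * WINDOW) if decode(a)[1] >= chip_size)


print(decode(1023))     # (0, 1023): last byte of chip 0
print(decode(1024))     # (1, 0): first byte of chip 1, no gap
print(holes(1024, 4))   # 0 unusable addresses with 1024-byte chips
print(holes(1000, 4))   # 96 unusable addresses with 1000-byte chips
```

      With 1024-byte chips every address lands on real storage; with hypothetical 1000-byte chips, 24 addresses per chip point at nothing.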

      --

      Religion is a gateway psychosis. -- Dave Foley

  5. Strange by autopr0n · · Score: 3, Interesting

    I think it's a little odd that he claimed that Hard drive makers have "Always" done this. I very specifically remember advertisements for hard drives being "One Billion Bytes" (with like a 14 point small print letting us know that it was indeed 1000000000 bytes). After that "billion bytes" became gigabytes and the font became smaller.

    I've also heard that for some drive makers "gigabyte" means 2^20*10^3 (i.e. one thousand megabytes) and things like that.

    --
    autopr0n is like, down and stuff.
  6. WTF? by MarvinIsANerd · · Score: 5, Insightful

    This is not a matter of base-10 vs base-2... a base-10 number is written as "2875" for example. A base-2 number is written as "10100110". A base-16 number is written as "8A3F0"...

    This is a matter of UNITS used - like inches vs. feet, or in this case GiB vs GB.

    Geez, get the terminology right...

  7. 6 pages?! by TwistedGreen · · Score: 5, Informative
    The 6 pages of the article, summarized in three lines:
    Hard drive manufacturers measure capacity in multiples of 1,000,000,000 (10^9) Bytes.
    Operating systems measure capacity in multiples of 1,073,741,824 (2^30) Bytes.
    Some people get confused because they both call it a gigabyte.
    I really don't think this is such a big deal. OSes are starting to specify the proper GiB instead of GB, so there shouldn't be a problem anymore.
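    The summary above boils down to one division. A minimal Python sketch, where the helper name `label_gb_to_gib` is made up for illustration:

```python
def label_gb_to_gib(label_gb: float) -> float:
    """Convert a drive-label size in decimal GB (10^9 bytes) to binary GiB (2^30 bytes)."""
    return label_gb * 10**9 / 2**30


for size in (40, 80, 120):
    print(f"{size} GB on the box = {label_gb_to_gib(size):.2f} GiB in the OS")
```

    The ratio 10^9 / 2^30 is roughly 0.931, so the number the OS shows is about 7% smaller than the number on the box, before any filesystem overhead.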
  8. Re:Base 2 by EvanED · · Score: 4, Interesting

    Ah, and therein lies the crux of the matter. The problem is that *everywhere else* that kilo-, mega-, etc. prefix units (to stop the megapolis argument), they denote powers of 10. A megavolt is a million volts. A kilometer is 1000 meters. A gigahertz is a billion hertz. Only in computer science have people redefined the units to refer to anything other than powers of 10. *That* is what the debate revolves around, and that is what is IMO the mistake of people early on. The solution is to make kilobytes officially be 1000 bytes (as the IEC has) and use a different unit for the powers of two.

  9. I've said this before by Sunlighter · · Score: 4, Insightful

    About two years ago there was a debate about this. Can't remember the details of that debate. Maybe it was when those "mebibytes" were introduced. I still say now what I said then.

    I think there should be "short megabytes" and "long megabytes", and the same for gigabytes. Like this:

    • One short ton is 2,000 pounds and one long ton is 2,240 pounds.
    • One short kilobyte is 1,000 bytes and one long kilobyte is 1,024 bytes.
    • One short megabyte is 1,000,000 bytes and one long megabyte is 1,048,576 bytes.
    • One short gigabyte is 1,000,000,000 bytes and one long gigabyte is 1,073,741,824 bytes.
    • One short terabyte is 1,000,000,000,000 bytes and one long terabyte is 1,099,511,627,776 bytes.
    • And so forth...
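    The list above can be generated rather than typed out. A small Python sketch; "short" and "long" are this comment's terms, and the function name `gap_percent` is invented for illustration:

```python
def gap_percent(n: int) -> float:
    """Percent by which the long (1024^n) unit exceeds the short (1000^n) unit."""
    return (1024**n - 1000**n) / 1000**n * 100


prefixes = ["kilo", "mega", "giga", "tera"]

for n, name in enumerate(prefixes, start=1):
    short = 1000**n   # short unit: power of 1000
    long_ = 1024**n   # long unit: power of 1024
    print(f"{name}byte: short = {short:,}  long = {long_:,}  gap = {gap_percent(n):.1f}%")
```

    The gap grows with each prefix, from about 2.4% at kilo to roughly 10% at tera, which is why the mismatch gets more visible as drives get bigger.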

    Then all we need is to get hard drive manufacturers and OS vendors to state whether they are using short or long tons, er, gigabytes.

    As to abbreviations, take Donald Knuth's suggestion. Use the capital letter twice to suggest binaryness. 1 MMB = one long megabyte; 1 GGB = one long gigabyte. I like this much better than the now-standardized MiB men-in-black abbreviation for long megabytes (which are still not called long megabytes in the standard, they are called mebibytes, which sounds silly and no one uses it).

    Who's with me?

    --
    Sunlit World Scheme. Weird and different.
  10. Re:But seriously by dtfinch · · Score: 4, Insightful

    Those are too hard to pronounce. Why not just distinguish them by prefixing the metric ones with the word "metric", as we do with tons and metric tons?

    kilobyte = 1024 bytes
    metric kilobyte = 1000 bytes

  11. Re:Base 2 by den_erpel · · Score: 3, Informative

    hear hear!

    a CDR 650/700 Mb
    a DVD[+-]R: 4.7 salesman Gb
    = 4.7*1000*1000*1000/1024 = 4589843 kb (= 4.37 Gb)

    AFAIK base-10 is just plain cheating.

    --
    Genius doesn't work on an assembly line basis. You can't simply say, "Today I will be brilliant."
  12. article sidesteps the entire issue by drfireman · · Score: 4, Insightful

    The only relevant issue is the meaning of words like kilobyte, megabyte, and gigabyte. Wiebe describes how you can arrive at two different answers for drive capacity depending on how you define the word "gigabyte," but does so completely uncritically. For example, he describes the drive manufacturer logic and writes that "the drive's claim of 123.5 GB is verified with this simple mathematical formula." But the issue is what the word "gigabyte" means, and the formula presented sheds no light on the word's conventional usage or etymology. I personally was raised to use these terms to correspond to numbers that are powers of two. Wiebe doesn't give me any point of reference to shed light on whether it's reasonable to use the meanings drive manufacturers do. (Of course I already know the answer, but that's beside the point.)

    Wiebe uses some other odd logic, exemplified in point 3.7. He writes that the consumer was never cheated, because a drive advertised as having a capacity of 123.5GB had just that in "decimal based" capacity. This is a bizarre way to characterize the complaints. Consumers who believe they were cheated aren't claiming they didn't get 123.5GB for any definition of the word gigabyte. They're claiming they didn't get 123.5GB by the conventional definition of the word as commonly used in connection with computers. In my view, they're right, although I don't personally get too upset about it.

  13. Old chips, new drives by Flakeloaf · · Score: 3, Funny

    I don't know what he's talking about; my Pentium 66 insists that 1024 x 1024 x 1024 = 1,000,000 exactly.

    --

    Am I the only one who heard Roxette to sing "I'm gonna get blitzed for some sex"?

  14. And yet... by arb · · Score: 3, Insightful

    ...he ignores the fact that HD manufacturers are happy using bytes which are 8 bits, all the while flouting the established convention that MB/GB refers to binary megabytes and binary gigabytes. Why don't they specify the size of their HDs in bits?

  15. Re:Damnit - It happened to me today! by shadowcabbit · · Score: 3, Funny

    Even better-- pay in Canadian currency. That way it really is smaller. "I paid you 299 dollars and ninety-nine cents, just like we agreed upon. The fact is, you never specified the American dollar or the Canadian dollar, so I just used the unit more convenient for me."

    --
    "Why Subscribe?" Good question...
  16. Re:Naming reference by Piquan · · Score: 4, Insightful

    But personally I strongly reject this "kibibytes" attempt at CS revisionist history. Stick with what CS people have been using as measurements for decades, I say,

    Why shouldn't CS people stick to what the rest of the sciences have been using for decades, that "kilo" means 1000? This CS thing of making "kilo" stand for 1024 is an attempt at revisionist history.

    There's always another perspective.

  17. Re:But seriously by danheskett · · Score: 3, Insightful

    Or we could just beat the hard-disk manufacturers with a stick until they understand that most people expect 1 kilobyte to be 1024 bytes :P

    You are out of touch. If you conducted a scientific survey of 100 random adults who own PCs and asked them:

    "How many bytes are in a kilobyte?" you really think that more than 50 would answer "1024"?

    I'd be surprised if more than 10 did, personally.

    100% of the non-geek population equates kilo with base 10, not base 2.

  18. Article inaccurate and uninformed by rpwoodbu · · Score: 3, Informative
    The basic point of the article is accurate: that HDD manufacturers use "standard" metric prefixes and OSes use "computer-ese" "metric-esque" prefixes, thus the confusion. However, the article notably lacks in these areas (and perhaps less notably in others):
    • It uses terms like "binary math" versus "decimal math". Last I checked, they were both equally viable ways of doing math, and as any viable method of doing math should be, they both always get the same answer! See section 3.5 if you want to get really mad! It isn't that the math is different that is causing a problem, it is that the algorithm is different. It just so happens that the algorithm was inspired by a number which is convenient when dealing with binary because it is an even power of 2.
    • There is no discussion of why HDD makers use normal math while OS makers use "computer-ese". It isn't wholly discountable that HDD makers are interested in making their drives look as big as possible against the competition, and if one manufacturer says a Gigabyte is 10^9 bytes then they all have to. And he paints the 1024-byte KiloByte basically as a stupid idea, which it isn't (albeit confusing).
    • The explanation (such as it is) for how much data is lost to OS overhead is inaccurate at best. He got his info for the Mac from the Drive Utility (akin to Disk Management or fdisk in MS-land), but got his WinXP info probably from the explorer. Fdisk will not report any filesystem size considerations, just the partition sizes, so neither should the Drive Utility. I'm betting the 1026 "lost" bytes are the partition table. This makes it look like the Mac loses 1026 bytes, while Windows tosses about 11 MB out the door. While I'm not trying to advocate for Windows, that simply isn't fair. He goes on to say that he has "no explanation for these variations", which brings me to my next point.
    • He can't explain the size variations between OSes, yet he makes this statement:
      We note that operating systems take a portion of drive capacity for use as file tables. A typical drive utilizes 70MegaBytes for this function, which is not significant on a drive with a capacity of 120GB.
      So now he's trying to explain it, and not doing a very good job. First of all, the FS overhead will vary roughly proportionally to the size of the partition, so giving out a number like 70 MB and saying that a "typical drive" loses this much is careless at best. Secondly, I'm not convinced that he doesn't actually have 70 MB of data on that drive. There's no accounting for the 11 MB that aren't showing up as "used", which sounds like FS metadata to me. I don't have a drive handy to format, so I don't know if Windows shows "0 used" on a clean NTFS drive or not (oh, is he using NTFS or FAT32... the world may never know). The bottom line: he should have used the Disk Management tool to compare apples to apples (no pun intended).
    • And the bottom bottom line is that he's in the storage business, and shouldn't be so ignorant. He's got a degree in mathematics for crying out loud!
    I appreciate that this needs to be explained, and I know all too well that the average computer user (read average American) can hardly count, much less do it in binary, so a simple explanation is good. But I never think things should be simplified to the point of gross inaccuracy. This is just further compounded with the obvious lack of a clue. Someone write a better (and perhaps shorter) account for this, please!
  19. Re:KiB, MiB, GiB by MikeBabcock · · Score: 3, Interesting

    I have to agree here. I've been using computers since the early 80's and the "kilo" or "mega" notation was well understood then to be an approximation, at least in my circles, to their decimal prefix equivalents.

    "A kilobyte in a computer is 1024 bytes only because in base-2 it is simpler to count in 1024's than in 1000's"

    That said, and everyone learned that back when people had to learn about computers (instead of growing up with them), this approximation is *still* just an approximation.

    Just because you grew up thinking a kilo meant 1024 because you're in a non-metric country doesn't mean a kilo means 1024. It means your predecessors didn't bother using a different name for a different number (back when "the world will never need more than maybe 10 computers").

    Mebi is available now ... use it; point out to the rest of the world that MB is inaccurate and should mean 1000*1000 bytes, that MiB in fact *means* 1024*1024 bytes and this will solve our confusions within a generation.

    --
    - Michael T. Babcock (Yes, I blog)
  20. Computers and Cars by vraxoin · · Score: 3, Insightful

    This issue reminds me of a practice used in another industry. The auto industry commonly reports horsepower and torque for their cars as measured at the engine's crank/flywheel vs at the wheels. While the measurements themselves are an accurate reflection of an engine's general performance alone, you typically do not just buy an engine, you buy a system which is the car. When the engine's performance is measured within the context of the car--meaning at the wheels--then the truth is revealed. That revelation shows, on average, a loss of 10-20% when power is measured at the wheels vs the crank. Which spec do you think a manufacturer is going to release?