Hard Drive Capacity Confusion, Lucidly Explained

Posted by timothy on Tuesday October 7, 2003 @05:26PM from the fudging-the-numbers dept.

mrklin writes "James Wiebe of wiebetech.com has written a clear example of how hard drive capacity is calculated (PDF file) by hard drive manufacturers (base 10) and OS (base 2). He failed to name how the capacity should be described, though."

6 pages?! by TwistedGreen · 2003-10-07 17:45 · Score: 5, Informative

The 6 pages of the article, summarized in three lines:
Hard drive manufacturers measure capacity in multiples of 1,000,000,000 (10^9) Bytes.
Operating systems measure capacity in multiples of 1,073,741,824 (2^30) Bytes.
Some people get confused because they both call it a gigabyte.
I really don't think this is such a big deal. OSes are starting to specify the proper GiB instead of GB, so there shouldn't be a problem anymore.
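
A minimal sketch of the two conventions summarized above, for anyone who wants to check a drive label against what the OS reports. The helper name describe_capacity and the 120 GB sample value are illustrative assumptions, not something taken from the article:

```python
# Decimal (SI) and binary (IEC) definitions of a "gigabyte".
GB = 10**9    # 1,000,000,000 bytes: the unit on the drive's box
GIB = 2**30   # 1,073,741,824 bytes: the unit most OSes count in

def describe_capacity(num_bytes: int) -> str:
    """Report one byte count in both units (hypothetical helper)."""
    return f"{num_bytes / GB:.2f} GB = {num_bytes / GIB:.2f} GiB"

# A drive sold as "120 GB" holds 120 * 10^9 bytes.
print(describe_capacity(120 * GB))  # 120.00 GB = 111.76 GiB
```

The byte count never changes; only the divisor does, which is why the same drive seems to "shrink" by about 7% at this size.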
Re:Does it matter anymore? by dtfinch · 2003-10-07 17:50 · Score: 4, Informative

I'm a whiny nerd, and it doesn't matter much to me whether hard disk manufacturers define sizes in multiples of base 10 or base 1010.

But I want to know how each drive handles error correction. A sector isn't REALLY 100000000 bytes when stored on disk, but has extra information to help it detect and correct most small errors. Some manufacturer could skimp on the error correction to increase storage capacity or reduce cost, but the drive would likely crap out sooner than others on the market.
Re:Base 2 by den_erpel · 2003-10-07 18:02 · Score: 3, Informative

hear hear!

a CDR: 650/700 MB
a DVD[+-]R: 4.7 salesman GB
= 4.7*1000*1000*1000/1024 = 4589843 KB (= 4.37 GB)

AFAIK base-10 is just plain cheating.

--
Genius doesn't work on an assembly line basis. You can't simply say, "Today I will be brilliant."
Re:Ditch binary units by Monkelectric · 2003-10-07 18:58 · Score: 4, Informative

Huh? no reason to use binary units? What are you smoking and can I have some? :)
We use binary units for engineering reasons ... Back in the way back time there was no such thing as a disk drive, and there was only RAM. RAM had/has to be made in a power of two because it has to completely fill its address space so the NEXT RAM chip begins where the other ends. Otherwise you'd have holes in your address space.
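
A rough sketch of the "holes" argument, under a simplifying assumption that is mine and not the poster's: each chip is selected by the high address bits and sees exactly 10 address lines, so it owns a 1024-address window. The names decode and count_holes are hypothetical:

```python
ADDRESS_LINES = 10           # assumed: each chip is wired to 10 address lines
WINDOW = 1 << ADDRESS_LINES  # 1024 addresses are decoded per chip

def decode(address: int, chip_size: int):
    """Split an address into (chip index, offset); None if it hits a hole."""
    chip = address >> ADDRESS_LINES    # high bits select the chip
    offset = address & (WINDOW - 1)    # low bits select the cell
    return (chip, offset) if offset < chip_size else None

def count_holes(chip_size: int, chips: int) -> int:
    """Count addresses in the decoded range that no chip answers."""
    return sum(decode(a, chip_size) is None for a in range(chips * WINDOW))

print(count_holes(1024, 4))  # 0: power-of-two chips tile the address space
print(count_holes(1000, 4))  # 96: 24 unanswered addresses per chip
```

With 1024-cell chips, address 1024 lands on cell 0 of the next chip. With 1000-cell chips, addresses 1000 through 1023 of every window go nowhere.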

--
Religion is a gateway psychosis. -- Dave Foley
Re:Does it matter anymore? by kryonD · 2003-10-07 19:47 · Score: 3, Informative

Please take note that the amount of free space on an empty, but FORMATTED hard drive will always be a noticeable chunk less than full capacity as the OS requires storage space overhead for the file system.

I just finished explaining this to someone who was whining about their 128MB USB keychain drive only having 123MB of space.

Your directory structure has to be kept somewhere.

--
I've dirtied my hands writing poetry, for the sake of seduction; that is, for the sake of a useful cause. --Dostoevsky
Re:Does it matter anymore? by vrt3 · 2003-10-07 20:19 · Score: 3, Informative

> if I'm going to buy a 120 GB hard drive, i expect there to be 120 * 2^30 = 128,849,018,880 bytes on the drive.
If I'm going to buy a 120 GB hard drive, I expect there to be 120 * 10^9 = 120,000,000,000 bytes on the drive.
> The hard drive I got had 113 GB (113*2^30 = 121,332,826,112 bytes).
The hard drive I got had 113 GiB (113 * 2^30 = 121,332,826,112 bytes).
> That is a difference of 7,516,192,768 bytes (7 GB).
That is a difference of -1,332,826,112 bytes... actually there were more bytes than you should have expected.
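
The arithmetic in this exchange can be checked with a few lines. This is only a sketch using the figures quoted in the comment; the variable names are made up for illustration:

```python
GB, GIB = 10**9, 2**30

reported = 113 * GIB         # what the OS showed: 113 GiB
expected_binary = 120 * GIB  # expectation if "120 GB" meant 120 * 2^30
expected_decimal = 120 * GB  # expectation if "120 GB" means 120 * 10^9

# Shortfall under the binary reading: exactly 7 GiB.
print(expected_binary - reported)   # 7516192768

# "Shortfall" under the decimal reading is negative: the drive holds
# more bytes than the label promises.
print(expected_decimal - reported)  # -1332826112
```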

--
This sig under construction. Please check back later.
Article inaccurate and uninformed by rpwoodbu · 2003-10-07 20:44 · Score: 3, Informative
The basic point of the article is accurate: that HDD manufacturers use "standard" metric prefixes and OSes use "computer-ese" "metric-esque" prefixes, thus the confusion. However, the article notably lacks in these areas (and perhaps less notably in others):
- It uses terms like "binary math" versus "decimal math". Last I checked, they were both equally viable ways of doing math, and as any viable method of doing math should be, they both always get the same answer! See section 3.5 if you want to get really mad! It isn't that the math is different that is causing a problem, it is that the algorithm is different. It just so happens that the algorithm was inspired by a number which is convenient when dealing with binary because it is an even power of 2.
- There is no discussion of why HDD makers use normal math while OS makers use "computer-ese". It isn't wholly discountable that HDD makers are interested in making their drives look as big as possible against the competition, and if one manufacturer says a Gigabyte is 10^9 bytes then they all have to. And he paints the 1024-byte KiloByte basically as a stupid idea, which it isn't (albeit confusing).
- The explanation (such as it is) for how much data is lost to OS overhead is inaccurate at best. He got his info for the Mac from the Drive Utility (akin to Disk Management or fdisk in MS-land), but got his WinXP info probably from the explorer. Fdisk will not report any filesystem size considerations, just the partition sizes, so neither should the Drive Utility. I'm betting the 1026 "lost" bytes are the partition table. This makes it look like the Mac loses 1026 bytes, while Windows tosses about 11 MB out the door. While I'm not trying to advocate for Windows, that simply isn't fair. He goes on to say that he has "no explanation for these variations", which brings me to my next point.
- He can't explain the size variations between OSes, yet he makes this statement:
  We note that operating systems take a portion of drive capacity for use as file tables. A typical drive utilizes 70MegaBytes for this function, which is not significant on a drive with a capacity of 120GB.
  So now he's trying to explain it, and not doing a very good job. First of all, the FS overhead will vary roughly proportionally to the size of the partition, so giving out a number like 70 MB and saying that a "typical drive" loses this much is careless at best. Secondly, I'm not convinced that he doesn't actually have 70 MB of data on that drive. There's no accounting for the 11 MB that aren't showing up as "used", which sounds like FS metadata to me. I don't have a drive handy to format, so I don't know if Windows shows "0 used" on a clean NTFS drive or not (oh, is he using NTFS or FAT32... the world may never know). The bottom line: he should have used the Disk Management tool to compare apples to apples (no pun intended).
- And the bottom bottom line is that he's in the storage business, and shouldn't be so ignorant. He's got a degree in mathematics for crying out loud!
I appreciate that this needs to be explained, and I know all too well that the average computer user (read average American) can hardly count, much less do it in binary, so a simple explanation is good. But I never think things should be simplified to the point of gross inaccuracy. This is just further compounded with the obvious lack of a clue. Someone write a better (and perhaps shorter) account for this, please!