Apple Kicks HDD Marketing Debate Into High Gear
quacking duck writes "With the release of Mac OS X 10.6 Snow Leopard, Apple has updated a support document describing how their new operating system reports capacities of hard drives and other media. It has sided with hard drive makers, who for years have advertised capacities as '1 GB = 1,000,000,000 bytes' instead of the traditional computer science definition, and in so doing has kicked the debate between marketing and computer science into high gear. Binary prefixes for binary units (e.g. GiB for 'gibibyte') have been promoted by the International Electrotechnical Commission and endorsed by IEEE and other standards organizations, but to date there's been limited acceptance (though manufacturers have wholeheartedly accepted the 'new' definitions for GB and TB). Is Apple's move the first major step in forcing computer science to adopt the more awkward binary prefixes, breaking decades of accepted (if technically inaccurate) usage of SI prefixes?"
The SI prefixes have been around for nearly 5 decades and have a specific meaning used by everybody. Every scientist uses them in one way or another, and for every last one of them, they have the same meaning.
Why can't we, the C.S. people, accept that?
Giga is 10^9. It has been 10^9 since it was created. It was never, ever meant to be anything but 10^9.
If you want to talk about 1024^3, then it's Gibi. Gibi has been 2^30 since it was created. It was never, ever meant to be anything but 2^30.
Get over it.
(and yes, I try to always use GiB whenever it's appropriate).
http://en.wikipedia.org/wiki/IEEE_1541
These IEEE recommendations seem like common sense to me.
1 KB = 1,000 bytes
1 MB = 1,000,000 bytes
1 GB = 1,000,000,000 bytes
And for you droids and androids out there:
1 KiB = 1,024 bytes
1 MiB = 1,048,576 bytes
1 GiB = 1,073,741,824 bytes
I always thought it was just clueless marketing morons who couldn't do math. The same group of people responsible for marketing CRT-based monitor sizes (the TUBE is 17", including behind the 2" bezel), tape drive storage capacities (assuming 2:1 compression ratio!) and all electronics battery life measurements (examples too numerous to list).
I can't count the number of times I had to explain to people who bought an extra hard drive where 3% of it disappeared when they checked the size in Windows Explorer.
While Apple is certainly ruled by the marketing drones, they aren't morons by any stretch of the imagination. I think the engineering people finally just gave in when their grandmother called and asked why her new 500 GB drive was only showing 465 GB when installed. I can hear them crying with frustration all the way over on the other coast.
For 1KB the difference is 24 bytes.
For 1MB, 2**20 - 10**6 = 48576, 48KB difference or 4.6% less than the larger of the two.
For 1GB, 2**30 - 10**9 = 73741824 (73MB), 6.9%.
For a 1TB hard disk you're being short-changed by 9%: about 93 GiB, or roughly 99.5 billion bytes!
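Here's a quick Python sketch that reproduces the figures above, in case anyone wants to check the arithmetic. The `shortfall` helper is made up for illustration; it just compares 1024**n with 1000**n for each prefix.

```python
# Compare the binary (IEC) and decimal (SI) meaning of each prefix.
PREFIXES = ["K", "M", "G", "T"]


def shortfall(power: int) -> tuple[int, float]:
    """Return (missing bytes, percent of the binary size) for the n-th prefix."""
    binary = 1024 ** power   # e.g. 1 GiB = 2**30 bytes
    decimal = 1000 ** power  # e.g. 1 GB = 10**9 bytes
    missing = binary - decimal
    return missing, 100 * missing / binary


for power, prefix in enumerate(PREFIXES, start=1):
    missing, pct = shortfall(power)
    print(f"1 {prefix}B: {missing:,} bytes short ({pct:.1f}%)")
```

For the terabyte line it works out to 99,511,627,776 bytes, which is about 92.7 GiB.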
The Darwin versions of utilities like du and df have had the -h and -H options (human-readable numbers with either binary or decimal prefixes) the opposite way around from FreeBSD since 10.5. They made the existing switches, which had always reported the power-of-two sizes, display the power-of-ten ones and moved the old behaviour to the new option. In FreeBSD, they added new options for the power-of-ten versions. For a while I wondered why my files suddenly became smaller after copying them to a FreeBSD machine, before I noticed this.
I am TheRaven on Soylent News
Kilo = 1<<10
Mega = 1<<20
Giga = 1<<30
Tera = 1<<40
Goes by ten!
No, you still get around 6-7 gigs back by installing Snow Leopard, but it's reported as more than that. When we installed it on a 13" MacBook Pro at work, we actually got 15 gigs back. That was puzzling until we learned that everything is now counted in base 10; then it made sense, and it is as Apple advertised.
The "standard"? All of the standards associations recommend using G/M/K as prefixes with the base-10 meanings, and using the unambiguous Gi/Mi/Ki (gibi/mebi/kibi) for base-2 measurements. One standards organization was willing to allow the deprecated use of G/M/K as base-2 for measuring semiconductor memory (i.e., RAM) only.
Do you also recommend that we suddenly measure disk drive capacity in a different unit if/when we all move to quantum computers, or to computers based on some other currently unfamiliar technology?
Oh, and BTW, at least one of the technologies which has a small chance of replacing current RAM technologies, phase-change memory, could theoretically store 3 or 5 states per unit cell instead of 2 or 4, given the right material undergoing the phase change. One of the reasons not to do it is because it would be a pain to convert to and from base-2 to interface with the computer, so in the long run it is possible (but not necessarily likely, because there is a large initial development cost) that some computing devices will be designed to work in base-3 or base-5 if only to better utilize the abilities of PCM.
Because as far as disk space occupation goes, that file may as well be 16KB.
A bigger issue, for me, is why the stupid Finder reports file sizes based on blocks! This makes no sense. I can plug in a flash drive, and the Finder will report that a 12KB file, copied to the desktop, is now a 16KB file. This isn't rocket science, FIX IT already, Apple!!
Well, given an 8k or 16k block size, a 12k file *DOES* consume 16k of usable disk space. Plus 600-700 bytes for the inode and directory entry. Plus more if there's any magic Apple-y metadata associated with the file.
For what reason do you expect any filesystem browser to report the exact number of bytes in a file? I'm almost always more interested in knowing how much disk space is used by the file - 16k in your example. In a filesystem like JFS that dynamically allocates inodes, I might even expect it to report the space used by the inode. FWIW, 'du' will report 16k in your example as well. Is 'du' wrong too?
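The 16k figure is just the file length rounded up to a whole number of allocation blocks. A minimal sketch, assuming a fixed block size and ignoring the inode, directory entry and any extra metadata (the `allocated_size` name is hypothetical):

```python
def allocated_size(length: int, block_size: int) -> int:
    """Round a file length up to a whole number of filesystem blocks."""
    if length < 0 or block_size <= 0:
        raise ValueError("length must be >= 0 and block_size > 0")
    blocks = -(-length // block_size)  # ceiling division without floats
    return blocks * block_size


# A 12 KiB file on filesystems with 8 KiB and 16 KiB blocks.
for bs in (8 * 1024, 16 * 1024):
    print(bs, allocated_size(12 * 1024, bs))
```

Either way the 12k file ends up occupying 16k, which is what the Finder and du are reporting.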
Also, what should it report for directories? Taking a directory of the source of GHC 6.10.4 on my computer as an example (lots and lots of smallish source code files):
$ find . -type f -exec cat {} \; | wc -c
29776950
$ du -sk .
35036 .
Those numbers don't match (taking into account the conversion between bytes in the first case and kb in the second), but I can't see a reason ever to care about the first one. It's not even a very good indicator of what size an uncompressed tar file would be.
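For anyone who wants to reproduce that comparison without shelling out, here's a rough Python equivalent of the two commands. It's a sketch under a few assumptions: `st_blocks` is counted in 512-byte units (true on Linux and macOS; it's absent on Windows, where this falls back to the file length), only regular files are counted as with `find -type f`, and directory blocks themselves are ignored, so it will come in a little under `du -sk`. The function name is made up.

```python
import os
import stat


def logical_and_allocated(root: str) -> tuple[int, int]:
    """Sum file lengths (like find | wc -c) and allocated bytes (like du)."""
    logical = 0
    allocated = 0
    seen = set()  # (device, inode) pairs, so hard links are allocated only once
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if not stat.S_ISREG(st.st_mode):
                continue  # like `find -type f`: skip symlinks, sockets, etc.
            logical += st.st_size  # counted for every path, as cat would
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue  # hard link already counted, as du does
            seen.add(key)
            # st_blocks is in 512-byte units on Linux and macOS;
            # fall back to the plain length where it doesn't exist.
            blocks = getattr(st, "st_blocks", None)
            allocated += blocks * 512 if blocks is not None else st.st_size
    return logical, allocated
```

Calling `logical_and_allocated(".")` at the top of a source tree returns a `(logical, allocated)` pair in bytes; dividing the second number by 1024 puts it in the same units as `du -sk`.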
Finally, I just went and took a look at a small file on the desktop of my mac. "Get Info" tells me:
Size: 8 KB on disk (782 bytes)
So it *does* report the number of bytes in the file, as well as the disk usage, clearly labeled. Now I really don't know what you're whining about.
...then all the English-speaking countries should switch to metric, according to your logic.
Actually, according to this, the US is one of three backwards countries that are not using the metric system.
According to the US CIA World Factbook in 2006, the International System of Units is the official system of measurement for all nations except for Burma, Liberia, and the United States.
I hate our system and I use metric on my own. My car is all metric. I just have to go back to the old system when communicating with others.
Because they make the disk with a sector size of 512 bytes (likely 4096 bytes inside the drive).
With modern drives and most especially flash drives, the CHS values normally are physically meaningless.
Except, with a flash drive the erase block size is likely to be 2^19 or 2^20 bytes. It's easy to set the drive so that the cylinders are 1048576 bytes, just set the heads to 64 and the sectors to 32. Each cylinder is then 1Mbyte, one real megabyte and one or two erase blocks.
Then 2^20 bytes is a reasonable size for an allocation unit too.
The smallest power of 10 that has 512 as a factor is 10^9. That is far too large for a cylinder or an allocation unit, even on a terabyte drive.
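The geometry arithmetic is easy to check: 64 heads times 32 sectors times 512 bytes is 2^20 bytes, and since 512 = 2^9 while 10^k = 2^k * 5^k, no power of ten below 10^9 is a multiple of 512. A small sketch, with hypothetical helper names:

```python
SECTOR = 512  # bytes per logical sector


def cylinder_bytes(heads: int, sectors_per_track: int, sector: int = SECTOR) -> int:
    """Bytes in one (logical) cylinder of a CHS geometry."""
    return heads * sectors_per_track * sector


def smallest_power_of_ten_divisible_by(n: int) -> int:
    """Smallest k >= 0 such that n divides 10**k (n must be 2**a * 5**b)."""
    m = n
    for p in (2, 5):
        while m % p == 0:
            m //= p
    if m != 1:
        raise ValueError("no power of ten is divisible by n")
    k = 0
    while (10 ** k) % n:
        k += 1
    return k


cyl = cylinder_bytes(64, 32)
print(cyl, cyl == 2 ** 20)  # one cylinder vs. one MiB
for erase_block in (2 ** 19, 2 ** 20):
    print(erase_block, cyl % erase_block == 0)  # cylinder aligned to erase blocks?
print(smallest_power_of_ten_divisible_by(SECTOR))
```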
To put it bluntly, they use powers of two unless it's needed to con the consumer.
'du' (disk usage) obviously should describe the actual space used on the drive, as that is the name of the program. I, however, would rather have every other form of file management report the physical size of the data in the file. Checking file sizes against, say, a website you just uploaded is a quick and easy way to make sure everything transferred, for example.
This should be:
1 kB = 1,000 bytes
There is no such thing as Kelvin bytes.
This is revisionist at best and really just wrong. Despite all "wisdom" to the contrary, there has never been a universal acceptance of 1 MB = 2^20 bytes on computers. For instance, all of IBM's mainframe hard drives from the 60s and 70s were sold using base-10 prefixes. Early desktop hard drives from the 80s used both. I think the ST506 used base-2, but some other models used base-10. All networking and communications standards (ethernet, modems, PCI, SATA...) use base-10 prefixes for MB/s and Mbit/s. 3.5" floppy disks used NASA-style units where 1 MB = 10^3*2^10. Even while RAM is still almost always measured in base-2 units (due to manufacturing issues making it much easier to produce in power-of-2 sizes, something which is not true for hard drives), the speed of the memory bus on your CPU is still measured in base-10 units.
It is a *good* idea to have K and M mean the same thing everywhere. A system where a 1 GB/s link transfers 0.93 GB every second is stupid. This is especially important as computers are being used in more and more environments. Should a 1 megapixel camera mean 2^20 pixels? What about CDs with a 44.1 kHz sampling rate?
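The mixed floppy unit and the 0.93 GB/s link are both one-liners to check. A quick sketch (the constant names are just for illustration), assuming the usual 1440 KiB formatted capacity of a 3.5" high-density floppy:

```python
KB, MB, GB = 10 ** 3, 10 ** 6, 10 ** 9     # SI (decimal) prefixes
KIB, MIB, GIB = 2 ** 10, 2 ** 20, 2 ** 30  # IEC (binary) prefixes

# The "1.44 MB" floppy: 1440 units of 1024 bytes, i.e. a 1000 * 1024 "megabyte".
floppy = 1440 * KIB
print(floppy, floppy / MB, floppy / MIB)

# A link moving 10**9 bytes per second, expressed in binary gigabytes.
print(GB / GIB)
```

So the same disk is 1.47 MB, 1.41 MiB, or "1.44 MB" depending on which definition you pick, and the link moves about 0.93 GiB per second.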
The negative terminal of a battery supplies the electrons and they move from negative to positive when a conductor is placed between the two poles. The two popular notations for charge flow, "Conventional Flow Notation" and "Electron Flow Notation", do not dispute this. The difference is that "Electron Flow Notation" illustrates the physical movement of electrons (from "negative" to "positive") and "Conventional Flow Notation" illustrates the "movement" of the electrical charge from the "positive" terminal to the "negative" terminal. As electrons move from - to +, the "positive" side of the battery becomes less positive in relation to the "negative" side, effectively meaning that the electrical charge is moving from + to - (in "Conventional Flow Notation"). The electrons are still moving from the - battery terminal to the + battery terminal, though.
No they won't. Network speeds use the same terminology that RF engineers use: base-10 prefixes.
You can specify a number of blocks instead of percent with -r
Isn't the physical size what it takes up on the physical media it is stored on (i.e., the same as "disk use"); I think what you mean is the logical size.
OS X reports disk space better than Windows: Finder reports a 2.5MB file as taking 2,572,834 bytes of disk space. It also depends on which file system the disk uses and on the size of its clusters: the smaller the clusters, the smaller the minimum file size can be, and the less space is wasted.
At least OS X used to report disk space better, but with this change to Snow Leopard it's no longer true.
Falcon
Uh.. the inch is technically an SI unit. It is defined as exactly 2.54 cm.
No, it's not. SI uses the metre for length measurement, and nothing else. You can alter it with the various prefixes, and there are only one thousand metres in a kilometre, not twenty-four more.
The "inch" from the United States customary units is defined as 2.54 centimetres, but that doesn't make it part of the SI.
OS X reports disk space better than Windows, Finder reports a 2.5MB file as taking 2,572,834 bytes of disk space.
Which version of Windows are you talking about? There would seem to be a "Size on disk" field in the properties dialog of at least XP and 7, and I'm pretty sure it's been there in several older versions.
The prefixes (which are what the arguments concern) are.
How did this get modded up? This is misinformation.
du(1) man page (snow leopard):
-H Symbolic links on the command line are followed, symbolic links
in file hierarchies are not followed.
-h "Human-readable" output. Use unit suffixes: Byte, Kilobyte,
Megabyte, Gigabyte, Terabyte and Petabyte.
df(1) man page (snow leopard):
-H "Human-readable" output. Use unit suffixes: Byte, Kilobyte,
Megabyte, Gigabyte, Terabyte and Petabyte in order to reduce the
number of digits to three or less using base 10 for sizes.
-h "Human-readable" output. Use unit suffixes: Byte, Kilobyte,
Megabyte, Gigabyte, Terabyte and Petabyte in order to reduce the
number of digits to three or less using base 2 for sizes.
This is exactly the same as the man pages for those two in FreeBSD 6.1.
This is the man page from Debian Linux:
-h, --human-readable
print sizes in human readable format (e.g., 1K 234M 2G)
-H, --si
likewise, but use powers of 1000 not 1024
So it seems to me that the behavior of Darwin is exactly the same as the GNU tools.
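The only difference between the two "human-readable" modes quoted above is the divisor. A rough sketch of the idea (hypothetical `human_readable` function; the real df and du have their own rounding and formatting rules, so the output won't match character for character):

```python
def human_readable(n: int, si: bool = False) -> str:
    """Format a byte count, dividing by 1000 (si=True) or 1024 (si=False)."""
    base = 1000 if si else 1024
    units = ["B", "K", "M", "G", "T", "P"]
    value = float(n)
    for unit in units:
        # Stop at the first unit where the number is small enough,
        # or at the largest unit we know about.
        if value < base or unit == units[-1]:
            break
        value /= base
    if unit == "B":
        return f"{n}B"
    return f"{value:.1f}{unit}" if value < 10 else f"{value:.0f}{unit}"


for n in (512, 1024, 500 * 10 ** 9):
    print(n, human_readable(n), human_readable(n, si=True))
```

With these rules a 500 GB drive shows as 500G in base 10 and 466G in base 2, which is the whole argument in one line.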
You do know that 'byte' is defined as the smallest addressable unit in a system, and is not always 8 bits? There have been computers that used 6-, 7-, 8-, and 9-bit bytes.
Once computers started using integrated circuits, there was motivation to standardize on 8-bit bytes in order to use commodity parts. But byte is ambiguous enough that communications standards use the term octet instead.
If a computer is built with bit-wide parts (tubes, transistors, diodes, early ICs), a byte might not be eight bits. If it is built with parts wider than one bit, it's safe to assume eight-bit bytes.