Changes in HDD Sector Usage After 30 Years
freitasm writes "A story on Geekzone tells us that IDEMA (International Disk Drive Equipment and Materials Association) is planning to implement a new standard for HDD sector usage, replacing the old 512-byte sector with a new 4096-byte sector. The association says it will be more efficient. According to the article, Windows Vista will ship with this support already."
Well, CD-ROMs use 2352 bytes per sector, ending up with 2048 actual bytes after error correction. Looking at the size of the HDDs these days a 4096-byte sector seems pretty reasonable.
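To put a number on that overhead, here is a rough sketch of the Mode 1 CD-ROM sector layout as I understand it. The field names are just my own labels, not anything official.

```python
# Mode 1 CD-ROM sector layout, in bytes. Field names are illustrative labels.
MODE1_FIELDS = {
    "sync": 12,
    "header": 4,
    "user_data": 2048,
    "edc": 4,        # error detection code
    "reserved": 8,
    "ecc": 276,      # error correction parity
}


def sector_efficiency(fields, payload_key="user_data"):
    """Return (raw_size, payload_size, payload_fraction) for a sector layout."""
    raw = sum(fields.values())
    payload = fields[payload_key]
    return raw, payload, payload / raw


raw, payload, fraction = sector_efficiency(MODE1_FIELDS)
print(f"{payload} of {raw} bytes are user data ({fraction:.1%})")
```

So roughly one byte in eight on a data CD goes to sync, addressing and error handling rather than to your files.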
Serving time in Aristotelean prison for violating laws of physics
I thought cluster sizes were already 4KB for efficiency, and LBA for larger drive sizes. So how does changing the sector size change things? (Especially when we don't access drives by sector/cylinder anymore?)
In Soviet Russia, articles before post read *you*!
Why not a 32768 bit sector?
So... If I write down a little 16-byte message to myself in Notepad containing a name and a phone number, it will take up 4096 bytes.
On most systems in use today, it already does.
Blame the file system, not the sector size on the media.
-jcr
The only title of honor that a tyrant can grant is "Enemy of the State."
You're thinking of 'cluster'. This is tied to the file system that is actually used on the disk. Even with the current 512-byte sector, a normal NTFS partition of, say, 200GB uses 4KB clusters, and a single file takes up a minimum of 4KB already.
Serving time in Aristotelean prison for violating laws of physics
Most "normal use" filesystems nowadays (FAT32, Ext3, HFS, Reiser) use 4K blocks by default. That means that the smallest amount of data that you can change at a time is 4K, so every time you change a block, the HDD has to do 8 reads or writes. That would leave the drive performing 8x the number of commands that it would need to.
As filesystems are slowly moving towards larger block sizes, now that the "wasted" space on drives due to unused space at the ends of blocks is not as noticeable, moving up the size on the underlying hardware also makes sense. I don't think that this can make things too much faster, but it would allow SATA drives (and SCSI also) to queue more commands in their internal buffers, as they will only be receiving one command per read/write that the filesystem does, instead of 8.
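The "8x" figure is just the ratio of the two sizes. A tiny sketch, taking the one-transfer-per-sector model above at face value (real drives can move several sectors in a single command, so treat this as an upper bound); the function name is made up for illustration:

```python
def sectors_per_block(fs_block_bytes, sector_bytes):
    """Number of whole sectors needed to cover one filesystem block."""
    if fs_block_bytes <= 0 or sector_bytes <= 0:
        raise ValueError("sizes must be positive")
    return -(-fs_block_bytes // sector_bytes)  # ceiling division


for sector in (512, 1024, 4096):
    count = sectors_per_block(4096, sector)
    print(f"4K block on {sector}-byte sectors: {count} sector transfer(s)")
```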
My UID is prime and so is this number: 09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0.
NTFS will write something that small into the MFT.
Best analogy is a gym locker room
You have, say, 10 lockers up and 20 lockers across.
You can only put one thing in a locker, so you can't put your gym shorts in the same one as your shoes. But if you have lots of socks, you can pile them in, and take up two or three if necessary.
Space is wasted if you have a really big locker, but it's only holding a sock.
Now, you've got to record where all of this stuff is, or you will take forever to find that sock. So you set aside a locker to hold the clipboard with designations.
Now to bring this back into real life. There are a _lot_ of sectors on a disk. So keeping track of all of them starts requiring a substantial amount of resources. I imagine they are finding it easier to justify wasting space for small files in order to make it easier to keep track of them. Average file sizes are also going up, so it's not as big of a problem as it used to be either. It's all relative...
No, I am not an English major. My posts are subject to typos and incorrect grammar. Do not expect perfection.
Small devices like cellphones typically save files of several kilobytes, whether they be the phonebook database or something like camera images. Whether the data is saved in a couple large sectors or 8 times that many small sectors isn't really an issue. Either way will work fine, as far as the data is concerned. The biggest problem is the amount of battery power used to transfer those files. If you have to re-issue a read or write command (well, the filesystem would do this) for each 512-byte block, that means that you will spend 8 times more energy (give or take a bit) to read or write the same 4k block of data.
Also, squaring away each sector after processing is a round trip back to the filesystem which can be eliminated by reading a larger sector size in the first place.
Some semi-ATA disks already force a minimum 4096-byte sector size. It's not necessarily the best way to get the most usage out of your disks, but it is one way of speeding up the disk just a little bit more to reduce power consumption.
Also, Solitaire will be replaced by Duke Nukem Forever on every shipped copy of Vista. And if you're one of the first 100 in line at any Best Buy when you pick up Vista, you will also get a free Phantom game console.
Well of course Vista will ship with this supported already. Just like WinFS...er..
Um, it already does take up 4K or more. Unless you have a hard disk smaller than 256MB.
See: http://www.microsoft.com/technet/prodtechnol/winxppro/reskit/c13621675.mspx and scroll down to Table 13-4
If you notice, in most of the useful cases the cluster size is 4K. Making the hard disk match this seems like a good idea to me.
And EXT2 also uses a 4K block size.
Also remember it's for large disks; no FS that I know of supports a cluster (or block) size smaller than 4K for large disks.
-Ariel
I'm willing to sell this account for the right price.
Actually, if you're using NTFS, the data will be stored directly in the file entry in the MFT, taking zero dedicated clusters or sectors. The maximum size for this to happen is like 800 bytes.
Here's a short description of how NTFS allocates space. On volumes larger than 2GB, the cluster size (the granularity the FS uses to allocate space) was 4K already unless you specified something else when formatting the drive. Also, Windows NT has supported disk sector sizes larger than 512 bytes for a long time; it's just that anything else has been rare.
I'm sorry, Your response has to be in some form of star-trek (or sci-fi) I would have accepted this however...
Best analogy is Spock's gym locker room
Spock has, say, 10 space lockers up and 20 space lockers across.
Spock can only put one thing in a locker, so Spock can't put his gym shorts in the same one as his shoes. But since Spock has lots of socks, he can pile them in, and take up two or three if necessary.
Space is wasted if Spock uses a really big locker, but it's only holding a sock.
Now, you've got to record where all of this stuff is, or you will take forever to find that sock. (I guess the tricorders are broken) So Spock sets aside a locker to hold the clipboard with designations.
Now to bring this back into real life. There are a _lot_ of sectors on a disk. So keeping track of all of them starts requiring a substantial amount of resources. I imagine they are finding it easier to justify wasting space for small files in order to make it easier to keep track of them. Average file sizes are also going up, so it's not as big of a problem as it used to be either. It's all relative...
Competent file system handlers can use disk blocks larger or smaller than the file system block size, but there are some benefits to using the same number for both. Although it may provide more data-per-drive to use larger blocks and you can index larger drives with 32-bit numbers, the drive has to use better (larger and more complex) CRCs to ensure sector data integrity, the granularity of replacement blocks may end up wasting more space simply to provide an adequate count of replacements, and there are still some disk space management tools that insist on working in terms of "cylinders", regardless of the fact that the disk drives have had variable density zones for ages. The range from 4K (common disk block size) to 16K works as a decent compromise.
"Back in the day" running System V on SMD drives, where you could use almost any block size from 128 Bytes to 32K (the CRCs were weak after that) and control the cylinder-to-cylinder offset of block 0 from the index, I spent a few days trying different tuning parameters and found that, due to the 4K size of the CPU pages, and of the file blocks and swap it really did give a significant improvement in performance. I tried 8K and 16K, because the file system handler could be convinced to break them up, but didn't get any better performance, so used 4k for the spares granularity.
Perhaps I should take one of my late-model SCSI drives, which support low-level reformatting, and try the tests again. 16KByte file system blocks on 16KByte sectors might really be a win now. Have to do some research to see what I can do with CPU page sizes, too.
Simple answer - every file would then have a minimum size of 4MB
What are you listening to? (http://megamanic.blogetery.com/)
HDD manufacturers are looking to increase the amount of data stored on each platter. With larger sector sizes, the HDD vendor can use more efficient codes. This means better format efficiency and more bytes to the end user. The primary argument is that many OSes already use 4K clusters.
During the transition from 512-byte to 1K, and ultimately 4K sectors, HDDs will be able to emulate 512-byte modes to the host (i.e. making a 1K or 4K native drive 'look' like a standard 512-byte drive). If the OS is using 4K clusters, this will come with no performance decrease. For any application performing random single-block writes, the HDD will suffer 1 rev per write (for a read-modify-write operation), but that's really only a condition that would be found during a test.
Almost all filesystems I know of use at least 4Kb clusters. NTFS does come with 512 byte on smaller partitions.
LBA accesses on sector boundaries, so for larger HDD's, you need more bits (currently 28-bit LBA, which some older bioses support, means a maximum of 128GB- 2^28*512=2^28*2^9=2^37) Since 512-bytes were used for 30 years, I think it is easy to assume it will not last for 10 more years (getting to LBA32 limit). So why not shave off 3 bits and also make it an even number of bits (12 against 9).
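The arithmetic above is easy to check. A minimal sketch (the helper name is mine; "LBA32" here just means a 32-bit sector address as used in this post):

```python
GIB = 1 << 30


def lba_capacity_bytes(address_bits, sector_bytes):
    """Largest capacity addressable with the given LBA width and sector size."""
    return (1 << address_bits) * sector_bytes


for bits, sector in [(28, 512), (28, 4096), (32, 512), (32, 4096)]:
    gib = lba_capacity_bytes(bits, sector) // GIB
    print(f"{bits}-bit LBA, {sector}-byte sectors: {gib} GiB")
```

Going from 512 to 4096 bytes multiplies the reachable capacity by 8, which is the same as gaining 3 address bits.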
Also there is something called "multiple block access" where you make only one request for up to 16 (on most HDDs) sectors. For 512-byte sectors you have 8K, but for 4K sectors that means 64K. Great for large files (I/O overhead and stuff).
On the application side this should not affect anyone using 64-bit sizes (since only the OS would know of sector sizes); as for 32-bit sizes, it already is a problem (4G limit).
So this should not be a problem, because on a large partition you will not have too much wasted space (I have around 40MB wasted space on my OS drive for 5520MB of files, and I would even accept 200MB).
It only means that a 4MB block would be the smallest atomic unit you could write on a disk. Writing to parts of it would require to first read it, then modify it, then write it. A lot of FS would implement this by always caching full blocks. But you could still pack many files in a single block. Most FS already work with pretty large (logical) block sizes (16KB ain't uncommon) and will "fragment" them for very small files. Databases often compact records end to end in a block.
But of course, 4MB is fscking large. One problem would be to make them truly atomic. Current drives are supposed to have enough power to be able to complete a 512-byte write even if power is lost and stuff like that.
Want to write a single byte? Then read 4MB, modify 1 byte, and write 4MB back to the disk.
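For anyone who wants to see the read-modify-write cost spelled out, here is a toy in-memory model. `SectorDevice` is a made-up class for illustration, not any real driver API, and it naively reads before every write (a real driver would skip the read when a whole sector is overwritten).

```python
class SectorDevice:
    """Toy block device that can only transfer whole sectors."""

    def __init__(self, sector_bytes, sector_count):
        self.sector_bytes = sector_bytes
        self.data = bytearray(sector_bytes * sector_count)
        self.bytes_read = 0
        self.bytes_written = 0

    def read_sector(self, index):
        start = index * self.sector_bytes
        self.bytes_read += self.sector_bytes
        return bytearray(self.data[start:start + self.sector_bytes])

    def write_sector(self, index, payload):
        if len(payload) != self.sector_bytes:
            raise ValueError("payload must be exactly one sector")
        start = index * self.sector_bytes
        self.data[start:start + self.sector_bytes] = payload
        self.bytes_written += self.sector_bytes

    def write_bytes(self, offset, payload):
        """Write arbitrary bytes by read-modify-write of each touched sector."""
        pos = 0
        while pos < len(payload):
            index, within = divmod(offset + pos, self.sector_bytes)
            chunk = min(self.sector_bytes - within, len(payload) - pos)
            sector = self.read_sector(index)                       # read
            sector[within:within + chunk] = payload[pos:pos + chunk]  # modify
            self.write_sector(index, sector)                       # write
            pos += chunk


dev = SectorDevice(sector_bytes=4 * 1024 * 1024, sector_count=2)
dev.write_bytes(10, b"x")
print(dev.bytes_read, dev.bytes_written)
```

One byte changed, 4 MiB read and 4 MiB written, which is exactly the complaint.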
That's a bonus for all those boot-sector virus writers - 8 times more space to do their dirty deeds...
But really...think about this: if each sector has overhead, then any file over 512 bytes will have less overhead, and you'll effectively get more space in most cases. What percentage of YOUR files are less than 4k?
You could have added MS with FAT32 and NTFS. The problem is we're not talking about filesystem cluster sizes, which are software-configurable, but the disks' actual sector size, which is hardware that HFS+ has no effect on.
Man that takes me back. Where's my toupee....
Do not mock my vision of impractical footwear
The real reason for this is that as densities go up, the number of bits affected by a bad spot goes up. So it's desirable to error correct over longer bit strings. The issue is not the size of the file allocation unit; that's up to the file system software. It's the size of the block for error correction purposes. See Reed-Solomon error correction.
All major Linux file systems (except XFS) already support arbitrary sector sizes up to 4096 bytes, e.g. for s/390 Mainframes that traditionally use 4096 byte sectors on Linux.
The people who would need to write support for this are Jeff Garzik (libata) and James Bottomley (scsi). It's not that this would require a terribly complicated patch though.
Modern DASD architecture is almost completely hidden from the user. In the (good?) old days system software needed to interface closely with the DASD and needed to understand the hardware architecture to gain maximum performance from the devices (I know because I work on such systems within an IBM mainframe environment on airline systems which require extremely high speed data access).
Nowadays the disk 'address' of where the data actually resides is still couched in terms that appear to refer to the hardware itself but in 'serious' DASD subsystems (e.g. the IBM DS8000 enterprise storage systems )the actual way in which the hardware handles the data is masked from the operating system. Data for the same file is spread across many physical devices and some version of RAID is used for integrity.
The 4096 value for data 'chunks' has to do with the most efficient amount of data that can be transmitted down a DASD channel (between a host and storage in large systems or the bus in self-contained systems)
The idea of a 'file address' would cease to exist and it would be replaced by a generic 'data address' if it weren't for the in-built assumptions about data retrieval within all current Operating Systems.
5 K becomes 8 K.. times 500 ... is a whopping 1.5 MEGAbytes wasted. I mean, that is more than fits on a floppy. What a waste.
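For the record, the sum works out. A quick sketch assuming 500 files of exactly 5 KiB each on 4 KiB allocation units (both numbers come from the joke above; the helper names are mine):

```python
KIB = 1024
FLOPPY_BYTES = 2880 * 512  # a "1.44 MB" floppy: 2880 sectors of 512 bytes


def allocated_bytes(file_bytes, unit_bytes):
    """Space a file occupies once rounded up to whole allocation units."""
    units = -(-file_bytes // unit_bytes)  # ceiling division
    return units * unit_bytes


def slack_bytes(file_bytes, unit_bytes):
    """Bytes allocated to the file but not used by it."""
    return allocated_bytes(file_bytes, unit_bytes) - file_bytes


per_file = slack_bytes(5 * KIB, 4 * KIB)
total = 500 * per_file
print(f"{per_file} bytes of slack per file, {total} bytes in total")
print("more than a floppy:", total > FLOPPY_BYTES)
```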
Actually a 200 GB drive can still store 25 million files. How many fonts do you have?
FWIW the advantage is in the error correction. For a 1 bit sector size, you'd need 3 bits to store it with error correction. As the block becomes larger, the error correction becomes more powerful. That is where the advantage is.
Of course data can still be stored byte-wise on the disk - it is only that a small update will require a read-modify-write transaction.
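A rough way to see the trend is to count check bits for a plain single-error-correcting Hamming code. This is only an illustration: real drives use much stronger codes (Reed-Solomon was mentioned elsewhere in this thread), and the function name is my own. The 1-bit case does come out as 3 bits total, matching the figure above.

```python
def hamming_parity_bits(data_bits):
    """Smallest r with 2**r >= data_bits + r + 1 (single-error-correcting code)."""
    r = 0
    while (1 << r) < data_bits + r + 1:
        r += 1
    return r


for data_bits in (1, 512 * 8, 4096 * 8):
    r = hamming_parity_bits(data_bits)
    print(f"{data_bits} data bits: {r} check bits ({r / data_bits:.4%} overhead)")
```

The relative overhead falls as the block grows, so for the same share of the platter a larger sector can afford a stronger code.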
Windows Vista will ship with this support already.
Oh YEAH? Well Linux has had support for it for eleventeen years, and the Linux approach is more streamlined anyway!
I know I'm tired because I misread the first name as Inigo and the next thing through my head was
"Hello. My name is Inigo Molnar. You changed the sectors. Prepare to die."
Informative, but wrong.
Some file systems can pack multiple tail fragments into one block.
Watch this Heartland Institute video
4Kbyte is the size of a page of memory on all modern architectures. Given all modern operating systems use demand page loading of executables, and implement paging (swap space), a sector size that matches the size of a memory page will probably result in better performance.
Oolite: Elite-like game. For Mac, Linux and Windows
Hmm. This reminds me of the time when I bought my first external Firewire drive (120Gb) and used it to back up my 10Gb iMac, which had lots of small files (fonts, Word 5.1 documents, etc). Those 10Gb of backups ended up occupying 90Gb of drive space because the external drive had been pre-formatted with some large sector size, and even the smallest file took up half a megabyte! So I had to reformat the drive and start again...
You must think in Russian.
All modern operating systems do demand page loading of executables and use paging space on disk (the swapper). Memory pages are all 4Kbyte on all the CPU architectures we are using at the moment in a personal computer. Therefore, 4Kbyte is probably the ideal size (since now loading a page into memory takes only one read command instead of 8). Making it bigger than 4Kbyte, say 8Kbyte, would complicate VMM design (since if you only need to load one page, you now wind up loading two and having to throw one away, or at best, you'd wait twice as long while 8Kbyte loads instead of 4Kbyte).
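The trade-off can be sketched in a few lines, assuming a 4 KiB page that starts on a sector boundary (the function is hypothetical, just to make the counting explicit):

```python
def page_in_cost(page_bytes, sector_bytes):
    """Return (sector_reads, bytes_transferred, bytes_discarded) for one page.

    Assumes the page starts on a sector boundary.
    """
    reads = -(-page_bytes // sector_bytes)  # ceiling division
    transferred = reads * sector_bytes
    return reads, transferred, transferred - page_bytes


for sector in (512, 4096, 8192):
    reads, moved, wasted = page_in_cost(4096, sector)
    print(f"{sector}-byte sectors: {reads} read(s), {moved} bytes moved, {wasted} discarded")
```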
Oolite: Elite-like game. For Mac, Linux and Windows
That already exists. It's called a "child." Geeks might think they are hard to obtain, but in fact they tend to pop up unexpectedly quite often. They also have an audio interface, are touch-sensitive, run off of bio-mass fuel, and can even do the dishes after they have been around for a few years. They can be attached to a Playstation or an iPod too. When you first get them they are quite noisy and smelly with a few leaks, but that goes away after the break-in period. They don't come with a users manual though. Documentation is sparse. You have to get a third-party handbook.
Actually, this almost can't be anything but a good thing.
First of all, most OSes these days use a memory page size of 4k. Having your IO system page match your CPU page makes it much more efficient to DMA data and the like. Testing has shown that this is generally helpful.
Second, RAID will benefit here. Larger blocks mean larger disk reads and writes. In terms of RAID performance, this is probably a good thing. Of course, the real performance comes from the size of the drive cache, but don't underestimate the benefit of larger blocks. Larger blocks mean the RAID system can spend more time crunching the data and less time handling block overhead. The fact that more data must be crunched for a sector write is of concern, but I'd bet it won't matter too much (it only really matters for massive small writes, not generally a RAID use case).
Third, (and EVERYONE seems to be missing this) some file systems DON'T waste slack space in a sector. Reiserfs (v3 and v4) actually takes the underused blocks at the end of the files (called the "tail" of the file) and creates blocks with a bunch of them crammed together (often mixed in with metadata). This has been shown to actually increase performance, because the tail of files are usually where they are most active and tail blocks collect those tails into often accessed blocks (which have a better chance of being in the disk cache).
Netware 4 did something called Block Suballocation. While not as tightly packed as Reiser tail blocks, it did break their larger 32KB or 64KB blocks (which were chosen to keep block addresses small and large file streaming faster) into disk sectors and store tails in them.
NTFS has block suballocation akin to Netware, but Windows users are, to my knowledge, out of luck until MS finally addresses their filesystem (they've been putting this off forever). Windows really would benefit from tail packing (although the infrastructure to support it would make backwards compatibility near impossible).
To my knowledge, ReiserFS is the only filesystem with tail packing. If you are really interested in this, see your replacement brain on the Internet.
Fourth, larger sectors mean smaller sector numbers. Any filesystem that needs to address sectors usually has to choose a size for the sector addresses. Remember FAT8, FAT12, FAT16, and FAT32? Each of those numbers was the size of sector references (and thus, how big of a filesystem they could address). This will prevent us from needing to crank up the size of filesystem references eventually.
Finally, someone mentioned sector size issues with defragmenters and disk optimizers. These programs don't really care as long as all of the sectors on the system are the same size. Additionally, they could be modified to deal with different sector sizes. Ironically, modern filesystems don't really require defragmentation, as they are designed to keep fragments small on their own (usually using "extents"). Ext2, Ext3, Reiserfs and the like do this. NTFS does it too, although it can have problems if the disk ever gets full (basically, magic reserved space called the MFT gets data stored in it and the management information for the disk gets fragmented permanently). If it weren't for a design choice (I wouldn't call it a flaw as much as a compromise) NTFS wouldn't really need defragmentation. ReiserFS can suffer from a limited form of fragmentation. However, v4 is getting a repacker that will actively defragment and optimize (by spreading out the free space evenly to increase performance) the filesystem in the background.
I really don't see how this can be bad unless somebody makes a mistake on backwards compatibility. For those Linux junkies, I'm not sure about the IDE code, but I bet the SATA code will be overhauled to support it in a matter of weeks (if not a single weekend).
I think Mauve has the most RAM. --PHB (Dilbert Comic)
Wow, finally, a new block size, never heard of that idea before.
Doesn't anyone remember that SCSI drives that support a changeable block size have been around basically forever? Of course with hard disks it was used mostly to account for additional error-correcting / parity bits, but also magneto-optical media could be written with 512 or 2k (if I remember correctly).
(first hit I found: http://www.starline.de/en/produkte/hitachi/ul10k300/ul10k300.htm allows 512, 516, 520, 524, 528 but there are devices that do several steps between 128 and 2k or so...)
However, storing data in them can be a lot of effort (there are special institutions to help with that, called schools), and they are known to lose data every now and then. Moreover, there's often quite a bit of latency in reading data, and in some cases even repeated requests might not suffice to get at the data at all. The data reading speed isn't too fast either, and the writing speed is truly horrible. Moreover, they need years to completely start up (although some data can already be written and read during startup time), and they can't be switched off when you don't need them, because they won't restart again. Also, while they have a sleep mode, you cannot simply activate that. Usually it will only work at certain times, and even then they may refuse to go to sleep for quite some time. It seems, however, that many of them can be sent to sleep mode in the evening by sending them special large data streams (so-called bedtime stories). OTOH they must stay in sleep mode for quite some time to function properly, so don't even think of using them in a 24/7 application (although you have to prepare to support them 24/7, since sometimes they spontaneously end their sleep mode at unexpected times, and in that case they tend to demand immediate maintenance).
All in all, they are not really a good replacement for a hard disk.
The Tao of math: The numbers you can count are not the real numbers.
You're all missing one key point. Your 512 byte sector is NOT 512 bytes on disk. The drive stores extra track/ecc/etc information. So a 4096-byte sector means less waste, more sectors, more useable space.
Tom
Someday, I'll have a real sig.
Uhmm... NO!
This is a quick and dirty hack to check that the generated data is correct. I'm not going to spend weeks designing a data file format, and an API plus conversion tools to export the files to an Excel-compatible format, just because I've got an inefficient file system.
A new hard drive would be a better investment. Or alternatively just ignore the problem, since NTFS seems to handle these adequately.
And sometimes it's simply impossible to write a solution that will work like this. Some applications require a large number of discrete files.
Also, 4 KB is the size of a page in the x86 architecture. Some operating systems would have problems (i.e. they'd need to rewrite something) handling block sizes bigger than 4 KB.
Wow, that's nice. Time to add a small cgi script to my webserver, and link it as an image:
Slight pedantry:
Actually, it's 7200rpm, not rps. You get 120rps, so a platter rotation is actually 1/120 second.
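In code form, for anyone who wants to plug in other spindle speeds (function names are mine). This is also the "1 rev per write" penalty mentioned earlier in the thread for a read-modify-write.

```python
def revolutions_per_second(rpm):
    """Convert a spindle speed in revolutions per minute to per second."""
    return rpm / 60


def rotation_time_ms(rpm):
    """Time for one full platter revolution, in milliseconds."""
    return 60_000 / rpm


for rpm in (5400, 7200, 10000, 15000):
    full = rotation_time_ms(rpm)
    print(f"{rpm} rpm: {full:.2f} ms per revolution, {full / 2:.2f} ms average rotational latency")
```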
It's like the film... I, DEMA... about an intelligent disk drive who err... needed to save the world *cough*
The revolution will not be televised... but it will have a page on Wikipedia
Now, offer me money.
Power too. Promise me that.
Offer me everything I ask for.
I want my 512 byte sectors back, you son of a bitch.
4Kbyte is the size of a page of memory on all modern architectures.
Huh? Which modern architectures?
The only systems I run that still have 4k page sizes are x86 systems.
x86-32 = 4k
x86-64 = 4k
G4,G5 = 4k
alpha (64bit) = 8k
sparc (64bit) = 8k
ia64 = 16k
and at least on the ia64 platform the page size is configurable at compile time.
In 1963, when IBM was still firmly committed to variable length records on disks, DEC was shipping a block-replaceable personal storage device called the DECtape. This consisted of a wide magnetic tape wrapped around a wheel small enough to fit in your pocket. Unlike the much larger IBM-compatible tape drives, DECtape drives could write a block in the middle of the tape without disturbing other blocks, so it was in effect a slow disk. To make block replacement possible all blocks had to be the same size, and on the PDP-6 DEC set the size to 128 36-bit words, or 4608 bits. This number (or 4096, a rounder number for 8-bit computers) carried over into later disks which also used fixed sector sizes. As time passed, there were occasional discussions about the proper sector size, but at least once the argument to keep it small won based on the desire to avoid wasting space within a sector, since the last sector of a file would on average be only half full.
(Boot sector is one, so we start off odd right after boot sector. There are usually 2 FAT copies (even), so after FAT offset stays odd. For root directory size, there is usually no compelling reason to make it an even size, however usually Windows makes it an even size anyways, guaranteeing that start of cluster space stays odd).
So, to make a long story short, even if cluster size is a multiple of 4K, this won't help, because it is oddly aligned (meaning that each write of a 4K cluster would always straddle 2 sectors!)
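A small sketch of the straddling problem. The layout numbers (1 boot sector, 2 FATs of 250 sectors each, a 32-sector root directory) are made up for illustration, and all addresses are in 512-byte units on a drive with 4K physical sectors:

```python
def data_region_start(reserved_sectors, fat_count, sectors_per_fat, root_dir_sectors):
    """First 512-byte sector of the cluster area on a FAT12/16-style volume."""
    return reserved_sectors + fat_count * sectors_per_fat + root_dir_sectors


def physical_sectors_touched(start_lba, length_sectors, logical_per_physical=8):
    """How many physical sectors a run of 512-byte logical sectors overlaps."""
    first = start_lba // logical_per_physical
    last = (start_lba + length_sectors - 1) // logical_per_physical
    return last - first + 1


start = data_region_start(reserved_sectors=1, fat_count=2,
                          sectors_per_fat=250, root_dir_sectors=32)
print("cluster area starts at sector", start)
for n in range(3):
    touched = physical_sectors_touched(start + 8 * n, 8)
    print(f"4K cluster {n} touches {touched} physical sector(s)")
```

Because every cluster begins 8 logical sectors after the previous one, an odd start offset is inherited by all of them, so no 4K cluster ever lines up with a 4K physical sector.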
Presumably, Windows will make appropriately parametrized FAT systems once these disks become available, but there will be implications when restoring old FAT images on the new drives.
BIOSes will also need to deal with these disks, or how will you be able to boot if you replace your old PC's hard disk with a 4K sector disk, while still keeping the old motherboard?
And even if the BIOS can deal with it, forget about dd'ing your old system over to the new disk, because of the FAT issue mentioned above.
Well, current Linux bootloaders probably deal with lack of space just fine. For example, GRUB installs itself as 512-byte stub loader ("stage 1") + the rest of the boot loader stored in an ordinary file in the filesystem ("stage 2"). I don't think GRUB's design will change much: It's meant to be so that stage 2 and the menu.lst can be updated without touching the boot block, anyway.
And it's probably not the OS or boot loader that sets limits to the boot block size, it's probably the BIOS that loads the stuff to memory...
Yeah the PSE bit (bit 4) in CR4, here's some info: http://www.ddj.com/documents/s=961/ddj9605n/
Slashdot is proof that Sturgeon's Law applies to mankind.
Not all operating systems use block/sector numbers at the device-driver level (and there are good arguments against it, though most OS's do it).
The Amiga used byte-offsets and lengths for all IO's. This did eventually cause problems when disk drives (which started at 10-20MB when the Amiga was designed) got to 4GB, but a minor extension allowing 64-bit offsets solved that. 64-bit offsets shouldn't overflow very soon....
For the device driver, it's no big deal to shift the offset if the sector size is a power-of-two, and it allows for weird-ass devices with non-power-of-two sector sizes (like old MAC SCSI drives), devices without a sector paradigm, etc all using the same API. Thus you can mount a 2048-byte block FS on a 512-byte sector device without knowing or caring; you can (with a cooperative device driver) mount a 512-byte FS on a 2048-byte sector device (if the device is willing to accept arbitrary-offset transfers, which they can, though it hurts speed), or mount a block-oriented FS on a bytestream-oriented device (like a file...).
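Roughly what the driver side of that looks like: turn a byte-offset request into sectors, using a shift when the sector size is a power of two and falling back to division otherwise. This is my own sketch, not Amiga driver code, and the 520-byte size in the usage lines is just an example of an odd sector size.

```python
def sector_span(offset, length, sector_bytes):
    """Map a byte range onto (first_sector, sector_count, offset_in_first_sector)."""
    if offset < 0 or length <= 0 or sector_bytes <= 0:
        raise ValueError("offset must be >= 0, length and sector size > 0")
    end = offset + length - 1  # last byte touched
    if sector_bytes & (sector_bytes - 1) == 0:
        # Power of two: a shift and a mask are enough.
        shift = sector_bytes.bit_length() - 1
        first = offset >> shift
        last = end >> shift
        within = offset & (sector_bytes - 1)
    else:
        # Odd sector sizes need real division.
        first, within = divmod(offset, sector_bytes)
        last = end // sector_bytes
    return first, last - first + 1, within


print(sector_span(2048, 2048, 512))   # a 2048-byte FS block on a 512-byte device
print(sector_span(512, 512, 2048))    # a 512-byte FS block on a 2048-byte device
print(sector_span(1040, 100, 520))    # a non-power-of-two sector size
```

The second case is the slow one: the request covers only part of a sector, so a write there has to be done as read-modify-write.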
Linux has supported media with 4K and 2K blocksize for some years (about 7 I think offhand). 2K media comes up with optical disks a lot.
You could easily have a "compatibility" mode where the interface returns 512-byte blocks even though it's stored internally as 4096-byte blocks. You'd sacrifice performance, of course, but that's probably not a huge issue when you're running legacy systems on newer hardware.