Slashdot Mirror


Changes in HDD Sector Usage After 30 Years

freitasm writes "A story on Geekzone tells us that IDEMA (Disk Drive, Equipment, and Materials Association) is planning to implement a new standard for HDD sector usage, replacing the old 512-byte sector with a new 4096-byte sector. The association says it will be more efficient. According to the article Windows Vista will ship with this support already."

19 of 360 comments

  1. Ah, error correction. by wesley96 · · Score: 5, Insightful

    Well, CD-ROMs use 2352 bytes per sector, ending up with 2048 actual bytes after error correction. Looking at the size of HDDs these days, a 4096-byte sector seems pretty reasonable.

    --
    Serving time in Aristotelean prison for violating laws of physics
    1. Re:Ah, error correction. by Ark42 · · Score: 5, Informative

      Hard drives do the same thing: for each 512 bytes of real data, they actually store nearly 600 bytes on the disk, with information such as ECC and sector remapping for bad sectors. There are also tiny "lead-in" and "lead-out" areas around each sector, which usually contain a simple bit pattern that lets the drive synchronize with the sector properly.
      Unlike CD-ROMs, I don't believe you can actually read the sector meta-data without some sort of drive-manufacturer-specific tricks.

    2. Re:Ah, error correction. by alexhs · · Score: 5, Informative

      Unlike CD-ROMs, I don't believe you can actually read the sector meta-data

      What do you mean by meta-data?
      CDs also have "merging bits", and what is read as a byte is in fact coded on disc as 14 bits. You can't read C2 errors either; those lie beyond the 2352 bytes, which on an audio CD are all used as data. An audio sector is 1/75 of a second: 44100/75 * 2 (channels) * 2 (bytes per sample) = 2352 bytes, and it carries correction codes in addition. You can, however, read the subchannels (96 bytes per sector).
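      The sector arithmetic above can be checked with a short Python sketch. The constant names are illustrative; the field sizes follow the CD-ROM Mode 1 layout of the same 2352-byte raw sector, where 2048 bytes are user data and the rest holds sync, header, and error-detection/correction fields.

```python
# Red Book audio: each sector carries 1/75 s of 16-bit stereo PCM.
SAMPLE_RATE_HZ = 44_100
SECTORS_PER_SECOND = 75
CHANNELS = 2
BYTES_PER_SAMPLE = 2

# CD-ROM Mode 1 layout of the same 2352-byte raw sector (bytes per field).
MODE1_FIELDS = {
    "sync": 12,
    "header": 4,
    "user_data": 2048,
    "edc": 4,
    "reserved_zero": 8,
    "ecc_p_parity": 172,
    "ecc_q_parity": 104,
}


def audio_sector_bytes() -> int:
    """Bytes of PCM audio in one sector: samples * channels * bytes per sample."""
    samples_per_sector = SAMPLE_RATE_HZ // SECTORS_PER_SECOND  # 588 per channel
    return samples_per_sector * CHANNELS * BYTES_PER_SAMPLE


raw = audio_sector_bytes()
print(raw)                                        # 2352
print(sum(MODE1_FIELDS.values()))                 # 2352
print(round(MODE1_FIELDS["user_data"] / raw, 3))  # 0.871
```

      So roughly 87% of a raw Mode 1 sector is left for user data.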

      When dealing with such low-level technologies, "reading bits on disk" doesn't mean much, as there really are no bits on the disc, just pits and lands (CD) or magnetic particles (HD) causing small electrical variations on a sensor. No variation is interpreted as a 0 and a variation as a 1, and you need variations even when writing only 0s, to serve as a reference clock.
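      The "variation means 1, no variation means 0" rule is essentially NRZI coding. A simplified Python sketch, assuming an idealized channel and ignoring the run-length limits that real codes (EFM on CDs, RLL codes on hard disks) impose to guarantee enough transitions for clock recovery:

```python
def nrzi_encode(bits, start_level=0):
    """Turn data bits into physical levels: a 1 flips the level, a 0 keeps it."""
    level = start_level
    levels = []
    for bit in bits:
        if bit:
            level ^= 1  # a transition on the medium encodes a 1
        levels.append(level)
    return levels


def nrzi_decode(levels, start_level=0):
    """Recover bits by checking whether each level differs from the previous one."""
    previous = start_level
    bits = []
    for level in levels:
        bits.append(1 if level != previous else 0)
        previous = level
    return bits


data = [1, 0, 1, 1, 0, 0]
signal = nrzi_encode(data)
print(signal)               # [1, 1, 0, 1, 1, 1]
print(nrzi_decode(signal))  # [1, 0, 1, 1, 0, 0]
```

      A long run of 0s produces a flat signal with no transitions at all, which is why a reader can lose its clock and why real channel codes bound the run length.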

      without some sort of drive-manufacturer-specific tricks.

      Of course, since you cannot move HD platters between drives with different heads the way you can with a CD, each manufacturer can (and will!) encode differently. It has been reported that hard disks of the same model wouldn't "interoperate" when their controller boards were swapped, because of differing firmware versions, whereas the format is standardized for CDs and DVDs.

      they actually store near 600 bytes

      (That would be 4800 bits.) In that light, they're not storing bytes, just magnetizing particles; bytes are quite high-level. There are probably more than ten thousand magnetic variations for a 512-byte sector. What you call bytes is already what you can read :) But there is more "meta-data" than that.

      Here's an interesting read quickly found on Google just for you :)

      --
      I have discovered a truly marvelous proof of killer sig, which this margin is too narrow to contain.
  2. Re:That's nice by jcr · · Score: 4, Informative

    So... If I write down a little 16-byte message to myself in Notepad containing a name and a phone number, it will take up 4096 bytes.

    On most systems in use today, it already does.

    Blame the file system, not the sector size on the media.
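    To make the point concrete, here is a minimal Python sketch of cluster-based allocation, assuming every non-empty file gets whole clusters. It deliberately ignores tricks such as NTFS storing very small files inside their MFT record, and the function names are illustrative.

```python
def allocated_bytes(file_size: int, cluster_size: int = 4096) -> int:
    """Disk space consumed when storage is handed out in whole clusters."""
    if file_size == 0:
        return 0  # assumption: an empty file occupies no data cluster
    clusters = -(-file_size // cluster_size)  # ceiling division on integers
    return clusters * cluster_size


def slack_bytes(file_size: int, cluster_size: int = 4096) -> int:
    """Allocated-but-unused space at the end of the file's last cluster."""
    return allocated_bytes(file_size, cluster_size) - file_size


print(allocated_bytes(16))    # 4096
print(slack_bytes(16))        # 4080
print(allocated_bytes(5000))  # 8192
```

    The 16-byte note costs 4096 bytes because of the 4 KB cluster, whatever the sector size underneath.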

    -jcr

    --
    The only title of honor that a tyrant can grant is "Enemy of the State."
  3. No, that's not 'sector' by wesley96 · · Score: 4, Informative

    You're thinking of 'cluster'. This is tied to the file system actually used on the disk. Even with the current 512-byte sector, a normal NTFS partition of, say, 200GB uses 4KB clusters, so a single file already takes up a minimum of 4KB.
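    The cluster/sector relationship can be sketched in Python as follows, assuming for simplicity that cluster 0 starts at a hypothetical `data_start_lba` and clusters are laid out back to back; real filesystems add their own offsets (FAT, for example, numbers its first data cluster 2).

```python
def sectors_per_cluster(cluster_size: int, sector_size: int) -> int:
    """How many physical sectors make up one filesystem cluster."""
    if cluster_size % sector_size:
        raise ValueError("cluster size must be a multiple of the sector size")
    return cluster_size // sector_size


def cluster_to_lba(cluster: int, cluster_size: int, sector_size: int,
                   data_start_lba: int = 0) -> int:
    """First sector address (LBA) of a cluster in this simplified layout."""
    return data_start_lba + cluster * sectors_per_cluster(cluster_size, sector_size)


print(sectors_per_cluster(4096, 512))   # 8: today's 512-byte sectors
print(sectors_per_cluster(4096, 4096))  # 1: one cluster per 4K sector
print(cluster_to_lba(10, 4096, 512))    # 80
```

    With 4096-byte sectors and 4 KB clusters, the mapping degenerates to one sector per cluster.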

    --
    Serving time in Aristotelean prison for violating laws of physics
  4. Re:Quick Explain How! by AngelofDeath-02 · · Score: 5, Interesting

    Best analogy is a gym locker room.
    You have, say, 10 lockers up and 20 lockers across.
    You can only put one thing in a locker, so you can't put your gym shorts in the same one as your shoes. But if you have lots of socks, you can pile them in, and take up two or three if necessary.

    Space is wasted if you have a really big locker, but it's only holding a sock.

    Now, you've got to record where all of this stuff is, or you will take forever to find that sock. So you set aside a locker to hold the clipboard with designations.

    Now to bring this back into real life. There are a _lot_ of sectors on a disk. So keeping track of all of them starts requiring a substantial amount of resources. I imagine they are finding it easier to justify wasting space for small files in order to make it easier to keep track of them. Average file sizes are also going up, so it's not as big of a problem as it used to be either. It's all relative...
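    To put rough numbers on "a _lot_ of sectors", this Python sketch counts allocation units on a 200 GB drive and the size of a hypothetical one-bit-per-unit free-space bitmap. The bitmap is only an illustrative bookkeeping scheme, not how any particular drive or filesystem tracks its sectors.

```python
def allocation_units(capacity_bytes: int, unit_size: int) -> int:
    """Number of whole units (sectors or clusters) that fit on the disk."""
    return capacity_bytes // unit_size


def bitmap_bytes(capacity_bytes: int, unit_size: int) -> int:
    """Bytes needed for a bitmap with one bit per allocation unit."""
    units = allocation_units(capacity_bytes, unit_size)
    return -(-units // 8)  # round up to whole bytes


CAPACITY = 200 * 10**9  # 200 GB in the decimal units drive makers use

for unit in (512, 4096):
    print(unit, allocation_units(CAPACITY, unit), bitmap_bytes(CAPACITY, unit))
# 512 390625000 48828125
# 4096 48828125 6103516
```

    Going from 512-byte to 4096-byte units cuts the number of things to keep track of by a factor of 8.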

    --
    No, I am not an English major. My posts are subject to typos and incorrect grammar. Do not expect perfection.
  5. Good for small devices by BadAnalogyGuy · · Score: 4, Interesting

    Small devices like cellphones typically save files of several kilobytes, whether they be the phonebook database or something like camera images. Whether the data is saved in a couple of large sectors or 8 times that many small sectors isn't really an issue. Either way will work fine, as far as the data is concerned. The biggest problem is the amount of battery power used to transfer those files. If you have to re-issue a read or write command (well, the filesystem would do this) for each 512-byte block, you will spend 8 times more energy (give or take a bit) to read or write the same 4k block of data.

    Also, squaring away each sector after processing is a round trip back to the filesystem which can be eliminated by reading a larger sector size in the first place.

    Some semi-ATA disks already force a minimum 4096-byte sector size. It's not necessarily the best way to get the most usage out of your disks, but it is one way of speeding up the disk just a little bit more to reduce power consumption.

  6. In Vista already? by sinnerman · · Score: 5, Funny

    Well of course Vista will ship with this supported already. Just like WinFS...er..

  7. Re:Quick Explain How! by BadAnalogyGuy · · Score: 5, Funny

    I'm willing to sell this account for the right price.

  8. Re:Quick Explain How! by realcoolguy425 · · Score: 5, Funny

    I'm sorry, your response has to be in some form of Star Trek (or sci-fi). I would have accepted this, however...

      Best analogy is Spock's gym locker room

    Spock has, say, 10 space lockers up and 20 space lockers across.

    Spock can only put one thing in a locker, so Spock can't put his gym shorts in the same one as his shoes. But since Spock has lots of socks, he can pile them in, and take up two or three if necessary.

    Space is wasted if Spock uses a really big locker, but it's only holding a sock.

    Now, you've got to record where all of this stuff is, or you will take forever to find that sock. (I guess the tricorders are broken) So Spock sets aside a locker to hold the clipboard with designations.

    Now to bring this back into real life. There are a _lot_ of sectors on a disk. So keeping track of all of them starts requiring a substantial amount of resources. I imagine they are finding it easier to justify wasting space for small files in order to make it easier to keep track of them. Average file sizes are also going up, so it's not as big of a problem as it used to be either. It's all relative...

  9. It's all about Format Efficiency by alanmeyer · · Score: 5, Informative

    HDD manufacturers are looking to increase the amount of data stored on each platter. With larger sector sizes, the HDD vendor can use more efficient codes. This means better format efficiency and more bytes for the end user. The primary argument is that many OSes already use 4K clusters.

    During the transition from 512-byte to 1K, and ultimately 4K sectors, HDDs will be able to emulate 512-byte modes to the host (i.e. making a 1K or 4K native drive 'look' like a standard 512-byte drive). If the OS is using 4K clusters, this will come with no performance decrease. For any application performing random single-block writes, the HDD will lose one revolution per write (for the read-modify-write operation), but that's really only a condition that would be found during a test.
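    The read-modify-write penalty can be illustrated with a toy Python model of a 4K-native drive that emulates 512-byte logical sectors. The class and its methods are hypothetical and ignore caching, timing, and error handling; on a real drive, each counted read-modify-write is the step that would cost the extra revolution.

```python
PHYSICAL = 4096
LOGICAL = 512
RATIO = PHYSICAL // LOGICAL  # 8 logical sectors per physical sector


class Emulated512Drive:
    """Toy 4K-native drive presenting 512-byte logical sectors (hypothetical)."""

    def __init__(self, physical_sectors: int):
        self.media = [bytearray(PHYSICAL) for _ in range(physical_sectors)]
        self.rmw_count = 0  # read-modify-write cycles performed so far

    def read(self, lba: int) -> bytes:
        """Return one 512-byte logical sector."""
        phys, idx = divmod(lba, RATIO)
        return bytes(self.media[phys][idx * LOGICAL:(idx + 1) * LOGICAL])

    def write(self, lba: int, data: bytes) -> None:
        """Write len(data) // 512 logical sectors starting at logical address lba."""
        if len(data) % LOGICAL:
            raise ValueError("data must be a whole number of 512-byte sectors")
        end = lba + len(data) // LOGICAL
        pos = lba
        while pos < end:
            phys, idx = divmod(pos, RATIO)
            n = min(RATIO - idx, end - pos)  # logical sectors in this physical sector
            chunk = data[(pos - lba) * LOGICAL:(pos - lba + n) * LOGICAL]
            if n == RATIO:
                # Aligned, full 4K write: no need to read the old contents.
                self.media[phys] = bytearray(chunk)
            else:
                # Partial write: read the whole physical sector, patch it, write it back.
                buf = bytearray(self.media[phys])
                buf[idx * LOGICAL:(idx + n) * LOGICAL] = chunk
                self.media[phys] = buf
                self.rmw_count += 1
            pos += n


drive = Emulated512Drive(physical_sectors=4)
drive.write(0, bytes(4096))    # 4K-aligned write: no read-modify-write
drive.write(9, b"\x01" * 512)  # single 512-byte write: one read-modify-write
print(drive.rmw_count)         # 1
print(drive.read(9)[:2])       # b'\x01\x01'
```

    In this model, only writes that are both 4K-sized and 4K-aligned avoid the slow path, so 4K clusters help only if they also line up with physical sector boundaries.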

  10. Boot sector virii by TrickiDicki · · Score: 5, Funny

    That's a bonus for all those boot-sector virus writers - 8 times more space to do their dirty deeds...

  11. Re:Cluster size? by scdeimos · · Score: 5, Informative
    I thought cluster sizes were already 4KB for efficiency, and LBA for larger drive sizes.
    Cluster sizes are variable on most file systems. On our NTFS web servers we tend to have 1k clusters because it's more efficient to do it that way with lots of small files, but the default NTFS cluster size is 4k. LBA is just a different addressing scheme at the media level to make a volume appear to be a flat array of sectors (as opposed to the old CHS or Cylinder Head Sector scheme).
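    For reference, the conventional CHS-to-LBA conversion looks like this in Python. The geometry values are parameters; 255 heads and 63 sectors per track is merely the common BIOS-translated geometry used in the example. CHS sector numbers start at 1, while cylinders, heads, and LBAs start at 0.

```python
def chs_to_lba(c: int, h: int, s: int, heads: int, sectors_per_track: int) -> int:
    """Convert a (cylinder, head, sector) address to a flat logical block address."""
    if not 1 <= s <= sectors_per_track:
        raise ValueError("CHS sector numbers run from 1 to sectors_per_track")
    return (c * heads + h) * sectors_per_track + (s - 1)


def lba_to_chs(lba: int, heads: int, sectors_per_track: int):
    """Inverse mapping: split a flat LBA back into (cylinder, head, sector)."""
    c, rem = divmod(lba, heads * sectors_per_track)
    h, s0 = divmod(rem, sectors_per_track)
    return c, h, s0 + 1


print(chs_to_lba(0, 0, 1, 255, 63))  # 0
print(chs_to_lba(1, 0, 1, 255, 63))  # 16065
print(lba_to_chs(16065, 255, 63))    # (1, 0, 1)
```

    LBA hides the geometry entirely: the host just asks for sector N, and the drive maps it to wherever that sector physically lives.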
  12. Re: Apple in the forground again by n.wegner · · Score: 4, Informative

    You could have added MS with FAT32 and NTFS. The problem is we're not talking about filesystem cluster sizes, which are software-configurable, but the disks' actual sector size, which is hardware that HFS+ has no effect on.

  13. How do you know how your data is actually stored ? by Horus1664 · · Score: 4, Insightful

    Modern DASD architecture is almost completely hidden from the user. In the (good?) old days, system software needed to interface closely with the DASD and to understand the hardware architecture to get maximum performance from the devices (I know because I work on such systems in an IBM mainframe environment, on airline systems that require extremely high-speed data access).

    Nowadays the disk 'address' where the data actually resides is still couched in terms that appear to refer to the hardware itself, but in 'serious' DASD subsystems (e.g. the IBM DS8000 enterprise storage systems) the way the hardware actually handles the data is masked from the operating system. Data for the same file is spread across many physical devices, and some form of RAID is used for integrity.

    The 4096 value for data 'chunks' has to do with the most efficient amount of data that can be transmitted down a DASD channel (between a host and storage in large systems, or over the bus in self-contained systems).

    The idea of a 'file address' would cease to exist and it would be replaced by a generic 'data address' if it weren't for the in-built assumptions about data retrieval within all current Operating Systems.

  14. Re:What's the case for Linux? by Aggrav8d · · Score: 5, Funny

    I know I'm tired because I misread the first name as Inigo and the next thing through my head was

    "Hello. My name is Inigo Molnar. You changed the sectors. Prepare to die."

  15. Re:30 years doing what? by Derling+Whirvish · · Score: 5, Funny
    I have my eyes peeled for a bio-drive, something noxious smelling that you feed with potato rinds which stores your data directly in its DNA.

    That already exists. It's called a "child." Geeks might think they are hard to obtain, but in fact they tend to pop up unexpectedly quite often. They also have an audio interface, are touch-sensitive, run off of bio-mass fuel, and can even do the dishes after they have been around for a few years. They can be attached to a Playstation or an iPod too. When you first get them they are quite noisy and smelly with a few leaks, but that goes away after the break-in period. They don't come with a user's manual, though. Documentation is sparse. You have to get a third-party handbook.

  16. System Pages, RAID, Tail Blocks, and Addressing by KagatoLNX · · Score: 4, Insightful

    Actually, this almost can't be anything but a good thing.

    First of all, most OSes these days use a memory page size of 4k. Having your IO system page match your CPU page makes it much more efficient to DMA data and the like. Testing has shown that this is generally helpful.

    Second, RAID will benefit here. Larger blocks mean larger disk reads and writes. In terms of RAID performance, this is probably a good thing. Of course, the real performance comes from the size of the drive cache, but don't underestimate the benefit of larger blocks. Larger blocks mean the RAID system can spend more time crunching the data and less time handling block overhead. The fact that more data must be crunched for a sector write is of concern, but I'd bet it won't matter too much (it only really matters for massive small writes, not generally a RAID use case).

    Third (and EVERYONE seems to be missing this), some file systems DON'T waste slack space in a sector. Reiserfs (v3 and v4) actually takes the underused blocks at the end of files (called the "tail" of the file) and creates blocks with a bunch of them crammed together (often mixed in with metadata). This has been shown to actually increase performance, because the tails of files are usually their most active parts, and tail blocks collect those tails into frequently accessed blocks (which have a better chance of being in the disk cache).
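    As a rough illustration of why tail packing saves space, the Python sketch below compares whole-block allocation with packing the leftover tails into shared blocks using first-fit decreasing. That packing rule is just a stand-in chosen for simplicity; ReiserFS's real on-disk layout is far more involved, and metadata is ignored here. The file sizes are made-up examples.

```python
def blocks_without_packing(sizes, block=4096):
    """Every file rounds up to whole blocks of its own."""
    return sum(-(-size // block) for size in sizes)


def blocks_with_tail_packing(sizes, block=4096):
    """Full blocks per file, plus shared blocks holding all the partial 'tails'."""
    full_blocks = sum(size // block for size in sizes)
    tails = sorted((size % block for size in sizes if size % block), reverse=True)
    free_space = []  # remaining free bytes in each shared tail block
    for tail in tails:
        for i, free in enumerate(free_space):
            if free >= tail:  # first shared block with room for this tail
                free_space[i] -= tail
                break
        else:
            free_space.append(block - tail)  # open a new shared tail block
    return full_blocks + len(free_space)


files = [16, 100, 1000, 5000, 3000]  # hypothetical file sizes in bytes
print(blocks_without_packing(files))    # 6
print(blocks_with_tail_packing(files))  # 3
```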

    NetWare 4 did something called Block Suballocation. While not as tightly packed as Reiser tail blocks, it split its larger 32 KB or 64 KB blocks (chosen to keep block addresses small and large-file streaming fast) into sector-sized pieces and stored tails in them.

    NTFS has block suballocation akin to NetWare, but Windows users are, to my knowledge, out of luck until MS finally addresses their filesystem (they've been putting this off forever). Windows really would benefit from tail packing (although the infrastructure to support it would make backwards compatibility nearly impossible).

    To my knowledge, ReiserFS is the only filesystem with tail packing. If you are really interested in this, see your replacement brain on the Internet.

    Fourth, larger sectors mean smaller sector numbers. Any filesystem that needs to address sectors has to choose a size for its sector addresses. Remember FAT8, FAT12, FAT16, and FAT32? Each of those numbers was the size, in bits, of the filesystem's cluster references (and thus determined how big a filesystem it could address). Larger sectors will postpone the day we need to crank up the size of filesystem references again.
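    The arithmetic behind that point, as a Python sketch: with a fixed address width, the maximum addressable capacity scales with the unit size. The 32-bit width in the example is an assumption chosen because it matches, for instance, the 32-bit LBA fields in MBR partition entries.

```python
def max_capacity(address_bits: int, unit_size: int) -> int:
    """Largest capacity reachable with address_bits-wide addresses of unit_size-byte units."""
    return (2 ** address_bits) * unit_size


TIB = 2 ** 40  # one tebibyte

for unit in (512, 4096):
    print(unit, max_capacity(32, unit) // TIB, "TiB")
# 512 2 TiB
# 4096 16 TiB
```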

    Finally, someone mentioned sector size issues with defragmenters and disk optimizers. These programs don't really care as long as all of the sectors on the system are the same size. Additionally, they could be modified to deal with different sector sizes. Ironically, modern filesystems don't really require defragmentation, as they are designed to keep fragments small on their own (usually using "extents"). Ext2, Ext3, Reiserfs and the like do this. NTFS does it too, although it can have problems if the disk ever gets full (basically, magic reserved space called the MFT gets data stored in it and the management information for the disk gets fragmented permanently). If it weren't for a design choice (I wouldn't call it a flaw as much as a compromise), NTFS wouldn't really need defragmentation. ReiserFS can suffer from a limited form of fragmentation. However, v4 is getting a repacker that will actively defragment and optimize (by spreading out the free space evenly to increase performance) the filesystem in the background.
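    The extent idea can be sketched in Python: a file's block list, in file order, collapses into (start, length) runs, and the number of runs is the number of fragments a defragmenter would try to reduce. The block numbers are made up, and real extent formats add limits and extra fields this sketch ignores.

```python
def to_extents(blocks):
    """Collapse a file's block numbers (in file order) into (start, length) extents."""
    extents = []
    for block in blocks:
        if extents and extents[-1][0] + extents[-1][1] == block:
            start, length = extents[-1]
            extents[-1] = (start, length + 1)  # block continues the current run
        else:
            extents.append((block, 1))  # gap: a new fragment begins
    return extents


file_blocks = [100, 101, 102, 103, 200, 201, 350]  # hypothetical block numbers
extents = to_extents(file_blocks)
print(extents)       # [(100, 4), (200, 2), (350, 1)]
print(len(extents))  # 3 fragments
```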

    I really don't see how this can be bad unless somebody makes a mistake on backwards compatibility. For those Linux junkies, I'm not sure about the IDE code, but I bet the SATA code will be overhauled to support it in a matter of weeks (if not a single weekend).

    --
    I think Mauve has the most RAM. --PHB (Dilbert Comic)
  17. Re:30 years doing what? by maxwell+demon · · Score: 5, Funny

    However, storing data in them can be a lot of effort (there are special institutions to help with that, called schools), and they are known to lose data every now and then. Moreover, there's often quite a bit of latency in reading data, and in some cases even repeated requests might not suffice to get at the data at all. The data reading speed isn't too fast either, and the writing speed is truly horrible. Moreover, they need years to completely start up (although some data can already be written and read during startup time), and they can't be switched off when you don't need them, because they won't restart again. Also, while they have a sleep mode, you cannot simply activate that. Usually it will only work at certain times, and even then they may refuse to go to sleep for quite some time. It seems, however, that many of them can be sent to sleep mode in the evening by sending them special large data streams (so-called bedtime stories). OTOH they must stay in sleep mode for quite some time to function properly, so don't even think of using them in a 24/7 application (although you have to be prepared to support them 24/7, since sometimes they spontaneously end their sleep mode at unexpected times, and in that case they tend to demand immediate maintenance).

    All in all, they are not really a good replacement for a hard disk.

    --
    The Tao of math: The numbers you can count are not the real numbers.