Long Block Data Standard Finalized

Higher Reliability? by MankyD · 2007-05-01 09:27 · Score: 3, Insightful

How do larger block sizes result in better reliability? Intuitively, I would almost think the opposite, since a single byte corruption means a much larger block is now erroneous. I obviously am missing something though.

--
-dave
http://millionnumbers.com/ - own the number of your dreams

Re:Higher Reliability? by silas_moeckel · 2007-05-01 09:38 · Score: 3, Insightful

I would think it has to do with the ability to have more bits for ecc type functions. Blocks would need to be terminated somehow so there is a fixed overhead per block. Reducing this overhead by a factor of 8 would leave more room for a larger parity type field and the more bits in there the larger failure that it can detect, fix and relocate. This would all assume they will not use the new space to push up sizing. Course this is all my rather speculative guesswork.

--
No sir I dont like it.
Re:Higher Reliability? by 5pp000 · 2007-05-01 09:41 · Score: 5, Informative

The longer block sizes add reliability because the error correcting codes have more to work with at a time (more data bits, but also more ECC bits).
As for wasted space, that's under the filesystem's control, not the drive's.

--
Your god may be dead, but mine aren't!
Re:Higher Reliability? by msauve · 2007-05-01 09:42 · Score: 2, Interesting

I don't have access to the actual standard, but would guess that they're really claiming more reliability for the same storage capacity, not more reliable in absolute terms.

They can take what would have been per-block overhead with smaller sector sizes and reuse that data space for more robust error correcting codes, while maintaining the same capacity.

But, good question, since in terms of absolute reliability I can't picture anything in the current spec which would prevent private (not visible at the interface level) methods from being used (RAID within a drive?) with current drives.

--
"National Security is the chief cause of national insecurity." - Celine's First Law
Re:Higher Reliability? by GooberToo · 2007-05-01 11:16 · Score: 2, Insightful

I don't have access to the actual standard, but would guess that they're really claiming more reliability for the same storage capacity, not more reliable in absolute terms.

In the real world this translates into "more reliability". Reliability has always been relative to dollars spent. This means that, given the same dollars, you are more reliable. This means that, given absolute dollars, you are more reliable.
Re:Higher Reliability? by cerberusss · 2007-05-01 23:38 · Score: 2, Funny

As for wasted space, that's under the filesystem's control, not the drive's.
I use a raw device, you insensitive clod!

--
8 of 13 people found this answer helpful. Did you?

Why 4096? by MBCook · 2007-05-01 09:27 · Score: 3, Insightful

Is there a good reason why 4096 was chosen? Is that just an artifact of this being designed in 2000? At this point very few files on the average system would be smaller than this. It seems to me they could have quite safely chosen something like 16k which would have improved things more, future proofed them more, yet still have been small enough as to not waste a tremendous amount of space (like if they chose 512k).

Why not make it variable, in that each drive can have its own value (limited to a power of 2, between 512 and say 512k)? That way drives today could be 4k, with drives in a few years being larger, without requiring another 7 years for a new standard.

--
Comment forecast: Bits of genius surrounded by a sea of mediocrity.

Re:Why 4096? by 42forty-two42 · 2007-05-01 09:38 · Score: 5, Insightful

Operating systems tend to use 4096-byte blocks already, as that's the size of a memory page on x86 and amd64. If you were to require 16kb transfers, then the block cache would have to start allocating contiguous four-page groups for DMA transfers and the like, which could be difficult if memory is fragmented; in comparison, pages are the basic allocation unit for RAM, so 4kb's easy to find.
Re:Why 4096? by AKAImBatman · 2007-05-01 09:43 · Score: 4, Insightful

Parent is correct. Pretty much every paging-capable microprocessor in existence uses 4K memory blocks, thus why they're the natural size for a hard disk. In the x86 world, the next step up is 4MB blocks. Burst performance of modern hard disks is quite good, but I have to wonder if 4MB blocks would be helpful or harmful to overall system performance? It might reduce the number of pages, that's for sure.

--
Javascript + Nintendo DSi = DSiCade
Re:Why 4096? by geekoid · 2007-05-01 09:46 · Score: 5, Funny

Yeah, sure. Give a logical AND knowledgeable answer.
Way to ruin the curve.

--
The Kruger Dunning explains most post on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
Re:Why 4096? by 42forty-two42 · 2007-05-01 09:54 · Score: 3, Informative

Using 4MB blocks for everything would kill memory performance - and more specifically, mmap performance. Each library loaded in your system would require at least 4MB of ram - probably more, as they have code, data, and zeroed data segments. Additionally, each process would require another 4MB*n. There's no gain for doing this either, except under specialized circumstances, as the OS can already request a batch of sectors from the drive in one operation.
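As a rough illustration of the point above, here is a minimal sketch that rounds each mapped segment up to a whole number of pages. The segment sizes are hypothetical numbers chosen for illustration, not measurements of any real library, and `mapped_bytes` is an illustrative helper, not an OS API.

```python
KIB = 1024
MIB = 1024 * KIB

def mapped_bytes(segment_sizes, page_size):
    """Memory consumed when every segment is rounded up to whole pages."""
    # -(-a // b) is integer ceiling division
    return sum(-(-size // page_size) * page_size for size in segment_sizes)

# Hypothetical library: code, data, and zeroed data segments
segments = [300 * KIB, 20 * KIB, 8 * KIB]

small = mapped_bytes(segments, 4 * KIB)   # 4K pages
large = mapped_bytes(segments, 4 * MIB)   # 4MB pages

print(small, large)
```

With these assumed sizes, the 4K-page layout needs about 328 KiB, while the 4MB-page layout needs three full pages, i.e. 12 MiB.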
Re:Why 4096? by Afecks · 2007-05-01 10:58 · Score: 2, Funny

It might reduce the number of pages, that's for sure

You may be sure that it might but I'm unsure that it won't...
Re:Why 4096? by Scott+Wood · 2007-05-01 11:20 · Score: 3, Informative

No, x64 and ARM both use 4K pages (though ARM has 1K subpages that you can set permissions on individually). Alpha and sparc64 use 8K pages, though.

Thats a lot a bits by rambag · 2007-05-01 09:31 · Score: 5, Funny

Yeah why 4092 bytes? Why not 09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0 bytes? It seems to me to be the best option

Re:Sounds like a good idea to me. by WarwickRyan · 2007-05-01 09:31 · Score: 2, Interesting

> NOTHING is 512 bytes anymore.

Shortcuts can easily be 512 bytes long.

For example I've got a shortcut to a file on C:\, which is 391 bytes actual size, 4096 bytes on the disk.

Re:Oh noes! by drinkypoo · 2007-05-01 09:33 · Score: 2, Informative

Actually, they're going to take up eight times as much space... YOU FAIL IT! They will waste 3636b space unused in blocks, however, instead of only 112 bytes, so they'll be wasting over 32 times as much space. But then, won't ReiserFS already store multiple files in a single block in some cases?

--
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"

Re:Sounds like a good idea to me. by garett_spencley · 2007-05-01 09:38 · Score: 2, Funny

NOTHING is 512 bytes anymore

Unless you've got a powerful fetish for ASCII pr0n

Re:Discussed Since 2000? by RingDev · 2007-05-01 09:39 · Score: 4, Insightful

Saying 4096 was probably the easy part. Of course someone probably had to research what the largest (time efficient) and smallest (space efficient) block size would give the greatest advantage in space/time for current average files. But eventually you get into the issue of working with Hard Drive manufacturers who likely have to redesign some circuits and controls _from scratch_, BIOS developers who have to recode to detect and support two different standards, and OS/Driver developers who also have to deal with any low level changes...

You're talking about interacting with likely hundreds of companies trying to come up with a single standard that 1) they can all agree on and 2) won't make any of them lose money. Good luck.

-Rick

--
"Most people in the U.S. wouldn't know they live in a tyrannical state if it walked up and grabbed their junk." - MyFirs

Re:Sounds like a good idea to me. by Animaether · 2007-05-01 09:42 · Score: 2, Informative

In addition, it doesn't matter whether the file is less than 512 or, in this case, 4096 bytes. What matters is if the 'size % block_size' is non-zero. I.e. let's say the file is 4090 bytes. It will fit just fine, and you'll only waste 6 bytes. Now the file is 4100 bytes, only 4 bytes over. Except now you need 2 blocks, and thus waste 4092 bytes.
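The arithmetic above can be sketched in a few lines. `slack_bytes` is an illustrative helper name, and the treatment of empty files is an assumption (real filesystems differ).

```python
def slack_bytes(file_size: int, block_size: int) -> int:
    """Bytes left unused in the last block allocated to a file."""
    if file_size == 0:
        return 0  # assumption: an empty file allocates no data block
    remainder = file_size % block_size
    return 0 if remainder == 0 else block_size - remainder

for size in (4090, 4096, 4100):
    print(size, slack_bytes(size, 4096), slack_bytes(size, 512))
```

For the two files in the example, this gives 6 wasted bytes for the 4090-byte file and 4092 for the 4100-byte file at a 4096-byte block size.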

Sure, on a multi-GB file that's not going to matter too much, as even on a TB drive you can only have a few hundred of those, and who's going to miss that 1MB?
However, there's plenty of other files that hover between 1k and 10k, 10k and 100k, 100k and 1MB where those tiny fractions do add up.

That said, GP is still right. Say you do have a TB drive.. unless you only have a few free MB left, you're not going to worry too much about the losses from block sizes.

What about the MBR? by QuantumG · 2007-05-01 09:46 · Score: 4, Funny

Trying to fit an entire virus into 512 bytes was always a challenge.. but 4096 bytes? That's too easy!

--
How we know is more important than what we know.

Re:What about the MBR? by Godji · 2007-05-01 11:44 · Score: 3, Interesting

The parent raises a point though: now that we have bigger sectors, are we finally getting a standard for partition tables with more than 4 entries without using logical partitions?
Re:What about the MBR? by ottffssent · 2007-05-01 13:19 · Score: 3, Informative

The word you're looking for is GPT. It has nothing to do with 4k hardware sectors, but it does support up to 128 partitions. Which ought to be enough for anybody (says the man with a 1 average number of partitions per disk in his household).

--
High-speed Road Trip (18.000KPH)
Re:What about the MBR? by shawnce · 2007-05-01 13:21 · Score: 2, Informative

Extensible Firmware Interface (EFI) supports GUID Partition Table (GPT)

Plan for the future! by operagost · 2007-05-01 09:48 · Score: 2, Funny

These kinds of incremental standards are simply not forward-looking! I propose that the data block size be set to a minimum of 2^32 bytes.

--

Gamingmuseum.com: Give your 3D accelerator a rest.

Error correction better over larger blocks by EmbeddedJanitor · 2007-05-01 09:51 · Score: 4, Informative

If you're working with a certain number of ecc bits per data bit, then the number of corrections you can perform increases with an increased data block size. Oversimplifying, just for explanation here:

Let's suppose you can fix one error per 512 byte block or 6 errors per 4096 byte block. Intuitively that might seem like a step back because 6/8 is smaller than 1, but that is not so. If you have 512-byte blocks and get two errors in a 512-byte sequence then that block is corrupt. However if instead you're using 4096 byte blocks then a 512-byte sequence within that block can have two errors since we can tolerate up to 6 errors in the whole block.

Or put another way, consider a 4 k sequence of data, represented by a sequence of digits dependent on the number of errors in each 512 bytes. 00000000 means no errors, 03010000 means 3 errors in the second block and 1 in the fourth block (ie a total of 4 errors in the whole 4096 bytes). With a scheme that can fix only one error per 512 bytes, the block with 3 errors cannot be corrected (because 3 > 1), but in the system which fixes up to 6 errors per 4096, the errors can be fixed because 4 < 6. This means that the ECC is far more reliable.
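The digit-string example above can be checked mechanically. This is only a sketch of the oversimplified model in the comment (one correctable error per 512 bytes versus six per 4096 bytes); real drive ECC is more involved, and the function names are illustrative.

```python
def correctable_per_sector(errors, limit=1):
    """Each 512-byte sector is protected on its own."""
    return all(e <= limit for e in errors)

def correctable_per_block(errors, limit=6):
    """One ECC budget is shared across the whole 4096-byte block."""
    return sum(errors) <= limit

pattern = [int(d) for d in "03010000"]  # errors per 512-byte chunk

print(correctable_per_sector(pattern))  # the chunk with 3 errors exceeds 1
print(correctable_per_block(pattern))   # 4 errors in total fit within 6
```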

--
Engineering is the art of compromise.

Re:Error correction better over larger blocks by hamanu · 2007-05-01 11:21 · Score: 2, Informative

OK, yes you COULD move the parity data around but you'd get shitty performance. Hard drives are made so that each sector is independent of another. That makes each sector a separate codeword on disk. What you are proposing is to introduce dependency between sectors, and that would mean having to read adjacent sectors in order to write a single sector, which means going through 2 revolutions of the disk instead of one.

--
every _exit() is the same, but every clone() is different.

CD error recovery unrelated to block size by _Shorty-dammit · 2007-05-01 09:56 · Score: 2, Informative

Block size has absolutely nothing to do with how much redundancy you can build in, and I fail to see the logic in assuming so. Makes absolutely no sense. The 2048 bytes stored on a sector of a CD only refers to your data, and absolutely none of them have anything to do with the CD's error-correction mechanisms. They add lots of extra bits to make up their error-correction, over and above your 2048 bytes of data. But, the point is it doesn't matter how much space you reserve to hold user data, you can arbitrarily reserve any amount of space you want for error-correction bits. You can have 16-byte sectors with 16MB of error-correction. Now, *that* would be a lot of redundancy. But certainly something you could do if you want to, and there's not going to be very many people arguing that those 16-byte sectors weren't covered by much redundancy. I doubt anyone would ever use that much redundancy, obviously, but it's just an outrageous example to show that the amount of redundancy has absolutely nothing to do with how much user data is stored per sector.

Re:CD error recovery unrelated to block size by hamanu · 2007-05-01 10:38 · Score: 2, Informative

The rate of a code measures how much redundancy it has, correct. But why do you think block length doesn't matter? Just because you have high redundancy doesn't mean your errors are going to magically be recoverable. To actually recover the data you need enough distance between valid codewords so that when a codeword is perturbed by errors you can still see which valid codeword it is closest to. With short block lengths you get small decoding distances, and low error correcting power. If you learn information theory a bit better you'll see Claude Shannon's channel "capacity" theory assumes infinite block length, and it does that for a REASON.
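To make the "distance between valid codewords" idea concrete, here is a toy sketch using a five-bit repetition code. The code and helper names are assumptions chosen for illustration; they are not what any drive uses, and the example says nothing about code rate.

```python
from itertools import combinations

def hamming(a: str, b: str) -> int:
    """Number of positions in which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))

def min_distance(code):
    """Smallest distance between any two distinct valid codewords."""
    return min(hamming(a, b) for a, b in combinations(code, 2))

def decode(word: str, code):
    """Nearest-codeword decoding: pick the closest valid codeword."""
    return min(code, key=lambda c: hamming(word, c))

code = ["00000", "11111"]      # toy code with minimum distance 5
d = min_distance(code)
t = (d - 1) // 2               # errors guaranteed correctable

print(d, t)
print(decode("01001", code))   # "00000" with 2 bits flipped
print(decode("01101", code))   # "00000" with 3 bits flipped
```

With distance 5, two flipped bits still decode back to the codeword that was sent, while three flipped bits land closer to the other codeword and decode incorrectly.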

--
every _exit() is the same, but every clone() is different.
Re:CD error recovery unrelated to block size by hamanu · 2007-05-01 10:48 · Score: 2, Informative

I guess I should pre-emptively point out that for a hard drive you want to be able to modify each sector atomically, which means that a single sector corresponds to a single codeword, and increasing areal density means you need longer codewords to maintain error correction. So either you decrease the rate of the code, and use extra redundancy, which lowers capacity and defeats the purpose of increasing areal density, or you use longer codewords at the same rate, which means using longer sectors.

--
every _exit() is the same, but every clone() is different.
Re:CD error recovery unrelated to block size by ElecCham · 2007-05-01 15:56 · Score: 5, Interesting

I can speak with some authority on this - I work for one of those aforementioned hard-drive manufacturers, and have been doing a small amount of work on this exact thing.

The easy answer is this: in order to do ECC-like data checking on a larger set of data (say, a group of eight 512-byte sectors), it means that if you want to write sector three of that eight, you end up having to re-read the whole thing before you do anything else - thus basically giving you 4,096-byte "sector" anyway.
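The read-modify-write cycle described above can be sketched as follows. This is a toy model under stated assumptions: the "disk" is a `bytearray`, and `toy_checksum` is a stand-in for real ECC, which works quite differently.

```python
SECTOR = 512
GROUP = 8 * SECTOR  # eight sectors share one check value

def toy_checksum(data: bytes) -> int:
    # Hypothetical stand-in for ECC computed over the whole group
    return sum(data) % 65536

def write_sector(disk: bytearray, checks: list, sector_index: int,
                 payload: bytes) -> int:
    """Write one 512-byte sector; returns how many bytes had to be read."""
    assert len(payload) == SECTOR
    group = sector_index // 8
    start = group * GROUP
    block = bytearray(disk[start:start + GROUP])    # read the whole group
    offset = (sector_index % 8) * SECTOR
    block[offset:offset + SECTOR] = payload         # modify one sector
    disk[start:start + GROUP] = block               # write the group back
    checks[group] = toy_checksum(block)             # recompute the check
    return GROUP

disk = bytearray(2 * GROUP)
checks = [0, 0]
bytes_read = write_sector(disk, checks, 2, b"\x01" * SECTOR)  # sector three
print(bytes_read, checks)
```

Even though only 512 bytes change, the whole 4096-byte group is read and rewritten, which is the sense in which the group behaves like a 4096-byte sector.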

The other half of that answer is this: do you know what the "real" storage capacity of a CD is, without all the error checking? It's a bit less than double. Even most of the enterprise folks wouldn't accept a 40% hit in data density in return for what works out to not that big an increase in reliability (data redundancy doesn't buy you that much unless that data is on different spindles). They'd just rather get the whole data space and do a RAID, especially since that's what they're going to do anyway.

--
Sig broken, watch for .finger

Well you're already wasting your disk space.... by EmbeddedJanitor · 2007-05-01 09:57 · Score: 2, Funny

...if you have Windows loaded.

--
Engineering is the art of compromise.

Bootloader now 4096 bytes? by Kjella · 2007-05-01 10:01 · Score: 3, Interesting

Did the space for the bootloader just increase to 4096 as well? For those unaware, the BIOS loads just the first sector of the disk into memory, the bootloader takes it from there. It would certainly let them get a lot more resilient, now they only barf if things are not as expected.

--
Live today, because you never know what tomorrow brings

Re:Oh noes! by wexsessa · 2007-05-01 10:09 · Score: 2, Interesting

With some probably minor inconvenience, you could fix that by using a Zipped archive. And someone will likely come up with a low-impact solution based on that.

Longer != Better by snoyberg · 2007-05-01 10:16 · Score: 4, Funny

I have to disagree with the whole premise here. I know that people always say that longer is better when it comes to hard drives, but I've never had any reliability problems with my smaller one. Not only that, but I've had very fast transfer rates under all sorts of strenuous loads.

Wait, we're talking about storage devices? Never mind...

--
Thank God for evolution.

Re:Sounds like a good idea to me. by Anonymous+Cowpat · 2007-05-01 10:22 · Score: 2, Informative

My HTPC has hundreds of files that are an average of 1 gigabyte and quite often, twice that size.
So... 2 gigabytes?

--
FGD 135

Re:Oh great by avxo · 2007-05-01 10:40 · Score: 4, Informative

Now when I want to update just 256 bytes, instead of reading 512 bytes, changing 256 of them, and writing 512 back, I now have to do this with 4096 bytes. So I end up transferring 3584 more bytes than I otherwise needed to.

So, your O/S requires that you issue all read and write operations using the hard drive's native block size? That must suck. What else must you do? Setup DMA manually in your app? Solder a microcontroller onto the board perhaps? Sarcasm aside, you seem to have a fundamental misunderstanding of what this change achieves, who it will affect, and how. Other posters have addressed those very issues eloquently, so I won't go into that.

They really could do this transparently. Let the driver write anything in any range.

Sorry to burst your bubble but it already is done transparently. The O/S lets you write anything -- from a single byte, to gigabytes -- transparently; all you do is tell the O/S read n bytes of file F so and so into buffer at x, or write m bytes from buffer at y into file F, which is the interface that 99% of programmers use. And after what you wrote above, I find it hard to believe that you are writing the specialized software, low-level drivers and/or controller microcode that could potentially be affected by this change.

Re:Sounds like a good idea to me. by Dan+Ost · 2007-05-01 10:46 · Score: 4, Insightful

If that kind of lossage bothers you, use a file system that can pack multiple file tails into the same block (reiserfs for sure, ext4 will too, I think). If you've got lots of small files, the impact can be surprising (my portage tree shrunk by about 100MB just by moving it from ext3 to reiserfs!). I've never noticed a difference anywhere else, however.

--

*sigh* back to work...

blocks and clusters by ceroklis · 2007-05-01 11:43 · Score: 4, Informative

To all the posters complaining about the loss of space when they will be forced to use 4096 instead of 512 bytes to store their 20 bytes file:

The cluster size (unit of disk space allocation for files) need not be equal to the physical block size. It can be a multiple or even a fraction of the physical block size. It is fairly probable that you are already using 4K clusters (or bigger), so this will not change anything. This is for example the case if you have an NTFS filesystem bigger than 2GB.
Not all filesystems waste space in this manner. Reiserfs or EXT3 can pack several small files in a "cluster" .

Re:blocks and clusters by ceroklis · 2007-05-01 11:45 · Score: 2, Informative

s/EXT3/EXT4/

Re:Discussed Since 2000? by HtR · 2007-05-01 11:46 · Score: 4, Funny

Creating new standards takes time. After some searching, I found the minutes from their annual meetings since they started in 2000.

2001 Chair: "How about we double it?" Vote: Nay

2002 Chair: "How about we triple it?" Vote: Nay

2003 Chair: "How about 4x?" Vote: Nay

2004 Chair: "How about 5x?" Vote: Nay

(minutes from intervening years were tragically lost)

2007 Chair: "How's about 8x?" Vote: Yay

--
Have you tried turning it off and on again?

Re:Sounds like a good idea to me. by EvanED · 2007-05-01 11:51 · Score: 2, Informative

Ask Wikipedia

It's in the table "Allocation and layout policies". Look at both tail packing and block suballocation.

There are a few others that do, but not many. (JFS, QFS, NWFS, and VMFS are marked yes; NTFS and ZFS are marked partial.)

Thank you, Captain Obvious! by billcopc · 2007-05-01 12:09 · Score: 2, Interesting

It's about effing time!

512 bytes was good for floppy disks. I think we should have started upping the sector size around the same time as we hit the 528MB 1024-cylinder limit back in the early 90's. A modern hard drive has anywhere from one-half to two billion sectors, and that's some serious overhead for no reason. Error-correction is "easier" if it's spread over larger blocks. Why? Because most files are quite large, and corrupting a 512 byte chunk is just as bad as corrupting a 4096 or 8192 byte chunk, because it's hosing the file either way. Might as well pool the ECC together and offer better protection for the large block, while still wasting fewer bits than the sum of all the small sectors' ECC. Even without the proposed ECC algorithm overhaul, larger blocks would allow more usable data per platter.

The downside is that we've had 512 byte sectors for so long, everyone's hardcoded the number in their apps and drivers. The biggest risk involved is to patch all that software... one little glitch could hose a ton of data.

--
-Billco, Fnarg.com

No, the logic is not flawed. by EmbeddedJanitor · 2007-05-01 12:56 · Score: 2, Informative

Consider it this way

Let's say you have 4096 bytes arranged as 8x512-byte blocks and each block can correct one error. Now let's say that we RANDOMLY (ie statistically independently) introduce, say, 4 errors into that set of 8 blocks. Sometimes the errors will fall so that there is at most one error per block. That is correctable. Sometimes the errors will fall so that there is more than one in some block. In that case data will be lost.

However, if we can correct up to, say, 6 arbitrarily placed errors per 4096 bytes we can then have 4 errors anywhere in that block and we won't lose data. It does not matter whether they are spread out or clustered together we can always handle those errors.

This makes for stronger correction.
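The comparison above can be put in numbers by enumerating every placement of the 4 errors. This sketch assumes each error lands in one of the 8 blocks independently and with equal probability, which is a simplification of the scenario in the comment.

```python
from fractions import Fraction
from itertools import product

BLOCKS = 8   # eight 512-byte blocks in 4096 bytes
ERRORS = 4   # errors dropped at random, as in the comment

def survives_per_block(placement, limit=1):
    """True if no 512-byte block receives more than `limit` errors."""
    return all(placement.count(b) <= limit for b in set(placement))

placements = list(product(range(BLOCKS), repeat=ERRORS))
ok = sum(survives_per_block(p) for p in placements)

p_small = Fraction(ok, len(placements))  # one-error-per-512 scheme
p_large = Fraction(1)                    # 4 errors never exceed a budget of 6

print(p_small, float(p_small), p_large)
```

Under that assumption, the per-512 scheme survives in 1680 of the 4096 equally likely placements, about 41% of the time, while the six-per-4096 scheme survives every one of them.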

--
Engineering is the art of compromise.

Slashdot Article in 2010 by /dev/trash · 2007-05-01 13:35 · Score: 2, Funny

Debian Finally Supports Long Block Data

For USB hard drives? by tepples · 2007-05-01 15:54 · Score: 2, Interesting

use a file system that can pack multiple file tails into the same block

Which mainstream operating system can read such a file system? Or should I just abandon tail packing for use on a USB hard drive that will be used with multiple operating systems, at least one of which was made by Microsoft?

Re:Sounds like a good idea to me. by PlusFiveTroll · 2007-05-01 21:14 · Score: 2, Interesting

What's so hard about that?

Go read the Linux Kernel mailing list, and you'll find interactions between the block layer and the virtual memory are one of the most difficult things to make work right in an operating system. The size of the block on the hard disk matters most to the driver; it's mostly transparent to the rest of the operating system. The only thing it changes on actual file systems is that the filesystem block size should be at least 4K.

Re:Oh great by Skapare · 2007-05-02 04:46 · Score: 2, Interesting

If it's already transparent, then what is the big deal? If what you say is true, they could make blocks/sectors as long as they want and we won't need to know (except the driver writers need to know what constraints exist in the interface to send the read and write commands to the drive).

Sorry to bust YOUR bubble, but I do know how the OS works, and how its interface works. The issue depends on what blocksize the commands between driver and hardware require. If you cannot instruct the hardware at a finer resolution than 512 bytes at a time, then writing 256 bytes really does mean that the OS at some layer (maybe in the driver, maybe at a higher layer, depending on the OS) will read 512 bytes, update 256 of them, and write 512 back. If that interface is being changed to require 4096 bytes minimum per I/O operation, then it really does increase the needed transfer work going on between the driver and hardware.

My post was meant in part to be humorous, and in part to raise an issue. The issue is the transparency of the driver to hardware interface. I do know from random encounters that for IDE it really does require the driver to do I/O operations in multiples of 512. That really does affect the 256 byte I/O request from userspace, but it's not really a serious concern due to the caching nature of modern operating systems. And there is no reason in the world why they cannot have been doing 4096 byte or longer blocks for years or decades. If they have been doing it, there's no news (but that's typical Slashdot, so how would we know). Given the I/O command interface is in units of 512 bytes, it's probably convenient for whatever long block size the drive uses to be an exact whole multiple of 512 bytes, though even that is not essential. A couple decades ago I wrote a "driver" for a mainframe OS that handled I/O requests in units of 4096 bytes, with an on-disk blocksize of 18432 bytes. If you do your arithmetic, you can see that is 4.5x. So some I/O request blocks end up spanning physical blocks on disk. No big deal.
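The 4.5x arithmetic above can be worked through directly. This sketch assumes requests are laid out back to back from byte 0, which the comment does not state, and `physical_blocks` is an illustrative helper name.

```python
REQUEST = 4096      # bytes per I/O request
PHYSICAL = 18432    # bytes per on-disk block (4.5 x 4096)

def physical_blocks(request_index: int) -> range:
    """On-disk blocks touched by the given 4096-byte request."""
    start = request_index * REQUEST
    end = start + REQUEST - 1               # last byte of the request
    return range(start // PHYSICAL, end // PHYSICAL + 1)

for n in range(9):                          # 9 requests fill 2 disk blocks
    print(n, list(physical_blocks(n)))
```

Under that layout, one request in every nine (the fifth, index 4) straddles two physical blocks, and the rest fall entirely within one.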

Now I have not been following PC hardware technology very much, so I don't know how much has been added to the I/O interface capability. If they have added a new byte-level I/O command set, then fine. Do you even know if they have?

--
now we need to go OSS in diesel cars
