Linux Breaks 100 Petabyte Ceiling
*no comment* writes: "Linux has broken the barrier with the 100 petabyte ceiling, and
doing it at 144 petabytes." And this is even more impressive in pebibytes, too.
← Back to Stories (view on slashdot.org)
http://oss.sgi.com/projects/xfs/features.html
XFS is a full 64-bit filesystem, and thus, as a filesystem, is capable of handling files as
large as a million terabytes.
263 = 9 x 1018 = 9 exabytes
In future, as the filesystem size limitations of Linux are eliminated XFS will scale to the
largest filesystems
Just a side note: BeOS has support for files up to 18 exabytes, not 18 petabytes, as stated in the article. This is roughly 18,000 petabytes, or 2^64 bytes.
Just wanted to set the record straight.
- Mike
Before you start thumping your chest about how superior or cutting edge *Linux is, go look at these two links
A slashdot story pointing out how without the FreeBSD ATA code, the Linux kernel would be 'lacking'
The FreeBSD press release announcing the code is stable
If The Reg actually researched the story, Andy would have notice it is not a 'first' but more a 'dead heat' between the 2 leading software libre OSes. Instead, The Reg does more hyping of *Linux.
If it was said on slashdot, it MUST be true!
Well, according to the IEC standard, one petabyte is 10^15 (or 1e+15) bytes, while one pebibyte is 2^50 (or 1.125899e+15) bytes.
So 144 petabytes is 1.44e+17 bytes or 127.89769 pebibytes. Can't say that's more impressive tho. :P
SSH has done quite a bit of work to support +2GB files. As always, the following will and always has worked:
:-)
cat file | ssh user@host "cat > file"
More recent builds of SCP will also support +2GB, so:
scp file user@host:/path
or
scp file user@host:/path/file
will both work.
In fact, probably the best way for syncing two directories is rsync. Rsync's major weakness is that it's *tremendously* slow for large numbers of files, and I believe it has to read every byte of a large file before it can incrementally transfer it(so you're looking at 2GB+ of reading before transfering). The following will do rsync over ssh:
rsync -e ssh file user@host:/path/file
rsync -e ssh -r path user@host:/path
For incremental log transfers, I actually had a system built that would ssh into the remote side, determine the filesize of the remote file, and then tail from the total file size minus the size of the remote file. It was a bit messy, but it was incredibly reliable. Did have problems when the remote logs got cycled, but it wasn't too ugly to detect that remote filesize was smaller than localfilesize. Just a shell script, after all.
SFTP should, as far as I know, handle 2GB+ without a hitch.
Both SCP and SSH of course have compression support in the -C tag; alternatively you can pipe SSH through gzip.
Email me for further info; there's some SSH docs onto my home page as well. Good luck
--Dan
www.doxpara.com
Suppose you copy at full PCI bus speed: 133 Megabytes per second. Said backup would take about 34 years.
I'm already fed up of the time it takes to back up large disks to tape. Drive transfer rate has not improved at the rate of disk capacity in the last few years and is becoming a bottleneck. It was unimportant when the backup time of a single disk was well below one hour (our Ultrium tapes give about 40Gb/hour).
Just figure that if you want to transfer 144PB in about one day, you need a transfer rate of the order of 1TB/s. Electronics is far from there since it means about 10 terabits/second. Even fiber is not yet there. Barring a major revolution, magnetic media and heads can't be pushed that far. At least it is way further than the foreseeable future.
Don't get me wrong, it is much better to have more address bits than needed to avoid the painful limitations of 528 Mb, 1024 cylinders etc... But, as somebody who used disks over 1Gb on mainfranmes around 1984-1985, I easily saw all the limitations of the early IDE interfaces (with the hell of CHS addresses and its ridiculously low bit numbers once you mixed the BIOS and interface limitations) and insisted on SCSI on my first computer (now CHS is history thanks to LBA, but the transition has been sometimes painful).
However, right now big data centers don't always use the biggest drives because they can get more bandwidth by spread the load on more drives (they are also slightly wary of the greatest and latest because reliability is very important). Backing up starts to take too much time,
In short, the 48 bit block number is not a limit for the next 20 years or so. I may be wrong, but I'd bet it'll take at least 15 years, perhaps much more because it is too dependent on radically new technologies and the fact that the demand for bandwidth to match the increase in capacity will become more prevalent. Increasing the bandwidth is much harder since you'll likely run into noise problems, which are fundamental physical limitations.
This obviously mattered to the people who implemented it. If you'd rather see development move in a different direction, by all means, write some code that you feel is useful.
See, the people who implemented this probably don't give a damn what you feel is important, they care about what they feel is important.
It's really very simple, put up or shut up.