Linux Breaks 100 Petabyte Ceiling
*no comment* writes: "Linux has broken the barrier with the 100 petabyte ceiling, and
doing it at 144 petabytes." And this is even more impressive in pebibytes, too.
1e3 terabytes, or 1e6 gigabytes.
N4st0r, trixx0r h0bb1tz0rz! Th3y st0l3 0ur pr3c10uzz!
http://oss.sgi.com/projects/xfs/features.html
XFS is a full 64-bit filesystem, and thus, as a filesystem, is capable of handling files as
large as a million terabytes.
2^63 = 9 x 10^18 = 9 exabytes
In future, as the filesystem size limitations of Linux are eliminated XFS will scale to the
largest filesystems
Just a side note: BeOS has support for files up to 18 exabytes, not 18 petabytes, as stated in the article. This is roughly 18,000 petabytes, or 2^64 bytes.
Just wanted to set the record straight.
- Mike
FreeBSD had it first. For over a month. Read the committer CVS Logs and weep, penguin boys.
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/ata/ata-disk.c -> version 1.114
Well, it's good to see that Linux has caught up, but the article is not correct that Linux is the first OS to support 48-bit ATA; FreeBSD has had this support for over a month now.
See for example: this file which is one of the files containing the ATA-6r2 code, committed to FreeBSD on October 6.
http://www.cacr.caltech.edu/~roy/dataquan/
George W. Bush
President, United States of America
I'm looking at the Linux XFS feature page, which states:
My understanding is that the 2TB limit per block device (including logical devices) is firm (regardless of the word size of your architecture), and unrelated to what Mr. Hedrick did. Am I wrong? Does this limit disappear if you build the kernel on a 64-bit architecture? And, on 32-bit architectures, there's no way to get the buffer cache to address more than 16TB.
Actually you would go for FC (Fiber Channel) not SCSI. Go to http://www.fibrechannel.org for more information.
--
The world is divided in two categories:
those with a loaded gun and those who dig. You dig.
unless "recently" is 3 years ago... I can name at least one desktop OS which did that before.
BeOS.
The bank I work for currently stores 1.5 Tb a day worth of data. Almost none of it is ever looked at again, but a huge proportion of it is required by regulators. Of course this all goes on tape, since there is no requirement for speedy access.
2^48 blocks * 512 bytes/block = 144115188075855872 bytes
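For anyone who wants to check that figure, here is a minimal sketch of the arithmetic. The function name is made up for illustration, and the 28-bit line is added only for comparison with the older ATA addressing limit; the 512-byte sector size is the one used in the line above.

```python
SECTOR_BYTES = 512  # sector size assumed in the calculation above


def lba_capacity_bytes(address_bits, sector_bytes=SECTOR_BYTES):
    """Maximum number of bytes addressable with the given LBA width."""
    return (2 ** address_bits) * sector_bytes


# 48-bit addressing, as discussed in this story
print(lba_capacity_bytes(48))

# 28-bit addressing, shown only for comparison (not part of the comment above)
print(lba_capacity_bytes(28))
```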
ayershome.org/users/eric
Before you start thumping your chest about how superior or cutting edge *Linux is, go look at these two links
A slashdot story pointing out how without the FreeBSD ATA code, the Linux kernel would be 'lacking'
The FreeBSD press release announcing the code is stable
If The Reg had actually researched the story, Andy would have noticed it is not a 'first' but more a 'dead heat' between the 2 leading software libre OSes. Instead, The Reg does more hyping of *Linux.
If it was said on slashdot, it MUST be true!
Well, according to the IEC standard, one petabyte is 10^15 (or 1e+15) bytes, while one pebibyte is 2^50 (or 1.125899e+15) bytes.
So 144 petabytes is 1.44e+17 bytes or 127.89769 pebibytes. Can't say that's more impressive tho. :P
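A quick sketch of that conversion, in case anyone wants to play with the numbers. The helper names are hypothetical; the constants are the IEC definitions quoted above. Note that the exact 48-bit limit (2^48 blocks of 512 bytes) comes out to a round number of pebibytes, while the rounded "144 petabytes" does not.

```python
PETABYTE = 10 ** 15  # decimal prefix
PEBIBYTE = 2 ** 50   # binary prefix


def to_petabytes(n_bytes):
    return n_bytes / PETABYTE


def to_pebibytes(n_bytes):
    return n_bytes / PEBIBYTE


exact_limit = 2 ** 48 * 512  # exact 48-bit LBA limit with 512-byte sectors

print(to_petabytes(exact_limit))      # a little over 144
print(to_pebibytes(exact_limit))      # a power of two, so it divides evenly
print(to_pebibytes(144 * PETABYTE))   # the rounded figure, in pebibytes
```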
SSH has done quite a bit of work to support +2GB files. As always, the following works and always has:
:-)
cat file | ssh user@host "cat > file"
More recent builds of SCP will also support +2GB, so:
scp file user@host:/path
or
scp file user@host:/path/file
will both work.
In fact, probably the best way to sync two directories is rsync. Rsync's major weakness is that it's *tremendously* slow for large numbers of files, and I believe it has to read every byte of a large file before it can incrementally transfer it (so you're looking at 2GB+ of reading before transferring). The following will do rsync over ssh:
rsync -e ssh file user@host:/path/file
rsync -e ssh -r path user@host:/path
For incremental log transfers, I actually had a system built that would ssh into the remote side, determine the filesize of the remote file, and then tail from the total file size minus the size of the remote file. It was a bit messy, but it was incredibly reliable. Did have problems when the remote logs got cycled, but it wasn't too ugly to detect that remote filesize was smaller than localfilesize. Just a shell script, after all.
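The decision logic in that script can be sketched roughly like this. This is not the original shell script, just an illustration of the idea; the function names are hypothetical, and it assumes you have already obtained both file sizes (for example over ssh). "source" is the log being written and "copy" is the partial copy that needs to catch up.

```python
def plan_log_sync(source_size, copy_size):
    """Decide how to bring the copy up to date with the source log.

    Returns (mode, offset):
      ("append", offset) -> send the source bytes from offset onward
      ("full", 0)        -> the source is smaller than the copy, so it was
                            probably rotated; start over from byte 0
    """
    if source_size < copy_size:
        return ("full", 0)
    return ("append", copy_size)


def bytes_to_send(source_size, copy_size):
    """Number of trailing bytes of the source that still need transferring."""
    _mode, offset = plan_log_sync(source_size, copy_size)
    return source_size - offset


print(plan_log_sync(1000, 400), bytes_to_send(1000, 400))
print(plan_log_sync(100, 400), bytes_to_send(100, 400))
```

The byte count from bytes_to_send is what you would hand to something like tail -c on the source side, with the output appended to the copy.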
SFTP should, as far as I know, handle 2GB+ without a hitch.
Both SCP and SSH of course have compression support via the -C flag; alternatively you can pipe SSH through gzip.
Email me for further info; there are some SSH docs on my home page as well. Good luck
--Dan
www.doxpara.com
Suppose you copy at full PCI bus speed: 133 Megabytes per second. Said backup would take about 34 years.
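The 34-year figure is easy to reproduce. A rough sketch, assuming decimal units (144 PB = 144e15 bytes, 133 MB/s = 133e6 bytes/s) and a 365.25-day year; the function name is made up for illustration.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed year length


def transfer_years(total_bytes, bytes_per_second):
    """Years needed to move total_bytes at a sustained rate."""
    return total_bytes / bytes_per_second / SECONDS_PER_YEAR


years = transfer_years(144e15, 133e6)
print(round(years, 1))
```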
BTW, it may also re-open the debate:
Trolling using another account since 2005.
I'm already fed up of the time it takes to back up large disks to tape. Drive transfer rate has not improved at the rate of disk capacity in the last few years and is becoming a bottleneck. It was unimportant when the backup time of a single disk was well below one hour (our Ultrium tapes give about 40Gb/hour).
Just figure that if you want to transfer 144PB in about one day, you need a transfer rate of the order of 1TB/s. Electronics is far from there since it means about 10 terabits/second. Even fiber is not yet there. Barring a major revolution, magnetic media and heads can't be pushed that far. At least it is way further than the foreseeable future.
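Turning the calculation around gives the rate needed for a one-day transfer. A minimal sketch, again assuming decimal units (144 PB = 144e15 bytes); the function name is hypothetical. It lands a bit above 1 TB/s and 10 Tbit/s, so the same order of magnitude as quoted above.

```python
SECONDS_PER_DAY = 24 * 3600


def required_rate(total_bytes, seconds):
    """Sustained bytes per second needed to move total_bytes in the given time."""
    return total_bytes / seconds


rate = required_rate(144e15, SECONDS_PER_DAY)
print(rate / 1e12, "TB/s")
print(rate * 8 / 1e12, "Tbit/s")
```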
Don't get me wrong, it is much better to have more address bits than needed to avoid the painful limitations of 528 Mb, 1024 cylinders etc... But, as somebody who used disks over 1Gb on mainframes around 1984-1985, I easily saw all the limitations of the early IDE interfaces (with the hell of CHS addresses and its ridiculously low bit numbers once you mixed the BIOS and interface limitations) and insisted on SCSI on my first computer (now CHS is history thanks to LBA, but the transition has been sometimes painful).
However, right now big data centers don't always use the biggest drives because they can get more bandwidth by spreading the load across more drives (they are also slightly wary of the greatest and latest because reliability is very important). Backing up starts to take too much time.
In short, the 48 bit block number is not a limit for the next 20 years or so. I may be wrong, but I'd bet it'll take at least 15 years, perhaps much more because it is too dependent on radically new technologies and the fact that the demand for bandwidth to match the increase in capacity will become more prevalent. Increasing the bandwidth is much harder since you'll likely run into noise problems, which are fundamental physical limitations.
This obviously mattered to the people who implemented it. If you'd rather see development move in a different direction, by all means, write some code that you feel is useful.
See, the people who implemented this probably don't give a damn what you feel is important, they care about what they feel is important.
It's really very simple, put up or shut up.
This limit is for a SINGLE IDE disk. Now, if you use Logical Volume Management (which is in the standard 2.4 kernel, no patches required) you can combine multiple disks into one.
Since my machine has 2 IDE controllers, with 2 buses each, and 2 drives per bus, you could make a system with eight 144 PB drives, put an XFS partition on it, and have 1152.92 PB of storage.
And for meaningless statistics' sake: I make my MP3s (from CDs that I own, thankyouverymuch) at an average of 160 kb/sec. At that rate, the specified drive array would store 1826693 YEARS of MP3s. None of which would be Britney Spears.
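For the curious, those numbers can be reproduced with a short sketch. The assumptions here are mine, not stated above: 160 kb/sec is taken as 160,000 bits per second, a year is 365.25 days, and each drive holds the full 2^48 x 512 bytes. The function name is hypothetical.

```python
DRIVE_BYTES = 2 ** 48 * 512            # one drive at the 48-bit LBA limit
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed year length


def mp3_years(total_bytes, kilobits_per_second):
    """Years of audio that fit in total_bytes at the given bitrate."""
    bytes_per_second = kilobits_per_second * 1000 / 8  # assumes 1 kb = 1000 bits
    return total_bytes / bytes_per_second / SECONDS_PER_YEAR


array_bytes = 8 * DRIVE_BYTES
print(round(array_bytes / 1e15, 2), "PB")
print(int(mp3_years(array_bytes, 160)), "years")
```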
www.eFax.com are spammers
What I am really wondering is: is there at the current moment ANY company/application/whatever that requires this amount of storage? I thought that even a large bank could manage with a few TBs
:-)
Not intended as a flame, just interested
I work at a large credit card bank (we're the largest issuer of VISA cards, and our analytic data store is in the top 500 supercomputer sites). Our main Oracle data warehouse has about 38 TB of tablespace in use. It'll be awhile before we need drives with PB capacity.
NATIONAL VIRTUAL OBSERVATORY TO PUT UNIVERSE ONLINE
The National Science Foundation has earmarked $10 million for the
development of a National Virtual Observatory (NVO), a single,
searchable database of astronomical knowledge culled from
observatories. The current total volume of astronomical information
comprises roughly 100 terabytes, and scientists predict this number
will swell to over 10 petabytes by 2008. Caltech computer scientist
Paul Messina said that a single repository for this vast amount of
data is essential, otherwise, "we will end up like shipwrecked
sailors on a desert island, surrounded by an ocean of salt water
and unable to slake our thirst." The goal of the project is to be
able to conduct intricate computations by using the NVO to leverage
the computing power of 17 research databases.
(Newsbytes, 30 October 2001)
The Register updated their article. It now acknowledges FreeBSD as being the first Unix to support multi-petabyte filesizes.
However, NTFS 5.0 (the filesystem that is used by Windows 2000) has had 64-bit addressing since Windows 2000 was released. This yields a maximum capacity of 16 exabytes (2^64 bytes), which is 16384 petabytes. That's right, for the past few years Windows has supported files 128 times larger than Linux does with an experimental patch. Still, by the time people actually start needing this kind of storage, I don't think it'll actually matter much...