Slashdot Mirror


Linux Breaks 100 Petabyte Ceiling

*no comment* writes: "Linux has broken through the 100 petabyte ceiling, and done so at 144 petabytes." And this is even more impressive in pebibytes, too.

23 of 330 comments

  1. Re:Forgot my Greek by BitwizeGHC · · Score: 3, Informative

    1e3 terabytes, or 1e6 gigabytes.

    --
    N4st0r, trixx0r h0bb1tz0rz! Th3y st0l3 0ur pr3c10uzz!
  2. XFS by starrcake · · Score: 5, Informative

    http://oss.sgi.com/projects/xfs/features.html

    XFS is a full 64-bit filesystem, and thus, as a filesystem, is capable of handling files as
    large as a million terabytes.

    2^63 = 9 x 10^18 = 9 exabytes

    In future, as the filesystem size limitations of Linux are eliminated XFS will scale to the
    largest filesystems

    1. Re:XFS by Anonymous Coward · · Score: 1, Informative

      You make it sound like XFS has been doing this for a while. But no:

      In future, as the filesystem size limitations of Linux are eliminated XFS will scale to the largest filesystems

      Before this, you couldn't access drives bigger than 128GB, and a 64-bit filesystem wouldn't have helped. You make it sound like this update was for a specific filesystem, but that's not true; this update was at the device level.

  3. Article got it wrong on BeOS - 18 EXAbytes! by Snard · · Score: 5, Informative

    Just a side note: BeOS has support for files up to 18 exabytes, not 18 petabytes, as stated in the article. This is roughly 18,000 petabytes, or 2^64 bytes.

    Just wanted to set the record straight.

    --
    - Mike
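
As a rough check on the 64-bit figures in this comment and in the XFS comment above, here is a minimal sketch that converts 2^63 and 2^64 bytes to decimal units. The constant names are illustrative only, and decimal (SI) prefixes are assumed.

```python
# Decimal (SI) units, as the comments above appear to use them.
PETABYTE = 10**15
EXABYTE = 10**18

signed_limit = 2**63    # largest offset with a signed 64-bit byte count
unsigned_limit = 2**64  # largest size with an unsigned 64-bit byte count

print(signed_limit / EXABYTE)     # roughly 9.2 exabytes
print(unsigned_limit / EXABYTE)   # roughly 18.4 exabytes
print(unsigned_limit / PETABYTE)  # roughly 18,447 petabytes
```

So "18 exabytes" and "roughly 18,000 petabytes" are both consistent with 2^64 bytes.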
  4. FreeBSD had it first. by Anonymous Coward · · Score: 1, Informative

    FreeBSD had it first. For over a month. Read the committer CVS Logs and weep, penguin boys.

    http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/ata/ata-disk.c -> version 1.114

  5. Re:Ok... by kkenn · · Score: 3, Informative

    Well, it's good to see that Linux has caught up, but the article is not correct that Linux is the first OS to support 48-bit ATA; FreeBSD has had this support for over a month now.

    See for example: this file which is one of the files containing the ATA-6r2 code, committed to FreeBSD on October 6.

  6. Just to put this into perspective... by George+Walker+Bush · · Score: 2, Informative
    Just how much data IS 144 petabytes? It's hard to visualize it off the top of one's head, but this link may help to give you perspective at the sheer enormity of the amount:

    http://www.cacr.caltech.edu/~roy/dataquan/

    --
    George W. Bush
    President, United States of America
  7. Uh, no? by srichman · · Score: 3, Informative
    Correct me if I'm wrong, but isn't this very very misleading? The article states that the Linux IDE subsystem can now support single ATA drives up to 144 petabytes (i.e., Linux ATA now has 48 bit LBA support), but my understanding is that many other aspects of the Linux kernel limit the maximum file size to much less.

    I'm looking at the Linux XFS feature page, which states:

    Maximum File Size
    For Linux 2.4, the maximum accessible file offset is 16TB on 4K page size and 64TB on 16K page size. As Linux moves to 64 bit on block devices layer, file size limit will increase to 9 million terabytes (or the system drive limits).

    Maximum Filesystem Size
    For Linux 2.4, 2 TB. As Linux moves to 64 bit on block devices layer, filesystem limits will increase.

    My understanding is that the 2TB limit per block device (including logical devices) is firm (regardless of the word size of your architecture), and unrelated to what Mr. Hedrick did. Am I wrong? Does this limit disappear if you build the kernel on a 64-bit architecture?

    And, on 32-bit architectures, there's no way to get the buffer cache to address more than 16TB.
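
The 2 TB, 16 TB and 64 TB figures quoted from the XFS page are what you would get if sectors and pages were each counted with a 32-bit index. That is an assumption made here for illustration, not something the quoted page states. A minimal sketch under that assumption, with binary units:

```python
TIB = 2**40        # one tebibyte
INDEX_BITS = 32    # assumed width of the sector and page indexes


def limit_bytes(index_bits: int, unit_bytes: int) -> int:
    """Bytes addressable when an index of index_bits bits counts units of unit_bytes."""
    return (2**index_bits) * unit_bytes


block_device_limit = limit_bytes(INDEX_BITS, 512)        # 512-byte sectors
offset_limit_4k = limit_bytes(INDEX_BITS, 4 * 1024)      # 4K pages
offset_limit_16k = limit_bytes(INDEX_BITS, 16 * 1024)    # 16K pages

print(block_device_limit // TIB)  # 2
print(offset_limit_4k // TIB)     # 16
print(offset_limit_16k // TIB)    # 64
```

Under this reading, the limits come from the width of the index, not from the filesystem format or the ATA addressing change.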

  8. Re:OK this is great... by Nadir · · Score: 2, Informative

    Actually you would go for FC (Fibre Channel), not SCSI. Go to http://www.fibrechannel.org for more information.

    --
    --
    The world is divided in two categories:
    those with a loaded gun and those who dig. You dig.
  9. Re:Nice! by Anonymous Coward · · Score: 1, Informative

    unless "recently" is 3 years ago... I can name at least one desktop OS which did that before.

    BeOS.

  10. Re:OK this is great... by rhodespa · · Score: 2, Informative

    The bank I work for currently stores 1.5 Tb a day worth of data. Almost none of it is ever looked at again, but a huge proportion of it is required by regulators. Of course this all goes on tape, since there is no requirement for speedy access.

  11. Re:512? That can't be right. by ericzundel · · Score: 2, Informative
    The 2^48 figure is the number of blocks that can be accessed on the IDE disk from what I can gather.

    2^48 blocks * 512 bytes/block = 144115188075855872 bytes
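
The multiplication above can be checked directly. This sketch also works out the older 28-bit LBA limit, which is presumably where the "128GB" figure mentioned earlier in the thread comes from. The variable names are illustrative.

```python
SECTOR_BYTES = 512

lba48_bytes = 2**48 * SECTOR_BYTES   # 48-bit block addresses
lba28_bytes = 2**28 * SECTOR_BYTES   # older 28-bit block addresses

print(lba48_bytes)            # 144115188075855872
print(lba48_bytes / 10**15)   # roughly 144.1 decimal petabytes
print(lba48_bytes // 2**50)   # 128 pebibytes, exactly
print(lba28_bytes // 2**30)   # 128 gibibytes, exactly
```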

  12. 1st desktop OS? Well, not quite. by mr · · Score: 5, Informative

    Before you start thumping your chest about how superior or cutting edge *Linux is, go look at these two links
    A slashdot story pointing out how without the FreeBSD ATA code, the Linux kernel would be 'lacking'
    The FreeBSD press release announcing the code is stable

    If The Reg had actually researched the story, Andy would have noticed it is not a 'first' but more a 'dead heat' between the 2 leading software libre OSes. Instead, The Reg does more hyping of *Linux.

    --
    If it was said on slashdot, it MUST be true!
  13. Pebibytes? by Rabenwolf · · Score: 4, Informative
    And this is even more impressive in pebibytes, too.

    Well, according to the IEC standard, one petabyte is 10^15 (or 1e+15) bytes, while one pebibyte is 2^50 (or 1.125899e+15) bytes.

    So 144 petabytes is 1.44e+17 bytes or 127.89769 pebibytes. Can't say that's more impressive tho. :P
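
A short sketch of the conversion above, taking "144 petabytes" as exactly 144 x 10^15 bytes as the comment does:

```python
PETABYTE = 10**15   # decimal prefix
PEBIBYTE = 2**50    # binary prefix (IEC)

size_bytes = 144 * PETABYTE
size_pebibytes = size_bytes / PEBIBYTE

print(size_bytes)       # 144000000000000000
print(size_pebibytes)   # roughly 127.8977
```

If you start from the exact 2^48 x 512 bytes instead, the result is exactly 128 pebibytes.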

  14. Re:working with large files by Effugas · · Score: 4, Informative

    SSH has done quite a bit of work to support 2GB+ files. As always, the following works, and always has:

    cat file | ssh user@host "cat > file"

    More recent builds of SCP will also support +2GB, so:

    scp file user@host:/path
    or
    scp file user@host:/path/file

    will both work.

    In fact, probably the best way to sync two directories is rsync. Rsync's major weakness is that it's *tremendously* slow for large numbers of files, and I believe it has to read every byte of a large file before it can incrementally transfer it (so you're looking at 2GB+ of reading before transferring). The following will do rsync over ssh:

    rsync -e ssh file user@host:/path/file
    rsync -e ssh -r path user@host:/path

    For incremental log transfers, I actually had a system built that would ssh into the remote side, determine the filesize of the remote file, and then tail from the total file size minus the size of the remote file. It was a bit messy, but it was incredibly reliable. Did have problems when the remote logs got cycled, but it wasn't too ugly to detect that remote filesize was smaller than localfilesize. Just a shell script, after all.

    SFTP should, as far as I know, handle 2GB+ without a hitch.

    Both SCP and SSH of course have compression support via the -C flag; alternatively you can pipe SSH through gzip.

    Email me for further info; there are some SSH docs on my home page as well. Good luck :-)

    --Dan
    www.doxpara.com
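
The incremental log transfer described in this comment comes down to comparing two file sizes. The sketch below shows only that decision step. The function name and return values are hypothetical, and it assumes that a rotated log shows up as a source file smaller than the existing copy. It is not the original shell script.

```python
from typing import Tuple


def plan_incremental_copy(source_size: int, copy_size: int) -> Tuple[str, int]:
    """Decide how to bring a copy of a growing log file up to date.

    Returns (action, byte_count):
      ("none", 0)    the copy is already complete
      ("append", n)  send the last n bytes of the source (e.g. with tail -c n)
      ("full", n)    the source looks rotated, so resend all n bytes
    """
    if source_size < copy_size:
        # The source shrank: assume the log was cycled and start over.
        return ("full", source_size)
    if source_size == copy_size:
        return ("none", 0)
    return ("append", source_size - copy_size)


print(plan_incremental_copy(5000, 3200))  # ('append', 1800)
print(plan_incremental_copy(400, 3200))   # ('full', 400)
```

The sizes themselves would come from running something like a file-size query over ssh, which is left out here.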

  15. Re:Random statistics.... by Anonymous Coward · · Score: 4, Informative

    Suppose you copy at full PCI bus speed: 133 Megabytes per second. Said backup would take about 34 years.
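
The 34-year estimate can be reproduced as follows, assuming decimal units (144 x 10^15 bytes, 133 x 10^6 bytes per second) and a 365.25-day year:

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

capacity_bytes = 144 * 10**15        # 144 decimal petabytes
pci_bytes_per_second = 133 * 10**6   # 133 megabytes per second

backup_years = capacity_bytes / pci_bytes_per_second / SECONDS_PER_YEAR
print(backup_years)   # a little over 34
```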

  16. Example... by mirko · · Score: 3, Informative
    Here is a recent article which may answer your question:


    BTW, it may also re-open the debate:
    --
    Trolling using another account since 2005.
  17. Reality check... by Anonymous Coward · · Score: 5, Informative
    Does anybody realize that, even with a data rate on the order of 1GB/s, much higher than what current platters can do, it takes about 5 years to fill such a disk?

    I'm already fed up with the time it takes to back up large disks to tape. Drive transfer rates have not improved at the rate of disk capacity in the last few years and are becoming a bottleneck. It was unimportant when the backup time of a single disk was well below one hour (our Ultrium tapes give about 40Gb/hour).

    Just figure that if you want to transfer 144PB in about one day, you need a transfer rate of the order of 1TB/s. Electronics is far from there since it means about 10 terabits/second. Even fiber is not yet there. Barring a major revolution, magnetic media and heads can't be pushed that far. At least it is way further than the foreseeable future.

    Don't get me wrong, it is much better to have more address bits than needed to avoid the painful limitations of 528 Mb, 1024 cylinders, etc. But, as somebody who used disks over 1Gb on mainframes around 1984-1985, I easily saw all the limitations of the early IDE interfaces (with the hell of CHS addresses and its ridiculously low bit numbers once you mixed the BIOS and interface limitations) and insisted on SCSI on my first computer (now CHS is history thanks to LBA, but the transition has sometimes been painful).

    However, right now big data centers don't always use the biggest drives because they can get more bandwidth by spreading the load across more drives (they are also slightly wary of the latest and greatest because reliability is very important). Backing up starts to take too much time.

    In short, the 48 bit block number is not a limit for the next 20 years or so. I may be wrong, but I'd bet it'll take at least 15 years, perhaps much more because it is too dependent on radically new technologies and the fact that the demand for bandwidth to match the increase in capacity will become more prevalent. Increasing the bandwidth is much harder since you'll likely run into noise problems, which are fundamental physical limitations.
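
The fill-time and transfer-rate figures in this comment can be sanity-checked with a couple of helper functions. The helper names are illustrative, and decimal units and a 365.25-day year are assumed.

```python
SECONDS_PER_DAY = 24 * 3600
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY

CAPACITY_BYTES = 144 * 10**15   # 144 decimal petabytes


def years_to_fill(capacity_bytes: float, bytes_per_second: float) -> float:
    """Years needed to write capacity_bytes at a sustained rate."""
    return capacity_bytes / bytes_per_second / SECONDS_PER_YEAR


def rate_to_fill_in(capacity_bytes: float, seconds: float) -> float:
    """Sustained bytes per second needed to write capacity_bytes in the given time."""
    return capacity_bytes / seconds


fill_years = years_to_fill(CAPACITY_BYTES, 10**9)                 # at 1 GB/s
one_day_rate = rate_to_fill_in(CAPACITY_BYTES, SECONDS_PER_DAY)   # bytes per second

print(fill_years)                 # roughly 4.6 years
print(one_day_rate / 10**12)      # roughly 1.7 TB/s
print(one_day_rate * 8 / 10**12)  # roughly 13 terabits/s
```

These agree with the "about 5 years", "of the order of 1TB/s" and "about 10 terabits/second" estimates to within rounding.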

  18. Re:Big deal by HalJohnson · · Score: 4, Informative
    Typically I wouldn't even waste time answering such an obvious troll, but maybe you haven't realized what open source is all about, so let me make it succinct.

    This obviously mattered to the people who implemented it. If you'd rather see development move in a different direction, by all means, write some code that you feel is useful.

    See, the people who implemented this probably don't give a damn what you feel is important, they care about what they feel is important.

    It's really very simple, put up or shut up.

  19. Limit is for a single IDE disk by wowbagger · · Score: 3, Informative

    This limit is for a SINGLE IDE disk. Now, if you use Logical Volume Management (which is in the standard 2.4 kernel, no patches required) you can combine multiple disks into one.

    Since my machine has 2 IDE controllers, with 2 buses each, and 2 drives per bus, you could make a system with eight 144 PB drives, put an XFS partition on it, and have 1152.92 PB of storage.

    And for meaningless statistics' sake: I make my MP3s (from CDs that I own, thankyouverymuch) at an average of 160 kb/sec. At that rate, the specified drive array would store 1826693 YEARS of MP3s. None of which would be Britney Spears.
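
Both numbers in this comment can be reproduced if each drive holds exactly 2^48 x 512 bytes, 160 kb/sec means 160,000 bits per second, and a year is 365.25 days. Those three choices are assumptions made here to match the quoted figures.

```python
DRIVE_BYTES = 2**48 * 512            # one fully populated 48-bit LBA drive
DRIVES = 2 * 2 * 2                   # controllers x buses x drives per bus
MP3_BITS_PER_SECOND = 160_000        # assumed meaning of "160 kb/sec"
SECONDS_PER_YEAR = 365.25 * 24 * 3600

total_bytes = DRIVES * DRIVE_BYTES   # equals 2**60
mp3_years = total_bytes * 8 / MP3_BITS_PER_SECOND / SECONDS_PER_YEAR

print(round(total_bytes / 10**15, 2))  # 1152.92
print(int(mp3_years))                  # 1826693
```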

  20. Re:OK this is great... by Tall_Rob · · Score: 1, Informative

    What I am really wondering is: is there at the current moment ANY company/application/whatever that required this amount of storage? I thought that even a large bank could manage with a few TB's
    Not intended as a flame, just interested


    I work at a large credit card bank (we're the largest issuer of VISA cards, and our analytic data store is in the top 500 supercomputer sites). Our main Oracle data warehouse has about 38 TB of tablespace in use. It'll be a while before we need drives with PB capacity. :-)

  21. Re:OK this is great... by warnerve · · Score: 1, Informative
    Here is an example of the need for a few petabytes of storage:


    NATIONAL VIRTUAL OBSERVATORY TO PUT UNIVERSE ONLINE
    The National Science Foundation has earmarked $10 million for the
    development of a National Virtual Observatory (NVO), a single,
    searchable database of astronomical knowledge culled from
    observatories. The current total volume of astronomical information
    comprises roughly 100 terabytes, and scientists predict this number
    will swell to over 10 petabytes by 2008. Caltech computer scientist
    Paul Messina said that a single repository for this vast amount of
    data is essential, otherwise, "we will end up like shipwrecked
    sailors on a desert island, surrounded by an ocean of salt water
    and unable to slake our thirst." The goal of the project is to be
    able to conduct intricate computations by using the NVO to leverage
    the computing power of 17 research databases.
    (Newsbytes, 30 October 2001)

  22. Article Updated by Jobby · · Score: 2, Informative

    The Register updated their article. It now acknowledges FreeBSD as being the first Unix to support multi-petabyte filesizes.

    However, NTFS 5.0 (the filesystem that is used by Windows 2000) has had 64-bit addressing since Windows 2000 was released. This yields a maximum capacity of 16 exabytes, which is 16384 petabytes. That's right, Windows has supported files more than a hundred times larger than Linux with an experimental patch for the past few years. Still, by the time people actually start needing this kind of storage, I don't think it'll actually matter much...