Slashdot Mirror


IBM's High Performance File System

HoosierPeschke writes "BetaNews is running a story about IBM's new file system, General Parallel File System (GPFS). The short and skinny is that the new file system attained a 102 Gigabyte per second transfer rate. The size of the file system is also astonishing at 1.6 petabytes (petabyte == 1,024 terabytes). IBM has up a page with more information and specs on the system."

23 of 208 comments

  1. Nothing new here. Move along. by kperrier · · Score: 3, Informative

    There is nothing new about GPFS. It's been around for years.

    1. Re:Nothing new here. Move along. by Mes · · Score: 2, Informative

      I was working on this 5 years ago, and I'm sure it's been around much longer than that.

    2. Re:Nothing new here. Move along. by MoxCamel · · Score: 4, Informative
      Agreed. We've been using GPFS for 2 1/2 years. The long and short of it is that it's much more stable on AIX than it is on Linux. It's getting better on Linux, but it's still got a ways to go.

      Mox

    3. Re:Nothing new here. Move along. by Kadin2048 · · Score: 5, Informative

      I think the "news" is the transfer rate, not the file system.

      According to this article, the idea was just to see how fast a sustained transfer rate they could achieve. That rate was 102 GiB/s, which apparently is a record. The purpose of the project apparently has something to do with reducing the bottlenecking in parallel-computing interconnects. The machine they used, ASC Purple (a weapons-research system at Lawrence Livermore Labs) has about 10,000+ processors, so that's their obvious application.

      The filesystem itself doesn't seem to be anything new -- I have no idea why the poster fixated on that, since it's kind of a minor footnote in most of the articles I've read about this today.

      --
      "Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
  2. since the /. blurb doesn't explain it... by frankie · · Score: 5, Informative
    ...let's see if I can, never having heard of GPFS before 10 minutes ago:
    • GPFS is not new; GPFS 1.0 dates to 1998
    • IBM is touting its latest point update, v2.3
    • analogy: desktop PC is to BlueGene as RAID is to GPFS cluster

    It's basically data striping across 1000 disks. I suppose the hard part is coordinating all of that parallelism.

    So, could someone who actually knows this stuff tell me how well I did?

    1. Re:since the /. blurb doesn't explain it... by TRS-80 · · Score: 2, Informative
      You missed the fact that GPFS is non-Free (tm):
      The prices for GPFS for AIX 5L, GPFS for Linux on POWER, and GPFS for Linux on Multiplatform are based on the number of processors active on the server where GPFS is installed.
    2. Re:since the /. blurb doesn't explain it... by flaming-opus · · Score: 2, Informative

      yeah, except substitute 1000 disks with 10,000 disks. They almost certainly are striping across a bunch of mid-range IBM RAIDs, each with ~100 disks, and probably getting around 1-2 GB/s.

      It's also striping across many machines in a cluster. Each of those nodes maxes out at 'only' 15 GB/s of I/O, so they wire up all the nodes to a bunch of fibre channel cards, and plug them all into the RAIDs, to distribute the I/O access to the nodes. GPFS also lets you do the I/O over the cluster interconnect, but then your interconnect bandwidth usable by the application has to compete with the filesystem traffic.

      As for coordinating all the parallelism, there's a metadata node (actually a failover pair of nodes) that does the metadata operations (create, rename, remove, link) and each cluster node does file I/O directly to disk. Typically, each of the nodes writes to separate files, to avoid having to do concurrent I/O. You can have all the nodes write to different byte ranges within the same file, but you have to use special flags to enable this, and the application has to be written to legitimately write to very distant parts of the file. Often it's simplest just to write to different scratch files for intermediate results, and then combine the output at the end of the run.
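
      A rough sketch of the two ideas above, round-robin striping and giving each node its own byte range in a shared file. This is only an illustration under simple assumptions: the function names, stripe size and disk count are hypothetical and are not GPFS internals or a GPFS API.

```python
# Hypothetical round-robin striping sketch. Names and sizes are illustrative
# assumptions, not how GPFS actually lays out data.

def locate(offset: int, stripe_size: int, num_disks: int) -> tuple[int, int]:
    """Map a file byte offset to (disk index, byte offset on that disk)."""
    stripe = offset // stripe_size      # which stripe unit holds this byte
    disk = stripe % num_disks           # stripe units rotate across the disks
    row = stripe // num_disks           # full rotations completed before it
    return disk, row * stripe_size + offset % stripe_size


def byte_range(node: int, num_nodes: int, file_size: int) -> tuple[int, int]:
    """Give each node a disjoint, contiguous [start, end) slice of one file."""
    chunk = -(-file_size // num_nodes)  # ceiling division
    start = min(node * chunk, file_size)
    return start, min(start + chunk, file_size)


if __name__ == "__main__":
    # Byte 5 MiB into a file striped in 1 MiB units over 4 disks.
    print(locate(5 * 1024 * 1024, stripe_size=1024 * 1024, num_disks=4))
    # Four nodes splitting a 1000-byte file into non-overlapping ranges.
    print([byte_range(n, 4, 1000) for n in range(4)])
```

      Because the ranges never overlap, the nodes in this toy model would not need to coordinate writes with each other, which is the same reason separate scratch files are the simple option.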

  3. Re:So what about JFS? by Tester · · Score: 3, Informative

    GPFS is a cluster file system. It's in a completely different category.

  4. GPFS Information and links by Anonymous Coward · · Score: 5, Informative
    GPFS FAQ - http://publib.boulder.ibm.com/infocenter/clresctr/index.jsp?topic=/com.ibm.cluster.gpfs.doc/gpfs_faqs/gpfs_faqs.html

    GPFS Whitepaper - http://www-03.ibm.com/servers/eserver/pseries/software/whitepapers/gpfsprimer.pdf

    "GPFS is a cluster file system providing normal application interfaces, and has been available on AIX® operating system-based clusters since 1998 and Linux operating system-based clusters since 2001. GPFS distinguishes itself from other cluster file systems by providing concurrent, high-speed file access to applications executing on multiple nodes in an AIX 5L cluster, a Linux cluster or a heterogeneous cluster of AIX 5L and Linux machines. The processors supporting this cluster may be a mixture of IBM System p5(TM), p5 and pSeries® machines, IBM BladeCenter(TM) or IBM xSeries® machines based on Intel® or AMD processors. GPFS supports the current releases of AIX 5L and selected releases of Red Hat and SUSE LINUX Enterprise Server distributions. See the GPFS FAQ1 for a current list of tested machines and also tested Linux distribution levels. It is possible to run GPFS on compatible machines from other hardware vendors, but you should contact your IBM sales representative for details.

    GPFS for AIX 5L and GPFS for Linux are derived from the same programming source and differ principally in adapting to the different hardware and operating system environments. The functionality of the two products is identical. GPFS V2.3 allows AIX 5L and Linux nodes, including Linux nodes on different machine architectures, to exist in the same cluster with shared access to the same GPFS file system. A cluster is a managed collection of computers which are connected via a network and share access to storage. Storage may be shared directly using storage networking capabilities provided by a storage vendor or by using IBM supplied capabilities which simulate a storage area network (SAN) over an IP network.

    GPFS V2.3 is enhanced over previous releases of GPFS by introducing the capability to share data between clusters. This means that a cluster with proper authority can mount and directly access data owned by another cluster. It is possible to create clusters which own no data and are created for the sole purpose of accessing data owned by other clusters. The data transport uses either GPFS SAN simulation capabilities over a general network or SAN extension hardware.

    GPFS V2.3 also adds new facilities in support of disaster recovery, recoverability and scaling. See the product publications for details2."

  5. GPFS is not new by flaming-opus · · Score: 2, Informative

    GPFS is one of the more entrenched parallel cluster filesystems available. (others include the classic vax cluster fs, Tru64 cfs, redhat gfs, adic stornext, lustre, Sanergy, polyserve, others) GPFS has been running on IBM's high performance clusters for a decade or more. I've used it, and it's as robust as any of the others I listed above.

    I'll caution everyone that you can get 100GB/s of throughput, only if you have a hundred million dollar collection of computers and disks like Livermore has.

  6. Re:Translation: by slackaddict · · Score: 2, Informative
    Yes:

    GPFS supports the current releases of AIX 5L and selected releases of Red Hat and SUSE LINUX Enterprise Server distributions. See the GPFS FAQ1 for a current list of tested machines and also tested Linux distribution levels.

    --
    ConsultingFair.com
  7. Tech details by MasterC · · Score: 2, Informative

    The article, as usual for news stories, is lacking juicy tech details. Here are some I found:

    The article says 102 GB/s transfer. This PDF about the ASC Purple says they have 11,000 SATA & fiber channel disks (amongst other neat stats). So cursory math says that's about 10 MB/s from each disk.
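
    Spelling out that cursory math as a quick sketch. It assumes all 11,000 disks took part equally in the transfer, which the article does not confirm, and it shows both readings of "GB" since the thread uses both. The function name is just for illustration.

```python
# Back-of-the-envelope per-disk throughput. Assumes every one of the
# 11,000 disks contributed equally, which is an assumption, not a reported fact.

def per_disk_mb_per_s(aggregate_bytes_per_s: float, disks: int) -> float:
    """Average throughput per disk in decimal megabytes per second."""
    return aggregate_bytes_per_s / disks / 1e6


decimal_reading = per_disk_mb_per_s(102e9, 11_000)          # 102 GB/s, GB = 10^9 bytes
binary_reading = per_disk_mb_per_s(102 * 2**30, 11_000)     # 102 GiB/s, GiB = 2^30 bytes

if __name__ == "__main__":
    print(f"GB  reading: {decimal_reading:.2f} MB/s per disk")
    print(f"GiB reading: {binary_reading:.2f} MB/s per disk")
```

    Either way it lands a little under 10 MB/s per disk.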

    My question is how useful is that transfer? Pulling in at 102 GB/s is fast and all, but if you can't consume it then it's just ego boosting. What kind of useful data transfer can you do on it? Surely it's for parallel processing (ASC = Advanced Simulation & Computing) of some kind so can this parallel app handle 102 GB/s collectively?

    --
    :wq
  8. unit correction by psbrogna · · Score: 2, Informative

    petabyte !== 1,024 terabytes

    petabyte == 1,000 terabytes

    ref: http://en.wikipedia.org/wiki/Petabyte

    Kibibytes is just so much more fun to say. Especially when it leads to "kibbles & bits."
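
    For anyone who wants the two conventions side by side, a small sketch. The constant names are just labels chosen here; the values follow the SI (powers of 10) and IEC (powers of 2) definitions.

```python
# SI (decimal) versus IEC (binary) prefixes, in bytes.
TB, PB = 10**12, 10**15        # terabyte, petabyte
TiB, PiB = 2**40, 2**50        # tebibyte, pebibyte

tb_per_pb = PB // TB           # terabytes in a petabyte
tib_per_pib = PiB // TiB       # tebibytes in a pebibyte
pib_over_pb = PiB / PB         # how much bigger a pebibyte is than a petabyte

if __name__ == "__main__":
    print(tb_per_pb, tib_per_pib)
    print(f"A pebibyte is about {100 * (pib_over_pb - 1):.1f}% larger than a petabyte")
```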

  9. 1.6 petabytes isn't that big a deal by jm91509 · · Score: 4, Informative

    ZFS from Sun is 128-bit. According to this guy, that's a whole load of data:

    "Although we'd all like Moore's Law to continue forever, quantum mechanics imposes some fundamental limits on the computation rate and information capacity of any physical device. In particular, it has been shown that 1 kilogram of matter confined to 1 liter of space can perform at most 1051 operations per second on at most 1031 bits of information [see Seth Lloyd, "Ultimate physical limits to computation." Nature 406, 1047-1054 (2000)]. A fully-populated 128-bit storage pool would contain 2^128 blocks = 2^137 bytes = 2^140 bits; therefore the minimum mass required to hold the bits would be (2^140 bits) / (10^31 bits/kg) = 136 billion kg.

    That's a lot of gear."
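
    The arithmetic in the quote can be rechecked in a few lines. This sketch assumes 512-byte blocks (which is what 2^128 blocks = 2^137 bytes implies) and takes the 10^31 bits per kilogram figure as given. Run as written it comes out near 1.4 x 10^11 kg, the same order of magnitude as the quoted 136 billion kg.

```python
# Re-deriving the mass estimate from the quoted figures.
# Assumption: 512-byte blocks, implied by 2^128 blocks = 2^137 bytes.
BLOCKS = 2**128
BYTES_PER_BLOCK = 2**9
BITS_PER_KG = 10**31           # density limit taken from the quote as given

total_bytes = BLOCKS * BYTES_PER_BLOCK
total_bits = total_bytes * 8
mass_kg = total_bits / BITS_PER_KG

if __name__ == "__main__":
    print(f"{mass_kg:.3e} kg")
```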

    1. Re:1.6 petabytes isn't that big a deal by rkww · · Score: 2, Informative
      1051 operations per second on at most 1031 bits


      That'll be 10^51 and 10^31...

  10. Re: 10 Tbytes? by Kadin2048 · · Score: 4, Informative

    From the articles I've read, this was accomplished using (some subset of) ASC Purple, which is full of a lot of either custom or IBM-proprietary stuff (or else stuff that nobody but IBM seems to be using).

    According to the published/unclassified spec sheet:

    "Purple has 2 million gigabytes of storage from more than 11,000 Serial ATA and Fibre Channel disks. ... Each login node has eight 10-gigabytes-per-second network connections for parallel file transfer protocol and two 1-gigabyte-per-second network connections for network file systems and secure shell protocol. The system has a three-stage 1,536 port dual plane Federation switch interconnect ..."

    I think that it was this last thing, the Federation interconnect, that they were pushing the data over in this test, since it forms the backbone of the machine and links the storage nodes to the login node controllers, which then connect to the login nodes themselves (of which there are apparently over 1,400, according to this). I couldn't find much information on Federation, as it seems to only be used in a few systems, of which Purple is the most notable. One reference I found seems to put it at 1.49 GB/sec (11.92 Gbit/s) bandwidth, although it's not clear if that's "dual plane" Federation or not. 4X SDR Infiniband is around 10 Gbit/sec, IIRC, so Federation's a little faster.
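
    The unit conversion behind that comparison, as a quick sketch. The 1.49 GB/sec figure is the one from the reference mentioned above and the 10 Gbit/sec figure is the from-memory number for 4X SDR Infiniband; neither is verified here, and the helper name is just for illustration.

```python
# Converting the quoted Federation figure from gigabytes to gigabits per second.
# Both input figures come from the comment above and are not verified here.

def gbytes_to_gbits(gbytes_per_s: float) -> float:
    """Convert GB/s to Gbit/s (8 bits per byte)."""
    return gbytes_per_s * 8


federation_gbit = gbytes_to_gbits(1.49)
infiniband_4x_sdr_gbit = 10.0
ratio = federation_gbit / infiniband_4x_sdr_gbit

if __name__ == "__main__":
    print(f"Federation: {federation_gbit:.2f} Gbit/s, {ratio:.2f}x 4X SDR Infiniband")
```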

    It does sound a little like it was a case of "hey, what can we do with $230M worth of hardware? I know, let's break some records." So they did. I'm not sure that there's anything there that anyone else couldn't do, with different technologies, given the same investment of capital -- it's just a matter of who else wants to, and has the capability.

    --
    "Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
  11. Re:Well.... by Firewalker_Midnights · · Score: 3, Informative

    "introducing OrigamiFS, you write it out on paper then fold it in half as many times as you can"

    Apparently it can only be folded 12 times, at most. Unless M$ has created a new form of highly (unstable) foldable OS :D

    --
    I Lost My Virginity While Waiting for BSD to Compile.
  12. Re:binary prefixes by Richard+Steiner · · Score: 4, Informative

    The new SI prefixes are nice and all, but there are three or four decades of prior usage that have to be unlearned before some of us will use them intuitively. Or at all. :-)

    Context-sensitive conversion of SI prefixes isn't all that difficult. Really. It's commonly understood that data is stored in powers of 2, and the subject is only relevant if (1) you're a sales type, or (2) you are being overly pedantic about an unwanted and unneeded SI standard.

    --
    Mainframe/UNIX Bit Twiddler and long time Windows/Linux Hobbyist.
    The Theorem Theorem: If If, Then Then.
  13. Function of Purple by Kadin2048 · · Score: 2, Informative

    The intended purpose of ASC Purple is nuclear weapons simulations.

    Since they can't actually do tests, either aboveground or below, by treaty anymore, they do simulations instead. I assume these have something to do with modeling how radioactive decay affects the weapons' usability and yield over time (since I don't think they're really in the business of designing new toys, but who knows really), so that you know that a bomb is going to go "pop" instead of "fizzle" when you want it to.

    I'd imagine that those kinds of simulations could easily produce tera- and petabytes of data, when run with the sort of precision and initial conditions that LLNL probably wants to use.

    I think BlueGene/L (No. 1 on the list of top supercomputers, Purple is 3) is used for the same purpose. Or at least, that was their reason/excuse for purchasing it; exactly what they do with it every day is anybody's guess.

    --
    "Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
  14. No, the limits are much higher than that by FreeUser · · Score: 4, Informative

    "Although we'd all like Moore's Law to continue forever, quantum mechanics imposes some fundamental limits on the computation rate and information capacity of any physical device. In particular, it has been shown that 1 kilogram of matter confined to 1 liter of space can perform at most 1051 operations per second on at most 1031 bits of information

    Um, no, that's wrong.

    Bremermann's Limit is the maximum computational speed in the physical universe (as defined by relativity and quantum mechanical limitations) and is approximately 2 x 10^47 bits per second per gram (or, for those who prefer sexagesimal, one jezend, 60^11, bits per second per gram).

    Bousso's covariant entropy bound, also called the holographic bound, is a theoretical refinement on the Bekenstein Bound that may define the limit of how compact information may be stored, based on current understanding of quantum mechanical limits, and is theorized to be equal to approximately one yezend (60^37, or ~10^66) bits of information contained in a space enclosed by a spherical surface of 1 sq. cm.

    Given this, 1 kg of matter can perform approximately 2 x 10^50 bit operations per second, in a space much smaller than 1 liter. Of course, other physical constraints (non-quantum related) probably limit us to a couple of orders of magnitude less computation, in a couple of orders of magnitude more space, but of course what those limits might be is very speculative.

    --
    The Future of Human Evolution: Autonomy
  15. my gpfs problem by krismon · · Score: 2, Informative

    We ran GPFS for about 10 months. It's great for its primary purpose, and it was pretty stable on Linux, though we had a crash or two... but the biggest problem we ran across was with a large number of files. We had > 150 million small files in 10000 directories, and gpfs couldn't handle the load. I'm sure with a smaller number of files, our experience would have been very different. Waiting 10 minutes for an ls in a directory wasn't really what I considered fun. :)

  16. NTFS by Jaime2 · · Score: 2, Informative

    NTFS has supported 16 exabytes since 1993. That's about 10,000 times larger than this new system. I'm not saying that NTFS is great or that IBM's accomplishment is small. But the submitter really shouldn't have said that a 1.6 petabyte filesystem is anything to write home about. Most likely every modern filesystem is at least 64-bit (16 exabytes).
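
    A quick check of the "about 10,000" figure, as a sketch. It treats the 64-bit limit as 2^64 bytes and compares it with 1.6 petabytes under both the decimal and the binary reading of "petabyte"; the variable names are just labels chosen here.

```python
# How much larger is a 64-bit byte address space than 1.6 petabytes?
LIMIT_64_BIT = 2**64                   # bytes addressable with 64 bits
EIB = 2**60                            # one exbibyte in bytes

ratio_decimal = LIMIT_64_BIT / 1.6e15            # petabyte = 10^15 bytes
ratio_binary = LIMIT_64_BIT / (1.6 * 2**50)      # petabyte read as 2^50 bytes

if __name__ == "__main__":
    print(LIMIT_64_BIT // EIB, "EiB")
    print(f"{ratio_decimal:.0f}x (decimal), {ratio_binary:.0f}x (binary)")
```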

  17. Try six orders of magnitude by irritating+environme · · Score: 2, Informative

    Unless I forgot, a single order of magnitude is 10x, not 1000x.

    Peta = 1 000 Tera = 1 000 000 Giga = 1 000 000 000 Mega

    --


    Hey, I'm just your average shit and piss factory.