IBM's High Performance File System
HoosierPeschke writes "BetaNews is running a story about IBM's new file system, General Parallel File System (GPFS). The short and skinny is that the new file system attained a 102 Gigabyte per second transfer rate. The size of the file system is also astonishing at 1.6 petabytes (petabyte == 1,024 terabytes). IBM has put up a page with more information and specs on the system."
There is nothing new about GPFS. It's been around for years.
At least someone can make a new filesystem... *cough* Microsoft *cough*
bah!, i want my 25mb hdd from my Amiga 500+ back, at least i understood the TLAs with that...
;)
But what kind of performance does this give on relatively small ( 10Tbytes) file systems? Petabyte arrays are still kind of out of reach for most.
A slashdotter who didn't build his own computer is like a Jedi who didn't build his own lightsaber.
Are there open source drivers for this FS that can perhaps be integrated into Linux or the *BSD projects?
to say that we can put a lot of porn in 1.6 petabytes ?
How many times is a group going to use G, F and S in their filesystem name?
There's GFS, GFS, GPFS, etc.
dammit.
Is this stuff available in a fashion where we might see it ported for use on standard x86 hardware? Is it GPL'd? I want this in my living room!
-1 Uncomfortable Truth
Wow that's fast stuff, plus with the ability to slow light to save energy IBM should have some great new systems coming out!
Britney Simpson
What's going to happen to JFS? I was hoping it would get some attention - I am using it for my multimedia server and have been very pleased with the way it handles large DVD image files, not to mention power failures.
12:50 - press return.
I thought this article was going to be about IBM's HPFS from OS/2.
That aside, how do I get one for my TiVo?
"It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
"That's nice, but will Linux run it?"
120 characters for a sig? That's bloody useless.
It's basically data striping across 1000 disks. I suppose the hard part is coordinating all of that parallelism.
So, could someone who actually knows this stuff tell me how well I did?
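For what it's worth, the "striping across 1000 disks" idea can be sketched in a few lines. This is only an illustration of plain round-robin striping, not GPFS's actual layout: the 1 MiB stripe unit and the names `locate` and `disks_touched` are assumptions made up for the example.

```python
# Hypothetical round-robin striping; NOT GPFS's real on-disk layout.
BLOCK_SIZE = 1 << 20   # assumed stripe unit of 1 MiB
NUM_DISKS = 1000       # the "1000 disks" from the comment above


def locate(offset: int) -> tuple[int, int]:
    """Map a byte offset in a file to (disk index, byte offset on that disk)."""
    block = offset // BLOCK_SIZE          # which stripe unit the byte falls in
    disk = block % NUM_DISKS              # units are dealt out to disks in turn
    stripe_row = block // NUM_DISKS       # how many full passes over the disks so far
    return disk, stripe_row * BLOCK_SIZE + offset % BLOCK_SIZE


def disks_touched(offset: int, length: int) -> set[int]:
    """Disks that a read of `length` bytes starting at `offset` would hit."""
    first = offset // BLOCK_SIZE
    last = (offset + length - 1) // BLOCK_SIZE
    return {b % NUM_DISKS for b in range(first, last + 1)}
```

A large sequential read lands on many disks at once, which is where the aggregate bandwidth comes from. The mapping itself is the easy part; keeping many nodes consistent while they all do this is the coordination problem mentioned above.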
GPFS Whitepaper - http://www-03.ibm.com/servers/eserver/pseries/software/whitepapers/gpfsprimer.pdf
"GPFS is a cluster file system providing normal application interfaces, and has been available on AIX® operating system-based clusters since 1998 and Linux operating system-based clusters since 2001. GPFS distinguishes itself from other cluster file systems by providing concurrent, high-speed file access to applications executing on multiple nodes in an AIX 5L cluster, a Linux cluster or a heterogeneous cluster of AIX 5L and Linux machines. The processors supporting this cluster may be a mixture of IBM System p5(TM), p5 and pSeries® machines, IBM BladeCenter(TM) or IBM xSeries® machines based on Intel® or AMD processors. GPFS supports the current releases of AIX 5L and selected releases of Red Hat and SUSE LINUX Enterprise Server distributions. See the GPFS FAQ1 for a current list of tested machines and also tested Linux distribution levels. It is possible to run GPFS on compatible machines from other hardware vendors, but you should contact your IBM sales representative for details.
GPFS for AIX 5L and GPFS for Linux are derived from the same programming source and differ principally in adapting to the different hardware and operating system environments. The functionality of the two products is identical. GPFS V2.3 allows AIX 5L and Linux nodes, including Linux nodes on different machine architectures, to exist in the same cluster with shared access to the same GPFS file system. A cluster is a managed collection of computers which are connected via a network and share access to storage. Storage may be shared directly using storage networking capabilities provided by a storage vendor or by using IBM supplied capabilities which simulate a storage area network (SAN) over an IP network.
GPFS V2.3 is enhanced over previous releases of GPFS by introducing the capability to share data between clusters. This means that a cluster with proper authority can mount and directly access data owned by another cluster. It is possible to create clusters which own no data and are created for the sole purpose of accessing data owned by other clusters. The data transport uses either GPFS SAN simulation capabilities over a general network or SAN extension hardware.
GPFS V2.3 also adds new facilities in support of disaster recovery, recoverability and scaling. See the product publications for details2."
It could also allow users to access their porn collections with much greater speed and efficiency.
There's a reason they call them _peta_bytes...
GPFS is one of the more entrenched parallel cluster filesystems available. (others include the classic vax cluster fs, Tru64 cfs, redhat gfs, adic stornext, lustre, Sanergy, polyserve, others) GPFS has been running on IBM's high performance clusters for a decade or more. I've used it, and it's as robust as any of the others I listed above.
I'll caution everyone that you can get 100GB/s of throughput, only if you have a hundred million dollar collection of computers and disks like Livermore has.
Not too shabby.
But then some noob application programmers will do something stupid like use C++ streams to do IO and give it all back.
You can't drive a system at its design limits without coding to the hardware design.
Will this mean that you can share storage more easily? Maybe. It certainly seems to reduce Sharks/ESS into an expensive interface for attaching discs (but then again they're just a load of discs with an AIX box or two and SSA adapters to connect the discs anyhow).
:).
Given that the management and maintenance of discs will be more integrated and distributable with this, I can't help but think that OS features, and the trend (rightly so) toward resilience, ease, and sharing resources, are approaching what Plan 9 set out to be.
The more we move on, the more we seem to get towards the Lego-type approach to IT, where you can just buy another box of bricks and add on, keeping your older bricks instead of throwing the whole lot out and/or hacksawing the end off a brick and gluing it onto the side of....
Storage-wise this is a nice step forwards. Having worked on AIX and its many filesystems and management tools, with the ease of getting the job done and the option to get clever if you wish (you choose, you're not forced), this looks funky, albeit it's RAID for SANs in a way.
What I really want is an FS that will propagate automatically and resiliently in a way that accommodates network diversity, and I still come down to wanting what is, to all intents, a filesystem sat on a database sat on a p2p network. Alas, at the moment performance would suck, at least today, but you know how long code takes to get right and how fast hardware moves. Remember, a lot of code in Windows XP has origins from when it was written on a humble 386 CPU, if not lower.
What this does show is how network/storage interfaces have moved forward, and I/O requests don't hammer CPUs as much as they used to. Getting there.
SCSI interface, you might be able to upgrade it to 25.
I'm not using it anymore...
Man, you really need that seminar!
They would have released their Google File System.
Doesn't seem revolutionary if it's going to be proprietary like that. I'm more impressed with XFS. At least it's usable for all Linux users. Oh, and it rocks, too.
The article, as usual for news stories, is lacking juicy tech details. Here's some I found:
The article says 102 GB/s transfer. This PDF about the ASC Purple says they have 11,000 SATA & fiber channel disks (amongst other neat stats). So cursory math says that's about 10 MB/s from each disk.
My question is how useful is that transfer? Pulling in at 102 GB/s is fast and all, but if you can't consume it then it's just ego boosting. What kind of useful data transfer can you do on it? Surely it's for parallel processing (ASC = Advanced Simulation & Computing) of some kind so can this parallel app handle 102 GB/s collectively?
:wq
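The cursory math above is easy to check. A quick sketch, assuming decimal units (1 GB = 10^9 bytes, 1 MB = 10^6 bytes) and that all 11,000 disks contribute equally, which the article does not actually say:

```python
def per_disk_mb_per_s(total_gb_per_s: float, disks: int) -> float:
    """Average per-disk throughput in MB/s, assuming an even split across disks."""
    total_bytes_per_s = total_gb_per_s * 1e9
    return total_bytes_per_s / disks / 1e6


# Figures quoted in the comment: 102 GB/s aggregate over 11,000 disks.
rate = per_disk_mb_per_s(102, 11_000)
print(f"{rate:.1f} MB/s per disk")
```

That comes out a bit under 10 MB/s per disk, so the headline number reflects how many spindles are running in parallel rather than anything unusual per drive.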
petabyte !== 1,024 terabytes
petabyte == 1,000 terabytes
ref: http://en.wikipedia.org/wiki/Petabyte
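For anyone who wants the two conventions side by side, here is a small sketch. The constant names are just labels chosen for this example; the values are the SI (powers of ten) and IEC (powers of two) definitions.

```python
# SI decimal prefixes
TB = 10**12
PB = 10**15

# IEC binary prefixes
TiB = 2**40
PiB = 2**50

decimal_ratio = PB // TB    # terabytes per petabyte
binary_ratio = PiB // TiB   # tebibytes per pebibyte

# How far apart "1.6 PB" and "1.6 PiB" are, expressed in decimal terabytes.
gap_bytes = 16 * (PiB - PB) // 10
gap_tb = gap_bytes / TB

print(decimal_ratio, binary_ratio, round(gap_tb))
```

At the 1.6 peta scale the two readings differ by roughly 200 TB, so the choice of prefix is not just pedantry.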
Kibibytes is just so much more fun to say. Especially when it leads to "kibbles & bits."
that would help with porn; If I can't get it to slideshow through enough pictures within 10 seconds, it's over. Less time if i already have my pants down.
but does it run DOOM?
Does the system that stores petabytes of data store them in Petafiles or Pedafiles?
The submitter and editors need to learn their numeric prefixes. Come on! This web site is supposed to be for people who understand computer technology!
A petabyte == 1000 terrabytes
A pebibyte == 1024 terrabytes
Please see the NIST definition page:
http://physics.nist.gov/cuu/Units/binary.html
If it's scalable, there's no reason it couldn't scale up to 1.6 petabreads. And the fact that it runs on commodity (cheap) hardware means that you don't need "a hundred million dollar collection of computers and disks like Livermore has".
If not... what's the key difference between the two?
[Fuck Beta]
o0t!
ZFS from Sun is 128-bit. According to this guy
that's a whole load of data:
"Although we'd all like Moore's Law to continue forever, quantum mechanics imposes some fundamental limits on the computation rate and information capacity of any physical device. In particular, it has been shown that 1 kilogram of matter confined to 1 liter of space can perform at most 10^51 operations per second on at most 10^31 bits of information [see Seth Lloyd, "Ultimate physical limits to computation." Nature 406, 1047-1054 (2000)]. A fully-populated 128-bit storage pool would contain 2^128 blocks = 2^137 bytes = 2^140 bits; therefore the minimum mass required to hold the bits would be (2^140 bits) / (10^31 bits/kg) = 136 billion kg.
That's a lot of gear."
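The arithmetic in that quote can be redone in a few lines. This sketch assumes 512-byte blocks, which is what "2^128 blocks = 2^137 bytes" implies, and takes the 10^31 bits per kilogram figure from the quote as given.

```python
blocks = 2**128
bytes_per_block = 2**9                 # 512 bytes, implied by 2^128 blocks = 2^137 bytes
bits = blocks * bytes_per_block * 8    # total bits in a fully populated pool

bits_per_kg = 1e31                     # storage density limit cited in the quote
mass_kg = bits / bits_per_kg

print(f"{mass_kg:.3e} kg")
```

Run as written, this gives roughly 1.4 x 10^11 kg, the same order of magnitude as the quoted 136 billion kg, though not exactly that figure.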
Since GPFS is basically RAID on speed, it should be easy for IBM to write a wrapper for Linux that would allow you to read/write GPFS, without needing to port GPFS per-se. As IBM sells Linux-based machines, being able to access GPFS partitions would seem "obvious", but I could understand them wanting to keep the best-of-the-best for systems they make more money off of.
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
It would be nice to see comparisons to RedHat/Sistina's GFS, Lustre (backed by HP), and others listed here.
Also, how does this compare to clustered storage that is not run on the hosts themselves, like NetApp's new Spinnaker-based clustering? You also have folks like Isilon, Panasas, and Terrascale.
Anybody have any good data on this?
-Ack
-- soldack
Sounds like a windows crash error:
GPF Fault Error.
So, the important question: How many Libraries of Congress is that per second?
Typical porn movies per hour (TPMH)??
AT&ROFLMAO
the exact number in common practice could be either one of the following:
Real geeks use powers of two; powers of ten were only introduced for marketing purposes, which real geeks eschew.
I thought this was going to be about OS/2's HPFS! You don't see too many technical articles on OS/2 anymore... bummer! :-(
... 1.6 PB ought to be enough for anybody.
GPFS (apparently -- I know only of what I've learned in the last few hours) is available for Linux, from IBM, right now.
Some people further up in the discussion have warned however that it's not as stable on Linux as it is on AIX, which is really its native platform.
From IBM's page on GPFS:
"GPFS is available as:
* GPFS for AIX 5L on POWER(TM)
* GPFS for Linux on IBM AMD processor-based servers and IBM eServer® xSeries®
* GPFS for Linux on POWER"
"Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
We used GPFS in our production environment for about 9 months in 2004/2005. We chose it specifically because it allowed several machines to share the file system (like NFS) but with file locking. It was also supposed to be very fault tolerant with no single point of failure. We set it up using a fiberchannel SAN.
Unfortunately we had a lot of problems with it. For one, performance was quite bad in certain cases... doing an ls in a large directory would take a very long time. Doing finds would take a very long time. Once you had a specific file you wanted, opening and reading it was reasonable (though all disk ops were still on the slow side), but multi-file operations lagged on the level of 10s of seconds or more. I think it was having to issue network checks to every machine in the set for each file or something.
Also, the CPU usage was very high across all our machines, primarily from lock manager communications. It really taxed the system. And perhaps worst of all, it would cause crashes sometimes. A single machine in the set would die (usually a GPFS assert), and though that didn't break the set permanently, a multi-minute freeze on all disk reads would take place until the set determined the machine was unavailable. We spoke with IBM about all this stuff... provided debugging output and everything, we used the latest patches. But we never got the issues resolved. It was a very rough few months indeed. I probably averaged 4 hours sleep per night.
When I say "slow" what am I comparing it to? In the end we switched to NFS and we came up with a somewhat clever way to avoid the need for file locking. NFS used the same SAN hardware, but had a single point of failure: the head server. We doubled up there with warm failover. The load on all servers dropped dramatically (I'm talking from ~40 load to ~.1 load). Disk operations were orders of magnitude faster. And we've not had a single NFS related lockup or failure in the past year and a half *knocks on wood*.
Anyways -- GPFS probably has some good uses. But I would not recommend it for a very high-volume (lots of files, lots of traffic) mission critical situation. Unless they've made some major improvements.
Cheers.
The intended purpose of ASC Purple is nuclear weapons simulations.
Since they can't actually do tests, either aboveground or below, by treaty anymore, they do simulations instead. I assume these have something to do with modeling how radioactive decay affects the weapons' usability and yield over time (since I don't think they're really in the business of designing new toys, but who knows really), so that you know that a bomb is going to go "pop" instead of "fizzle" when you want it to.
I'd imagine that those kinds of simulations could easily produce tera- and petabytes of data, when run with the sort of precision and initial conditions that LLNL probably wants to use.
I think BlueGene/L (No. 1 on the list of top supercomputers, Purple is 3) is used for the same purpose. Or at least, that was their reason/excuse for purchasing it; exactly what they do with it every day is anybody's guess.
Much better info here (pdf).
"Although we'd all like Moore's Law to continue forever, quantum mechanics imposes some fundamental limits on the computation rate and information capacity of any physical device. In particular, it has been shown that 1 kilogram of matter confined to 1 liter of space can perform at most 10^51 operations per second on at most 10^31 bits of information
Um, no, that's wrong.
Bremermann's Limit is the maximum computational speed in the physical universe (as defined by relativity and quantum mechanical limitations) and is approximately 2 x 10^47 bits per second per gram (or, for those who prefer sexagesimal, one jezend, 60^11, bits per second per gram).
Bousso's covariant entropy bound, also called the holographic bound, is a theoretical refinement on the Bekenstein Bound that may define the limit of how compact information may be stored, based on current understanding of quantum mechanical limits, and is theorized to be equal to approximately one yezend (60^37, or ~10^66) bits of information contained in a space enclosed by a spherical surface of 1 sq. cm.
Given this, 1 kg of matter can perform approximately 2 x 10^50 bit operations per second, in a space much smaller than 1 liter. Of course, other physical constraints (non-quantum related) probably limit us to a couple of orders of magnitude less computation, in a couple of orders of magnitude more space, but of course what those limits might be is very speculative.
The Future of Human Evolution: Autonomy
640 terabytes should be enough for anybody.
We don't see the world as it is, we see it as we are.
-- Anais Nin
Surely I'm not the only one who sees "GPF" and thinks "General Protection Fault"?!
First it was OS2 (OS 2 ? does the "2" stand for 2nd rate? Is it your 2nd attempt? Is it just a big piece of "#2"?) and now it's this. Don't get me wrong, I think their products are great, but I really think they'd have a hard time marketing air on the Moon!
(Slightly more) seriously, IBM could stand to hire the same marketing folks the beer companies hire...Especially since their markets overlap so much.
This space intentionally left (almost) blank.
Stop it. You're making me hungry for Mediterranean food.
Program Intellivision!
Ok, color me cynical. The first thing that came to mind when I saw this was gee, just what the NSA needs to help them process their enormous collection of data on the day-to-day lives of law abiding American citizens...
Chuck Norris penis is so big that 1.6 petabyte can only store 4 seconds of Chuck Norris porn.
Fuck you.
The Kruger Dunning explains most post on
If GPFS is so great, why did IBM put Lustre on Blue Gene and not GPFS?
We ran GPFS for about 10 months. It's great for its primary purpose, and it was pretty stable on Linux, though we had a crash or two... but the biggest problem we ran across was with large numbers of files. We had > 150 million small files in 10000 directories, and GPFS couldn't handle the load. I'm sure with a smaller number of files, our experience would have been very different. Waiting 10 minutes for an ls in a directory wasn't really what I considered fun. :)
Further evidence that "editor" is a misnomer 'round these parts.
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
NTFS has supported 16 exabytes since 1993. That's about 10,000 times larger than this new system. I'm not saying that NTFS is great or that IBM's accomplishment is small. But the submitter really shouldn't have said that a 1.6 petabyte filesystem is anything to write home about. Most likely every modern filesystem is at least 64-bit (16 exabytes).
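A rough check of that ratio, as a sketch. It assumes the 16 exabyte figure means a full 64-bit byte address space (2^64 bytes) and reads the 1.6 petabytes as decimal (1.6 x 10^15 bytes).

```python
import math

ntfs_limit_bytes = 2**64          # assumed: "16 exabytes" = 64-bit byte addressing
gpfs_install_bytes = 1.6e15       # 1.6 PB, decimal reading

ratio = ntfs_limit_bytes / gpfs_install_bytes
orders_of_magnitude = math.log10(ratio)

print(f"ratio ~ {ratio:,.0f}x, about {orders_of_magnitude:.1f} orders of magnitude")
```

So "about 10,000 times" holds up, give or take the binary versus decimal reading. Keep in mind this compares a theoretical addressing limit with a system that was actually built and measured.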
1 petabyte is 1000 terabytes, not 1024 terabytes. Please read and understand SI units for binary prefixes.
Unless I forgot, a single order of magnitude is 10x, not 1000x.
Peta = 1 000 Tera = 1 000 000 Giga = 1 000 000 000 Mega
Hey, I'm just your average shit and piss factory.
I RTFA and the most I could glean off of it was some jazz about parallel reads. Now hold on, I thought a RAID's bread and butter were about parallel reads. Am I missing something?
Do they mean Parallel Reads off of a NAS setup? A bunch of NAS boxes, with some IBM magic, that shows up as a single volume?
They mention that the machine they are using is some massively parallel monstrosity with multiple RAIDs per unit. Does this FS aggregate all of them into a single volume? (If you haven't noticed, I'm looking for a clustering filesystem that won't fall apart.)
I tried JFS, and it handled power interruptions very poorly.
Essentially, I liked philosophically that the act of mounting and journal replay are separated, it really makes sense. Journal replay should be more an fsck option, thought that was neat. And when you mount read-only, you *mean* read-only, no journal replay or anything even on a 'dirty' filesystem.
However, I found all too frequently that after power failures, it would replay the journal and think everything was fine, until a few hours of usage later when it figures out that it left something in an inconsistent state and remounts read-only all of a sudden. Then you fsck and watch lost+found get a few more files. As long as I could recognize the files, I could put them back fairly easily, but I haven't had issues with ext3 yet. Have had similar issues to this with XFS, and, admittedly, far worse with Reiser.
Anyway, returning to topic, GPFS is a filesystem for shared-storage SANs and for aggregating individual node storage into a potentially fault tolerant filesystem (or filesystems). Since they ditched the RSCT stuff a while back, I've found them to be fairly robust and not overly difficult to configure (Lustre I found significantly harder than new GPFS, but lustre is easier than old GPFS to get running). It is not suitable for desktop systems.
XML is like violence. If it doesn't solve the problem, use more.
Feh. That's all well and good and it SEEMS impressive. I have, however, a bunch of mostly old machines networked. I have them share files via samba (mostly to accommodate my girlfriend's Windows machine), sometimes NFS and quite often, ssh. Most of the hard drives are small - together, there is a decent amount of room.
I move, by hand, files hither and fro across my little LAN. I write CDs and DVDs across this system. What I would love would be a simple filesystem that would let me tie a few of these together into one virtual network drive. It should be smart and not cut files, say, unless they are really large so that if a node goes down, you can just access the files directly anyway, and at worst, would need to join a few split files with a simple Linux "join" command. You should be able to give this system an order of preference, ie., try to copy files to node A first, and when node A gets to a percentage of full, start at node B instead, etc.
I don't need redundancy and I don't need striping. I would expect such a filesystem to run at the same speed as any network drives, with a very slight overhead as it shifts gears from one machine to the next. But this would be minor since the system would avoid fragmentation, like I said, except for the biggest files.
It would be no more or less dangerous to my data - right now, if a machine with a certain file goes down, I lose access to that file until I get it back up - this would be no different.
Something like unionFS perhaps, except that the files aren't only written to the system on the "right" (or however that thing works), or perhaps something like RAID0, but applied to an arbitrary folder, even network folders, rather than just a partition on the same machine.
fred
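The "order of preference" placement described above is simple enough to sketch. This is only a toy policy, not an existing filesystem: the `Node` class, the `place` function and the 90% fill limit are all hypothetical names and numbers picked for illustration, and whole files are never split.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    capacity: int   # bytes available on this machine's share
    used: int = 0   # bytes already placed there


def place(nodes: list[Node], size: int, fill_limit: float = 0.9) -> Optional[str]:
    """Put a whole file on the first node, in preference order, that stays
    at or under `fill_limit` of its capacity. Returns the node name, or
    None if no node has room."""
    for node in nodes:
        if node.used + size <= node.capacity * fill_limit:
            node.used += size
            return node.name
    return None


# Preference order: try A first, spill over to B once A is nearly full.
nodes = [Node("A", 100), Node("B", 100)]
first = place(nodes, 80)    # fits on A
second = place(nodes, 20)   # would push A past the limit, so it goes to B
third = place(nodes, 5)     # still fits on A
```

Because each file lives entirely on one node, losing a node only hides the files stored there, which matches the "no more or less dangerous" requirement.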
99.9% Vaporware. I work for IBM. This ain't flying. The only reason why I give the 0.1% credit is because a friend (before he was fired for jerking off in his cube) used to work on it and gave it clout in 2004, but it's just not there or even CLOSE to release if it is.
It may be a new file system, but chances are that Sony-BMG has already got it Root-Kitted.
1.6 petabytes ought to be enough for anyone......
"I bow to no man" - Riddick
Revisionists...
I have never heard someone say "kibibyte" without being beaten up by yet another nerd.
It's TERA not TERRA (earth?) The webpage you link to has it in plain sight too.
Yeah, revisionism sucks. Use the original meanings of the prefixes, the ones in use for centuries. Oh, and that is standard.
Using metric prefixes for powers of two is sloppy, but understandable in context. I can live with that. But it irritates me when kids label the proper use of the prefixes 'wrong', when it's the other way around.
By the way, everything relating to computers is not powers of two. Addresses have a natural connection to powers of two. Drive capacity does not; the number of platters and cylinders can be anything, and the number of sectors is not only unrelated to powers of two, but varies by cylinder. Communication rates do not. Frequencies do not.
I was wondering why my accounting book is so fixated on IBM! Well, now I know.