There's an alternative to multi-threading. Will we see MPI used in more than just supercomputing applications? The message-passing vs. shared memory arguments are as old as the hills in high-end computing, with most (but not all) applications using message passing. As the memory hierarchy on personal computers looks more and more like NUMA, message passing starts to look really attractive, even when the parallelism is across a single node.
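For anyone who hasn't seen the style, here's a minimal sketch of message passing on a single node. It is not MPI: it uses Python's standard multiprocessing pipes as a stand-in, and the names worker and parallel_sum are made up for illustration. Each process owns its chunk of data and the only communication is an explicit message back to the parent, so nothing is shared and nothing needs a lock. It assumes a POSIX system where the "fork" start method is available.

```python
import multiprocessing as mp


def worker(chunk, conn):
    """Sum a private chunk and send the result back as a single message."""
    conn.send(sum(chunk))
    conn.close()


def parallel_sum(values, nprocs=4):
    """Split values across nprocs processes and combine their messages."""
    ctx = mp.get_context("fork")  # assumption: POSIX system with fork()
    # Each process gets its own strided slice; no memory is shared.
    chunks = [values[i::nprocs] for i in range(nprocs)]
    procs, receivers = [], []
    for chunk in chunks:
        recv_end, send_end = ctx.Pipe(duplex=False)
        p = ctx.Process(target=worker, args=(chunk, send_end))
        p.start()
        send_end.close()  # parent keeps only the receiving end
        procs.append(p)
        receivers.append(recv_end)
    # Receiving the partial sums is the only synchronization point.
    total = sum(conn.recv() for conn in receivers)
    for p in procs:
        p.join()
    return total


if __name__ == "__main__":
    print(parallel_sum(list(range(100))))
```

With real MPI the structure would be much the same (each rank computes on local data, then a reduction combines the results), just with library calls in place of the pipes.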
I think his post is a little off the mark, but jd (1658) has the right idea -- if you are concerned about portable representation of your data, you might want to use a higher-level library. HDF5 and NetCDF are good choices.
Even better might be Parallel-NetCDF. It has all the benefits of a high-level library (portable, self-describing data representation), but it has a much simpler interface than HDF5. Unlike serial NetCDF, you'll probably see much better performance as all processes can carry out I/O collectively instead of forwarding to a master.
The hardware sounds similar to an ibook. I'm not sure if anybody ported mythtv to os x yet, but linux runs great on ppc.
You might want to wait a few months for any hardware support quirks to make their way into the linux kernel, but putting linux on this puppy and making it a mythtv box is a very exciting idea. If you add a bluetooth keyboard and mouse, you've got one nice setup.
That would mean it should compete on the level of OpenAFS, Intermezzo and CODA for fault tolerant network filesystems -- except it would have internode locking which the others don't at the moment.
That's an interesting thought, but at no time have we ever thought of ourselves as a replacement for those file systems. The ones you mention are general purpose file systems whereas PVFS2 is meant to be a fast file system for parallel applications.
except it would have internode locking which the others don't at the moment.
I'm not sure what you mean here. We have no locking anywhere -- which is exactly why we can deliver such high performance. Scientific applications often don't need a locking subsystem getting in their way.
Is this project set on integrating with the mainline kernel? What has/will happen on that front?
There really isn't much for us *to* integrate into the kernel. We do have a VFS interface, but it acts primarily as a way to convert kernel system calls into userspace PVFS2 calls. Yes, there are lots of "file system in userspace" projects, but by making something that works just for PVFS2, we can get better performance.
This also looks perfect for an active/active LinuxHA failover cluster -- if it has redundancy, which any clustering filesystem should have. Right now the LinuxHA project is integrating GFS into their stack of interwoven sub-projects.
Funny you should mention LinuxHA. I spent some time this summer setting it up with PVFS2. If you really care about redundancy, you can invest in shared storage solutions (SCSI and firewire drives can be shared between two hosts simultaneously -- if you buy the really expensive stuff). With shared storage, you've got a way to tolerate node failure. You're still screwed if something eats your big expensive hard drive, granted. We're working on software replication.
PVFS2 looks like it has a similar architecture to Lustre, except PVFS2 is developed openly.
Thanks for noticing! While I understand why CFS has taken the approach they have, we really feel that the HPC community (and Linux in general) needs a file system that's free software.
We don't encourage anyone to rely on PVFS2 to host the sole copy of their data. So it might not be the best idea to use PVFS2 as a "transparent storage network".
PVFS2's real sweet spot is for scratch space for scientific applications -- writing out checkpoints, reading in datasets.
I don't know if I'd call what PVFS2 has a "reliability problem". If you've got money, hardware-based failover solutions exist today and work well with PVFS2 (think heartbeat). In the not-so-distant future we've got people working on software-based replication of data, but no matter how you slice it, there's going to be a performance hit. The trick is to find a way to replicate data while hiding the signs of that extra work from the clients. A final solution is a little ways away, but we're pretty confident we can eventually implement a good replication method.
Your other item -- about wanting to add servers as needed -- is something we've heard from a lot of people. We think we can make that happen without a ton of effort, but it didn't make the 1.0 cutoff.
> * flexible and extensible data distribution modules,
> * distributed metadata,
> * stateless servers and clients (no locking subsystem),
Just to clarify... while we have distributed metadata, we don't have *replicated* metadata. At least, not yet.
If you have multiple metadata servers they will do load balancing. If you are working with lots and lots of small files, having a couple of metadata servers might alleviate a possible bottleneck.
There's no reason why a similar hack couldn't be used to swap songs between two Rio Karmas. Remember, the Rio Karma comes with enough connectivity options to make a grown man weep (usb, ethernet, RCA). And with the Karma, you can do it out of the box without any 3rd party add-ons.
I just bought a Karma too and aside from the crappy linux java transfer software, it's great.
If the Karma, with a nice form factor and all the formats it supports, can't get more mindshare, I don't see how Sony has a chance of gaining any marketshare with their unique format....
What is GFS good for? Many things! It would be great for a large computational cluster that had a very large (multi-terabyte) dataset and high disk I/O requirements.
You can't use GFS in a computational cluster. I've tried. It's not pretty. It wasn't designed for scientific applications.
file-based locking: simultaneous writes to a file (parallel checkpointing, parallel image processing) will become serialized and kill your performance
scalability: both in cost and stability. The sweet spot for GFS is 1-10 nodes, and you need a super-expensive SAN to get that many.
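To make the locking point concrete, here's a rough sketch of what a parallel checkpoint usually looks like: every process writes its own block at its own offset, so the writes never overlap and nothing logically needs to be serialized. This is plain POSIX file I/O on a local file via Python's os.pwrite, not the GFS or PVFS2 API, and BLOCK, write_block and checkpoint are hypothetical names for illustration. It assumes a POSIX system with fork().

```python
import multiprocessing as mp
import os
import tempfile

BLOCK = 8  # hypothetical number of bytes each process contributes


def write_block(path, rank):
    """Write this rank's block at its own offset; no lock is taken."""
    fd = os.open(path, os.O_WRONLY)
    try:
        payload = bytes([ord("A") + rank]) * BLOCK
        # pwrite takes an explicit offset, so ranks never touch
        # each other's region or a shared file position.
        os.pwrite(fd, payload, rank * BLOCK)
    finally:
        os.close(fd)


def checkpoint(path, nprocs=4):
    """Have nprocs processes fill disjoint regions of one file."""
    ctx = mp.get_context("fork")  # assumption: POSIX system with fork()
    with open(path, "wb") as f:
        f.truncate(nprocs * BLOCK)  # preallocate the full file
    procs = [ctx.Process(target=write_block, args=(path, r))
             for r in range(nprocs)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    with open(path, "rb") as f:
        return f.read()


if __name__ == "__main__":
    scratch = os.path.join(tempfile.mkdtemp(), "ckpt.bin")
    print(checkpoint(scratch))
```

A file system that takes a whole-file lock for each of those writes forces them to run one after another even though they never conflict, which is the serialization described above.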
Your post is 90% on the money, but for a computational cluster, please use a real parallel file system (one example is PVFS2).
A lot of the message-passing is therefore avoided, along with the performance costs that those message passes would incur.
Crap, if Apple has taken measures to improve the Darwin kernel performance, then I would submit that they have failed miserably.
Since we're talking kernels here, I don't feel bad bringing out lmbench numbers. Here and here are lmbench runs on the same hardware comparing Linux to Darwin. The Linux kernel kicks the crap out of the Darwin kernel, in most cases by an order of magnitude.
The mach microkernel performs horribly. CPU-intensive work will do just fine on Darwin, but if you do anything that involves the OS, you'll pay a penalty.
Darwin is a very odd UNIX, as well (.dylib, Mach-O binary format), so a more familiar UNIX like Linux or NetBSD might be a fine choice for the excellent G5 hardware.
Heh. It's not *microkernels* that are the problem. It's specifically the Mach microkernel upon which Darwin is based.
It would have been a lot of fun to see Apple use the L4 microkernel instead of the academically uninteresting and performance-poor Mach-based kernel now used.
Anyone know how OSX handles "root" access separately from other users? I'm just curious.
the user created when you install os x is an 'administrator'. so you run around as a normal user, but when you have to install a system update or something root-like, a dialog box pops up.
Kinda like this: the response to the gui-equivalent of install -m 755 foobinary /usr/bin isn't "permission denied", it's "this operation requires an administrator password: enter it here"
actually, i guess it is:
http://porting.openoffice.org/mac
os x makes a distinction between a shared library and a loadable module ( "plugin" ). It's quite a different platform to target. the open office team would love people familiar with it to help out.
the thought process is basically "why would i run photoshop in os x under emulation [ yes, that's the mindset] when i can run it natively under os 9?".
You have to use the "big words" [re: ideas, terms, vocabulary beyond a 6th grade level] to be practical.
Oh yeah? how about "Albert Einstein's Theory of Relativity In Words of Four Letters or Less"
http://www.muppetlabs.com/~breadbox/txt/al.html
http://www.penny-arcade.com/comic/2005/05/27
That's a pretty old URL. You might want to update your bookmarks to take you here:
http://www.gcmtravel.com/gcm/maps_chicago.jsp
I use it every day before heading home to see if all those crazy side-road shortcuts will pay off "this time"
Thanks for the feedback
You might want to evaluate pvfs2 (pvfs.org/pvfs2)
There don't seem to be any torrents or mirrors for this stuff, alas.
The Progeny folks are sitting on a giant pipe. I doubt you'll be able to floor them.
Why is this special?
cat and girl explain it all
lmbench numbers backing up my OS claim
Yeah. Looks like you need 4.3.x
Hey, why not have the best of both worlds?
OS X as an end-user experience is nice. I won't argue that. The Darwin kernel, however, is a piece of ungodly shit.
So i put linux on my apple hardware. Here's why:
linux beats os x by a factor of 10 on most os-specific tasks
Your Xrender is too old. perhaps you are running Xfree-4.2.x?
http://mail.fontconfig.org/pipermail/fontconfig/2003-May/000434.html
Hey! get back here!
read the macslash discussion on this topic here
because most mac users *hate* that "slow start".