Looking for Portable MPI I/O Implementation?
rikt writes "I am trying to implement MPI I/O for our CFD product, and I am facing a problem with the portability of the generated data files. The MPI-2 interface describes a way to achieve this, either by using 'external32' or by using user-defined data representations. The problem is that ROMIO, the most widely available MPI I/O implementation, has not implemented support for any data representation other than 'native'. Do you know of any MPI I/O implementation that supports this and is available on various platforms? I know IBM and Sun support this, but I am looking for a solution on Linux and Windows (both 32- and 64-bit) as well."
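For readers unsure what separates 'native' from 'external32': 'native' writes values in the host's own byte order, while 'external32' fixes the layout to big-endian with IEEE 754 floating point and fixed type sizes (a double is always 8 bytes). Below is a minimal sketch of that difference using only Python's standard `struct` module. It does not call any MPI library, and the helper names are hypothetical, chosen just for illustration.

```python
import struct
import sys

def pack_external32_doubles(values):
    """Encode floats as big-endian IEEE 754 doubles (8 bytes each),
    which is the layout external32 prescribes for doubles."""
    return struct.pack(f">{len(values)}d", *values)

def unpack_external32_doubles(buf):
    """Decode a buffer of big-endian IEEE 754 doubles."""
    count = len(buf) // 8
    return list(struct.unpack(f">{count}d", buf))

def pack_native_doubles(values):
    """Encode floats in the host's own byte order, as 'native' would."""
    return struct.pack(f"={len(values)}d", *values)

values = [1.0, -2.5, 3.141592653589793]
portable = pack_external32_doubles(values)
native = pack_native_doubles(values)

print("host byte order:", sys.byteorder)
print("portable bytes :", portable.hex())
print("native bytes   :", native.hex())
print("round trip ok  :", unpack_external32_doubles(portable) == values)
```

On a little-endian machine the two hex dumps differ, which is exactly why a 'native' file written on one architecture may be unreadable on another. Converting by hand like this before a 'native' write is one possible workaround, though it gives up the convenience of letting the MPI library do the conversion.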
I'm a geek who does administration and programming in Windows and Linux realms, am fairly aware of my acronym soup and yet this left me, um, cold. For those who don't feel like doing the research:
MPI: Message Passing Interface, a standard for message passing in parallel-processing environments.
MPI-2: The second, extended version of the MPI standard.
MPI-IO: Parallel input/output extensions for MPI, included in MPI-2.
ROMIO: An implementation of these extensions.
CFD: Computational Fluid Dynamics (a good candidate for parallel processing, thus the interest in the above).
Of course, the fact I had to look them up means I have no idea about implementations, but at least others won't have to wonder what all that was about.
Sig under construction since 1998.
Now, on to portable I/O. The vast majority of scientific software (which is, in turn, the bulk of MPI-based software) uses the Hierarchical Data Format. There are two versions worthy of mention - HDF5 and Parallel HDF. Both can work with MPI. Compile HDF5 with MPI support, and you have something that will support platform-independent atomic and compound data types.
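The idea behind "platform-independent" types is that the file itself records how its data is laid out, so a reader on any machine can convert on the way in. Here is a toy sketch of that principle using only the Python standard library. It is not HDF5's actual file format or API; the header layout, the field names, and the `write_records`/`read_records` helpers are all made up for illustration.

```python
import io
import json
import struct

def write_records(stream, records, byteorder="<"):
    """Write (int32 id, float64 pressure) records after a small JSON header
    that states the byte order and record layout used for the data."""
    header = json.dumps({
        "byteorder": byteorder,   # "<" little-endian, ">" big-endian
        "record_format": "id",    # struct codes: int32 then float64
        "count": len(records),
    }).encode("ascii")
    stream.write(struct.pack(">I", len(header)))  # header length, fixed big-endian
    stream.write(header)
    for rec_id, pressure in records:
        stream.write(struct.pack(byteorder + "id", rec_id, pressure))

def read_records(stream):
    """Read records back, taking the layout from the header, not the host."""
    (header_len,) = struct.unpack(">I", stream.read(4))
    header = json.loads(stream.read(header_len).decode("ascii"))
    fmt = header["byteorder"] + header["record_format"]
    size = struct.calcsize(fmt)
    return [struct.unpack(fmt, stream.read(size)) for _ in range(header["count"])]

records = [(1, 101325.0), (2, 99800.5)]
for order in ("<", ">"):
    buf = io.BytesIO()
    write_records(buf, records, byteorder=order)
    buf.seek(0)
    print(order, read_records(buf) == records)
```

Whichever byte order the writer picked, the reader recovers the same values, because it never assumes the file matches its own architecture. A real library adds a great deal on top of this (parallel access, chunking, nested types), which is the reason to use one rather than roll your own.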
Of all the options, HDF5 (from the NCSA) is the most widely used. I would say that the majority of scientific and distributed software out there that uses platform-independent typing uses HDF. So does the grid computing system Globus. The other platform-independent complex data typing libraries, CDF (from NASA) and NetCDF (from Unidata), are rarely used. Indeed, the next generation of NetCDF - version 4 - will be built on top of HDF5. There's a link to the development site and the source code on Freshmeat.
Less widely used, but still very significant, is the Transparent Parallel I/O Environment. I am not 100% sure whether it supports MPI; it's been a while since I've used it, and I never put in the dependencies on Freshmeat for it.
Depending on what is being done, PETSc may also be worth checking out. It provides MPI-based parallel solvers for problems modeled by partial differential equations.
Globus can use MPI for communication and then handle the I/O directly. This means you only have to write your interface for one API, not one API per type of operation. The main problem is that Globus has a fairly large footprint, so you might not want to go that route unless the project is large enough to warrant that kind of sophistication.
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)