Slashdot Mirror


Looking for Portable MPI I/O Implementation?

rikt writes "I am trying to implement MPI I/O for our CFD product, and I am facing a problem with the portability of the generated data files. The MPI-2 interface describes a way to achieve this, either by using 'external32' or user-defined data representations. The problem is that ROMIO, the most widely available MPI I/O implementation, has not implemented support for any data representation other than 'native'. Do you know of any MPI I/O implementation that supports this and is available on various platforms? I know IBM and Sun support this, but I am looking for a solution on Linux and Windows (both 32- and 64-bit) as well."

36 comments

  1. LAME by Anonymous Coward · · Score: 0

    Just explain a few of the TLAs, mmkay?

  2. Huh? by ThinkingInBinary · · Score: 0, Offtopic

    Am I the only one who thinks this is way too specific a question for Ask Slashdot, and that the subject at hand is not one that most people would know offhand? This seems much more appropriate to a topic-specific newsgroup or mailing list, not the general tech community.

    1. Re:Huh? by Usquebaugh · · Score: 2, Insightful

      Go away and read.

      I hate the drivel questions asked on /. This type of question is great; it usually means I'll go off and read up on this stuff.

    2. Re:Huh? by Anonymous Coward · · Score: 0

      No; this type of question is wrong on slashdot. slashdot IS NOT a support site, and it IS NOT a specific newsgroup. If such a story is posted, there are some conclusions to draw:
      - the story poster is trying to impress the other readers, most of whom probably don't know what s/he is talking about
      - the story poster is too lazy/stupid to look up the correct support channel (be it a manufacturer support line, a newsgroup, a specific web board...)

    3. Re:Huh? by curious.corn · · Score: 2, Funny

      So it's a place for socially impaired would-be geeks wasting their early adulthood modding cases with cold cathode tubes? Or is it a place for outrageous infomercial placements between the odd "geekly spinned" drivel on some old, rehashed, long-gone fad? Or is it a place where, once again, would-be IT geeks dispense their judgements on the latest linux install screenshots... bitterly chastising whoever dares to perturb their self-assessed expertise with questions and topics they can't even fathom? Your attitude is what makes up generalist TV: nothing too complicated to challenge the audience, as it might be offended by it.

      --
      I wonder who is the instigator behind all the crap I do - Altan
    4. Re:Huh? by OwlofCreamCheese · · Score: 1

      That rant was good right up to the part where you turned around and went on an elitist rant of your own about the devil TV.

      --
      -You're wasting your time. Alfador only likes me.
    5. Re:Huh? by StrawberryFrog · · Score: 1

      If you write "I am trying to implement MPI I/O for our CFD" I would expect you to explain or otherwise link those terms. Explication is required for a general audience, even an intelligent one.

      --

      My Karma: ran over your Dogma
      StrawberryFrog

  3. Definitions by Godeke · · Score: 5, Informative

    I'm a geek who does administration and programming in Windows and Linux realms, am fairly aware of my acronym soup and yet this left me, um, cold. For those who don't feel like doing the research:

    MPI: Message Passing Interface, a standard for parallel processing environment message passing.
    MPI-2: Extended version of MPI.
    MPI-IO: Parallel input/output extensions for MPI, included in MPI-2
    ROMIO: An implementation of these extensions.
    CFD: Computational Fluid Dynamics (a good candidate for parallel processing, thus the interest in the above).

    Of course, the fact I had to look them up means I have no idea about implementations, but at least others won't have to wonder what all that was about.

    --
    Sig under construction since 1998.
    1. Re:Definitions by RabidMonkey · · Score: 1

      thank you - I've been on /. for a number of years, and this is the first story that I didn't understand one bit of.

      gobbledygook

      --
      We emerge from our mother's womb an unformatted diskette; our culture formats us. - Douglas Coupland
  4. Maybe I am missing something by sfcat · · Score: 2, Insightful
    But I think native is just a bit vector. So implementing your own messages on top of it is possible, assuming that you won't have to move messages between little-endian and big-endian machines and that the compilers for the various machines lay out the structs (or other data structures) the same way in memory.

    Think of it this way.

    /* Every message struct begins with messageType, followed by messageSize. */

    typedef struct MessageStruct {
        int messageType;
        int messageSize;
        /* some message data */
    } Message;

    ...

    /* we send the message here */

    Message msg;
    msg.messageType = messageType;
    msg.messageSize = sizeof(Message);
    msg.data = someData; /* repeat for each part of the message struct */
    SendMessage(&msg, msg.messageSize);

    /* we receive the message here; MPIMessage stands for the raw received buffer, which starts with the same two fields */

    MPIMessage raw;
    if (raw.messageSize == sizeof(Message)) {
        Message *received = (Message *)&raw;
        /* do stuff with received */
    }

    I think something along these lines should work. Just make a struct for each type of message your app has. Then check the size and type elements of the structs to determine which type of message you have received. You can also make a struct with just a type and size field, copy the first 8 bytes of the message into it, and use that to determine the type of message. I'm sure I am missing some implementation details, but something like this should handle your problem.

    --
    "Those that start by burning books, will end by burning men."
    1. Re:Maybe I am missing something by Krellan · · Score: 3, Informative

      MPI is already very good at converting data between the various computers involved in a parallel MPI program.

      There's almost an absurd number of datatype declaration, conversion, etc. functions in MPI. If you properly set up MPI_Datatype types to hold your data, then the MPI library will be able to handle it all internally. Then, when sending and receiving messages, it will automatically do conversions as needed (between big-endian and little-endian machines, and so on).

      So the problem isn't one of sending/receiving data between machines of differing architecture. The problem is writing this data to a file, and then reading it in again at a later date, possibly on a different machine. This is a harder problem.

      The MPI I/O extensions (part of MPI-2) tried to address this somewhat. There is a file format "external32" in the spec, that was supposed to be universal, with a standard encoding for all datatypes, and so on. However, evidently it was never implemented fully, as I haven't been able to find it.
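      For anyone wondering what external32 would put on disk: as far as I read the MPI-2 text, everything is big-endian, with IEEE 754 formats for floating point. Here is a minimal sketch of doing that by hand for a double, with no MPI involved. The function names are made up for illustration, and it assumes the host double is already a 64-bit IEEE 754 value stored in the same byte order as the host's integers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store a double as 8 bytes, most significant first.
   Assumes the host double is a 64-bit IEEE 754 value. */
void put_double_be(unsigned char out[8], double value)
{
    uint64_t bits;
    assert(sizeof(double) == sizeof(uint64_t));
    memcpy(&bits, &value, sizeof bits);   /* copy the bit pattern, no conversion */
    for (int i = 0; i < 8; i++)
        out[i] = (unsigned char)(bits >> (56 - 8 * i));
}

/* Hypothetical helper: rebuild a double from 8 big-endian bytes. */
double get_double_be(const unsigned char in[8])
{
    uint64_t bits = 0;
    double value;
    for (int i = 0; i < 8; i++)
        bits = (bits << 8) | in[i];
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

      Because the bytes are produced with shifts rather than by dumping memory, the same code gives the same file contents on little-endian and big-endian hosts.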

    2. Re:Maybe I am missing something by quasi_steller · · Score: 2, Informative

      When I did MPI projects for school I essentially did this when I wanted to send something in a struct. However, as one poster already pointed out, MPI takes care of the conversions between big and little endian. If you have a homogeneous network, you'll probably be okay just sending a struct as a buffer. That said, if you want something a little more robust, MPI does have rather extensive user-defined datatype creation capabilities.

      I learned a little about these capabilities when I wanted to know how to send a struct over MPI while doing a school project. (I wanted to do things "The Right Way" (TM). ) However, the definition of MPI datatypes seemed a little too in depth for a simple school project so I ended up just sending the struct as a buffer which worked fine. For a project that is a little bigger, and needs to be a little more robust, I would suggest learning how to create MPI datatypes. Funny thing is, when looking up this stuff on Google now, I'm finding better resources on sending structs over MPI than when I had my Parallel class last spring, dern it!

      Google search

      --
      ...interesting if true.
  5. If last resort try human-readable text by Krellan · · Score: 3, Informative

    I take it you've already read section 7.5 in MPIv2. If you haven't, now's the time!

    Unfortunately, I know of no other MPI I/O implementations, other than ROMIO, that can simply be plugged into an existing MPI stack. You might want to ask around at the new project OpenMPI, a new-from-the-ground-up MPI implementation that is currently in development. I'd be curious to learn the level of MPI I/O support that they claim!

    Assuming you are stuck with a MPI stack that only supports the "native" representation, the problem you face becomes one of data representation in general. As you know, there's bajillions of different ways of storing floating-point numbers, and if you write them to disk, the files will be only valid for exactly that CPU.

    As a last resort, a brute-force solution is to write the numbers as human-readable text, and then parse them in again accordingly. It's a waste of file space, but there's no ambiguity in the datatype representation, and it is very tolerant of floating point differences between machines.

    -1.2345234523452345
    2.345634563456365e+13
    -3.2121212121e-24
    And so on.

    This shouldn't be much of a hotspot in your code, since ideally it would only be done at start, stop, and checkpoint time. Also, if you need parallelism, and don't care about wasted file space or future precision improvements, you could use a fixed-length string for each number (with much padding), thus allowing you to read your numbers random-access instead of sequentially.
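    A minimal sketch of the fixed-length idea in C, assuming a record of 24 right-aligned characters plus a newline. The record width and function names are my own choices for illustration; 17 significant digits is enough to round-trip an IEEE 754 double, and `%.17g` output never exceeds 24 characters for a double.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Every number occupies exactly RECORD_LEN bytes: 24 characters of
   right-aligned text followed by a newline. */
#define RECORD_LEN 25

/* Store value as record number `index`; buf must hold at least
   (index + 1) * RECORD_LEN bytes. */
void write_record(char *buf, size_t index, double value)
{
    char tmp[RECORD_LEN + 1];
    int n = snprintf(tmp, sizeof tmp, "%24.17g\n", value);
    assert(n == RECORD_LEN);              /* fixed width is what makes seeking work */
    memcpy(buf + index * RECORD_LEN, tmp, RECORD_LEN);
}

/* Read record number `index` back without touching any other record. */
double read_record(const char *buf, size_t index)
{
    char tmp[RECORD_LEN + 1];
    memcpy(tmp, buf + index * RECORD_LEN, RECORD_LEN);
    tmp[RECORD_LEN] = '\0';
    return strtod(tmp, NULL);
}
```

    With a real file you would seek to index * RECORD_LEN (or pass that offset to something like MPI_File_read_at) instead of indexing a memory buffer, so each process can fetch its own numbers independently.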

    Hope this helps!

    Josh

    1. Re:If last resort try human-readable text by Anonymous Coward · · Score: 0

      Open MPI is using ROMIO for MPI-IO, so that isn't going to help in terms of external32 support.

      Glad to see someone actually interested in this...perhaps we'll finally implement it!

      Rob Ross
      Argonne National Laboratory

  6. This one's easy. by jd · · Score: 4, Informative
    First, you want to use Open MPI (the latest and greatest MPI implementation) or MPICH (which is not so good, but is solid and widely used, so will be easier to work with for portable I/O packages).


    Now, we move on to the portable I/O. The vast majority of scientific software (which is, in turn, the bulk of MPI-based software) uses the Hierarchical Data Format. There are two versions worthy of mention - HDF5 and Parallel HDF. Both support MPI in operations. Compile HDF5 with MPI support, and you have something that will support platform-independent atomic and compound data types.


    Of all the options, HDF5 (from the NCSA) is the most widely used. I would say that the majority of scientific and distributed software out there that uses platform-independent typing uses HDF. So does the grid computing system Globus. The other platform-independent complex data typing libraries, CDF (from NASA) and NetCDF (from UniData), are rarely used. Indeed, the next generation of NetCDF - version 4 - will be built on top of HDF5. There's a link to the development site and the source code on Freshmeat.


    Less widely used, but still very significant, is the Transparent Parallel I/O Environment. I am not 100% sure if this supports MPI; it's been a while since I've used it, and I never put in the dependencies on Freshmeat for it.


    Depending on what is being done, PETSc may also be worth checking out. This supports MPI-based differential equations.


    Globus can use MPI for communication and then handle the I/O directly. This means you only have to write your interface for one API, not one API per type of operation. Main problem is that Globus has a fairly large footprint, so you might not want to do that unless the project is large enough to warrant that kind of sophistication.

    --
    It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    1. Re:This one's easy. by Salis · · Score: 2, Informative

      NetCDF is used more by non-computer scientists who need to store lots of data. It has an easier-to-use API. But, HDF5 can do more useful things. When NetCDF uses HDF5 as its underlying format, it'll get the best of both worlds: good API, good data structure.

      When I started a software project about two years ago, I looked at both NetCDF and HDF5 for data formats. I chose NetCDF and have had zero problems (and it's been very easy to get the software working nicely). I think using HDF would have added another 6 months of development time.

      In the end, it won't matter: they'll both be equivalent.

      --
      Favorite /. tagline: "On the eighth day, God created FORTRAN." And it was good.
    2. Re:This one's easy. by Bill+Barth · · Score: 1

      Just so you know, the vast majority of scientific software does not use HDF5. We'd love it if they did, but I can guarantee you that nothing like 75% (i.e. a vast majority) of scientific codes use HDF (4 or 5). In fact, I'd wager that it's 20% or less (whether measured in numbers or cycles).

      --
      Yes...I am a rocket scientist.
    3. Re:This one's easy. by Anonymous Coward · · Score: 0

      When netCDF starts using HDF5 as its underlying format, we're likely to take a step backwards because HDF5 data structures are more complicated than necessary for netCDF.

      The HDF5 data structures are fine for serial I/O, but all the functionality in HDF5 makes for a very difficult time of providing efficient concurrent I/O. The netCDF file format is much more amenable to concurrent I/O. Those of us interested in performance at scale are quite happy with the netCDF file format just the way it is.

      Rob Ross
      Argonne National Laboratory

    4. Re:This one's easy. by UtucXul · · Score: 1

      I would love to use hdf5 for the software I use and maintain (ZEUS-MP), but it only supports Fortran 90 (or C). So people like me stuck with Fortran 77 still have to use hdf4, which is nowhere near as nice as hdf5.

    5. Re:This one's easy. by Bill+Barth · · Score: 1

      If you really want to use HDF5 with Fortran 77, just make a few wrappers for the C APIs that convert pass-by-reference to pass-by-value where appropriate. You don't have to do it for the whole HDF5 API, just the parts of it that you need.
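      A minimal sketch of what such a wrapper looks like. Note that `lib_sum_block` is a stand-in I invented so the example is self-contained; it is not an HDF5 call. The lower-case name with a trailing underscore is the symbol convention many Unix Fortran compilers use, but that is compiler-dependent, so treat it as an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a C library routine that takes scalars by value.
   NOT a real HDF5 function; it sums the buffer so the sketch can be checked. */
double lib_sum_block(int file_id, size_t count, const double *data)
{
    double total = 0.0;
    (void)file_id;                        /* unused in this stand-in */
    for (size_t i = 0; i < count; i++)
        total += data[i];
    return total;
}

/* Wrapper intended to be callable from Fortran 77 as
       CALL SUMBLK(FID, N, DATA, TOTAL)
   Fortran passes every argument by reference, so each scalar arrives
   as a pointer and is dereferenced before calling the C routine. */
void sumblk_(const int *file_id, const int *count,
             const double *data, double *total)
{
    assert(*count >= 0);
    *total = lib_sum_block(*file_id, (size_t)*count, data);
}
```

      Arrays need no conversion since both languages hand over the address of the first element; only the scalars change from pointer to value.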

      --
      Yes...I am a rocket scientist.
  7. I second String representation by davecrist · · Score: 3, Informative

    I was going to suggest string representations, too... I am working on an MPI project that deals with passing a lot of stuff around and found that the method of structure passing in MPI caused us to have to represent the structure specifically byte-by-byte anyway, so we have just stuck with doing everything as character arrays in specific formats...

    The main benefit for us was that our message passing code became generic and we got the side effect of passing large values between machines without respect for endianness or word size.

    hope that helps,


    dave

  8. My suggestion.. by Mr2cents · · Score: 3, Funny

    Have you tried reversing the polarity?

    --
    "It's too bad that stupidity isn't painful." - Anton LaVey
    1. Re:My suggestion.. by Profane+MuthaFucka · · Score: 1

      Haven't you heard that you should never apply a Star Trek solution to a Slashdot problem?

      --
      Fascism trolls keeping me up every night. When I starts a preachin', he HITS ME WITH HIS REICH!
    2. Re:My suggestion.. by ZeroEpoch · · Score: 0

      Reversed polarity is a common problem that blows many ICs internally. Not related here, but kind of funny.

    3. Re:My suggestion.. by psykocrime · · Score: 1

      Haven't you heard that you should never apply a Star Trek solution to a Slashdot problem?

      In that case, maybe he could fix his problem with a Sonic Screwdriver?

      --
      // TODO: Insert Cool Sig
  9. If you are looking for a commercial solution by Leimy · · Score: 2

    http://www.verarisoft.com/

    I work there and I worked on our MPI-IO implementation. I'm sure we'd like to find a way to help you out if you aren't against paying for the software.

    1. Re:If you are looking for a commercial solution by sushant · · Score: 1

      I am looking at all possible options, including commercial ones. I looked for information on your site earlier, but could not find much. Please tell me whether your MPI-Pro supports this.

  10. fans by SporkLand · · Score: 1

    you just got added to my friends list, and I just got added to your fans.

  11. BINGO! by dpilot · · Score: 1

    Filled a row on my Buzzword Bingo card in just the summary, didn't even need to follow the links.

    Thanks for the definitions, now the summary makes a little more sense.

    --
    The living have better things to do than to continue hating the dead.
  12. Open MPI by jsquyres · · Score: 2, Informative
    Greetings. I'm one of the developers from Open MPI. We currently include ROMIO (just like everyone else), but we did two important things:
    1. We properly wrapped it such that I/O requests are of type MPI_Request, not MPIO_Request. Hence, you can actually progress IO requests, generalized requests, and point-to-point requests in a single MPI_WAITALL (or MPI_TESTALL, or any of the other variants)
    2. Our MPI-2 IO support is based on a component framework -- so replacing ROMIO is not only easy, it's encouraged! We had always intended ROMIO to be a stopgap solution until we could implement "something better" (as yet to be defined). We would love to have someone with expertise in this area to a) help define what a "better" component interface should be (our ROMIO interface is a simple one-to-one function mapping), and b) write one or more components to implement this in a generic and/or proprietary way.
    That's a long way of saying: "E-mail me and let's talk." :-)
    1. Re:Open MPI by Anonymous Coward · · Score: 0

      When exactly are we going to see a real release? SC04? Wait...

  13. do you really need external32? by rizzy · · Score: 1

    I think his post is a little off the mark, but jd (1658) has the right idea -- if you are concerned about portable representation of your data, you might want to use a higher-level library. HDF5 and NetCDF are good choices.

    Even better might be Parallel-NetCDF. It has all the benefits of a high-level library (portable, self-describing data representation), but it has a much simpler interface than HDF5. Unlike serial NetCDF, you'll probably see much better performance, as all processes can carry out I/O collectively instead of forwarding to a master.

  14. String representation = SLOW by Anonymous Coward · · Score: 0

    Strings are fine for marshalling/unmarshalling numbers if the remote computation is very long compared to the time it takes to parse/encode the numbers. But if your remote computation is relatively short-lived then you could waste 90% of your CPU resources. Binary is the way to go if you can.

  15. XDR by Minna+Kirai · · Score: 1

    Nobody has mentioned XDR yet. Although it's probably not what you'd really prefer (as NetCDF seems to be very popular with CFD projects, and there's presumably a good reason), I'll describe it for completeness. eXternal Data Representation sits nicely at the midpoint between pure native data and human-readable text of the numbers (including XML).

    Free libxdr code is available everyplace, although often quite ancient (some written in 1982 or so). Just run your data structs through xdr calls, write it out with the MPI native interface, and apply reverse versions of the same calls when loading. It's a lot like the htonl/ntohl functions you may be familiar with (but more elaborate, of course, because it handles marshalling and padding in addition to just endianness).

    Advantage of XDR: files are portable, but hardly any bigger than the original data in RAM (perhaps even a little smaller)

    Disadvantage of XDR: files are not human-readable, which slows down debugging and makes it harder to tell others how to inspect your files. Plus, it simply lacks the cool factor of XML.

    http://www.faqs.org/rfcs/rfc1832.html
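    To give a feel for the wire format from that RFC without depending on any particular libxdr, here is a hand-rolled sketch of two of its rules: integers are 4 bytes, most significant byte first, and strings are a length word followed by the bytes, zero-padded to a multiple of 4. The `wire_` function names are mine and are not part of any XDR library.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Write a 32-bit unsigned integer as 4 big-endian bytes. Returns bytes written. */
size_t wire_put_u32(unsigned char *out, uint32_t v)
{
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
    return 4;
}

/* Read a 32-bit unsigned integer from 4 big-endian bytes. */
uint32_t wire_get_u32(const unsigned char *in)
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}

/* Write a string: length word, then the bytes, then zero padding up to
   the next 4-byte boundary. Returns total bytes written. */
size_t wire_put_string(unsigned char *out, const char *s)
{
    size_t len = strlen(s);
    size_t padded = (len + 3) & ~(size_t)3;
    assert(len <= UINT32_MAX);
    wire_put_u32(out, (uint32_t)len);
    memcpy(out + 4, s, len);
    memset(out + 4 + len, 0, padded - len);
    return 4 + padded;
}
```

    A real library also covers doubles, arrays and unions, so this is only meant to show why the files come out portable and compact.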

  16. Thank you all by sushant · · Score: 1

    Hello friends, thank you all for your responses. I have looked into your suggestions. As for HDF5, I agree that it would be a good choice, but due to some other constraints I had to choose MPI. Now the question is: if I can't get a portable MPI I/O implementation, what can I do to make the files portable? The software I am working on can already do this for serial I/O by storing the machine information in the file and running its own converter. I wanted to avoid all that work and was relying on the MPI interface to do the job (that is why I asked about the external32 representation). I am sorry if I am posting this too late; I only get access to the net from my office.
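    For reference, a minimal sketch of the "store the machine information in the file" fallback, assuming the only difference between machines is byte order and that both sides use 64-bit IEEE 754 doubles. The magic value and function names are invented for illustration: the writer puts a known 32-bit marker at the start of the file in its native order, and the reader swaps bytes if it sees the marker reversed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ORDER_MAGIC 0x01020304u  /* arbitrary marker, written in native order */

/* Reverse the 8 bytes of one double in place. */
void swap8(void *p)
{
    unsigned char *b = p, t;
    for (int i = 0; i < 4; i++) {
        t = b[i];
        b[i] = b[7 - i];
        b[7 - i] = t;
    }
}

/* 0: file matches host byte order, 1: opposite order, -1: marker not recognized. */
int needs_swap(const unsigned char header[4])
{
    uint32_t magic;
    memcpy(&magic, header, sizeof magic);
    if (magic == ORDER_MAGIC) return 0;
    if (magic == 0x04030201u) return 1;
    return -1;
}

/* Convert `count` doubles, as read raw from the file, to host order in place. */
int fix_doubles(const unsigned char header[4], double *values, size_t count)
{
    int swap = needs_swap(header);
    assert(values != NULL || count == 0);
    if (swap < 0) return -1;
    if (swap)
        for (size_t i = 0; i < count; i++)
            swap8(&values[i]);
    return 0;
}
```

    The data itself can then still be written with the 'native' representation; only the reader pays for the conversion, and only when the machines differ.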