Slashdot Mirror


Help/Opinions on Parsing OFX Files?

innerweb asks: "I am looking for help and advice on using and parsing the OFX (Open Financial Exchange) file spec using C/C++ and/or Perl. I have read the standards, downloaded the DTD (OFX version 2), and tried to parse several files from different banks. They have all failed in my normal parsers (commercial and OSS), yet they load fine in Microsoft Money. It is not so complicated that I cannot hand roll my own, and I have much of it working that way as a proof, but I would rather stick with something that is standards based, as this is a standard that in my opinion ought to work with standards based tools. Am I missing something here, or is this truly a file format that is broken as a feature?"

"I know the files are malformed when they come down, as they are missing the normal XML and SGML file headers ?XML or !DOCTYPE to define the DTD to use to parse the file. I know that the document is not 'well formed' as I understand it, as most of the tags in the datafile are not closed (open tag, but no corresponding closing tag). When I fix these errors, the files seem to parse. Yet from what I have seen, MS Money takes in the same raw data and parses it. Microsoft lists the OFX file format as XML in some places and SGML in others. The OFX website seems to be saying this is SGML, not XML. (XML is a subset of SGML in most cases, but the way it is *used*, sometimes it is not really SGML at all.)

I have been reading like mad for a few weeks on OFX format files and usage, but not getting much useful information. I have worked with SGML in the past and XML, so I am at least familiar with these *conventions*. I need to be pointed in the right direction, and/or told what I am doing wrong or overlooking. I know it is probably something obvious, but somehow I am not getting it.

Thanks in advance for any help that you can throw my way."

6 of 49 comments

  1. Gnucash. by Anonymous Coward · · Score: 2, Informative

    You did check Gnucash's importers to see how they did it, right?

    1. Re:Gnucash. by GigsVT · · Score: 3, Informative

      In addition to the other excellent replies, it's a misconception that code ever unwittingly "becomes GPL".

      If you use GPL code in your application in a way that violates the GPL, you have violated copyright law. That's it. The GPL doesn't control the remedies, the legal system does. Generally the remedy would be in the form of a cash settlement to the copyright owner of the software you violated the copyright on.

      --
      I've had enough abrasive sigs. Kittens are cute and fuzzy.
  2. Um, the answer is in the link you posted. by joto · · Score: 5, Informative
    OFX is based upon SGML and, like XML, it is an attempt to take the best features of SGML and remove much of the associated complexity. OFX is not technically an XML application. The syntax of OFX differs from that set out for XML applications in that OFX omits end-tags

    It's SGML, not XML. Unless you insist on doing it the hard way with a real SGML parser, I can't see what's wrong with using your own hand-rolled one. As you've already recognized yourself, it shouldn't be too hard.

    Other alternatives would be to have a preprocessor that converts it to XML, or maybe to use some overly tolerant XML parser. On the other hand, if the file format isn't XML, I can't see why it would be easier to treat it as if it were.

    Am I missing something here, or is this truly a file format that is broken as a feature?

    Mu. Yes, you are missing the distinction between XML and SGML. No, it's not broken as a feature, it just predates XML.
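
    The preprocessor idea mentioned above could look roughly like the following. This is only a sketch under stated assumptions, not how any existing tool does it: it assumes that an opening tag followed by character data is a leaf element whose end tag was omitted, and that an opening tag followed directly by another tag is a container that carries its own end tag. The function name `ofx_sgml_to_xml` and the sample data are made up for illustration, and stray `&` characters in the data are not escaped.

```python
import re
import xml.etree.ElementTree as ET

# One tag plus any character data that follows it, up to the next "<".
TOKEN = re.compile(r"<(/?)([A-Za-z0-9_.]+)>([^<]*)")


def ofx_sgml_to_xml(body: str) -> str:
    """Add the end tags that SGML-style OFX omits (hypothetical helper)."""
    out = []
    open_aggregates = []  # container elements still waiting for their end tag
    for closing, name, text in TOKEN.findall(body):
        text = text.strip()
        if closing:
            if open_aggregates and open_aggregates[-1] == name:
                open_aggregates.pop()
                out.append(f"</{name}>")
            # Any other end tag belongs to a leaf we already closed
            # ourselves (some files do include them), so it is dropped.
        elif text:
            # Assumption: a tag followed by data is a leaf; close it now.
            out.append(f"<{name}>{text}</{name}>")
        else:
            # Assumption: a tag followed by another tag is a container.
            open_aggregates.append(name)
            out.append(f"<{name}>")
    return "".join(out)


# Made-up sample in the shape of a sign-on response; not real bank data.
SAMPLE = """
<OFX>
<SIGNONMSGSRSV1>
<SONRS>
<STATUS>
<CODE>0
<SEVERITY>INFO
</STATUS>
<DTSERVER>20050101120000
<LANGUAGE>ENG
</SONRS>
</SIGNONMSGSRSV1>
</OFX>
"""

if __name__ == "__main__":
    root = ET.fromstring(ofx_sgml_to_xml(SAMPLE))
    print(root.findtext("SIGNONMSGSRSV1/SONRS/STATUS/CODE"))
```

    The obvious weak spot is a leaf with an empty value, which this heuristic would mistake for a container. Deciding that properly needs the DTD, which is the argument for a real SGML parser.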

  3. check out libofx by UncleBoy · · Score: 2, Informative

    Get someone to show you how to use google.com while you're at it.

  4. LibOFX by Noksagt · · Score: 3, Informative

    They use LibOFX, available under the GPL.

  5. Re:Is it an SGML application? by innerweb · · Score: 3, Informative
    I will be the first to admit it has been a while since I worked with SGML, but IIRC, in the DTD for an SGML doc, you mark optional tags with an O, so that it would look like this:
    <!ELEMENT elemname - O (#PCDATA) >
    where the - means the opening tag is required, and the O means the closing tag is not required.
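
    To make the two flags concrete, here is a small sketch that pulls them out of a declaration like the one above. It is an illustration only, not a DTD parser: the name `tag_omission` is made up, it assumes a single element name (no name groups), and it assumes both omission flags are present.

```python
import re

# <!ELEMENT name start-flag end-flag content>, where each flag is "-" or "O".
ELEMENT_DECL = re.compile(
    r"<!ELEMENT\s+(?P<name>\S+)\s+(?P<start>[-Oo])\s+(?P<end>[-Oo])\s+"
    r"(?P<content>.+?)\s*>",
    re.DOTALL,
)


def tag_omission(decl: str) -> dict:
    """Report which tags a declaration requires (hypothetical helper)."""
    match = ELEMENT_DECL.search(decl)
    if match is None:
        raise ValueError("no element declaration with omission flags found")
    return {
        "name": match.group("name"),
        # "-" means the tag is required; "O" means it may be omitted.
        "start_tag_required": match.group("start") == "-",
        "end_tag_required": match.group("end") == "-",
        "content": match.group("content"),
    }


if __name__ == "__main__":
    print(tag_omission("<!ELEMENT elemname - O (#PCDATA) >"))
```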

    Whether or not I think I remember it, OFX has sent me back to books I have had in boxes for almost a decade now. Normally when I pull old books out like that, they are picture albums for the family, not old programming and data manuals.

    InnerWeb

    --
    Freud might say that Intelligent Design is religion's ID.