Help/Opinions on Parsing OFX Files?
innerweb asks: "I am looking for help and advice on using and parsing the OFX (Open Financial Exchange) file spec using C/C++ and/or Perl. I have read the standards, downloaded the DTD (OFX version 2), and tried to parse several files from different banks. They have all failed in my normal parsers (commercial and OSS), yet they load fine in Microsoft Money. It is not so complicated that I cannot hand-roll my own, and I have much of it working that way as a proof, but I would rather stick with something that is standards-based, as this is a standard that in my opinion ought to work with standards-based tools. Am I missing something here, or is this truly a file format that is broken as a feature?"
"I know the files are malformed when they come down, as they are missing the normal XML and SGML file headers (?XML or !DOCTYPE) that define the DTD to use to parse the file. I know that the document is not 'well formed' as I understand it, as most of the tags in the data file are not closed (open tag, but no corresponding closing tag). When I fix these errors, the files seem to parse. Yet from what I have seen, MS Money takes in the same raw data and parses it. Microsoft lists the OFX file format as XML in some places and SGML in others. The OFX website seems to be saying this is SGML, not XML (XML is a subset of SGML in most cases, but the way it is *used*, sometimes it is not really SGML at all).
I have been reading like mad for a few weeks on OFX format files and usage, but not getting much useful information. I have worked with SGML and XML in the past, so I am at least familiar with these *conventions*. I need to be pointed in the right direction, and/or told what I am doing wrong or overlooking. I know it is probably something obvious, but somehow I am not getting it.
Thanks in advance for any help that you can throw my way."
You did check GnuCash's importers to see how they did it, right?
It's SGML, not XML. Unless you insist on doing it the hard way with a real SGML parser, I can't see what's wrong with using your own hand-rolled one. As you've already recognized yourself, it shouldn't be too hard.
Other alternatives would be to have a preprocessor that converts it to XML, or maybe to use some overly tolerant XML parser. On the other hand, if the file format isn't XML, I can't see why it would be easier to treat it as if it were.
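For what it's worth, the preprocessor route can be quite small. Below is a rough sketch in Python, not a tested importer: it throws away everything before the first "<" and adds an end tag to every element that carries a value. The function name sgml_to_xml and the sample data are made up for illustration, and it assumes that any tag followed directly by text is a leaf element, which may not hold for every bank's output.

```python
import re
import xml.etree.ElementTree as ET

# "<TAG>value" with an optional "</TAG>" right after the value.
# The value runs up to the next "<" or the end of the line.
LEAF = re.compile(r"<([A-Za-z0-9_.]+)>([^<\r\n]+)(</\1>)?")


def sgml_to_xml(raw: str) -> str:
    """Drop anything before the first tag and close every leaf element."""
    start = raw.find("<")
    if start < 0:
        raise ValueError("no markup found")
    body = raw[start:]

    def close(match):
        tag, value = match.group(1), match.group(2).strip()
        if not value:
            # Only whitespace after the tag: treat it as an aggregate, leave as-is.
            return match.group(0)
        # Emit exactly one end tag, whether or not the input already had one.
        return f"<{tag}>{value}</{tag}>"

    return LEAF.sub(close, body)


# Invented sample: header-style lines first, then unclosed leaf tags.
SAMPLE = (
    "OFXHEADER:100\nDATA:OFXSGML\n\n"
    "<OFX>\n<STMTTRN>\n<TRNTYPE>DEBIT\n<TRNAMT>-5.00\n"
    "<NAME>Coffee Shop</NAME>\n</STMTTRN>\n</OFX>\n"
)

root = ET.fromstring(sgml_to_xml(SAMPLE))
print(root.find("STMTTRN/TRNAMT").text)
```

Note that this does nothing about stray "&" characters in payee names and the like, which a strict XML parser will also reject, so a real preprocessor would need to escape those too.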
Am I missing something here, or is this truly a file format that is broken as a feature?
Mu. Yes, you are missing the distinction between XML and SGML. No, it's not broken as a feature, it just predates XML.
Get someone to show you how to use google.com while you're at it.
GnuCash uses LibOFX, which is available under the GPL.
where the - means the opening tag is required, and the O means the closing tag is not required.
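In practice, an omitted end tag means a leaf element simply ends where the next tag begins. A hand-rolled reader can lean on that. Here is a minimal sketch in Python, assuming (without consulting the DTD) that a tag followed by text is a leaf and a tag followed by nothing is an aggregate with an explicit end tag. The function name parse_ofx_body and the sample input are hypothetical.

```python
import re

# One token per tag: optional "/", the tag name, then any text up to the next "<".
TOKEN = re.compile(r"<(/?)([A-Za-z0-9_.]+)>([^<]*)")


def parse_ofx_body(body: str) -> list:
    """Return a nested list of (tag, value) pairs; value is a str or a child list."""
    root = []
    stack = [("", root)]  # (name of the open aggregate, its list of children)
    for closing, tag, text in TOKEN.findall(body):
        text = text.strip()
        if closing:
            # Pop only when the end tag matches the open aggregate;
            # an end tag that closes a leaf is redundant and ignored.
            if len(stack) > 1 and stack[-1][0] == tag:
                stack.pop()
        elif text:
            # Tag with a value: a leaf, ended implicitly by the next tag.
            stack[-1][1].append((tag, text))
        else:
            # Tag without a value: open a new aggregate.
            children = []
            stack[-1][1].append((tag, children))
            stack.append((tag, children))
    return root


# Invented sample with both omitted and explicit end tags.
SAMPLE = (
    "<OFX><STMTTRN><TRNTYPE>DEBIT<TRNAMT>-5.00<NAME>Coffee Shop</NAME></STMTTRN>"
    "<STMTTRN><TRNAMT>12.50</STMTTRN></OFX>"
)

for tag, value in parse_ofx_body(SAMPLE)[0][1]:
    print(tag, value)
```

The obvious weak spot is an empty leaf with no end tag, which this heuristic would misread as an aggregate. Reading the "- O" flags from the DTD is the only reliable way to tell the two apart.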
Whether or not I think I remember it, OFX has sent me back to books I have had in boxes for almost a decade now. Normally when I pull old books out like that, they are picture albums for the family, not old programming and data manuals.
InnerWeb