Help/Opinions on Parsing OFX Files?
innerweb asks: "I am looking for help and advice on using and parsing the OFX (Open Financial Exchange) file spec using C/C++ and/or Perl. I have read the standards, downloaded the DTD (OFX version 2), and tried to parse several files from different banks. They have all failed in my normal parsers (commercial and OSS), yet they load fine in Microsoft Money. It is not so complicated that I cannot hand roll my own, and I have much of it working that way as a proof, but I would rather stick with something that is standards-based, as this is a standard that in my opinion ought to work with standards-based tools. Am I missing something here, or is this truly a file format that is broken as a feature?"
"I know the files are malformed when they come down, as they are missing the normal XML and SGML file headers ?XML or !DOCTYPE to define the dtd to use to parse the file. I know that the document is not 'well formed' as I understand it, as most of the tags in the datafile are not closed (open tag, but no corresponding closing tag). When I fix these errors, the files seem to parse. yet, I know that from what I have seen, MS Money takes in the same raw data and parses it. Microsoft lists the OFX file format as XML in some places and SGML in others. The OFX website seems to be saying this is SGML, not XML (XML is a subset of SGML in most cases, but the way it is *used* sometimes it is not really SGML at all.)
I have been reading like mad for a few weeks on OFX format files and usage, but not getting much useful information. I have worked with SGML and XML in the past, so I am at least familiar with these *conventions*. I need to be pointed in the right direction, and/or told what I am doing wrong or overlooking. I know it is probably something obvious, but somehow I am not getting it.
Thanks in advance for any help that you can throw my way."
Fix the XML, then parse. Microsoft's parser is probably broken in such a way that it doesn't look for closing tags (think regular-expression matching for the open tag, then extracting data until it hits the next tag). It's not beyond Microsoft to break their own software just to make it difficult for others to use the format, but it does make things harder for us developers.
So, I'd try finding a way to rebuild the XML file before parsing it, but only after detecting if it needs it.
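Here is a rough sketch of that detect-then-repair idea, in Python for brevity (the regular expression should carry over to Perl more or less directly). Everything in it is an assumption for illustration rather than anything taken from the OFX spec or from Money's behavior: it assumes the body starts at the first literal `<OFX>`, that an unclosed tag is always a leaf whose value runs up to the next `<`, and the function names `split_header`, `close_leaf_tags` and `parse_ofx` are made up. It also does nothing about unescaped `&` characters in values, which would still trip an XML parser.

```python
import re
import xml.etree.ElementTree as ET

# An opening tag, some text, and (optionally) its matching closing tag.
# Assumption: tag names use only letters, digits, '_' and '.'.
LEAF = re.compile(r"<([A-Za-z0-9_.]+)>([^<]+)(</\1>)?")


def split_header(raw):
    """Split the file into (header, body) at the first <OFX> tag."""
    start = raw.find("<OFX>")
    if start == -1:
        raise ValueError("no <OFX> element found")
    return raw[:start], raw[start:]


def close_leaf_tags(body):
    """Add a closing tag after every value that is missing one."""
    def fix(match):
        tag, value, closing = match.group(1), match.group(2), match.group(3)
        stripped = value.rstrip()
        # Already closed, or an aggregate tag followed only by whitespace:
        # leave the text exactly as it was.
        if closing or not stripped.strip():
            return match.group(0)
        trailing = value[len(stripped):]
        return "<%s>%s</%s>%s" % (tag, stripped, tag, trailing)

    return LEAF.sub(fix, body)


def parse_ofx(raw):
    """Parse as-is if possible; repair and retry only if that fails."""
    _, body = split_header(raw)
    try:
        return ET.fromstring(body)
    except ET.ParseError:
        return ET.fromstring(close_leaf_tags(body))
```

Trying the strict parse first means a file that is already well formed is never touched, and the repair step only runs when the parser has actually rejected the input. The returned element can then be queried normally, for example `parse_ofx(text).findtext("SIGNONMSGSRSV1/SONRS/STATUS/CODE")`.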
"Victory means exit strategy, and it's important for the President to explain to us what the exit strategy is." G.W.Bush