Assuming it is not in your power to change the material coming to you, you must change how you process it.
Quite frankly, the most cost-effective way to deal with this problem is to hire an intern, temp or clerk. Train this person to format the material as very plain HTML, to your liking (or XML, or XHTML, or whatever you prefer). Then use your application to apply the style you like to the HTML the temp made.
If you want to involve more programming, you could whip up a parser to validate the intern's work. But the reality of the situation here is that unless you are working on a truly overwhelming volume of documents, it will be much cheaper to use human labor than to invest the programming time to automate the process.
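If you do go the validation route, the checker can stay small. Below is a minimal Python sketch, assuming the intern hands you plain HTML. The tag whitelist, the `PlainHTMLValidator` class and the `validate` function are names made up for illustration; the only library used is the standard `html.parser` module. Adjust the rules to whatever "very plain" means for you.

```python
from html.parser import HTMLParser

# Hypothetical whitelist: edit to match your own definition of "very plain HTML".
ALLOWED_TAGS = {
    "html", "head", "title", "body",
    "h1", "h2", "h3", "p", "ul", "ol", "li",
    "em", "strong", "a", "br",
}
VOID_TAGS = {"br"}  # tags that never take a closing tag


class PlainHTMLValidator(HTMLParser):
    """Collects problems: disallowed tags, stray attributes, bad nesting."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags
        self.errors = []  # human-readable problem reports

    def handle_starttag(self, tag, attrs):
        line, _ = self.getpos()
        if tag not in ALLOWED_TAGS:
            self.errors.append(f"line {line}: tag <{tag}> is not allowed")
        for name, _value in attrs:
            # Assumed rule: the only attribute permitted is href on links.
            if not (tag == "a" and name == "href"):
                self.errors.append(
                    f"line {line}: attribute '{name}' on <{tag}> is not allowed"
                )
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        line, _ = self.getpos()
        if tag in VOID_TAGS:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.errors.append(f"line {line}: unexpected </{tag}>")


def validate(html_text):
    """Return a list of problems found; an empty list means the file passed."""
    validator = PlainHTMLValidator()
    validator.feed(html_text)
    validator.close()
    for tag in validator.stack:
        validator.errors.append(f"<{tag}> was never closed")
    return validator.errors
```

Run `validate()` on each file the intern turns in and send back anything that returns a non-empty list. The nesting check is deliberately crude (one mismatch can produce several reports), which is usually fine for a quick review pass.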
-jr