Fulfilling the Promise of XML-based Office Suites?
brentlaminack asks: "Almost a year ago Tim Bray of XML fame said 'when the huge universe of MS Office documents becomes available for processing by any programmer with a Perl script and a bit of intelligence, all sorts of wonderful new things can be invented that you and I can't imagine.' Now that MS has dropped the ball on the XML Office front, and StarOffice has fulfilled its XML promise, where are all those 'wonderful new things?' Is anybody out there writing Perl/Java/whatever programs to take advantage of StarOffice XML? Could this be an opportunity for Free/Open/Libre software to leapfrog MS Office in real productivity as XML proponents have promised all along?" What kinds of new and wonderful things can you come up with?
Well, I'm taking a break right now from generating new Excel graphs by copying old ones and changing the source data, which isn't so bad, and those fucking error bars, which is. Oh, and the scatter plot points are superimposed so you can't click on the back ones.
So if I could do a find&replace on a flat file, I'd have been done an hour ago.
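For what it's worth, the StarOffice/OpenOffice.org formats get close to that: each document is a zip archive with the content stored in a member called content.xml. A rough sketch of a find-and-replace on it, in Python, might look like the following. The function name and its parameters are made up for illustration, and a plain string replace like this only works when the text you are searching for appears unbroken in the raw XML (no markup in the middle, no escaped characters).

```python
import zipfile


def replace_in_document(src_path, dst_path, old, new, member="content.xml"):
    """Copy a zipped XML office file, replacing text inside one XML member.

    Hypothetical helper: 'member' defaults to content.xml, where
    StarOffice/OpenOffice.org keep the document body.
    """
    with zipfile.ZipFile(src_path) as src, zipfile.ZipFile(dst_path, "w") as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == member:
                # Naive textual replace on the raw XML, not an XML-aware edit.
                data = data.decode("utf-8").replace(old, new).encode("utf-8")
            # Passing the original ZipInfo keeps each entry's compression type.
            dst.writestr(item, data)
```

Something like `replace_in_document("old.sxc", "new.sxc", "Sheet1.A1:A10", "Sheet1.B1:B10")` would then write a copy with the cell range swapped, assuming the range appears verbatim in the XML.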
Other than that, no, I can't imagine either. VBA exists now and it's not like we're all flying around with wings and harps.
The next major release of KOffice is supposed to adopt the OO file formats as its own standard.
Learning HOW to think is more important than learning WHAT to think.
Take a look at AxKit's OpenOffice filter.
I guess there's XML and there's XML and getting between them is not necessarily easy.
Microsoft made a big deal about the most recent versions of Office writing out XML, but that was because XML was a buzzword, sounded as if it might be more open than ".doc", and was essentially a selling point.
From what I've read, people have been underwhelmed with the XML coming out.
If only a similar set of transformations could be developed for OpenOffice to import and export the XML of the latest version of Microsoft Office. From what I understand, the schema is not documented and the formatting and rendering rules for documents are still kept private, just as they have been for .doc files.
You're still locked-in, dude!
At this point, people should realize /. articles are mostly fretards talking out their ass. I too read this article, thinking: wtf? As I am writing this comment, I'm looking at my (beta) Word 2003 file save dialog and an example XML doc I just made. It round-trips all formatting and junk in the XML format. It has a "save data only" checkbox in the Save As dialog, and can support XSL transforms (you supply the XSL) on export. If I cared, I think I could make it export OpenOffice format pretty easily. The high-fidelity XML file has a lot of junk, but it's all XML.
Once you learn how to do it, it is definitely possible from, say, a Java program to connect to a running OOo (OOo == OpenOffice.org), make it open a document, and re-save it in Word format. You can even make OOo do this without flashing anything on the screen.
There is a definite learning curve. You need to learn Uno.
IMHO, despite the learning, this would be way easier than trying to extract the parts of code you need from OOo and building a "converter" program. Maybe I say this because I have spent the time learning Uno and can now program OOo functionality from multiple languages, and how to integrate it into a web server like this seems obvious to me.
I have personally programmed OOo to do things from: OOo-Basic, Java, Python and MS Visual FoxPro. I know from postings from others that it is most definitely possible to use Delphi and VB.
Just as an example of what can be done, I built a maze generator in Java. You can run the maze generator on a different computer, even a different OS. It connects to a running OOo and then creates a multi-page drawing of complex mazes. (You can get it at www.OOoMacros.org or at www.OOoExtras.org.)
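To give a flavor of the connect, open, and re-save idea described above, here is a minimal sketch using the Python Uno bridge. It assumes OOo was started listening on a local socket (port 2002 is an arbitrary choice, and the exact command-line spelling varies between versions), that the `uno` module shipped with OOo is importable, and that the Word export filter is registered under the name "MS Word 97". The helper names `to_file_url` and `convert_to_word` are made up for illustration.

```python
import os
from pathlib import Path

# Assumed setup: OOo started with something like
#   soffice -headless "-accept=socket,host=localhost,port=2002;urp;"
UNO_URL = "uno:socket,host=localhost,port=2002;urp;StarOffice.ComponentContext"
WORD_FILTER = "MS Word 97"  # assumed filter name for .doc export


def to_file_url(path):
    """Turn a local path into the file:// URL form that Uno expects."""
    return Path(os.path.abspath(path)).as_uri()


def convert_to_word(src_path, dst_path):
    """Ask a running OOo to open src_path hidden and store it as .doc."""
    # Imported here so the module still loads where Uno is not installed.
    import uno
    from com.sun.star.beans import PropertyValue

    def prop(name, value):
        p = PropertyValue()
        p.Name, p.Value = name, value
        return p

    local_ctx = uno.getComponentContext()
    resolver = local_ctx.ServiceManager.createInstanceWithContext(
        "com.sun.star.bridge.UnoUrlResolver", local_ctx)
    ctx = resolver.resolve(UNO_URL)  # connect to the running office
    desktop = ctx.ServiceManager.createInstanceWithContext(
        "com.sun.star.frame.Desktop", ctx)
    # Hidden=True loads the document without showing a window.
    doc = desktop.loadComponentFromURL(
        to_file_url(src_path), "_blank", 0, (prop("Hidden", True),))
    try:
        doc.storeToURL(to_file_url(dst_path),
                       (prop("FilterName", WORD_FILTER),))
    finally:
        doc.close(True)
```

The same calls exist in the Java and OOo-Basic bindings, so this is less about Python than about the shape of the Uno API: resolve a context, get the Desktop, load, store, close.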
Those who would give up liberty in exchange for security and DRM should switch to Microsoft Palladium!
I find the easiest way of getting usable XML out of Word is to use Word's Save as HTML function and then run W3C TidyLib to get rid of all (most) of the M$ crap.
This leaves you with an HTML-esque document that you can feed to an XSLT stylesheet and get whatever XML you need.
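As a rough illustration of that last step, here is a sketch that turns tidied XHTML into a small custom XML document. It swaps the XSLT stylesheet for Python's standard-library ElementTree so it runs without extra packages, and it assumes Tidy was told to emit well-formed XHTML with numeric entities. The output element names (`article`, `heading`, `para`) and the function name are invented for the example.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"  # namespace Tidy puts on XHTML output


def xhtml_to_article(xhtml_text):
    """Map headings and paragraphs of an XHTML page onto a made-up schema."""
    root = ET.fromstring(xhtml_text)
    body = root.find(XHTML + "body")
    article = ET.Element("article")
    for node in body.iter():
        tag = node.tag.replace(XHTML, "")
        # Flatten inline markup (bold, spans, fonts) into plain text.
        text = "".join(node.itertext()).strip()
        if not text:
            continue  # skip the empty paragraphs Word likes to emit
        if tag in ("h1", "h2", "h3"):
            ET.SubElement(article, "heading", level=tag[1]).text = text
        elif tag == "p":
            ET.SubElement(article, "para").text = text
    return ET.tostring(article, encoding="unicode")
```

A real stylesheet would do the same kind of mapping declaratively; the point is only that once the input is clean XHTML, the transformation itself is short.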
I did consider using OO to open the Word documents and save them as XML; however, I had trouble with its API. (I also had trouble automating Word, but there I had plenty of bitter experience to draw on.)