Fulfilling the Promise of XML-based Office Suites?
brentlaminack asks: "Almost a year ago Tim Bray of XML fame said 'when the huge universe of MS Office documents becomes available for processing by any programmer with a Perl script and a bit of intelligence, all sorts of wonderful new things can be invented that you and I can't imagine.' Now that MS has dropped the ball on the XML Office front, and StarOffice has fulfilled its XML promise, where are all those 'wonderful new things?' Is anybody out there writing Perl/Java/whatever programs to take advantage of StarOffice XML? Could this be an opportunity for Free/Open/Libre software to leapfrog MS Office in real productivity as XML proponents have promised all along?" What kinds of new and wonderful things can you come up with?
This is just a return to part of what made Unix so powerful in the first place: text formats that can be manipulated by the whole suite of command line tools. "Those who don't understand Unix are doomed to re-invent it, poorly" (Henry Spencer).
Back in the 70s we used nroff/troff for document formatting, producing in some cases professional-quality camera-ready books...but the source code was easily fed to spell checkers, formatting-command-strippers, sort, wc, etc etc etc.
XML is ok...not bad as a meta-format...but it's not some kind of new magic; it's just more of the same as what we always used to do.
The great step forward is moving away from the crud that happened in the middle: proprietary underdocumented binary formats that couldn't be fed to filter pipelines.
In this case, moving backwards is progress. But expecting something amazing to be invented is a bit much; it was already invented a long time ago.
P.S. pet peeve...people credit Knuth (admittedly an amazing guy for the Art of Computer Programming) for reinventing typesetting with TeX. Now, TeX is nicer than nroff/troff in multiple ways, but it's worse in some others (TeX is not set up for command line filters!), and in any case is only an incremental improvement, not a revolution over the older Unix tools. Credit is not properly being given.
Professional Wild-Eyed Visionary
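To make the filter-pipeline point concrete, here is a minimal sketch (not the poster's code) of a tag stripper that turns an office XML file back into plain lines that wc, sort, or a spell checker can read. It assumes paragraphs and headings are elements whose local names are p and h, as in OpenOffice.org's text:p and text:h; the function name plain_text is made up for illustration.

```python
import xml.etree.ElementTree as ET


def plain_text(xml_text: str) -> str:
    """Return one line of plain text per paragraph or heading element."""
    root = ET.fromstring(xml_text)
    lines = []
    for elem in root.iter():
        # Compare only the local name so the namespace URI does not matter.
        # Assumption: paragraphs are <...:p> and headings are <...:h>.
        local_name = elem.tag.rsplit("}", 1)[-1]
        if local_name in ("p", "h"):
            text = "".join(elem.itertext()).strip()
            if text:
                lines.append(text)
    return "\n".join(lines)
```

Printing the result of plain_text() on a document's content.xml gives output that can be piped straight into the usual command-line tools. Paragraphs nested inside other paragraphs (footnotes, for instance) would be emitted twice by this sketch.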
I sure would like an Apache module that can apply CSS to and display native OpenOffice and StarOffice documents.
Got Code?
I created a PHP script a few months ago that allowed a client to upload StarOffice templates for company documents. The script then automatically generated documents by pulling data from a database and inserting it into the StarOffice document.
It was really easy: StarOffice documents are zipped files that contain the XML files. I just unzipped the file, inserted the appropriate data into the content.xml file and zipped it back up.
I was absolutely amazed by how easy the StarOffice files were to work with. I'm really excited about the possibilities that are in store for us, especially ones that are better than my little hack.
Brandon Petersen
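The unzip, edit content.xml, re-zip approach described above can be sketched in a few lines. This is not the poster's PHP script; it is a minimal Python illustration, and the {{name}} placeholder convention and the fill_template function name are assumptions made up for the example.

```python
import io
import zipfile
from xml.sax.saxutils import escape


def fill_template(template_bytes: bytes, values: dict) -> bytes:
    """Copy a zipped document, replacing {{name}} markers in content.xml."""
    src = zipfile.ZipFile(io.BytesIO(template_bytes))
    out_buf = io.BytesIO()
    with zipfile.ZipFile(out_buf, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "content.xml":
                text = data.decode("utf-8")
                for name, value in values.items():
                    # Escape &, < and > so inserted data cannot break the XML.
                    text = text.replace("{{" + name + "}}", escape(str(value)))
                data = text.encode("utf-8")
            # Reusing the original ZipInfo keeps entry order and compression.
            dst.writestr(item, data)
    return out_buf.getvalue()
```

One caveat: a marker typed into the word processor can end up split across several XML elements if its formatting changes partway through, in which case a plain string replace like this will not find it.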
The biggest dream that the financial world has ever had with an XML concept has been the concept of standardised financial reports.
Imagine a world where any financial report (Excel-based or otherwise) from any public company can be compared with any other company's report, and we can all be sure of how the figures were calculated and what they mean.
AND they are fully comparable. And fully importable into any financial package. No longer is any one company dependent on one financial package. Come to think of it, there is no way the vendors of such products will ever allow this to happen!!!
http://www.xbrl.org/
jech
If there were a way to render OpenOffice/StarOffice documents from the command line, it would explode in the reporting area. Being able to have the end user make a really nice template, then have a Perl script fill it in and pass it off to a PDF or a printer, is key.
My team & I just got done building some billing software for one of our customers, and OpenOffice.org's XML based documents turned out to be perfect for generating reports. Our customer is able to open up the document and change the formatting of any report at will, and then we have some Ruby code on the backend that parses the XML document, fills in all the real data from the database and then uses the CLI interface to OpenOffice to render the document as postscript. It was a quick easy way to get powerful report generation with a format that non-technical people could edit that required just a little bit of glue code on the backend, and it's the XML format that made it all possible.
I bring it up because my organization paid Crystal Reports $10,000 to be able to do this. If I could have written a little Perl script that connects to the database and emits an OpenOffice doc, then I could have saved the organization ten thousand dollars, and saved myself a world of pain. (The only thing more evil than Crystal Reports is crystal meth.)
You might be wondering why I wouldn't just use HTML and some library that automatically creates chart PNG images -- the reason is we have to email the report to our board members because they're demanding like that. So we use Crystal to generate pretty PDFs with all the charts. We also let the board members log into our system to generate their own reports via the web, which they can then email to the group.
So having an XML-based document format for this would be wonderful, especially if OpenOffice would provide a command-line utility for converting from OO format to PDF.
Wow, a lucrative publishing contract! I don't have to be evil anymore. --Meteor
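For what it's worth, the command-line conversion wished for above can be sketched as follows. This assumes a recent LibreOffice-style soffice binary on the PATH that accepts the --headless, --convert-to and --outdir options; that is an assumption about present-day builds, not something the posters in this thread had available. The helper names are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(doc: Path, outdir: Path, binary: str = "soffice") -> list:
    """Build the argument list for a headless document-to-PDF conversion."""
    # Assumption: the binary understands LibreOffice-style long options.
    return [binary, "--headless", "--convert-to", "pdf",
            "--outdir", str(outdir), str(doc)]


def convert_to_pdf(doc: Path, outdir: Path) -> Path:
    """Run the conversion and return the path where the PDF is expected."""
    if shutil.which("soffice") is None:
        raise RuntimeError("soffice not found on PATH")
    subprocess.run(build_convert_command(doc, outdir), check=True)
    return outdir / (doc.stem + ".pdf")
```

Keeping the command construction separate from the subprocess call makes it easy to log or inspect exactly what will be run before handing a generated report to the converter.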
At my company, once a failed startup with new life under the wings of a huge corporate parent, we have been using a homebrewed Web publishing system that takes Word 2000 or XP documents, saves them in RTF format, then uses a utility created by Majix to transform the document to XML. From there we use perl, and some XSL to get the document into XHTML combined with some JSP to produce documents that we deploy on our production env. The good part: the system was entirely free of license fees (other than office and Windows of course). The bad: it was a pain in the behind to get all the parts together.
The steps to produce valid XML from Word are the biggest hack I have ever been a part of as an engineer. We had to write a custom VB DLL that we run inside (what else) an IIS server, which takes the documents uploaded by authors, then saves the documents as RTF. Control is then handed over to Tomcat, which takes the RTF and uses some custom classes that make Majix a server to transform the documents into XML. All in all we had to use VB, VBA, Java, JSP; two separate server configurations (IIS and Tomcat); and a bunch of really ugly glue to stitch all the parts together.
I for one, and I am sure I speak for my entire team, would love a solution which saves us this ugly kludge.
XML does make it extremely easy to create documents on the fly, whether a plain old document or a slideshow presentation, all it needs is some template XML, original text, and some programming language to put it together.
I wrote a song lyric storage system using PHP and MySQL, and I had the idea of letting it put a song onto a slideshow to teach it to a group of people (or whatever). With the XML format provided by OpenOffice.org, I was able to quickly put it together and show it off, impressing quite a few people in the process. Of course, those people think Word/PowerPoint run the world, and the file format is all but a mystery to them. Hence having something generated on the fly via a webpage has its cool factor, not to mention it was a good chance to introduce this free word processing suite to them. Also a good chance to tell them that if I were to rely on ASP/PowerPoint it would have cost much, much more.
Open document format is the way to go in the future, because it definitely allows interoperability.
Please direct all bug reports to
I helped spec out a document management metadata database 18 months ago for an engineering firm that wanted to catalog its files. They started out wanting just to categorize their CAD drawings, then decided to include all types of project files.
Our solution was a tcl front end that forced the entry of a minimal amount of metadata *during file creation,* to be picked from preset categories and subcategories. We also provided for free text entry but that was to be used only after the other fields.
The points are
a) The general metadata categories were known; the engineering tasks weren't new.
b) No one is going to go back after the fact and enter the metadata. You have to integrate its entry into the new file work procedure.
c) It's got to be as easy as file/new in a GUI.
d) Its utility has got to be very very apparent when juxtaposed with a subdirectory / filename scheme.
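The "force a minimal amount of metadata at file creation" idea in the points above can be sketched like this. The original front end was written in Tcl; this Python version is only an illustration, and the category names, the CATEGORIES table and the validate_metadata function are all made up for the example.

```python
# Hypothetical preset categories and subcategories; the real ones would
# come from the firm's own project taxonomy.
CATEGORIES = {
    "drawing": {"site plan", "elevation"},
    "report": {"inspection", "estimate"},
}


def validate_metadata(category: str, subcategory: str, notes: str = "") -> dict:
    """Accept a new file's metadata only if it uses the preset choices."""
    if category not in CATEGORIES:
        raise ValueError("unknown category: " + category)
    if subcategory not in CATEGORIES[category]:
        raise ValueError("unknown subcategory: " + subcategory)
    # Free text is optional and comes after the required fields.
    return {"category": category, "subcategory": subcategory,
            "notes": notes.strip()}
```

A file/new dialog would call something like validate_metadata() before creating the file, so the record exists from the start and nobody has to go back and enter it later.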
Not me but I am writing C# apps that make use of Excel's XML format. I wrote about using XSLT on the Excel XMLSS format in my blog a few months ago when I had to update date values in certain columns. I also posted the XSLT stylesheet.
Disclaimer: I work on the XML team at Microsoft but not directly with Microsoft Office.
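As a rough illustration of the kind of date update described above, here is a sketch that uses Python's ElementTree instead of the poster's XSLT stylesheet (which is not reproduced here). It assumes date cells are Data elements in the urn:schemas-microsoft-com:office:spreadsheet namespace with an ss:Type attribute of DateTime and ISO-style text; shift_dates is a made-up name.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta

# Assumed namespace for both the elements and the ss: attributes.
SS = "urn:schemas-microsoft-com:office:spreadsheet"
ET.register_namespace("ss", SS)


def shift_dates(xml_text: str, days: int) -> str:
    """Move every DateTime cell value forward (or back) by whole days."""
    root = ET.fromstring(xml_text)
    for data in root.iter("{%s}Data" % SS):
        if data.get("{%s}Type" % SS) == "DateTime" and data.text:
            # Parse the seconds-resolution part and ignore any fraction.
            old = datetime.strptime(data.text[:19], "%Y-%m-%dT%H:%M:%S")
            new = old + timedelta(days=days)
            data.text = new.strftime("%Y-%m-%dT%H:%M:%S") + ".000"
    return ET.tostring(root, encoding="unicode")
```

Note that ElementTree drops processing instructions that appear before the root element and rewrites namespace prefixes on output, so a real tool would need to restore the leading mso-application instruction if the file is meant to open directly in Excel.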
Well I don't know about Free/Open/Libre or XML development for Office... but I do know about the proprietary APIs Microsoft distributes for Office.
If you wanna give them a try sometime, assuming you've got Windows, VB5+, and Office installed... just add Office to your references (try Microsoft Office in the Project References menu) and give it a whirl. It's fairly easy to program in if you've used Office... most of the concepts that make for a good Office user translate directly into programming concepts for the Office object model.
And yet Office Automation programmers are in scarce supply.
Microsoft even offers a cert specifically for Office Automation programmers!
But I haven't seen too many well-written Office applications. My speculation is that it's not for lack of tools, but for lack of concepts. Other than the obvious reporting needs that any large organization has, are there any compelling reasons to spend an afternoon coding an Office application?
I think it is this lack of compelling reasons, and not a lack of easy-to-use programming tools that causes the lack of good free open add-ins...
I am disrespectful to dirt! Can you see that I am serious?!
I agree with you SOOO much. Oftentimes, it seems applications are written by programmers/computer geeks FOR computer geeks. I work on a workflow-based web application (it uses Oracle Workflow). We recently completely redid the app to do away with the Oracle-generated web pages for "notifications" (stages in the workflow) so that we could build our own and send messages to the engine via the API. Why? Our users just didn't "get" the workflow concepts, and we had to design a vastly more complicated UI that had pictures, etc.
And yet we met with massive resistance from the other IT groups: "Why are you doing that? Workflow does that," "That's a training issue" (code phrase for "the users are stupid"), "Don't you know how to say no?" and (getting to your central point) "You've dumbed it down. Your application doesn't have any of the powerful search, etc., features the workflow web interface has" (never mind that NO ONE used these things).
I think it was a piece by Douglas Adams that told the story of someone he knew, a Word user, who wanted all the junk he didn't use removed from Word's menus. Adams showed him how to remove menu items through customization, and he ended up with just Open, Save, Bold, Italic, Print and Spell check.
DO NOT DISTURB THE SE
On the other hand, OO.o's XML format + schema will be available even to competitors, and theoretically beyond the life span of OO.o. One way for OO.o to encourage users to think in a structured way is through style sheets. Style sheets and document templates can save a lot of wasted time and effort. But again, what would people do with the spare productivity if formatting were done in 5 minutes, instead of spending 2 days manually formatting and reformatting various reports and presentations?
Beta is broken and the link to classic doesn't work. Stop wasting our time or there won't be anybody left here.
Maybe because it's not a closed format, hence all the open-source PDF generation programs.
Frankly, I'd rather see more PDF generation than XML. If I sit down and spend hours designing a book or report, it's more important to know that it will appear as designed than that it can be converted into a mass of raw data and presented in any half-arsed way by someone so primitive that they still think PowerPoint is a pretty good idea.
TWW
"Encyclopedia" is to "Wikipedia" what "Library" is to "Some people at a bus stop"