Slashdot Mirror


Fulfilling the Promise of XML-based Office Suites?

brentlaminack asks: "Almost a year ago Tim Bray of XML fame said 'when the huge universe of MS Office documents becomes available for processing by any programmer with a Perl script and a bit of intelligence, all sorts of wonderful new things can be invented that you and I can't imagine.' Now that MS has dropped the ball on the XML Office front, and StarOffice has fulfilled its XML promise, where are all those 'wonderful new things?' Is anybody out there writing Perl/Java/whatever programs to take advantage of StarOffice XML? Could this be an opportunity for Free/Open/Libre software to leapfrog MS Office in real productivity as XML proponents have promised all along?" What kinds of new and wonderful things can you come up with?

32 of 432 comments

  1. XML... by ewombatnet · · Score: 5, Insightful

    I think one of the main problems with the embedding of XML architecture into office productivity software is unfortunately the end user. I mean, how long have programmes like MS Word had "document properties" contained in them, and how many people are actually using them? I'm currently working on a project to retrieve documents across a company's backed-up data from the past 10 years, and there is very, very little metadata available for us to do any searching on. Unless the embedded XML contained within office suites is brought more "to the fore" and in the face of users, instead of being a behind-the-scenes 'option', people just are not going to use it.

    1. Re:XML... by Trolling4Dollars · · Score: 5, Insightful

      There are two ways to look at this. One way is to make the assumption that the problem lies with the user and the other is that the problem lies within the computer. Even though computers have gotten easier to use, they aren't really easy at all for the average user. The barriers to ease of use are plenty:

      -Feature overload (many features that users will never use)
      -PCs are incredibly complex because they are so flexible and can do so many things.
      -User interfaces are pretty poorly designed and don't seem to be getting any better.
      -Humans don't "interface" well

      If the mode of interacting with computers was like interacting with another person, they would be considerably easier to use. I often joke with my wife that *I* am the ultimate user interface. If you think about it, the best interface for the average user would be a very human-like avatar. Yes, this interface would suck for someone like me (a real computer user), but that's not who it would be targeted at.

      Getting back to the XML subject, these same problems are what keep it from gaining any ground with the average user. The average user still doesn't "get" electronic documents. That's why they always resort to printing them out on paper. To be sure, there are times when a document SHOULD be printed on paper, but that's only really about 20% of the time. The other 80% a document is much better to keep in electronic format. With XML, so much the better. But if the average user has trouble understanding even a basic text file, the ultra-documents that XML can lead to will be completely bewildering. How do we solve this? I've argued this before over and over again: we need new input devices and now I will extend that to new output devices. If we had more variety with the output device, XML documents would be the next "great thing". The XML document has arrived too soon. If we had electronic paper that XML docs could be loaded into, there would be a revolution. It will happen, not just yet. And when it does happen, look for some big corporation to be backing something that looks a lot like XML, but it will have a different more friendly name and will be claimed as innovative.

    2. Re:XML... by chiasmus1 · · Score: 5, Insightful
      The important thing about XML is not the end users. As an end user I could care less about the formation of the document as long as I knew I would always have an application that could read the document.

      With XML documents, if the file format is well known, there will be filters for it. Major Office Suites will support well known file formats. If the file format is not as well known, but it is simple XML, there are high chances that smaller applications will also have filters for it.

      I like to write web software and I was discouraged when I discovered that I could not find a Perl library to create OpenOffice.org files, so I created one of my own. Granted it is not the best library, and is probably full of bugs, but it was easy to create and the research was painless. It does the job I made it for and I use it.

      Compare that to the time when at work my boss asked me to take a Pick Basic binary database file and extract the data from it. I had to play around a while to figure out which bytes meant what and how to get the information out.

      XML not only makes creation easy, but makes reverse engineering trivial. XML is not for the end users, it is for the developers who do not have the time to sit and read the 500 pages of the file format spec.

  2. standardization by Unregistered · · Score: 4, Insightful

    one missing thing is standardization across OSS. When abiword (and koffice?) support oo files, then we might see more of this. Also, i personally can't think of a use offhand that oo.org can't already do. Once people begin to find uses for this, then more people will actually try to write scripts to take advantage of XML.

    1. Re:standardization by chill · · Score: 5, Informative

      The next major release of KOffice is supposed to adopt the OO file formats as their own standard.

      --
      Learning HOW to think is more important than learning WHAT to think.
  3. anything that will translate manager speak? by hattig · · Score: 5, Funny

    Maybe a script to de-buzzword meaningless missives from above?

    E.g., "We wish to engender a positive business atmosphere" => "Free beer at lunchtime"

    1. Re:anything that will translate manager speak? by bigdavex · · Score: 4, Funny

      #!/usr/bin/perl
      print "We're doing more layoffs and getting more bonuses.";

      --
      -Dave
  4. Well... by Otter · · Score: 4, Informative
    ...when the huge universe of MS Office documents becomes available for processing by any programmer with a Perl script and a bit of intelligence, all sorts of wonderful new things can be invented that you and I can't imagine.

    Well, I'm taking a break right now from generating new Excel graphs by copying old ones and changing the source data, which isn't so bad, and those fucking error bars, which is. Oh, and the scatter plot points are superimposed so you can't click on the back ones.

    So if I could do a find&replace on a flat file, I'd have been done an hour ago.

    Other than that, no, I can't imagine either. VBA exists now and it's not like we're all flying around with wings and harps.

    1. Re:Well... by BlueGecko · · Score: 4, Funny
      VBA exists now and it's not like we're all flying around with wings and harps.
      True, but after extensive work with VBA, I grew these sharp red horns and a big red tail with a spike on the end...
    2. Re:Well... by croddy · · Score: 4, Insightful
      MS won't stand for an XML file format -- it's human-readable. the last thing MS wants is for their file format to be easily convertible and transformable. it's a pity, because switching Office files to XML would quickly make them insanely useful.

      imagine you write an outline in word. file -> export as -> presentation... or in access you select some rows and export to a spreadsheet. this is where staroffice stands to beat them.

      but MS Office derives its profitability from incompatibility -- you have to use their products to get full use of their file format. so using MS Office will necessarily sacrifice this functionality.

  5. Not a big innovation by Doug+Merritt · · Score: 5, Interesting
    documents becomes available for processing by any programmer with a Perl script and a bit of intelligence, all sorts of wonderful new things can be invented

    This is just a return to part of what made Unix so powerful in the first place: text formats that can be manipulated by the whole suite of command line tools. "Those who don't understand Unix are doomed to re-invent it, poorly" (Henry Spencer).

    Back in the 70s we used nroff/troff for document formatting, producing in some cases professional-quality camera-ready books...but the source code was easily fed to spell checkers, formatting-command-strippers, sort, wc, etc etc etc.

    XML is ok...not bad as a meta-format...but it's not some kind of new magic; it's just more of the same as what we always used to do.

    The great step forward is moving away from the crud that happened in the middle: proprietary underdocumented binary formats that couldn't be fed to filter pipelines.

    In this case, moving backwards is progress. But expecting something amazing to be invented is a bit much; it was already invented a long time ago.

    P.S. pet peeve...people credit Knuth (admittedly an amazing guy for the Art of Computer Programming) for reinventing typesetting with TeX. Now, TeX is nicer than nroff/troff in multiple ways, but it's worse in some others (TeX is not set up for command line filters!), and in any case is only an incremental improvement, not a revolution over the older Unix tools. Credit is not properly being given.

    --
    Professional Wild-Eyed Visionary
    1. Re:Not a big innovation by kfg · · Score: 5, Insightful

      The great man himself gave you a clue to great wisdom. Not everyone has that chance.

      And you blew it, Grasshopper.

      The lesson was, "The right tool for the job."

      Sometimes the right tool, despite all the modern technological advances, is still a rock.

      KFG

    2. Re:Not a big innovation by sharkey · · Score: 4, Funny
      Sometimes the right tool, despite all the modern technological advances, is still a rock.

      When all you have is a rock, everything looks like Bill Gates' head.

      --
      "Outlook not so good." That magic 8-ball knows everything! I'll ask about Exchange Server next.
  6. Apache module by codepunk · · Score: 5, Interesting

    I sure would like an Apache module that can apply CSS to, and display, native OpenOffice and StarOffice documents.

    --


    Got Code?
  7. PHP Script that generated reports by brandonp · · Score: 5, Interesting

    I created a PHP script a few months ago that allowed a client to upload StarOffice templates for company documents. Then the script automatically generated documents by pulling data from a database and inserting it into the StarOffice document.

    Was really easy, StarOffice documents are zipped files that contain the XML files. I just unzip'ed the file, inserted the appropriate data into the content.xml file and zipped it back up.
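
    The unzip, edit, re-zip approach described above can be sketched roughly as follows. This is a Python illustration, not the original PHP script (which is not shown in the thread). The function name `fill_template` and the `{{name}}` placeholder convention are assumptions made for the example, and it assumes content.xml is UTF-8 encoded.

```python
import io
import zipfile
from xml.sax.saxutils import escape


def fill_template(template_bytes, values):
    """Return a copy of a zipped document with placeholders in content.xml filled in.

    Hypothetical convention: placeholders look like {{name}} in the template text.
    """
    source = zipfile.ZipFile(io.BytesIO(template_bytes))
    output = io.BytesIO()
    with zipfile.ZipFile(output, "w", zipfile.ZIP_DEFLATED) as target:
        for item in source.infolist():
            data = source.read(item.filename)
            if item.filename == "content.xml":
                text = data.decode("utf-8")
                for key, value in values.items():
                    # Escape &, < and > so inserted data cannot break the XML.
                    text = text.replace("{{" + key + "}}", escape(str(value)))
                data = text.encode("utf-8")
            # Every other archive member is copied through unchanged.
            target.writestr(item, data)
    return output.getvalue()
```

    One caveat with plain string replacement: a placeholder typed in the word processor may end up split across several XML elements in content.xml, in which case it will not match and a real XML parser would be the safer tool.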

    I was absolutely amazed by how easy the StarOffice files were to work with. I'm really excited about the possibilities that are in store for us, especially ones that are better than my little hack.

    Brandon Petersen

  8. Yes, Standardised Financial Reports by jechonias · · Score: 5, Interesting

    The biggest dream that the financial world has ever had with an XML concept has been the concept of standardised financial reports.

    Imagine a world where any financial (excel based or otherwise) report from any public company can be compared with any other company report and we can all be sure of how the figures were calculated and what they mean.

    AND they are fully comparable. And fully importable into any financial package. No longer is any one company dependent on one financial package. Come to think of it there is no way the vendors of such products will ever allow this to happen!!!

    http://www.xbrl.org/

    jech

  9. Command line rendering by pirodude · · Score: 4, Interesting

    If there was a way to render out the open office/star office documents on the command line it would explode in the reporting area. Being able to have the end user making a really nice template and have a perl script fill it then pass it off to a pdf or printer is key.

  10. Reporting is a great use of OOo's XML format. by Gravatite · · Score: 5, Interesting

    My team & I just got done building some billing software for one of our customers, and OpenOffice.org's XML based documents turned out to be perfect for generating reports. Our customer is able to open up the document and change the formatting of any report at will, and then we have some Ruby code on the backend that parses the XML document, fills in all the real data from the database and then uses the CLI interface to OpenOffice to render the document as postscript. It was a quick easy way to get powerful report generation with a format that non-technical people could edit that required just a little bit of glue code on the backend, and it's the XML format that made it all possible.

  11. Automatic Generation of Pretty Reports by pjack76 · · Score: 5, Interesting
    You know, with charts and graphs and your corporate logo on them. The charts and graphs are populated from a database somewhere. Suitable for your board report.

    I bring it up because my organization paid Crystal reports $10,000 to be able to do this. If I could have written a little perl script that connects to the database and emits an OpenOffice doc, then I could have saved the organization ten thousand dollars, and saved myself a world of pain. (The only thing more evil than Crystal Reports is crystal meth.)

    You might be wondering why I wouldn't just use HTML and some library that automatically creates chart PNG images -- the reason is we have to email the report to our board members because they're demanding like that. So we use Crystal to generate pretty PDFs with all the charts. We also let the board members log into our system to generate their own reports via the web, which they can then email to the group.

    So having an XML-based document format for this would be wonderful, especially if OpenOffice would provide a command-line utility for converting from OO format to PDF.

    --

    Wow, a lucrative publishing contract! I don't have to be evil anymore. --Meteor

    1. Re:Automatic Generation of Pretty Reports by merlin_jim · · Score: 4, Funny

      The only thing more evil than Crystal Reports is crystal meth

      Funny you should mention that... I'm at work right now (10:00 PM local time; been here since 9:00 AM) for that very reason! And I'll give you a hint, I've never touched crystal meth

      --
      I am disrespectful to dirt! Can you see that I am serious?!
  12. Word to RTF to XML to HTML by PeterHammer · · Score: 5, Interesting

    At my company, once a failed startup with new life under the wings of a huge corporate parent, we have been using a homebrewed Web publishing system that takes Word 2000 or XP documents, saves them in RTF format, then uses a utility created by Majix to transform the document to XML. From there we use perl, and some XSL to get the document into XHTML combined with some JSP to produce documents that we deploy on our production env. The good part: the system was entirely free of license fees (other than office and Windows of course). The bad: it was a pain in the behind to get all the parts together.

    The steps to produce valid XML from Word are the biggest hack I have ever been a part of as an engineer. We had to write a custom VB DLL we run inside (what else) an IIS server which takes the documents uploaded by authors, then saves the documents as RTF. Control is then handed over to Tomcat, which takes the RTF and uses some custom classes that make Majix a server to transform the documents into XML. All in all we had to use VB, VBA, Java, JSP; two separate server configurations (IIS and Tomcat) and a bunch of really ugly glue to stitch all the parts together.

    I for one, and I am sure I speak for my entire team, would love a solution which saves us this ugly kludge.

  13. Ease of XML Document Formats by DJ+Rubbie · · Score: 4, Interesting

    XML does make it extremely easy to create documents on the fly, whether a plain old document or a slideshow presentation, all it needs is some template XML, original text, and some programming language to put it together.

    I wrote a song lyric storage system using PHP and MySQL, and I had the idea to have it be able to be put onto a slideshow to teach it to a group of people (or whatever). With the XML format provided by OpenOffice.org, I was able to quickly put it together and show it off, impressing quite a few people in the process. Of course, those people think Word/PowerPoint run the world, and the file format is all but a mystery to them. Hence having something generated on the fly via a webpage has its cool factor, not to mention it was a good chance to introduce this free office suite to them. Also a good chance to tell them that if I were to rely on ASP/PowerPoint it would have cost much, much more.

    Open document format is the way to go in the future, because it definitely allows interoperability.

    --
    Please direct all bug reports to /dev/null
  14. Agreed.. by msimm · · Score: 4, Insightful

    And before anyone tries to point out the cost/open source issue: In business that doesn't mean squat. Trying to sell something for free is the wrong attitude, businesses don't want to rely on good will. Kudos to all the dual licensed projects out there that have learned how to play both sides of the fence.

    --
    Quack, quack.
  15. OMFG someone with sense by DrSkwid · · Score: 4, Funny

    Ron Minnich at lanl described this one also (though we weren't talking about XML)

    -----
    You want to make your way in the CS field? Simple. Calculate rough time of
    amnesia (hell, 10 years is plenty, probably 10 months is plenty), go to
    the dusty archives, dig out something fun, and go for it.

    It's worked for many people, and it can work for you.
    ----
    if you must

    So get ready for all the gee whizzery now the new kids have "found" plain text.

    --
    There are places where the networks are not touching,and there are places where they are-Boeing's Lori Gunter
  16. You must manage, force use of limited metadata by g8orade · · Score: 4, Interesting

    I helped spec out a document management metadata database 18 months ago for an engineering firm that wanted to catalog its files. They started out wanting just to categorize their CAD drawings, then decided to include all types of project files.

    Our solution was a tcl front end that forced the entry of a minimal amount of metadata *during file creation,* to be picked from preset categories and subcategories. We also provided for free text entry but that was to be used only after the other fields.

    The points are
    a) The general metadata categories were known; the engineering tasks weren't new.
    b) No one is going to go back after the fact and enter the metadata. You have to integrate its entry into the new file work procedure.
    c) It's got to be as easy as file/new in a GUI.
    d) Its utility has got to be very very apparent when juxtaposed with a subdirectory / filename scheme.

  17. Microsoft Dropped the Ball? by Carnage4Life · · Score: 5, Interesting
    Now that MS has dropped the ball on the XML Office front,
    I'm curious, how did Microsoft drop the ball with respect to other XML-based Office suites? The linked article points to a report that the ability to import user-defined XML formats into a form that can be understood by the primary Office products is an Enterprise feature. However loading or saving documents using a default XML format is in the base versions of Office and in fact was in the last version of Office given that Excel had a documented XML Spreadsheet Format.

    Is anybody out there writing Perl/Java/whatever programs to take advantage of StarOffice XML?
    Not me but I am writing C# apps that make use of Excel's XML format. I wrote about using XSLT on the Excel XMLSS format in my blog a few months ago when I had to update date values in certain columns. I also posted the XSLT stylesheet.
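
    For readers who want to try something similar without C# or XSLT, here is a rough sketch in Python. It is not the stylesheet mentioned above, which is not reproduced in this thread. It assumes the Excel XML Spreadsheet layout in which `Data` elements carry an `ss:Type="DateTime"` attribute and a timestamp such as `2003-05-01T00:00:00.000`; the function name `shift_dates` is made up for the example.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta

# Namespace used by the Excel XML Spreadsheet format.
SS = "urn:schemas-microsoft-com:office:spreadsheet"
ET.register_namespace("ss", SS)


def shift_dates(xml_text, days):
    """Move every DateTime cell forward (or back) by the given number of days."""
    root = ET.fromstring(xml_text)
    for data in root.iter("{%s}Data" % SS):
        if data.get("{%s}Type" % SS) == "DateTime" and data.text:
            # Parse only the seconds-resolution part of the timestamp.
            stamp = datetime.strptime(data.text[:19], "%Y-%m-%dT%H:%M:%S")
            shifted = stamp + timedelta(days=days)
            data.text = shifted.strftime("%Y-%m-%dT%H:%M:%S") + ".000"
    return ET.tostring(root, encoding="unicode")
```

    Two limitations of this sketch: ElementTree rewrites namespace prefixes on output (equivalent XML, different spelling), and it drops any processing instruction that precedes the root element, so a real tool would need to restore that before handing the file back to Excel.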

    Disclaimer: I work on the XML team at Microsoft but not directly with Microsoft Office.
    1. Re:Microsoft Dropped the Ball? by YouAreATool · · Score: 4, Informative

      At this point, people should realize /. articles are mostly fretards talking out their ass. I too read this article, thinking: wtf? As I am writing this comment, I'm looking at my (beta) Word 2003 file save dialog and an example XML doc I just made. It round-trips all formatting and junk in the XML format. It has a "save data only" checkbox in the saveas dialog, and can support xsl transforms (you supply the xsl) on export. If I cared, I think I could make it export OpenOffice format pretty easily. The high-fidelity XML file has a lot of junk, but it's all XML.

  18. Yup, people are by amblin · · Score: 4, Informative

    Take a look at AxKit's OpenOffice filter.

  19. Re:Two Things... by TummyX · · Score: 4, Insightful

    What are you talking about?

    CSV? LOL.

    Does CSV have a transformation language (XSLT)?
    Does CSV have an easy to use parser & object model (SAX, DOM)?
    Does CSV have an in document addressing language (XPATH)?
    Does CSV have a standard way of supporting hierarchical data?

    Just cause you think it's overhyped doesn't mean it isn't worth every bit of that hype. I've been using XML since 1998. I shudder when I think about the pre-XML days.
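
    To make the addressing and hierarchy points concrete, here is a small sketch in Python. The sample report, its element names and the function `expenses_for` are all invented for illustration. It uses the limited XPath subset built into the standard library's ElementTree to pick nested records by attribute, something a flat CSV file has no standard way to express.

```python
import xml.etree.ElementTree as ET

# Invented sample data: employees nested inside departments.
REPORT = """
<report>
  <department name="Sales">
    <employee><name>Ann</name><expenses>120</expenses></employee>
    <employee><name>Bob</name><expenses>80</expenses></employee>
  </department>
  <department name="Support">
    <employee><name>Cy</name><expenses>45</expenses></employee>
  </department>
</report>
"""


def expenses_for(xml_text, department):
    """Sum the expenses of every employee in the named department."""
    root = ET.fromstring(xml_text)
    # Path expression: select expenses only under the matching department.
    path = "./department[@name='%s']/employee/expenses" % department
    return sum(int(node.text) for node in root.findall(path))
```

    Calling `expenses_for(REPORT, "Sales")` walks only the Sales subtree; the same query against CSV would need an ad hoc convention for encoding the department/employee nesting.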

  20. Putting the cart before the horse... by EricTheGreen · · Score: 4, Insightful

    Bemoaning the lack of XML-based magic goodness in corporate document processing assumes that a corporate document base exists which a) follows predictable content and structural patterns to allow automated processing, and b) is structured and rigorous enough to do meaningful processing against, an assumption which frankly doesn't hold water in too many places.

    For most of the office document world (at least the world I work with regularly), most documents are unique in both structure and content and I as a programmer can make only the most basic of assumptions regarding what a program can expect to find within the content bundle. Sure the XML gives me a nice set of rules to rely on for breaking the document into parts and reading it in. But it doesn't do a whole lot to ensure that, say, two spreadsheets follow similar content assignment conventions. Most places can't get two managers to agree on the form and structure of a basic memo, or even get the same individual to repeatedly use a consistent structure in all his/her business communications.

    Most organizations need to work on a few things before this type of processing will be useful in the large. Two particular areas would be: a) consistent use of metadata within document definitions to facilitate querying and filtering, and b) more sophisticated use of template functionality beyond just ensuring every page has the same graphic in its header.

  21. The two stages we haven't reached yet by Anonymous+Brave+Guy · · Score: 4, Insightful

    The parent post is right on the money here.

    Right now, I don't want flashy, XML-driven power apps. I'd settle for a word processor where I can produce my document with minimal fuss and good quality results. Apparently the vast majority of other word processor users agree with me, because I don't see any big uptake of ueber-powerful macro systems, manipulation tools based on super-flexible file formats, or any of the other much-promised stuff.

    The simple truth is that usability is nowhere near the point where these facilities add value yet. Before you can develop powerful extra tools, you have to get the basics right:

    • a clean but powerful UI (no, this is not impossible)
    • good basic navigation and editing capabilities
    • good basic structure and formatting controls
    • good basic tools (spell check, word count and mail merge would probably do for a very large subset of WP users).

    These are essential for a serious document preparation system, yet no currently popular WP, commercial or free, even comes close to doing them all well. The serious people universally use either DTP packages or typesetting systems, and there's a reason for that.

    When we reach the stage where a word processor can do these things well, without the user ignoring stylesheets because they're too awkward, having to look up the help every time they do a mail merge or finding that limitations in the document structure support prevent them doing what they want to at all in a non-trivial document, then we'll be getting to the stage where more powerful "workflow" tools might be of real benefit.

    The second stage, of course, is developing the tools to create those workflow tools, and making them sufficiently usable themselves that people actually take advantage of the advanced capabilities. Right now, we have some awesome-sounding automation tools available, but who really uses them? Not many people, IME. Much of the problem is that the automation tools themselves are, like the applications within which they live, simply too much effort to bother with.

    Give me a usable basic WP and usable tools to automate it (XML-based or otherwise) and I will move the document creation world. Until then, don't call us...

    --
    If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
    1. Re:The two stages we haven't reached yet by Pfhreakaz0id · · Score: 4, Interesting

      I agree with you SOOO much. Often times, it seems applications are written by programmers/computer geeks FOR computer geeks. I work on a workflow-based web application (It uses oracle workflow). We recently completely redid the app to do away with the Oracle-generated web pages for "notifications" (stages in the workflow) to do our own and send messages to the engine via API. Why? Our users just didn't "get" the workflow concepts and we had to design vastly more complicated UI that had pictures, etc.

      And yet we met with massive resistance from the other IT groups... "Why are you doing that, workflow does that", "that's a training issue" (code phrase for 'the users are stupid'), "don't you know how to say no?" and (getting to your central point) "you've dumbed it down. Your application doesn't have any of the powerful search, etc, features the workflow web interface has" (never mind NO ONE used these things).

      I think it was a piece from Douglas Adams who told a story of someone he knew using word who wanted all the junk removed from Word's menus that he didn't use. He showed him how to remove menu items thru customization and he ended up with just Open, Save, Bold, Italic, Print and Spell check.