Slashdot Mirror


Fulfilling the Promise of XML-based Office Suites?

brentlaminack asks: "Almost a year ago Tim Bray of XML fame said 'when the huge universe of MS Office documents becomes available for processing by any programmer with a Perl script and a bit of intelligence, all sorts of wonderful new things can be invented that you and I can't imagine.' Now that MS has dropped the ball on the XML Office front, and StarOffice has fulfilled its XML promise, where are all those 'wonderful new things?' Is anybody out there writing Perl/Java/whatever programs to take advantage of StarOffice XML? Could this be an opportunity for Free/Open/Libre software to leapfrog MS Office in real productivity as XML proponents have promised all along?" What kinds of new and wonderful things can you come up with?

15 of 432 comments

  1. XML... by ewombatnet · · Score: 5, Insightful

    I think one of the main problems with the embedding of XML architecture into office productivity software is unfortunately the end user. I mean, how long have programmes like MS Word had "document properties" contained in them, and how many people are actually using them? I'm currently working on a project to retrieve documents across a company's backed-up data from the past 10 years, and there is very very little metadata available for us to do any searching on. Unless the embedded XML contained within office suites is brought more "to the fore" and in the face of users, instead of being a behind the scenes 'option', people just are not going to use it.

    1. Re:XML... by Trolling4Dollars · · Score: 5, Insightful

      There are two ways to look at this. One way is to assume that the problem lies with the user; the other is to assume that the problem lies within the computer. Even though computers have gotten easier to use, they aren't really easy at all for the average user. The barriers to ease of use are plenty:

      -Feature overload (many features that users will never use)
      -PCs are incredibly complex because they are so flexible and can do so many things.
      -User interfaces are pretty poorly designed and don't seem to be getting any better.
      -Humans don't "interface" well

      If the mode of interacting with computers was like interacting with another person, they would be considerably easier to use. I often joke with my wife that *I* am the ultimate user interface. If you think about it, the best interface for the average user would be a very human-like avatar. Yes, this interface would suck for someone like me (a real computer user), but that's not who it would be targeted at.

      Getting back to the XML subject, these same problems are what keep it from gaining any ground with the average user. The average user still doesn't "get" electronic documents. That's why they always resort to printing them out on paper. To be sure, there are times when a document SHOULD be printed on paper, but that's only really about 20% of the time. The other 80% of the time, a document is much better kept in electronic format. With XML, so much the better. But if the average user has trouble understanding even a basic text file, the ultra-documents that XML can lead to will be completely bewildering. How do we solve this? I've argued this before over and over again: we need new input devices and now I will extend that to new output devices. If we had more variety with the output device, XML documents would be the next "great thing". The XML document has arrived too soon. If we had electronic paper that XML docs could be loaded into, there would be a revolution. It will happen, just not yet. And when it does happen, look for some big corporation to be backing something that looks a lot like XML, but it will have a different, friendlier name and will be claimed as innovative.

    2. Re:XML... by chiasmus1 · · Score: 5, Insightful

      The important thing about XML is not the end users. As an end user I could care less about the formation of the document as long as I knew I would always have an application that could read the document.

      With XML documents, if the file format is well known, there will be filters for it. Major Office Suites will support well known file formats. If the file format is not as well known, but it is simple XML, there are high chances that smaller applications will also have filters for it.

      I like to write web software and I was discouraged when I discovered that I could not find a Perl library to create OpenOffice.org files, so I created one of my own. Granted it is not the best library, and is probably full of bugs, but it was easy to create and the research was painless. It does the job I made it for and I use it.

      Compare that to the time when at work my boss asked me to take a Pick Basic binary database file and extract the data from it. I had to play around a while to figure out which bytes meant what and how to get the information out.

      XML not only makes creation easy, but makes reverse engineering trivial. XML is not for the end users, it is for the developers who do not have the time to sit and read the 500 pages of the file format spec.
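
To make the point about painless research concrete, here is a minimal sketch in Python (not the Perl library mentioned above). It assumes the document is a zip archive whose body text lives in a member named content.xml, and that paragraphs are elements whose local name is "p". The namespace URIs and the sample document are made up for illustration, so the code matches on local names only.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on names."""
    return tag.rsplit("}", 1)[-1]


def extract_paragraphs(source):
    """Return the text of every paragraph element found in content.xml."""
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    return [
        "".join(elem.itertext())
        for elem in root.iter()
        if local_name(elem.tag) == "p"
    ]


def make_sample_document():
    """Build a tiny stand-in archive in memory (hypothetical, not a full file)."""
    content = (
        '<office:document-content'
        ' xmlns:office="http://example.org/office"'
        ' xmlns:text="http://example.org/text">'
        '<office:body>'
        '<text:p>Quarterly report</text:p>'
        '<text:p>Sales rose <text:span>12%</text:span>.</text:p>'
        '</office:body>'
        '</office:document-content>'
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("content.xml", content)
    buffer.seek(0)
    return buffer


print(extract_paragraphs(make_sample_document()))
```

Nothing here needed a format spec beyond "unzip it and look at the tags", which is roughly the contrast being drawn with the binary database file.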

    3. Re:XML... by rgigger · · Score: 3, Insightful

      I just had a thought. What I really want to do is generate some sort of office documents on the web. That way I can make word processing documents, spreadsheets, charts, graphs etc that my clients can download. Now I would love to just generate Open Office XML files and have them use those. The problem with that is that none of my clients use Open Office and they are not going to for the foreseeable future.

      Here however is my super cool idea that I just came up with:

      An open office server. If open office can export to MS Office formats, what's to stop me from doing the following (other than time)?

      1) create my templates in open office XML format
      2) extract the parts of open office that import from the OO XML format to its internal format, and export to MS Office format.
      3) Create a PHP extension (or maybe apache module) to expose this functionality to my web apps.
      4) Insert dynamic database driven content into my OO XML templates, convert them to MS Office format and stream them out to a client.

      Maybe not the product of an ideal world, but given the fact that MS Office is both closed and ubiquitous, this seems to be a great way to leverage the capabilities of open office in handling XML and MS Office import/export.
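
Step 4 of the plan above (dropping database content into a template) can be sketched as follows. This is in Python rather than a PHP extension, and several things are assumptions: the template is a zip archive with a content.xml member, placeholders are written as {{name}}, and fill_template and make_template are hypothetical names. Conversion to MS Office format is not shown; it would still need the suite's own export filters.

```python
import io
import zipfile
from xml.sax.saxutils import escape


def fill_template(template_bytes, values):
    """Return a new archive with {{name}} placeholders in content.xml replaced."""
    output = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(template_bytes)) as src, \
            zipfile.ZipFile(output, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "content.xml":
                text = data.decode("utf-8")
                for name, value in values.items():
                    # Escape values so database content cannot break the XML.
                    text = text.replace("{{" + name + "}}", escape(str(value)))
                data = text.encode("utf-8")
            # Every other member (styles, images, ...) is copied unchanged.
            dst.writestr(item.filename, data)
    return output.getvalue()


def make_template():
    """Build a toy template archive in memory (illustrative only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(
            "content.xml",
            "<doc><p>Dear {{client}}, you owe {{amount}}.</p></doc>",
        )
    return buffer.getvalue()


filled = fill_template(make_template(), {"client": "Smith & Sons", "amount": 120})
with zipfile.ZipFile(io.BytesIO(filled)) as archive:
    print(archive.read("content.xml").decode("utf-8"))
```

One caveat with plain string substitution: a placeholder typed in a word processor may end up split across several formatting elements, in which case it will not match and the template has to be cleaned up first.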

  2. standardization by Unregistered · · Score: 4, Insightful

    one missing thing is standardization across OSS. When abiword (and koffice?) support oo files, then we might see more of this. Also, i personally can't think of a use offhand that oo.org can't already do. Once people begin to find uses for this, then more people will actually try to write scripts to take advantage of XML.

  3. MS Office is required by generic-man · · Score: 3, Insightful

    XML is not a selling point for an office suite. Users expect a good user interface and an easy migration. OpenOffice is not there yet. Its help assistant spawns 1024x768 help windows to say as little as "I have automatically capitalized the first letter of your sentence." It has no integrated PIM software to unseat Microsoft Outlook. It has no easy migration path for the millions of users who open documents with useful macros and scripts. OpenOffice has no drop-in replacement for Microsoft Access-driven applications; primitive as Access is, many companies use it to develop simple database applications that would need to be recreated from scratch in another suite.

    At this point in time, there's no reason to switch from Microsoft Office to another office suite simply because this new suite uses XML. XML is best suited as a tool for the back-end developer, not an excuse to migrate to a product that has so many rough edges in its current form.

  4. Re:Not a big innovation by Anonymous Coward · · Score: 3, Insightful

    Now, TeX is nicer than nroff/troff in multiple ways, but it's worse in some others (TeX is not set up for command line filters!), and in any case is only an incremental improvement, not a revolution over the older Unix tools. Credit is not properly being given.

    I see your point. But have you tried doing mathematical formulas in groff? In (La)TeX they're a breeze (relative to just about everything else out there). Right tools for the right job I guess.

  5. Agreed.. by msimm · · Score: 4, Insightful

    And before anyone tries to point out the cost/open source issue: in business that doesn't mean squat. Trying to sell something for free is the wrong attitude; businesses don't want to rely on good will. Kudos to all the dual-licensed projects out there that have learned how to play both sides of the fence.

    --
    Quack, quack.
  6. Re:Well... by croddy · · Score: 4, Insightful

    MS won't stand for an XML file format -- it's human-readable. the last thing MS wants is for their file format to be easily convertible and transformable. it's a pity, because switching Office files to XML would quickly make them insanely useful.

    imagine you write an outline in word. file -> export as -> presentation... or in access you select some rows and export to a spreadsheet. this is where staroffice stands to beat them.

    but MS Office derives its profitability from incompatibility -- you have to use their products to get full use of their file format. so using MS Office will necessarily sacrifice this functionality.

  7. Re:Not a big innovation by kfg · · Score: 5, Insightful

    The great man himself gave you a clue to great wisdom. Not everyone has that chance.

    And you blew it, Grasshopper.

    The lesson was, "The right tool for the job."

    Sometimes the right tool, despite all the modern technological advances, is still a rock.

    KFG

  8. Re:Two Things... by TummyX · · Score: 4, Insightful

    What are you talking about?

    CSV? LOL.

    Does CSV have a transformation language (XSLT)?
    Does CSV have an easy to use parser & object model (SAX, DOM)?
    Does CSV have an in document addressing language (XPATH)?
    Does CSV have a standard way of supporting hierarchical data?

    Just cause you think it's overhyped doesn't mean it isn't worth every bit of that hype. I've been using XML since 1998. I shudder when I think about the pre-XML days.
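
The last two questions in that list (in-document addressing and hierarchical data) can be illustrated with a small Python sketch. The order data is invented for the example. Python's standard ElementTree supports only a limited subset of XPath, so this is a taste of the idea, not a full XPath or XSLT engine.

```python
import xml.etree.ElementTree as ET

# Invented sample data: each order nests its own list of items,
# which a flat CSV row cannot express without extra conventions.
DOCUMENT = (
    '<orders>'
    '<order id="1001">'
    '<customer>Acme</customer>'
    '<item sku="A1" qty="2"/>'
    '<item sku="B7" qty="1"/>'
    '</order>'
    '<order id="1002">'
    '<customer>Globex</customer>'
    '<item sku="A1" qty="5"/>'
    '</order>'
    '</orders>'
)

root = ET.fromstring(DOCUMENT)

# Path expression: every <item> with sku="A1", at any depth.
quantities = [
    int(item.get("qty")) for item in root.findall(".//item[@sku='A1']")
]

# Hierarchy: walk the orders and collect the items nested inside each one.
items_per_order = {
    order.get("id"): [item.get("sku") for item in order.findall("item")]
    for order in root.findall("order")
}

print(quantities)
print(items_per_order)
```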

  9. Putting the cart before the horse... by EricTheGreen · · Score: 4, Insightful

    Bemoaning the lack of XML-based magic goodness in corporate document processing assumes that a corporate document base exists which a) follows predictable content and structural patterns to allow automated processing, and b) is structured and rigorous enough to do meaningful processing against, an assumption which frankly doesn't hold water in too many places.

    For most of the office document world (at least the world I work with regularly), most documents are unique in both structure and content and I as a programmer can make only the most basic of assumptions regarding what a program can expect to find within the content bundle. Sure the XML gives me a nice set of rules to rely on for breaking the document into parts and reading it in. But it doesn't do a whole lot to ensure that, say, two spreadsheets follow similar content assignment conventions. Most places can't get two managers to agree on the form and structure of a basic memo, or even get the same individual to repeatedly use a consistent structure in all his/her business communications.

    Most organizations need to work on a few things before this type of processing will be useful in the large. Two particular areas would be: a) consistent use of metadata within document definitions to facilitate querying and filtering, and b) more sophisticated use of template functionality beyond just ensuring every page has the same graphic in its header.

  10. The two stages we haven't reached yet by Anonymous+Brave+Guy · · Score: 4, Insightful

    The parent post is right on the money here.

    Right now, I don't want flashy, XML-driven power apps. I'd settle for a word processor where I can produce my document with minimal fuss and good quality results. Apparently the vast majority of other word processor users agree with me, because I don't see any big uptake of ueber-powerful macro systems, manipulation tools based on super-flexible file formats, or any of the other much-promised stuff.

    The simple truth is that usability is nowhere near the point where these facilities add value yet. Before you can develop powerful extra tools, you have to get the basics right:

    • a clean but powerful UI (no, this is not impossible)
    • good basic navigation and editing capabilities
    • good basic structure and formatting controls
    • good basic tools (spell check, word count and mail merge would probably do for a very large subset of WP users).

    These are essential for a serious document preparation system, yet no currently popular WP, commercial or free, even comes close to doing them all well. The serious people universally use either DTP packages or typesetting systems, and there's a reason for that.

    When we reach the stage where a word processor can do these things well, without the user ignoring stylesheets because they're too awkward, having to look up the help every time they do a mail merge or finding that limitations in the document structure support prevent you doing what you want to at all in a non-trivial document, then we'll be getting to the stage where more powerful "workflow" tools might be of real benefit.

    The second stage, of course, is developing the tools to create those workflow tools, and making them sufficiently usable themselves that people actually take advantage of the advanced capabilities. Right now, we have some awesome-sounding automation tools available, but who really uses them? Not many people, IME. Much of the problem is that the automation tools themselves are, like the applications within which they live, simply too much effort to bother with.

    Give me a usable basic WP and usable tools to automate it (XML-based or otherwise) and I will move the document creation world. Until then, don't call us...

    --
    If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
  11. It's all about the parsers. by SuperKendall · · Score: 3, Insightful

    XML can more easily represent complex data structures than CSV, but that's not the main benefit.

    Nope, the real revolution was in creating standardized parsers. I spent many an hour with LEX and YACC churning out parsers for many custom file formats. Even though XML may not seem the most efficient way to represent things, it's great not to have to write a new parser every time we have a new bit of information to represent in a file. It frees you to think about what data you want in a file instead of directing your file contents to things that will be easy to parse.

    That's why XML is every bit as valuable as it is made out to be, just not for the reasons usually given...
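
As a rough illustration of not writing a new parser per format, here is a sketch using the SAX parser from Python's standard library. The record layout, the FieldCollector class, and parse_records are hypothetical names made up for this example; the point is only that the tokenizing and grammar work is done by a stock parser, and the application code just reacts to events.

```python
import xml.sax


class FieldCollector(xml.sax.ContentHandler):
    """Collect each <record> as a dict of its child element names and text."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._current = None  # dict for the record being read, if any
        self._field = None    # name of the field being read, if any

    def startElement(self, name, attrs):
        if name == "record":
            self._current = {}
        elif self._current is not None:
            self._field = name
            self._current[name] = ""

    def characters(self, content):
        # Text may arrive in several chunks, so append instead of assigning.
        if self._field is not None:
            self._current[self._field] += content

    def endElement(self, name):
        if name == "record":
            self.records.append(self._current)
            self._current = None
        elif name == self._field:
            self._field = None


def parse_records(xml_text):
    handler = FieldCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.records


SAMPLE = (
    "<records>"
    "<record><name>Ada</name><role>Engineer</role></record>"
    "<record><name>Grace</name><role>Admiral</role></record>"
    "</records>"
)

print(parse_records(SAMPLE))
```

Adding a new field to the file means adding a new element; the handler above picks it up without any change to a grammar.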

    --
    "There is more worth loving than we have strength to love." - Brian Jay Stanley
  12. XSLT by wwi · · Score: 3, Insightful

    In about .5 hrs, I was able to
    extract the content from an
    OpenOffice text document, as
    well as a presentation, and feed them
    into other tools. This without
    trying to read any DTD's. Applying
    more effort would have yielded more
    functionality, but I was in a hurry,
    just trying to get some information
    out with some hierarchy to it.

    Now, extracting the style is a different
    challenge, and of course style
    means different things to different
    people. But it is simply madness to try
    to extract content from Word
    and Powerpoint files for use elsewhere.

    Oh yes, I used Saxon. Nice product.
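
For comparison, a similar quick extraction can be sketched in Python instead of XSLT with Saxon. The assumptions are loud ones: headings are taken to be elements with the local name "h" carrying a "level" (or "outline-level") attribute, the namespace URI in the sample is invented, and extract_outline is a hypothetical helper. It pulls out the heading hierarchy and ignores style entirely.

```python
import xml.etree.ElementTree as ET


def local_name(name):
    """Strip the '{namespace}' prefix that ElementTree puts on names."""
    return name.rsplit("}", 1)[-1]


def extract_outline(content_xml):
    """Return (level, text) pairs for each heading element, in document order."""
    outline = []
    for elem in ET.fromstring(content_xml).iter():
        if local_name(elem.tag) != "h":
            continue
        level = 1  # default when no level attribute is present
        for attr, value in elem.attrib.items():
            if local_name(attr) in ("level", "outline-level"):
                level = int(value)
        outline.append((level, "".join(elem.itertext())))
    return outline


# Invented sample standing in for the body of a text document.
SAMPLE = (
    '<doc xmlns:text="http://example.org/text">'
    '<text:h text:level="1">Overview</text:h>'
    '<text:p>Body text.</text:p>'
    '<text:h text:level="2">Goals</text:h>'
    '<text:h text:level="1">Schedule</text:h>'
    '</doc>'
)

for level, title in extract_outline(SAMPLE):
    print("  " * (level - 1) + title)
```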