Fulfilling the Promise of XML-based Office Suites?
brentlaminack asks: "Almost a year ago Tim Bray of XML fame said 'when the huge universe of MS Office documents becomes available for processing by any programmer with a Perl script and a bit of intelligence, all sorts of wonderful new things can be invented that you and I can't imagine.' Now that MS has dropped the ball on the XML Office front, and StarOffice has fulfilled its XML promise, where are all those 'wonderful new things?' Is anybody out there writing Perl/Java/whatever programs to take advantage of StarOffice XML? Could this be an opportunity for Free/Open/Libre software to leapfrog MS Office in real productivity as XML proponents have promised all along?" What kinds of new and wonderful things can you come up with?
I think one of the main problems with embedding an XML architecture into office productivity software is, unfortunately, the end user. I mean, how long have programmes like MS Word had "document properties" contained in them, and how many people are actually using them? I'm currently working on a project to retrieve documents across a company's backed-up data from the past 10 years, and there is very little metadata available for us to do any searching on. Unless the embedded XML contained within office suites is brought more "to the fore" and put in the face of users, instead of being a behind-the-scenes 'option', people just are not going to use it.
One missing thing is standardization across OSS. When AbiWord (and KOffice?) support OO files, then we might see more of this. Also, I personally can't think offhand of a use that OO.org can't already handle. Once people begin to find uses for this, more people will actually try to write scripts to take advantage of XML.
XML is not a selling point for an office suite. Users expect a good user interface and an easy migration. OpenOffice is not there yet. Its help assistant spawns 1024x768 help windows to say as little as "I have automatically capitalized the first letter of your sentence." It has no integrated PIM software to unseat Microsoft Outlook. It has no easy migration path for the millions of users who open documents with useful macros and scripts. OpenOffice has no drop-in replacement for Microsoft Access-driven applications; primitive as Access is, many companies use it to develop simple database applications that would need to be recreated from scratch in another suite.
At this point in time, there's no reason to switch from Microsoft Office to another office suite simply because this new suite uses XML. XML is best suited as a tool for the back-end developer, not an excuse to migrate to a product that has so many rough edges in its current form.
There are many uses besides simply having a format that multiple programs can open. For one, when new features are added to the format, older software can ignore the tags it doesn't know, somewhat like HTML has been doing. Then you get the ability to still open newer variations on the format. Not to mention it becomes easier to convert between them, and you can add an XSLT to an older app to "update" it to support the newer format better.
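To make the "ignore the tags you don't know" idea concrete, here is a minimal Python sketch. The `<doc>`, `<heading>`, `<para>` and `<sparkline>` tags are a made-up format for illustration, not anything StarOffice or MS Office actually writes; the point is only that an old reader can walk a newer file and skip what it doesn't understand.

```python
import xml.etree.ElementTree as ET

# Hypothetical format: the "old" reader only knows these two tags.
KNOWN_TAGS = {"para", "heading"}


def read_old_style(xml_text):
    """Return (tag, text) pairs for known elements, skipping unknown ones."""
    root = ET.fromstring(xml_text)
    result = []
    for element in root:
        if element.tag in KNOWN_TAGS:
            result.append((element.tag, (element.text or "").strip()))
        # Anything else (e.g. a tag added by a newer version) is ignored.
    return result


# A "version 2" file containing a tag the old reader has never seen.
NEWER_DOCUMENT = """
<doc version="2">
  <heading>Quarterly report</heading>
  <para>Sales went up.</para>
  <sparkline data="1,2,3"/>
  <para>Costs went down.</para>
</doc>
"""

if __name__ == "__main__":
    for tag, text in read_old_style(NEWER_DOCUMENT):
        print(tag, text)
```

The old reader loses the new feature but still gets the headings and paragraphs, which is roughly what browsers do with HTML they don't recognize.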
A few off the top of my head:
Online services generating template documents, such as online resume-creating websites.
Draw charts in a GOOD charting program instead of the crap these office programs have.
Generate presentations from outlines or databases, or create videos from presentation files.
For the small-time database software, the database could be imported into other database software, or converted to SQL or be translated into just about anything.
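As a rough sketch of the database idea, the Python below loads a small XML table into SQLite. The `<table>`/`<row>` layout and the column names are invented for the example (a real office-suite export would look different), and table and column names are pasted into the SQL text, so this is only reasonable for input you trust.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical export; element and attribute names are made up.
CONTACTS_XML = """
<table name="contacts">
  <row><name>Ada</name><city>London</city></row>
  <row><name>Linus</name><city>Helsinki</city></row>
</table>
"""


def load_into_sqlite(xml_text, connection):
    """Create a table from the XML and insert its rows; return the row count."""
    root = ET.fromstring(xml_text)
    table = root.get("name")
    rows = [{child.tag: child.text for child in row}
            for row in root.findall("row")]
    columns = list(rows[0].keys())

    # Identifiers cannot be bound as parameters, so they are quoted inline.
    quoted = ", ".join('"%s"' % column for column in columns)
    connection.execute('CREATE TABLE "%s" (%s)' % (table, quoted))

    # Values, on the other hand, go through "?" placeholders.
    placeholders = ", ".join("?" for _ in columns)
    connection.executemany(
        'INSERT INTO "%s" (%s) VALUES (%s)' % (table, quoted, placeholders),
        [tuple(row[column] for column in columns) for row in rows],
    )
    return len(rows)


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    load_into_sqlite(CONTACTS_XML, db)
    print(db.execute("SELECT name, city FROM contacts").fetchall())
```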
I see your point. But have you tried doing mathematical formulas in groff? In (La)TeX they're a breeze (relative to just about everything else out there). Right tools for the right job I guess.
And before anyone tries to point out the cost/open source issue: in business that doesn't mean squat. Trying to sell something for free is the wrong attitude; businesses don't want to rely on good will. Kudos to all the dual-licensed projects out there that have learned how to play both sides of the fence.
Quack, quack.
What do you have against TeX?
TeX is god [OK, maybe not $DEITY god, but fairly high up there].
TeX, along with LaTeX, allows me to do wonderful things with documents, generating into multiple formats. Although I have had some EPS integration problems (who knew plotutils used some funky ass default font that no one has ever heard of before), it was my fault for not checking to make sure that I had the right fonts installed. TeX is wonderful for typesetting; it puts the control back in the hands of the user.
imagine you write an outline in word. file -> export as -> presentation... or in access you select some rows and export to a spreadsheet. this is where staroffice stands to beat them.
but MS Office derives its profitability from incompatibility -- you have to use their products to get full use of their file format. so using MS Office will necessarily sacrifice this functionality.
The great man himself gave you a clue to great wisdom. Not everyone has that chance.
And you blew it, Grasshopper.
The lesson was, "The right tool for the job."
Sometimes the right tool, despite all the modern technological advances, is still a rock.
KFG
er...? Python superior?
... now, there's a great language.
Any language where white space is important to determining the blocking structure (e.g. Make leaps to mind) is badly broken. You don't want to totally ignore white space (FORTRAN leaps to mind) but you don't want the number of spaces/tabs before a statement to indicate anything significant.
Of course, any programming that looks like line noise (e.g. APL or TECO) is also badly broken. Since Perl can look like line noise, I think this applies.
Java
- David
I fondly remember WordPerfect's Reveal Codes feature. While this is more a reflection on the simplistic nature of WordPerfect (and other word processors of the day), being able to see all of the formatting codes as they appeared in a document was a great help when you tried to format a document to look a certain way and it turned out completely different. Also, if I remember correctly, you could even type in the codes exactly where you wanted them to appear.
Children in the backseats don't cause accidents. Accidents in the back seats cause children.
There's a huge difference between XML and CSV-type files. There's a huge range of stuff you can do in XML that is impossible in CSV-type files.
Specifically, XML allows some really interesting data structuring, plus validation that's really powerful (DTDs and Schemas).
- David
An explanation for what the tags mean? Sure, Office saves out XML. But god knows what exactly the tags mean. MS sure doesn't document them fully. Usually the most you can recover is the unformatted text...
I'd rather have a fully documented binary format than undocumented XML, personally.
Also, last I checked, MS was still saving out in "MSXML", and scattering weird [blah..] constructs throughout the XML.
What are you talking about?
CSV? LOL.
Does CSV have a transformation language (XSLT)?
Does CSV have an easy-to-use parser & object model (SAX, DOM)?
Does CSV have an in-document addressing language (XPath)?
Does CSV have a standard way of supporting hierarchical data?
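For anyone who hasn't seen the addressing part in action, here is a small Python sketch. The `<manual>` document is invented for the example, and note that the standard library's ElementTree only supports a subset of XPath, so treat this as an illustration rather than a full XPath engine.

```python
import xml.etree.ElementTree as ET

# Made-up hierarchical document: chapters contain sections contain paragraphs.
MANUAL = """
<manual>
  <chapter title="Hiring">
    <section title="Interviews"><para>Be on time.</para></section>
    <section title="Offers"><para>Put it in writing.</para></section>
  </chapter>
  <chapter title="Leave">
    <section title="Vacation"><para>Ask early.</para></section>
  </chapter>
</manual>
"""


def section_titles(xml_text, chapter_title):
    """List the section titles inside the chapter with the given title."""
    root = ET.fromstring(xml_text)
    # One path expression picks nodes by position in the tree and by attribute.
    path = ".//chapter[@title='%s']/section" % chapter_title
    return [section.get("title") for section in root.findall(path)]


if __name__ == "__main__":
    print(section_titles(MANUAL, "Hiring"))
```

Getting the same answer out of a flat CSV file means inventing your own convention for nesting first, which is the whole complaint.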
Just cause you think it's overhyped doesn't mean it isn't worth every bit of that hype. I've been using XML since 1998. I shudder when I think about the pre-XML days.
Only problem is that it doesn't import any metadata: hyperlinks, bookmarks, etc. It's just a cold rip of the pages. That limits its usefulness because you can't do anything with the resultant PDF [i.e. HR manual, reports, manuals] except look at it. That's severely limiting for corporate use.
Bemoaning the lack of XML-based magic goodness in corporate document processing assumes that a corporate document base exists which a) follows predictable content and structural patterns to allow automated processing, and b) is structured and rigorous enough to do meaningful processing against, an assumption which frankly doesn't hold water in too many places.
For most of the office document world (at least the world I work with regularly), most documents are unique in both structure and content and I as a programmer can make only the most basic of assumptions regarding what a program can expect to find within the content bundle. Sure the XML gives me a nice set of rules to rely on for breaking the document into parts and reading it in. But it doesn't do a whole lot to ensure that, say, two spreadsheets follow similar content assignment conventions. Most places can't get two managers to agree on the form and structure of a basic memo, or even get the same individual to repeatedly use a consistent structure in all his/her business communications.
Most organizations need to work on a few things before this type of processing will be useful in the large. Two particular areas would be: a) consistent use of metadata within document definitions to facilitate querying and filtering, and b) more sophisticated use of template functionality beyond just ensuring every page has the same graphic in its header.
Hey SlashLords! I humbly request a "-2 GrammerNazi" to get rid of these!
The parent post is right on the money here.
Right now, I don't want flashy, XML-driven power apps. I'd settle for a word processor where I can produce my document with minimal fuss and good quality results. Apparently the vast majority of other word processor users agree with me, because I don't see any big uptake of ueber-powerful macro systems, manipulation tools based on super-flexible file formats, or any of the other much-promised stuff.
The simple truth is that usability is nowhere near the point where these facilities add value yet. Before you can develop powerful extra tools, you have to get the basics right:
These are essential for a serious document preparation system, yet no currently popular WP, commercial or free, even comes close to doing them all well. The serious people universally use either DTP packages or typesetting systems, and there's a reason for that.
When we reach the stage where a word processor can do these things well, without the user ignoring stylesheets because they're too awkward, having to look up the help every time they do a mail merge, or finding that limitations in the document structure support prevent them doing what they want at all in a non-trivial document, then we'll be getting to the stage where more powerful "workflow" tools might be of real benefit.
The second stage, of course, is developing the tools to create those workflow tools, and making them sufficiently usable themselves that people actually take advantage of the advanced capabilities. Right now, we have some awesome-sounding automation tools available, but who really uses them? Not many people, IME. Much of the problem is that the automation tools themselves are, like the applications within which they live, simply too much effort to bother with.
Give me a usable basic WP and usable tools to automate it (XML-based or otherwise) and I will move the document creation world. Until then, don't call us...
If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
Does XML really grant us that much beyond what CSV and good databases behind the scenes already provide???
Yes because XML fits in places where databases aren't even worth considering. If you think XML is a replacement for relational databases then you're a bit lost IMO.
How many generic CSV parsers are there? Are the fields (tabs?) self describing?
Think of an OS and applications today and the various files they use. Think of configuration files, shortcut files, bookmark files, document files, project files etc. Think of all those files that have until recently been stored in proprietary, hard-to-interpret and sometimes buggy binary formats.
Yeesh.
XML is a huge step forward.
Imagine you write an outline in word. file -> export as -> presentation... or in access you select some rows and export to a spreadsheet. this is where staroffice stands to beat them.
This is what Office does (rather) well. Use an xls as a data source for an MDB, a word doc, and a presentation, all at the same time. Or link database info to a remote presentation.
And while Office prefers Office, you CAN link to and from bare text files. Whether delimited or fixed length.
Way back with Office95 we were pulling backend data off a UNIX box into a VB/Access frontend. Seamless to the user.
XML can more easily represent complex data structures than CSV, but that's not the main benefit.
Nope, the real revolution was in creating standardized parsers. I spent many an hour with Lex and Yacc churning out parsers for many custom file formats. Even though XML may not seem the most efficient way to represent things, it's great not to have to write a new parser every time we have a new bit of information to represent in a file. It frees you to think about what data you want in a file instead of steering your file contents toward things that will be easy to parse.
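A quick Python sketch of the "no new parser per format" point: the same few lines read two unrelated files. Both sample vocabularies (`<bookmarks>` and `<config>`) are invented for the example, and `to_tree`/`parse_any` are just illustrative helper names.

```python
import xml.etree.ElementTree as ET


def to_tree(element):
    """Convert any element into a plain dict, whatever its vocabulary."""
    return {
        "tag": element.tag,
        "attrs": dict(element.attrib),
        "text": (element.text or "").strip(),
        "children": [to_tree(child) for child in element],
    }


def parse_any(xml_text):
    """Parse an arbitrary well-formed XML string with the stock parser."""
    return to_tree(ET.fromstring(xml_text))


# Two unrelated, made-up file formats handled by the same code.
BOOKMARKS = '<bookmarks><link href="http://example.org/">Example</link></bookmarks>'
CONFIG = '<config><window width="800" height="600"/></config>'

if __name__ == "__main__":
    print(parse_any(BOOKMARKS))
    print(parse_any(CONFIG))
```

No grammar file, no tokenizer, and nothing to regenerate when someone adds a field. Deciding what the data means is still your job, of course.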
That's why XML is every bit as valuable as it is made out to be, just not for the reasons usually given...
"There is more worth loving than we have strength to love." - Brian Jay Stanley
In about .5 hrs, I was able to extract the content from an OpenOffice text document, as well as a presentation, and feed them into other tools. This without trying to read any DTDs. Applying more effort would have yielded more functionality, but I was in a hurry, just trying to get some information out with some hierarchy to it.
Now, extracting the style is a different challenge, and of course style means different things to different people. But it is simply madness to try to extract content from Word and PowerPoint files for use elsewhere.
Oh yes, I used Saxon. Nice product.
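For comparison, here is roughly what that kind of quick extraction can look like in Python instead of Saxon/XSLT. It relies on OpenOffice files being zip archives with the document body in `content.xml`. Matching headings and paragraphs by the local names `h` and `p` while ignoring the namespace is a shortcut assumed for this sketch, not how a careful tool would do it, and `extract_outline` is a hypothetical helper name.

```python
import zipfile
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def extract_outline(document):
    """Return (kind, text) pairs for headings and paragraphs in content.xml.

    `document` is a path or file-like object for an OpenOffice file,
    which is an ordinary zip archive.
    """
    with zipfile.ZipFile(document) as archive:
        root = ET.fromstring(archive.read("content.xml"))

    outline = []
    for element in root.iter():
        name = local_name(element.tag)
        if name in ("h", "p"):
            # itertext() also collects text inside nested spans.
            text = "".join(element.itertext()).strip()
            if text:  # skip empty paragraphs
                kind = "heading" if name == "h" else "para"
                outline.append((kind, text))
    return outline


if __name__ == "__main__":
    import sys
    for kind, text in extract_outline(sys.argv[1]):
        print(kind, text)
```

Style information is ignored completely here, which matches the experience above: content is easy, style is the different challenge.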
So, what have you implemented that's being used by thousands of businesses across the world? Pot. Kettle. Black, Mr failed AI expert.
Adding metadata to webpages is deceased. It has been for over half a decade (yes, it is 2003 this year). It's a dead donkey, no need to flog it any more.
Utterly useless. Listing a series of dates does nothing that a simple Perl script can't extract. Now linking a date to an actual place - now that's something useful. And your above example fails that simple relationship. Screenscraping ain't gonna save you - it's far too brittle for practical real-world use.