Fulfilling the Promise of XML-based Office Suites?
brentlaminack asks: "Almost a year ago Tim Bray of XML fame
said 'when the huge universe of MS Office documents becomes available for processing by any programmer with a Perl script and a bit of intelligence, all sorts of wonderful new things can be invented that you and I can't imagine.' Now that
MS has dropped the ball on the XML Office front, and
StarOffice has fulfilled its XML promise, where are all those 'wonderful new things?' Is anybody out there writing Perl/Java/whatever programs to take advantage of StarOffice XML? Could this be an opportunity for Free/Open/Libre software to leapfrog MS Office in real productivity as XML proponents have promised all along?" What kinds of new and wonderful things can you come up with?
I think one of the main problems with the embedding of XML architecture into office productivity software is unfortunately the end user. I mean, how long have programmes like MS Word had "document properties" contained in them, and how many people are actually using them? I'm currently working on a project to retrieve documents across a company's backed-up data from the past 10 years, and there is very, very little metadata available for us to do any searching on. Unless the embedded XML contained within office suites is brought more "to the fore" and in the face of users, instead of being a behind-the-scenes 'option', people just are not going to use it.
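For what it's worth, checking how much metadata a document actually carries is cheap once the properties live in plain XML. A minimal sketch in Python, assuming the StarOffice/OpenOffice.org layout where the file is a zip archive with the properties in a member called meta.xml (the member name and the function name read_metadata are assumptions for illustration, not a documented API):

```python
import zipfile
import xml.etree.ElementTree as ET


def read_metadata(path):
    """Return {element name: text} for every non-empty leaf element in meta.xml.

    Assumes the document is a zip archive whose properties live in a member
    named "meta.xml". Namespace prefixes are dropped, so two fields with the
    same local name would overwrite each other in this simple sketch.
    """
    with zipfile.ZipFile(path) as doc:
        if "meta.xml" not in doc.namelist():
            return {}
        root = ET.fromstring(doc.read("meta.xml"))

    metadata = {}
    for element in root.iter():
        text = (element.text or "").strip()
        # Only leaf elements with real text count as filled-in properties.
        if len(element) == 0 and text:
            local_name = element.tag.rsplit("}", 1)[-1]
            metadata[local_name] = text
    return metadata
```

Run over a backup tree, an empty or near-empty result for most files would confirm the point above: the fields exist, but nobody fills them in.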
um... don't titanium powerbooks come with 802.11 built in?
One missing thing is standardization across OSS. When AbiWord (and KOffice?) support OO files, then we might see more of this. Also, I personally can't think of a use offhand that OO.org can't already do. Once people begin to find uses for this, then more people will actually try to write scripts to take advantage of XML.
Maybe a script to de-buzzword meaningless missives from above?
E.g., "We wish to engender a positive business atmosphere" => "Free beer at lunchtime"
Well, I'm taking a break right now from generating new Excel graphs by copying old ones and changing the source data, which isn't so bad, and those fucking error bars, which is. Oh, and the scatter plot points are superimposed so you can't click on the back ones.
So if I could do a find&replace on a flat file, I'd have been done an hour ago.
Other than that, no, I can't imagine either. VBA exists now and it's not like we're all flying around with wings and harps.
"Is anybody out there writing Perl/Java/whatever programs to take advantage of StarOffice XML"
I am sure there are many. But Microsoft will continue using proprietary formats for MS Office. Why would they open their format and lose all that market share?
And also it sucks to work with.
I still can't understand why people invest so much time and money into that half-assed idea that is XML.
Better alternatives have existed for a long time.
A message from the system administrator: 'I've upped my priority. Now up yours.'
They lied ;-p
Seriously though, koffice will use the exact same fileformat as staroffice. Is that wonderful enough for you ?
This is just a return to part of what made Unix so powerful in the first place: text formats that can be manipulated by the whole suite of command line tools. "Those who don't understand Unix are doomed to re-invent it, poorly" (Henry Spencer).
Back in the 70s we used nroff/troff for document formatting, producing in some cases professional-quality camera-ready books...but the source code was easily fed to spell checkers, formatting-command-strippers, sort, wc, etc etc etc.
XML is ok...not bad as a meta-format...but it's not some kind of new magic; it's just more of the same as what we always used to do.
The great step forward is moving away from the crud that happened in the middle: proprietary underdocumented binary formats that couldn't be fed to filter pipelines.
In this case, moving backwards is progress. But expecting something amazing to be invented is a bit much; it was already invented a long time ago.
P.S. pet peeve...people credit Knuth (admittedly an amazing guy for the Art of Computer Programming) for reinventing typesetting with TeX. Now, TeX is nicer than nroff/troff in multiple ways, but it's worse in some others (TeX is not set up for command line filters!), and in any case is only an incremental improvement, not a revolution over the older Unix tools. Credit is not properly being given.
Professional Wild-Eyed Visionary
Ironically, I think there is less demand for generating OpenOffice documents automatically than for generating PDF documents.
With XML libraries maturing at their current rate, and transformation schemes abounding (XSL, scripting, etc.), I think XML as the native format of a word processor is simply less in demand these days. Those that need to can certainly build OpenOffice documents quite easily, but I think most people are generating HTML, man pages and PS/PDF documents from source DocBook or simple YAML sources.
In a nutshell, it's not that OpenOffice isn't living up to the hype, it's that so much is crashing down on Microsoft Office in so many different ways that looking only at OpenOffice will not give you the whole story.
XML is not a selling point for an office suite. Users expect a good user interface and an easy migration. OpenOffice is not there yet. Its help assistant spawns 1024x768 help windows to say as little as "I have automatically capitalized the first letter of your sentence." It has no integrated PIM software to unseat Microsoft Outlook. It has no easy migration path for the millions of users who open documents with useful macros and scripts. OpenOffice has no drop-in replacement for Microsoft Access-driven applications; primitive as Access is, many companies use it to develop simple database applications that would need to be recreated from scratch in another suite.
At this point in time, there's no reason to switch from Microsoft Office to another office suite simply because this new suite uses XML. XML is best suited as a tool for the back-end developer, not an excuse to migrate to a product that has so many rough edges in its current form.
I sure would like an Apache module that can apply CSS to, and display, native OpenOffice and StarOffice documents.
Got Code?
(yes, I said 6... I am exaggerating)
I created a PHP script a few months ago that allowed a client to upload StarOffice templates for company documents. The script then automatically generated documents by pulling data from a database and inserting it into the StarOffice document.
It was really easy: StarOffice documents are zipped files that contain the XML files. I just unzipped the file, inserted the appropriate data into the content.xml file and zipped it back up.
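The unzip, edit, re-zip approach described above can be sketched in a few lines. This is a Python illustration rather than the original PHP, and it makes assumptions the post does not state: the template marks its blanks with a made-up {{name}} placeholder syntax, and each placeholder survives as one unbroken string in content.xml (a word processor may split typed text across several elements, in which case a plain string replace will miss it).

```python
import zipfile
from xml.sax.saxutils import escape


def fill_template(template_path, output_path, values):
    """Copy a zipped office document, filling {{key}} placeholders in content.xml.

    The {{key}} placeholder syntax is hypothetical, chosen for this sketch.
    Every other archive member is copied through unchanged.
    """
    with zipfile.ZipFile(template_path) as src, \
            zipfile.ZipFile(output_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "content.xml":
                text = data.decode("utf-8")
                for key, value in values.items():
                    # Escape &, < and > so database values cannot break the XML.
                    text = text.replace("{{" + key + "}}", escape(str(value)))
                data = text.encode("utf-8")
            dst.writestr(item, data)
```

A call such as fill_template("invoice_template.sxw", "invoice_0001.sxw", {"customer": "Acme & Co"}) would then produce a document the client can open directly; the file names here are only examples.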
I was absolutely amazed by how easy the StarOffice files were to work with. I'm really excited about the possibilities that are in store for us, especially ones that are better than my little hack.
Brandon Petersen
The biggest dream that the financial world has ever had with an XML concept has been the concept of standardised financial reports.
Imagine a world where any financial report (Excel-based or otherwise) from any public company can be compared with any other company's report, and we can all be sure of how the figures were calculated and what they mean.
AND they are fully comparable. And fully importable into any financial package. No longer is any one company dependent on one financial package. Come to think of it, there is no way the vendors of such products will ever allow this to happen!!!
http://www.xbrl.org/
jech
If there were a way to render OpenOffice/StarOffice documents from the command line, it would explode in the reporting area. Being able to have the end user make a really nice template, and have a Perl script fill it and then pass it off to a PDF or a printer, is key.
My team & I just got done building some billing software for one of our customers, and OpenOffice.org's XML based documents turned out to be perfect for generating reports. Our customer is able to open up the document and change the formatting of any report at will, and then we have some Ruby code on the backend that parses the XML document, fills in all the real data from the database and then uses the CLI interface to OpenOffice to render the document as postscript. It was a quick easy way to get powerful report generation with a format that non-technical people could edit that required just a little bit of glue code on the backend, and it's the XML format that made it all possible.
I did take some time and decompress a StarOffice document -- I was attempting to write a couple of modules for manipulating StarDraw images to create dynamic flowcharts.
It took some time to get up to speed, as the compressed XML is split across four different files (content, meta data, settings and styles). Mostly, I was concerned with modifying the content document.
Each of the documents is written with space in mind, and for the document I was dealing with, the content was 20K on a single line. I had to process the XML just so I could understand the physical structure. Once that was done, it really wasn't that difficult to manipulate the doc by hand, re-zip the content and open in StarOffice.
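For anyone facing the same 20K single line, a small sketch of the "process the XML just to understand it" step, in Python. It assumes only what is described above: the document is a zip archive and the member of interest is content.xml. The function names are made up for this example.

```python
import zipfile
from xml.dom import minidom


def list_parts(path):
    """Return the names of the files stored inside the zipped document."""
    with zipfile.ZipFile(path) as doc:
        return doc.namelist()


def pretty_part(path, member="content.xml"):
    """Return one XML member of the archive re-indented for human reading."""
    with zipfile.ZipFile(path) as doc:
        raw = doc.read(member)
    # toprettyxml adds line breaks and indentation between elements.
    return minidom.parseString(raw).toprettyxml(indent="  ")
```

The re-indented text is for reading only. Writing it back into the archive could change whitespace inside the document, so edits are safer made against the original single-line form.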
(Unfortunately, I didn't have the time to even start, much less complete, the modules. Damn day job).
XML developers and Web designers are now able to work on an XML-to-HTML transformer that closely matches what the average office user spends his time creating with the WYSIWYG Writer program. This could be a nice alternative to FrontPage, for example.
Of course, OpenOffice 1.1 already comes with a nice HTML tool, but that doesn't stop anyone from trying to do better.
I bring it up because my organization paid Crystal reports $10,000 to be able to do this. If I could have written a little perl script that connects to the database and emits an OpenOffice doc, then I could have saved the organization ten thousand dollars, and saved myself a world of pain. (The only thing more evil than Crystal Reports is crystal meth.)
You might be wondering why I wouldn't just use HTML and some library that automatically creates chart PNG images -- the reason is we have to email the report to our board members because they're demanding like that. So we use Crystal to generate pretty PDFs with all the charts. We also let the board members log into our system to generate their own reports via the web, which they can then email to the group.
So having an XML-based document format for this would be wonderful, especially if OpenOffice would provide a command-line utility for converting from OO format to PDF.
Wow, a lucrative publishing contract! I don't have to be evil anymore. --Meteor
But I don't use office suites. I have plenty of perl scripts to play with, reformat (so to speak - convert articles to slides etc.), and produce LaTeX, which has been a readily available option for years. I'm sure if I end up using StarOffice or OpenOffice.org then I may well be inclined to produce useful scripts for those - in the meantime though I'm quite happy with what I've got.
Jedidiah.
I like the idea of XML but I can't find one good source of all the XML data lists.
SOMEONE FOR THE LOVE OF GOD give me a list of XML sites so I can actually finish my app that uses it (hence the "it's still beta" in my sig for about 2 years now)
later
Offtopic, true.
But what's this bias people have for the inferior Perl? More and more people realize that Python is superior in almost every possible way...
At my company, once a failed startup with new life under the wings of a huge corporate parent, we have been using a homebrewed Web publishing system that takes Word 2000 or XP documents, saves them in RTF format, then uses a utility created by Majix to transform the document to XML. From there we use perl, and some XSL to get the document into XHTML combined with some JSP to produce documents that we deploy on our production env. The good part: the system was entirely free of license fees (other than office and Windows of course). The bad: it was a pain in the behind to get all the parts together.
The steps to produce valid XML from Word are the biggest hack I have ever been a part of as an engineer. We had to write a custom VB DLL that we run inside (what else) an IIS server, which takes the documents uploaded by authors and saves them as RTF. Control is then handed over to Tomcat, which takes the RTF and uses some custom classes that make Majix a server to transform the documents into XML. All in all we had to use VB, VBA, Java and JSP, two separate server configurations (IIS and Tomcat), and a bunch of really ugly glue to stitch all the parts together.
I for one, and I am sure I speak for my entire team, would love a solution which saves us this ugly kludge.
I just tried out the RC for OpenOffice 1.1 and it rocks. It would be nice if OpenOffice text document generated links for index and table. It's probably just user error on my part.
XML (in word processors, at least) is nothing really new. Remember WordPerfect? It had a feature called "Reveal Codes" which when activated displayed the underlying "markup" behind the document. One could argue that this was a primitive XML format. I argue that while it was great and all, such an accessible format worked well but didn't inspire great advances in unimaginable new ideas.
BTM
That was the turning point of my life--I went from negative zero to positive zero.
First Off
Microsoft did not drop the ball with XML. Microsoft disappointed the slashdot crowd by not going completely open... gee... big shock there. Microsoft maintains dominance of their office suite by controlling the file formats behind it. Opening that up without reason would be absolutely stupid from a business point of view. Granted, it's an unpopular stance, but that doesn't make it any less true. MS played along with the XML game to be able to use XML as a buzzword... and in some ways, they truly have embraced XML... just not in their holy cash cow called Office. Take a look at Visual Studio .NET, and you will see how strongly MS has in fact embraced XML.
Secondly...
XML is perhaps one of the most overhyped technologies ever. Self-describing datatypes are nothing new. The only really remarkable thing about XML is how thoroughly the industry embraced it. In all honesty... the difference between XML and CSV files really isn't that significant. Granted... XML is far beyond anything CSV ever did, but they both deliver the same result. In my current work environment, all our enterprise systems now support input/output via CSV. In addition, I'm in the auto industry, so the whole hype of Web services + XML really isn't that special either. Right now, they have ANX and EDI... granted... XML + Web services would be much more straightforward... but in 20+ years of evolution... has it really come that far?
Sorry for the anti-status-quo opinion, but I can't help but believe that XML is way overhyped. Useful... sure... but definitely overhyped!
Someone should use this new-fangled XML thingy to make a universal markup language that people could use to define and deliver structured data to any application in a standard consistent way.
How about making an XSL style sheet for resumes in OpenOffice?
....
"tags" like Name, address, education, jobs, skills. Then break them down...
example: education -- uniName, gradYear, gpa, major, minor,
If the things are in standard drop-down boxes like "heading 1" "heading 2" "normal" , etc.. are now...
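To make the tag idea concrete, here is a rough Python sketch of the structured data such a resume might produce, which a stylesheet could then format. The field names under education come from the list above; the resume root element and the build_resume function are invented for this example and are not part of any OpenOffice format.

```python
import xml.etree.ElementTree as ET

# Field names taken from the breakdown suggested above.
EDUCATION_FIELDS = ("uniName", "gradYear", "gpa", "major", "minor")


def build_resume(name, address, education):
    """Build a small resume tree; 'education' is a list of dicts."""
    resume = ET.Element("resume")  # hypothetical root element
    ET.SubElement(resume, "name").text = name
    ET.SubElement(resume, "address").text = address
    for entry in education:
        edu = ET.SubElement(resume, "education")
        for field in EDUCATION_FIELDS:
            if field in entry:  # optional fields such as minor may be absent
                ET.SubElement(edu, field).text = str(entry[field])
    return resume


if __name__ == "__main__":
    tree = build_resume(
        "Pat Example",
        "1 Sample Street",
        [{"uniName": "Example University", "gradYear": 2001, "major": "CS"}],
    )
    print(ET.tostring(tree, encoding="unicode"))
```

Once every resume uses the same element names, one stylesheet can lay all of them out consistently.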
XML does make it extremely easy to create documents on the fly, whether a plain old document or a slideshow presentation, all it needs is some template XML, original text, and some programming language to put it together.
I wrote a song lyric storage system using PHP and MySQL, and I had the idea of putting the lyrics onto a slideshow to teach a song to a group of people (or whatever). With the XML format provided by OpenOffice.org, I was able to quickly put it together and show it off, impressing quite a few people in the process. Of course, those people think Word/PowerPoint run the world, and the file format is all but a mystery to them. Hence having something generated on the fly via a webpage has its cool factor, not to mention it was a good chance to introduce this free office suite to them. Also a good chance to tell them that if I were to rely on ASP/PowerPoint it would have cost much, much more.
Open document format is the way to go in the future, because it definitely allows interoperability.
There are many uses, besides simply having a format that multiple programs can open. Besides, when new features are added to the format, older software could ignore those tags, somewhat like HTML has been doing. Then you get the ability to still open newer variations on the format. Not to mention it makes it easier to convert between them, and to add an XSLT to an older app to "update" it to support the newer format better.
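The "ignore tags you don't know" behaviour is easy to show. A toy sketch in Python, using a made-up three-tag vocabulary (doc, para, bold) rather than any real office format: the reader handles the one tag it knows about and lets everything else fall through, keeping the text inside.

```python
import xml.etree.ElementTree as ET


def render(element):
    """Flatten an element to plain text, marking <bold> with asterisks.

    Tags this "old" reader has never heard of are not an error: their
    content is kept and only their special meaning is lost, much as a
    browser treats unknown HTML tags.
    """
    parts = [element.text or ""]
    for child in element:
        parts.append(render(child))
        parts.append(child.tail or "")
    body = "".join(parts)
    if element.tag == "bold":
        return "*" + body + "*"
    return body  # known containers and unknown tags both fall through


if __name__ == "__main__":
    newer_doc = ET.fromstring(
        "<doc><para>Hello <bold>world</bold> and "
        "<sparkle>new</sparkle> stuff</para></doc>"
    )
    print(render(newer_doc))
```

Here sparkle stands in for a feature added in a later version of the format. The older reader still shows its text.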
few off the top of my head:
online services generating template documents; such as online resume creating websites.
Draw charts in a GOOD charting program instead of the crap these office programs have.
Generate presentations from outlines or databases, create videos from presentation files
For the small-time database software, the database could be imported into other database software, or converted to SQL or be translated into just about anything.
I mean, come on. It's just a standardised file format. That's all it is, OK?
Invoicing, Time Tracking, Reporting
Yeah it's flamebait... I couldn't resist...
d ex.html
You sir are an ID10T! Cutting shielding, removing the drive, bending the case! I would sue the crap out of you if you F*cked up my PowerBook!
I've installed plenty of Airport cards into Mac laptops and yup, it's a bit of a pain; but if you can't get it installed without mucking it up then you are complete hack.
Besides you should have read the installation document! It's extremely clear and has lot's of pictures and even videos for half-wits like you!
Customer Installable Parts Reference:
http://www.info.apple.com/usen/cip/in
I downloaded StarOffice 7 yesterday. However, I can neither save nor open XML documents (there is no option). Also, a friend of mine got a preview of the upcoming Office 2003, and that one did save as XML. Of course, I could only open the XML and view the tags, but that was about it. No other word processor was able to view the document (StarOffice, OpenOffice, AbiWord). Tuco
Here's someone who actually did do something with these. This proof-of-concept shows that you can easily convert the xml files to a browser-readable format.
if(!cool) exit(-1);
Why don't they just build OpenOffice from LaTeX/LyX? I just apt-get'ed it yesterday and it seems to have everything I need for documentation....
And before anyone tries to point out the cost/open source issue: in business that doesn't mean squat. Trying to sell something for free is the wrong attitude; businesses don't want to rely on good will. Kudos to all the dual-licensed projects out there that have learned how to play both sides of the fence.
Quack, quack.
Ron Minnich at lanl described this one also (though we weren't talking about XML)
-----
You want to make your way in the CS field? Simple. Calculate rough time of
amnesia (hell, 10 years is plenty, probably 10 months is plenty), go to
the dusty archives, dig out something fun, and go for it.
It's worked for many people, and it can work for you.
----
if you must
So get ready for all the gee whizzery now the new kids have "found" plain text.
There are places where the networks are not touching,and there are places where they are-Boeing's Lori Gunter
I helped spec out a document management metadata database 18 months ago for an engineering firm that wanted to catalog its files. They started out wanting just to categorize their CAD drawings, then decided to include all types of project files.
Our solution was a tcl front end that forced the entry of a minimal amount of metadata *during file creation,* to be picked from preset categories and subcategories. We also provided for free text entry but that was to be used only after the other fields.
The points are
a) The general metadata categories were known; the engineering tasks weren't new.
b) No one is going to go back after the fact and enter the metadata. You have to integrate its entry into the new file work procedure.
c) It's got to be as easy as file/new in a GUI.
d) Its utility has got to be very very apparent when juxtaposed with a subdirectory / filename scheme.
and we've had it since most /.ers were born
then there was postscript
now XML
whee, I have candyfloss in my hair
A bit like the PDF to Text command line stuff that already exists. Lots of power there if it can be tapped.
Office document gets parsed by a script, the images extracted and run through mogrify for scaling and branding, then the text gets translated to xhtml for posting on a DB-driven site somewhere.
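The text half of that pipeline might look roughly like this in Python. It is a sketch under assumptions: the document is a zip with its body in content.xml, and paragraphs are elements whose local name is p (matching on the local name avoids hard-coding a namespace URI, which I am not certain of for every version of the format). Image extraction and the mogrify step are left out.

```python
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on qualified tags."""
    return tag.rsplit("}", 1)[-1]


def document_to_xhtml(path):
    """Return the document's non-empty paragraphs as XHTML <p> elements.

    Assumes paragraphs are elements named 'p' in some namespace inside
    content.xml; headings, tables and formatting are ignored here.
    """
    with zipfile.ZipFile(path) as doc:
        root = ET.fromstring(doc.read("content.xml"))

    paragraphs = []
    for element in root.iter():
        if local_name(element.tag) == "p":
            # itertext() also collects text inside nested spans.
            text = "".join(element.itertext()).strip()
            if text:
                paragraphs.append("<p>" + escape(text) + "</p>")
    return "\n".join(paragraphs)
```

The returned fragment could then be dropped into whatever page template the DB-driven site uses.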
Even better -- a BOFH can scan through the network of shared documents, catalog any and all confidential information, grep them for anything particularly interesting, and maybe post a few names into alt.social.deviants or whatnot. All from a small script instead of half a day wading through mundane memos and accounting info. That's efficiency!
How in the hell am I going to use MS Office in Debian Linux? When I need to print out an envelope or mailing label or little letter, I fire up OO and get the damn job done. It's great, and it doesn't have all the idiotic quirks that MS Office has, which presumes that I am some kind of moron like you who needs my dick held for me every time I need to take a piss. Get back under the bridge, you evil MS ass-troll.
and go full fucking circle
Obviously, if you haven't noticed, the entire "new world" promise is an empty promise made by some stupid overzealous open source freak who thought it made a difference. First of all, if you need dynamic shit, you don't use a StarOffice or MS Word document to spit out dynamic content, you use HTML like the smart people do. Secondly, why the fuck would you need to have open standards for word processing? There is absolutely no good reason -- only the anti-MS zealot who says "competition!". But really, when you think about it, there is absolutely no good reason in the world to need competition for word processing formats. It's the frontend that you need; it's the god damn features like line spacing and other aesthetically specific needs. If you want an open format, look into HTML you dumb shits.
Not me but I am writing C# apps that make use of Excel's XML format. I wrote about using XSLT on the Excel XMLSS format in my blog a few months ago when I had to update date values in certain columns. I also posted the XSLT stylesheet.
Disclaimer: I work on the XML team at Microsoft but not directly with Microsoft Office.
If Microsoft is successful at deploying its DRM scheme, then interoperability will likely go out the window, no pun intended. Just as planned?
It's an overrated system with way too many features and having it be scriptable should be held up as proof of that.
This is my sig.
Take a look at AxKit's OpenOffice filter.
perhaps a free (as in beer) Word plugin?
I use OpenOffice.org suite for Windows instead of Microsoft Word for Windows, you Insensitive Clod(tm).
I guess there's XML and there's XML and getting between them is not necessarily easy.
Microsoft made a big deal about the most recent versions of Office writing out XML, but that was because XML was a buzzword, sounded as if it might be more open than ".doc", and was essentially a selling point.
From what I've read, people have been underwhelmed with the XML coming out.
If only a similar set of transformations could be developed for OpenOffice to import and export the XML of the latest version of Microsoft Office. From what I understand, the schema is not documented and the formatting and rendering rules for documents are still kept a private affair, just as it has been for .doc files.
You're still locked-in, dude!
I am working on the ability to read the Solver files and import them into database via JDBC
:-P
so be patient
There are plugins ("products" in Zope-speak) that let you save star/open office documents to a zope server, and automatically make them into content for your web site and integrate with content management workflow (if you have one).
Just like... oh.... Microsoft sharepoint portal server and Microsoft Office...
Only infinitely cheaper....
Now, I'm not too keen on Zope (I HATE its OODBMS - why not just use a relational backend? The relational model can do everything OO does, and more, then again Zope APE might make the point irrelevant...), but the content management framework is pretty sweet, anyway.
And it would pan out, too, if MS didn't drop the ball.
If MS didn't drop the ball, we'd have offices full of non-IT people creating XML documents without realizing it. A mass of structured data would build and become grist for the mill that is the office geek.
Unless OpenOffice/StarOffice has some huge market share that I'm not aware of, I'm not expecting to see any remarkable perl scripts for parsing office docs soon.
My Karma was at 49, then they switched to words. All that work for nothing!
I've been using these XSLT OOo <-> Docbook-XML filters for a little while.
They work pretty well (if you can manage to get them installed with the broken install instructions) but only for a limited subset of Docbook. There's no support for the programlisting tag, and lists are currently broken.
If anyone out there has superior XSLT kung fu, getting those two things working would be most appreciated : )
(I know the basics, but I don't yet have time at work to justify it. Maybe if this project gets done on time...)
What kinds of new and wonderful things can you come up with?
rm *.xml
The issue with MS Office files has been more with the ability to present them back to the user the same way, not with reading the file. Various programs have been able to "read" (grab text from) MS Office formats for ages; the issue is that no one has been able to write a word processor that shows a moderately complex document/spreadsheet/PowerPoint back to the user closely enough to the original. Don't get me wrong, some are close, but if you're tweaking your fonts and whatnot for, say, an investor, you don't want OOo to go and convert everything so that it mucks up the tables and converts all the fonts to 12pt Arial.
For a programmer or geek, or even someone just using it (OOo or similar) as a word processor to write letters to mom, not a big deal. But in a corporate environment, it's gotta be exact. At work (in the education industry, and therefore with lots of Macs) everyone uses PDFs, but in the non-Mac world, it's .doc/.xls/.ppt that is the "standard".
Well I don't know about Free/Open/Libre or XML development for Office... but I do know about the proprietary APIs Microsoft distributes for Office.
If you wanna give them a try sometime, assuming you've got Windows, VB5+, and Office installed... just add Office to your references (try Microsoft Office in the Project References menu) and give it a whirl. It's fairly easy to program in if you've used Office... most of the concepts that make for a good Office user translate directly into programming concepts for the Office object model.
And yet Office Automation programmers are in scarce supply.
Microsoft even offers a cert specifically for Office Automation programmers!
But I haven't seen too many well-written Office applications. My speculation is that it's not for lack of tools, but for lack of concepts. Other than the obvious reporting needs that any large organization has, are there any compelling reasons to spend an afternoon coding an Office application?
I think it is this lack of compelling reasons, and not a lack of easy-to-use programming tools that causes the lack of good free open add-ins...
I am disrespectful to dirt! Can you see that I am serious?!
"Apache module"? Can't XML-supporting web browsers use some sort of XSLT filter and do this displaying on the client side?
Will I retire or break 10K?
...Of course, not very well--but it's pretty easy to compile, say, the DocBook 4.1 DTD in WordPerfect and edit moderately complicated documents. Or import... The limitations are that it uses its own formatting system, rather than XSLT; and it uses DTDs instead of schemas, because the technology derives from SGML (which WordPerfect also supports). Arguably, WordPerfect has better support than any of the alternatives within the word processing space (i.e. discounting pure editors such as Emacs).
Author of Permanence and Ventus, co-author of The Claus Effect and The Complete Idiot's Guide to Publishing SF.
"There were 2469 documents found. Did you find what you wanted?"
I should say we have.
Linden Hall School converted completely to OOo and StarOffice two years ago and haven't looked back since.
Maybe you should consider your rising taxes and the cost of MS product before blindly recommending our schools continue using it, eh?
"...Well, there's egg and bacon; egg sausage and bacon; egg and spam; egg bacon and spam; egg bacon sausage and spam..."
Bemoaning the lack of XML-based magic goodness in corporate document processing assumes that a corporate document base exists which a) follows predictable content and structural patterns to allow automated processing, and b) is structured and rigorous enough to do meaningful processing against, an assumption which frankly doesn't hold water in too many places.
For most of the office document world (at least the world I work with regularly), most documents are unique in both structure and content and I as a programmer can make only the most basic of assumptions regarding what a program can expect to find within the content bundle. Sure the XML gives me a nice set of rules to rely on for breaking the document into parts and reading it in. But it doesn't do a whole lot to ensure that, say, two spreadsheets follow similar content assignment conventions. Most places can't get two managers to agree on the form and structure of a basic memo, or even get the same individual to repeatedly use a consistent structure in all his/her business communications.
Most organizations need to work on a few things before this type of processing will be useful in the large. Two particular areas would be: a) consistent use of metadata within document definitions to facilitate querying and filtering, and b) more sophisticated use of template functionality beyond just ensuring every page has the same graphic in its header.
No, I'm New Here
I still don't get this thing about MS dropping the ball. I've played with Office 2003, and the XML features in particular (mostly Word & Infopath, not the other programs) and I think they are quite well done.
Word has two different modes. One is where you can save an ordinary Word document in an XML format. This is the one /. goes on about mostly. Yes, it's pretty ugly XML, but you are trying to represent non-structured data in a structured format - of course it's going to be ugly. But it is documented & there is a publicly available XSLT from Microsoft to work with it. The other mode is to import an XSD and tag up the document as you like. You can save this in "rich" mode (with all the Office formatting - unstructured again) or "clean" mode in which the XML is as pure as your XSD is.
InfoPath simply rocks. Where else can you create an end-user-friendly UI that outputs clean XML (with XHTML islands if you choose), submits directly into a web service, and can be built start to end in a few minutes (for a simple form, of course)?
I just don't get it. Seems like mindless MS bashing to me.
Read reviews of shopping cart software
If you want an open format, look into HTML
Print comes on pages. Few if any HTML viewers support the CSS extensions for paged media. Until CSS3 support becomes widespread, word processing programs' data formats fill the gap.
I won't answer the rest of the troll.
Will I retire or break 10K?
I have an accounting system for SMEs using an Apache/PHP/MySQL intranet model.
I am currently adding OOffice classes so that the accounting system can generate nicely formatted invoices and other customer related documents by generating an sxw document with the correct letterhead and layout. Fairly simple and effective, but I would not bother trying this without XML.
The client can edit their standard template for each document, and PHP just fills in the blanks.
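The fill-in-the-blanks idea can be sketched in a few lines. This is only a rough illustration in Python rather than the PHP classes mentioned above, and it assumes placeholders of the form {{name}} typed into the template (a made-up convention, not something OpenOffice defines); fill_template is a hypothetical helper name. It relies only on an .sxw file being a ZIP archive whose body lives in content.xml.

```python
import io
import zipfile
from xml.sax.saxutils import escape


def fill_template(template_bytes, values):
    """Return a new zipped document with {{key}} placeholders in content.xml replaced.

    Assumes each placeholder sits in one unbroken text run; a word processor
    may split typed text across spans, which this sketch does not handle.
    """
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(template_bytes)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "content.xml":
                text = data.decode("utf-8")
                for key, value in values.items():
                    # Escape &, < and > so the result stays well-formed XML.
                    text = text.replace("{{" + key + "}}", escape(str(value)))
                data = text.encode("utf-8")
            # Every other member (styles, images, manifest) is copied unchanged.
            dst.writestr(item.filename, data)
    return out.getvalue()
```

Reading the template with open(path, "rb").read() and writing the returned bytes to a new file would give one filled-in invoice per call; whether the result opens cleanly depends on the template following the single-text-run assumption.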
Another one is for the debtors and creditors aged balances to generate an OOffice spreadsheet, complete with formulas, for projecting cashflow. I have yet to see any accounting software provide cashflow budgeting as simple and effective as a spreadsheet - so spreadsheet generation it is.
Anyone else developing PHP functions to read/write OO docs ?? If so, we should create a sourceforge project and collaborate.
The parent post is right on the money here.
Right now, I don't want flashy, XML-driven power apps. I'd settle for a word processor where I can produce my document with minimal fuss and good quality results. Apparently the vast majority of other word processor users agree with me, because I don't see any big uptake of ueber-powerful macro systems, manipulation tools based on super-flexible file formats, or any of the other much-promised stuff.
The simple truth is that usability is nowhere near the point where these facilities add value yet. Before you can develop powerful extra tools, you have to get the basics right:
These are essential for a serious document preparation system, yet no currently popular WP, commercial or free, even comes close to doing them all well. The serious people universally use either DTP packages or typesetting systems, and there's a reason for that.
When we reach the stage where a word processor can do these things well, without the user ignoring stylesheets because they're too awkward, having to look up the help every time they do a mail merge or finding that limitations in the document structure support prevent you doing what you want to at all in a non-trivial document, then we'll be getting to the stage where more powerful "workflow" tools might be of real benefit.
The second stage, of course, is developing the tools to create those workflow tools, and making them sufficiently usable themselves that people actually take advantage of the advanced capabilities. Right now, we have some awesome-sounding automation tools available, but who really uses them? Not many people, IME. Much of the problem is that the automation tools themselves are, like the applications within which they live, simply too much effort to bother with.
Give me a usable basic WP and usable tools to automate it (XML-based or otherwise) and I will move the document creation world. Until then, don't call us...
If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
That probably sounds icky and scary, but should not be all that hard.
I don't know what the formats are, but there's a whole pile of flexibility in XSL and FOP so building a very accurate version could take some fiddling. But producing a close approximation is probably very straightforward.
>> What kinds of new and wonderful things can you come up with?
Plugins.
Not just like Mozilla plugins (but, hey, good idea!)...
...more like [The Gimp] ones, including script-fu things.
Dewd, it's gonna be a party. It will be like Word macros, but, this time, done right.
A sizable percent of WWW users use IE 6. Most of those who refuse to use IE typically use something with a bit more XSLT-fu than Netscape 4.x.
Look at the statistics from Google Zeitgeist. Red, blue, and lavender lines indicate IE 6, IE 5.5, and Gecko respectively. Notice that except for IE 5.0 (orange), the three CSS-savvy classes of browsers I mentioned dominate the client side. The lavender isn't very high yet, but it's getting there, behind only IE.
Solution: Sniff user agents and point IE 5 users at Mozilla Firebird and Windows Update.
Besides, you seem to suggest some sort of mod_xslt, but what would you translate it to?
Will I retire or break 10K?
Formatting can be handled by whatever.
The strength is in the meta-data. By using XML the doc can be formatted by anything that can understand it. But formatting is not the point.
The docs can then be referenced in a relational database - searched, indexed &, importantly, shared and migrated to other indexing systems or stripped.
The XML 'magic' is very simple. The use of the data is whatever you want it to be. Do you want to restrict access, provide access, record access, implement version control and X-referencing - then using this technology is for you.
It has sfa to do with troff/groff/cat/echo/print and everything to do with document collaboration and sharing.
Is there a way to generate a PS file from OO's XML?
(without using OO)
I have been looking for this for a while.
Messing with OO XML format is not difficult and if you just
play with OO saved file to see what changes in the XML, it is easy
to create reports from a scripting language (aka perl).
The problem is that (AFAIK), there is no way to print directly
(or generate a PDF) without entering OO itself.
It should not be difficult to write a command line utility to do so,
if someone who knows the API points to what would have to be done.
So, IMO, this is the missing key to Office XML perl Heaven!!!
I don't want an "Office Suite" shoved down my throat. I want to use the graphing tool I think is best, I want my favorite email app, I want to use the word processor I like, and the spreadsheet I like, etc. I want to be free to try the newest software without converting everything I might need in the future. If the "office productivity programs" all used xml file formats, I could interchange files from one app to the next easily. I would NOT be locked into a single vendor's "suite" or programming HELL.
If the apps were using XML, easy migration would be a given, and programmers could spend time "enhancing" the user interface.
. there used to be a sig here.....
Nice troll, bet you are still living in your momma's basement. Hope someday you come out to the real world.
At the company I'm working for, they use as antiquated a system as you can use electronically to track bugs, etc: a spreadsheet. Since I use Linux, my spreadsheet is modified in StarOffice. I have brought in CVS and my development produces .deb files, so my dream (in my oh-so-prodigious free time) is to write a script that parses my CVS commits for debian/changelog entries that say "Closes SCR#xxx" and automatically modifies the bug spreadsheet. Ideally, I'd like to write a bug tracking software program (something not as complex as Bugzilla and more engineering-quality based) to also generate documentation, reports, test procedures, etc. This can pretty much only be done now by modifying XML directly (which is really kinda ugly) or something like an Excel COM object (which I am loath to do, can't do using Linux, but would be relatively easy and generally very cool to do...)
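The parsing half of that dream is the easy part. Here is a minimal sketch in Python, assuming the changelog entries really do use the "Closes SCR#123" wording (optionally with a colon and a comma-separated list); closed_scrs is a hypothetical name, and updating the spreadsheet itself is left out.

```python
import re

# Matches "Closes SCR#123" and "closes: SCR#123, SCR#124" (case-insensitive).
CLOSES_RE = re.compile(r"closes:?\s*((?:SCR#\d+[\s,]*)+)", re.IGNORECASE)
SCR_RE = re.compile(r"SCR#(\d+)", re.IGNORECASE)


def closed_scrs(changelog_text):
    """Return the sorted, de-duplicated SCR numbers a changelog claims to close."""
    found = set()
    for match in CLOSES_RE.finditer(changelog_text):
        # One "Closes" clause may list several SCRs.
        found.update(int(number) for number in SCR_RE.findall(match.group(1)))
    return sorted(found)
```

An SCR that is merely mentioned, without a preceding "Closes", is deliberately ignored, so a note like "see SCR#55" does not mark anything as fixed.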
..it's not even that standardized.
It's a meta-format, giving you means to *create* a standardized format, once you start communicating with the other people in your industry who might want to use the same standards.
Not really any different from any other EDI format, except now coders can move their skill set from one corporation to another.
That Jesus Christ guy is getting some terrible lag... it took him 3 days to respawn! -NJ CoolBreeze
I wanted to save some time documenting servers, so I wrote Accudoc to automatically generate server documentation for (currently Red Hat) Linux systems.
It's written in shell, and just uses a bunch of shell functions I made to create the documents.
You can download a copy here if you want. It's open source, and if you're a SysAdmin you might find it useful to produce written reports of servers you manage.
Over the summer I assisted the editing staff at the local university press with some of the more mundane/trivial aspects of assembling an encyclopedia they're putting out next April. Although all their documents were in a variety of versions of Word, I was still able to whip up some perl scripts to do some text processing; enough to suit my purposes. Although my script crashed on certain documents for no apparent reason.. And it took way too long to code considering how simple it was..
MS Office is required because users expect MS Office and will fight like hell the smallest changes in their comfy environment.
But the right kind of XML editor or even WYSIWYG (though I think most WYSIWYG editors are really WYSIWYGBYTATGSAAAAFTB (what you see is what you get but you've thrown all the good stuff away and added awful formatting to boot)) with a meaningful XML back end (with markup like "to", "to-address", "date", "title", "section", and so on and with domain specific markup as needed) would really change how documents are used, stored and generally manipulated.
One document style could really be used across an organization (in large companies this sometimes happens - in small to midsize ones rarely). Documents could be indexed meaningfully or even stored with minimal indexing, but fancy XQuery based search capabilities. Layouts could be changed to accommodate different types of paper (and I'm not just thinking letter vs legal or A3 vs A4, but things like letterhead changes etc). Documents could be stored (even transmitted) as the minimal xml markup needed to regenerate them (meaning maybe a couple Kb rather than a dozen or three Mb).
Documents could leave out boilerplate since large chunks of boilerplate could be inserted with a single tag (<patent-claim-from-hell> could expand to three pages of standard write-once legal nonsense, <sig> could expand to your signature...).
It ain't gunna happen. Users love their Word, they love being able to set up their own ugly-as-shit document layouts, being able to lose documents easily, being able to spend hours tweaking a font here or there instead of doing real work.
And even the vaguest mention that XML would be good is enough to generate a storm of protest. "Oh, but you can do that in MS Word already." But few people do and XML makes much more possible.
I sometimes think it would be interesting to make secretaries pay for MS Word themselves (not unreasonable - mechanics usually pay for their own tools) and for the disk space used by its documents.
Having an XML representation of a Word (MS, Open, whatever) document as a stream is really no more useful to me than RTF: I can parse them both.
The better part is when you can structure your document. Not just a heading surrounding a bunch o' paragraphs, but a (to use the stuff I have to work with) Research Report contains a Title Page, a Synopsis, an Introduction, Materials Section, etc. You can't put tables and figures on the title page or Introduction, you can in the Synopsis and Materials Section. TOCs and things like that are created as part of rendition, between the Synopsis and Introduction, without the user messing with it.
Now even more than storing those sections (which would, in the HTML world, be DIVs and SPANs), I want control over the UI: disable that table button in the title page, even down to where bold and italics can be used.
Office 2003 has some facility to implement this, but it's kind of awkward -- it's an extension of how their SmartTags work. Generally pretty ugly, to control everything.
I don't want to use an XML editor, my users know Word, are used to Word Processors, and they cost 1/5 of XML editors, less in bulk licenses.
I'd be implementing this now, if it weren't for two things: a) I work for a big corporation that never buys into new releases for a couple of years, and b) they're laying me off -- closing all the facilities in Chicago (sigh).
Design for Use, not Construction!
I'd really like to send the open office docs straight to the printer for hardcopy output. We've been struggling with that forever.
Users always seek a better user interface. And as long as you use an office suite with a proprietary format, you will have little choice.
With an open standard format, you can pick another office suite with suitable features (including interface of course) at any time. The faster the migration to an open standard, the fewer closed-format documents you will have and the brighter the future.
I do not say XML is the best open standard. I do not even think that most users care what XML is. What matters is that it is an open standard. It may not be the best but it is easy to convert from one open standard format to another later. What disappointed me most is MS's unwillingness to compete on open ground and to let users choose a better office suite. I would like to say, "let users decide."
The right time to migrate depends on what a user needs an office suite for. For most tasks, OpenOffice.org is quite sufficient. Its interface is surely going to be further improved, and it is about time for an average user to consider freeing oneself from MS.
It seems to be the hot topic. Where are all the neat toys that go with XML format files for office suites? Well, the simple answer is that they are not needed yet. Sure, I love the idea. Even something as simple as using a spreadsheet you can edit in your handy convenient office application to generate dynamic web-based content including graphs and summary datasheets through a php script sounds like a lot of fun, but the honest truth is how often do the end users actually USE XML as their format? You have millions and millions of documents already out there in the WordPerfect and Microsoft Office formats (and believe it or not, some still in Microsoft Works formats) and when you open one of those documents, your application doesn't tell you "this format is old - you should convert it" so it stays as it was.
Once the bulk of active documents are XML data, the scripts that parse them will become more prolific, and I think most of those scripts will be web based, such as php, perl and java.
Let's see, and, companies are going to make it easy for everyone to compare their prices without adequately describing the subtleties of their value proposition.
It's not even utopian, it's stupid. The relentless march to standardization for the sake of standardization is the Unix crowd's lemming version of everyone just buying Microsoft. You aren't changing the way of thinking, you just want people to think the same way about your open source stuff rather than Bill Gates's closed source stuff. There's no difference between Torvalds and Gates, except one begs for money and the other earns it.
This is my sig.
Microsoft maintains dominance of their office suite by controlling the file formats behind it. Opening that up without reason would be absolutely stupid from a business point of view.
I think you meant to say something like
"Microsoft maintains their illegal monopoly by controlling the file formats behind it. Opening that up, with good reason, would be the ethical and economically competitive thing to do."
I don't want MS to open up their standard just because I believe in open standards. I want MS to open up their standard because they have an illegal monopoly and have therefore stolen my money, your money, and the business of better competitors everywhere.
Let's get that fact straight.
XML can more easily represent complex data structures than CSV, but that's not the main benefit.
Nope, the real revolution was in creating standardized parsers. I spent many an hour with lex and yacc churning out parsers for many custom file formats. Even though XML may not seem the most efficient way to represent things, it's great not to have to write a new parser every time we have a new bit of information to represent in a file. It frees you to think about what data you want in a file instead of directing your file contents to things that will be easy to parse.
That's why XML is every bit as valuable as it is made out to be, just not for the reasons usually given...
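To make the "no new parser" point concrete, here is a small sketch in Python. The inventory format is invented purely for illustration, and in_stock is a hypothetical helper; the point is only that the stock library parser does all the tokenizing and the code deals with data.

```python
import xml.etree.ElementTree as ET

# A made-up file format; nothing here comes from a real schema.
DOCUMENT = """<inventory>
  <item sku="A-1" qty="3">Widget</item>
  <item sku="B-2" qty="0">Gadget</item>
</inventory>"""


def in_stock(xml_text):
    """Return (sku, name) for every item with a positive quantity."""
    root = ET.fromstring(xml_text)
    return [
        (item.get("sku"), item.text)
        for item in root.findall("item")
        if int(item.get("qty")) > 0
    ]
```

Adding a new attribute or child element to the file later needs no grammar change, only another get() or find() call.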
"There is more worth loving than we have strength to love." - Brian Jay Stanley
Come on, people. I've never seen anything worse than Crystal in my life. The reports are mediocre, and the charts suck so hard, I fail to convey it verbally.
Anyone who paid $10K for this junk needs immediate psychiatric attention.
The program also indexes Word and Excel files using Apache's POI library. I haven't looked at the size of that, but something makes me think it is a bit bigger than our little hack.
I know there is much hype around XML and in the end it is only half a syntax. But there are good applications of XML around and I think OOo is one of them.
Peter
-- CAUTION: Don't read this posting.
You can configure most office suites to display the document properties dialog on save. I'm sure you could also build templates with macros that would check and update these. Yes, it's a real problem and most businesses do not have strategies to address it. It's a document management issue very few address.
It's a similar problem with web publishing; there is little or no metadata to identify documents. I've always thought that the Dublin Core set would serve as a very good repository for a kind of CVS on the status of documents. Have wanted to build a back end to something like Apache/Cocoon using this model, which would also serve as the data repository for populating both the metadata in the web documents and also all the other data for semantics and accessibility, all done on the fly out of a DC metadata repository.
Being a text format, XML would at least bring documents out of the binary world and allow diffs and things that use diffs, like CVS.
Imagine actually being able to use source control to track documents!
Unfortunately OO defaults to gzipping the XML, which brings us right back to binary.
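The compression need not rule out diffs, since the saved file is a ZIP archive and the XML can be pulled out first. A rough sketch in Python, assuming the body is stored under the member name content.xml; content_lines and diff_documents are hypothetical helper names.

```python
import difflib
import io
import zipfile


def content_lines(doc_bytes):
    """Extract content.xml from a zipped document, one tag per line."""
    with zipfile.ZipFile(io.BytesIO(doc_bytes)) as archive:
        xml = archive.read("content.xml").decode("utf-8")
    # The XML is often written as one long line; break before each tag
    # so that a line-based diff has something to work with.
    return xml.replace("<", "\n<").strip().splitlines()


def diff_documents(old_bytes, new_bytes):
    """Return a unified diff of the two documents' content.xml as a list of lines."""
    return list(difflib.unified_diff(
        content_lines(old_bytes),
        content_lines(new_bytes),
        "old/content.xml",
        "new/content.xml",
        lineterm="",
    ))
```

For real source control the same extraction step could run before each commit, so that the repository stores the unpacked XML rather than the archive.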
The Securities and Exchange Commission is essentially powerless. Any stock governance will be pointless until shareholders of companies have real rights.
Companies are not necessarily designed to be responsive to their shareholders, and, they are not designed to be competitive, and, they do not have to be honest.
This is my sig.
One thing the Open Source office suites don't (yet) have much of an answer for is an XML data collection/management system along the lines of Microsoft Office InfoPath. A natural standard for such applications is W3C XForms.
Read all about it--fully GFDL and online now--from the O'Reilly book at my site.
.micah
--- Learn XForms today: http://xformsinstitute.com
With a bit of polish, Mozilla composer could be a good word processor. It generates XML (xhtml), and it's available for a bunch of platforms. Plus it comes with a web browser and an email client. That's most of an office suite right there. Most non-technical users don't use spreadsheets, databases, or presentation programs anyway. They want word processing, web, email.
Why is no one complaining this much about Adobe Acrobat?
In about .5 hrs, I was able to
extract the content from an
OpenOffice text document, as
well as a presentation, and feed them
into other tools. This without
trying to read any DTD's. Applying
more effort would have yielded more
functionality, but I was in a hurry,
just trying to get some information
out with some hierarchy to it.
Now, extracting the style is a different
challenge, and of course style
means different things to different
people. But it is simply madness to try
to extract content from Word
and Powerpoint files for use elsewhere.
Oh yes, I used Saxon. Nice product.
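For comparison, the same kind of quick content extraction can be sketched without XSLT. This Python version is not what the poster did with Saxon; it assumes headings and paragraphs are elements whose local names are "h" and "p" inside content.xml, and it ignores namespaces so no DTD reading is needed. extract_outline is a hypothetical name.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on qualified tag names."""
    return tag.rsplit("}", 1)[-1]


def extract_outline(doc_bytes):
    """Return (kind, text) pairs for headings ('h') and paragraphs ('p'), in document order."""
    with zipfile.ZipFile(io.BytesIO(doc_bytes)) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    outline = []
    for element in root.iter():
        kind = local_name(element.tag)
        if kind in ("h", "p"):
            # itertext() also collects text inside nested spans.
            text = "".join(element.itertext()).strip()
            if text:
                outline.append((kind, text))
    return outline
```

Style information is thrown away entirely here, which matches the post: content is the easy half, style is the different challenge.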
Ahhh bugger... I must have missed the article on MS dumping XML support. I saw XML support as a good thing because the content of the document could be morphed to fit into any display the user wanted. Like a browser, PDA, phone or some other mystical device.
I must say, it's less interesting now that MS has dropped it, and I'm not sure the fact that StarOffice has it pleases anyone. I say that because MS Office has the most market share, and enabling XML documents would have allowed better interoperability between other word processors and the MS document apps. I was hoping StarOffice, Gnome Office and KDE office could all contribute to a set of libraries to parse the XML Word documents, which would benefit everyone, but it looks like that will never happen.
But other than displaying XML data differently, XML also has other advantages. I've read several articles on how it could help the searching of documents for specific combinations of text etc...
There are so many cool things I can't even imagine that could have been done with XML Word documents. The fact that StarOffice is doing it doesn't interest me as much as if MS Word were doing it.
MS Office is a better product, with much more market share, in my opinion.
Giving IE users a taste of their own medicine since 2005 - http://pods.-is-a-geek.net/
"and a bit of intelligence"
Using an MS Word template, ActiveState Perl, and a number of modules including Win32::OLE I created a documentation generation system that pulled information from a database and created a Word document with dynamic headers, footers, formatting, content, etc. I used it to create 1000+ password protected, pre-formatted Word documents that we provided to the client. Anytime the format needed updating or any data needed to be changed all I had to do was rerun the Perl script rather than update all of those docs.
I'm not going to say that this was easy by any means, it took quite a bit of research and tweaking to finally get right. XML would, no doubt, make this task easier but I don't necessarily think it is the panacea that will FINALLY permit us to automate docs and reports that need to be generated and shared. My point is that with "a Perl script and a bit of intelligence" document automation is something that can be done now.
I'm a doctoral student. The output of my experiments are converted via a messy perl script into OpenOffice XML format. When I'm done running my experiments I simply pop them open and see a table and a graph.
Doesn't sound like a big deal, but I run hundreds of experiments and this just saves me a hell of a lot of time. I can easily convert these to Excel format so that I can share them with my advisor etc.
Did Sun get all those people using MS Office to convert their documents to StarOffice / OpenOffice XML format, which they can't even use with MS Office ?
It just doesn't make sense. Maybe it has something to do with Chewbacca ?
PJRC: Electronic Projects, 8051 Microcontroller Tools
my_plot(my_data)
Xix.
"Everything is adjustable, provided you have the right tools"
On the other hand, OO.o's XML format + schema will be available even to competitors and theoretically beyond the life span of OO.o. One way for OO.o to encourage users to think in a structured way is through style sheets. Style sheets and document templates can save a lot of wasted time and effort. But again, what would people do with the spare productivity if formatting were done in 5 minutes, instead of spending 2 days manually formatting and re-formatting various reports and presentations?
Beta is broken and the link to classic doesn't work. Stop wasting our time or there won't be anybody left here.
> Is anybody out there writing Perl/Java/whatever programs to
> take advantage of StarOffice XML?
Yes, actually I started doing that yesterday: I'm using Perl and XSLT to build documents in StarOffice XML (or actually OpenOffice.org XML), converting some 500 XHTML pages into one huge OpenOffice.org document. It's amazingly easy!
- Conversion from Word to OpenOffice. This is currently done in manual mode until I get someone to write me a batch process based on the OO APIs.
- Conversion to Simplified DocBook with OOo2sDbk. Works perfect for me.
- Analysis with Lucene to find often / rarely used words
- Presentation of a subset of these words to the user for definition as important or unimportant for the project
- Based on the user decision, the documents are connected in a structure remotely similar to a mindmap.
There are a few more steps possible but these are currently only in planning or not fully implemented, so I'll ignore them here. Once the map's done, it's all refinement of the mappings through user interaction, gradually refining the map by adding abstractions (WebSphere here, WebLogic there, abstract to ApplicationServer, etc.) and adding or removing relations, documents, etc.
The result is a hyperindex of the documentation.
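The word-analysis step can be illustrated without Lucene. The following Python sketch swaps in a plain document-frequency count (so it is not what the poster runs), and the thresholds min_docs and max_fraction are arbitrary assumptions; it picks words that recur across documents without appearing in nearly all of them, which is the subset one might show a user for sorting into important and unimportant.

```python
import re
from collections import Counter

WORD_RE = re.compile(r"[a-z][a-z0-9]+")


def term_document_counts(documents):
    """Count, for each word, how many documents it appears in."""
    counts = Counter()
    for text in documents.values():
        # A set, so a word repeated within one document counts once.
        counts.update(set(WORD_RE.findall(text.lower())))
    return counts


def candidate_terms(documents, min_docs=2, max_fraction=0.8):
    """Words found in at least min_docs documents but not in nearly all of them."""
    counts = term_document_counts(documents)
    limit = max_fraction * len(documents)
    return sorted(word for word, n in counts.items() if min_docs <= n <= limit)
```

Words that appear everywhere ("the", "for") fall out at the top end and one-off words at the bottom, leaving the middle band for the user to judge.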
It's not really revolutionary in that such a thing has never been done before, but I shudder at the thought of doing that with Microsoft Office as a base.
Just take one example: mail-merging. Every word processor on the planet can import mail-merge data in CSV, especially if you put field names in the first line. CSV is a pain because there's no way of representing fields that contain more than one line and there's no consistency on how to deal with the quotemark within a field.
I had hopes for XML. But no-one's designed an XML-based format (remember, XML isn't a data format, just a basis for designing them) for the most common single data transfer operation. I installed the Word 2003 beta - "XML support", great, I thought, MS will have invented an XML-based data transfer format and everyone can standardize on it. But... nothing. Mailmerge import is still CSV or ODBC.
So don't knock CSV: it really does have a standard way for transferring tabular data, and XML doesn't.
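For what it's worth, the convention most CSV libraries follow does cover both sore points: a field containing a line break or a quote is wrapped in quotes, and embedded quotes are doubled. A short sketch using Python's standard csv module (the sample rows are invented); whether a given word processor's mail-merge import accepts the result is a separate question.

```python
import csv
import io

rows = [
    ["name", "address", "note"],
    ["Ann Example", "1 High Street\nSpringfield", 'Prefers "Dr." as title'],
]

# Write with the module defaults: fields are quoted only when they need it.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
encoded = buffer.getvalue()

# Read it back; the quoted field keeps its line break and its quotes.
decoded = list(csv.reader(io.StringIO(encoded)))
```

The round trip returns the original rows, multi-line address included.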
The gigantic propaganda campaign about the "wonderful new things" that semantic markup would make possible was always just a masturbatory fantasy by people who'd never implemented anything, encouraged by SGML contractors who saw an opportunity to broaden their target market.
At the root of this delusion is what I call "Goldfarb's conjecture"-- the claim that document styles are superficial representations of underlying semantics. If Goldfarb were right, then tagging document semantics would be no harder than tagging styles, so this sort-of-works for titles and highlighting.
But hardly any other semantics have associated styles, so tagging them becomes sheer drudgework for almost no payoff. It's absurd to have to tag every name as a name, every place as a place, etc. This metadata belongs in headers, not as embedded tags.
So the real outcome of the XML-scam is that the effort to add metadata to webpages has been set back at least five years. What should have been emphasized was META headers for: Yahoo topic-category, DMoz topic-category, list of persons, list of places, list of companies, list of things, dates discussed, document type (eg timeline, image gallery, biography, etc).
My employer and Sun currently have a cooperation up and running. Basically it's storing StarOffice XML data via WebDAV into our XML database. You can search structured documents in the DB and on the publishing side there ain't nothing a Cocoon framework couldn't do. I'm talking about an XML based document repository. No binary data means we're wide open to any kind of server side application.
Dunno why this solution isn't marketed by either Sun or us, since all you need is our DB and StarOffice out of the box.
20 minutes into the future
Maybe because it's not a closed format, hence all the open-source pdf generation programs.
Frankly, I'd rather see more PDF generation than XML. If I sit down and spend hours designing a book or report it's more important to know that it will appear as designed than that it can be converted into a mass of raw data and presented in any half-arsed way by someone so primitive that they still think PowerPoint is a pretty good idea.
TWW
"Encyclopedia" is to "Wikipedia" what "Library" is to "Some people at a bus stop"
.TH Who understands troff, or postscript now? .B Maybe if there were nice WYSIWYM troff editors it would be a different matter, with XML you have something easily parsable and transformable with easy availability of good programming language parsing and manipulation libraries, which is a dream to integrate with RDBMSs.
You want to make troff useful? Write a GUI or TUI editor which uses it as its default format. Then write a set of free to use libraries for use with all of the major languages which make it a doddle to parse, generate, manipulate troff data.
Government of the people, by corporate executives, for corporate profits.
There has been such an explosion of dtd's that it's far easier to get the job done the old way than with an xml based approach.
Most people don't have the time to learn 50 new ways to reformat their data, let alone learn 50 new api's to process each of those ways when the old way worked just fine.
Unless xml is applied to a totally new domain, and not just a re-doing of what is already done in a different way, it's pretty much a waste.
We have an application, that recently underwent a major GUI update. Unfortunately, we had to drop an old and seldom used feature.
Now older clients would lose information if they had used that feature. So we developed a filter via XSLT (which is built into our application now) to migrate that info to XML, that is, OOo and M$ Office formats.
It's great; you write documentation in xml markup, and then you can run a host of tools to generate pdfs, webpages, text files ... any format you want basically. And everything is taken care of for you; a contents page is generated automatically, all the text formatting (bold, italic, headers) is done for you ...
I see no reason why someone couldn't write a word processor to edit doctype; instead of applying bold, etc you would have menus to make selected text "A command line", "A file name", etc
I think Sun use it for a lot of their documentation; PHP and others use it for their web documentation.
50% is not enough because MS's margin on Office is more than 70%, and if there are users who were forced to buy Office because "others have it" then they should get a 100% refund plus damages (losses caused by working with Office - for example, losses caused by viruses?).
[Note: This is offtopic to "Fulfilling the Promise of XML-based Office Suites?".]
hany
Hey Linux masters: what about adding yet ANOTHER layer of complexity to the "simple" and "quick" editors every Linux-system on this planet has installed.
As you now have managed to make it an utterly frustrating experience to edit one fcking .conf with bloated, feature-laden but useless "editors" (some would call them "operating systems") that no one can use without ten man months worth of reading documentation.
This is my only serious criticism about Linux. Having used it for about a year, I still can't figure out how to use these damn tools properly. I would BUY a decent simple editor that resembles the plain old dos-edit.com in terms of simplicity and stupidity. So if you like it or not: I AM stupid and therefore I want a stupid editor. I don't want anything else in a text-editor in console mode than a string search mode and some kind of clipboard to copy&paste some lines of text.
I don't want to do fancy macro-stuff, I don't need to have supercomplex features, I just want to locally edit a stupid .conf in console mode now and then. And yes, I know Google and I used it trying to find my perfect stupid editor but I found none.
Changing graphic hardware is a royal pita when you find out the driver's not working and you have to reenable the old one in xf86.conf when all you have available is a bloated VI. Call me Joe (L)User if you like, but I don't want to learn VI, I want to replace it. It doesn't matter if VI is installed on a million other *nix-systems or if Linus Torvalds' mother could understand vi, I don't. And I want to scrap this program as I have a personal hate against it.
I just wanted to tell you that, you vi-pimpz :)
This is an even better example of why Star Office and OpenOffice.org will overtake MS Office, as Sun only now bundles a cripple-ware database app, and OpenOffice has none at all.
OpenOffice can create and use DBF files natively, but this functionality is not obvious. You have to create an empty directory to hold the DBF files (the database), then set up that directory as a data source. You can then right-click on "Tables" in the data source navigator and select "New Table Design". It will let you design the DBF using an interface similar to MS Access.
Besides, most desktop "databases" are actually spreadsheets. Most users don't know enough about databases to be able to take advantage of them, even if they had enough data to make it worthwhile to learn.
I find the easiest way of getting usable XML out of Word is to use Word's Save as HTML function and then run W3C TidyLib to get rid of all (most) of the M$ crap.
This leaves you with an HTML-esque document that you can feed to an XSLT stylesheet and get whatever XML you need.
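A minimal sketch of that last step, in Python rather than XSLT so it needs nothing beyond the standard library. The tidied input, the `<report>` vocabulary and the function name are all made up for illustration; real Tidy output from a Word file will be messier, and a real stylesheet would do this mapping declaratively.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"

# Hypothetical stand-in for what Tidy might leave behind after cleaning
# Word's "Save as HTML" output: well-formed XHTML with only basic tags.
TIDIED = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Quarterly report</title></head>
  <body>
    <h1>Summary</h1>
    <p>Sales went <b>up</b>.</p>
    <p>Costs went down.</p>
  </body>
</html>"""


def xhtml_to_report(xhtml_text):
    """Map headings and paragraphs onto a made-up <report> vocabulary."""
    html = ET.fromstring(xhtml_text)
    report = ET.Element("report")
    title = html.find(f"{XHTML_NS}head/{XHTML_NS}title")
    if title is not None and title.text:
        report.set("title", title.text.strip())
    body = html.find(f"{XHTML_NS}body")
    for node in (body if body is not None else []):
        if node.tag == XHTML_NS + "h1":
            out = ET.SubElement(report, "section-title")
        elif node.tag == XHTML_NS + "p":
            out = ET.SubElement(report, "para")
        else:
            continue  # skip anything this sketch does not know about
        # itertext() flattens inline markup such as <b> into plain text
        out.text = "".join(node.itertext()).strip()
    return report


report = xhtml_to_report(TIDIED)
print(ET.tostring(report, encoding="unicode"))
```

The point is only that once the input is well-formed, the conversion is a few lines of tree walking in whatever language you like.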
I did consider using OO to open the Word documents and save them as XML; however, I had trouble with its API. (I also had trouble automating Word, but there I had plenty of bitter experience to draw on.)
Our customers generate .pdf's and then, later, have to report on the data therein contained. .pdf's represent the reportable data already satisfying the business rules. To be able to get to _that_ with xml.... and not have to hire AcroEinstein....
Then, on reporting, they use more cycles to take it (a second time) back from the *RDBMS* paradigm to the business rules paradigm. The
-b
Sorry that I have to post anonymously and be a bit vague about this, but some branches of the German federal government are starting to automate document generation using the OpenOffice/StarOffice APIs.
Our customers generate contracts in .pdf format.
Later on, they want to generate reports. These reports go back to the database and (for a second time) transfer from the *RDBMS* paradigm to the user paradigm. If I could get to the data on those contracts (without having to find AcroEinsteins to do it)....
-b
To me there seem to be two reasons to use XML with a word processing application:
These have differing requirements and are unlikely to be met by one XML file.
To facilitate interchange you need to use standards, and the first rule with standards is that you CAN'T create your own because you don't like the existing ones. DocBook, HTML, RTF, and .DOC have their problems but are a lot more interchangeable than OO's format, which can't yet be opened by anything else.
To facilitate processing and automatic formatting is much trickier. You really want to extract the schematic structure of the document, not its current formatting. OO's goals don't (yet?) seem to include creating a 'tagless editor' that allows WYSIWYG editing of true structured XML documents (using your own DTD or Schema).
This sounds more critical than I mean, as I think OO have made the correct decision in going for a proprietary (even if they now want it to become 'the' standard) document format and concentrating on the needs of the vast majority of users, who just want to be able to save and load richly formatted documents.
If I want to interchange documents then I use RTF, if I want to edit XML then I use an XML editor, if I want to convert a document to XML for further processing then I export it to XHTML (from whichever word processor). I use OO (well StarOffice) because it is the best word processor not because somewhere behind the scenes it is using the latest buzz technology
I recently wrote a Perl script to download the membership directories of multiple congregations from our church's website and manipulate them into comma-delimited, tab-delimited, and nicely formatted OpenOffice Calc (spreadsheet) and Writer (word-processor) formats directly from the Perl script. Because the Microsoft formats are closed, I could not output into those formats directly from the script, nor do I feel like reverse engineering the formats to figure out how.
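For anyone wondering what "directly from the script" can look like, here is a rough sketch in Python (not the Perl script described above). It assumes the OpenDocument (.ods) flavour of the format, which later standardized OpenOffice.org's XML; the older 1.x .sxc files used different namespaces and a different mimetype. The sheet name, sample rows and function name are invented, and the package is deliberately minimal (no styles), so treat it as an illustration rather than a complete writer.

```python
import io
import zipfile
from xml.sax.saxutils import escape

MIMETYPE = "application/vnd.oasis.opendocument.spreadsheet"

CONTENT_HEAD = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<office:document-content"
    ' xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"'
    ' xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0"'
    ' xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"'
    ' office:version="1.2">'
    "<office:body><office:spreadsheet>"
    '<table:table table:name="Directory">'
)
CONTENT_TAIL = (
    "</table:table></office:spreadsheet></office:body>"
    "</office:document-content>"
)
MANIFEST = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<manifest:manifest"
    ' xmlns:manifest="urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"'
    ' manifest:version="1.2">'
    '<manifest:file-entry manifest:full-path="/"'
    ' manifest:media-type="%s"/>'
    '<manifest:file-entry manifest:full-path="content.xml"'
    ' manifest:media-type="text/xml"/>'
    "</manifest:manifest>"
) % MIMETYPE


def build_ods(rows):
    """Return the bytes of a bare-bones spreadsheet package for `rows`."""
    parts = []
    for row in rows:
        parts.append("<table:table-row>")
        for value in row:
            parts.append(
                '<table:table-cell office:value-type="string">'
                "<text:p>%s</text:p></table:table-cell>" % escape(str(value))
            )
        parts.append("</table:table-row>")
    content = CONTENT_HEAD + "".join(parts) + CONTENT_TAIL

    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as package:
        # The format wants "mimetype" as the first entry, stored uncompressed.
        package.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        package.writestr("content.xml", content)
        package.writestr("META-INF/manifest.xml", MANIFEST)
    return buf.getvalue()


# Made-up directory rows for illustration.
ods_bytes = build_ods([
    ("Name", "Phone"),
    ("Smith & Sons", "555-0100"),
])
print(len(ods_bytes), "bytes")
```

Writing `ods_bytes` to a file ending in .ods should give something an OpenOffice-derived suite can open, though I have not checked every version; headers, footers and formatting would need a styles.xml on top of this.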
I then used OpenOffice to save the files as Word and Excel formats for those who don't have access to OpenOffice, but I included a reminder that OpenOffice is free and included a link to the website.
This would have been impossible without OpenOffice, and I thank them for their work. The final output has headers, footers, special formatting and prints out like a professional document, not roughly formatted text output in courier.
I am leveraging XML all around in gcompris
We have inline documentation in XML and it is being translated to HTML or OO with XSLT.
Look at:
html version
oo version
Yes it is great, yes it works.
I like the idea of converting every document readable by OpenOffice.org to LaTeX. writer2latex will do the job but is still beta. Remember that OpenOffice.org also reads Microsoft Word documents.
PS: KOffice 1.4 will use the openoffice file format by default.
One could analyze a document to generate metadata about it. This could then be fed into "Storage", the file system with natural language queries. One big problem with Storage would seem to be creating the database, but making documents easier to read could help.
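As a rough idea of what "analyze a document to generate metadata" might mean in practice, here is a Python sketch that opens a zipped XML document, pulls the text out of content.xml and derives a few crude facts. It assumes OpenDocument-style namespaces, builds its own tiny sample package so it runs anywhere, and the "keywords" heuristic (most frequent words longer than three letters) is just a placeholder, not anything Storage actually does.

```python
import io
import re
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def make_sample_package():
    """Build a tiny stand-in document in memory (hypothetical content)."""
    content = (
        '<office:document-content xmlns:office="%s" xmlns:text="%s">'
        "<office:body><office:text>"
        "<text:h>Hike schedule</text:h>"
        "<text:p>The spring hike leaves Saturday.</text:p>"
        "<text:p>Bring water for the hike.</text:p>"
        "</office:text></office:body></office:document-content>"
    ) % (OFFICE_NS, TEXT_NS)
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as package:
        package.writestr("content.xml", content)
    return buf.getvalue()


def describe(package_bytes):
    """Derive simple metadata from the document's content.xml."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        root = ET.fromstring(package.read("content.xml"))
    headings = ["".join(h.itertext()) for h in root.iter("{%s}h" % TEXT_NS)]
    paragraphs = ["".join(p.itertext()) for p in root.iter("{%s}p" % TEXT_NS)]
    words = re.findall(r"[a-z']+", " ".join(paragraphs).lower())
    # Crude keyword guess: the most frequent words longer than three letters.
    common = Counter(w for w in words if len(w) > 3).most_common(3)
    return {
        "headings": headings,
        "paragraphs": len(paragraphs),
        "words": len(words),
        "keywords": [word for word, _ in common],
    }


metadata = describe(make_sample_package())
print(metadata)
```

A dictionary like this is the kind of thing that could be fed into a searchable index without the author ever filling in a "document properties" dialog.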
soffice -headless -p
It runs OO in the background, prints and exits.
HTH.
oh brave new world, that has such people in it!
"MS won't stand for an XML file format -- it's human-readable. the last thing MS wants is for their file format to be easily convertible and transformable. it's a pity, because switching Office files to XML would quickly make them insanely useful."
You people are so biased. Now Office has suddenly "dropped the ball." Of course, that meme will permeate through all Slashbots' thinking, whether or not they've even tried Office 2003.
Here is a sample XML file. The original message said "This is a <b>test</b> of <b><i><font face="verdana" size="24">XML</font></i></b>."
NOTE:  ; Slashcode adds random semicolons and other garbage for some reason.
<?mso-application progid="Word.Document"?>
<w:wordDocument w:macrosPresent="no" w:embeddedObjPresent="no" w:ocxPresent="no" xml:space="preserve">
<o:DocumentProperties>
<o:Title>This is a test of XML</o:Title>
<o:Author>Preston Sumner</o:Author>
<o:LastAuthor>Preston Sumner</o:LastAuthor>
<o:Revision>1</o:Revision>
<o:TotalTime>1</o:TotalTime>
<o:Created>2003-09-18T15:29:00Z</o:Created>
  ; <o:LastSaved>2003-09-18T15:30:00Z</o:LastSaved>
<o:Pages>1</o:Pages>
<o:Words>3</o:Words>
<o:Characters>20</o:Characters>
  ; <o:Company>White Goat Studios</o:Company>
<o:Lines>1</o:Lines>
<o:Paragraphs>1</o:Paragraphs>
<o:CharactersWithSpaces>22</o:CharactersWithSpaces >
<o:Version>11.5604</o:Version>
</o:DocumentProperties>
<w:fonts>
<w:defaultFonts w:ascii="Times New Roman" w:fareast="Times New Roman" w:h-ansi="Times New Roman" w:cs="Times New Roman"/>
<w:font w:name="Verdana">
<w:panose-1 w:val="020B0604030504040204"/>
<w:charset w:val="00"/>
<w:family w:val="Swiss"/>
<w:pitch w:val="variable"/>
<w:sig w:usb-0="20000287" w:usb-1="00000000" w:usb-2="00000000" w:usb-3="00000000" w:csb-0="0000019F" w:csb-1="00000000"/>
</w:font>
</w:fonts>
<w:styles>
<w:versionOfBuiltInStylenames w:val="4"/>
<w:latentStyles w:defLockedState="off" w:latentStyleCount="156"/>
<w:style w:type="paragraph" w:default="on" w:styleId="Normal">
<w:name w:val="Normal"/>
<w:rPr>
<wx:font wx:val="Times New Roman"/>
<w:sz w:val="24"/>
<w:sz-cs w:val="24"/>
<w:lang w:val="EN-US" w:fareast="EN-US" w:bidi="AR-SA"/>
</w:rPr>
</w:style>
<w:style w:type="character" w:default="on" w:styleId="DefaultParagraphFont">
<w:name w:val="Default Paragraph Font"/>
<w:semiHidden/>
</w:style>
</w:styles>
<w:docPr>
<w:view w:val="normal"/>
<w:zoom w:percent="100"/>
<w:doNotEmbedSystemFonts/>
<w:proofState w:spelling="clean" w:grammar="clean"/>
<w:attachedTemplate w:val=""/>
<w:defaultTabStop w:val="720"/>
<w:characterSpacingControl w:val="DontCompress"/>
<w:optimizeForBrowser/>
<w:validateAgainstSchema/>
<w:saveInvalidXML w:val="on"/>
<w:ignoreMixedContent w:val="off"/>
<w:alwaysShowPlaceholderText w:val="off"/>
<w:compat>
<w:breakWrappedTables/>
<w:snapToGridInCell/>
<w:wrapTextWithPunct/>
<w:useAsianBreakRules/>
<w:useWord2002TableStyleRules/>
</w:compat>
</w:docPr>
<w:body>
<wx:sect>
<w:p>
<w:r>
<w:t>This is a </w:t>
</w:r>
<w:r>
<w:rPr>
<w:b/>
</w:rPr>
"Sufferin' succotash."
I find that in Word, saving as RTF, then opening the file in AceHTML and saving as HTML, results in pretty clean HTML.
Amongst the Open Source word processor projects, I think KWord and AbiWord should standardize on one file format (OO already has an XML format) or maybe they should all share the same format.
... and there's nothing better ;) ) than do it for everyone else. Corporate environments dread not being able to read client's files which are e-mailed to them.
Over the years of writing essays for University, I've written documents in Word Perfect, Word and Open Office. While I first compose them in plain text and save the final draft in plain text, there's nothing worse than trying to open a document in a different word processor and having - all - the formatting thrown off.
Since KWord, AbiWord and OO are all open source, it would be nice to have standard file formats. It makes sharing much easier. If not for me (blatant selfishness
Ask again in three.
Healthcare article at Kuro5hin
Micro$oft didn't "drop the ball." It popped the ball.
Face it, Ballmer and Gates never had any intention of actually going to an open spec with their office documents: they have too much to lose. They simply wanted to gain some goodwill from the tech community, so they feigned an interest in that direction.
Sorry, but without Redmond on board and willing to hand over the keys to the store, this idea's another interesting-but-fruitless venture, doomed to fail (or, in this case, doomed to be adopted by such a small percentile of users as to render it useless).
"Don't matter how New Age you get, old age is gonna kick your ass." - Utah Phillips
Well, I for one would really welcome such a thing. One of the biggest problems I have faced during the development cycle is the PHB insisting that all documents be in M$ Word, with LaTeX a strict no-no (for whatever braindead reason). I see a very welcome atmosphere where I write a bunch of Perl scripts to generate templates for my requirements, functional specs, and design documents, and actually derive one document from another.
I could also deploy a bunch of tools to derive requirement tracing matrices and other metrics for a particular project directly from my documents, and also maintain the documents in CVS. And the PHB gets everything in M$ Word. It's a perfect world!
XML allows you to create your own tags in your own format. Just because it's ascii doesn't mean anything. If you don't know what each element is meant to do, then who cares?
;->
It makes storing it in a database easier, but so what. The XML structure is standard, but the definition extensions are not.
Just something new for programmers to do.
We had a requirement for some of our applications to do "all sorts of wonderful new things" with MSWord documents using a web application.
We basically had MSWord documents that need to be uploaded and downloaded to and from a web application. In the document itself, we needed some areas to be editable by user and other areas rendered by the application, like a report. We also needed to extract parts of the document to populate a database. Obviously we could not use a document in MSWord format.
We also could not use StarOffice because this is an application that will be used across the enterprise. That is potentially tens of thousands of people and MSWord is what is used across the enterprise, like it or not.
As a result we opted to use RTF documents, which are read and writeable by MSWord. It seemed like a logical choice.
What we did was mark areas of the document with XML-like tags, using Word's hidden text feature. As long as users typed between the tags, we could extract the text. We also marked other areas with similar tags to be rendered by the web application.
There are actually two applications that do this: one uses Java and JSP, the other uses ColdFusion. Because RTF is text, we were able to render portions of it like we would HTML, using JSP or CFML. To process the tags we simply use pattern matching with the regular expression capabilities of either language.
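To make the idea concrete, here is a small Python sketch of the same trick (the real applications use JSP and CFML, so this is not their code). It assumes the markers sit in RTF hidden-text groups of the form `{\v <edit name="...">}`, where `\v` is the RTF hidden-text control word; the tag names `edit` and `render`, the sample document and the helper names are all invented, and the RTF Word actually emits is a good deal uglier than this.

```python
import re

# Hidden-text markers around a user-editable region (hypothetical tag names).
EDIT_RE = re.compile(
    r'\{\\v\s*<edit name="(?P<name>[^"]+)">\}'
    r"(?P<body>.*?)"
    r"\{\\v\s*</edit>\}",
    re.DOTALL,
)
# Hidden-text markers around a region the application fills in.
RENDER_RE = re.compile(
    r'(\{\\v\s*<render name="([^"]+)">\})(.*?)(\{\\v\s*</render>\})',
    re.DOTALL,
)

SAMPLE = (
    r'{\rtf1\ansi Report for {\v <render name="customer">}{\v </render>}\par '
    r'{\v <edit name="summary">}Sales were {\b strong} this quarter.'
    r"{\v </edit>}\par }"
)


def plain_text(rtf_fragment):
    """Very rough RTF-to-text: drop control words, then drop braces."""
    no_controls = re.sub(r"\\[a-zA-Z]+-?\d* ?", "", rtf_fragment)
    return no_controls.replace("{", "").replace("}", "")


def extract_fields(rtf):
    """Return {field name: text typed between the hidden tags}."""
    return {m.group("name"): plain_text(m.group("body"))
            for m in EDIT_RE.finditer(rtf)}


def rtf_escape(text):
    """Escape the three characters that are special in RTF."""
    return re.sub(r"([\\{}])", r"\\\1", text)


def render(rtf, values):
    """Fill each render region with a value, keeping the markers in place."""
    def fill(match):
        value = values.get(match.group(2), "")
        return match.group(1) + rtf_escape(value) + match.group(4)
    return RENDER_RE.sub(fill, rtf)


fields = extract_fields(SAMPLE)
rendered = render(SAMPLE, {"customer": "Acme {Ltd}"})
print(fields)
print(rendered)
```

Keeping the markers in place on render means the same document can be uploaded and processed again; it also shows why a deleted tag is fatal, since the pattern then has nothing to match.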
Does it work? Well, mostly it does. It can be problematic because the RTF that MSWord generates is about as ugly as the HTML it generates. Also, if a user accidentally removes one of the tags, it breaks. However, for the most part it works, I guess.
The only real problem with the electoral college is that most states implement it poorly. Nowhere does it say that the votes have to be "winner take all". There's no constitutional reason why California's 54(?) votes all have to go to one candidate. A more sensible scheme would be that each candidate gets one electoral vote for each district where they win, with the last two going to (say) the candidate who takes the most districts.
Of course, Republicans and Democrats have no incentive to do that. Under the current scheme, most of the states are assumed to go one way or the other, and they only have to campaign in a few "swing" states. It's a lot more efficient for them, and it makes it all the harder for third-party candidates to get taken seriously ("S/he never got so much as a single electoral vote!").
I pretty much would have to agree with all your points. I'm not saying the OS stigma is well reasoned (although it is still a little way from being child's play to find reliable support), and I do believe that open source is finally beginning (and will continue) to find acceptance in the mainstream business world (thanks in part to the large corporations lending credibility and in part to the OS community starting to 'get it').
With functioning DRM on the horizon and Microsoft's determination to stamp out piracy, it's going to continue to get a whole lot more interesting in the open source community.
Quack, quack.
I searched for a while for a way to generate documents from a template and data in a database, in order to publish my hikers' association's schedule on paper. A requirement was to be able to reformat the document with a word processor to tweak the page setup after the extraction (so PDF was not a solution).
Finally I developed a solution that generates OpenOffice.org Writer documents using Java and Velocity (jakarta.apache.org) templates for the content.xml/styles.xml of the OOo document. The Java code exposes an object model (built from Java classes that handle database extraction) to the Velocity templates.
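A very rough Python sketch of the templating idea follows, with the standard library's `string.Template` standing in for Velocity and a hard-coded list standing in for the database layer. The hike data, the style name "Hike" and the use of OpenDocument namespaces are assumptions for illustration; the Java solution described above is certainly structured differently.

```python
import xml.etree.ElementTree as ET
from string import Template
from xml.sax.saxutils import escape

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Template for one schedule entry (placeholder style name "Hike").
ROW = Template('<text:p text:style-name="Hike">$date: $title ($leader)</text:p>')

# Template for the whole content.xml; $rows receives the rendered entries.
DOCUMENT = Template(
    "<office:document-content"
    ' xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"'
    ' xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">'
    "<office:body><office:text>$rows</office:text></office:body>"
    "</office:document-content>"
)

# Made-up rows; in the real setup these would come from the database.
HIKES = [
    {"date": "2003-10-05", "title": "Ridge & valley loop", "leader": "Anne"},
    {"date": "2003-10-19", "title": "Lake trail", "leader": "Marc"},
]


def render_content(hikes):
    """Fill the templates, escaping values so the XML stays well-formed."""
    rows = "".join(
        ROW.substitute({key: escape(value) for key, value in hike.items()})
        for hike in hikes
    )
    return DOCUMENT.substitute(rows=rows)


content_xml = render_content(HIKES)

# Sanity check: the result parses, and the text comes back unescaped.
root = ET.fromstring(content_xml)
paragraphs = ["".join(p.itertext()) for p in root.iter("{%s}p" % TEXT_NS)]
print(paragraphs)
```

The resulting string would still have to be zipped into a document package together with styles.xml before a word processor could open it, which is where the page setup can then be tweaked by hand.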
For more information you can contact me: dolmen bigfoot com (email).
The MS Roadmap for XML and Office is here: http://www.microsoft.com/office/using/column21.asp
every time I hear about XML as some kind of standard, all I can think about is TeX.
Not only is TeX vastly simpler than XML, and superior to anything else out there in terms of the quality of output it produces, but it is also very compressible.
Furthermore, TeX is clearly the best piece of software in existence. It is essentially bug-free. Despite the author of TeX offering a reward for any bug found in it, no bug has been found for a very, very, very long time.
Finally, TeX has a superb record of backwards compatibility, and always will. Something written in TeX today will produce the same output 100 years from now as it does now, because the TeX engine has been frozen.
social sciences can never use experience to verify their statemen
To allow a computer to provide a user interface with an interpretive level of interaction, i.e. one that acts on what it "thinks" you want it to do, requires a considerable amount of power plus an extended period of interaction with the user. Catch-22: use the computer a lot so that it can "learn" how you want to use it, so that you can use the computer. Not to mention that most end users I know don't react well when the computer fails to do what they wanted it to do, regardless of how illogical the instructions they gave it in the first place. How many times have you had to do more than just say you were sorry to your wife for failing to understand what she expected you to do, regardless of what she told you to do? Computers don't need better interfaces, just a credit card, so that when they muck things up they can buy you dinner, regardless of whose fault it was.
Chaos - everything, everywhere, everywhen
You don't get paid well for teaching. Or for learning or doing. And industry looks on time spent teaching as wasted time even though I (personally) could easily code circles around many industry geeks.