Slashdot Mirror


Fulfilling the Promise of XML-based Office Suites?

brentlaminack asks: "Almost a year ago Tim Bray of XML fame said 'when the huge universe of MS Office documents becomes available for processing by any programmer with a Perl script and a bit of intelligence, all sorts of wonderful new things can be invented that you and I can't imagine.' Now that MS has dropped the ball on the XML Office front, and StarOffice has fulfilled its XML promise, where are all those 'wonderful new things?' Is anybody out there writing Perl/Java/whatever programs to take advantage of StarOffice XML? Could this be an opportunity for Free/Open/Libre software to leapfrog MS Office in real productivity as XML proponents have promised all along?" What kinds of new and wonderful things can you come up with?

11 of 432 comments

  1. Well... by Otter · · Score: 4, Informative
    ...when the huge universe of MS Office documents becomes available for processing by any programmer with a Perl script and a bit of intelligence, all sorts of wonderful new things can be invented that you and I can't imagine.

    Well, I'm taking a break right now from generating new Excel graphs by copying old ones and changing the source data, which isn't so bad, and those fucking error bars, which is. Oh, and the scatter plot points are superimposed so you can't click on the back ones.

    So if I could do a find&replace on a flat file, I'd have been done an hour ago.

    Other than that, no, I can't imagine either. VBA exists now and it's not like we're all flying around with wings and harps.

  2. Re:standardization by chill · · Score: 5, Informative

    The next major release of KOffice is supposed to adopt the OO file formats as its own standard.

    --
    Learning HOW to think is more important than learning WHAT to think.
  3. It's called troff by DrSkwid · · Score: 2, Informative

    and we've had it since most /.ers were born

    then there was postscript

    now XML

    whee, I have candyfloss in my hair

    --
    There are places where the networks are not touching,and there are places where they are-Boeing's Lori Gunter
  4. Yup, people are by amblin · · Score: 4, Informative

    Take a look at AxKit's OpenOffice filter.

  5. XML and MS Office by Mr. Ophidian Jones · · Score: 3, Informative

    I guess there's XML and there's XML and getting between them is not necessarily easy.

    Microsoft made a big deal about the most recent versions of Office writing out XML, but that was because XML was a buzzword, sounded as if it might be more open than ".doc", and was essentially a selling point.

    From what I've read, people have been underwhelmed with the XML coming out.

    If only a similar set of transformations could be developed for OpenOffice to import and export the XML of the latest version of Microsoft Office. From what I understand, the schema is not documented and the formatting and rendering rules for documents are still kept private, just as they have been for .doc files.

    You're still locked-in, dude!

  6. Missing some of the points by evil_roy · · Score: 2, Informative

    Formatting can be handled by whatever.

    The strength is in the meta-data. By using XML the doc can be formatted by anything that can understand it. But formatting is not the point.

    The docs can then be referenced in a relational database: searched, indexed and, importantly, shared and migrated to other indexing systems or stripped.

    The XML 'magic' is very simple. The use of the data is whatever you want it to be. Do you want to restrict access, provide access, record access, implement version control and X-referencing - then using this technology is for you.

    It has sfa to do with troff/groff/cat/echo/print and everything to do with document collaboration and sharing.
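
    A rough sketch of the "referenced in a relational database" idea, assuming a made-up metadata layout (the `document`, `meta`, `title`, `author` and `created` element names, the `docs` table and the sample path are all hypothetical, not taken from any office format). It pulls the metadata out of an XML document and stores it in SQLite so it can be searched:

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical document; the element names are invented for this sketch.
SAMPLE = """<document>
  <meta>
    <title>Quarterly report</title>
    <author>A. Writer</author>
    <created>2003-09-18</created>
  </meta>
  <body>Text of the report.</body>
</document>"""


def index_document(conn, path, xml_text):
    """Store one document's metadata as a row in the docs table."""
    meta = ET.fromstring(xml_text).find("meta")
    conn.execute(
        "INSERT INTO docs (path, title, author, created) VALUES (?, ?, ?, ?)",
        (path, meta.findtext("title"), meta.findtext("author"),
         meta.findtext("created")),
    )


def build_index():
    """Create an in-memory index holding the sample document."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE docs (path TEXT, title TEXT, author TEXT, created TEXT)")
    index_document(conn, "reports/q3.xml", SAMPLE)
    return conn


def find_by_author(conn, author):
    """Return (path, title) pairs for every document by the given author."""
    rows = conn.execute(
        "SELECT path, title FROM docs WHERE author = ? ORDER BY path",
        (author,))
    return rows.fetchall()
```

    Access control, versioning and cross-referencing would be more columns or tables keyed on the same path; the document itself never has to be opened in a word processor.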

  7. Re:Microsoft Dropped the Ball? by YouAreATool · · Score: 4, Informative

    At this point, people should realize /. articles are mostly fretards talking out their ass. I too read this article, thinking: wtf? As I am writing this comment, I'm looking at my (beta) Word 2003 file save dialog and an example XML doc I just made. It round-trips all formatting and junk in the XML format. It has a "save data only" checkbox in the Save As dialog, and can support XSL transforms (you supply the XSL) on export. If I cared, I think I could make it export OpenOffice format pretty easily. The high-fidelity XML file has a lot of junk, but it's all XML.

  8. Re:XML... by ReelOddeeo · · Score: 3, Informative

    Once you learn how to do it, it is definitely possible from, say, a Java program to connect to a running OOo (OpenOffice.org == OOo), make it open a document, and re-save it in Word format. You can even make OOo do this without flashing anything on the screen.

    There is a definite learning curve. You need to learn Uno.

    IMHO, despite the learning, this would be way easier than trying to extract the parts of code you need from OOo and building a "converter" program. Maybe I say this because I have spent the time learning Uno and can now program OOo functionality from multiple languages, and how to integrate it into a web server like this seems obvious to me.

    I have personally programmed OOo to do things from: OOo-Basic, Java, Python and MS Visual FoxPro. I know from postings from others that it is most definitely possible to use Delphi and VB.

    Just as an example of what can be done, I built a Maze generator in Java. You can run the maze generator on a different computer. Even a different OS. It connects to a running OOo, and then creates a multi-page drawing of complex mazes. (You can get it at www.OOoMacros.org or at www.OOoExtras.org.)
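
    For anyone wondering what the open-and-re-save trick looks like, here is a minimal Python sketch. It is not the poster's code. It assumes the office was started listening on a socket (something like `soffice -accept="socket,host=localhost,port=2002;urp;"`), that the script runs under the Python that ships with the office so the `uno` module is importable, and that "MS Word 97" is the export filter you want. The function names are made up for illustration.

```python
from pathlib import Path


def uno_connection_url(host="localhost", port=2002):
    """Build the UNO URL for an office instance listening on a socket."""
    return f"uno:socket,host={host},port={port};urp;StarOffice.ComponentContext"


def convert_to_word(src, dst, host="localhost", port=2002):
    """Ask a running office to load src and store a copy in Word format."""
    # Imported here because uno only exists in the office's own Python.
    import uno
    from com.sun.star.beans import PropertyValue

    def prop(name, value):
        p = PropertyValue()
        p.Name = name
        p.Value = value
        return p

    # Connect to the running office through the UNO URL resolver.
    local_ctx = uno.getComponentContext()
    resolver = local_ctx.ServiceManager.createInstanceWithContext(
        "com.sun.star.bridge.UnoUrlResolver", local_ctx)
    ctx = resolver.resolve(uno_connection_url(host, port))
    desktop = ctx.ServiceManager.createInstanceWithContext(
        "com.sun.star.frame.Desktop", ctx)

    # Hidden=True loads the document without showing a window.
    doc = desktop.loadComponentFromURL(
        Path(src).resolve().as_uri(), "_blank", 0, (prop("Hidden", True),))
    try:
        # Assumed filter name; other export filters have other names.
        doc.storeToURL(Path(dst).resolve().as_uri(),
                       (prop("FilterName", "MS Word 97"),))
    finally:
        doc.dispose()
```

    Calling `convert_to_word("report.sxw", "report.doc")` against a listening office should leave a Word copy next to the original. The same calls are available from Java and OOo-Basic, since they are all the same UNO API.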

    --

    Those who would give up liberty in exchange for security and DRM should switch to Microsoft Palladium!
  9. Re:Word to HTML to XML to HTML by bWareiWare.co.uk · · Score: 3, Informative

    I find the easiest way of getting usable XML out of Word is to use Word's Save as HTML function and then run W3C TidyLib to get rid of all (most) of the M$ crap.

    This leaves you with an HTML-esque document that you can feed to an XSLT stylesheet and get whatever XML you need.

    I did consider using OO to open the Word documents and save them as XML; however, I had trouble with its API (I also had trouble with automating Word, but there I had plenty of bitter experience to draw on).
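
    A sketch of the last step, with a swap: Python's standard library has no XSLT processor, so this uses ElementTree to do the kind of restructuring a stylesheet would do. It assumes Tidy has already produced well-formed XHTML without a default namespace; the sample input and the `doc`/`section`/`para` output vocabulary are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Stand-in for Tidy output: well-formed, Word-specific markup already removed.
TIDIED = """<html>
  <body>
    <h1>Minutes</h1>
    <p>Meeting opened at <b>9am</b>.</p>
    <h1>Actions</h1>
    <p>Send the report.</p>
  </body>
</html>"""


def html_to_sections(xhtml):
    """Group each h1 and the paragraphs that follow it into a <section>."""
    body = ET.fromstring(xhtml).find("body")
    doc = ET.Element("doc")
    section = None
    for node in body:
        if node.tag == "h1":
            title = "".join(node.itertext()).strip()
            section = ET.SubElement(doc, "section", title=title)
        elif node.tag == "p":
            if section is None:
                # Paragraphs before the first heading get an untitled section.
                section = ET.SubElement(doc, "section", title="")
            para = ET.SubElement(section, "para")
            # Flatten inline markup such as <b> into plain text.
            para.text = "".join(node.itertext()).strip()
    return doc
```

    `ET.tostring(html_to_sections(TIDIED), encoding="unicode")` gives the resulting XML as a string. If Tidy emits the XHTML namespace, the tag comparisons need the `{http://www.w3.org/1999/xhtml}` prefix.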

  10. Actually, I have written a perl script by davidbailey · · Score: 2, Informative

    I recently wrote a Perl script to download membership directories for multiple congregations from our church's website and manipulate them into comma-delimited, tab-delimited, and nicely formatted OpenOffice Calc (spreadsheet) and Writer (word-processor) formats directly from the Perl script. Because the Microsoft formats are closed, I could not output into those formats directly from the script, nor do I feel like reverse engineering the formats to figure out how.

    I then used OpenOffice to save the files as Word and Excel formats for those who don't have access to OpenOffice, but I included a reminder that OpenOffice is free and included a link to the website.

    This would have been impossible without OpenOffice, and I thank them for their work. The final output has headers, footers, special formatting and prints out like a professional document, not roughly formatted text output in courier.
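
    For the curious, a minimal sketch of writing a Writer file straight from a script, in Python rather than Perl. This is not the poster's script. The namespace URIs, MIME type and manifest layout are written from memory of the OpenOffice.org 1.x format and should be checked against the format specification. Whether OpenOffice will open a package this bare (no styles.xml or meta.xml) is untested.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Assumed OpenOffice.org 1.x identifiers, written from memory.
OFFICE_NS = "http://openoffice.org/2000/office"
TEXT_NS = "http://openoffice.org/2000/text"
MANIFEST_NS = "http://openoffice.org/2001/manifest"
WRITER_MIME = "application/vnd.sun.xml.writer"


def build_content(paragraphs):
    """Return content.xml with one text:p element per paragraph."""
    body = "".join(f"<text:p>{escape(p)}</text:p>" for p in paragraphs)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        f'<office:document-content xmlns:office="{OFFICE_NS}" '
        f'xmlns:text="{TEXT_NS}" office:class="text" office:version="1.0">'
        f"<office:body>{body}</office:body></office:document-content>"
    )


def build_manifest():
    """Return a manifest listing the package and its single content file."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        f'<manifest:manifest xmlns:manifest="{MANIFEST_NS}">'
        f'<manifest:file-entry manifest:media-type="{WRITER_MIME}" '
        'manifest:full-path="/"/>'
        '<manifest:file-entry manifest:media-type="text/xml" '
        'manifest:full-path="content.xml"/>'
        "</manifest:manifest>"
    )


def build_sxw(paragraphs):
    """Return the bytes of a zip package holding the document."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as pkg:
        # The mimetype entry is stored uncompressed.
        pkg.writestr("mimetype", WRITER_MIME, compress_type=zipfile.ZIP_STORED)
        pkg.writestr("content.xml", build_content(paragraphs))
        pkg.writestr("META-INF/manifest.xml", build_manifest())
    return buf.getvalue()


def read_paragraphs(data):
    """Read the paragraph text back out of a package built above."""
    with zipfile.ZipFile(io.BytesIO(data)) as pkg:
        root = ET.fromstring(pkg.read("content.xml"))
    return [p.text or "" for p in root.iter(f"{{{TEXT_NS}}}p")]
```

    Writing `build_sxw([...])` to a file named with the `.sxw` extension gives something to try opening. The point is that the whole format is a zip file of XML, so any scripting language with zip and string support can produce it.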

  11. Here's why you're wrong by Overly Critical Guy · · Score: 2, Informative

    "MS won't stand for an XML file format -- it's human-readable. the last thing MS wants is for their file format to be easily convertible and transformable. it's a pity, because switching Office files to XML would quickly make them insanely useful."

    You people are so biased. Now Office has suddenly "dropped the ball." Of course, that meme will permeate through all Slashbots' thinking, whether or not they've even tried Office 2003.

    Here is a sample XML file. The original message said "This is a <b>test</b> of <b><i><font face="verdana" size="24">XML</font></i></b>."

    NOTE: Slashcode adds random semicolons and other garbage for some reason.

    <?mso-application progid="Word.Document"?>
    <w:wordDocument w:macrosPresent="no" w:embeddedObjPresent="no" w:ocxPresent="no" xml:space="preserve">
    <o:DocumentProperties>
    <o:Title>This is a test of XML</o:Title>
    <o:Author>Preston Sumner</o:Author>
    <o:LastAuthor>Preston Sumner</o:LastAuthor>
    <o:Revision>1</o:Revision>
    <o:TotalTime>1</o:TotalTime>
    <o:Created>2003-09-18T15:29:00Z</o:Created>
    <o:LastSaved>2003-09-18T15:30:00Z</o:LastSaved>
    <o:Pages>1</o:Pages>
    <o:Words>3</o:Words>
    <o:Characters>20</o:Characters>
    <o:Company>White Goat Studios</o:Company>
    <o:Lines>1</o:Lines>
    <o:Paragraphs>1</o:Paragraphs>
    <o:CharactersWithSpaces>22</o:CharactersWithSpaces>
    <o:Version>11.5604</o:Version>
    </o:DocumentProperties>
    <w:fonts>
    <w:defaultFonts w:ascii="Times New Roman" w:fareast="Times New Roman" w:h-ansi="Times New Roman" w:cs="Times New Roman"/>
    <w:font w:name="Verdana">
    <w:panose-1 w:val="020B0604030504040204"/>
    <w:charset w:val="00"/>
    <w:family w:val="Swiss"/>
    <w:pitch w:val="variable"/>
    <w:sig w:usb-0="20000287" w:usb-1="00000000" w:usb-2="00000000" w:usb-3="00000000" w:csb-0="0000019F" w:csb-1="00000000"/>
    </w:font>
    </w:fonts>
    <w:styles>
    <w:versionOfBuiltInStylenames w:val="4"/>
    <w:latentStyles w:defLockedState="off" w:latentStyleCount="156"/>
    <w:style w:type="paragraph" w:default="on" w:styleId="Normal">
    <w:name w:val="Normal"/>
    <w:rPr>
    <wx:font wx:val="Times New Roman"/>
    <w:sz w:val="24"/>
    <w:sz-cs w:val="24"/>
    <w:lang w:val="EN-US" w:fareast="EN-US" w:bidi="AR-SA"/>
    </w:rPr>
    </w:style>
    <w:style w:type="character" w:default="on" w:styleId="DefaultParagraphFont">
    <w:name w:val="Default Paragraph Font"/>
    <w:semiHidden/>
    </w:style>
    </w:styles>
    <w:docPr>
    <w:view w:val="normal"/>
    <w:zoom w:percent="100"/>
    <w:doNotEmbedSystemFonts/>
    <w:proofState w:spelling="clean" w:grammar="clean"/>
    <w:attachedTemplate w:val=""/>
    <w:defaultTabStop w:val="720"/>
    <w:characterSpacingControl w:val="DontCompress"/>
    <w:optimizeForBrowser/>
    <w:validateAgainstSchema/>
    <w:saveInvalidXML w:val="on"/>
    <w:ignoreMixedContent w:val="off"/>
    <w:alwaysShowPlaceholderText w:val="off"/>
    <w:compat>
    <w:breakWrappedTables/>
    <w:snapToGridInCell/>
    <w:wrapTextWithPunct/>
    <w:useAsianBreakRules/>
    <w:useWord2002TableStyleRules/>
    </w:compat>
    </w:docPr>
    <w:body>
    <wx:sect>
    <w:p>
    <w:r>
    <w:t>This is a </w:t>
    </w:r>
    <w:r>
    <w:rPr>
    <w:b/>
    </w:rPr>
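
    The listing above is cut off and has lost its namespace declarations, so it will not parse as posted. As a sketch of the "any programmer with a script" idea, here is how the text runs could be pulled out of a complete file of the same shape. The sample fragment is a hypothetical reconstruction modelled on the listing, and it assumes the `w` prefix is bound to the Word 2003 XML namespace shown in the code.

```python
import xml.etree.ElementTree as ET

# Namespace assumed for the w: prefix (Word 2003 XML).
W = "http://schemas.microsoft.com/office/word/2003/wordml"

# Hypothetical complete fragment modelled on the body of the listing.
SAMPLE = f"""<w:wordDocument xmlns:w="{W}">
  <w:body>
    <w:p>
      <w:r><w:t>This is a </w:t></w:r>
      <w:r><w:rPr><w:b/></w:rPr><w:t>test</w:t></w:r>
      <w:r><w:t> of XML.</w:t></w:r>
    </w:p>
  </w:body>
</w:wordDocument>"""


def runs(xml_text):
    """Yield (text, is_bold) for every run, in document order."""
    root = ET.fromstring(xml_text)
    for run in root.iter(f"{{{W}}}r"):
        text = "".join(t.text or "" for t in run.iter(f"{{{W}}}t"))
        # A run is bold when its run properties contain a w:b element.
        bold = run.find(f"{{{W}}}rPr/{{{W}}}b") is not None
        yield text, bold


def plain_text(xml_text):
    """Concatenate the text of all runs, dropping the formatting."""
    return "".join(text for text, _ in runs(xml_text))
```

    `plain_text(SAMPLE)` returns the sentence with the formatting dropped, and `runs(SAMPLE)` keeps the bold flag per run if you need it.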

    --
    "Sufferin' succotash."