Tim Bray on Microsoft Office
jgeelan writes "The co-inventor of XML, Tim Bray, has been talking about the newly XML-enabled version of Microsoft Office, code-named 'Office 11' and tells XML-Journal that 'when the huge universe of MS Office documents becomes available for processing by any programmer with a Perl script and a bit of intelligence, all sorts of wonderful new things can be invented that you and I can't imagine.'"
Wow, I was way off when I predicted that Microsoft would further obfuscate their Word format. This seems to be in all respects a Good Thing.
StarOffice has used XML for their native file formats for some time now; I wonder if this means we'll see an even better-quality translator between the two formats?
The opinions stated herein do not necessarily represent those of anybody at all. Deal with it.
.... I guess it's just MSXML rather than THE standard XML. But we can figure it out with some "intelligent guesswork" now because the file would be human-readable.
--
Error 500: Internal sig error
I've been waiting for this. It's gonna allow me to go to a full Linux system and not have to pay any money... I hope.
I'm wondering, can MS charge for licences to write tools that parse the XML documents?
internet like monkeys'
Why the fuck would MS give up their MS Office file format monopoly?
Yes Microsoft, Open Standards really are kind of cool, aren't they?
were you expecting to see a sig here? perhaps you'd rather see the inside of an ambulance!
The most important question, besides if the MS Word XML format will be well-documented enough, is if it will be the default saving format. Most MS Office users simply don't care enough to save MS Word documents in RTF, for example, even if it's more than good enough for the vast majority of the documents.
Not the main issue of the article, but it is unfair to single out someone as the inventor of XML, which is just a streamlined version of SGML, which is itself an evolution of IBM's GML.
Leandro Guimarães Faria Corcete DUTRA
DA, DBA, SysAdmin, Data Modeller
GNU Project, Debian GNU/Lin
I really have my doubts about whether Microsoft will allow "any programmer with a Perl script and a bit of intelligence" to muck around with Office documents.
I'm guessing their XML document format will be just as hard to decipher as the current Office formats.
Life is too short to proofread.
One small such point is when IBM gave out the specs to their hardware for PC allowing everyone to clone it, while Apple did not.
This could be such a point. Maybe in 10 years we'll look back at this and ask ourselves "Why the heck did MS XML-enable their Office app, releasing the hold that they had"
Only time will tell I guess.
I Play Hattrick
You are not entitled to your opinion. You are entitled to your informed opinion. -- Harlan Ellison
I beg your pardon? Smelly programmers can keep their hands off my documents. If I wanted you to have them, I'd have emailed them to you as plaintext. I wasn't aware that the Office license meant my documents were common property....
... and today's pet project has
MS is trying to time this right.
Right now they are seeing diminishing sales and possibly shrinking market share. Most of the Danish public sector is looking to save money using OpenOffice/StarOffice.
MS needs to increase their compatibility with other options, as they would otherwise force customers to convert every single user away from MS at once, instead of OpenOffice coming in slowly.
They can also hope, that their format is setting the standard, and the other companies will have to play catch-up rather than the other way around.
Why are all anonymous comments at +3?
...all sorts of wonderful new things can be invented that you and I can't imagine...
When will MS ever learn that we don't WANT to imagine how wonderful the MS Office Universe is ?
When will I end this grieving ? When will my future begin ?
Would it be feasible?
COULD it make it illegal to "reverse engineer" the document format?
I can very easily see that if it could, Microsoft could include a clause that explicitly prohibits GPL programs from interpreting the XML...
I wouldn't put anything past Microsoft when they're trying to keep their formats closed...
Hrmmmm...
WTF!? XML shouldn't need to be documented. The whole point is to create a human readable file that is parseable by computer. If MS Word delivers an XML file that I can't figure out, it's not XML.
"A language that doesn't affect the way you think about programming, is not worth knowing" - Alan Perlis
There's lots of speculation here about MS doing stuff to create lock-in with this new format, but I want to actually see the format. Is there any documentation anywhere about it? Or does someone out there have a document in the new format that we can take a look at? Of course, being XML we should be able to just open it and take a look. That would put an end to all this speculation.
That might be a big benefit for them. One of the main reasons why I have never considered using MS Office yet is their miserable support for Import filters. They cannot even handle Lotus' fileformats correctly..
perl -e 'printf("%x!\n",49153)'
As far as I can tell, one of the major reasons many businesses refuse to change over from Microsoft Office to cheaper options is file compatibility. As our company's IT admin put it recently on the suggestion of using OpenOffice, "I get sent hundreds of Microsoft Word, Excel and Access documents a week. I need to know that I can open and access every single one of those without problems". An example of proprietary file formats helping Microsoft keep the monopoly.
However, if Microsoft Office documents become "built around an open, internationalized standard", i.e. XML, would this not enable the people behind OpenOffice, StarOffice etc to achieve total 100% file compatibility and thus negate Microsoft's largest advantage with Office?
Of course, this could be yet another Microsoft "embrace and extend" tactic, a la Kerberos. Incorporate the standard in a bastardised form, claim standards compatibility, then pollute it so you must be using Microsoft technology to properly interact with it.
Janie took my gun...
Just look at an HTML file exported from Word2k. I would not call that compatible with any HTML I've ever learned. Most probably the XML file exported from Office 11 will be a Microsoft specific file, specifying lots of Office specific ActiveX (aka OLE) info that cannot be emulated. And, hey, they can probably store binary data in XML. The only change is that most competing products will emit files that Word can easily read, i.e. M$ will get the biggest benefits.
So MS Office will use XML in the next versions?
It might be XML yes, but...
I have seen what MS Office did to HTML.
And I'm scared.
http://www.inspirelight.net/
I guess they trust on Palladium to make sure that XML-files can only be read and written using MS software.
-- Cheers!
Just because the file format, instead of binary, is "human readable", does not make it more open.
For "any programmer with a Perl script and a bit of intelligence" it doesn't make a difference if you read bytes (binary) or XML structures.
As long as you don't get a DTD with extensive comments on how to interpret the elements, along with some promise/guarantee that the DTD won't change every minor release, there is no real improvement at all.
The fact that XML is human readable is irrelevant, since no human will read the files; programs such as Perl scripts will. For them it makes hardly any difference; it is only marginally easier since you can use an existing XML parser instead of rolling your own (which is no big deal using the right tools such as YACC).
This 'openness' comes at a good time for Microsoft. They suggest openness at a time when they are criticized and attacked because of file-format lock-in. Many 'advisors' will be misled, blinded as they are by buzzwords such as XML, and actually believe that this solves the issue.
"all sorts of wonderful new things can be invented that you and I can't imagine"
What?! I for one can imagine a Beowulf cluster.
Yeah, Imagine a Beowulf cluster of machines all running Windows XP and Office 11!
... the horror...
this sig has intentionally been left blank
... when all you need to convert Office files from one application to another is a simple XSL transformation.
I won't hold my breath.
It seems M$ has done their best over the years to protect their file formats... The implication now is Ballmer's enemy #1 (open office, ximian, koffice, star office, joe's office, etc) will be able to interchange documents seamlessly with M$ Office.
I don't know about anyone else, but the reason companies hold onto M$ (like grim death) is they receive documents via email in M$ format, the de facto proprietary format.
There has to be an angle here. This can't be construed as a tactic to hold market share.
Dogbert: Here's the new XML-enabled Micro$oft Office 11 our company should upgrade to. ...
PHB:
Dogbert: The suite is saving their documents in a format they have invented called M$ Xtremely Malformed Language (XML), and they are impossible to decipher and reverse-engineer for compatibility.
PHB: (looks closely at the box Dogbert handed to him)...
Dogbert: But that's okay, because you don't understand anyway.
Perhaps these announcements of XML compatible office file formats are just stalling tactics? MS has done it before.
MS now has a serious competitor in StarOffice/OpenOffice.org. And that competitor has two compelling advantages - it's cheaper/free, and open XML file formats. So when clued-up IT people say to their Pointy-Haired Bosses that they should use StarOffice/OpenOffice.org, PHBs can respond "but MS is doing that next year. We can avoid all the disruption of changing office suites just by waiting a bit and upgrading to the next version of MS Office. Besides, we're already paying for it." Then when MS actually releases Office 11, they will have used all sorts of devious and subtle devices to keep their lock-in of the file format, and MS and PHBs will be happy.
- Spreadsheet::WriteExcel
- Spreadsheet::ParseExcel
(There are also simpler interfaces if you want them too.) Or you could go the whole hog and use a SAX writer like XML::SAXDriver::Excel to create the documents from XML yourself.
(This is not to say I don't think XML native formats are cool and will have many uses, I'm just pointing out what you can do now.)
-- Sorry, I can't think of anything funny to say here.
With Open Office... one or two years ago...
---
Nothing here to see... move along... move along...
The article states that:
"The important thing," he explains, "is that Word and Excel (and of course the new XDocs thing) can export their data as XML without information loss..."
Does this mean that MSO will have the same support for XML as currently for RTF? In that case I'm not that excited. If the default will be to save as MS-word format, and not XML (or MS-XML as the case may be), then we are no better off. Only Microsoft is, as they are now able to import OpenOffice/StarOffice documents.
It's sort of like when Word could read WordPerfect documents in the old days.
There arises from a bad and unapt formation of words a wonderful obstruction to the mind. (Francis Bacon)
I think maybe it was the CEO of Microsoft Denmark. I'm NOT sure though
<uueWord2kDocument>
M"@D)("!'3E
M("`@(%9E7)I9VAT("A#*2`Q.3DQ
M($9R9
M92!V97)B87
</uueWord2kDocument>
Is this XML with buttloads of encryption?
I just cannot see their whole office suite being like StarOffice or Excel's export to XML... that's too good to be true.
members are seeing something, your seeing an ad
Yes, the point of XML files is that their _syntax_ is simple and easily parseable by computers. But that doesn't tell you anything about the _semantics_ of a document. And as long as there is no proper documentation on what the mess of tags in your XML file means, there's hardly any way for you to hack together a Perl script to, say, extract plain text, or convert the Word XML file to an OpenOffice.org XML file, or whatever else comes to mind.
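To make the syntax-versus-semantics point concrete, here is a minimal Python sketch. The tag and attribute names (`blk`, `r`, `k`, `f`) are invented for illustration and are not taken from any real Office 11 file. Any parser will happily hand you the tree and let you guess at the text, but nothing in the file says what `k="17"` or `f="9"` is supposed to mean.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: these tag and attribute names are made up for
# illustration and do NOT come from any real Office file format.
SAMPLE = """<doc>
  <blk k="17"><r f="3">Quarterly report</r></blk>
  <blk k="2"><r f="0">Sales rose </r><r f="9">sharply</r><r f="0">.</r></blk>
</doc>"""


def extract_text(xml_source):
    """Concatenate the text inside each <blk>, ignoring all attributes."""
    root = ET.fromstring(xml_source)
    return ["".join(blk.itertext()) for blk in root.iter("blk")]


def unknown_attributes(xml_source):
    """List every (tag, attribute) pair whose meaning we would need documented."""
    root = ET.fromstring(xml_source)
    return sorted({(el.tag, name) for el in root.iter() for name in el.attrib})


paragraphs = extract_text(SAMPLE)
undocumented = unknown_attributes(SAMPLE)
print(paragraphs)
print(undocumented)
```

Guessing that `blk` is a paragraph happens to work on this toy input. Converting the formatting carried by `k` and `f` into another suite's format is the part that needs documentation.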
there is no money in software products anymore. microsoft sees it so should you. that is why they are doing the .net services and palladium thing. who cares about file type monopolies when they can dictate how you use your computers and sell you your monthly subscriptions.
Other MS products that use XML (Visual Studio.net, for example) actually do it quite well. The VS.net generated XML, including project files, is clean and very readable.
Office's MS-XML will be even less compatible with the spec than MS-Kerberos or MS-Java/J++. Office is their cash cow. It brings in 30-40% of their revenues all by itself.
If you think there is even a remote chance in he-double L that MS will loosen their grip on this revenue stream, I have a bridge to sell you.
You can call this flamebait if you want, but what in MS's history would lead me to believe they are suddenly going to change their historic behavior pattern AND risk a huge amount of revenue at the same time?
python -c "x='python -c %sx=%s; print x%%(chr(34),repr(x),chr(34))%s'; print x%(chr(34),repr(x),chr(34))"
SQL Server has had an XML web gateway since version 2000. You can run any query and output it as xml or have an xml template pull the query and transform the results with XSL, all without one line of server side script.
ASP.net uses XML for all the human-readable files, and the IIS in windows.net server finally uses Apache-style configuration files which are also XML.
Pedro
----
The Insomniac Coder
XML is a format with nearly infinite possibilities for obfuscation, convolutedness and poorly defined standards. The most we can expect is the possibility to validate a file to absolutely certainly determine if it is compliant with the new Word format or not.
Contrary to the popular belief, there indeed is no God.
They bloated out all your HTML with meaningless XML and useless stylesheets.
I'm working with that weedy Word 2k at the office. And we use Outlook as a standard communication platform. Believe me, the fact that their software is so often such a pain isn't part of some greater plan to rule the world; it's the flat-out ineptitude of failing to deliver products with any conceptual consistency.
Looking at Frontpain and Word HTML and extrapolating XML from that tells me they're gonna do just as crappy a job as usual and really think they've done a great thing.
Just like the people sending me source code additions and DB content as Word files. Nothing but simple ineptitude, I say.
Not that my System of choice, Linux, is that much more consistent. Mind you. With a bazillion Font methods, every single one of them looking crappier than the next and QT, GTK+, Motif, Lesstif, Inbetweentif, Swing, TK and whatnot and none of them following the same Clipboard behaviour it's just as weedy. Only it is under *my* control to change it.
That way, the bottom line is: With OSS if it doesn't work, there's another way. With M$ it's 'Game Over' with the first "Error in module [fill in random hexcode here]".
That's the simple difference.
We suffer more in our imagination than in reality. - Seneca
code-named 'Office 11'
awesome. Apparently the next version of the linux kernel is code named 2.6! Wow!
I've recently been reviewing a dozen different programs for converting from Word to XML.
So far the best tool I found is upCast (free for personal use) from http://www.infinity-loop.de/
To convert a Word file:
* Use Word's AutoFormat feature to convert visual formatting to Word styles
* Redefine all the text as Word styles
* Run upCast to convert to XML using the "XML (content, no DTD)" filter
* Run HTML Tidy from http://tidy.sourceforge.net/ with the parameters -xml -utf8 -clean -bare .
Other tools that might be worth a second look:
* Majix (Open Source) - http://www.tetrasix.com/
* WorX SE - http://www.xyvision.com/
* XML MarkupKit (in German) - http://www.eds.schema.de/download/MarkupKit/
* DocSoft LLC Word-to-XML - http://www.docsoft.com/w2xml.htm
The thread a couple of weeks ago about the death of META headers will apply 1000 times worse for semantic tags-- if the semantic web is going to work at all it needs to start from headers describing the webpage as a whole.
(Also, what's with XML-Journal's claim the article has three pages when it only has two?)
Yes, so it's portable. Yes, so it's (mostly) human readable. So what? So is GWBASIC. XML is just a data description format (I won't grace it by calling it a language, it's not) and there have been plenty of portable DDFs in the past: PDF, PostScript (though the latter is actually a language). So why all the hoo-ha about XML? Seems to me that various marketing types have jumped on the bandwagon with this one and are going to ride it till the wheels fall off and take all the suckers along with them.
Look at the bigger picture of where Microsoft is heading. They're diversifying their line of business.
In the past, MS Office was the cash cow at Microsoft, but the market for office packages is rather
saturated... companies and governments are looking for cheaper alternatives etc. Not much room to
grow. Now they can afford playing the good guys by opening up their file formats, since they got
new markets to capture... mobile phones, handheld computers, home entertainment etc.
Don't be so fast...how does he know what goes on in someone else's mind. I've been imagining all sorts of good things in that regard, for quite some time.
And it's not the documents we're talking about...it's the content. The content that has been held captive by MS for so long.
That big sucking sound you hear it all those U-Hauls leaving Seattle.
The OpenOffice group should get together with the rest of the guys (AbiWord, KOffice and maybe WordPerfect) and work out a format that can be submitted to the ISO. Possibly based on the OpenOffice format.
Then governments and corporations will adopt it for official documents so they can read their own documents in ten years.
When his defense asked, "Which computer has Jon Johansen trespassed upon?" the answer was: "His own."
The article gives Tim Bray XML "co-inventor" status. Come on. Ever since HTML was around people have been extending it with fake tags like , , etc. Sure XML is useful but hardly an invention.
What's wrong with HTML and CSS2 for all your word processing? Then it's totally cross platform and web-ready...
just a thought.
RJ
Last.fm - join the social music revolution
any programmer with a Perl script and a bit of intelligence
and I thought intelligence was a prerequisite to be able to handle perl ? :)
It's very easy to make an XML document that can't be processed with any common parser library. It will make programmers work extremely hard if they have to write a different XML parser for M$-XML.
Now if the M$-XML isn't compatible with standard XML, what's the use? You still have to save it in M$-XML format to be able to use it with Word. If most coders want to use M$-XML it might even break the XML standard, since there are more Word documents in the world than XML documents put together!
Why has this post been moderated as a troll? There is nothing trollish about it at all.
when the huge universe of MS Office documents becomes available for processing by any programmer with a Perl script and a bit of intelligence, all sorts of wonderful new things can be invented that you and I can't imagine.'"
so a new software release will "magically" convert every document ever made to XML? I don't think so. The fact that they will finally have compatibility with the rest of the planet is nice, but I'll bet $100.00 that they will bastardize XML to their liking just like they did with IE and HTML.
Do not look at laser with remaining good eye.
Looks like M$ has found a way to placate those various governments that are beginning to insist on open file formats for data storage.
Obfuscation of the Word file format (and FUD that StarOffice/OpenOffice etc won't read their files correctly) is, for a lot of people, the only thing keeping them from switching from M$ to Linux. M$ just wouldn't throw away a central part of their business model.
I don't believe it, and I won't until I see an XML file of a complex word document that is actually completely understandable.
It is simply not what others are claiming: <?xml version="1.0"><data>blahblah</data>
¦ ©® ±
This will be so useful!
F (*SDF8234587*& 348734
<xml format="ms">
USODIFU(@#*&$*&@#*&@#($&*FHS*H(*SDY
</xml>
I cannot wait!
XSL - the new MS vulnerability?
Rather than assuming this relates to eXtensible Markup Language, consider the following insider information:
M$ have been basing their business model on XML for years.
It stands for Kiss My License!
X
it doesn't matter if everyone is able to read, modify and generate Office-compatible files.
For many businesses, the ONLY thing keeping them using MS is file compatibility. They can't change because it's industry standard, and they need to be able to share docs with their suppliers and customers.
I think this is a case of where they are abusing their monopoly situation.
I think they realize that companies are planning on moving to Linux from Windows and that if they can placate the masses with more accessible file formats for their Office suite with XML, then since it works only with their OS (especially with the SQL database that is going to be incorporated) then it makes it more difficult for users to move.
Don't know if this makes sense....
John Smith
WriteOnly
ReadWrite
None
"%$RGTKFDBGUIT&%TGBHG(%TGJ
ETGWESBJYSDGDRU%"$QGBTHJWE&QATGHAQT
Yes sure wonderful things could happen...
I used to love that quote - I would put it after every program I wrote. I used to get several patches a day on some of the more popular software to remove it. It drove the users nuts. :)
They'll do something to ensure only MS Office 11 can read these files. Strong encryption perhaps?
Microsoft is switching from a proprietary file format, to XML, and the first 100 comments are all flaming MS. WTF does it take to make you people happy?
They've already shown with .NET that they can make an entire programming framework (and at least 3 associated languages) into an open standard and even have them ratified by the ECMA and maybe even ISO. Because of this people have already managed to port Perl, Python and many other languages to this framework before it even came out of beta! The guys at Ximian have even managed to port quite a bit of the framework itself as part of the Mono Project.
So perhaps instead of perpetually slating Microsoft, you could get off your arse and do something useful instead.
Nick...
Why are all anonymous comments at +3?
Cowboy Neal finger trouble. This one's OK - isn't it?
I think you are right here. Actually I have a feeling that MS are aiming at a middle point between fully open and fully closed w.r.t. the exact format. Open enough that people can index, summarise and process Word docs for content and document management systems, while at the same time being closed enough that passing Word documents around is painful for those not using Office. (Think formatting and printing.)
1. Give the correct interpretation to the bytes representing the document content, in order to import the Office document in some other office suite using a different representation. This is mostly solved (thanks to years of trials and errors).
--
Simon
I presume we can expect "Extend" at Office 11's release, and then we can pencil in "Extinguish" sometime late next year?
Is that good for everybody?
Clearly, any adoption by ms of open standards is an attempt to co-opt the standard.
I've been wanting to process Word docs with my Perl scripts for years, and they fscking know it. They don't need some down-the-road conversion to XML to allow me to, either; all they have to do is open their fscking standards. What I wouldn't give for a Microsoft Word document API on Linux that was reliable instead of what we have: reverse-engineered pieces of cruft that never get things quite right.
Since they haven't opened up in the past, I don't expect them to now either. Either (1) the project will get buried, (2) Microsoft will use a subverted MSXML standard somehow to make sure it's not usable by us, or (3) the XML documents will be encrypted and protected by Palladium so that your only hope of realizing this Perl promise is to use a licensed copy of Microsoft Visual Perl#++.
11*43+456^2
As he is one of the people responsible for XML and Office 11 is going to be using XML as its native file format have you spotted the link (hint think of three letters...)
That aside, if MS do adopt XML as their file format AND they don't screw it up the way they did the HTML formatted output, then it is about time, and I would imagine that the people who came up with XML are going to be happy to see their work being put to good use.
--- Users are like bacteria -> Each one causing a thousand tiny crises until the host finally gives up and dies.
Since its HTML capable why don't they call it "Word X11"
...to see where they're going with this. Word has been exporting to HTML, which is really some funky XML/XHTML with stylesheets that IE can read and display, for a while.
Unfortunately, Microsoft won't let it happen. The data may be "in XML", but that doesn't mean you can read it or generate it well. Instead, Microsoft will give you just enough to serve their business interests and nobody else's.
How? Office will probably stick undocumented base64 encoded binary stuff into the output, containing formatting information. You can use the document content, for example, with a database, but you can't load it into another word processor and preserve all the formatting. And in the other direction, sure, you can generate simple documents that Office will import, but you can't generate arbitrary Word documents--they will, again, have weird, undocumented tags and binary stuff.
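A rough Python sketch of that scenario, for the sake of argument. The element names `content` and `fmt` are hypothetical, and the "formatting" bytes are arbitrary; nothing here is based on an actual Office file. The point is only that a file can be perfectly well-formed XML while the interesting part is an opaque blob.

```python
import base64
import xml.etree.ElementTree as ET

# Arbitrary bytes standing in for undocumented binary formatting data.
opaque_formatting = bytes([0x01, 0xFE, 0x10, 0x00, 0x7F, 0x80])

# Build a well-formed XML document; <content> and <fmt> are invented names.
root = ET.Element("document")
ET.SubElement(root, "content").text = "Hello World"
ET.SubElement(root, "fmt").text = base64.b64encode(opaque_formatting).decode("ascii")
xml_text = ET.tostring(root, encoding="unicode")

# Any standard parser reads it back without complaint...
parsed = ET.fromstring(xml_text)
content = parsed.findtext("content")
blob = base64.b64decode(parsed.findtext("fmt"))

# ...but all we recover from <fmt> is the same uninterpreted bytes.
print(xml_text)
print(content)
print(blob)
```

The text comes out fine, which is enough for indexing or loading into a database. What the bytes in `fmt` mean is exactly as documented as the vendor chooses to make it.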
In short: don't hold your breath. Microsoft isn't stupid.
Comment removed based on user account deletion
Sure, IBM lost control of the PC market...but is that better than what's happened to Apple?
Let's go back in time to 1985 and you can choose which company to invest in...IBM or Apple. Hmmm...tough choice isn't it? Their stocks have both appreciated almost the same amount since then! Shocking isn't it.
Exactly, you can embed platform specific code in the XML. In particular, you can embed VB.NET, .NET, and Win32 calls in XML tags. To interpret what the other tags actually refer to, you need to run on a platform that supports all these (i.e. MS Windows.NET).
Here's a valid XML document
[document]
[formatting]FASDFASDASDASDASDASDASDASDWQWE[/formatting]
[text1]Hello World[/text1]
[text1]Testing 123[/text1]
[text1]This is a test of the emergency broadcasting association[/text1]
[/document]
You'd be able to extract the text from the document, but you can't format it without knowing what the "formatting" tag means. It would be a huge step backwards for OpenOffice import of MSDocuments.
Let's say that they eventually manage to decrypt the formatting and interpret it, chances are that the formatting text would decipher into VB.NET calls into the MS Win32 operating system. You couldn't format the document unless you emulated .NET, VB.NET, and Win32. That basically means OpenOffice would have to find a way of including Mono, VB.Mono, and WINE and have to deal with all the compatibility issues that result from all the re-engineering.
what a shame that PDF will be killed considering what a bandwidth pig it is.
First of all XML requires a DOCTYPE, which I am pretty sure MSFT will closely guard through copyright, patents et al.
Second, you can't attempt to understand the XML intentions because that would be in violation of the DMCA. Knowledge is Death.
Third, XML for identification of tags does not in any way imply an open format for the documents themselves. I can create XML documents using trademarked DOCTYPEs and wrap them into a PGP encrypted file and still claim Open Standards use of XML.
Until I can open a .DOC file in notepad or less and be able to read it completely, I will never for an instant believe this kind of statement. It's either another FUD joke, or MSFT has truly repented and will forever more do everything in the open. Yeah right. And I have a 16" penis!
Hola :)
Playing a bit with Windows Messenger, I found an option under the "File" menu that lets you save your contact list.
It creates a .ctt file that looks like this:
XXXXXXX@hotmail.com
XXXXX@hotmail.com
XXXXXXXXXXX@hotmail.com
Looks pretty interesting :)
Best wishes from Valencia (by the Mediterranean Sea in Spain / España)...
It is unimportant that any average Word doc can be exported to xml because the average Word doc does not carry semantic meta-information. It carries stuff like "make this line bold and indent it 4 pixels" That kind of info is pretty much useless unless you're Google and you spend your days writing algorithms that parse semantics out of display information. The best case scenario for legacy Word documents would be the ability to save as FO.
The key feature is "It seems Word can also edit arbitrary XML languages under the control of an XML Schema". This, coupled with IE5+'s "Web Folders" (really WebDAV), means that I can point my users to a schema/stylesheet combination, let them create a compliant XML document in WYSIWYG mode in Word and then save it directly to my webserver over HTTP. On the server side, I do ACLs, versioning, etc.
XML content creation has long been the missing link in CMS software. XMLSpy has been doing this for a while now but they're f'ed now because they never quite got it right and now the 500 lb. gorilla is about to sit on them.
Why not a tag like
The document at MSDN doesn't seem to have anything to do with MS Office 11 or the new "built around XML" Office file formats. It simply explains how files can be imported to/exported from Access and Excel of MS Office XP.
"If I can't have a revolution, what is there to dance about?" - Albert Meltzer
So now more powerful viruses will entertain the masses!
They haven't opened the Office document standards, they might just make them more parsable. You would still be breaking the law if you built a product with the ability to parse an Office document without paying an MS royalty.
"God fights on the side with the best artillery." - Napoleon, Marshal of France - speaking truth to power
YAWC Pro (http://www.yawcpro.com/)
This can output XML according to any DTD (by default it uses the Simplified DocBook DTD).
bp
Microsoft never tells you guys that you can't breathe under water... "it was as if a million voices cried out at once, and then drowned" btw, using Yukon as the file system will only help accessibility because every file on the system will be as accessible as a SQL Server database is now.
The truth doesn't care what I think.
<xml>
<clippy autoinstance="true" kill="false">
Hi, I'm Clippy! I'm inserted into every
XML document to help you migrate to Microsoft
products...
</clippy>
<virus>
RunNimda()richedit.dll&
</virus>
<virus>
RunNimda()richedit.dll&
</virus>
<staroffice runat="false">
Shouldn't you use Microsoft products?
</staroffice>
<linux runat="false">
See above.
</linux>
<datacheck>
<DMCA>
www.fbi.gov/reportmusictheft
</DMCA>
<MS>
www.microsoft.com/reporteulaviolation
</MS>
</datacheck>
<clippy autoinstance="true" kill="false">
Hi, It's me again... Clippy! I'm inserted
into every XML document to help you migrate
to Microsoft products...
</clippy>
<doc>
Yep... there's a ton of possibilities for Joe
user with this one. Where can I buy me a copy
of Office 11?
</doc>
</xml>
This space for rent.
OpenOffice is XML-based, and extended into suite compatibility by StarOffice. Its format is, to the best of my knowledge, easily parseable and well documented.
That alone is a unique feature that adds a lot of value to OpenOffice in the medium to long perspective. Microsoft would certainly not risk one of their big cash cows by clinging too tightly to their paradigms. They are many things, but they are not complete idiots.
So, opening up the format would remove some of the reasons why customers might want to migrate to other systems.
It's a defensive move, really. A rather good one for all parties, too, especially if they refrain from their anti-open-source licensing. If they allow open source projects to process their documents, we will add value to their product. I certainly hope they will see it this way, though I'm not convinced.
Stop the brainwash
XML is a text format and therefore isn't suitable for encoding huge chunks of data. That's why JPEG and MPEG are binary. Users with 100-page documents are going to have to store them in the old Word format, or else contend with gigabyte documents. You could try compressing, but that would be a huge performance hit every time you save or open a document.
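The size and speed worry is easy to test rather than argue about. A minimal sketch, assuming zlib-style compression and a made-up, highly repetitive stand-in for a verbose XML document (the element names are hypothetical, and real documents will compress less well than this sample):

```python
import time
import zlib

# Hypothetical stand-in for a verbose XML document: one paragraph repeated many times.
paragraph = '<p style="body"><r><rp class="bold"/>Some example text for a paragraph.</r></p>\n'
document = ("<doc>\n" + paragraph * 20000 + "</doc>\n").encode("utf-8")

# Time a single compression pass at the default-ish level 6.
start = time.perf_counter()
compressed = zlib.compress(document, 6)
elapsed = time.perf_counter() - start

# Round-trip to confirm nothing is lost.
restored = zlib.decompress(compressed)
ratio = len(compressed) / len(document)

print(f"raw: {len(document)} bytes, compressed: {len(compressed)} bytes")
print(f"ratio: {ratio:.3f}, compress time: {elapsed:.4f}s")
```

Swapping in a real exported document for the synthetic one gives numbers that actually bear on the argument; the timing printed here depends entirely on the machine.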
Who needs XML?
use strict;
use warnings;
use Win32::OLE;

# Assumption: the workbook path is passed as the first command-line argument.
my $source_file = shift @ARGV or die "usage: $0 file.xls\n";
my @target_names;

# Version 9 is Office 2000, version 10 is Office XP.
my $handle = Win32::OLE->new('Excel.Application.9')
    or die "died: " . Win32::OLE->LastError() . "\n";
if ($source_file =~ /\.xls$/i)
{
    $handle->Workbooks->Open($source_file);
    my $worksheets_count = $handle->Sheets->Count;
    #print "Count: $worksheets_count\n";
    # Note that a) Excel sheet tabs are numbered from '1'
    # (YAR VBA should not be considered a real programming language)
    # and b) for my purposes the first 3 were garbage. Season to taste.
    # Assumption: the loop runs through the last sheet.
    for (my $i = 4; $i <= $worksheets_count; $i++)
    {
        my $sheets = $handle->ActiveWorkbook->Worksheets($i);
        $sheets->Activate;
        my $temp = $source_file;
        $temp =~ s/\.xls$//i;
        my $target_file = $temp . "S$i" . '.' . "txt";
        # -4158 is the MS magic number for tab delimited.
        $handle->ActiveWorkbook->SaveAs($target_file, -4158);
        # Not quite sure what the line below does any more.
        $handle->{XLSaveAction} = 2;
        push @target_names, $target_file;
    }
    $handle->ActiveWorkbook->Close(0);
}
This is one of the things I put under the rubric of 'Stupid Perl Tricks'. Once saved as text, data locked in a spreadsheet can be easily imported into any database. After assorted data munging to normalize it, of course...
putting the 'B' in LGBTQ+
Doing XML stuff with OpenOffice is supergreat. It took me half an hour to study the format enough to write an XSLT stylesheet that extracts all strings from an OO document.
Now I wrote, just for demonstration, the following XSLT example in just a few minutes, usable directly with xsltproc in Linux.
The example prints all the Heading paragraphs in an OO Writer document, indented according to the header level.
<?xml version='1.0'?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:office="http://openoffice.org/2000/office"
xmlns:style="http://openoffice.org/2000/style"
xmlns:text="http://openoffice.org/2000/text"
xmlns:table="http://openoffice.org/2000/table"
xmlns:draw="http://openoffice.org/2000/drawing"
xmlns:fo="http://www.w3.org/1999/XSL/Format"
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:number="http://openoffice.org/2000/datastyle"
xmlns:svg="http://www.w3.org/2000/svg"
xmlns:chart="http://openoffice.org/2000/chart"
xmlns:dr3d="http://openoffice.org/2000/dr3d"
xmlns:math="http://www.w3.org/1998/Math/MathML"
xmlns:form="http://openoffice.org/2000/form"
xmlns:script="http://openoffice.org/2000/script"
version='1.0'>
<xsl:output method="text" encoding="ISO-8859-1"/>
<!-- Print all headings, indented two spaces per level below the first. -->
<xsl:template match="text:h">
<xsl:value-of select="substring('                    ', 1, (@text:level - 1) * 2)"/>
<xsl:text>* </xsl:text>
<xsl:value-of select="text()"/>
<xsl:text>
</xsl:text>
</xsl:template>
<!-- Don't output any other text. -->
<xsl:template match="text()">
</xsl:template>
</xsl:stylesheet>
The result would be something like:
* Top-level heading such as a chapter
  * Second-level heading (section)
  * Another section
    * Subsection
      * Subsubsection
  * Yet another section
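For anyone without xsltproc handy, roughly the same outline extraction can be sketched in Python with only the standard library. This is an illustration, not a port of the stylesheet: the sample document below is a hypothetical miniature of a Writer content.xml, and only the `text` namespace URI and the `text:h` / `text:level` names are taken from the example above.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "http://openoffice.org/2000/text"

# Hypothetical miniature of a Writer content.xml, just enough to exercise the idea.
SAMPLE = f"""<office:document-content
    xmlns:office="http://openoffice.org/2000/office"
    xmlns:text="{TEXT_NS}">
  <office:body>
    <text:h text:level="1">Chapter</text:h>
    <text:p>Body text that should be skipped.</text:p>
    <text:h text:level="2">Section</text:h>
    <text:h text:level="3">Subsection</text:h>
  </office:body>
</office:document-content>"""


def outline(xml_text):
    """Return one '* title' line per heading, indented two spaces per level."""
    root = ET.fromstring(xml_text)
    lines = []
    # ElementTree spells namespaced names as {uri}localname.
    for heading in root.iter(f"{{{TEXT_NS}}}h"):
        level = int(heading.get(f"{{{TEXT_NS}}}level", "1"))
        title = "".join(heading.itertext()).strip()
        lines.append("  " * (level - 1) + "* " + title)
    return lines


print("\n".join(outline(SAMPLE)))
```

One difference worth noting: `itertext()` gathers text from nested child elements inside a heading as well, whereas `select="text()"` in the stylesheet only picks up the heading's own first text node.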
"So it seems to me," he concludes, in delightfully prophetic mode, "that when the huge universe of MS Office documents becomes available for processing by any programmer with a Perl script and a bit of intelligence, all sorts of wonderful new things can be invented that you and I can't imagine."
Actually I can imagine. I've been doing this with HTML files for years, thanks to the openness of the HTML format.
If you're in love with this new XML support from Microsoft, don't forget you have to purchase an upgrade or buy the new version to get that XML support! Don't send your money to M$! OpenOffice and other such are a wiser choice. Come on - let's just forget about M$ and do without them.
Look up at this. Putting information in XML makes the first baby step of reverse engineering easier, nothing else.
XML helps only if the creator of the document wants the information to be easily accessible by programs other than their own.
To a Lisp hacker, XML is S-expressions in drag.
It seems that this should be regarded as a good thing, but a lot of opinions here seem to regard the whole thing as an evil scheme. I don't think openness is their whole motive for moving to XML, but that doesn't make it a bad move. It may be easier for them to create and maintain Office's code if the format is XML rather than a binary format. Since storage space isn't at such a premium these days, programs can afford the luxury of a file format that trades efficiency for ease of development.
"No one likes working in a hamster wheel, and your shop smells of cedar shavings from here." - TaleSpinner
Anyone looked at the HTML output from an Office program? It's terrible. Do you think their XML will look any better?
love is just extroverted narcissism
I have a feeling that Microsoft "XML" will use Microsoft "Unicode." That is, any character in the range of 0x82 to 0x95, which Unicode reserves for extra control characters, will be littered with "smart" quotes, emdashes, and other proprietary extensions to Unicode that ensure that nothing works with it. I ran into this problem when I tried converting FrontPage generated HTML into XHTML so I could do conversions with XSLT. Needless to say, it took a lot of effort, even with HTML Tidy, to get Microsoft's generated HTML to get converted into XHTML! HTML Tidy constantly complained about the HTML, and looking at what FrontPage generates, it's not hard to see why it complained.
I ran across the demoroniser, which fixes Microsoft Unicode problems, but it still doesn't fix the invalid HTML that FrontPage generates.
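The core of the quote problem can be shown in a few lines. A minimal sketch, assuming that what is being called Microsoft "Unicode" here is really the Windows-1252 code page (that identification is an assumption, not something stated above): the same bytes decode to invisible control characters under ISO-8859-1 but to curly quotes and an ellipsis under Windows-1252.

```python
# Bytes as a Windows tool might emit them: 0x93/0x94 are curly quotes and
# 0x85 is an ellipsis in Windows-1252 (assumed encoding).
raw = b"\x93smart quotes\x94 and an ellipsis\x85"

# Misread as ISO-8859-1, those bytes become C1 control characters.
as_latin1 = raw.decode("latin-1")
# Read as Windows-1252, they become real punctuation.
as_cp1252 = raw.decode("cp1252")

print(repr(as_latin1))
print(as_cp1252)

# Once decoded correctly, the text can be emitted as plain ASCII with numeric
# character references, which any XML or XHTML parser will accept.
as_ncr = as_cp1252.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(as_ncr)
```

This only addresses the character-encoding half of the complaint; it does nothing for invalid markup.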
Microsoft XML? Hah! I'll believe it when I see it.
Well, one consequence is that many people will be forced to upgrade to the new Office, since all the Word attachments will require the new Word to be readable (and editable)... Now, this is a good motivation for M$.
I've seen the native Word XML format (alpha mind you, so it might get changed). It isn't exactly pretty, and if I had to write code to extract all the paragraphs that contained the word "foo" in bold it would give me a bit of a headache, but I could do it.
The word "foo" in bold single-underline looks something like
<r>
<rf>
<rp class="bold"/>
<rp class="underline" lines="1"/>
</rf>
foo</r>
Yeah, it's pretty verbose.
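To give a feel for the "paragraphs containing foo in bold" exercise, here is a rough Python sketch. The `r`, `rf` and `rp` element names come from the fragment above; the `doc` and `p` wrappers and the whole sample document are hypothetical, and the alpha format may well look different by release.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: <doc> and <p> are invented wrappers around the
# <r>/<rf>/<rp> run structure shown above.
SAMPLE = """<doc>
  <p><r>plain foo here</r></p>
  <p><r><rf><rp class="bold"/><rp class="underline" lines="1"/></rf>foo</r> and more</p>
  <p><r><rf><rp class="bold"/></rf>bar</r></p>
</doc>"""


def is_bold(run):
    """A run counts as bold if its formatting block has an rp with class 'bold'."""
    return any(rp.get("class") == "bold" for rp in run.findall("rf/rp"))


def paragraphs_with_bold_word(xml_text, word):
    """Return the indexes of paragraphs that contain `word` inside a bold run."""
    root = ET.fromstring(xml_text)
    hits = []
    for index, para in enumerate(root.iter("p")):
        for run in para.iter("r"):
            if is_bold(run) and word in "".join(run.itertext()):
                hits.append(index)
                break
    return hits


print(paragraphs_with_bold_word(SAMPLE, "foo"))
```

The first paragraph is skipped because its "foo" is not in a bold run, which is exactly the distinction a plain text search cannot make.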
Near as I can tell, it is 100% round-trip-able, i.e. you save as that file format, you read it in again, you hit Ctrl-S and it saves again; about as good as a native format. Now someone needs to write some script-ware to run Word in batch mode to XML-ify server directories with zillions of Office docs.
I think the reason MS is doing this is obvious. Look at their financials - they *really* need people to upgrade to the new version of Office. End-users don't buy Office any more, CIOs and the like do. These people are just not gonna be impressed by another new word-processing feature, but they might be motivated to upgrade if they thought that they were opening up all their data to re-use by other programs.
I expect that with any luck we'll get a secondary industry built around doing cool unexpected stuff to Office docs. Don't want to sound over-excited here, but a huge amount of all the intellectual capital in the world is sitting around in Office docs, and this makes it noticeably more re-usable. Has to be a good thing.
Cheers, Tim
So what happens when MS starts changing XML?
Anonymous Cowards suck.
Microsoft Corp. today announced revenue of $7.75 billion for the quarter ended Sept. 30, 2002, a 26 percent increase over revenue of $6.13 billion for the same quarter last year. Operating income for the first quarter was $4.05 billion, compared to $2.90 billion in the same period last year. Net income and diluted earnings per share for the first quarter of fiscal year 2003 were $2.73 billion and $0.50, which included an after-tax charge for investment impairments of $291 million or $0.05.
Random is the New Order.
Whew, the boys in marketing must have had a hell of an all-nighter coming up with that one!
MS Office saving its data in XML format is a great start.
But will this really be enough?
Previous complaints about how versions of Office didn't disclose the format were often referred to a specification that Microsoft made available to describe what was in a Word document.
The key problem, IIRC, was that the description was not sufficient for one to predict how the Word document was actually formatted and rendered on the page.
Because XML is very much like SGML or TeX, it has the potential for much more exhaustively describing document structure. But whether the new Word XML format (or OpenOffice format, for that matter) contains sufficient information for developers to reproduce the "right" format is a different issue.
I hope I'm wrong and that the format is specified comparably to the level you'd find in say PostScript or PDF.
Maybe MS is willing to let rendered Office documents change, just as HTML rendered documents change whenever one resizes the browser window.
But I doubt it.
"Provided by the management for your protection."
Micro$oft already does some things with XML. They (sort of) EXTENDED the XML spec (I'm not sure here) to make sure they could embed binary data in it.
This way they can put a M$ Word file inside an XML body, but still be a binary file.
This is what I think is likely to happen.
My half asleep brain managed to come up with what sounded quite logical...
"Tiny Brain in Microsoft Office"
A slip of the foot you may soon recover, but a slip of the tongue you may never get over. -Benjamin Franklin
all sorts of wonderful new things can be invented that you and I can't imagine.
I'm guessing that the anti-virus groups have finally been able to catch most Word viruses, so Microsoft now needs something new to be able to generate the quantity and quality of this type of software that they demand.
Who would win this election: Andrew Weiner vs Andrew Weiner's weiner.
You can always serialize a set of objects into XML. Now, how to use that XML without the original code is left as an exercise for the reader. In that case, most likely you need a bug-for-bug compatible MS Office clone.
You haven't got a clue about this have you?
Your post is just a bunch of paranoid, slashbot FUD. No wonder you got modded up!
Let's wait until we SEE the finalized XML document format before we declare this a good thing. For all we know, it's going to be WhK39AHE@KEH+=J9017ELDHJH+! -- totally unintelligible yet 100% XML!
What do you think Palladium and .NET are all about?
It's a stupid nested name-value pair text databasing system.
That'd be like inventing the sentence.
It must have taken Microsoft months to come up with that ultra-secret code name.
Trolling is a art,
My guess is, the XML format will make it much easier to manipulate Office documents from scripts, but it will still be very difficult to construct an actual WYSIWYG editor for them.
E.g., say that there is a <table> tag with extremely complex, undocumented formatting and display rules. It might be easy to add or remove things from tags, but only Office would actually know how to *display* a table correctly.
This would allow MS to say "we have an open file format" without really endangering their core business, GUI document editing tools.
"Software As A Service"
Anyone remember hearing this term from M$ before? That's where they're going with this. They want to be able to offer the Word application as little more than a front end to a series of web services that they'll be offering for a fee. This makes an XML-based file format much more attractive to MS because it's more efficient to send data that is already in an XML-based format to a WS than it is to take a binary format, serialize it to SOAP, and then send it to the WS and have to deserialize said object.
Do I believe that MS will actually use a real XML format? Sure. Do I believe for one second that this is to be more open? Hell no.
Karma: Non-existant. Due mostly to the fact that you smell funny and nobody likes you.
Oooooh, yay, Microsoft added another export filter to Word and Excel. The world is a better place.
The reality is that unless the XML format is the default format, this change is useless to most users. The cry against new word processors is always, "If it doesn't import every single Word file ever created with every single feature supported, it's worthless." Unfortunately the insane complexity of the Word file format, the lack of documentation, and the constant churn as new versions of Word come out mean that you'll never see perfect conversions, yet too many people whine that it has to be perfect. (Completely ignoring the fact that most users would never notice the (in most cases) ever so slightly inaccurate translation, the minority that push the documents to their limits refuse to admit any value in an imperfect translation.) An XML format would make the translation easier, but it's useless if it's not the default. Microsoft's monopoly on office productivity software is based on the massive numbers of existing Word and Excel files.
Search 2010 Gen Con events
If they continue to allow trade secrets like this to leak out, who knows what could happen. I mean, if the world knows that MS Office uses XML-based file formats, that could be a huge disaster! If MS doesn't act quickly to stifle this leak, cross-platform software developers might copy this innovation and take away their competitive advantage!
Cut that out, or I will ship you to Norilsk in a box.
I think the reason they are switching over is probably the trend in emerging foreign markets, Peru being a prime example. Countries are starting to enact legislation that requires government software procurement to be limited to software that uses an open file format, because of long-term storage problems.
This is tied to the fact that US sales are going to slow down, or already are, due to the complete inundation of PCs; they need new markets, and unless they use an open format they won't be able to get them. I'd be panicked: Linux and Java are eroding their server market, and governments are eroding their Office market. The only way they can grow is to add value.
Is everyone missing that MS will certainly have XML, and that they will have tons of proprietary data in between those XML tags that they are under no obligation to document for anyone?
You will be able to see the structure of the file, but not make sense of it.
That's what I'd do, if I was ms.
Yesterday, when I attempted to moderate something as "Interesting", the confirmation page showed a moderation of "Overrated" instead. I'm pretty sure I selected the right value from the pulldown list, and suspect there may be a bug in the moderation system.
Nope, no sig
Funny, but that url is invalid because of the second "?" (it should be a "&").
I assume that the space in "Ge tDoc.aspx" is a result of the previously mentioned long-word separator script. I've read a lot of complaints about that "bug"; when are the slashcode maintainers going to disable it for URLs?
16,777,216 comments ought to be enough for any forum!
Perhaps, when confronted with carrying a snake across the river on our backs, we are properly wary?
What we call folk wisdom is often no more than a kind of expedient stupidity.-Edward Abbey
A non-XML grammar/syntax, if accompanied by a decent and documented EBNF description of its grammar, is much better to base your program on than an undocumented XML.
Except that an undocumented XML file is in an exhaustively EBNF-documented syntax already. Not to mention that the constraints upon QNames mean that the semantics of the schema will be available for disclosure via existing tools even if obfuscated. The same cannot be said for an arbitrary syntax, ANTLR notwithstanding.
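As a small illustration of what "available for disclosure via existing tools" can mean in practice, here is a sketch that surveys an XML document it knows nothing about. The sample and its deliberately meaningless names are invented for the example; the point is only that a generic parser recovers element names, attributes and nesting without any documentation. Working out what the names mean is a separate problem.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Invented, deliberately opaque sample document.
SAMPLE = """<a7 xmlns:q="urn:example:opaque">
  <q:b1 k="1"><q:c9>x</q:c9><q:c9>y</q:c9></q:b1>
  <q:b1 k="2" z="t"><q:c9>w</q:c9></q:b1>
</a7>"""


def survey(xml_text):
    """Count elements and collect, per element name, its attributes and child names."""
    root = ET.fromstring(xml_text)
    elements = Counter()
    attributes = {}
    children = {}
    for node in root.iter():
        elements[node.tag] += 1
        attributes.setdefault(node.tag, set()).update(node.attrib)
        children.setdefault(node.tag, set()).update(child.tag for child in node)
    return elements, attributes, children


elements, attributes, children = survey(SAMPLE)
for tag, count in elements.items():
    print(tag, count, sorted(attributes[tag]), sorted(children[tag]))
```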
illegitimii non ingravare
'all sorts of wonderful new things can be invented that you and I can't imagine.'
Too bad most of the things will come from lawyers.
Does it somehow become encrypted on its way out of the database, remain scrambled on its way over the internet, and reassemble itself into nice XML once it arrives on the recipient's computer?....
I think you just described Palladium.
This
That's the easiest way, really. And the benefit of having nicely documented DTDs. OO is the true compatibility XML file format for office files.
That's why MS needs to have their own. Because if they don't do it, many companies will use OO as a gateway (maybe not just yet, but soon).
So they have to do XML, plus MS wants to integrate Office + Windows Programming + Web Frontends + EVERYTHING in an interoperable way. They can dictate the standard of what the WORLD will have to use in the future.
They will always be in the middle, and their revenue models will adapt to it just fine. The MS layer if you want to call it.
unfinished: (adj.)
This HTML filter is so useful! Now I can actually make web pages out of Word docs! Thanks again! Someone mod this guy up informative!
This is a huge stretch.
XML derived from SGML.
Tim is a disgrace to the community - with all this marketing spin.
Do you navigate the page with the arrow keys? If so it is very easy to choose a moderation from the pulldown box, and then forget to click on the page before hitting down a couple of times and changing your moderation. I've almost done that a few times myself.
I read the internet for the articles.
But, as the saying goes, it's only paranoia if they're not out to get you. ;-)
The business world is a harsh and, ultimately, fickle one. Microsoft got to the top by doing good things, but you can't abuse your position for long or people will start to notice. As the world comes to depend ever more (rightly or wrongly) on IT to get its business done, following standards and maintaining open sources of information will become ever more important. Even a company as big as Microsoft won't survive by locking people in forever. They rose to the top in a remarkable period of time, and they now have near 100% market share in certain fields, but convincing companies to continue upgrading is becoming harder and harder.
One of the major drives to get new versions of many products today is the promise of greater power to get data from A, where it is, to B, where you want it. If everyone else is playing ball (because, being minor players, they have to just to stay on the scale) and Microsoft doesn't, then sooner or later, Microsoft will lose market share to everyone else. No company survives by not giving its customers what they want, and right about now, Microsoft's customers most want the two things they can't, or won't, give: security and interoperability. All the UI reworks in the world aren't going to change that, and they know it.
If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
That describes the use of XML in Office XP and 2k. And while the ability to export to XML is nice, it is not what we are talking about. In Word you can export to plain text or HTML. This does not stop the default and "recommended" format being the beast known as ".doc". My dad uses Office all the time but does not want to "mess up his files with weird formats". This is a common mentality about Office formats. Microsoft would not allow their ".doc" format to be read by other programs after they have worked so hard to stop that.
SQL Server has had an XML web gateway since version 2000. You can run any query and output it as XML.
I hadn't even thought about that. If the XML format is anything reasonable, you could get query results as XML, toss them through a little XSLT, and have your report as a Word or (probably) Excel file. That would be pretty slick.
I send this to you to have your advice.
(1) The strategic value of the proprietary Word format is decreasing. Most texts written today are e-mails, not Word documents. Word is becoming more and more an editing format. Documents are published as ASCII text, HTML and PDF. Word documents can't be combined with Web services; I've never seen a Web application creating Word documents. (2) Microsoft can't create a new proprietary format that can't be read by Word 97. Everybody will accept that Word 97 doesn't read XML. If you want XML, you have to buy the new Office. (3) Outlook and Internet Explorer are examples of how Microsoft can dominate a market starting with standard formats and protocols.
Does this mean we'll see the very first XSLT Virus soon? I mean, VirusBasic scripts are getting so tiresome...
If you follow the XML Journal link and look at the "feedback" at the bottom it appears to be the comments that are appearing here on slashdot. Is there some sort of reciprocal exchange of comments going on between the two sites? Is this kosher?
The logic is: Everybody goes to XML, and Office becomes the universal front-end for everything XML.
If on the other hand they screw it up, then that leaves a potential "underserved market" for somebody to step in and get some leverage in the newly created "xml frontend" segment of the business.
Which then by virtue of market share becomes standard. It is actually in their best interest to publish it clearly. Then the other potential competitors will feel strong pressure to fit their software to match MS and have no real excuse why they can't. If MS waited there would be some other standard emerging and MS would be pressured by customers to adopt it. Then it would be MS having to shoehorn its document logic into some other form and not the other way around.
While other potential competitors are playing catch-up with making their documents fit into the MS schema MS can be busy thinking about the next thing to do.
So frankly I expect the word document xml (and excel and the rest) to actually be quite clear and documented but very aligned to how MS Word sees a document, which will likely impress others as obtuse.
Yeah. And when Microsoft embraces and extends XML so it only works with Windows by obfuscating the format to the extent that nobody wants to parse it except the 20,000 monkeys beating away at Microsoft's very own 20,000 keyboards, nothing good will come of it. Oh well.
'when the huge universe of MS Office documents becomes available for processing by any programmer with a Perl script and a bit of intelligence, all sorts of wonderful new things can be invented that you and I can't imagine.'
He also promised me the tooth fairy would pay me $2 for my tooth!
HAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHA!!! OMG, I almost fell outa my chair this is so funny. This isn't April fools day.
Seriously though, with the change in file formats could come some decent structure and less loosy-goosy WYSIWYG! (Yay structure!)
Unfortunately, office will essentially be a version 1.0 product, buggy as all hell. Even worse, now that the Microsoft documents will be saved in an open, self-describing, and flexible data format, I'm sure we can look forward to a new level of sophistication in the macro viruses that will attack this new platform. Life will get very interesting after this new Office comes out. (Boo viruses!)
Hola!
:)
Weather is quite hot during these days. In fact I'm going to the University with t-shirts and short trousers. The temperature is about 25 Celsius degrees and there are only some white clouds in the sky.
Nights are a bit colder (around 10-15 Celsius degrees).
Hope this helps!
Best wishes from Valencia (by the Mediterranean Sea in Spain / España)...
You were right, weather was great. ;) The Dali museum in Figueres is magnificent, visit it if you haven't yet.