Office 2003 and XML
zachlipton writes "Internet World is reporting that initial reports from Office 2003 beta testers don't look good for those hoping to share documents with non-MS systems using the XML file format. Gary Edwards, the OpenOffice.org representative for the OASIS XML file-format group, is quoted as saying "although it's still early in the review process, it does look as though XP XML has been so seriously crippled as to be useless to anyone but the big content management and collaboration system providers." Apparently, all formatting and presentation information is removed from the XML. Furthermore, Office's new collaboration features will only work with users who are also running Office 2003 (requiring Windows 2000 or 2003) that are connecting over XP servers." So Microsoft will continue its efforts to lock-in users with proprietary formats, and hopefully the rest of the world will produce an XML standard document format without them.
Well, it friggin' figgers, doesn't it? Anyone who didn't see this coming must have been living on another planet.
With the US antitrust suits off now, the EU is our only hope to curb their anticompetitive practices.
Microsoft will have to learn IBM's lesson about transforming from a company that makes standards, to one that contributes to them.
They still don't get that their attempts to "embrace and extend" the whole damn internet aren't going to work.
The rest of the world WILL produce an XML standard document format without them, thank heavens.
--
CPAN rules. - Guido van Rossum
March 13, 2003
Will Office 2003 Lead to Lock-in?
By Thor Olavsrud
With the recent beta release of Microsoft Office 2003 out the door, many customers got their first look at what Microsoft hopes will re-write the office productivity landscape with a new ecosystem of collaborative functionality based on XML (define). But will organizations have to scrap their piecemeal systems and buy into an entirely Microsoft architecture to tap it?
That's the contention of Gary Edwards, a Web application design consultant and OpenOffice.org's representative on the OASIS Open Office XML Format Technical Committee.
Edwards said that Office 2003 beta's handling of the XML file format means that firms will not be able to tap the rich collaborative features of Office 2003 without resorting to proprietary Microsoft file formats. And to truly unlock its collaborative potential, firms will have to standardize on the Windows XP operating system (Office 2003 won't run on Windows 9x), as well as Windows 2003 Server, SharePoint Server, Exchange Server, etc. As for the file formats, he called Office 2003's XML "crippled," because it strips XML files of all presentation and formatting information when saving them in the XML file format. It does not do this when saving files in Microsoft's proprietary file formats.
"Although it's still early in the review process, it does look as though XP XML has been so seriously crippled as to be useless to anyone but the big content management and collaboration system providers," Edwards said. "Reports are that when saving to XML, [Office 2003] strips out the presentation and formatting information, leaving near raw content. It appears, at least from the non-enterprise systems user's perspective, that all the really cool collaborative advantages are based on saving files in the XP proprietary format. Which means that "all" the users in the collaborative effort must be on the XP platform, using XP Office, connecting through XP servers. What kind of universal connectivity and exchange is that? XP users won't even be able to collaborate equally with the 200 million Win9x users. Not unless they upgrade."
However, Ronald Schmelzer, founder and senior analyst of XML research firm ZapThink, noted that Microsoft's approach aligns more closely with a core tenet of XML theory: the separation of process and data.
"The idea is for XML not to specify how the information should be processed, but rather leave that task to XSL (define) templates and other post-XML processing steps," he said. "XML is supposed to be a presentation-neutral format."
Still, Schmelzer said that becomes more tricky when integration goes beyond the enterprise itself.
"I think when it goes beyond intra-business integration to cross-industry and inter-organization integration, the question will be how much of the data they exchange do they want loaded with presentational and operational functionality and how much do they want to leave to the individual implementation of the company?" he said. "This is really not an answerable question -- because it depends on the scenario. The problem with standards is that there are so many of them. The resolution here is to look at how companies and industries will adopt XML in their verticals and then determine which aspects of that should be embodied in standards and which should be embodied in products. Experience shows that companies and industries can hardly agree on the data, let alone the representation, so erring on the side of "less" in the XML body makes more sense."
Microsoft chose not to respond to questions about presentation and formatting in their XML vs. their proprietary file format, simply noting that the native file format for InfoPath (the application for creating XML forms in Office 2003) is .xml.
"The native file format for InfoPath forms is .xml, which makes it easy for companies to integrate InfoPath forms into their existing business processes -- one of the
Somebody honestly thought that Microsoft would suddenly give out their most valuable asset, the proprietary office file format, and people would be free to use whatever they want..
Pigs will fly if that day ever comes!
Prevent email address forgery. Publish SPF records for y
Windows 2003 server is coming out on April 24
Windows 2003
I am shocked. Shocked! I'm shocked that Microsoft would do something like this that wasn't in the best interest of their customers.
This is hardly surprising news.
My question, though, is whether it is possible for other vendors and OpenOffice to create a better, more pleasing formatting and presentation of the content in the XML than Office 2003 does?
"Provided by the management for your protection."
So Microsoft will continue its efforts to lock-in users with proprietary formats, and hopefully the rest of the world will produce an XML standard document format without them.
I'm not trying to start a flame war here, but it seems that they're missing the point! We don't want it to be MS with one format and the rest of the world with another. That really wouldn't make it much different from how it is now. At least the way it is now, non-MS office software can read the MS formats. If it comes down to the choice between using the MS format or the "rest of the world" format, MS is going to win every time..
I took this to mean that their XML doesn't store things like "bold this", "underline that", etc. Missing that information means you might as well save it as text to import it into another document.
Or isn't this good enough?
Personally, I only use a word processor to re-markup things I've written in HTML. That includes my dissertation. HTML isn't super printer friendly, but come on, we're all trying to go paperless anyway, right?
Hey freaks: now you're ju
I think the point is that if you save to their XML specification, you will lose all your document formatting. So yeah, the data is there, but it can't be reopened in Office or any other word processor in a structured way. Essentially, it is the same as just saving as plain text, which has already been available since Office 95.
Good people do not need laws to tell them to act responsibly, while bad people will find a way around the laws-Plato
The XML features they are putting into Office XP look to me as if they will only be of use in very large companies. I don't see much benefit for small or medium-sized companies. And the expense of upgrading is such that, in the current climate, I doubt many will make the move to office XP.
Microsoft used to be able to force everyone to upgrade because if you didn't, you wouldn't be able to read documents sent to you by others. I don't think that is going to be so successful now, there's too much resistance and the price is now too high.
Does anyone know of a company that is planning to move to Office XP once it's out of beta? I don't.
I have to agree. The basic concept behind SGML and its diminutive offspring, XML, was to separate content, structure and presentation. This just means that you have to share a style sheet, FOSI, or whatever when you share a document if you expect the person you share it with to be able to view it.
There may be other *valid* criticisms of what Microsoft is doing but this isn't one of them.
They that can give up essential liberty to obtain a little temporary safety deserve neither safety nor liberty.
Ben
How will it read the XML if it only contains text?
Will there be a binary portion which contains the formatting??
How will this work for spreadsheets?
How will images be embedded?
So many questions!
And this is bad how? Isn't this the dream that XML document proponents have aspired to for years? You just can't please some people...
Unfortunately, Manny Manager and Sarah Secretary are now very used to depending on the formatting and presentation information. To be honest, not too many people these days subscribe to the whole minimalist document theory (unless your idea of starting your editor is typing 'vi').
The main point here is to encourage the .XML format for interoperability. If the XML format can't figure out the fonts, colors, and various drawing elements in your document, then people will abandon it for something that does - at the expense of the rest of us.
Do you have Linux and a DotPal? Click here now!
Instead, create an XML format that is specific to your needs and write a DTD or XML-Schema that describes it. If you need to translate it to someone else's XML document format, a quick XSLT stylesheet will transform the document with a minimum of effort.
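A minimal sketch of that kind of format-to-format translation. Python's standard library has no XSLT processor, so ElementTree stands in for the stylesheet here, and the `memo` and `letter` vocabularies are invented purely for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-house format: <memo><subject>..</subject><para>..</para></memo>
MEMO = "<memo><subject>Budget</subject><para>Numbers are due Friday.</para></memo>"


def memo_to_letter(memo_xml: str) -> str:
    """Map the hypothetical <memo> format onto a hypothetical <letter> format."""
    memo = ET.fromstring(memo_xml)
    letter = ET.Element("letter")
    # <subject> becomes <title>; every <para> becomes a <p> inside <body>.
    ET.SubElement(letter, "title").text = memo.findtext("subject", default="")
    body = ET.SubElement(letter, "body")
    for para in memo.findall("para"):
        ET.SubElement(body, "p").text = para.text
    return ET.tostring(letter, encoding="unicode")


print(memo_to_letter(MEMO))
```

A real XSLT stylesheet would express the same mapping declaratively; the point is only that once both formats are documented, the translation is a few lines.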
Just my 2 cents.
std::disclaimer<std::legalese> sig=new std::disclaimer; sig->dump(); delete sig;
It's going to be very interesting to see how they market this incompatibility as a feature.
Quit playing Monopoly with Bill. Switch to one of many non-Microsoft products today.
to switch to a free and open office suite like StarOffice. Seeing as how Office is MS's cash cow, if sales drop, maybe they'll stop using obfuscated formats. It's obvious this is the only way to get them to stop since the DOJ seems to think that a conviction is enough to shame them into playing fairly. The only way they'll change is if customers make them realize that it's not in their interest to use obfuscated formats. People need to understand that when you buy Microsoft, you are not just giving them money, you are encouraging them to take away your freedom of choice.
-- Knowledge shared is power lost. -- Aleister Crowley
From the article:
"The idea is for XML not to specify how the information should be processed, but rather leave that task to XSL (define) templates and other post-XML processing steps," he said. "XML is supposed to be a presentation-neutral format."
There's nothing stopping anyone from making their own collaborative product that works with XML files. MS isn't going to do it, but that doesn't stop an open source solution.
All Microsoft needs to do is make their standard an open one (that can be used by others), like Adobe has done with their PostScript and PDF formats. Adobe has done quite well with their products based on these formats, too. Products like Adobe Illustrator and Photoshop (which works very well w/ bitmaps saved in PostScript) are the industry standard in digital art. If Microsoft followed a similar model, I'm sure that Microsoft Word would continue to be the industry standard in word processing software, and Microsoft as a business wouldn't be any less rich for it.
There is a big difference between separating presentation from content and removing the presentation totally.
XP users won't even be able to collaborate equally with the 200 million Win9x users.
Gates: Johnson who is the little guy today?
Johnson: Well our older platform is still performing quite well...
Gates: Great scott!?! screw him Johnson!
On a more serious note however, the post mentions that the "rest of the world" will band together. I think the reality of the situation is that the large number of Office users are going to have to demand this for it to really faze the goliath, and maybe not even then.
-bort
To come up with a standard, or use an existing one, and write a couple of VBA macros or Add-in to import that file-format and export to it?
Haven't looked at Word's automation API recently, but I suspect you could get a lot of the data necessary that way and export accordingly. Perhaps I'm totally off the mark...
Looking at it now, it would probably be pretty cumbersome, but probably manageable. I'm just thinking, it would be nice if Microsoft would do it for us, but I suspect that the applications at some level probably expose an API making available most of the needed information, and that could be used to export the type of file people want.
- Sighuh?
But I don't think users will be running Windows 2003 as the blurb suggests.
Isn't part of the concept of XML relating DATA and being able to separate presentation from pure content? Isn't the additional concept of XML its extensibility and adaptability for one group to use it differently than another? Because if not, I've been using XML wrong for about 2 years now.
This article makes it sound as if MS is doing something completely improper with XML (i.e. changing its "standard"). But it seems to me that MS is simply separating content from presentation and relying on ????(something proprietary, xsl, more xml) to provide presentation. Just because they don't use the standard the same way you want them to doesn't mean that they are breaking the standard. I'm sure if you look at the XML that they output it's all standard XML. It also sounds as if they are not using any of the "tricks" that others have complained about (i.e. storing binary data in an xml tag).
Instead of bitching about the problem maybe we should
1) provide feedback if we are a beta tester
2) wait for it to be released
3) ready some tools to provide interoperability
4) work harder on creating tools better than MS
"Do not be swept up in the momentum of mediocrity." - anon
Although an XML MS Word format would make compatibility easier, it doesn't mean that it will be compatible. MS and Mr. Bill Gates make use of many theories developed through the centuries to overcome not only enemies but everybody else. From the Romans to George Orwell.
There's no reason for MS to use a standard format when they own the standard today (MS Office files). And even if they didn't have the standard they would destroy it, just like they tried with Java and J++.
MS still has a lot to learn (and to suffer) until it's capable of collaborating with an open standard.
-=-=-=-=
I know life isn't fair, but why can't it ever be un-fair in MY favor!?
It is obvious that Office 2003 will not have a beautiful open standard that will interoperate with any piece of software. I find that unfortunate, but not unexpected. As the Oasis link points out, Microsoft is not really interested in letting its consumers out of the box of proprietary formats they are currently stuck in.
The article is on the other hand very vague (probably because the information still isn't available) about what information is left in. My interest is not so much in being able to read OfficeXML documents, though as a WordPerfect user I would find this handy. What I am really interested in is if Word 2003 can in any way be cajoled into being an authoring tool for already existing XML formats like DocBook. WordPerfect2000's support for XML is present, but clunky. My real hope was that Microsoft would offer a more useful solution, and despite the bad rap about "presentation information" being removed, if other more useful information like 'heading,' 'strong,' 'table' etc. are still present, then I think it is a(n admittedly small) step in the right direction.
JFMILLER
Strive to make your client happy, not necessarily give them what they ask for
It's something M$ has done all the time. So I think the next time M$ say something about open, interoperability, and whatever that may give users some freedom, there should be no debate. Just know that they are not what they are, and debating will not help.
Of course, they will change when Linux has 80% of the market.
that other programs will eventually figure out a way to open these new Office documents? I do like OpenOffice and don't have any plans to buy an MSOffice product for home.
An XML file is supposed to have no presentation. If you want presentation you need a stylesheet to go with it. Are you a dumbass or what? Read a little.
Isn't the whole idea of XML to separate content from format? So, Microsoft is guarding the last mile from the software infrastructure (including their data format) to the user's brain (supplied by formatting). So, to use Microsoft's data format, I have to come up with my own styling. Isn't this what happens with rss and rdf already? Isn't this potentially a win? Couldn't an industry spring up using microsoft's data format and a set of styling sheets built to transform that format (ie, xslt).
I sense some of the shock and outrage around this article is that people would like to be able to use excel as their data viewer, with an open file format that they could write to. What about simply treating excel as a data publishing system, perhaps even transforming its output to the more open standard developed by OASIS? This starts to consign excel to legacy that needs an adaptor.
Actually, this looks promising for the company I'm currently working at. The article states that, when saved to XML, Word creates presentation-neutral XML (by not exporting any presentation-related information). In fact, that's exactly what they are complaining about.
... sweet! Right now the Word templates we're going to be using to "flow" data from Word into XML are a complete pain in the ass, due to Word. (And since our content-creators are external, we're not going to be escaping from Word any time soon.)
... *sigh*
We don't want Microsoft's presentation crap--we don't CARE how the data looks in Word. We want pure XML content (content completely separate from presentation). Our XSL stylesheets will create our presentation upon XML transformation into PDF. This is the base intention of XML: separating content from presentation. Presentation information SHOULD NOT be part of the XML format.
If we can create pure XML templates that'll export using our DTD/Schema
Our only issue would be getting freelancers to upgrade to the newest version of Word, given that the majority of them still use really, really old versions. Maybe five years from now
"Apparently, all formatting and presentation information is removed from the XML"
And moved into stylesheets where it belongs.
Some people who have reviewed the XML stuff in the new Office beta think it's a good thing.. guess it depends on your perspective.
Plain text RULES!
Or, if you INSIST on something else, use rtf.
1) Take MS, make a report that says they did something bad, watch how many people flock to bash them DESPITE THE FACTS PRESENTED IN THE ARTICLE, which leads me to:
2) How many people read the article? And of those people who DID, :
3) How many of them know that XML is supposed to be a divorce of data from presentation? Why this comes as a shock to people is obvious - they didn't know that.
The poster above who said "style sheets" - bravo. You couldn't have made a better point with two words.
i'm amazed that i survived - an airbag saved my life.
When did this come out?
All of this is only for the users of latest Microsoft products... Does any of this surprise anyone? If so, they have their head in the sand.
Microsoft wasn't built by playing nice with others, why would they start now?
---- Booth was a patriot ----
Someone doesn't know what they are talking about. I'm using Office 2003 on my XP machine right now.
The article also states that you would need XP Server... what the hell is that? XP is a desktop OS, Windows 2000 (or 2003) is the Server OS.
Bill
It's my Sig and you can't have it. Mine! All Mine!
...but I stopped at Win98SE and Office 2000. I might consider switching to Win2K, but I like 98SE. I refuse to go to the crap that is XP, and anything with DRM built in doesn't come near my computer.
Jeez, you gotta think things are getting bad when people won't even pirate your software anymore. Time to find a nice stable version of Linux that lets the common Windows user install and set up with the ease that Windows users are accustomed to. Not to actually brag up Windows...it's still a piece of crap that becomes unstable after about 10 or so apps are installed, but I don't know any other OS where I can be up and playing NHL 2003 in under 2 hours. Not to mention recognize all the hardware in my system. When you can deliver a version of Linux like that, I'll buy it.
Many offices will soon have to upgrade their PC's and software to be able to use XP together with this new MS software. Apart from this being a Good Thing for the economy, this has another important side effect: the 2nd hand market will be flooded with PIII's and cheap Athlons. I was thinking of buying a new computer to make a nice Linux server but I guess I will wait until this new Office thing comes out.
-- Cheers!
I don't think this means that there is no stylistic information in the document, rather that the style information is contained within the proprietary code segment of the document.
If Word documents all utilised the same style for various elements, it'd all be hunky-dory. However, users like their choice of a 50pt purple serif font for a title to stand, so the formatting information MUST be included with the document.
Perhaps a better format would be a zipped file that contains separate XML and XSL documents...
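A rough sketch of what such a package could look like, using only Python's standard `zipfile` module. The member names `content.xml` and `style.xsl` and the placeholder stylesheet are assumptions made for illustration, not anything Office or any other product defines.

```python
import io
import zipfile

CONTENT = "<doc><title>Report</title><para>Hello.</para></doc>"
# Placeholder stylesheet; a real one would carry the actual formatting rules.
STYLE = (
    '<xsl:stylesheet version="1.0" '
    'xmlns:xsl="http://www.w3.org/1999/XSL/Transform"/>'
)


def pack(content: str, style: str) -> bytes:
    """Zip the content and its stylesheet into one in-memory package."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("content.xml", content)
        zf.writestr("style.xsl", style)
    return buf.getvalue()


def unpack(blob: bytes) -> dict:
    """Return a {member name: text} mapping for every file in the package."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        return {name: zf.read(name).decode("utf-8") for name in zf.namelist()}


package = pack(CONTENT, STYLE)
print(sorted(unpack(package)))
```

The 50pt purple title then travels with the document in `style.xsl`, while `content.xml` stays free of presentation.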
Well, what is the content of a PowerPoint file? If all the formatting and presentation info are removed, what exactly is the point?
Sometimes the format is the content.
'Sensible' is a curse word.
The point of XML is to separate the presentation from the content anyway. If you add formatting and what have you directly into XML you have defeated that purpose. That is why there is XSL and CSS. Those are the things you are supposed to use for the actual presentation and formatting.
They have more influence than you imagine,
its not like anyone is going to stop them and their practices
they didnt stop Enron
they didnt stop Anderson consulting
they didnt stop AOL/TW
and the USA courts didn't stop MS apart from a few small requirements that MS had to implement, but in truth that didn't change the status quo: IE still continues to be the number 1 browser, and Media Player will be your average person's number 1 choice of player because it's there on the desktop, right next to MSN Messenger and MSN8, which weren't installed as "standard" before. MSN Messenger cannot be removed unless the user is an expert at editing config files (and armed with the relevant knowledge); there is no add/remove entry for Messenger.
so they are forced to change a few options but then lock you in again with new "features" that weren't in the original ruling
in other words business as usual
Europe trials are coming and MSFT stock continues to slide as this looms, the euros aren't easily bribed, so things might have to change over in euroland
but not anywhere else
untouchable as always
I'm sort of of two minds on this -
On one hand, there are a lot of folks who have very strong opinions about the fact that the data should be separated from presentation... If Office 2003 were to strip the MS-apps-specific formatting (which is probably NOT very standards-friendly), but leave the style markings (heading, paragraph, footnote, etc...) then really, they would be providing a semi-structured document that conformed to XML standards.
As a web application developer/web author, there have been many times when I have been given MS Word docs and Excel spreadsheets as content for our web site... In the past, I have resorted to copying the whole page directly onto a text editor (thereby scrubbing all formatting information) and then using HTML markup to make the document look much like the Word original, but without having to deal with that rather poor HTML output the Word and Excel's Save as HTML features produced. If I could have a semi-structured document, it would have been easier to write some macros to parse the XML structure to automate some of the rough formatting (hooks for stylesheets or somesuch).
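Something like the following is what I have in mind, as a sketch only. It assumes a hypothetical export whose elements are named after styles (`heading`, `paragraph`, `footnote`); the actual Office 2003 output may look nothing like this.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical semi-structured export: style names survive, fonts and colours do not.
EXPORT = (
    "<document>"
    "<heading>Pricing &amp; Terms</heading>"
    "<paragraph>All prices are in USD.</paragraph>"
    "<footnote>Subject to change.</footnote>"
    "</document>"
)

# Which HTML tag each style name turns into; unknown styles fall back to <div>.
TAG_MAP = {"heading": "h2", "paragraph": "p", "footnote": "small"}


def to_html(export_xml: str) -> str:
    """Turn each styled block into an HTML element with a class hook for CSS."""
    root = ET.fromstring(export_xml)
    lines = []
    for node in root:
        html_tag = TAG_MAP.get(node.tag, "div")
        text = escape(node.text or "")
        lines.append(f'<{html_tag} class="{node.tag}">{text}</{html_tag}>')
    return "\n".join(lines)


print(to_html(EXPORT))
```

The `class` attributes are the stylesheet hooks: the site's own CSS decides what a `heading` looks like, and nothing of Word's formatting has to be scrubbed by hand.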
On the other hand, it seems to me that it might be in Microsoft's best business interest (the selfish ones) to make darn sure that it's not possible for OpenOffice to fully interoperate with MS Office documents. I don't think they would be very smart (current business model-wise) if their new products (which will rapidly become de-facto business standards) helped to enable Open Office standards to take away their marketshare.
In the final analysis, I probably wouldn't worry too much until there's a critical mass of people using it. By then, a bunch of folks will have figured out what CAN be done with whatever format MS ended up with. At that point, Office 2003's XML format will probably make it possible for people to do something they couldn't do before or at least, to do something easily that once was more trouble than it was worth.
That's worth something...
The Digital Sorceress
scripsit dasmegabyte:
If you can submit a dissertation as an HTML file, your university is a very different place than mine.
Isn't HTML good enough? No, it's not. It's not even remotely close to good enough for doing a dissertation. It can't do footnotes; in my field, that eliminates it from the get-go. Not to mention that it can't do indices, tables of contents, figure numbering, cross-references, etc.
For academic work, it is very important that your work be citable. How the hell can I cite where in a book I got a quotation if there is no standard pagination? Count paragraphs?
What I want is XML TeX. For writing a dissertation, use (La)TeX. It can't be beat. And it's open, so you can write a converter to make it into XML -- or vice-versa, like docbook.
In principio creauit Linus Linucem.
What a paradox eh.
Here's my shot for it:
Separating content from presentation is a good thing. But under what context? Development? You can not separate content from presentation completely from a webpage. You can do that from the webserver, but not down to the browser.
Another angle, ok, you can separate whatever you want, but the whole idea is not that. The idea is that XML is used to store data, whatever the data is: data about something users type, data about how they format it. Then separated them out? Fine, but M$ just does not publish the presentation part. If they want it in 2 documents? That's another story to argue about, but the whole point is not that. The whole point is the presentation part is not there for you to access (at least in XML).
OSOpinion also has a good article about it. Less technical, more informative.
Whoever modded this troll needs to go back to elementary school for more reading comprehension skills.
Microsoft Markup Language. I constantly have users that send me HTML from Office or FrontPage, and I make them go back and use Netscape Composer or something that actually writes HTML or XML. I sometimes wonder if this is real or an accident. Ever read MicroSerfs? A company of fresh-out-of-school engineers that work in their own world until they burn out. I just had to figure out how to use the locale routines. On US English computers the ISO 639 abbreviation is "en". Microsoft returns "enu". No such thing!
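For anyone stuck with the same mismatch, a small lookup table is one workaround. This is only a sketch: the table lists a handful of Windows-style three-letter abbreviations, and both the table contents beyond those entries and the helper name `to_iso639_1` are assumptions for illustration.

```python
# Windows-style three-letter language abbreviations mapped to ISO 639-1 codes.
# Only a few entries are listed here; extend the table as needed.
WINDOWS_TO_ISO639_1 = {
    "ENU": "en",  # English (United States)
    "ENG": "en",  # English (United Kingdom)
    "DEU": "de",  # German (Germany)
    "FRA": "fr",  # French (France)
}


def to_iso639_1(code: str) -> str:
    """Normalize a language abbreviation to a two-letter ISO 639-1 code."""
    key = code.strip().upper()
    if key in WINDOWS_TO_ISO639_1:
        return WINDOWS_TO_ISO639_1[key]
    if len(key) == 2:
        # Already looks like a two-letter code; just lowercase it.
        return key.lower()
    raise ValueError(f"unknown language abbreviation: {code!r}")


print(to_iso639_1("enu"))
```

Raising on unknown codes is deliberate: guessing a language silently is worse than failing loudly.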
I have Office 2003 Beta 2 freshly downloaded from MSDN. This article is completely wrong. I did the following:
1. Opened a heavily formatted .DOC Word document with tables, multiple fonts, etc.
2. Saved the document as XML.
3. Opened up the XML document in Word and it looks EXACTLY like the original .DOC format.
I also opened the XML file in a text editor and sure enough it contains complete formatting information.
let someone at the DOJ read this and FUCKING DO SOMETHING.
-- Insert wisdom here:
I thought you're only supposed to save data in your XML and then use CSS/XSL files to make the data presentable... what's the problem again... oh, this is Microsoft, my fault.
The point of the Office 2003 "Save as XML" with the "Data Only" checkbox is _NOT_ a poor man's Save As XHTML. It's designed to allow the data of the document to be placed into an XML document based on a schema. You literally can make your own schema file/XSD, and use a tool inside Word to map the elements of a Word document to elements of the schema. If you simply map a paragraph to a string you will lose formatting. Unless of course you define in your schema how you'd like to store formatting information. But that is generally overkill.
Think of a resume. you could define an XSD for a resume, and be able to save resumes against this XSD, as validated pure XML.
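To make the resume idea concrete, here is a hedged sketch. Python's standard library cannot validate against an XSD, so a hand-rolled structural check stands in for real schema validation, and the `resume`, `name`, `email` and `job` element names are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical resume vocabulary; a real project would express this as an XSD.
REQUIRED = ("name", "email")
ALLOWED = {"name", "email", "job"}


def check_resume(xml_text: str) -> list:
    """Return a list of problems; an empty list means the document conforms."""
    root = ET.fromstring(xml_text)
    problems = []
    if root.tag != "resume":
        problems.append(f"root element is <{root.tag}>, expected <resume>")
    for tag in REQUIRED:
        if root.find(tag) is None:
            problems.append(f"missing <{tag}>")
    for child in root:
        if child.tag not in ALLOWED:
            problems.append(f"unexpected <{child.tag}>")
    return problems


GOOD = "<resume><name>Pat</name><email>pat@example.org</email><job>Editor</job></resume>"
BAD = "<resume><name>Pat</name><font>Arial</font></resume>"

print(check_resume(GOOD))
print(check_resume(BAD))
```

Note that the data-only document has no place for a `font` element at all, which is exactly the behaviour being complained about in the article.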
Now, if you want to produce a document, using an XML syntax but want to combine both data and presentation, then you want WordML.
WordML uses Word's own tags to mark up the Word document. I was going to show you an example of WordML but I don't feel like escaping all the greater-than/less-than signs. Anyhow, WordML contains all the formatting and everything necessary to display a Word document as it is supposed to look.
I think this Open Office guy is looking for a devil in Office 11 that isn't there. That or he didn't read the friggin manual.
-Malakai
A Dragon Lives in my Garage
I cannot begin to grasp the twisted minds of people that write their dissertations in Word, but HTML? Why not use LaTeX? I wrote my dissertation in WordPerfect but had I known about LaTeX at that time I would not even have considered something else.
-- Cheers!
I think the author is missing the point. It's not about presentation separated from content that is the issue, it's about whether people can use other applications to read and write word files.
At present, it is an unknown format with lots of binary stuff which means that you can't easily make your own, except by trial and error writing documents and seeing what it does.
As long as the XML has things like type of tags that make the document make sense, then it seems to me like a good thing.
My fear is that we get a whole load of nice tags, and a 2198yhwqffoiasdfbdfa that hides it away from us.
As long as it is easy to use XML, I can read it, write it and stylesheet it.
Dear smart,
Can a style sheet alone show how the document looks?
You are missing the point. The open format part of the document contains the information. The closed format part contains the formatting. Which means extracting the information (and probably quite a bit of the style) from a word document will become fairly easy.
Right now OpenOffice can't import Word-created .RTFs with nothing but text properly. This could be a massive step towards compatibility.
"has been so seriously crippled as to be useless to anyone but the big content management and collaboration system providers."
That indicates to me that the problem is really that the document format is so complicated that it takes tremendous resources to understand and implement compatibility with it, as this implies that larger companies like say a Xerox will have no problem producing tools to work with it.
So from a business consumer perspective this is still a tremendous win.
This sounds like more whining from the open source crowd.
Can't someone with the beta (I'm sure at least one slashdotter has it) post some examples of this XML file format? I thought it would be an obvious thing to do with the article...
Now go off into your corner and be a good little troll.
"Does anyone know of a company that is planning to move to Office XP once it's out of beta? I don't."
I have the answer!
Microsoft!
This is my sig. Its pathetic.
This may be bad (keeping in mind the jury is still out on exactly how Microsoft is making this work) because in the case of office documents, the style is actually *part* of the content, from the perspective of Joe Office User.
If Microsoft just puts the raw text data into a .xml file, then that .xml file is practically useless to anyone who wants to collaborate with the original author since all of the styling information is lost.
As an example of a good way to do this (IMHO), take a look at how OpenOffice.org builds their files. When you make a .sxw (the default writer format) you're actually taking the raw data of the document, the styling rules for the document and a few other important bits and pieces and zipping them up into a single file.
After unzipping this file, the following directory structure was exposed:
content.xml
META-INF/manifest.xml
meta.xml
mimetype
settings.xml
styles.xml
With this type of design, you can get the best of both worlds. Technically, there is a separation between your presentation and content which allows simple programmatic access to the data when necessary. At the same time, this design allows for full collaboration between people who also consider the styling of the data to be part of the content because the style rules for the content are included with the document.
With xml-saved Office documents containing only data and no style, collaboration between non-office users (and apparently Win9x users as well) will be no better off than before. Perhaps worse, assuming the binary .doc, .xls etc formats have changed and will need to be reverse-engineered again.
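For anyone who wants to poke at the package layout described above, here is a minimal Python sketch. It is only an illustration: the archive is built in memory with placeholder contents so the example runs on its own, and the helper names (`build_demo_package`, `list_package`) are made up for this post. With a real .sxw file you would pass its path to `zipfile.ZipFile` instead.

```python
import io
import zipfile

# Member names as listed in the post; the contents written below are
# placeholders, not real OpenOffice.org data.
MEMBERS = [
    "mimetype",
    "content.xml",
    "styles.xml",
    "meta.xml",
    "settings.xml",
    "META-INF/manifest.xml",
]


def build_demo_package() -> bytes:
    """Build a stand-in for a .sxw file: a ZIP holding the listed members."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in MEMBERS:
            zf.writestr(name, "<placeholder/>")
    return buf.getvalue()


def list_package(data: bytes) -> list[str]:
    """Return the sorted member names of a ZIP-based document package."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return sorted(zf.namelist())


package_names = list_package(build_demo_package())
print(package_names)
```

The point is that content.xml and styles.xml travel together in one file, so a reader can take the data alone or the data plus its style rules.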
If this article is true and Microsoft has decided to remove the styling of their xml-saved office documents, I see two possible reasons for this:
The first is obvious. You're not using Office? Ok, second class citizen, here's the data but in a format that is next to useless for you to use.
The second possibility involves Microsoft just not being where they want to be with the Office XML sharing. Keep in mind that it took OpenOffice.org something like a year and half or so to define their XML interchange format. Microsoft may be going there, but due to overwhelming inertia, it just might not be going there very quickly.
Personally, I think the first option is the most likely. However, with OpenOffice.org working with OASIS and others on a common XML interchange format, I'm hoping Microsoft will be forced by the marketplace into option 2.
Best regards,
David
Your post can be summed up as: you can't beat them, so you might as well not try. Glad I don't have you as a role model. Look all through human history for examples of "pessimists" who said the above, and the "optimists" who ignored them, and succeeded in spite of, not because of the said advice. You'll find plenty of examples, and the overall benefit to humanity because of it. A pessimist will never build a pyramid, or cross the ocean in a lone plane, because they're too busy telling everyone why they can't.
Look how surprised I am:
-Peter
It would also be nice to be able to transform the XML via a provided XSLT into FO (FO at W3C and FOP). Then you could present the document as a PDF, RTF, Doc, Java applet, or whatever.
Yeah, it is both bad and unnecessary to remove the formatting from a document.
It is bad because when I open a document that has been emailed to me I want to see the formatting without having to use the same style sheet, end of story.
It is unnecessary because XML already allows you to tag the formatting separately to easily differentiate it from the content. In this way you should at least be able to see the text even if your word processor doesn't understand the formatting structure.
It's a good thing if it removes the actual formatting (tab here, bullet there) and replaces that with conceptual tags such as paragraph, list, etc, allowing each client to decide how those things are rendered.
In theory, someone should be able to create a word processor that, if it interprets all of the same elements and attributes in all of the same ways, can render the document identical to Word's interpretation. Instead, I believe we are getting the functional equivalent of being able to programmatically paste our text into notepad.
Seen any BadMarketing lately?
I wish it was that simple. Take, for example, XSL. Are you telling that XSL files aren't XML files? OK, so we agree that XSL is built on XML technology and XSL files are parsed with the same parser as any other XML file, right? Guess what? XSL means "The Extensible Stylesheet Language", which, like the name suggests, has something to do with the presentation.
XML is simply a format to store structured documents. If your document has no real structure a valid XML document (without XML declaration) could look like <data>...encoded binary data here...</data>.
You're right that it's a good idea to separate content and presentation but XML doesn't require (or even suggest) that.
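To make the <data> example concrete, here is a small Python sketch. The byte values are arbitrary stand-ins for a binary format, not taken from any real file; the only claim is that an XML parser accepts such a document without complaint.

```python
import base64
import xml.etree.ElementTree as ET

# Arbitrary bytes standing in for a proprietary binary format (made up).
binary_blob = bytes([0x00, 0xD0, 0xCF, 0x11, 0xE0, 0xFF, 0x10, 0x7F])

# Wrap the encoded bytes in a single element, as in <data>...</data>.
root = ET.Element("data")
root.text = base64.b64encode(binary_blob).decode("ascii")
document = ET.tostring(root, encoding="unicode")

# The parser accepts it: well-formedness says nothing about whether a
# human or another program can make sense of the payload.
parsed = ET.fromstring(document)
recovered = base64.b64decode(parsed.text)
print(document)
```

The document is perfectly good XML and still tells you nothing about the structure of what is inside.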
_________________________
Spelling and grammar mistakes left as an exercise for the reader.
CSS2 includes elements for forcing page breaks, and one of the CSS3 drafts deals with identifying the page number of an element etc.
For an example of some of the things you can do with CSS2 + JavaScript + XHTML have a look at a print preview of this page. The headings are auto-numbered by CSS2, there are page breaks between the sections and a ToC is generated by JavaScript. CSS3 will allow the ToC to contain page numbers.
Currently the only browser to render this page properly (IE, Moz and KHTML don't) is Opera 7 (the JavaScript seems not to work in 6).
I am TheRaven on Soylent News
You are thinking of XSL. XML doesn't specify "bold this." You're thinking of HTML. The whole point of XML is to NOT do that!
A well-formed SGML or XML document should have absolutely NO formatting information contained within the content. This allows the document to be completely portable since most formatting is dependent on the output media/device. Keeping the formatting out of the content means the output can be made to look correct regardless of whether the document is printed, displayed directly, converted to say PDF, displayed on a high res monitor or output on somebody's text capable cell phone by simply providing the appropriate style sheet. As soon as you put formatting into the content, you restrict the output to devices and media that support that formatting. Not a great example but even in HTML, the document is more portable if you use an <EM> tag instead of say a <B> tag since the output device can interpret emphasis a number of different ways but bold means bold.
At one point in my career, I was writing software to tag documents (SGML) that were originally intended to only be printed. We went through HELL developing code to recognize the myriad different ways the original authors had put in formatting as content and then trying to figure out what the formatting meant with regard to the document structure.
They that can give up essential liberty to obtain a little temporary safety deserve neither safety nor liberty.
Ben
Hogwash! The market deserves to die if it buys the stuff.
Those making the purchasing decisions must hoist the middle finger.
And don't say BeelzeBill made you buy it, either.
Get thee glass eyes, and, like a scurvy politician, seem to see things thou dost not.--King Lear
"Right now OpenOffice can't import Word-created .RTFs with nothing but text properly. This could be a massive step towards compatibility."
You know that Word has always been able to save in the Text (.txt) format, don't you? Where was the massive step then?
"it does look as though XP XML has been so seriously crippled as to be useless"
And this is news?
Invoicing, Time Tracking, Reporting
"...connecting over XP servers..."
I must have missed the release of Windows XP Server.
samrolken
Dear dumb,
Yes.
You seem to forget that, in the context of office programs such as Word, the 'content' is the sum of 'text' + 'formatting' + 'presentation'. You need all 3, or you do not have a workable document. Having 'text' only is not enough. We are not talking about being able to read a .doc file on your scrollable cellphone screen here. We are talking about interoperability between all major office suite producers.
Proprietary document formats were fine at one point. Most people shared documents via printed paper, or shared them via "soft copy" within their own organizations. However, the time for printed documents and interoffice "soft copies" is over. We need the ability to share documents with the world in an easy to use, feature rich, and easy to edit format. Since a significant part of a document's legibility is in its style and formatting (or at least people are more apt to read a well formatted document over one which is not) text files are out.
Once an easy to use, open document format is created, and the ability to read and write those documents is built into many programs, I think we will see an end of .DOC file attachments.
While there are currently some "open" formats like PDF and PS, the problem is that they are not easy to create for the average user, nor are they easy to edit. While PDF may be a good format, we need something better.
XML is a logical choice as a base for an open format because it is a well defined standard, it is text based, and is quite easy to parse.
But I ramble.
Remember, this is BETA 2, a lot of things change, and features are still added, between BETAS. (Ex: Spam Filter, etc.)
XML will be the document format, and an included XSL stylesheet will include the transformation for the formatting. You WANT to separate content from formatting, that's why XML really came to be!
They can then bundle all of this up into one file. Remember it's still beta!
fuddles' 'secret' DocumeNTwithinaDocumeNT(tm) fee churn.
who wants that whoreabull stuff on their system.
why, those Godless felons who are STILL doing all that larcenious accouNTing upon on wall street of deceit, have luddition spasms, when they discover that they have emailed docmeNTs, with ALL of their revisions embeddead within(tm). talk about sharing?
lookout bullow. tell 'em robbIE.
Wow. Someone on /. suggesting we use an MS file format.
For those of you that aren't aware, RTF is an 'Open' format created by MS. All native Word files I've looked at ('97 and earlier) used an RTF-derived format. The RTF spec is available from Microsoft, and is the most obfuscated document I have ever had the misfortune of having to read (in the end I gave up and derived the format from the output of WordPad, it was easier).
I am TheRaven on Soylent News
i'm just curious how within xml you denote that a word within a paragraph is to be bolded or in italics or whatever? an xml "document" in my mind looks something like:
<document>
<title>my title</title>
<chapter>
<paragraph> This is the first paragraph</paragraph>
<paragraph> another</paragraph>
<chapter/>
</document>
what if the word first is needed to be bold? maybe there's another way to structure a document like this.
Read some other articles, or better yet get ahold of a beta and try it out. The authors of this article will feel like schmucks when they realize what they missed.
First off, by default, if you save the Word document as XML, it gets saved as WordML, which preserves Word's styles and formatting in an XML name-space that's separate from the one bound to the schema-controlled data.
If you check off the checkbox "Data Only" then you will lose all formatting and your own XSD will be used to map this document into XML data.
WordML looks like an XML'ified RTF language. It would be trivial to create an XSL stylesheet that transforms WordML into HTML/CSS with all formatting (that HTML is capable of) which directly mimics MS Word. OpenOffice could also eat WordML quite easily and have all the formatting/style of Word.
What the authors of this article are REALLY bitching at is the fact that MS didn't buy into the OpenOffice Document Specification from OASIS. MS prolly sees OASIS as the US sees the UN. Defunct, not needed.
If you describe your data using XML semantics, and all it takes to convert from semantic style A to B is some XSL, then who cares about forcing everyone to use one specific format.
-malakai
-Malakai
A Dragon Lives in my Garage
I suggest you get with it, and start slinging hyperbole like the rest of us, or find something better to do with your time.
Our clients use Lotus Notes. When an Outlook user sends someone an email with an attachment and chooses to send it in exchange rich text format, Lotus Notes doesn't play nice with that and converts the attachments to c.dat file. Our clients can't do anything with this c.dat file and the only option we have is for them to contact the sender and have them unselect exchange rich text format and everything works fine. My point being is that M$ has always had a proprietary way of going about things. Give them enough rope and eventually they will hang themselves on their proprietary ways. Don't think that companies are looking at open source as a viable alternative to M$ just because it's fun and different. It's because people are tired of being locked into M$ products and the constant layout of cash to get upgrades and *features* that nobody wants or needs.
Money not found! A)bort, R)etry, D)eclare Bankruptcy
The Constitution only gives people the right to pursue happiness. You have to catch it yourself.-- Ben Franklin
The Constitution of the United States does not contain any reference to the right to pursue happiness. The phrase is found in the first paragraph of the Declaration of Independence, as found at the National Archive
I highly doubt Benjamin Franklin would have made this mistake, since he was one of the few instrumental in the creation of both documents.
I don't read or respond to AC posts
Your post can be summed up as. You can't beat them, so you might as well not try.
Not quite. Each person needs to make an informed decision. Works for me, for you and for most others on this site. But for each one of us, there are thousands of them. So we can keep our money and our freedom for as long as we can. But eventually, they will bury us.
Glad I don't have you as a role model.
What, as someone with an opinion and an identity? I choose not to hide behind the anonymity of being an AC. You choose to. I'm glad I don't have you as a role model. Point being, we all have our own world view and our own opinions. And there is absolutely nothing wrong with that.
Look all through human history for examples of "pessimists" who said the above, and the "optimists" who ignored them, and succeeded in spite of, not because of the said advice. You'll find plenty of examples, and the overall benefit to humanity because of it. A pessimist will never build a pyramid, or cross the ocean in a lone plane, because they're too busy telling everyone why they can't.
In this particular case I could have been plainer in my discourse. Build a better sucker trap. The current one (run by Microsoft) is working too well.
Fascism should more properly be called corporatism, since it is the merger of state and corporate power.
... they will rush right out and immediately buy more Microsoft stock for themselves.
Ok, this may be a crazy idea, but what if the open source community raised money, and paid for advertising of OO? Compare it, show the downfalls of MS Office, etc. I think a lot of people don't know there are these great software tools available! I know when I tell my friends, even CS majors, they have never heard of OO!
your bold text here :D
Check your favourite HTML tutorial.
Yes, good HTML is valid XML.
Unlike your example, which is not even valid XML. But that's beside the point.
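Since the point came up, here is a quick Python check of why the example as posted does not parse, plus a corrected variant that also marks one word inline. This is just a sketch: the <emphasis> element name is an illustrative choice, not something from the original post, and "valid" is used loosely in this thread to mean well-formed.

```python
import xml.etree.ElementTree as ET

# The example exactly as posted: <chapter/> is an empty element, so the
# opening <chapter> is never closed before </document>.
AS_POSTED = """<document>
<title>my title</title>
<chapter>
<paragraph> This is the first paragraph</paragraph>
<paragraph> another</paragraph>
<chapter/>
</document>"""

# Same document with the closing tag corrected and one word wrapped inline.
# <emphasis> is an illustrative element name, not part of the original post.
CORRECTED = """<document>
<title>my title</title>
<chapter>
<paragraph> This is the <emphasis>first</emphasis> paragraph</paragraph>
<paragraph> another</paragraph>
</chapter>
</document>"""


def is_well_formed(text: str) -> bool:
    """Return True if the text parses as XML, False on a parse error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


posted_ok = is_well_formed(AS_POSTED)
corrected_ok = is_well_formed(CORRECTED)
emphasized = [e.text for e in ET.fromstring(CORRECTED).iter("emphasis")]
print(posted_ok, corrected_ok, emphasized)
```

Inline elements inside a paragraph (mixed content) are ordinary XML, so marking a single word needs nothing special.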
If you "Save as XML" in Office 11, then by default the data is saved as WordML. WordML is an XML version of MS's internal storage format (basically RTF). OpenOffice could quite easily write an interpreter for WordML. Hell, I could write a WYSIWYG editor for WordML in a day. If that. It's pretty simple if you understand the basics of RTF.
It's only when you Save as XML with the "Data Only" checkbox that you get into stripping formatting (and rightly so). Word WARNS you about this. In addition, you can specify your own XSD to save to. And Word will VALIDATE this for you. Not to mention, you can use a Word tool to map elements of Word documents to elements of your schema. DAMN COOL.
In addition (as if that isn't enough), when you save, in either way, you have the option of specifying an XSL style sheet. It'll go ahead and transform the output for you as part of the save.
The only thing the OpenOffice people are upset about is that MS didn't buy into the OASIS/OpenOffice Document Specification. Tough shit. I'll write them an XSL that'll work against WordML to solve that for them. Lazy bastards.
-malakai
-Malakai
A Dragon Lives in my Garage
Internet World is reporting that initial reports from Office 2003 beta testers don't look good for those hoping to share documents with non-MS systems using the XML file format...
That's because XML is not a file format, it is instead a format for file formats. To quote the O'Reilly "Learning XML" book, page 2:
Note that despite its name, XML is not itself a markup language: it's a set of rules for building markup languages.
I've said this many times on /. (look at my history), but the fact that a particular format is XML-based says nothing of your ability to read it. I'm even going beyond the fact that Microsoft could simply stick their traditional file formats into a CDATA and claim XML compliancy.
The statement "If Microsoft used a standard XML format for their documents then anyone could read them" makes as much sense as an equally stupid statement like "If Microsoft just used 8-bit bytes in their file formats then anyone could read them".
Sorry to rant, but the level of cluelessness around XML is astounding. Please read up, there's a ton of useful information on XML around the internet.
MDC
Do you have ESP?
InternetNews is authored by morons.
-malakai
-Malakai
A Dragon Lives in my Garage
Or spy on other people from a God perspective. Damn you! Now I'll have to spend the rest of my day realizing how pathetically small my scope is...
It take more faith to believe in evolution than it takes to believe in God
I'm not trying to bait here, but hello? So everyone's whining (again) because you can't just open up a Word Document directly in OpenOffice / StarOffice / WhateverOffice? You've got better than that -- you've got the DATA, e.g. the English text, and you apply your own styles (xslt etc). This is what XML data exchange is all about no? Presentation is just syntactic sugar, no? Separate your data from presentation.
- Oisin
PGP KeyId: 0x08D63965
Technically, nothing stops you from inventing your own tag, it's just frowned upon.
The preferred way would be to define an element - say, <strong> - and leave the formatting to CSS or XSL. So in your stylesheet, it would say something like strong { font-weight: bold; }
InfoWorld also writes about Office 2003 and the new XML features:
An excerpt...
"Once valid, the document can be saved as XML in two ways. The default is to create WordML, which preserves Word's styles and formatting in an XML name-space that's separate from the one bound to the schema-controlled data. You can optionally save through an XSLT transformation which, in a publish-to-the-Web scenario, could translate WordML formatting into HTML/CSS formatting. Alternatively, if you tick the Save as Data option, you can instead save just the raw XML data. In that case, you can bind one or more XSLT stylesheets to the document, each of which can generate WordML styles and formatting."
Beware: In C++, your friends can see your privates!
Nononono. Word is all about presentation of data. Some of the data IS the presentation. Writing, "The bullet points below" with a list of bullets below.
Taking the presentation out of data would be like making PSDs XML but putting the colour in some hidden away place. You'd have only the useless basics and nothing else.
At least XLink the "presentation layer" you are imagining in a separate resource file... a la XSL or SOMETHING.
-
ping -f 255.255.255.255 # if only
Take a look at the DocBook spec for a good example of how something like this works. Basically it has a whole pile of tags that allow you to mark up your text semantically with tags like <computeroutput> or <programlisting>. There are also tags like <emphasis> that would probably give you bold text in the printed output.
"I would reserve final judgement until I saw an .xml file generated by Word."
Word XML
This is bad because with a Word document, the presentation IS the content.
Read your message again and you'll see (if you have 50 cents worth of brains) you just proved his point.
I think this Open Office guy is looking for a devil in Office 11 that isn't there.
Not only that, but why is somebody representing Open Office being asked in the first place? Is he the only one they could find to critique Office 2003's XML support? It's not like he's going to say something nice about his primary competitor.
Seems to me this is just another "loaded" article.
One of the main reasons to use XML is that it allows you to separate content from presentation. For the presentation you should use a style sheet (XSL, not CSS).
"The idea is for XML not to specify how the information should be processed, but
rather leave that task to XSL (define) templates and other post-XML
processing steps," he said. "XML is supposed to be a presentation-neutral
format."
What a stupid comment from an "analyst". XML itself certainly is NOT "supposed to be presentation-neutral", if only because XSL, SVG, XHTML and myriad other presentation formats are still valid XML. What XML is is just a structured container for any data you want to store, which can be presentation or content, or both. The idea of separating content from presentation is valid, but it's pretty much orthogonal to XML. Moreover, XML formalisms (such as namespaces) make it quite easy to store both content and presentation together in one document and still be able to easily tell them apart and manipulate them separately.
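A rough Python sketch of the namespace point, for illustration only. Both namespace URIs and every element name below are invented for the example; nothing here is WordML or any real schema.

```python
import xml.etree.ElementTree as ET

# Both namespace URIs and all element names are invented for illustration.
CONTENT_NS = "urn:example:content"
STYLE_NS = "urn:example:style"

DOC = f"""<doc xmlns:c="{CONTENT_NS}" xmlns:s="{STYLE_NS}">
  <s:font name="Times" size="12"/>
  <c:para>Thank you for your invoice.</c:para>
  <s:margin left="2cm"/>
  <c:para>Sincerely, Jeff</c:para>
</doc>"""


def split_by_namespace(xml_text: str):
    """Return (content_texts, style_tags) from one mixed document."""
    root = ET.fromstring(xml_text)
    content, style = [], []
    for elem in root.iter():
        if elem.tag.startswith("{" + CONTENT_NS + "}"):
            content.append(elem.text)
        elif elem.tag.startswith("{" + STYLE_NS + "}"):
            # Strip the "{namespace}" prefix ElementTree puts on tag names.
            style.append(elem.tag.split("}", 1)[1])
    return content, style


content_texts, style_tags = split_by_namespace(DOC)
print(content_texts)
print(style_tags)
```

One file carries both kinds of information, and a consumer that only cares about the text can ignore the style namespace entirely.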
The fact that MS decided to not do that and just pretend that "XML means pure content" is another matter. In fact I expected that they will try to obfuscate formatting information in XML by storing it as an encoded binary, but they found an even simpler "solution" by just dropping it. Bravo Microsoft!
nslookup schemas.microsoft.com
Non-authoritative answer:
Name: msdn.microsoft.akadns.net
Address: 207.46.248.109
Aliases: schemas.microsoft.com
Just curious, what happens when you open the document without a network connection? Does it cache the schemas locally?
Enjoy,
It's just the normal noises in here.
please post a screenshot of how it should look, as i don't see any problems off the bat in mozilla 1.3b. opera crashes on my gentoo system, so i can't check that out.
//FIXME: Bad
And this is bad how? Isn't this the dream that XML document proponents have aspired to for years? You just can't please some people...
And this statement says it is a bad thing, how?
Physician, heal thyself.
My beliefs do not require that you agree with them.
I'm alright with them stripping out the formatting of the data, but if they are leaving in the "context" of the data, that is great. What I would like to see is something similar to XForms where you can only write a document according to a schema. Sort of like a template. This would be a great boon to so many industries. Some of the articles I have read hint or imply that this is what MS is doing, but I haven't found any real proof. If someone has some then please post it.
please go back and take XML 101 from your local community college so you don't sound like such a moron in the future.
mmm-kay? Thanks.
Content and presentation, yes, but I can't imagine how to separate content from structure.
Think about paragraphs in a document... a logical grouping of related sentences. As such they are important to both structure and content. Courier or Times font don't change the usefulness of document, but removing paragraph breaks certainly would.
I guess if you really wanted to strip all structure and content out of the XML representation of a business letter, all you would be left with is:
<document>Dear Bob, Thank you for your invoice dated 03-13-2003. This really helps clear up blah, blah, blahSincerely, Jeff jeff@foo.bar (123) 456-7890 Cc: Barb</document>
Doesn't seem very useful to me.
I take drugs seriously.
Your use of the tired "Bzzzzt" exclamation at the beginning of your post completely overwhelmed any potential interest in whatever it was that you were trying to say.
Please, next time try to avoid the condescending tone, people might respond more constructively.
...because XML is clearly the future of many kinds of document formats and it now puts the onus of creating an XML office document format on the shoulders of the open source community, rather than let Microsoft govern that arena. If the new version Office had come with proper XML formatting, it would have made some people's lives easier, but it would have come at the cost of us forever chasing Microsoft's ever-changing standard and they certainly would make sure to never leave it standing still long enough for any competing word processor to properly support its XML structure.
This is a good thing, folks.
From the article:
"XML and Web services use, especially for content-driven applications, is still very much limited to basic use of XML as a data-exchange mechanism between systems -- primarily for internal integration approaches," he said. "When dealing with exchanging information internally, what is most important is not to bundle all collaborative features into making for a huge, cumbersome XML file that only certain applications can process, but rather to strip out all the presentation layer features and focus on just the data to be exchanged. In this case, I don't see how Microsoft is violating that. You can choose to save a document with all the rich presentation data left in, if you choose (and that data will only be processable by Office applications), or you can choose to save the XML with just the data in it. I don't see how that cripples anything."
If MS is doing XML right, an XML export from word will only mark the text file with the necessary handles to bind to the formatting file. If you open the text file without the formatting file, you get rather plain text.
The same thing happened with MSHTML. Yes, it's got a lot of proprietary comments in it (the "" tags), but the CSS and formatting designations are as standard as the crude hacks and random idiosyncrasies that a human web designer may do.
Plus, it's only an "early beta." I hope that the authors of the article send their comments to MS, so MS can expand on what their XML exports can do.
Was that "big content management and CORROBORATION systems providers"?
An informative and informed post about a Microsoft product on /.
And it's also a posting on a product by someone who has actually used it!
Isn't that one of the signs of the apocalypse?
Insightful? Make that clueless moron.
When you say "The main point here is to encourage the .XML format for interoperability", what the heck are you talking about? The XML format specifies ways of encoding certain type of information, but without a DTD it's *worthless*. In the sense you are using it, there's no such thing as "the XML format". If you or the OP had bothered to read the article, you'd have understood that what Microsoft is doing is removing all the formatting information, in the sense of "this is a normal paragraph, this is a numbered list, this is emphasized" and so on. Is that easy enough for you to understand?
Go buy a couple of clues and come back then, mm'key?
It does this in two ways. One, you can bind a Schema of your own to a Word .DOT file and anything created using that template must validate against the Schema. Then when you save the data as XML, the XML doc will conform to the schema. You can still mark up the document in Word, but the Save As XML (Data Only) will mimic the schema. You would have to Save as WordML to get formatting you added (which you shouldn't have added if you're using a schema anyhow).
Two, there's a new application called InfoPath (formerly XDocs). InfoPath is primarily designed for getting the data. Essentially XML-enabled forms. The forms are easy to construct based off XSDs. InfoPath can then (obviously) hand that data to Word to be formatted (like a resume) or to Excel to be analyzed.
-malakai
-Malakai
A Dragon Lives in my Garage
"So Microsoft will continue its efforts to lock-in users with proprietary formats, and hopefully the rest of the world will produce an XML standard document format without them."
That is 100% correct. Microsoft will never produce products in an atmosphere of cooperation. They have made a cold calculation that with their monopoly muscle they don't have to.
It truly is Microsoft against the world. What everyone (except Microsoft) should do is cooperate with each other to produce open standards and then follow them. If all products interoperate except for Microsoft's, it will become painfully obvious that Microsoft is using its monopoly power to set its own proprietary standards with the objective of extending its monopoly.
The only real solution here is for the US justice system to break up this monopoly into at least three groups and enact laws that prevent the three from acting as one giant monopoly.
The race isn't always to the swift... but that's the way to bet!
Get over it!
Here you go (108KB)
The reason it doesn't work in Moz is a known bug (number 3247).
I am TheRaven on Soylent News
Let me just put on my Joe User cap here for a sec. I have a rough understanding of what XML is and what it's for (despite the considerable noise). The problem is that what XML is has nothing to do with how users perceive their documents.
Think about that for a second. You can talk about concocting your own schemas, and what the standard is supposed to do, all night. Joe User wants his fucking document to look like how he typed it. He wants to be able to send it to others and have them read it. That's it. If the format (or whatever you want to call the XML implementation) doesn't do that with anything but another copy of (the latest) Word, it is not useless, but very very close.
It's amazing to think that we still have these problems sharing simple documents. It should work like email, by now. We probably have MS to blame for this situation.
So, RTF. Whenever someone sends me a Word doc - even though I can easily strip out the crap with OS X TextEdit - I usually politely ask them to re-send as an RTF. When I tell them it's 'just another choice in the little pull-down menu', they are usually happy to do so. I tell them that RTFs will get read on Macs and older PCs more easily, and not lose any formatting (in almost all cases). People understand this and are willing to comply. Only MS fanboys have anything remotely resembling a problem with this.
If Jesus wants me it knows where to find me.
Well, you typically surround it with some markup.
For example, <emphasis> is an element in DocBook used for doing just that. You surround the content in this tag.
Then your stylesheet defines how the content within that element is rendered. The idea, though, is that you are not tied to that content being bold no matter what. You can change your stylesheet to make that italic if you wish, without having to modify every instance in every document.
Or, you can have many different stylesheets that behave differently depending on your need. Having text as bold is not really the best example of this.
A better example is with web/print fonts: general consensus is that serif'ed fonts are best for print, non-serif for web. Now say you want to publish the same content as both printed material and on your website. You can do this by pushing the XML through two different stylesheets.
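A rough sketch of that idea in Python, for anyone who wants to see it run. This is not XSLT and not DocBook tooling; the two "stylesheets" are just hypothetical dictionaries, and apart from <emphasis> every element name and string here is made up for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical DocBook-like fragment; only <emphasis> comes from the comment above.
SOURCE = "<para>Serif for <emphasis>print</emphasis>, sans-serif for the web.</para>"

# Two hypothetical "stylesheets": each maps an element name to the
# (opening, closing) markup used for one output medium.
WEB_STYLE = {
    "para": ('<p style="font-family: sans-serif">', "</p>"),
    "emphasis": ("<b>", "</b>"),
}
PRINT_STYLE = {
    "para": ('<p style="font-family: serif">', "</p>"),
    "emphasis": ("<i>", "</i>"),
}

def render(element, style):
    """Walk the element tree and wrap each element per the chosen style."""
    open_tag, close_tag = style[element.tag]
    parts = [open_tag, escape(element.text or "")]
    for child in element:
        parts.append(render(child, style))
        parts.append(escape(child.tail or ""))
    parts.append(close_tag)
    return "".join(parts)

doc = ET.fromstring(SOURCE)
web_html = render(doc, WEB_STYLE)      # emphasis comes out bold, sans-serif body
print_html = render(doc, PRINT_STYLE)  # emphasis comes out italic, serif body
```

The content string is parsed once and never edited; only the dictionary passed to render() changes between the two outputs.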
There is much cruelty in the universe, John.
Yeah, we seem to have the tour map.
Did I Make IT!
Sometimes you need your karma humbled
If something is so important that you feel the need to post it on the internet... It probably isn't that important.
I've yet to see this kind of ignorance from this group, but what the hell would you call (X)HTML?
Well, the key is you have to be ready to do it a different Right Way.
No, I can't submit my final paper as an HTML doc. But building it in HTML makes it much easier to send to my advisor or my editors. I do footnoting with hyperlinks, TOC with hyperlinks, and citations inline with hyperlinks to the citations page.
Really, if I felt like nerding it up a bit, I'd make a quick wiki and include a commenting engine, automatic citing, automatic TOC and a search engine. But this is linguistics. It's hard enough to get the prof to use a computer instead of post-it notes in my mailbox.
I'm doing it in HTML for ubiquity's sake, and because it means I can work anywhere. I do a lot of my best writing locked in a small room at the university with a VT terminal. No distractions. No need for concerning myself with layout. Content is king, nobody's impressed by 14 point Comic Sans anymore. When I was an undergrad in 2000, I used to print out my rhet essays on greenbar. I'd type them in Pico and then send them to the engineering printers, thus avoiding the thirty minute line for the laser printer. There's something so satisfying about a freshly daisywheeled essay.
Hey freaks: now you're ju
XML is no place for presentation markup. That should be done with XSL.
I think the root of the confusion goes back to Goldfarb's original theory for SGML: that the styles in a document are secondary to the structures, and should be kept separate.
This has been a religious conviction ever since, despite the fact that most authors are messy and intuitive, and SGML-etc are very, very rigid and unintuitive. The rationalisation is that messy authors can just represent their styles using 'fake' (ad hoc) XML, but if this turns out to be 90% of the real users of MS Office, then I think MS could indeed save valid XML, but it won't be portable in any useful sense.
Most agencies use Office because they think their customers use all of its special features, when they don't. MS's use of XML, as they intend it, is really rather an MS way of doing things, just like in IE. The way OpenOffice handles XML is quite cool: if you unzip a document, it's organized into several files, including ones for styles and content. Anyway, what was my point? Oh yeah, MS sucks and open source rules!
Second, the slashdot story says this is from "InternetWorld" yet the link is to a site called "InternetNews". They appear to NOT have the same publisher/owner. Is this a typo?
I also found an article (much better) on InfoWorld which contains more details, and is by someone who actually used the product.
Third, and I missed this before, check out this quote from the InternetNews article:
"Looks like" "Reports are". Did this guy EVEN USE THE PRODUCT?
Christ, we should be allowed to moderate the sources of articles. This is a pretty lousy article, and if it's indicative of InternetNews, I'd rather not hear any more stories from them.
Fourthly, I notice on my GoogleNews page that the google autogenerated this InternetNews story as the top link. this is no doubt why we have to deal with it on slashdot.
-Malakai
A Dragon Lives in my Garage
microsoft file formats are closed!!! my god!
Those cultures/nations that don't think money is everything usually think that way because they don't have much of it. The materialistic way of life is not limited to the USofA. As globalization spreads economic development around the world the "holistic" ways of life shall fade into nothingness slowly but surely.....replaced by greed and consumerism. Just today I read an article in the WSJ about how obesity is spreading in China due to McDonalds. :)
Lastly, I am not rich. I'm actually very poor. And I don't want to hear anything about how the poor in the US/W. Europe are still richer compared to third world nations. It's irrelevant to my personal experience.
Mac OS X and Windows XP working side by side to fight back the night.
Most of us in the advertising and marketing fields are not dependent on just Windows products. The lion's share of the advertising market is Mac based and cannot generally be assumed to be sharing anything via XP servers. Moves like this will put a crimp in collaborative projects in this market segment.
More than likely this will create bad backlash and M$ will have to do some back pedalling to correct the mistake. It is also the type of move that will discourage big bunches of people from bothering to upgrade.
Some people don't learn unless you hit them with a brick.
Isn't Office 2k3 supposed to support DRM encryption? If so, then this would make the file format useless, since it will be encrypted.
From a pure business standpoint (not technical), a closed proprietary file format is essential if you want consumer lock-in to keep prices sky high. If a competitor can write software that can read your files and format them properly, then you lose your file format monopoly and have to compete with everyone else.
http://saveie6.com/
the article says it all.
to quote:
However, Mark McWilliams, a software engineer and Office 2003 beta tester, said he has seen nothing to indicate that Office 2003 removes formatting information from files saved in .xml. He noted that he opened a heavily formatted .doc Microsoft Word file, saved the file as XML, and later opened the file in Word 2003.
"The opened XML document looks exactly like the original .doc file," he said. "And if I open up the XML file in a text editor, I can see that all of the formatting is properly maintained in the XML file."
He also noted that when saving a file, a user has the option of saving in a "data only" XML format which does remove formatting.
Why don't people just read the article before posting about how terrible this is.
If you look past the first couple of paragraphs, you can see that formatting is basically kept, except when graphics are used (precisely where would the graphics fit in an XML file?), and the feature that this guy (and most people) seem to be bitching about is the option which allows data-only. Surely this is a good feature, as we can have formatting stuff which is interoperable with other MS Words, and an option which is just data!
Given the style.xml and the content.xml, it should be relatively straight forward to generate xslt's to spew the content into whatever format one wants (xhtml, rtf, plain text). Microsoft could slap together a viewer application that could do this in short order or someone could take it on as a skunkworks project to do in their spare time.
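For the curious, here is a minimal sketch of the "spew the content into plain text" case. It is Python rather than the XSLT suggested above, it ignores the styles file entirely, and it assumes (my assumption, not something checked against a real document) that paragraphs are elements whose local name is "p" inside content.xml. The sample package, its namespaces, and the function names are all invented so the snippet runs on its own.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

def local_name(tag):
    """Strip an ElementTree '{namespace}' prefix, leaving the bare element name."""
    return tag.rsplit("}", 1)[-1]

def paragraphs_from_package(package):
    """Return the text of each paragraph found in content.xml.

    `package` is a path or file-like object for a zip archive that holds
    a content.xml member (assumed layout, see the note above).
    """
    with zipfile.ZipFile(package) as zf:
        root = ET.fromstring(zf.read("content.xml"))
    return [
        "".join(el.itertext())
        for el in root.iter()
        if local_name(el.tag) == "p"
    ]

def build_sample_package():
    """Build a tiny stand-in package in memory (placeholder namespaces)."""
    content = (
        '<office:document-content '
        'xmlns:office="urn:example:office" xmlns:text="urn:example:text">'
        "<office:body>"
        "<text:p>First paragraph.</text:p>"
        "<text:p>Second, with a <text:span>styled</text:span> word.</text:p>"
        "</office:body></office:document-content>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("content.xml", content)
    buf.seek(0)
    return buf

paragraphs = paragraphs_from_package(build_sample_package())
plain_text = "\n".join(paragraphs)
```

Getting xhtml or rtf out instead would mean also reading the styles file and mapping style names to output markup, which this sketch does not attempt.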
this is what Microsloth means by "inovation"
---- "Logoff! That cookie shit makes me nervous!" - A. Soprano
*Applause*
Without going into the evils of Microsoft and its Office products, or how there are better OSes and products out there on the market, I'd like to ask: WHY???
Office 95 to 97 was a substantial jump. 97 to 2000 was a fairly substantial jump: stability, document abilities, general ease of use. Most people were happy with 2000. Stable, if large. 2000 to XP: smaller install, activation/registration nightmare, some interface changes, but otherwise the application is the same. How documents are saved, their base format, has been changed, yet to the end user this should be transparent.
XP to 2003: what are the major differences? I mean... yes, it's going to be new, in a new year, but why would Joe Schmoe, Enterprise User (or home user for that matter), want to shell out a couple hundred dollars per license when the increase in functionality will be limited? Increased document collaboration would be good, yes, but is it truly worth the cost? How many users don't KNOW how to use the advanced features?
I work as a sysadmin at a plastics factory, and the majority of the users barely know how to use a keyboard. I've worked in an insurance company, where I had to teach the correlation between moving the mouse and the pointer on the screen moving. I've done the dot-com thing, with users wanting more but not using it properly. What are the odds that an entire company would be utilizing the software to its fullest potential? And what percentage of a company would actually get an advantage out of using these features, compared to the time required to train an entire office? Half of it would backfire if some users didn't understand the base concepts, as most don't. Thoughts?
When all else fails, use fire.
scripsit dasmegabyte:
Well, if you can convince your university's format committee to pitch their format guide and that a ``different Right Way'' is OK, more power to you. That's not a battle I would want to fight, though.
Which is why I use LaTeX and vim for my papers. My preference these days is to use a laptop at a coffee house, personally, but to each his own.
Speaking as one who grades essays, I can tell you that's a good way to irritate your grader. I expect students to follow style instructions, and an ASCII dump to line printer is not part of the style guide. (I don't want one essay printed on different size paper than everyone else's, and I want to see italics used properly, and French and German names spelled correctly...)
In principio creauit Linus Linucem.
"I'll say it again for the logic-impaired." -- Larry Wall.
How exactly did pyramids help humanity? Seriously. Other than being a testament to man's ingenuity and a pharaoh's ego, they offer nothing but a big stone thing to look at.
Maybe the ms word XML format looks like:
<?xml version="1.0"?>
<worddoc>#($*HKJ#$#*....</worddoc>
The author of the article, and the people he's quoting, need to get a clue. There is no such critter as "XP Server", yet they're flapping their arms, running in circles screaming that people will need to use it.
From web bugs to attempts to exploit Outlook holes to embedded annoying noises to explicit porn photos to... whew, I'm out of breath already!
I go out of my way to disable HTML in email. It's an abomination, resulting in bloated messages and untoward exposure to mail-borne attacks.
If your message is truly important and meaningful to its recipient, it will not suffer from the lack of bright red 48 pt. bold Chancery Imprimateur headline text, describing the 761KB JPEG photo of your newest widget (or whatever you may be writing about). The net is not all about eye candy.
Mail? Put "slashdot" in the subject to pass the spam filters.
I'm not surprised by any means that MS would choose a proprietary standard. It seems MS over the years has made importing/exporting .DOC documents much harder, locking people into using their apps. From a business/revenue perspective, that is understandable for MS. WordPerfect, OTOH, has been much easier to convert/import, and most documents I've imported have been nearly flawless.
Perhaps on a separate note, what format would be best for composing essays and large documents in a non-corporate environment? I compose a lot of documents as a student, and I require something that I can easily format and safekeep electronically for many years. Other than POT (no, not that, but Plain Old Text), would some form of XML be better, or TeX/teTeX? It would be nice to standardize everything on one format and not have to worry many years later about not being able to retrieve it.
Seems to me there needs to be an answer to the M$ .doc format. So who wants to help me code the program to extract the terribly structured presentation data out of 'their' format and output a presentable XML version of the file. Then our various clients would have a wonderful way to ensure true cross-platform document sharing... we'd at least get the rest of the OSX users to stop paying a partial M$ tax. I need a job to prevent all this extra coding energy from thwarting the monopolization of the business desktop.
Fnord.sig
Excuse me if I don't take this article seriously, but the author apparently knows nothing about Windows. Office 2003 will only work on Windows 2000 or 2003? Not Windows XP? Maybe he meant that the collaboration servers require Windows 2000 servers or Windows Server 2003 servers, since there is no XP Server. And speaking of XP, what exactly does he mean by "connecting over XP servers"? That's simply impossible -- there is no server version of XP, only Home and Pro.
As for Microsoft not supporting Office on the obsolete Win9x platforms, good for them. It's past time for Win9x to be killed off once and for all. Not supporting it in Office is a good step forward.
The review presented is only the first reviewer. The second reviewer disputed his claims indicating that indeed all formatting is shown in XML.
Typical Slashdot rhetoric, peeps spouting off just based on the article snippet (which is often spurious) without reading the article
RTFA
XSL is a very special case of XML file where the content is presentation information for other documents, not just for that particular XSL document. Thus it doesn't really violate the idea of content/presentation separation.
Your rant is spot-on. I have stated the same thing many times in the past (a couple of times on /.).
However, the statement, "If Microsoft used a standard XML format for their documents then anyone could read them," makes sense if you s/format/schema/ . There *is* a proposed standard XML schema for word processing documents.
The article is sound, with only a few semantic errors (confusing XML format with XML schema, for instance).
Now, if people would just start using the jargon properly....
Microsoft is to software what Budweiser is to beer.
The post you replied to was exactly correct. Your rant is a tangent (imagine that, on Slashdot!). Yes, it might be nice to keep format and content separate, but that is not the point, here. And I don't believe it was the point the original poster was trying to get across.
not having formatting information at all!
As I understand it, when you save into an XML format from 2003, you lose all formatting. Not that it is put somewhere else, but that nice indented italicized quote you put in the middle of your paper is going to end up looking like everything else.
hmm, looks pretty similar to mine
Besides the font, which is probably just because i have it set that way...?
//FIXME: Bad
The point is to separate them *in the same file*. This is not the same as deleting the presentation information!
You must be new around here. Being condescending and inflammatory when you correct people is the /. way.
They that can give up essential liberty to obtain a little temporary safety deserve neither safety nor liberty.
Ben
That Illustrator and Photoshop can open PDF files is basically just an afterthought. Maybe if the native Photoshop file format (PSD) was open you'd have a point. Anyway...
If Microsoft followed a similar model, I'm sure that Microsoft Word will continue to be the industry standard in word processing software, and Microsoft as a business won't be any less richer for it.
Photoshop's competitors - from Fireworks and Freehand to Corel Draw to all the little graphics apps that you can pick up for fifty bucks or that come with scanners/cameras - have a far greater market share than Word's competitors. There is absolutely no incentive, from Microsoft's point of view, to risk giving them more.
that Michael update the article description with a very well worded retraction of the stupidity that is the current /. article description.
This single article description makes /. look stupider than almost anything I can remember in the last several months, daily duplicates included! And people complain about Microsoft FUD?
Contrary to popular belief, coding is not all free blow-jobs and beer. Those things cost MONEY!
"How the hell can I cite where in a book I got a quotation if there is no standard pagination? Count paragraphs?"
How the hell did people do it before computers? What do you do when you're citing an offline book? How does XML magically handle that?
Look more closely. Compare the text of the headings. Notice anything?
I am TheRaven on Soylent News
Looks like the article might be a little bit shortsighted. If this post is true, then maybe it would be fairly easy to write readers/writers for the O2+K format. I find it hard to believe that MS, given its reputation, would implement this in the final release, though, but people change.
Good people do not need laws to tell them to act responsibly, while bad people will find a way around the laws-Plato
Is it possible to write an add-in for Office that will save your document in a standard XML format?
http://www.askthevoid.com
Unfortunately, particularly in the MS-dominated part of the consultancy marketplace, there is a premium placed on having, using, and understanding the latest, greatest stuff.
IOW - by not jumping on the MS upgrade bandwagon, you automatically undermine your credibility with a large number of (potential) clients. MS has worked the marketplace so this is true. (e.g., certification upgrades, software licensing policies.)
Of course, there are plenty of clients who don't buy into the MS treadmill approach to business computing. It's just that if you dig in your heels too much against the upgrade brigade, you've limited yourself. Great for growing spine, but not necessarily the best strategy for growing a paycheck.
(Yes, I will have more Kool-Aid, please!)
Agreed.
Read the freakin' article. Fifth paragraph:
However, Mark McWilliams, a software engineer and Office 2003 beta tester, said he has seen nothing to indicate that Office 2003 removes formatting information from files saved in .xml. He noted that he opened a heavily formatted .doc Microsoft Word file, saved the file as XML, and later opened the file in Word 2003.
Sheesh.
This may be bad (keeping in mind the jury is still out on exactly how Microsoft is making this work) because in the case of office documents, the style is actually *part* of the content, from the perspective of Joe Office User.
So let me get this straight: Is it better to ignore standards to make things easier for Joe User (as IE is vilified for doing with accepting broken HTML) or is it better to follow standards and break what Joe User wants (as you are vilifying Office for doing)?
Mmmm.. Donuts
"What if the word first needed to be bold?"
That is more data. If the Xml contained that additional data, it might be formed like this:
my title
This is the first paragraph
another
Note that the XML does nothing other than specify that the word 'first' has the characteristic 'bold'. It could also have the characteristic 'IHATEPEOPLE' if that was contained in the XML data. The formatting stylesheet (or whatever program you are using to display the XML data) determines how to make the characteristics appear appropriately on your screen.
If the text displaying program knew that the text in between the tags was actually to be displayed as bold text onscreen, then it would make it so. If data with the tag was to be displayed in green in a 10 point font, then it would display any such words that way.
The program decides what to do with the data; the data doesn't decide what to do with the data.
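To make that concrete, here is a small Python sketch. The tags in the sample above did not survive posting, so the element names below (doc, title, para, bold, IHATEPEOPLE) are my own guesses for illustration, not the original markup. Two hypothetical "programs" read the same data and each decides for itself what a characteristic means.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the data only names characteristics, it says
# nothing about how they should look.
DATA = (
    "<doc><title>my title</title>"
    "<para>This is the <bold>first</bold> paragraph</para>"
    "<para><IHATEPEOPLE>another</IHATEPEOPLE></para></doc>"
)

def render_inline(element, markers):
    """Render one element; characteristics the program doesn't know pass through unmarked."""
    before, after = markers.get(element.tag, ("", ""))
    text = element.text or ""
    for child in element:
        text += render_inline(child, markers) + (child.tail or "")
    return before + text + after

def render_document(xml_text, markers):
    """Return one rendered string per top-level block in the document."""
    root = ET.fromstring(xml_text)
    return [render_inline(block, markers) for block in root]

# Program A knows 'bold' and shows it with asterisks.
ASTERISK_MARKERS = {"bold": ("*", "*")}
# Program B ignores 'bold' but has its own idea about 'IHATEPEOPLE'.
SHOUTING_MARKERS = {"IHATEPEOPLE": (">>", "<<")}

view_a = render_document(DATA, ASTERISK_MARKERS)
view_b = render_document(DATA, SHOUTING_MARKERS)
```

Same bytes in, two different displays out; nothing in DATA had to change.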
Exactly. There are any number of ways you can do this. For example, placing a stylesheet in the top of the document, or placing style attributes on elements. This is exactly what you do with CSS and HTML. If you aren't familiar with this pair of technologies, I suggest you learn. It makes life easier.
How do you figure this is anti-trust? Microsoft has been judged a monopolist. Since past behavior is a good indicator of future behavior, there is a presumption that this is anti-competitive behavior until proven otherwise.
This is simply a company who has the dominant product protecting their lead.
For a monopolist, nothing is simply any more. In the absence of market forces to correct misbehavior, exactly how they attempt to protect their lead does matter.
And quite honestly, I dont see anything wrong with that, as long as they confine their practices to their product (ie. they arent making Office the only suite that can run on windows) [emphasis added]
As long as nothing in the Office Suite promotes the Desktop OS monopoly.
As long as nothing in the Desktop OS monopoly promotes their own Office Suite.
But this isnt a game, this is business.
And screwing your customers is bad business.
And screwing your suppliers is bad business.
And screwing your investors is bad business.
And screwing your employees is bad business.
Even screwing your competitors is bad business.
And since businesses are SUPPOSED to make money, they need to make sure people continue to buy MS Office.
And General Motors needs to make sure people continue to buy Chevrolets.
And making an office suite that shares documents with all the various third-tier office suites just doesnt do that.
It just makes incomprehensible gibberish unless the recipient happens to have the exact same sooper-dooper magic decoder ring. Unless I can read my stuff, under circumstances of my own choosing, I have a problem. Unless I can send stuff to my correspondents and they can read it under circumstances of their own choosing, I have a problem. If my documents are hostage to the whims of a supplier, I have a problem.
Why should my company buy MS Office if the documents it produces are exactly the same as those of FreeBeerOffice?
New twist on Clippy?
No reason they should. That's Microsoft's problem, not yours or your company's (unless you work for Microsoft;)
The Word editor itself has some features that nobody else has, and has a huge (much larger than Windows, apparently) body of developers, researchers, and testers behind it. It actually is a pretty useful and good product and it cannot be replicated by any smaller startup or open-source effort.
In truth Microsoft can compete fairly on "innovation" and "features" and their Word market would be as big or larger than it is now. They would also get rid of some of the hostility directed toward them by everybody else in the industry; this hostility is probably hurting their sales more than any other competition is. It is also possible their engineers would produce better work; IMHO the evil company behavior is certainly affecting the desire of any engineer with a conscience to do quality work.
Microsoft should stop acting like asses and start to show a cooperative face. They could change the entire attitude of the computer industry toward them, and still end up on the top of the heap or even more powerful than they are now.
Well.. I don't personally program.. but that doesn't stop me from thinking about this..
Wouldn't it be apparent that the things Microsoft does to try to keep its market staying with them will eventually be the things that drive them away?
Microsoft implements so much stuff in their products to prevent "openness"; aren't these things just slowing down the programs? If we didn't have 2GHz+ machines, would Microsoft still be developing algorithms that take up precious resources for the sake of keeping their software proprietary?
If Microsoft developed Office so that it was compatible in exporting and importing in every way, I think even more people in the open source community would be interested.. Wouldn't they?
- what is the definition of simultanagnosia?! I've been meaning to look it up!
I am seriously pissed, and it just so happens that this article is relevant to my frustration. I am presently at my university's state-of-the-art computer lab, which runs Windows XP and Office XP on brand new P4 Dells. I was starting to enjoy some of the admittedly advanced features of PowerPoint XP when all of a sudden my computer CRASHED. I have lost my entire midterm PowerPoint speech, which was saved on a floppy disk.
For all of you who are still locked into the myth that you somehow need Microsoft Office products and standards, I seriously suggest you take heed of what I have experienced today. I for one will start using MagicPoint, LaTeX, and HTML in the future for presentations, and will continue to use AbiWord and Gnumeric as my office suite (still waiting for that OpenOffice woody backport). I deeply regret even considering using PowerPoint.
Also, where are my intellectual property rights? My intellectual property was just DESTROYED because a mega-corporation which holds the world's most illegitimate monopoly has refused to adopt standards which allow the free flow of information and ideas. Where is my army of lawyers and my monetary compensation?
I will say here that the Microsoft corporation should be thoroughly dismantled through public institutions such as our federal Supreme Court. There is no legitimacy whatsoever to the existence of Microsoft.
We can fight this monster of proprietary standards if we all just refuse to use these standards and explain to others why we do so. I suggest creating flyers and writing to your school, explaining to them why Microsoft products should not be used in any academic environment. Hail GNU, free software, and those who make it happen!
]>
Crappydoc
William H. Gates III
BORG
Unimatrix 0
Secondary information processing adjunct
Doc about crappy M$ things.
Haha, you cant parse this, it's BINARY! You're still screwed!
firoiorfioeiojvonvonviniooiwnconcooisoi39f940f943
You couldn't be more wrong. Although I can do anything I want within the law to maintain or increase my market position, once I have a monopoly I cannot legally do some of those same things any more.
For example, if I am Red Hat, it is perfectly legal for me to make an agreement with a hardware manufacturer to charge them based on how many PCs they sell, whether or not they have Red Hat installed. If I am Microsoft, and have an OS monopoly, this very same business practice is ILLEGAL.
Ummm, good HTML is not necessarily valid XML.
<p><b>Hello<br>It's me again</b>
Is perfectly valid HTML, yet not valid XML. But that's beside the point. The point is, you're an idiot, mmmmmkay?
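Easy enough to check with any XML parser. A quick Python sketch (strictly speaking this tests well-formedness, not validity against a DTD; the function name and the XHTML-style rewrite of the fragment are mine):

```python
import xml.etree.ElementTree as ET

def is_well_formed_xml(markup):
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True

# The fragment from the comment above: fine as HTML, where <br> needs no
# closing tag and </p> may be omitted.
html_fragment = "<p><b>Hello<br>It's me again</b>"

# An XHTML-style rewrite: every element is explicitly closed.
xhtml_fragment = "<p><b>Hello<br/>It's me again</b></p>"

html_ok = is_well_formed_xml(html_fragment)
xhtml_ok = is_well_formed_xml(xhtml_fragment)
```

The parser chokes on the first string at the </b>, because as far as XML is concerned the <br> is still open.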
Why doesn't someone save an xml document with Word2K3 and then post the xml (or a link to it)? Then we could see how much formatting is in there.
scripsit realdpk:
Um, you cite the page number in every style I know of.
The point was that HTML provides no way to ensure that two people reading your work have the same pagination. (And I'm not saying HTML sucks, it's just not what HTML was meant to do.) I didn't say ``use XML vice HTML.'' I was responding to someone who said (roughly) ``we don't need any other kind of XML for document preparation, because we already have (X)HTML.''
In principio creauit Linus Linucem.
I agree.
First Microsoft built HTML export features into Word, and everyone complained that all the formatting and presentation cruft was retained and made the documents overly bloated and sloppy.
Now Microsoft builds XML export into Word, and everyone complains that the formatting and presentation is missing!
WTF?
Office 11 is going to have a proprietary file format (M$ will undoubtedly call this a "security feature"). Since the DMCA makes the circumvention of any "security feature" a felony, it will soon become a felony to access documents that you created in Word 11(?) or any other "Office Suite" product with any program but the M$ program with which they were created. This sounds a lot like anti-competitive behavior to me. Moreover, do you think for a minute that the "update" feature in Windows won't rat out the fact that you accessed a file created with M$ Office without using the appropriate M$ software? If you do, we need to talk about a bridge I have for sale. M$ has worn out their welcome in my shop - they probably should have in yours. BTW - I'm posting this A/C as my licensing agreement with M$ forbids me from disparaging M$. Tough, I did it anyway.
Grow some spine, explain things to them, and you'll be surprised about how many of them get it... And yes, I'm doing rather well even with principles.
So your principles involve pulling clients into your own holy war and immersing them in technical details that don't directly drive their business?
If I was a customer of a business who demanded that I start changing my technical decisions in order to communicate with them, I wouldn't remain their customer long.
Yeh, and also, you might not just set it loose, but you might lose all your document formatting, too.
That the OpenOffice representative on OASIS clearly has not a clue what XML's purpose is?
That's because the HTML export was meant for the web, so bloat matters. An XML document format is meant for easy readability, so bloat doesn't matter quite as much. This is an extremely obvious distinction.
The point is to separate them *in the same file*. This is not the same as deleting the presentation information!
Then you're not talking about XML. The point is not to have presentation information in the file at all--that is supplied by an *external* stylesheet. Your DTD or Schema, which may be part of the document, ARE NOT about presentation. They are about the well-formedness and validity of your tree structure. And that's XML: your XML tagging and your DTD/Schema.
Pure XML is completely content-driven, with no presentation markup. (Go take a look at the XML format of a document on the W3's web site.)
There are cases where conceptually content and presentation are intermingled, and you have to explicitly take those cases into account in your DTD/Schema. For example, bolding a word is both about presentation AND content context. So, an XML element needs to be defined, say [EM_ELEMENT] which your stylesheet recognizes and handles properly when translating your XML document to a printable format (whether that be a word doc, pdf, or whatever bizarre format your printer requires.)
This is supposed to be funny, not a troll. Mod this guy up.
..
OK for dumass moderators, here is the humor
"requiring Windows 2000 or 2003"
---> Will run only on win2k or win2003 (And not winXP). That my dear friend is FUNNY
Sorry, I have nothing to say. I just had to laugh at someone taking the EU seriously.
I'm not fond of Microsoft (indeed, some people consider me a 'basher'), but I don't think the majority of Slashdot's reaction has been rational in this case. For instance, some quick googling brought up the following comment by Tim Bray; he is one of the authors of XML, by the way. ;-)
I think it would be interesting to take a look at an example WordML document. Unfortunately, I don't have - nor do I plan to get - the beta for Office 2003. Would anybody like to post an example document of what Word 2003 produces?
It is impossible to enjoy idling thoroughly unless one has plenty of work to do.
- Jerome Klapka Jerome
You would think CmdrTaco could get first post :)
"Furthermore, Office's new collaboration featres will only work with users who are also running Office 2003 (requiring Windows 2000 or 2003) that are connecting over XP servers."
Did anybody actually try to read and understand this sentence? Because if you read it, this guy refers to Windows 2003 as a workstation and totally leaves out XP in the process, in addition to referring to XP as a server.
I don't know if any of you caught on to that, but it takes all credibility away from this guy in my eyes when he doesn't even understand the basics of the technology he is bitching about.
For Office '03 docs, it may turn out that this is so. However, a lot of the push - and controversy - behind .NET was its XML storage and messaging facilities for web apps. If you believe that was going to be completely non-proprietary, I could probably sell you some prime tundra up here in Canada.
"Embrace and Extend" is such a wonderfully misleading concept. On the surface, it appears that Microsoft is a big proponent of open standards. They're members of some of the more influential standards boards, and have actually deigned to submit the C# spec to the ECMA.
However, the pivotal falsity of the whole enterprise is that underneath a thin veneer, every Microsoft app is locked tightly by virtue of legality and obfuscated design.
Possibly it's a tension between management (who want to protect sales) and the younger stafflings (who want to work with the Next Best Thing). I prefer to imagine a small cadre of techno-literates whose job is to peruse worldwide resources, looking for something up and coming to latch onto and connect Windows with in a superficial, candyass way.
I believe it's these people who are also responsible for product naming:
DirectX
ActiveX
XBox
Windows XP
Wtf?
But I digress. The point is that every product produced by Microsoft has exactly the same nuts and bolts of half a dozen competitors, but these constituent parts are used in completely non-compatible ways. So while Microsoft can claim that they are "Standards Based" or whatever, their products will never gain the benefit of following standards -- interoperability.
Agree in principle for the most part, but ...
What if you are designing a newsletter or something, customized to a specific layout and paper size?
Shouldn't there be some kind of formatting in the XML there? (Assuming you don't want to lock yourself into one word processor, which is the whole point of this)
Have you ever put an html table in style sheet, and the rest of data in html?
Like:
data:
you
dumb
ass
stupid
brain
of you
Presentation: (in xsl)
I guess that's very separated.
Try to display that.
Essentially, it is the same as just saving as plain text which has already been available since Office 95.
It's been available just a little bit longer than that, sonny.
That means you would find nothing wrong if Bill Gates would hire hitmen to kill off some people that would "threaten his lead", right?
In a democracy (no, I am certainly not talking about the USA) the state is passing laws against anticompetitive behaviour to protect customers and competitors from artificially created barriers of entry and other unfair practices.
Example:
In the EU, printer makers will soon (I don't know the exact date, could be 2005 or 2006) have to stop building anti-copy chips into refill cartridges.
This is good for customers because it brings down prices for refill cartridges, is good for the environment because people will buy less printers and more cartridges and is good for 3rd party competition.
(In the USA, anti-copy chips are not only allowed, the DMCA forbids 3rd parties from creating a market.)
Anyway, returning to Microsoft: Microsoft of course should be allowed to do anything to improve their products, but encrypted formats don't improve their products; they just decrease interoperability, which is bad for everybody: the competition, the competition's customers and MS's customers.
If there is no law against that, well there should be.
I took the trouble to go to Internet World and read the ENTIRE article. The portion of the article quoted on /. clearly implies the information being received about MS is second hand. "REPORTS ARE [emphasis added] that when saving to XML, [Office 2003] strips out the presentation and formatting information...". The person quoted is a representative of the OASIS OpenOffice XML Format Technical Committee, so there is a definite risk of bias, particularly when coupled with secondhand information.
The article goes on to quote someone who actually is an Office 2003 beta-tester. He claims that saving in an XML format does not, in fact, strip out the formatting, and states the tests he ran to confirm this.
The source of confusion may be in different XML formats supported by Office 2003. There are two, one of which strips out all of the formatting information, while the other does not. A lively debate then ensues between the pros and cons of both approaches.
Open file formats mean that a geek living in the basement, or Mom and Pop Software Solutions, can write programs that simply use the fully documented, existing file formats. Effectively it means that a software product wouldn't be paying for the actual _design_ of the product, but only the implementation of it. At least part of the product market price should pay for the design/research/infrastructure invested in it, shouldn't it?
Most products cannot be so easily duplicated because the infrastructure is prohibitively expensive, and/or it isn't worth the effort. In some cases, you can make sure you have an exclusive right to make the product, with patents. Since we know infrastructure in software development costs next to nothing (since most developers own a computer anyway) and we need file formats to be open AND without patents, how does a company that wants to make money on software products survive in an Ideal World?
[quote]
Apparently, all formatting and presentation information is removed from the XML
[/quote]
So that would be a bit like XHTML and CSS then wouldn't it where the formatting and presentation information is removed from the XHTML.
Umm... GOOD.
I don't want the presentation information mixed in with the content. As a developer I'd be complaining if they did that.
I'm not the AC you respond to, but I think he meant somebody who always gives up before even trying.
OfficeXP has already been on the market for 2 years and has only captured about a quarter of the market; most people still use Office97 and Office2K. After a release, adoption slows down, so it will take much longer than 2 years to reach even more than half.
This gives us plenty of time. OpenOffice is available, and unlike Office2003 it runs on Win9x, Linux, MacOSX and Solaris and is free. There are 200 million Win9x users out there who will happily upgrade to OO when somebody gets off his ass and tells them.
If you or the OP had bothered to read the article, you'd have understood that what microsoft is doing is removing all the formatting information, in the sense of "this is a normal paragraph, this is a numbered list, this is emphasized" and so on.
So let me see. You are agreeing with me that they are removing formatting information from the document, and yet you call me a clueless moron. My question is, what do you call someone who agrees with the moron?
When I say XML format, I mean exactly that - a document encoded in good and proper XML. Of course you need a DTD - I figured that most people here would have figured that out. I don't care how complete your DTD is, it still can't infer that the following paragraph is centered with 12 point bold font, because there is no manner to do so.
So my point continues to stand - the proposed format has no manner to save text formatting, people want text formatting, people won't save with the document format because it doesn't save their text formatting.
It's like selling two car models:
One model can be repaired with parts from any other auto maker, the specifications are open, and the entire design is on the front seat when you buy it.
The other car isn't as open, but it does feature seatbelts, windows, a radio and comes with 4 tires. Which one do you think people will buy?
Do you have Linux and a DotPal? Click here now!
And, of course, sometimes your content IS the presentation. This sometimes slips by the SGML people, but one of the reasons people spend time doing formatting at all is because they want something to look a specific way. You wouldn't want the mona lisa to be reduced to/> </painting>
It's called "embrace and extend" and in case you haven't heard, it's the basic microsoft strategy. Latest victims have been DNS and Kerberos. :-)
Welcome to real world, XML-fans. How did it feel to have your standard b*ttfscked?
Proprietary file formats are a thing of the past. They used to be a necessary evil, but now they're just evil. If people send you a document you can't read it is THEIR FAULT for not knowing what formats you require. It's a kind of bigotry, and we all have to show a little backbone when it counts.
--- Nothing clever here: move along now...
I think your theory is mostly complete. But here's an idea...
If we consider that many web browsers are turning into generic XML viewers, and if we also consider the idea of editor-enabling a web browser, then if MS include *all* the document information in one easy to process file/package, then there is a very large risk that people will in the future be able to edit their Word 2003 documents in (say) Mozilla!
So in one swoop, that makes MS Word replaceable, upsetting MS Office sales, and also makes it easier for MS customers to switch platform, also risking Windows sales.
This theory isn't too far fetched - just think about how much you'd need to modify Mozilla to allow it to edit (an un-zipped) OpenOffice document!!
Personally I'm just wetting myself (no not literally.... oh no, shit I have) waiting for Moz/Gecko to allow full editing just for XML documents -- anyone who hasn't anticipated this, please please think about the implications, because it does open up a whole bunch of applications (none of which I can think of right now).
Oh, for added thought food, consider XForms vs. MS InfoPath. In my opinion, InfoPath is designed to head off any functionality that might emerge from XForms.
All the best,
P.
However my initial reading indicated that all the formatting information, including *where* it happens, is stripped. I.e. if the word BOLD is in bold, there is nothing left in the text between it and the non-bold text. It does not matter if that thing is an instruction that says "this is important" or an instruction that says "render this in MicroSoft-Ariel-Bold at 12.389 points and .1 kerning", what matters is that the instruction is there.
By "separating data from formatting" I mean that a program reading the document should be able to easily separate it, not that it should be absent. I think this is what everybody *really* wants, and only some ivory-tower users want the presentation actually in a different file.
You didn't even RTFA! It's completely bogus! You and /. can go to hell with this stupid shit.
/. should seriously close their doors over this enormous bit of FUD.
I think that news sites and news readers that don't even read their sources should be dismantled.
If we could somehow remove PrettyPicturePoint from MS Office, worldwide worker productivity would triple since everyone would focus on content.
"Prior to this evolution, the only way to effectively interact and exchange information was to standardize on a specific platform, using specific applications (including exact version synchronization), and specific file formats. Literally everyone had to agree on the same proprietary stack, top to bottom."
This is the same old load of nonsense pro-XML people come out with all the time. The fact is that only the file format had to be agreed in the "dark days" before XML came to our rescue. Now that we have XML, of course, all we have to agree on is, er, the file format. If I don't know what schema you're using then your file is of very limited use to me. So we have to agree, just like before. And, just like before, if someone doesn't want to say what their schema is (by hardwiring the understanding of it into their binary, for example) everyone else is pretty well screwed and having to break out the hex-editors.
What, other than a very inefficient file format that is difficult to read/edit, does XML get us? Apart from a single parser library (which is now a dependency, of course) I can't see any real reason to use it.
Put it this way: if MS rigorously documented the Word file format why would we care about whether it was XML or not?
And don't get me started on XHTML! "Your document does not have a doctype". It's goddamned HTML, that what "doctype" it is...
TWW
"Encyclopedia" is to "Wikipedia" what "Library" is to "Some people at a bus stop"
but can we not write a little add-on program to word/excel/powerpoint to allow File->Save As->Openoffice.sxw or File->Export->Openoffice.sxw???
I am not a programmer, so I don't know how feasible that is. I know I would download and install something like that.
even if you were able to rid the world of the "suckers" you mention above, there are still enough of us informed, educated, intelligent users who simply prefer Microsoft Office to its competitors to outnumber the users of OpenOffice, etc. Then you can add the informed, educated, intelligent users who simply don't care either way, who will normally go with the easiest, which is probably MS Office.
The truth doesn't care what I think.
people who are users like me just want a fscking file that i can open with Word, with OOo, with iWrite.. whatever... and then send it to other people. If it requires the use of pixie dust or ass cream - so long as it works, that's all anyone wants.
Religious zeal about XML content being separated doesn't MEAN SHIT to users. And it doesn't get me anywhere when the fact remains that when i send in my business proposals to the government, they want it in Word-97 .doc format. Like i can even buy fuscking Office 97....
wankers. However you want to make an open format - be our (the Joe Salesdepartment) guest... until there is something which is universal (.doc and .pdf) and editable (.doc only) we're stuck realistically with .doc... as bad as it is.
guns kill people like spoons make Rosie O'Donnell fat.
I can bet money that whoever submitted this did not even bother to look at the file formats. Office 2003 uses all internet standards. XML, UDDI, XSD, and all that. They are not encrypted, malformed or have anything done to it that would prevent a developer on another platform to use the fileformats.
Enough said.
But that's not what he was saying. Try to avoid knee-jerk reactions. What he (and most everyone else) is saying is that presentation information should be available somehow. No one's asking it to necessarily be embedded in content, just be available, either on different file, or be defined as static specification (like HTML default rendering suggestions or de facto 'standards').
Without presentation information there just is no way to know how to adequately render content for human usage. And MS certainly stores and uses such information, but apparently they just don't want to share that... since that would make the somewhat open format much much more useful for competitors.
MS should just look at how OpenOffice folks did their format. It's not perfect (far from it), but it adequately both stores presentation information and separates actual structured content.
I like paying taxes. With them I buy civilization -- Oliver Wendell Holmes
...where do I get the linux and solaris version?
All those people complaining about MS just didn't know about this little gem.
-pyrrho
When I read your sig, or the version that changes it to "slashdot" (don't know if that's you or not), I want to laugh, I get it... but every time my compulsive guardian geek pops into view on my shoulder and says, "A million is nowhere near infinity... compared to infinity, a million is no more than 1."
but anyway, more on topic... how was the Grappa?
-pyrrho
it's not limited to allowing me to display a word document of a report in open office and have it look exactly the same. I want to be able to import the xml file and have my analysis software know that a particular record is a non-current asset etc.
who cares what font was used. Interoperability should not depend on font size or colour.
It's precisely the Microsoft specific bits of a file that should be stripped out. If a display property is only available on a ms platform then the xml file should not contain it.
The big if is whether ms takes out more and leaves the xml file unusable because there is insufficient description.
What you will find is that industries and user groups will begin to define xml schema for their data. WP will be different but xml will still have a place.
you are supposed to separate the formatting/style... NOT DISCARD IT!
"Ok, I took the dirty diaper off the baby."
[looks around] "um, where's the baby"
"oh, what, I was supposed to keep the baby?"
-pyrrho
gah, i was too busy comparing font sizes that i didn't even notice the numbers not being there, heh....you're right
incidentally, opera is all that renders This page that i go to constantly correctly....mozilla/konq can either do the correct stylesheet (depending on how they identify themselves) or the correct layout...not both...
Here's a link to comments i posted about this on another discussion...
//FIXME: Bad
Mmm, Grappa. My fav...
I'd rather you do it wrong, than for me to have to do it at all.
I think it's about time someone points out that this article, or whoever those "testers" are, are full of sh*t, or have serious problems using a computer.
Saving in XML format keeps 99% of all the formatting in a .DOC document. I saved a 20-page research paper with all kinds of pictures (stretched/cropped etc) and using bullets, italics, bold text, different sizes and fonts. I re-opened the document with Word, and it looked just like its .DOC counterpart.
*FURTHERMORE*, Microsoft has even added an option called "Data Only", which will save only the Data itself in the XML file (-as the format was MADE FOR-). You can then choose to append an XSL file for the format.
MS pleases both sides, both the strict-XML-Data-Only group, as well as the maximum-openness group, and yet over 550 posts are complaining about an article with no substance. I don't love MS, but don't bash them for something they've done right.
The XML saving feature in Word is flawless and seems to be standard-compliant. Any XML reader should be able to display the document properly, under any OS.
One of the main points of this version of office is graphical tools to help you define document structure.
... the content and presentation of the XML doc are being separated. Isn't that an inherent feature of XML?
Organizations and individuals should be able to mark up the parts of their document that apply special formatting, yet the actual rules of the formatting should be separate from the document itself (the actual content). Once a standardized way of noting different "categories" of marked-up content is reached, and then a standardized method of actually specifying what the author intends with that (e.g. italicizing and bolding with a color of red), then end users can more easily set up their own rules as to how to present these parts of the document based on their own personal organizational ruleset. (E.g. I might need to ignore certain italicized parts if in a certain body segment, or perhaps if they are also specified as being a definition I can hyperlink to a dictionary... something that simply putting <i>RTFM Definition</i> will not allow except by using a complex or restrictive and hard-coded method that does away with the reasons for using XML in the first place.)
Ironic as it is, I applaud MS for going this way simply because it may offer the CHANCE of standardization compliance... if people will get over "my team vs. your team" groupthink.
It's bad because you can't pass an Excel document to another program for modification or update, and open it back up in Excel without loss of formatting (which was presumably important to you if you put it there).
And if Microsoft makes it possible to exchange "native" documents among .Net apps with full fidelity, then the XML standard "support" is just a fig leaf for vendor lock-in, analogous to overtly supporting Java while blending in proprietary extensions.
Doing Excel around "pure" XML plus something like CSS for formatting is possible, but only if you're willing to start from scratch and accept some compromises (in terms of fidelity of the conversion) that most Excel users would reject. If the world moves to open standards based document exchange, Excel will have to go the way of VisiCalc and 1-2-3. Don't count on Microsoft to volunteer.
please.
You ever create a OpenOffice or Star Office document and then use WinZip, Pkzip or UNIX unzip to unzip it? You will get exactly what you mentioned.
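If you don't have an unzip tool handy, the same check can be sketched in a few lines of Python. To keep it self-contained this builds a tiny stand-in package in memory instead of opening a real document; the member names `content.xml` and `styles.xml` and their contents are placeholders for illustration, not a dump of an actual OpenOffice file.

```python
import io
import zipfile

# Build a stand-in package in memory. The member names and contents are
# placeholders for illustration, not taken from a real document.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as pkg:
    pkg.writestr("content.xml", "<doc><p>Hello world</p></doc>")
    pkg.writestr("styles.xml", "<styles><style name='p' bold='false'/></styles>")

# Reopen it the way you would open a saved file and look inside.
buf.seek(0)
with zipfile.ZipFile(buf) as pkg:
    names = pkg.namelist()
    content = pkg.read("content.xml").decode("utf-8")

print(names)
print(content)
```

Against a real file you would pass its path to `zipfile.ZipFile` instead of the in-memory buffer.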
Azghoul (25786)
DaveAtFraud (460127)
Yeah, he's real new around here. Please excuse his inexperience.
This whole mess comes from how MS office products generate too much formatting (e.g., html in office)
Ever try to edit office document saved as html?
150k of crap html for a simple one line 'hello world' text document.
Registration Bla bla
Help fight continental drift.
Troll? Failure to be sufficiently anti-Microsoft is a violation of the /. orthodoxy?
-1 Offtopic
I would have thought it a lot more going by my email accounts. I must be getting spam meant for someone else....
If anyone out there is getting less than 40% please email I'll forward you some of my surplus.
How many of the cool wizbang features that you describe (you know, the ones that give the users N options) will make it past the bean counters? *That* is the real question.
Too bored to login. -LM
read subject
you can read, right?
I had to buy a computer to use Linux. I guess that means it isn't free either?
HOLY SHIT. That appears to be a totally STANDARDS COMPLIANT XML DOCUMENT. It even opened in Mozilla without complaint.
:) LS - Like Searching, Long Scroll, Lop Sided, nope must be "List Shit". Don't even get me started on grep ;)
I bet I could create an XSL file that transformed that bad boy to look like just about anything I wanted to. Even, dare I say it, a Word document!
Hmmm I wonder if w:p is paragraph and w:r is for row and w:t is for text, w:b means bold, w:u probably signifies the next text is underlined w:val="Single" probably means a single underline, w:i probably means the next line is italics. Hey what if w:sz had something to do with size. I could be well on my way to violating the DMCA.
Now if I could just decipher things the same way when I'm looking at UNIX commands. I would be a guru.
(yes I know g/re/p from ed)
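A rough Python sketch of that kind of guesswork, for anyone who wants to try it. Everything here is an assumption made for illustration: the snippet is hand-written rather than real Word output, the namespace URI is a placeholder, `w:r` is treated as a run of text rather than a row, and the `w:rPr` wrapper around the formatting flags is a guess at the nesting.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace for illustration; not taken from a real Word file.
NS = {"w": "urn:example:wordml"}

# Hand-written sample using the element names guessed at above.
SAMPLE = """
<w:p xmlns:w="urn:example:wordml">
  <w:r><w:t>Plain </w:t></w:r>
  <w:r><w:rPr><w:b/></w:rPr><w:t>bold</w:t></w:r>
  <w:r><w:rPr><w:i/></w:rPr><w:t> italic</w:t></w:r>
</w:p>
"""

def runs(paragraph):
    """Return (text, formatting flags) for each w:r child of a paragraph."""
    out = []
    for r in paragraph.findall("w:r", NS):
        props = r.find("w:rPr", NS)  # assumed wrapper for formatting flags
        flags = set()
        if props is not None:
            for name in ("b", "i", "u"):
                if props.find(f"w:{name}", NS) is not None:
                    flags.add(name)
        text = r.findtext("w:t", default="", namespaces=NS)
        out.append((text, flags))
    return out

para = ET.fromstring(SAMPLE.strip())
decoded = runs(para)
for text, flags in decoded:
    print(repr(text), sorted(flags))
```

If the real format nests things differently, only the two `find` paths would need to change.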
"Do not be swept up in the momentum of mediocrity." - anon
This is not true. The only thing a browser could do with XML is display the XML, in IE it would auto-format it using CSS into a nice 'tree' view of the XML. No idea what the Mozilla's do.
In order to render the _DATA_ in the XML document as you would see it in Word, would take a custom XSL transform (WordML to XHTML/CSS2). You could reference this XSL file at the top of the XML document, and IE again would execute the transform for you (don't know about mozilla).
However I'm not sure if you could even do it with full CSS2. I'm sure there are still things browsers can't do that common word editors take for granted (headers/footers come to mind, though I could think of some ways to hack around that with CSS2 and @media directives). I guess drop caps are in CSS2; I wonder about other things we simply take for granted when we format in a WYSIWYG document editor. Either way, rendering it _EXACTLY_ as you see it in Word would take a rather serious XSL transform and an excellent understanding of CSS.
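To give a feel for the simplest slice of such a transform, here is a sketch in Python. Python's standard library has no XSLT engine, so this is a hand-written tree walk standing in for a real XSL stylesheet. The sample markup, the placeholder namespace and the `w:rPr` nesting are assumptions for illustration, and it only handles bold, italic and underline, nothing like headers, footers or page layout.

```python
import xml.etree.ElementTree as ET
from html import escape

# Placeholder namespace and hand-written sample; not real Word output.
NS = {"w": "urn:example:wordml"}
SAMPLE = (
    '<w:p xmlns:w="urn:example:wordml">'
    "<w:r><w:t>Tom &amp; </w:t></w:r>"
    "<w:r><w:rPr><w:b/></w:rPr><w:t>Jerry</w:t></w:r>"
    "</w:p>"
)

# Assumed mapping from formatting flags to XHTML inline tags.
TAG_FOR = {"b": "b", "i": "i", "u": "u"}

def run_to_html(run):
    """Escape the run's text and wrap it in a tag for each flag that is set."""
    html = escape(run.findtext("w:t", default="", namespaces=NS))
    props = run.find("w:rPr", NS)
    if props is not None:
        for prop, tag in TAG_FOR.items():
            if props.find(f"w:{prop}", NS) is not None:
                html = f"<{tag}>{html}</{tag}>"
    return html

def paragraph_to_html(p):
    """Turn one paragraph element into a single XHTML <p> element."""
    return "<p>" + "".join(run_to_html(r) for r in p.findall("w:r", NS)) + "</p>"

xhtml = paragraph_to_html(ET.fromstring(SAMPLE))
print(xhtml)
```

Character formatting like this is the easy part; the page-level features mentioned above are where a browser-based rendering would get hard.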
-malakai
-Malakai
A Dragon Lives in my Garage
You seem to be "new around here" too.
1) "New around here" is the excuse used when someone complains about "normal" slashdot behavior: flames, trolls, condescending responses, knee-jerk rhetoric, etc. All of the things that make this place "interesting". It has NOTHING to do with how long someone has been a member. (Thus, my comment about you also being "new around here.")
2) If someone asks a valid, non-trivial question (i.e., one that can't be easily answered with a minute or two of research on Google), I'll generally respond with an answer if I get a chance. However, when someone voices a technical position that is wrong, they should be prepared for flamage. My rules. If you don't like them, set your preferences accordingly.
My suggestion is lighten up, have fun and don't take yourself or anything you read here too seriously. I don't.
They that can give up essential liberty to obtain a little temporary safety deserve neither safety nor liberty.
Ben
Unfortunately, Manny Manager and Sarah Secretary are now very used to depending on the formatting and presentation information.
My example may not be exactly what you mean, but it does demonstrate how many MSWord users approach documents. Just today I had to add some information to a document to update it for a new project. It was in MSWord. No styles were used. None at all. Every paragraph was individually formatted. From the title to the signature, to even table content, every paragraph had the style "body". Bullet lists were created by individually indenting paragraphs, changing their spacings, and inserting wingding bullets.
This made it extremely painful to add in the new stuff, especially since I don't know MSWord very well. I pulled up FrameMaker, opened the document as plain text, and applied a few styles. Twenty minutes later I was done, and with a document that looked twenty times more professional.
Now take that FrameMaker document and export it as XML. What do you get? Every bit of your original information. It's fully structured. It is fully usable given a DTD. Now export the MSWord document to XML. What would you get? Garbage that doesn't mean much. Even if you managed to export all the formatting information, it will only make sense after importing it back into MSWord.
A Government Is a Body of People, Usually Notably Ungoverned
Uh, no, it's not. Hence the creation of XHTML.
Just did it... That is TOO cool. :)
FunkyChild (99051)
DaveAtFraud (460127)
Yeah, he's real new around here. Please excuse his inexperience.
And now -- horrors! -- MSFT listens and gives folks an option to generate XML as data, even according to one's own schema! And the response? The thanks?
I wish the zealots would make up their mind. A consistent position is fine, but a position based on the platform of "anything you do is wrong" just shows where folks are really coming from. Why even read an article when you can issue knee-jerk complaints and criticisms to impress your similarly-minded friends and co-zealots?
"Use the force, fool!" -- what Mr. T would say if he was a Jedi knight.
It's amusing to see how credulous the /. crowd are. God forbid that anyone load up Xerces and try playing with a WordML file to determine that the guy from OpenOffice was lying.
-- Respect for the word - to employ it with scrupulous care and an incorruptible heartfelt love of truth - is essential
hint? I think not...
starts with <xml> it is xml
starts with <html> it is html. Otherwise it is poorly formed. As for XHTML being actually a xml language then it should start with XML and we are back to square one.
I just wish people would quit hacking together crap and throwing out to everyone to adopt with pretty sounding words.
The Declaration might express this aspiration explicitly, however it is left to the Constitution, in enshrining the method of government by which the Declaration's aspirations might be realised, which implicitly "gives people the right to pursue happiness," as a right per se. Ben Franklin, or whoever actually uttered these words, wasn't making any mistake.
I wouldn't be as concerned about this if for example he had just come out and said, "Microsoft is full of a bunch of stupidheads out to rape small children and kick every dog they see" as at least that spells out clearly what his thought process is concerning MS.
What zealots consistently fail to understand is that foolishness like this only HURTS the underlying cause. The underlying cause here is to support the creation and adoption of an open standard for document information interchange. This does of course include being proactive about crushing action that is of the "embrace and extinguish" variety that MS is of course infamous for. However, MS itself should NOT be the target of people's ire, especially not in any sort of pre-emptive attack. The saddest thing of all is that what MS did here was actually a great step in the right direction. Now idiots like Edwards come along and bitch about it. Admittedly, I believe that Edwards did not bother really looking at the beta program except perhaps looking over the shoulder of someone who fired up a new document, then added some text and then finally saved as XML, and furthermore failed to notice the little checkbox for saving as "data xml" only.
What do you honestly expect MS and the rest of those interested in this issue to do in response to such foolishness? Additionally, how can you expect people to be sold on the idea that:
- Open standards are the best way to go
- Oasis and OpenOffice.org are interested in the welfare of all who want increased productivity and efficiency and NOT in their own emotional, hateful agenda
So basically, there are two issues here... first, Edwards's ability to rant and rave AS AN OFFICIAL representative of OOo and OASIS on something he knows little about technically (in this case the MS Office beta and XML functionality), and second, that he perhaps does not understand what XML is supposed to be about. Looks like we have a buzz compliant fool here, as he is planning on using XML as a bloated text format instead of as a standard markup definition language that logically makes use of a wider (because it is standard, and generalized) selection of tools for it. XML is great for many things, but please stop trying to stick it where it is not needed. Furthermore, don't ironically (and hypocritically) bastardize what it is for, all the while bitching about the very same thing from proprietary vendors.
XML is for DATA.... DATA not presentation or format. Using it otherwise will simply create another bloated presentation format that prevents more than it permits the users and organizations from doing what they need to do. Why not just put a big sign on the applications that use it that says, "Thanks to idiots that don't understand technology you will be severely restricted in how you can use this technology. Enjoy it if you can."
Edwards has no business being any sort of "advocate" of anything standards or XML related until he can first clue in on what XML is for, then on what we all need, then on why XML can provide that... and most importantly that he LEARN to do some research on the actual thing he is bitching about.
Several students were asked to prove that all odd integers are prime.
The first student to try to do this was a math student. "Hmmm...
Well, 1 is prime, 3 is prime, 5 is prime, and by induction, we have that all
the odd integers are prime."
The second student to try was a man of physics who commented, "I'm not
sure of the validity of your proof, but I think I'll try to prove it by
experiment." He continues, "Well, 1 is prime, 3 is prime, 5 is prime, 7 is
prime, 9 is... uh, 9 is... uh, 9 is an experimental error, 11 is prime, 13
is prime... Well, it seems that you're right."
The third student to try it was the engineering student, who responded,
"Well, to be honest, actually, I'm not sure of your answer either. Let's
see... 1 is prime, 3 is prime, 5 is prime, 7 is prime, 9 is... uh, 9 is...
well, if you approximate, 9 is prime, 11 is prime, 13 is prime... Well, it
does seem right."
Not to be outdone, the computer science student comes along and says
"Well, you two sort've got the right idea, but you'll end up taking too long!
I've just whipped up a program to REALLY go and prove it." He goes over to
his terminal and runs his program. Reading the output on the screen he says,
"1 is prime, 1 is prime, 1 is prime, 1 is prime..."
- this post brought to you by the Automated Last Post Generator...