Vendor Neutral File Formats?
timmyv asks: "I have recently been tasked with developing a corporate wide policy that will standardize all employee created documents on vendor neutral file formats. OASIS is good in theory, but I haven't been able to locate enough concrete examples of policies or implementation schemes that work at a corporate level. Does anyone work at a company where documents can only be saved as RTF, HTML, etc. or have any experience with this type of problem?"
and we, unfortunately, use _all_ the formats known to the world.
I've already tried to encourage the adoption of hassle-free formats (RTF, HTML, TXT, whatever)... they don't pass.
It seems that people simply can't get it.
Unfortunately.
If anyone can hear me, slap some sense into me But you turn your head, and I end up talking to myself
There could be a huge number of different files you need. CAD files, images, PowerPoint presentations, and complex spreadsheets will all mess up any format you can come up with (e.g. HTML). How would you even edit some of these things?
Even the OpenOffice formats are not vendor neutral; only one product out there really uses them.
The surprise isn't how often we make bad choices; the surprise is how seldom they defeat us.
Switching applications for people can be a task no one wants to undertake, for two reasons.
Comfort level:
It's like having designers switch from Photoshop to The GIMP, or MS Word to OO Writer. Granted, the apps accomplish the same thing, but it's not the *same* program. People will resist the change because they know how to use the first program, and the reason for the change isn't a concern for them.
Dominance:
Going vendor neutral when the majority still use vendor-specific formats requires you to check whether your users rely on vendor-specific features that are not available in the neutral format. If those features aren't there, then what do you do? Write code to compensate for the feature, get plugins, or do nothing if there's nothing you can do. Are there tools that can do as good a job as the old tools in this neutral environment?
It would help more if you stated your case in more detail.
XML isn't a format. It's a language for creating formats. Saying "we'll use XML" is like saying "we'll use an SQL database". It's a step, but only a small one. The big decisions remain.
Well, that's not exactly "vendor neutral", since only one vendor supports it. Of course, that one vendor is an open-source project, and the format is well-documented XML. So if you want to break out of the Microsoft orbit, it's the obvious first choice.
If you reply, do so only to what I explicitly wrote. If I didn't write it, don't assume or infer it.
Avoiding vendor lock-in is of course A Good Thing. However, as others have said, there is no completely vendor-neutral format: each platform has its own set of unique features that don't translate directly and must be stored somewhere in an extension or custom tag. I'm certain the OASIS/OOo format has a few StarOfficeisms in it.
What matters is that the data you own is readily transformable into a Fully Open and documented format, independent of your chosen platform; normally (but not necessarily) this will mean your native format is Fully Open and documented. This includes all data, styling, formatting, metadata and interrelationships. Basically, you should be able to quickly jump ship, even if your vendor has been wiped off the earth or there are legal/technical issues preventing you from running the original platform, without loss or 'damage' of any information. There must be at least one other clear route to all your information, completely bypassing the original platform.
As an example, .doc would be unsuitable since the format is undocumented and you would be reliant on the correct version of Office to correctly and completely read/export it; hence you would depend on Microsoft.
Similarly, prior to its release as open source software, and even immediately after, .sxw would have been unsuitable (even though it was 'just zipped XML'), since OOo/StarOffice were the only way of performing any completely trustworthy export. Now that the format is formally documented and independent tools exist, it is suitable.
There are grey areas such as databases, which have no common datafile format but do expose Fully Open interfaces such as ODBC or JDBC.
With this in mind I would argue that forcing everyone to save documents in 'basic' formats such as HTML and RTF is counterproductive: they lack wide support for features such as styling and precise page layout. Any format will do as long as you can readily, fully & demonstrably extract all your information, independently of the platform that created it.
Alex
Umm... you are moving from a vendor-specific system to an in-house, expertise-specific system.
The surprise isn't how often we make bad choices; the surprise is how seldom they defeat us.
HTML is only vendor neutral if you don't use any vendor-specific extensions. So you can't just say, "Everybody save your files as HTML". You also have to forbid anybody using apps (such as Word) that save to a non-standard HTML.
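To make that concrete, here is a rough Python sketch of how a policy could flag HTML that carries vendor-specific residue. It only looks for three markers that Word's "Save as HTML" output is known to leave behind (Microsoft XML namespaces, `mso-` style properties, and conditional comments). The names `VENDOR_MARKERS` and `find_vendor_markers` are made up for this illustration, and the marker list is an assumption, not a complete definition of "non-standard HTML".

```python
import re

# Hypothetical marker list: patterns commonly found in HTML exported by Word.
# This is an illustrative assumption, not an exhaustive test for standard HTML.
VENDOR_MARKERS = {
    "Microsoft Office XML namespace": re.compile(r"urn:schemas-microsoft-com:", re.I),
    "mso- style property": re.compile(r"\bmso-[a-z-]+\s*:", re.I),
    "conditional comment": re.compile(r"<!--\[if\s", re.I),
}


def find_vendor_markers(html):
    """Return the names of the vendor-specific markers found in the HTML text."""
    return [name for name, pattern in VENDOR_MARKERS.items() if pattern.search(html)]


if __name__ == "__main__":
    clean = "<html><body><p>Plain paragraph</p></body></html>"
    exported = (
        '<html xmlns:o="urn:schemas-microsoft-com:office:office">'
        '<body><p style="mso-margin-top-alt:auto">Exported paragraph</p></body></html>'
    )
    print(find_vendor_markers(clean))
    print(find_vendor_markers(exported))
```

A check like this can only tell you that a file is suspicious; passing it does not prove the file is standard HTML, so a real policy would still want a proper validator.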
In theory, you can create an XML-based format that looks the same in Word, OpenOffice, FrameMaker, and any other XML-aware app. But doing so means designing a schema in extreme nit-picking detail, and writing a lot of transformations to get that XML in and out of all the apps that need to read or write it. It's a lot of work, and nobody does it unless they have a specific application that requires highly-structured information. Like if you have a huge set of technical documentation that you need to update a lot. (I was involved in just such a project -- and the politics of converting all those documents to XML cost me my job.) Or if you have invoices or similar business documents that need to go into or out of a web services app.
But for the big mass of unstructured documents, there just isn't a vendor-neutral solution, and nobody has any real incentive to create one. The solution remains the same: standardize on certain specific applications. Which boils down to using OpenOffice if you hate giving money to Bill and/or want a platform-neutral solution. Otherwise you standardize on Microsoft Office, because it's what everybody knows how to use.
Are they doing this to save money? To clamp down on the uppity workers? Because the CEO got emailed an AppleWorks attachment with no file extension from some Mac user? To avoid the risks of single-vendor lock-in?
Many document formats can be converted back and forth with some degree of effectiveness. Yes, if you open a document from WordPerfect in Microsoft Office, the word spacing may change a little. However, this also happens if you move from a machine connected to an HP4000 printer to one connected to an HP2100 printer. Some formats also offer different feature sets; saving from DOC to RTF will cause (as an example) tables to shift about a bit. TXT format is readable by most anything, but the formatting capabilities are nigh nonexistent. (Ooh! Tabs!) While WordPerfect and Word will each open the other's documents, they aren't so good for saving in open formats.
What formats are currently used? Why are they needed? Will everyone need to be able to write to them, or are pay-writer/free-reader combos acceptable? And, *ARE* there any "vendor neutral" formats out there? (For desktop publishing, the real answer is "no". Publisher is a joke, and while Adobe and Quark maintain some import compatibilities, the formats AREN'T neutral.)
For myself, working in a small department, "Let a thousand flowers bloom" is just fine. I accept that I will occasionally get forwarded an e-mail with an attachment that the user can't figure out how to open-- usually Mac/PC file extension name issues solved easily by renaming. Once in a blue moon I have to explain to someone that no, not everyone has FooBarBaz market research organizer, since for most the $800 license cost for it would be more beneficially used for other things, and they will probably need to examine such data files once in their career, if that.
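For the extension-less attachment case, a small sketch of how one might guess the format from the first few bytes instead of the file name. The signatures below are the well-known leading bytes for these container types; the function name `sniff_format` and the choice of which formats to list are my own for illustration, and the table is nowhere near complete.

```python
# Leading "magic" bytes for a handful of common formats.
# The selection is illustrative; a real tool would cover far more cases.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"{\\rtf", "RTF"),
    (b"PK\x03\x04", "ZIP container (e.g. an OpenOffice.org package)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (e.g. legacy .doc/.xls)"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
]


def sniff_format(header):
    """Guess a file's format from its leading bytes; return 'unknown' if no match."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


if __name__ == "__main__":
    print(sniff_format(b"%PDF-1.4\n"))
    print(sniff_format(b"{\\rtf1\\ansi"))
    print(sniff_format(b"just some text"))
```

In practice you would read the first eight or so bytes of the attachment in binary mode and pass them in. Note that a ZIP or OLE2 match only identifies the container, not which application wrote it.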
Perhaps a list of universally accepted formats-- that is, formats that must be used for wide distribution-- would be more appropriate, after considering what features are needed in said formats. After all, Photoshop .PSD documents are harder to view outside Photoshop, but far more useful for subtle graphics work than JPEGs.
I suspect you are being sent out on a project inadequately considered. Depending on the pointy-hairyness of the person who assigned it to you, you may find some substantial benefit to reconsidering the ground assumptions.
//Information does not want to be free; it wants to breed.
Hmm, I'd say LaTeX would be a good alternative? There are implementations for most platforms, the source files are plain text, and it can output a variety of readable formats (PDF, PS, HTML, etc.).
I'd recommend you find a way to get out of the assignment. You will not find what you seek, as it is one of the holy grails of computing that should exist but does not, and does not for good reason (money).
-- $G
And this isn't a mystery?
No. It's a matter of researching documentation.
Yep. Just use unzip and you'll get several XML files, among them: content.xml is the document itself, meta.xml is the property sheet info, styles.xml is the stylesheet(s) in use when the document was saved.
After that, you can use your favorite XML widget, such as the XML::Parser Perl module, to turn it into HTML or other things of your choosing.
Or create an XSLT file and use something like Xalan to format it on the fly.
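As a rough illustration of the unzip-then-parse route, here is a Python sketch using only the standard library (swapped in for the Perl module mentioned above). It builds a tiny stand-in package in memory so it can run on its own. The namespace URIs in the sample are placeholders rather than the real OOo ones, which is why the code matches paragraphs on the local tag name `p` only. The function names are made up for this example.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def paragraphs_from_package(data):
    """Return the text of every paragraph element found in content.xml."""
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        root = ET.fromstring(package.read("content.xml"))
    # Compare only the local tag name so the namespace URI does not matter.
    return [
        "".join(element.itertext())
        for element in root.iter()
        if element.tag.rsplit("}", 1)[-1] == "p"
    ]


def build_sample_package():
    """Create a minimal zipped stand-in document (placeholder namespaces)."""
    content = (
        '<office:document-content xmlns:office="urn:example:office" '
        'xmlns:text="urn:example:text"><office:body>'
        "<text:p>Hello <text:span>open</text:span> formats</text:p>"
        "<text:p>Second paragraph</text:p>"
        "</office:body></office:document-content>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("content.xml", content)
    return buffer.getvalue()


if __name__ == "__main__":
    for line in paragraphs_from_package(build_sample_package()):
        print(line)
```

For a real document you would pass in the bytes of the saved file instead of the sample; turning the paragraphs into HTML from there is a matter of wrapping each string in the markup you want.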
Gotta love OOo and those open formats!
That is all.