Vendor Neutral File Formats?
timmyv asks: "I have recently been tasked with developing a corporate wide policy that will standardize all employee created documents on vendor neutral file formats. OASIS is good in theory, but I haven't been able to locate enough concrete examples of policies or implementation schemes that work at a corporate level. Does anyone work at a company where documents can only be saved as RTF, HTML, etc. or have any experience with this type of problem?"
Isn't vendor neutral.
and we, unfortunately, use _all_ the formats known to the world.
I've already tried to encourage the adoption of hassle-free formats (rtf, html, TXT, whatever).. they don't pass.
It seems that people simply can't get it.
Unfortunately.
If anyone can hear me, slap some sense into me But you turn your head, and I end up talking to myself
OpenOffice file format is a good start. The format is an open standard. As governments around the world embrace it, companies will ultimately flock to the format.
-----
One is born into aristocracy, but mediocrity can only be achieved through hard work.
Any postmodernist worth his or her salt would tell you there's no such thing as a vendor-neutral file format.
"I have recently been tasked with developing a corporate wide policy that will standardize all employee created documents on vendor neutral file formats."
Sorry, but looking at that statement, it seems to me that you are asking the wrong questions. Rather than getting concerned about formats and standards organizations, you should realize that to replace certain formats you will need to improve on open source projects without funding for the development of them. If they say "no" to this, then congratulations, you don't actually have to do this research. Nothing's quite as useless as an unfunded mandate.
Sadly, I'm not sure if this post is meant to be funny.
There could be a huge number of different files you need. CAD files, images, Powerpoint presentations, complex spreadsheets will all mess up any format you can come up with (eg HTML). How would you even edit some of these things?
Even OpenOffice formats are not vendor neutral, you have only one product out there that really uses it.
The surprise isn't how often we make bad choices; the surprise is how seldom they defeat us.
It might sound like Adobe lock-in,
but with PDF printers (files are printed to PDFs) for Linux and Windows (I assume Mac has it built in), it's a good option for creating documents that'll be displayed everywhere in the same manner.
XML maybe????
What you need is a toolchain that allows conversion back and forth between several different types. For example, I could write a short paper in XML, SGML, or LaTeX, and convert any of the three to PDF. I could convert the XML or SGML versions to LaTeX, then use latex2html to turn it into an HTML document. I don't know of converters that turn XML,SGML->HTML, but they probably exist.
The point is that it doesn't matter which method I used to create the document; I can convert any of them into either of the other formats without losing information, and any of the three can be turned into HTML or PDF for display purposes.
You've probably got several different types of documents to mess with. Technical papers with plots, accounting spreadsheets, secretary generated memos, and presentations with pretty pictures so that management can understand what's going on. LaTeX alone could handle all of these situations. Create document types and environments to match the needs of each type of document. XML, being completely generic, could also handle any of the situations, but it's easier to type LaTeX markup than it is XML. There is at least one caveat: you have to be careful what type of images you feed TeX.
Heck, you could use Perl bindings to MS-Excel to snag data out of spreadsheets and export it into a format that some other chart making tool uses. You could use Excel itself to export as CSV files, which you could then use awk to convert into some other format.
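For what it's worth, the CSV step above doesn't have to be awk. Here is a minimal sketch in Python, assuming the spreadsheet was already exported as CSV with a header row. The function name and the sample columns are made up for illustration, and JSON is just one possible target format.

```python
import csv
import io
import json


def csv_to_json(csv_text):
    """Convert CSV text (first row = column headers) into a JSON array of objects."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Every value stays a string; type conversion is left to the consumer.
    return json.dumps([dict(row) for row in reader], indent=2)


if __name__ == "__main__":
    # Hypothetical export; the column names are invented for this example.
    sample = "quarter,revenue\nQ1,1200\nQ2,1350\n"
    print(csv_to_json(sample))
```

The csv module handles quoted fields and embedded commas, which is where hand-rolled awk one-liners tend to fall over.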
Basically, it doesn't matter what tool each person uses, as long as what they export off their own workstation is in a standard format.
Switching applications for people can be a task no one wants to undertake, for two reasons.
Comfort level:
It's like having designers switch from Photoshop to The GIMP, or MS Word to OO Writer. Granted, the apps accomplish the same thing, but it's not the *same* program. People will resist the change because they know how to use the first program, and the reason for the change isn't a concern for them.
Dominance:
Going vendor neutral when the majority still use vendor-specific formats requires you to check whether your users rely on vendor-specific features that are not available in the neutral format. If those features aren't there, then what do you do? Write code to compensate for the feature, get plugins, or do nothing if there's nothing you can do. Are there tools that can do as good a job as the old tools in this neutral environment?
It would help more if you stated your case in more detail.
That seems like kind of an unclear idea. How many vendors do you have, and do they all use the same software in the same fashion?
Unless you have pretty carefully surveyed all of those people you really can't choose one file format over another.
In other words, you're asking the wrong question. Instead of trying to figure out what your employees can standardize on, you will first need to find out what the majority of your vendors have standardized on.
Of course you'll have problems. HTML or PDF are horrible if you're circulating documents that need to be edited or excerpted. And vendors and suppliers will still send you documents in whatever their house file format is.
Really, for this to be effective you need to involve your employees, management, vendors, and probably suppliers in order to get everyone working within the same set of file formats.
Three Squirrels
Avoiding vendor lock-in is of course A Good Thing. However, as others have said, there is no format completely vendor neutral - each platform has its own set of unique features that don't translate directly and must be stored somewhere in an extension or custom tag. I'm certain the OASIS/OOo format has a few StarOfficeisms in it.
.doc would be unsuitable since the format is undocumented and you would be reliant on the correct version of Office to correctly and completely read/export it, hence you would depend on Microsoft.
.sxw would have been unsuitable (even though it was 'just zipped xml'), since OOo/StarOffice were the only way of performing any completely trustworthy export. Now that the format is formally documented and independent tools exist, it is suitable.
What matters is that the data you own is readily transformable into a Fully Open and documented format independent of your chosen platform; normally (but not necessarily) this will mean your native format is Fully Open and documented. This includes all data, styling, formatting, metadata and interrelationships. Basically you should be able to quickly jump ship, even if your vendor has been wiped off the earth or there are legal/technical issues preventing you from running the original platform, without loss or 'damage' of any information. There must be at least one other clear route to all your information, completely bypassing the original platform.
As an example
Similarly prior to it's released as open source software and even immediately after
There are grey areas such as databases, which have no common datafile format but do expose Fully Open interfaces such as ODBC or JDBC.
With this in mind I would argue that forcing everyone to save documents in 'basic' formats such as HTML and RTF is counterproductive; they lack wide support for features such as styling and precise page layout. Any format will do as long as you can readily, fully & demonstrably extract all your information, independently of the platform that created it.
Alex
XCircuit, a circuit layout app for X, uses postscript as its default format. If you have XCircuit, you can load the postscript file into it and edit it like any other circuit. If not, you can still print it or view it as you would any other postscript file.
XML is a good start, because it's easy for a new app (the fictional YCircuit) to add support for the format, but you are still stuck unable to print it if you don't have the skills to write a conversion script and no one else has written it for you.
Why not combine the two? XML embedded in a standard PDF file would allow any application with support for the creator's XML tagset to import the file, and at the very least those without any similar application could view and print the file.
You can't judge a book by the way it wears its hair.
HTML is only vendor neutral if you don't use any vendor-specific extensions. So you can't just say, "Everybody save your files as HTML". You also have to forbid anybody using apps (such as Word) that save to a non-standard HTML.
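A policy like that is easier to enforce if you can check files mechanically. Below is a rough sketch, assuming you just want to flag HTML that carries the usual fingerprints of Word's "Save as Web Page" output: Office XML namespaces, "mso-" CSS properties, and conditional comments. The marker list and function name are illustrative assumptions, not an exhaustive or official test.

```python
import re
from typing import List

# Fingerprints commonly seen in HTML saved by Microsoft Word.
# Illustrative only; other applications have their own extensions.
VENDOR_MARKERS = {
    "office namespace": re.compile(r"urn:schemas-microsoft-com:", re.IGNORECASE),
    "mso style": re.compile(r"\bmso-[a-z-]+\s*:", re.IGNORECASE),
    "conditional comment": re.compile(r"<!--\[if\s", re.IGNORECASE),
}


def find_vendor_markup(html: str) -> List[str]:
    """Return the names of the vendor-specific markers found in the HTML text."""
    return [name for name, pattern in VENDOR_MARKERS.items() if pattern.search(html)]


if __name__ == "__main__":
    word_like = (
        '<html xmlns:o="urn:schemas-microsoft-com:office:office">'
        '<p style="mso-margin-top-alt:auto">Memo</p></html>'
    )
    print(find_vendor_markup(word_like))
```

A real check would also want a proper validator run against the HTML standard you pick; this only catches the obvious offenders.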
In theory, you can create an XML-based format that looks the same in Word, OpenOffice, FrameMaker, and any other XML-aware app. But doing so means designing a schema in extreme nit-picking detail, and writing a lot of transformations to get that XML in and out of all the apps that need to read or write it. It's a lot of work, and nobody does it unless they have a specific application that requires highly-structured information. Like if you have a huge set of technical documentation that you need to update a lot. (I was involved in just such a project -- and the politics of converting all those documents to XML cost me my job.) Or if you have invoices or similar business documents that need to go into or out of a web services app.
But for the big mass of unstructured documents, there just isn't a vendor-neutral solution, and nobody has any real incentive to create one. The solution remains the same: standardize on certain specific applications. Which boils down to using OpenOffice if you hate giving money to Bill and/or want a platform-neutral solution. Otherwise you standardize on Microsoft Office, because it's what everybody knows how to use.
Store everything in giant PNGs.
OpenOffice.org format may not be vendor neutral particularly (though like others said, KOffice at least uses it) but it is an open and prevalent format. MS .doc is prevalent but as it's not open then it's not necessarily going to have filters available for it in the future. I think OOo is safer in this respect. Also OOo format is (compressed) xml so can probably be parsed by xml readers (? - I haven't got a clue, really!!).
Are they doing this to save money? to clamp down on the uppity workers? because the CEO got emailed an AppleWorks attachment with no file extension from some Mac user? to avoid the risks of single vendor lock-in?
Many document formats can be converted back-and-forth with some degree of effectiveness. Yes, if you open a document from WordPerfect in Microsoft Office, the word spacing may change a little. However, this happens if you move from a machine connected with a HP4000 printer to a HP2100 printer as well. However, some formats give different feature capabilities; saving from DOC to RTF will cause (as an example) tables to shift about a bit. TXT format is readable by most anything, but the formatting capabilities are nigh nonexistent. (Ooh! Tabs!) While WordPerfect and Word will each open the other's documents, they aren't so good for saving in open formats.
What formats are currently used? Why are they needed? Will everyone need to be able to write to them, or are pay-writer/free-reader combos acceptable? And, *ARE* there any "vendor neutral" formats out there? (For desktop publishing, the real answer is "no". Publisher is a joke, and while Adobe and Quark maintain some import compatibilities, the formats AREN'T neutral.)
For myself, working in a small department, "Let a thousand flowers bloom" is just fine. I accept that I will occasionally get forwarded an e-mail with an attachment that the user can't figure out how to open-- usually Mac/PC file extension name issues solved easily by renaming. Once in a blue moon I have to explain to someone that no, not everyone has FooBarBaz market research organizer, since for most the $800 license cost for it would be more beneficially used for other things, and they will probably need to examine such data files once in their career, if that.
Perhaps a list of universally accepted formats-- that is, formats that must be used for wide distribution-- would be more appropriate, after considering what features are needed in said formats. After all, Photoshop .PSD documents are harder to view outside Photoshop, but far more useful for subtle graphics work than JPEGs.
I suspect you are being sent out on a project inadequately considered. Depending on the pointy-hairyness of the person who assigned it to you, you may find some substantial benefit to reconsidering the ground assumptions.
//Information does not want to be free; it wants to breed.
...when i get locked PDFs. Just take a screenshot of the document. Easy.
Hmm, I'd say LaTeX would be a good alternative? There are interpreters for most platforms, the source files are plain text, and it can output a variety of readable formats (pdf,ps,html etc).
I'd recommend you find a way to get out of the assignment. You will not find what you seek as it is one of the holy grails of computing that should exist but does not and does not for good reason (money).
-- $G
There could be a huge number of different files you need. CAD files, images, ...
Before starting, try to determine what the true question is. Were you asked to choose something that is truly vendor neutral, or were you asked to choose corporate standards that will interoperate with your customers and suppliers? The first question is *very* difficult to answer; the second one is easily solved (albeit in a non-Slashdot friendly manner).
I will assume the latter question is the true question, and continue my posting based upon that assumption.
For each major document type, determine who needs to be able to read and edit those documents. This question must be answered for your employees as well as your customers and suppliers. Then, choose a file type that is widely used in that community; which may mean standardising on an older version of a particular application.
For example, in the case of word processed documents, MS Word 97 is a very safe, very widely readable (by other applications) format. Newer versions of MS Word can be configured to only create Word97 files, and many other non-MS applications are able to open and edit Word97 files. So, although Word97 format isn't vendor neutral, it is widely interoperable and makes a good corporate standard.
MS-office2003 is XML format but that does not mean it is open.
It is restricted by patents, see..
http://news.com.com/Microsoft+seeks+XML-related+patents/2100-1013_3-5146581.html
Yep. Just use unzip and you'll get several XML files, among them: content.xml is the document itself, meta.xml is the property sheet info, styles.xml is the stylesheet(s) in use when the document was saved.
After that, you can use your favorite XML widget, such as the XML::Parser Perl module, to turn it into HTML or other things of your choosing.
Or create an XSLT file and use something like Xalan to format it on the fly.
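The same unzip-and-parse idea can be sketched in Python with nothing but the standard library. This is only an illustration under a few assumptions: paragraphs are matched by the local element name "p" (so it doesn't depend on one particular namespace URI), all styling is ignored, and the sample file built here uses placeholder namespaces rather than the real ones.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from html import escape


def document_to_html(file_bytes):
    """Read content.xml from the zip container and emit one <p> per paragraph."""
    with zipfile.ZipFile(io.BytesIO(file_bytes)) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    paragraphs = []
    for element in root.iter():
        # Compare the local name only; nested paragraphs would be repeated.
        if element.tag.rsplit("}", 1)[-1] == "p":
            text = "".join(element.itertext())
            paragraphs.append("<p>" + escape(text) + "</p>")
    return "\n".join(paragraphs)


def make_sample():
    """Build a tiny stand-in document in memory (placeholder namespaces)."""
    content = (
        '<office:document-content xmlns:office="urn:example:office" '
        'xmlns:text="urn:example:text"><office:body>'
        "<text:p>Hello &amp; welcome</text:p>"
        "<text:p>Second <text:span>paragraph</text:span></text:p>"
        "</office:body></office:document-content>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("content.xml", content)
    return buffer.getvalue()


if __name__ == "__main__":
    print(document_to_html(make_sample()))
```

For a real file you would pass in the bytes read from disk instead of make_sample(), and you'd want to look at styles.xml too if the formatting matters.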
Gotta love OOo and those open formats!
That is all.
XML embedded in a standard PDF file would allow any application with support for the creator's XML tagset to import the file, and at the very least those without any similar application could view and print the file.
For a more pure XML solution, it'd be better to embed domain-specific XML data in an SVG document, which Adobe's SVG viewer can display and print. In fact, it might even be possible to XSLT the XML into SVG.
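To make the embedding idea concrete, here is a rough sketch that writes a drawing as SVG and tucks the domain data into the SVG metadata element under its own namespace. The circuit tagset, its namespace URI, and the function names are all hypothetical; a real application would define its own schema.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
CIRCUIT_NS = "urn:example:ycircuit"  # hypothetical tagset for the fictional YCircuit

ET.register_namespace("", SVG_NS)
ET.register_namespace("c", CIRCUIT_NS)


def build_svg(parts):
    """parts: list of (name, x, y). Returns SVG text with the data embedded."""
    svg = ET.Element("{%s}svg" % SVG_NS, {"width": "200", "height": "100"})
    metadata = ET.SubElement(svg, "{%s}metadata" % SVG_NS)
    circuit = ET.SubElement(metadata, "{%s}circuit" % CIRCUIT_NS)
    for name, x, y in parts:
        # Machine-readable record for applications that know the tagset.
        ET.SubElement(circuit, "{%s}part" % CIRCUIT_NS,
                      {"name": name, "x": str(x), "y": str(y)})
        # Visible shape for everyone else: any SVG viewer can draw this.
        ET.SubElement(svg, "{%s}rect" % SVG_NS,
                      {"x": str(x), "y": str(y), "width": "20", "height": "10",
                       "fill": "none", "stroke": "black"})
    return ET.tostring(svg, encoding="unicode")


def read_parts(svg_text):
    """Recover the embedded records, ignoring the drawing itself."""
    root = ET.fromstring(svg_text)
    return [(p.get("name"), int(p.get("x")), int(p.get("y")))
            for p in root.iter("{%s}part" % CIRCUIT_NS)]


if __name__ == "__main__":
    text = build_svg([("R1", 10, 20), ("C1", 60, 20)])
    print(read_parts(text))
```

A viewer that knows nothing about the circuit namespace still renders the rectangles, while an editor that understands it can rebuild the model from the metadata.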
I mean, if an unsuccessful platform is your best example of non-Microsoft development of RTF-based software
Unsuccessful my ass; learn why.
True, but given an RTF using visual formatting, how can a program know in advance which font size was meant to be "heading level 1", which was meant to be "heading level 2", whether italics represent emphasis or the title of a work, etc?
Well, for CAD, it's a screwed up world. The best/most portable format is probably IGES, except it's such a huge specification that nobody's IGES file is compatible with anybody else's. I'm an engineer and for myself I use Turbocad 10 professional at home. It reads/writes AutoCAD files and numerous other formats, and is somewhere in between AutoCAD and Pro/Engineer in terms of its capabilities. You'll have a tough time convincing any corporation to use TurboCAD though.
For text documents, HTML would be good, except MS products tend to produce the most screwed up HTML files I've ever seen. All I can recommend is to use PDF files for important and official documents because they are essentially immutable and tend to produce consistent hardcopies from any computer.
OpenOffice formats are nice, and if I were starting up a new business I would of course set up Linux workstations to use OO exclusively, and put a Windows machine down in the IT room so the IT staff could convert any troublesome documents that come through the email.
For Visio, there is no equivalent, other than exporting the Visio file as a DXF or maybe a WMF. Windows MetaFiles never seem to load right in other apps though, so that's something to think about. SVG files will probably be the future here if Dia starts using them.
Clickety Click
Like HTML, which surprised people in the 1990s, the OASIS OpenOffice.org file format is indeed vendor independent, though it is now called Open Document. Anyone can use it or develop tools for it without restriction. Even Microsoft is part of the team at OASIS, at least on paper. And, even if MS doesn't get out of the way, interesting things will happen with Open Document.
So far OASIS Open Document is being used by at least the following:
- StarOffice
- OpenOffice.org
- AbiWord
- kWord
Unlike MS-WordML, which is encumbered by patents, trade secrets, and difficult licensing issues, OpenDocument is free to use. It also meets the requirements specified in European Interoperability Framework for Pan-European eGovernment Services. It's getting increasing attention: Note that the only industry actor not currently involved in the OASIS Open Document Format has been and still is MS. MS is still trying to shoehorn old MS-Office 97 customers into DRM'd MS-Office 2003, which functions in effect like a roach motel for your data. So far the worst insult that Ballmer and Gates can cough up is that OpenOffice.org (OOo) is like MS-Office 97. However, I think even those two can see that OOo meets this group's functional requirements quite well, and is free and multiplatform. OOo is also available in more languages than MS-Office, handles long documents better, and does better with styles and stylesheets.
Currently, there are many governments moving up to StarOffice or OpenOffice.org for the sake of these formats. Singapore comes to mind first, but there are many, many others that don't necessarily make the mainstream press, like Sarpsborg. Likewise, there are many small, medium and large businesses moving along. Some with an axe to grind (with good reason) speak up. However, most are silent until the move is being implemented to keep the goon squad from Redmond from getting in the way.
The current choice:
- OASIS Open Document --
- be able to access your own data indefinitely as XML
- and change productivity tools, operating systems and hardware only if and when it suits you
- MS-WordML --
- pay that Redmond tithe indefinitely
- and buy new productivity tools, operating systems and hardware when Chairman Bill tells you to
Easy choice. You don't need to be a wizard to see which direction things are going to head.
Beta is broken and the link to classic doesn't work. Stop wasting our time or there won't be anybody left here.
Where is he mentioning that the applications have to be Open Source ones?
For all applications there are formats that are industry standards and unencumbered by patents (as far as it is possible to ensure this in certain litigious countries).
The knee jerk reaction "boooh! Open Source software is not ready" should be only used when actually Open Source is a necessary part of a requested solution.
IANAL but write like a drunk one.
The article poster is explicitly stating they want to move to vendor neutral applications.
In such a situation why would they need to do such conversions?
IANAL but write like a drunk one.
> RTF does contain, in theory, sufficient control words to describe
> everything that Word 2000 can do, but it's hardly a direct translation and
> things get lost a lot.
What gets lost?
Examples please.
There are no "StarOfficeisms" in the OASIS XML Open Document file format specification. Least ways not any we know of. By December of 2004, when the OASIS TC submitted the XML file format specification to ISO, all known references and anachronisms that might be called starisms were changed. Neutralizing changes were even made to such things as the file format extensions and mime type registrations. We even changed the name from OASIS Open Office to OASIS/ISO Open Document.
Separating the file format from any particular application or applications suite is a big deal. Especially if there is a rising demand from enterprise level end users for an applications independent universal structured file format solution.
So the OASIS/ISO TC chose to keep that most powerful of technology terms, the word "Open", but lose the direct reference and/or suggestion to OpenOffice.org.
The second reason for changing the name to "OASIS Open Document" is far more interesting, and directly relates to the European Union "TAC/IDA" task force recommendations based on the infamous Valoris Report. You will recall that by September of 2004, the EU had evaluated responses from both Sun and Microsoft regarding the Valoris recommendation that all EU information system purchases be required to support an open standards based XML file format specification.
Microsoft's open XML proposal was determined by the EU to be "not open enough". This criticism was in the original Valoris Report, and not altered by subsequent Microsoft arguments. After much squealing, squawking, finger pointing, complaining and outrageous misrepresentation, in mid November of 2004 Microsoft finally conceded and agreed to meet EU requirements. More about this in a moment, but for now the important thing to note is that the EU held firm. A remarkable feat even though there is currently a range of cross platform alternative solutions that meet EU requirements, including the open and free OpenOffice.org, Sun's StarOffice, IBM's WorkPlace, and Novell's Open Office. And if Microsoft had not sold their share in Corel to a vulture investor outfit for pennies on the dollar, an investor who then proceeded to cut XML out of Corel, WordPerfect Office would also be OASIS/ISO XML compliant.
Meanwhile, the EU was also not entirely satisfied with the OASIS XML specification as explained in Sun's response to the EU requirements recommendation. Three things in particular concerned the EU.
First, that OASIS submit the file format specification to ISO. In September of 2004, OASIS management and the OASIS TC came to agreement with ISO that the file format specification would be submitted to ISO before year's end, but maintenance and improvement would remain with the current OASIS TC. Hence the combo moniker "OASIS/ISO".
Second, there was a great deal of concern about "custom-defined schemas". Sometimes this issue is also referred to as "user-defined schemas". Others just call it a "forms" or "template" issue. Basically it refers to an application's ability to load (or consume) an externally defined schema template that might include specific user interfaces (forms), business - workgroup logic (routing), meta data interfaces, and other things related to the emerging world of collaborative computing.
Microsoft of course champions the auxiliary Office productivity application, InfoPath. However, in September of 2004, the OASIS TC finished work on extending the specification to include XForms, SVG, and SMIL. Current OOo -v.2 builds fully demonstrate the powerful capabilities of these extensions, including the binding of web services and data to graphical objects and forms/template widgets. Move over InfoPath. Hello OASIS UBL!
The third issue involves EU concerns fo
How many NextStep applications have migrated to OS X?
Depends on whether the developer is still around. Mac OS X implements the Mac OS Toolbox API as "Carbon" and the OpenStep API as "Cocoa". If the developer still has the source code and wants to reach thousands of Mac users, porting starts with a recompile. But if your developer has gone out of business, on the other hand...
Permanence of public data.
I guess how permanent is permanent? It's very hard to store data electronically long term and have it be accessible years later. How many computer techs today could even deal with a 9 track data tape (a state of the art archival format 20 years ago)? While PCs can handle Bus and Tag data streams the adapter card is $3k per. No one 30 years ago would have conceived of having individual users not connected in any meaningful way to an operations center.
I've done a lot of work taking data in "will be good forever" formats like code 1 and moving them to formats that are actually usable by non mainframes. I see no reason to believe that .pdf deployed on modern tape archives will be meaningfully usable in 30 years. If by "permanent" you mean 10 years or less then no problem. If you mean 100 then in addition to all the other suggestions below, I'm going to say a Microfiche printer should be part of the solution. 100 years from now people may not have a clue what Microsoft Word was and thus no idea what to do with a ".doc file" on a DVD or whatever but they will know how to use a magnifying glass and a light source just fine.
With one of these printers your users either export .jpg, .pdf, .doc... or they "print" to this printer which captures 400 ppm very cheaply (server + printer + setup for a little over $10k). It may sound really really old fashioned but I think it is worth considering. Think about how you would get digital data from the systems you were using in 1975....