OpenDocument Alliance to Fight Digital Dark Age
OSS_ilation writes "A consortium of vendors and academic institutions -- including IBM, Sun Microsystems and the American Library Association -- has announced today that they are forming the OpenDocument Alliance as part of an effort to promote open file standards worldwide. The group will support the one truly open standard file format, OpenDocument, which is an XML-based file format used for saving and exchanging editable office documents such as text documents, spreadsheets and presentations. Sun's Simon Phipps said he believed ODF would allow future generations to view all of today's digital docs and prevent a digital Dark Age from occurring."
There is more than one "truly open format", so using the word "the" is a bit pretentious.
--
"Open source is good." - Steve Jobs
"Open source is evil." - Microsoft
But there are already such formats, e.g. LaTeX. Unfortunately the only usable WYSIWYG editor is LyX, which runs out of the box only on Linux.
Open formats are definitely the standard for which to strive.
It appears Microsoft claims an open format, from the (fine) article:
Can anyone clear up exactly what OpenXML is? When I google it, I get vague references leading me to believe OpenXML is more of a container, and not Microsoft's specific document format. So, this sounds like another canard from Microsoft with the claim "open" obfuscating what is probably not.
Any /.'ers have more info about Microsoft's format?
On the other hand, the consortium (if you will) proposing a universal open document standard sounds more open and the proof will be in the implementation. Still, I'd like to know more specifically what that standard proposal is in detail.
The dark age has already happened several times. There are oodles of media formats from the 70's and on that are no longer readable today in the standard computer. Heck, new computers don't even come with floppy drives for 3.5" floppies. I hope they have a strategy to tackle media problems along with file format compatibility, because the medium is the message.
Saskboy's blog is good. 9 out of 10 dentists agree.
Naming your interest group "The [anything] Alliance" gives it that hardcore "We'll form Voltron and smite you if you look at us wrong" street cred.
Slashdot Burying Stories About Slashdot Media Owned
Not being able to read the damn file format isn't the problem. The fact that there is no possible way to store even a tiny fraction of the data being produced for the long term is what will cause a digital dark age.
I mean hell. I've got 1.25 terabytes of online storage at home and probably 250 CDs burned over the last ten years I can't reliably ensure I'll still have access to in ten years. Half those CDs are probably unreadable now -- from recent experience at least 10% aren't.
If they want to solve the digital dark age problem, they need to figure out how gigabytes or terabytes of PERSONAL information will be saved for future generations, not filtered down government or commercial archives. File formats just aren't that big of a deal. Worst case someone has to reverse engineer it in a hundred years, if you actually HAVE the data in a hundred years.
I really do wish them luck. The thing is the "document" and "content" companies are going to fight like hell to expand proprietary formats as they ultimately look to the MS word format, the sheer number of copies of MSOffice sold, and see the dollar signs available by controlling the format and making everyone dance to their tune. Anyone who remembers the fiasco that occurred when MSOffice 97 wasn't very compatible with the previous version will also remember that companies simply shelled out for converters etc until MS issued a patch. They had no choice.
While packages like open office etc exist, they have for a while and are perceived as "not being ready for prime time" by most businesses. The only advantage many see is the ability to save as PDF (another proprietary format). For ODF to take hold, governments and some very large publishing concerns are going to have to adopt it. Else, not much will change and the march towards increasingly proprietary formats will continue.
The problem is not that there is no long-term storage. The problem is that we produce more useless data than ever before.
Really, who gives a f*ck about your 1.25 TB of crap? Or mine? We're just two ants in the anthill. You really think you can look up any substantial amount of information on someone who lived 200 years ago? Hell, try *50* years ago. Aside from public records like tax information and housing details, and maybe some family photos, you are likely to come up with bupkis, unless that person was famous.
It's going to be no different 200 years from now, and frankly I don't see the problem with that. Only in the past decade has everyone gotten this weird urge to try and archive and record every unimportant detail of their daily lives (see MySpace.com, blogging, etc). What they don't realize is no one really gives a crap today, and they sure as hell won't give a crap in 100 years.
Historians want to know about culture as a whole, not in bite-sized chunks. Aside from the major move-makers (politicians, *some* celebrities), historians won't be any more interested in people's musings on shit like Paris Hilton than I am.
...for it to be truly open and future-accessible, it would have to specify everything from the bottom hardware level up - from how bits are encoded on whatever the storage medium is, through to file system layout, and only then starting to talk about the contents of the data itself. From there you then have to decide formats for all types of data (text, formatted text, images, audio, etc.) and specify how they're encoded and what their structure is, and how to interpret their data (e.g. images composed of pixels, which are made up of red green blue and alpha components).
It's really not a trivial thing to do, and simply specifying an Open Document Format is a fairly small part of a bigger situation.
Game dev and music blog
I like the term. Literally the term 'dark' only refers to 'lack of information', but it has great connotations of doom and gloom that make for great PR.
If it weren't for the end-of-civilisation hype, most Y2K bugs would have remained unfixed until early 2001 through bureaucratic laxity, resulting not in the end of the world, but in a major headache for many companies.
Sometimes you need a catchy image to get people to take notice.
Any /.'ers have more info about Microsoft's format?
Get thee to Groklaw, my curious friend. The debate, along with fine technical details, is found there.
On the other hand, the consortium (if you will) proposing a universal open document standard sounds more open and the proof will be in the implementation. Still, I'd like to know more specifically what that standard proposal is in detail.
The implementation is here. It's called "ODF," the "Open Document Format." It is the default file format of the Open Office suite of applications; KOffice is also moving (or *has* moved, I'm too lazy to look) to that format, as well. IBM's office suite will implement ODF.
Again, Groklaw has a lot of information, including pointers to the official specification.
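For anyone wondering what "open" buys you in practice: an ODF file is a ZIP package with the document body in a member called content.xml, so you can get at the text with nothing but a standard library. Here is a minimal sketch in Python, assuming a text document whose paragraphs use the standard text:p element. The function name odf_paragraphs is made up for illustration, and this ignores styles, tables and everything else in the spec.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace URI used for text elements in OpenDocument content.xml.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def odf_paragraphs(source):
    """Return the plain text of each paragraph in an ODF text document.

    `source` is a path or a binary file-like object (hypothetical helper,
    not part of any office suite's API).
    """
    with zipfile.ZipFile(source) as package:
        # The document body lives in the content.xml member of the package.
        root = ET.fromstring(package.read("content.xml"))
    # Join the text of each paragraph, including text inside nested spans.
    return ["".join(p.itertext()) for p in root.iter(f"{{{TEXT_NS}}}p")]
```

Something like odf_paragraphs("letter.odt") would then give you a list of strings, one per paragraph, without any office suite installed.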
Microsoft is to software what Budweiser is to beer.
As of right now, you won't need to authenticate the music in 200 years, cause the copyright will have FINALLY run out. (Of course, I'm sure that will be extended by then, so I guess I am probably wrong.)
How about the DAAG? (The Dark Ages Alliance Guild)
He who knows best knows how little he knows. - Thomas Jefferson
Three Standards for the iMac-kings under the sky,
Seven for the HURD-lords in their halls of stone,
Nine for Microsoft Men doomed to die,
One for the Big Blue on his sparc throne,
In the Land of Sun where the Shadows lie,
One Standard to rule them all, One Standard to find them,
One Standard to bring them all and in the darkness unite them
In the Land of Sun, where the Shadows lie,
The One Truly Open Standard.
It takes a man to suffer ignorance and smile
Be yourself no matter what they say
in 500 years, an explorer finds a pristine cache of old cdrs. imagine, if you will, that the dye hasn't degraded to the point of unreadability. so the explorer places this wonderful find into an equally old cd rom drive, closes the door and hears the cd spin up, then pow! the disk explodes because old plastic is brittle. how does open format help this?
Enjoy Every Sandwich
DRM and Open Source.
How about you solve the Palestinian question first? It's easier.
Sure, you can argue that they aren't as "rich" as Word, PDF et al, but they're standard and they're open.
For that matter PDF is open too. No, this is basically a crusade against MS proprietary formats and I'm all for it. I have inherited Word files that already cannot be opened with any product available on the market today. Governments especially need to be encouraged to move all the data that belongs to the public into open file formats and one of the best ways to do that is to prescribe an open standard for government use.
Don't worry about other open formats, there will always be ways to convert them, but this is a good strategic move to stop the use of closed formats. One standard provides a unified front for everyone to collaborate on.
They will use whatever physical media is most appropriate at the time. Really, storage media is not the issue. You can just keep migrating data from one to the next with suitable error correction of course. Being able to interpret the data on it is a big deal.
Having said that, most strategies for dealing with long-term digital preservation also involve reasonably regular (e.g. every decade) migration of file formats to more current ones (except for strategies that involve emulation). Things like ODF are useful, because they are easier to migrate to and from a variety of formats. I don't think anyone in the digital preservation community really expects ODF files to be directly readable in 100 years' time - but it is much easier to transform data encoded in open standards into new formats than it is to convert ones held in proprietary formats.
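To make the media-migration half of this concrete, here is a rough sketch in Python of copying a file to new storage and checking that the copy is bit-identical. The function names sha256_of and migrate_with_fixity_check are invented for illustration. Note that this only detects corruption; it does not correct it, so real error correction (parity data, multiple copies) would have to sit on top of something like this.

```python
import hashlib
import shutil


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate_with_fixity_check(src, dst):
    """Copy src to dst and raise if the copy does not match the original.

    Hypothetical helper: detects a bad copy, does not repair it.
    """
    before = sha256_of(src)
    shutil.copyfile(src, dst)
    after = sha256_of(dst)
    if before != after:
        raise OSError(f"copy of {src} is corrupt: {before} != {after}")
    return after
```

Storing the returned digest alongside the archive lets the next migration, a decade on, confirm that nothing rotted in between.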
Am I the only one that finds this number a little low? They could probably bump it up to 99% and still be right. Assuming they aren't including other types of files Office doesn't do (like cartography files as an example), I would think .doc, .xls, .ppt and the like account for almost all document types. I know if I have a document in OpenOffice or something similar, getting it to .doc format usually isn't far off, since otherwise it seems like no one else can really use it.
In undeveloped countries, the consumer controls the market. In capitalist America, the market controls you.
Ah, but you see, this is all part of a diabolical scheme by the RI/MPAA to blackmail storage companies into ensuring that no storage medium except theirs will last long enough for something to become public domain, so that no one else will ever own it! Your CDs, DVDs, and hard drives will all die before that happens, and you will be forced to buy again, in HD format!
Bwahahahahaha- oh wait, not my scheme, nm.
The 'Net is a waste of time, and that's exactly what's right about it. - William Gibson
What we need is a storage solution, similar to HD-DVD/Blu-ray etc, that has DRM and is open source.
I guess you will say it's inevitable, or something, but...
It would be like having a diamond, covered in poop. And the open source is the diamond, in case you wondered.
Don't poop on your diamond.
Swedish plasma phys. PhD student; MSc EE; knows maths, programming, electronics; finance interest; seeks opportunities
The concept of a digital dark age assumes that only proprietary document formats and their corresponding applications are lost, while public knowledge (like W3C specs, encoding specifications, internet protocols) is preserved.
:D The data is safe!
Suppose that a very important document is formatted in Billy's proprietary document format v1.21, but there are no more copies of Billy's wordprocessor which was discontinued 250 years ago, so the format has to be reverse engineered.
Now what happened if Billy's wordprocessor instead used a public standard format whose specifications have passed through the generations since your great great great grandfather? Ah! Then you can use ZOffice v2500 to read the ancient document and it's compatible!
OpenDocument Alliance to fight looming 'digital Dark Age'
SearchOpenSource.com
http://searchopensource.techtarget.com/originalContent/0,289142,sid39_gci1170532,00.html
btw, if you have .doc files that you are unable to open but the files themselves are not damaged, try opening them with the latest oo.org version (2.0.2rc4 for now). if you still have problems, submit a report along with the file to oo.org developers. you might just get to your data :)
Rich
space in original URL was the problem, I retract the rant: http://searchopensource.techtarget.com/originalContent/0,289142,sid39_gci1170532,00.html
And you forgot to log in, so we can't even congratulate you.
Why can't we have Slashdot set up so that ACs can't post until a logged-in user has made the first post? That would put an end to the ridiculous "first post" trolls.
Je fume. Tu fumes. Nous fûmes!
Sure, you can argue that they aren't as "rich" as Word, PDF et al, but they're standard and they're open.
You just answered your own question. ODF is meant to provide a way of encapsulating all the metadata for office-style documents. Meaning it's specifically designed for word processor documents, spreadsheets, presentation graphics, etc. These are highly rich formats just like Word, PDF, etc.
IBM are a member of this, and Sun are either joining or are planning to join. IMO trusted computing is much more likely to herald a "digital dark age" than any existing proprietary document format, so the _real_ headline is "IBM and Sun want some of Microsoft's lucrative Office market, and think that pushing Open Document might get it".
I'm not going to change your sheets again, Mr. Hastings.
It's already been solved. It's called "paper". It's been used for 1000's of years, and if you take care of it properly, it can last a LONG time, always be readable, and is more open source than any of the FUD the OSS camp spews out. Paper. Written records. Hasn't been beat yet. Kinda' like all of the people thinking that they were re-inventing the wheel with e-books. We've all seen how well that has gone.
I don't respond to AC's.
Historically, the problem of the disappearance of "important" information has always existed, but some do not see the possible connection in a modern, digital world.
Some pieces of information did really exist long ago, but we only have references to the information, not the information itself. This could be from the lack of copies, or from suppression from religion or government.
In our digital world the same could happen with information, including software, books, music, and movies.
In an effort to absolutely control the information, different information industries attempt to control the media, using secrets, encryptions, and government control. These industries intend to profit from this information control as long as possible. The end of this control is assumed and mandated not to exist.
The problem is that at some point in the future the information could become non-valuable to these information industries. But currently, no mechanism exists such that these industries would be required or motivated to reveal the secrets or encryption mechanisms that would make the information useful. One cause could be that other information uses similar encryption or secrets, and the profit possibility of that information may be jeopardized.
The result is that unprofitable information may silently disappear, as whatever backups of the original expire.
Some examples would be:
A software company writes software, selling binaries only to the public. The copyright for the software is 100 years. Far before the end of the 100 years (perhaps 10 years), the original source is no longer kept by the company. So in the future, looking back at the state of software in the year 2000, perhaps there may be some pictures of "Windows XP", but it may be unclear what it did, as no source exists, and it's not really worth reverse engineering. Meanwhile some things called Linux and BSD did exist, and the complete information/source about these would still be available. History can really focus only on the known, not the hidden.
Similarly, assume that the recording and music industry come up with the "perfect/unbreakable" encryption. They spend much of their resources hiding anything close to raw digital information from the consumer. But these DRMed songs eventually become unpopular. Obviously the DRM mechanism could still not be revealed as they still use it for other songs. They have essentially subverted any copyright limits, to impose an infinite limit. After the point of dis-interest, the DRM songs/movies may just fade away. I suppose Creative Commons music/movies of the time may survive instead. Obviously these may not represent what was seen at the time.
I have a number of files that only Open Office would open and it does a much better job than Word, in general. I do have files it fails on as well, although I don't have the most recent version of OO these days as my Windows machine died and the mac version of OpenOffice is lagging far behind. I suppose I could fire up the GUI on NetBSD. Submitting most of the documents, however, is not an option due to the sensitive nature of the content. Thanks for the suggestion though.
Right, but we run the risk of requiring the broadcast of everything IN LARGE FONTS so that a little signal can penetrate the noise.
Sometimes you need an incorruptible, sane image to focus people's notice on what really matters.
Anybody got one?
Get thee glass eyes, and, like a scurvy politician, seem to see things thou dost not.--King Lear
Because without a DRM scheme to lock out people who might want to access our information in the future, everything we store might actually be of some value to people once the current standard becomes defunct!
My 3D Texturing Skinning work (under construction)
for a relatively recent version of oo.org you could also try one of the live cds.
unless the content is _extremely_ confidential (or embarrassing ;) ) you can file an issue and then provide the document to a developer when requested.
Rich
If you really want to create long term archives of documents to be seen by the eyes of explorers and historians hundreds and thousands of years from now, what you want is analog storage, that is, hardcopy that can simply be read without a computer having to interpret 1s and 0s. Microfiche is one of the best solutions today, as it's "hardcopy" but much smaller than using a printed page on paper. Rather than requiring electronics to read the data, you just need a simple projector-like device (a microfiche viewer would be preferable, and could be built with ease 100s of years from now, but you could also use an old-style "overhead" projector, a slide-projector built for microfiche cards, an automated microscope, etc.)
-- "I never gave these stories much credence." - HAL 9000
Question is, will you still be able to buy equipment to play your collection on at a realistic price? Also, will you still be alive in 200 years to listen to this collection?
-- Using the preview button since 2005
You are part of the Rebel Alliance and a traitor!
I have news for you: It doesn't matter what format the documents are in. If one format is unreadable, they all are. For example, if I can't read hello from 68 65 6C 6C 6F, then how in the hell would I understand hello from 3C 74 65 78 74 3E 68 65 6C 6C 6F 3C 2F 74 65 78 74 3E?
No way, either they'll be able to read it or they won't, it doesn't make any difference if we tag the text. I personally think sticking to ASCII would at least yield some possibility they could get the text back, because at least then the set of things you're deciphering is limited to the actual content, and not to some goofy markup that they certainly couldn't care less about (we have 30,000 year old cave drawings... so just draw it on a rock if you want the future to have a picture of it).
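For what it's worth, the two byte strings above decode like this. A quick sketch in Python, assuming you still know the ASCII table, which is exactly the knowledge in question (the helper name decode_hex is made up):

```python
def decode_hex(hex_bytes):
    """Turn a string of space-separated hex byte values into ASCII text."""
    return bytes.fromhex(hex_bytes).decode("ascii")


# The bare text from the post.
plain = decode_hex("68 65 6C 6C 6F")

# The same text wrapped in markup, as in the second example.
tagged = decode_hex("3C 74 65 78 74 3E 68 65 6C 6C 6F 3C 2F 74 65 78 74 3E")

print(plain)   # hello
print(tagged)  # <text>hello</text>
```

Both go through the same byte-to-character mapping, so the markup adds eleven characters of overhead but no new decoding step.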
What you really have to consider, is which outcome you personally prefer. I prefer OpenDocument, because I'd like, long term, for us all to be able to exchange information freely. If that means that IBM and Sun sell a bunch of software, that I have an option to use in a competitive market, and make some money, good luck to them.
Absolutely, I agree with that sentiment -- just kind of funny that this is being reported as "news", when it's in effect, a press release. "Microsoft Competitors Endorse Non-Microsoft Solution." Of course it's partisan, and of course they're "anti-Microsoft".
I don't have any issue with OpenDocument becoming the dominant standard, because I'm a fan of transparency & openness... and in fact, I wish them well in their endeavour... but let's also call a spade a spade, and admit that, if 90% of government documents are in Microsoft format, then Microsoft will stay in business, at least in a size large enough to support Office / Word / Powerpoint, etc, simply because it's the "de facto" standard. Even if Windows, IE, MSN Messenger, and every other thing you can think of that Microsoft produces blows up and dies a flaming, spectacular death, Office products will continue to exist, even if they're sold off to another company to maintain & support.
If they ain't agin MicroSoft, they ain't with me ;-)
.. paranoid crackpot leftover from the days of Amiga.
Big Vendors want complex data formats that drive sales.
Libraries want simple easy to format standards that do the job.
IMHO html was that 10 years ago. But the Big Vendors and the like added all that shiny formatting, which added little to no value to the actual documents being published. So now html has a mess of unneeded glitter.
If you take the manufacturing process for Blu-Ray, which has higher capacity, and substitute Theora for the proprietary Microsoft codec, then you have a winner. Skip the DRM. It's not viable in the long run and only hassles your honest customers in the short run.
Beta is broken and the link to classic doesn't work. Stop wasting our time or there won't be anybody left here.
LaTeX gets its usefulness and power from packages. Unless you want to standardise on a given reference set of packages, it can't be used sensibly for archival purposes, because you'll have to store all possible packages in all versions along with your data. If you're willing to do that, you could run Word in an emulator, too.
A set of LaTeX packages is much more Free than a copy of Microsoft Word and a copy of Bochs, and LyX is more useful than Word run in Bochs.
There is no universal method for package versioning
As I understand the LaTeX license, each package file name refers to one published version and only that version. See FSF's comments about the LPPL version 1.2.
LaTeX documents are really difficult to parse on a computer
If you want more semantics in your TeX so that you can easily produce both printed and electronic documentation, give Texinfo a try. It'll do until the CSS paged media recommendation becomes widely implemented.
All this talk about the One True Format(tm) is nice, and I'm heartily in favor of using OpenDocument over proprietary formats, but not to prevent a Digital Dark Age.
The Digital Dark Age people talk about is not about file formats. Mostly, it's about data storage and retention. Most of what historians/archeologists know about entire civilizations and time periods comes not from the official documents, but from the personal, off-the-cuff type stuff. Historians love reading journals, diaries and personal letters, and archeologists glean the most information from household and personal items. These are the things that give you insight into the *people* who lived in that age, and how the political events of the times (which are generally well preserved) were perceived.
However, most of our personal letters are now emails, which regularly get deleted, lost, blown away in a formatting, or simply forgotten about and tossed with the computer when we upgrade. Our journals and diaries are now blogs, which are subject to the same problems. In 2500 years when some archaeologist digs up your laptop, he must first decipher the machine to find where the data is stored, then extract the data, then decode it and translate it into his own language, before he can even start working on the meaning and significance of your emails, all of which contain complicated headers and multiple encodings (text, HTML, etc.). Contrast this with his finding a paper letter... the machine deciphering and data extraction is already done. All he has to do is decode the symbols and translate the language.
Data about our society will exist, but most of it will be in a digital form, and this places lots of extra burden on the person trying to understand the data. As a result, there will be many more gaps in our history, because the data is much harder to decipher.
Keeping our data in open formats is not really the issue; they still rely on conventions such as ASCII, XML, and PNG, that may or may not be lost. The truth is that the data only exists as 1s and 0s, and whether the data is in Microsoft Word format or OpenDocument format, it will still need to be deciphered and decoded. If all knowledge of ASCII/Unicode mapping and 32-bit RGBA color encoding is lost, does it matter if the XML schema of the format is documented somewhere in some different string of 1s and 0s?
What the OpenDocument format solves is the problem of near-term data access. In relatively short time spans, say 100 years or so, the OpenDocument will still be readable long after all proprietary formats have been abandoned. For this reason, OpenDocument should be used to keep documents available long after the company that provided the creation software has gone under. This is a noble and very valid goal, but let's not confuse it with the larger issue of the "Digital Dark Age."
For security, the MD5 hash of this message and sig is 09f911029d74e35bd84156c5635688c0.
for it to be truly open and future-accessible, it would have to specify everything from the bottom hardware level up - from how bits are encoded on whatever the storage medium is, through to file system layout, and only then starting to talk about the contents of the data itself.
The "Compact Disc Recordable" and "ISO 9660" formats are already fairly strictly specified, and abridged versions of the specs could be etched onto a durable substrate and included in the time capsule. Trouble is that these specifications are written in English, and future generations may not be able to decipher the English language.
From the article: "The OpenXML format is supported by Intel, Apple Computer, Toshiba, BP and the British Library, among others, Yates said."
OK, so do we like or hate Apple today? They're obviously fellating MS in order to continue to have versions of Office created for them, no?
You would still need to have a key for decrypting content that was secret. When the source code is known, it becomes easier to determine where the key is stored in memory and how it is protected (if it even is protected at all).
I won't go so far as to say open source is completely incompatible with DRM in a technical sense, but it's the closest you can possibly come to it. As far as incompatible in a philosophical sense... well, if you can't figure that out maybe you shouldn't worry about philosophy.
---
According to the latest ruleset, this post should be modded as Vorpal Flamebait +5.
the OSS crowd will consider it "teh evi1" without giving a second glance or trying to implement it.
First Glance:
OpenXML is patent-encumbered and Microsoft's covenant not to sue specifically and deliberately excludes revisions to and future versions of the standard from protection against being sued by Microsoft. This means that if any OSS developer attempts to fix any bug or security problem in it, Microsoft could sue for patent infringement.
Wake me up when it's worth having that second glance.
"I've got more toys than Teruhisa Kitahara."
I used to get CAD drawings sent to me as PDFs, so that I could manufacture parts. I would end up redrawing them, when a DXF CAD file could more easily just be put straight into the machine. This actually proved safer from a manufacturing standpoint, because many engineers' dimensions didn't match what they actually had drawn.. perhaps they wanted to shorten a part, and instead of redrawing it, they just changed the dimension (lazy).. now if I just made the part from their drawing sent straight to the machine it would be wrong, and I would be wrong, because you manufacture "to print", and the drawing's dimensions are what count. If you are sent a PDF, there is no manipulating of dimensions possible, it prints out as it was sent, and that's it. CAD files on the other hand can be wrong, and they can be edited... and then you end up in a "blame-game" as to who messed it up... PDFs are much safer.
waiting for ad.doubleclick.net
When copyright runs out the files will still be encrypted and only playable with the properly-licenced software. That's just one of the ways that DRM shifts the balance of power far too strongly on the side of media corporations and why I don't consider it a good deal to invest in DRM media.
Rosettafiche?
So what happens when the counterparts to hieroglyphic Egyptian, demotic Egyptian, and ancient Greek all end up as forgotten languages?