Slashdot Mirror


OpenDocument Alliance to Fight Digital Dark Age

OSS_ilation writes "A consortium of vendors and academic institutions -- including IBM, Sun Microsystems and the American Library Association -- has announced today that they are forming the OpenDocument Alliance as part of an effort to promote open file standards worldwide. The group will support the one truly open standard file format, OpenDocument, which is an XML-based file format used for saving and exchanging editable office documents such as text documents, spreadsheets and presentations. Sun's Simon Phipps said he believed ODF would allow future generations to view all of today's digital docs and prevent a digital Dark Age from occurring."

18 of 185 comments

  1. THE one truly open format? by b1t+r0t · · Score: 2, Insightful
    I believe they call it "text/plain". Oh, you wanted formatting? Then try "text/html".

    There is more than one "truly open format", so using the word "the" is a bit pretentious.

    --
    "Open source is good." - Steve Jobs
    "Open source is evil." - Microsoft
    1. Re:THE one truly open format? by owlstead · · Score: 1, Insightful

      If they are too stupid to crack ASCII, or even UTF-8 or UTF-16, to hell with them. They would not be able to get the bits off the drive if they are *that* stupid. Things like NTFS, which is fully documented only at Microsoft (*if* it is fully documented), might pose more of a problem. Or encrypted drives etc.

      But I don't think that a hard drive will hold that much data over thousands of years anyway. Flash maybe...

    2. Re:THE one truly open format? by cwgmpls · · Score: 2, Insightful
      HTML is certainly not very standard and some implementations of it aren't even very open. HTML is basically a bastardization of SGML, while XML is a well-defined, strict subset of SGML. You can write HTML in XML if you want to make it open and standard, but you'd still be writing in XML.

      Open Document Format, on the other hand, is a strict XML format so it is both open and standard.

      Sure, plain text is open and standard too, but most applications require a more structured document than you can get with plain text.
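
      To make the "strict" part concrete, here is a rough sketch using only Python's standard library. The helper names (is_well_formed_xml, TagCollector) and the sample strings are made up for illustration. It shows that an XML parser refuses markup a tolerant HTML parser will happily chew through; it says nothing about whether the markup is valid against any particular schema.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


def is_well_formed_xml(text: str) -> bool:
    """Return True only if the text parses as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


class TagCollector(HTMLParser):
    """Tolerant HTML parser that just records the start tags it sees."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


# Hypothetical sample markup: the first has unclosed tags, the second does not.
sloppy = "<p>unclosed paragraph<br>second line"
strict = "<p>closed paragraph<br/>second line</p>"

collector = TagCollector()
collector.feed(sloppy)  # the HTML parser does not complain about missing end tags

sloppy_ok = is_well_formed_xml(sloppy)  # rejected by the XML parser
strict_ok = is_well_formed_xml(strict)  # accepted by the XML parser
```

      The point is only that any third-party XML parser will agree on what counts as a parseable document, which is what you want from an archival format.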

  2. Dark age already upon us by saskboy · · Score: 4, Insightful

    The dark age has already happened several times. There are oodles of media formats from the 70's and on that are no longer readable today on a standard computer. Heck, new computers don't even come with floppy drives for 3.5" floppies. I hope they have a strategy to tackle media problems along with file format compatibility, because the medium is the message.

    --
    Saskboy's blog is good. 9 out of 10 dentists agree.
  3. I truly wish them luck by Bullfish · · Score: 4, Insightful

    I really do wish them luck. The thing is the "document" and "content" companies are going to fight like hell to expand proprietary formats as they ultimately look to the MS word format, the sheer number of copies of MSOffice sold, and see the dollar signs available by controlling the format and making everyone dance to their tune. Anyone who remembers the fiasco that occurred when MSOffice 97 wasn't very compatible with the previous version will also remember that companies simply shelled out for converters etc until MS issued a patch. They had no choice.

    While packages like OpenOffice exist, and have for a while, they are perceived as "not being ready for prime time" by most businesses. The only advantage many see is the ability to save as PDF (another proprietary format). For ODF to take hold, governments and some very large publishing concerns are going to have to adopt it. Else, not much will change and the march towards increasingly proprietary formats will continue.

    1. Re:I truly wish them luck by tepples · · Score: 2, Insightful

      Adobe owns the PDF standard outright, and the thing about proprietary formats is the originator can change the spec anytime they want without any input from anyone.

      But that doesn't prevent Free apps from specifying that they read and write PDF 7.0. On the other hand, do you think Free specs don't change? Look at the rewrite-from-scratch that is the XHTML 1.1 to XHTML 2.0 transition.

  4. Digital Dark Age My Ass by brunes69 · · Score: 4, Insightful

    The problem is not that there is no long-term storage. The problem is that we produce more useless data than ever before.

    Really, who gives a f*ck about your 1.25 TB of crap? Or mine? We're just two ants in the anthill. You really think you can look up any substantial amount of information on someone who lived 200 years ago? Hell, try *50* years ago. Aside from public records like tax information and housing details, and maybe some family photos, you are likely to come up with bupkis, unless that person was famous.

    It's going to be no different 200 years from now, and frankly I don't see the problem with that. Only in the past decade has everyone gotten this weird urge to try and archive and record every unimportant detail of their daily lives (see MySpace.com, blogging, etc). What they don't realize is no one really gives a crap today, and they sure as hell won't give a crap in 100 years.

    Historians want to know about culture as a whole, not in bite-sized chunks. Aside from the major move-makers (politicians, *some* celebrities), historians won't be any more interested in people's musings on shit like Paris Hilton than I am.

    1. Re:Digital Dark Age My Ass by Random_Goblin · · Score: 4, Insightful

      Really, who gives a f*ck about your 1.25 TB of crap? Or mine? We're just two ants in the anthill.

      Actually, judging by what modern archeologists find really interesting, it is exactly what future archeologists will be interested in.

      The little bits of detritus that make up individuals' lives are far more interesting than the "big picture" history, which is usually heavily loaded with spin and therefore a bit of a chore: you have to work out what was actually happening as opposed to what people wanted you to believe was happening.

      The fact that people ARE musing on shit like Paris Hilton IS going to be interesting to future historians, because it gives an insight into how people were living their lives and what was important to them at the beginning of the 21st century.

      All of those pictures from our camera phones, and whining LiveJournals, may not be a terribly flattering picture of our lives, but from an archeological point of view, it's exactly the sort of evidence you want.

    2. Re:Digital Dark Age My Ass by tgd · · Score: 3, Insightful

      Actually things like personal papers and photographs from random people are immensely valuable to historians, and priceless to family.

      I have nearly a hundred glass negatives of photos from my family a hundred years ago.

      A hundred years from now, it's unlikely any of the 500 gig of digital photos or DV videos I have in there will be available to anyone. Hell, I'm worried that a couple of bad failures in quick succession could mean the same for myself or my future children in a lot less time!

      You clearly are not a historian or a history buff... because I don't know any that would make such a blatantly ridiculous statement that they are not interested in personal writings and other forms of media.

  5. Re:Wow! by lxs · · Score: 2, Insightful

    I like the term. Literally the term 'dark' only refers to 'lack of information', but it has great connotations of doom and gloom that make for great PR.

    If it weren't for the end-of-civilisation hype, most Y2K bugs would have remained unfixed until early 2001 through bureaucratic laxity, resulting not in the end of the world, but in a major headache for many companies.

    Sometimes you need a catchy image to get people to take notice.

  6. Re:The one truly open standard file format? by 99BottlesOfBeerInMyF · · Score: 2, Insightful

    Sure, you can argue that they aren't as "rich" as Word, PDF et al, but they're standard and they're open.

    For that matter, PDF is open too. No, this is basically a crusade against MS proprietary formats and I'm all for it. I have inherited Word files that already cannot be opened with any product available on the market today. Governments especially need to be encouraged to move all the data that belongs to the public into open file formats and one of the best ways to do that is to prescribe an open standard for government use.

    Don't worry about other open formats, there will always be ways to convert them, but this is a good strategic move to stop the use of closed formats. One standard provides a unified front for everyone to collaborate on.

  7. Re:what are the comparisons: openxml vs. open doc? by 99BottlesOfBeerInMyF · · Score: 3, Insightful

    Can anyone clear up exactly what OpenXML is? When I google it, I get vague references leading me to believe OpenXML is more of a container, and not Microsoft's specific document format. So, this sounds like another canard from Microsoft with the claim "open" obfuscating what is probably not.

    MS is offering to license this format to people under particular terms and parts of this format are indeed binaries embedded in XML. It is also patent encumbered. The main objections to the licensing include restrictions making it unimplementable by GPLed programs and licensing for old versions that expires as soon as MS releases a new version, thus providing no guarantee that future generations will be able to legally read the files. Basically it is MS trying to confuse the issue and claim their format is just as open as the Open Document, even though in reality it does not confer the benefits Open Document does.

  8. I think GP misunderstands 'digital dark age' by Spy+der+Mann · · Score: 2, Insightful

    The concept of a digital dark age assumes that only proprietary document formats and their corresponding applications are lost, while public knowledge (like W3C specs, encoding specifications, internet protocols) is preserved.

    Suppose that a very important document is formatted in Billy's proprietary document format v1.21, but there are no more copies of Billy's wordprocessor which was discontinued 250 years ago, so the format has to be reverse engineered.

    Now what happens if Billy's wordprocessor instead used a public standard format whose specifications have passed through the generations since your great great great grandfather? Ah! Then you can use ZOffice v2500 to read the ancient document and it's compatible! :D The data is safe!
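
    As a rough illustration of why a published format helps: an OpenDocument text file is a ZIP package whose main body lives in content.xml, so generic tools can get at the text without the original word processor. The sketch below builds a toy package in memory and reads it back with Python's standard library. make_toy_odt and extract_paragraphs are hypothetical helper names, and the toy package is deliberately incomplete (a real ODF file also carries a manifest, styles and more), so treat it as a demonstration of the idea and not as a conforming writer.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Namespace URIs published in the OpenDocument specification.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def make_toy_odt(paragraphs):
    """Build a stripped-down, ODF-like ZIP package in memory (illustration only)."""
    body = "".join(f"<text:p>{escape(p)}</text:p>" for p in paragraphs)
    content = (
        f'<office:document-content xmlns:office="{OFFICE_NS}" '
        f'xmlns:text="{TEXT_NS}">'
        f"<office:body><office:text>{body}</office:text></office:body>"
        "</office:document-content>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", "application/vnd.oasis.opendocument.text")
        zf.writestr("content.xml", content)
    return buf.getvalue()


def extract_paragraphs(odt_bytes):
    """Pull paragraph text out of content.xml using only generic ZIP and XML tools."""
    with zipfile.ZipFile(io.BytesIO(odt_bytes)) as zf:
        root = ET.fromstring(zf.read("content.xml"))
    return ["".join(p.itertext()) for p in root.iter(f"{{{TEXT_NS}}}p")]


ancient_document = make_toy_odt(["Dear descendants,", "1 < 2 is still true."])
recovered = extract_paragraphs(ancient_document)
```

    Nothing here depends on Billy's word processor still existing, only on ZIP and XML being understood.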

  9. Solved, already by DogDude · · Score: 3, Insightful

    It's already been solved. It's called "paper". It's been used for 1000's of years, and if you take care of it properly, it can last a LONG time, always be readable, and is more open source than any of the FUD the OSS camp spews out. Paper. Written records. Hasn't been beat yet. Kinda' like all of the people thinking that they were re-inventing the wheel with e-books. We've all seen how well that has gone.

    --
    I don't respond to AC's.
  10. Dark Information by GodWasAnAlien · · Score: 4, Insightful

    Historically, the problem of the disappearance of "important" information has always existed, but some do not see the possible connection in a modern, digital world.

    Some pieces of information did really exist long ago, but we only have references to the information, not the information itself. This could be from the lack of copies, or from suppression from religion or government.

    In our digital world the same could happen with information, including software, books, music, and movies.

    In an effort to absolutely control the information, different information industries attempt to control the media, using secrets, encryptions, and government control. These industries intend to profit from this information control as long as possible. The end of this control is assumed and mandated not to exist.

    The problem is that at some point in the future the information could become non-valuable to these information industries. But currently, no mechanism exists such that these industries would be required or motivated to reveal the secrets or encryption mechanisms that would make the information useful. One cause could be that other information uses similar encryption or secrets, and the profit possibility of that information may be jeopardized.

    The result is that unprofitable information may silently disappear, as whatever backups of the original expire.

    Some examples would be:

    A software company writes software, selling binaries only to the public. The copyright for the software is 100 years. Far before the end of the 100 years (perhaps 10 years), the original source is no longer kept by the company. So in the future, looking back at the state of software in the year 2000, perhaps there may be some pictures of "Windows XP", but it may be unclear what it did, as no source exists, and it's not really worth reverse engineering. Meanwhile, things called Linux and BSD did exist, and the complete information/source about these would still be available. History can really focus only on the known, not the hidden.

    Similarly, assume that the recording and music industry come up with the "perfect/unbreakable" encryption. They spend much of their resources hiding anything close to raw digital information from the consumer. But these DRMed songs eventually become unpopular. Obviously the DRM mechanism could still not be revealed as they still use it for other songs. They have essentially subverted any copyright limits, to impose an infinite limit. After the point of dis-interest, the DRM songs/movies may just fade away. I suppose Creative Commons music/movies of the time may survive instead. Obviously these may not represent what was seen at the time.

  11. Re:not that I would be against.. by rxmd · · Score: 4, Insightful
    But there are already such formats. I.e. latex.
    LaTeX is in no way an open document format suitable for storing, let alone archiving data, for a variety of reasons:

    • Firstly, LaTeX gets its usefulness and power from packages. Unless you want to standardise on a given reference set of packages, it can't be used sensibly for archival purposes, because you'll have to store all possible packages in all versions along with your data. If you're willing to do that, you could run Word in an emulator, too.
    • There is no universal method for package versioning, for resolving package dependencies and for maintaining backward and forward compatibility between package versions. This creates lots of problems when you use older documents on a newer TeX system. An example was the rather popular geometry package for easier page geometry setup where version 3 of the package broke compatibility with older versions. The author added a simple switch to make the new version behave like the old ones, but you had to add the switch to the \usepackage declaration to make your documents compile. If you have to modify your documents to keep them useable, you're missing the point of a document archive.
    • There is no consistent way of using Unicode in TeX documents. Basically, with the existing solutions such as Lambda/Omega, UTF-8 inputenc, ucs.sty and proprietary packages, it's "choose two out of: compatibility with most LaTeX packages, compatibility with the Unicode standard, large character repertoire". It's somehow useable, but not really well enough to be called universal.
    • LaTeX documents are really difficult to parse on a computer, making them even more ill-suited for archival storage on a large scale. Try talking to the developers of the TeX->LyX conversion scripts one day. Someone stated that the only good TeX parser is TeX itself. A good archival document format should be parseable using third-party tools.


    LaTeX is a typesetting system. It's designed for getting a nicely formatted PDF or PostScript file out of a source file that you can alter and modify on the spot. Typesetting is what it does really well. If you try to shove and bend it into other roles, it starts to get kludgy, especially when it concerns data exchange between large numbers of users with inconsistent package versions, automated processing of LaTeX documents with third-party tools or heavy use of international character sets.
    --
    As a state gets corrupt, its laws multiply; the most corrupt states have the most numerous laws. (Tacitus, Annales 3:27)
  12. Re:I declare SHENANIGANS! by 16K+Ram+Pack · · Score: 2, Insightful
    I often say this on slashdot, but in the end, every company is out to shape things in their favour. Corporations are not really altruistic by nature. Even if the CEO wants to do something just because it's cool, he has to justify himself to the board.

    What you really have to consider, is which outcome you personally prefer. I prefer OpenDocument, because I'd like, long term, for us all to be able to exchange information freely. If that means that IBM and Sun sell a bunch of software, that I have an option to use in a competitive market, and make some money, good luck to them.

  13. OpenDocument not the answer by AeroIllini · · Score: 3, Insightful

    All this talk about the One True Format(tm) is nice, and I'm heartily in favor of using OpenDocument over proprietary formats, but not to prevent a Digital Dark Age.

    The Digital Dark Age people talk about is not about file formats. Mostly, it's about data storage and retention. Most of what historians/archeologists know about entire civilizations and time periods comes not from the official documents, but from the personal, off-the-cuff type stuff. Historians love reading journals, diaries and personal letters, and archeologists glean the most information from household and personal items. These are the things that give you insight into the *people* who lived in that age, and how the political events of the times (which are generally well preserved) were perceived.

    However, most of our personal letters are now emails, which regularly get deleted, lost, blown away in a formatting, or simply forgotten about and tossed with the computer when we upgrade. Our journals and diaries are now blogs, which are subject to the same problems. In 2500 years when some archaeologist digs up your laptop, he must first decipher the machine to find where the data is stored, then extract the data, then decode it and translate it into his own language, before he can even start working on the meaning and significance of your emails, all of which contain complicated headers and multiple encodings (text, HTML, etc.). Contrast this with his finding a paper letter... the machine deciphering and data extraction is already done. All he has to do is decode the symbols and translate the language.
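
    To give a feel for the layers involved, here is a small sketch with Python's standard email package. The message is invented for illustration (address, subject and body are all made up), and read_message and b64 are hypothetical helper names. Even this tiny example needs a header parser, an encoded-word decoder, a transfer-encoding decoder and a character set before anyone can read two short strings.

```python
import base64
from email import message_from_string
from email.header import decode_header, make_header


def b64(text):
    """Base64-encode UTF-8 text, as a mail client might before sending."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


# Invented sample content for illustration.
subject_text = "caf\u00e9 tonight?"
body_text = "See you at eight."

raw_message = "\n".join([
    "From: someone@example.org",
    "Subject: =?utf-8?b?" + b64(subject_text) + "?=",  # MIME encoded-word
    "MIME-Version: 1.0",
    "Content-Type: text/plain; charset=utf-8",
    "Content-Transfer-Encoding: base64",
    "",
    b64(body_text),
    "",
])


def read_message(raw):
    """Undo each layer: header syntax, encoded words, base64, then the charset."""
    msg = message_from_string(raw)
    subject = str(make_header(decode_header(msg["Subject"])))
    charset = msg.get_content_charset() or "ascii"
    body = msg.get_payload(decode=True).decode(charset)
    return subject, body


subject, body = read_message(raw_message)
```

    Every one of those steps is a convention the future archaeologist has to rediscover before getting to the actual words.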

    Data about our society will exist, but most of it will be in a digital form, and this places lots of extra burden on the person trying to understand the data. As a result, there will be many more gaps in our history, because the data is much harder to decipher.

    Keeping our data in open formats is not really the issue; they still rely on conventions such as ASCII, XML, and PNG, that may or may not be lost. The truth is that the data only exists as 1s and 0s, and whether the data is in Microsoft Word format or OpenDocument format, it will still need to be deciphered and decoded. If all knowledge of ASCII/Unicode mapping and 32-bit RGBA color encoding is lost, does it matter if the XML schema of the format is documented somewhere in some different string of 1s and 0s?
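
    A minimal sketch of that point in Python: the same bytes read as different text depending on which character mapping the reader assumes. The sample word and the decode_with helper are just illustrative choices.

```python
def decode_with(raw: bytes, encoding: str) -> str:
    """Interpret the same bytes under a given character encoding."""
    return raw.decode(encoding)


# Five bytes on disk: 63 61 66 c3 a9
raw = "caf\u00e9".encode("utf-8")

as_utf8 = decode_with(raw, "utf-8")      # the text the writer intended
as_latin1 = decode_with(raw, "latin-1")  # same bits, wrong mapping, garbled text

# Plain ASCII has no mapping at all for bytes above 0x7f.
try:
    decode_with(raw, "ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
```

    The bits survive in all three cases; what gets lost is the agreement about what they mean.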

    What the OpenDocument format solves is the problem of near-term data access. In relatively short time spans, say 100 years or so, the OpenDocument will still be readable long after all proprietary formats have been abandoned. For this reason, OpenDocument should be used to keep documents available long after the company that provided the creation software has gone under. This is a noble and very valid goal, but let's not confuse it with the larger issue of the "Digital Dark Age."

    --
    For security, the MD5 hash of this message and sig is 09f911029d74e35bd84156c5635688c0.