Slashdot Mirror


National Archive File Format Time Bomb

geordie_loz writes "The BBC is reporting that the UK National Archive is warning that old file formats are a 'ticking time-bomb': data will be lost because newer versions of software cannot read them, or because the software no longer exists at all. More surprisingly, Microsoft has offered a solution via the OOXML format."

8 of 233 comments

  1. Use SGML by Morgaine · · Score: 5, Funny

    It predates Moses, and is quite likely to survive the heat death of the universe.

    --
    "The question of whether machines can think is no more interesting than [] whether submarines can swim" - Dijkstra
  2. The big lie... by advocate_one · · Score: 5, Informative
    they keep repeating this everywhere they go... "Open XML"... their format is not Open... it's closed off with licensing and other restrictions... all the really good stuff in the specification has been obfuscated out and hidden behind indirections to the behaviour of legacy apps that only Microsoft know the real ins and outs of... not only that, there's still an easy means for them to merely use XML as a wrapper for binary blobs...

    to give it its proper name, the format is Microsoft's "Office Open XML"; they deliberately went to a lot of trouble to pick a name that's as easy to confuse as possible with OpenOffice

    --
    Donald 'Duck' Dunn: We had a band powerful enough to turn goat piss into gasoline.
  3. Obviously... by Colin+Smith · · Score: 5, Funny

    If you have a problem with proprietary formats you go to Microsoft to solve it for you... The word "DOH" springs to mind.

    Oh yeah, their solution? Virtualised Windows 3.1. And obviously in 15 years you'll have to virtualise Vista in order to run the Win3.1 virtual machine to run Word. And Microsoft will be paid a license for each application and level of virtualisation.

    You couldn't make this stuff up.

    --
    Deleted
  4. Re:MS should not own the standard by dvice_null · · Score: 5, Informative

    There is no such thing as Open Office format. Perhaps you mean OpenDocument Format, which is used by several different applications ( http://en.wikipedia.org/wiki/List_of_applications_supporting_OpenDocument ), including OpenOffice.org.

  5. surprise? by Tom · · Score: 5, Insightful

    What's surprising about that? Someone in MS Spin Control and Public Relations is worth his salary. The story could have exploded into an "avoid MS products if you want your data accessible some years down the road" fiasco (we all know that MS is the worst offender when it comes to changing the document formats, usually undocumented). Instead, it was turned into another push for their next format.

    Brilliant.

    "What, the shit I sold you yesterday stinks? Try this new shit, it's great and it has none of the problems of the old one."

    That's what you hire PR people for.

    --
    Assorted stuff I do sometimes: Lemuria.org
  6. How about some *helpful* suggestions by FreudianNightmare · · Score: 5, Insightful

    Rather than bitching about Microsoft making an offer of 'help' which is just thinly disguised marketing (I mean, come on, par for the course, no?), could we get a discussion about real solutions? I know MS bashing is fun, but come on, we do it on just about every other thread... let's have a day off.

    To kick things off here's one:

    Keep EVERYTHING in the simplest possible format. ASCII would seem sensible, since it's the content we care about, not the formatting (although that wouldn't help our Asiatic brethren much). Then keep decent records of HOW you can read that format, with examples of the software and hardware. Do this bit on PAPER. V. Tough Paper (or rock, or plastic or whatever). Update the explanations every other year, to put them in language the next gen will understand. Maybe also have instructions on how to translate the simple format to less simple things.

    I guess, basically, it's a case of KISS and then *provide a persistent and regularly updated 'Rosetta Stone'* for latecomers to work from.
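
    For what it's worth, checking that something really is plain ASCII before it goes into the archive is only a few lines. A rough sketch in Python; the function name is made up for illustration, and "ASCII" here is assumed to mean strictly 7-bit bytes:

```python
def first_non_ascii(data: bytes) -> int:
    """Return the offset of the first byte outside 7-bit ASCII, or -1 if there is none."""
    for offset, byte in enumerate(data):
        if byte > 0x7F:  # anything above 127 is not ASCII
            return offset
    return -1


print(first_non_ascii(b"plain text"))            # -1
print(first_non_ascii("café".encode("utf-8")))   # 3
```

    Reporting the offset rather than a plain yes/no makes it easier to see which character needs transliterating, which is exactly where the non-English-text problem shows up.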

    As a side branch, this kind of reminds me of discussions I read about a while back of how to warn future generations about Nuclear Waste dumps (y'know, the really nasty stuff with half-lives in the thousands of years range). I don't think anyone ever came up with a decent answer....

    --
    'Speak softly and carry a beagle'
  7. 1/2 pentabyte = 20 bits? by benhocking · · Score: 5, Funny

    Fine, then you get to be the schmuck who has to organize, sort, label and store about 1/2 a pentabyte of information on paper.

    A pentabyte is 5 bytes, right? How hard is it to store 20 bits on paper? ;)

    (I assume petabyte (10^15 or 2^50, depending on convention) is the word you're looking for.)
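
    The arithmetic, for anyone who wants to check it. This is just a sketch: the constant names are invented here, and "pentabyte" is read literally as 5 bytes, which is the joke rather than a real unit.

```python
# "Pentabyte" read literally as penta = 5 bytes (the joke, not a real unit).
BITS_PER_BYTE = 8
PENTABYTE_BYTES = 5

half_pentabyte_bits = PENTABYTE_BYTES * BITS_PER_BYTE // 2  # 40 bits / 2

# A petabyte under the two conventions mentioned above.
PETABYTE_DECIMAL = 10**15  # SI prefix "peta"
PETABYTE_BINARY = 2**50    # binary convention

print(half_pentabyte_bits)                 # 20
print(PETABYTE_BINARY / PETABYTE_DECIMAL)  # ratio between the two conventions
```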

    --
    Ben Hocking
    Need a professional organizer?
  8. Re:Doesn't matter. by Bazzargh · · Score: 5, Interesting

    And being a government, these files are INCREDIBLY important.

    Why haven't they been converted? Really, all their DIGITAL archives should be in a single format by now.


    No, they shouldn't. You usually want 3 formats:
    - the original format of the document. Whatever format whichever idiot happened to write (or record, or video) it in, you absolutely want the original in your records.
    - a searchable format (eg OCR'd text from scanned image docs)
    - a rendered format. (eg an image or pdf, or svg - something open enough that you can continue to show how the doc would have looked). The appropriate rendered format varies. Paper is not an appropriate format for storing CCTV footage, for example ;)
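
    As a data structure, the three formats above might look something like this. A minimal sketch only: the class, field names and example paths are hypothetical, not anything the archives actually use.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ArchivedRecord:
    """One archived item, kept in up to three forms (hypothetical layout)."""
    original_path: str                      # the file exactly as received
    searchable_path: Optional[str] = None   # e.g. OCR'd text
    rendered_path: Optional[str] = None     # e.g. PDF or image of how it looked

    def missing_forms(self) -> List[str]:
        """Name the derived forms that have not been produced yet."""
        missing = []
        if self.searchable_path is None:
            missing.append("searchable")
        if self.rendered_path is None:
            missing.append("rendered")
        return missing


record = ArchivedRecord("bids/tender.doc", searchable_path="bids/tender.txt")
print(record.missing_forms())  # ['rendered']
```

    The original is the only required field, since the other two can in principle be regenerated from it, but not the other way round.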

    If you're very, very lucky the original is both searchable and viewable; like, say, HTML. It gets more complicated too, because you often want to store a redacted copy of the document (think of the Onion story 'CIA realise they've been using black highlighter pen all these years') and you want that searchable too, so you have to keep a redacted searchable format too... and of course, some of the records are on actual paper. Have you started worrying about the fading inks in the originals yet?

    BTW you can't restrict the format of the original. Consider an email from a corporate bidding for a govt contract, with attachments. They need to keep those.

    - Mr. E

    PS, posting anon because I have dealings with the national archives, and don't want to speak for my company.