Slashdot Mirror


National Archive File Format Time Bomb

geordie_loz writes "The BBC is reporting that the UK National Archive is warning of old formats being a 'ticking time-bomb' where data is going to be lost because of incompatibility in newer versions of software, and software not existing at all. More surprisingly, Microsoft has offered a solution via the OOXML format."


  1. Tagging beta... by Rufty · · Score: 2, Insightful

    ITSATRAP!

    --
    Red to red, black to black. Switch it on, but stand well back.
  2. Idiots by suv4x4 · · Score: 4, Insightful

    The BBC is reporting that the UK National Archive is warning of old formats being a 'ticking time-bomb' where data is going to be lost because of incompatibility in newer versions of software, and software not existing at all. More surprisingly, Microsoft has offered a solution via the OOXML format.

    There are so many idiots in this state of affairs:

    1. the idiots which decided to build huge archive with undocumented proprietary format
    2. idiots which believe they can't find even a single copy of the software they need
    3. idiots who didn't store a single copy of the software that reads the format, together with the archive (not very far from obvious, is it).
    4. idiots who want to convince other idiots that OOXML is an open format (versus straight XML serialization of the whatever binary DOC was in the source code base at the time in MS)

    1. Re:Idiots by xtracto · · Score: 4, Interesting

      2. idiots which believe they can't find even a single copy of the software they need

      Please give me a link to a copy of the Professional Write 3 (PW) software app. for MSDOS 6.

      Yep, I had that very problem some years ago when I was cleaning my room and found several 5 1/4-inch diskettes full of files with the .pw extension. No way to find the program.

      --
      Ubuntu is an African word meaning 'I can't configure Debian'
    2. Re:Idiots by CastrTroy · · Score: 3, Insightful

      I believe points 2 and 3 can be lumped into one point. It's like creating backup tapes, and then throwing out the tape reader. Who thinks these systems up?

      --

      Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
    3. Re:Idiots by tolan-b · · Score: 4, Insightful

      It's not an archive of files in a single format, it's an archive of files in general, many formats, depending on which format the file was originally in.

      The system wasn't thought up any more than a library thinks up all the books it contains.

    4. Re:Idiots by Kjella · · Score: 2, Interesting

      1. the idiots which decided to build huge archive with undocumented proprietary format

      Which seems reasonable at a time when "everyone" has a computer that'll read it, for example when it comes to image viewers there's software covering literally hundreds of formats without issue.

      2. idiots which believe they can't find even a single copy of the software they need

      It's supposed to be an archive, not a "well, we'll have to dig up a copy of the software, I'll get back to you in some months".

      3. idiots who didn't store a single copy of the software that reads the format, together with the archive (not very far from obvious, is it).

      Presumably a license issue, AFAIK things like national archives typically only required you to file a copy, not to file the software itself. Probably the laws need to be changed to include a copy of the reader software.

      4. idiots who want to convince other idiots that OOXML is an open format (versus straight XML serialization of the whatever binary DOC was in the source code base at the time in MS)

      If they can make a meaningful interpretation in OOXML it's an improvement, if it's a BLOB in XML or a BLOB in OOXML it doesn't really matter, then you still need the software.

      If Microsoft has taken on the job of turning some binary BLOB into something human-readable, OOXML or not, then I say you'd have an easier time converting OOXML to something readable in OpenOffice than not.

      Besides, this might actually be a semi-legitimate use for all the tags in OOXML which say to emulate an old version, because that's what you're doing. Particularly with stupid formatting like inserting line breaks to make pages break where you want them to, it might actually be preferable compared to manually going over them and replacing them with proper format tags.

      --
      Live today, because you never know what tomorrow brings
    5. Re:Idiots by timeOday · · Score: 2, Insightful

      3. idiots who didn't store a single copy of the software that reads the format, together with the archive (not very far from obvious, is it).

      That's easier said than done. You'd have to keep multiple copies of everything, including hardware, up to the point where you're confident you have a stable standard - probably the power mains - and that's if you're not worried about violating licenses. Of course, with the advent of online apps, there is no way to snapshot the entire ecosystem of servers and software that actually makes an application run (especially since you never had direct access to it in the first place). The only reasonable solution is to pick some standard, such as jpg and ODF, and consolidate on that.
    6. Re:Idiots by mjensen · · Score: 2, Interesting

      Don't have a link, but have Professional Write on CD of "Work software not used anymore"

      Along with Professional File (database product)....

    7. Re:Idiots by nneonneo · · Score: 3, Interesting

      Step back, though, and think for a minute about the "house of cards" upon which that Word document rests.

      It rests on
      1) Physical storage medium -- whether this is Flash, Hard Drive, Optical Medium, [NV]RAM, etc., all these technologies may be very difficult to retrieve data from, especially if the level of technology happens to go down in the future (say, global thermonuclear war). Even if data is retrieved, there's no guarantee that it's intact after 1000 years (the dyes in CDs will have decomposed by that time; the Flash drives will have leaked all charge; the hard drives will have randomly demagnetized over that period of time, etc.).
      2) 8 bits per byte, 32 bits for most integers in the file, IEEE specification for floating point numbers, ASCII, Unicode, GB2312, etc. -- encodings for our numbers, letters and even bit-packing of binary data will affect retrievability. It would be difficult, I imagine, for some futuristic person to wander along (with possibly a different language) and attempt to interpret all of that.
      3) File format -- the MSOffice OLE2 format is incredibly complex, perhaps overly so. An OLE2 file takes the form of a miniature filesystem, with a File Allocation Table (FAT), sectors of 512 or 4096 bytes, a master Double-Indirect FAT (DIFAT), sub-sector allocation for small streams, etc. Fragments within the OLE2 container are assembled into "files" or streams in this file system and then parsed. Sure, it makes for wickedly fast saving times (since you can write to changed portions only and add fragments as needed, like a real file system), but it also makes it damn hard to parse, compared to plain text formats like XML or RTF.
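
To make the "miniature filesystem" point concrete, here is a minimal sketch that reads only the fixed 512-byte header of such a container. The field offsets follow my reading of Microsoft's published compound file layout and should be checked against that specification before being relied on; the function name `read_ole2_header` and the dictionary keys are made up for illustration. Note that this only locates the allocation tables. Following the sector chains and parsing the streams inside them is where the real work starts.

```python
import struct

OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # signature at offset 0
FREESECT = 0xFFFFFFFF  # marks an unused DIFAT slot


def read_ole2_header(data: bytes) -> dict:
    """Pull a few fixed-offset fields out of the 512-byte compound file header."""
    if len(data) < 512:
        raise ValueError("need at least 512 bytes of header")
    if data[:8] != OLE2_MAGIC:
        raise ValueError("missing OLE2 signature")

    # Offsets 24-33: minor version, major version, byte-order mark,
    # sector shift, mini-sector shift (all 16-bit little-endian).
    _minor, major, byte_order, sector_shift, mini_shift = struct.unpack_from(
        "<5H", data, 24
    )

    # Offsets 40-75: nine 32-bit fields describing where the tables live.
    (_n_dir, n_fat, first_dir, _txn, mini_cutoff,
     first_minifat, n_minifat, first_difat, n_difat) = struct.unpack_from(
        "<9I", data, 40
    )

    # Offsets 76-511: the first 109 DIFAT entries, each naming a FAT sector.
    difat = struct.unpack_from("<109I", data, 76)

    return {
        "major_version": major,
        "little_endian": byte_order == 0xFFFE,
        "sector_size": 1 << sector_shift,
        "mini_sector_size": 1 << mini_shift,
        "fat_sector_count": n_fat,
        "first_directory_sector": first_dir,
        "mini_stream_cutoff": mini_cutoff,
        "first_minifat_sector": first_minifat,
        "minifat_sector_count": n_minifat,
        "first_difat_sector": first_difat,
        "difat_sector_count": n_difat,
        "fat_sector_locations": [s for s in difat if s != FREESECT],
    }
```

On a real file you would pass in the first 512 bytes, read in binary mode. Everything after that (FAT chains, the directory tree, the mini stream) still has to be walked before a single character of the document is visible.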

      There are many more layers to this system, but that's a basic overview of what a researcher (or someone else) 1000 years from now will have to contend with.

      Sure, if you're looking at just extracting text, you can skip the last layer by simply going in the file and pulling as much out as possible, much like I used to do with corrupted Word documents. However, if you're looking to retrieve images, videos, archival audio, etc. then your job is much harder.
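
The "pull as much out as possible" approach can be sketched roughly like this: ignore the container structure entirely and scan the raw bytes for runs of printable characters, both as single-byte text and as UTF-16LE (old Word files can hold either). This is a crude salvage under stated assumptions, not a parser: it only recognises characters in the printable ASCII range, and `salvage_text` is a name invented for this example.

```python
import re
from typing import List


def salvage_text(raw: bytes, min_len: int = 4) -> List[str]:
    """Return printable text runs found in raw bytes, in file order."""
    # Runs of printable ASCII bytes, e.g. b"Hello".
    ascii_run = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    # Runs of printable ASCII stored as UTF-16LE, e.g. b"H\x00e\x00l\x00l\x00o\x00".
    utf16_run = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)

    hits = [(m.start(), m.group().decode("ascii")) for m in ascii_run.finditer(raw)]
    hits += [(m.start(), m.group().decode("utf-16-le")) for m in utf16_run.finditer(raw)]
    hits.sort()  # restore the order in which the runs appear in the file
    return [text for _, text in hits]
```

This recovers words, not layout, and it is useless for the images, video and audio mentioned above, which is exactly the point.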

    8. Re:Idiots by timeOday · · Score: 2, Insightful

      Just an FYI, governments don't have to worry about licensing. Especially in situations like this. They have the power of eminent domain.

      Think about Valve's Steam software for protecting video games (or any software that requires network activation). Just because you're willing to bypass it doesn't mean you can.
  3. Use SGML by Morgaine · · Score: 5, Funny

    It predates Moses, and is quite likely to survive the heat death of the universe.

    --
    "The question of whether machines can think is no more interesting than [] whether submarines can swim" - Dijkstra
  4. The big lie... by advocate_one · · Score: 5, Informative

    they keep repeating this everywhere they go... "Open XML"... their format is not Open... it's closed off with licensing and other restrictions... all the really good stuff in the specification has been obfuscated out and hidden behind indirections to the behaviour of legacy apps that only microsoft know the real ins and outs of... not only that, there's still an easy means for them to merely use XML as a wrapper for binary blobs...

    to give it a proper name, the format is "Microsoft Open Office XML", they deliberately went to a lot of trouble to pick a name that's as easy to confuse as possible with OpenOffice

    --
    Donald 'Duck' Dunn: We had a band powerful enough to turn goat piss into gasoline.
    1. Re:The big lie... by Quarters · · Score: 2, Informative

      Their name choice has certainly worked on you. It's not "Microsoft Open Office XML" like you said. It's "Microsoft Office Open XML".

    2. Re:The big lie... by davester666 · · Score: 3, Funny

      they keep repeating this everywhere they go... "Open XML"... their format is not Open... it's closed off with licensing and other restrictions... all the really good stuff in the specification has been obfuscated out and hidden behind indirections to the behaviour of legacy apps that only microsoft know the real ins and outs of... not only that, there's still an easy means for them to merely use XML as a wrapper for binary blobs...

      You don't understand the format then. Office Open XML is the ultimate in 'open' because they define it so you can embed any other document, in whatever format, within it. Obviously, the UK National Archive must use this 'master' format for storing all their existing documents in.
      --
      Sleep your way to a whiter smile...date a dentist!
  5. Obviously... by Colin+Smith · · Score: 5, Funny

    If you have a problem with proprietary formats you go to Microsoft to solve it for you... The word "DOH" springs to mind.

    Oh yeah, their solution? Virtualised Windows 3.1. And obviously in 15 years you'll have to virtualise Vista in order to run the Win3.1 virtual machine to run Word. And Microsoft will be paid a license for each application and level of virtualisation.

    You couldn't make this stuff up.

    --
    Deleted
  6. More surprisingly!? No, UNsurprisingly by erroneus · · Score: 3, Interesting

    No. The obvious solution for the predicted problem of data being unavailable due to being in unsupported proprietary formats is to move it to a widely supported non-proprietary format.

    As "well intentioned" as Microsoft may be, Microsoft's Open XML cannot be anything but proprietary when its code references Windows and Office API functions rather than more precise data format information as with ODF. (For more information about this, you might search out the arguments against making OOXML an ISO standard.)

  7. Re:MS should not own the standard by dvice_null · · Score: 5, Informative

    There is no such thing as Open Office format. Perhaps you mean OpenDocument Format, which is used by several different applications ( http://en.wikipedia.org/wiki/List_of_applications_supporting_OpenDocument ), including OpenOffice.org.

  8. Doesn't open source solve this by wile_e_wonka · · Score: 4, Informative

    It seems to me that this is really a nonproblem--OOo is compatible with lots of "dead" formats (or, can read them at least), as well as many other open source office programs. I can't imagine they're going to begin throwing away this compatibility--it isn't like it takes extra coding (as far as I know). Also, I have found Microsoft Word's "Extract text from any file" to work pretty well (I had a roommate with a corrupted Mac-formatted disk that had his deceased grandmother's journal on it in some old Mac Word file (a format still readable in Word, but the disk was corrupted so I couldn't just open the file). I popped it in my parents' now deceased iMac and the only program I found that opened it was Word, using the "Extract text from any file" function. I emailed him the journal and he thanked me profusely).

    Also--as noted, the OOXML format is a nonsolution for this nonproblem. It seems like it would be a waste of effort--why convert a bunch of files to a format that may die just as quickly as any other format, when you can just leave the file as is and open it in OOo (assuming I'm correct that they won't stop read support for dead formats)?

    Also, it seems to me that no current format or any future format will ever solve this nonproblem because formats will always change as new functionality is continually added. The better solution is to keep this a nonproblem by having open source software that can read old file formats.

  9. surprise? by Tom · · Score: 5, Insightful

    What's surprising about that? Someone in MS Spin Control and Public Relations is worth his salary. The story could have exploded into an "avoid MS products if you want your data accessible some years down the road" fiasco (we all know that MS is the worst offender when it comes to changing the document formats, usually undocumented). Instead, it was turned into another push for their next format.

    Brilliant.

    "What, the shit I sold you yesterday stinks? Try this new shit, it's great and it has none of the problems of the old one."

    That's what you hire PR people for.

    --
    Assorted stuff I do sometimes: Lemuria.org
  10. How about some *helpful* suggestions by FreudianNightmare · · Score: 5, Insightful

    Rather than bitching about Microsoft making an offer of 'help' which is just thinly disguised marketing (I mean, come on, par for the course no?), could we get a discussion about real solutions? I know MS bashing is fun, but come on, we do it on just about every other thread... let's have a day off.

    To kick things off here's one:

    Keep EVERYTHING in the simplest possible format. ASCII would seem sensible, since it's the content we care about, not the formatting (although that wouldn't help our Asiatic brethren much). Then keep decent records of HOW you can read that format, with examples of the software and hardware. Do this bit on PAPER. V. Tough Paper (or rock, or plastic or whatever). Update the explanations every other year, to put it in language the next gen will understand. Maybe also have instructions on how to translate the simple format to less simple things.

    I guess, basically, it's a case of KISS and then *provide a persistent and regularly updated 'Rosetta Stone'* for latecomers to work from.

    As a side branch, this kind of reminds me of discussions I read about a while back of how to warn future generations about Nuclear Waste dumps (y'know, the really nasty stuff with half-lives in the thousands of years range). I don't think anyone ever came up with a decent answer....

    --
    'Speak softly and carry a beagle'
    1. Re:How about some *helpful* suggestions by Ceriel+Nosforit · · Score: 2, Interesting

      ASCII would seem sensible, since it's the content we care about, not the formatting.

      No, the formatting is important as well. Sometimes 'the medium is the message', and that whole bunch of artsy crap we geeks would prefer to ignore. - Just think of it as an engineering challenge in order to make the pain go away.

      You always archive the original (unless you have a batch; then you sample one and call it the original), and that original can be in just about any format, hand-written, coffee-stained, in Sanskrit. When scanning a document into an electronic archive the ideal would be to have OCR create a font and layout on the fly while running, so that the electronic version of the original would still look like the dead-tree original, and yet be machine-readable.
      Dead-tree is still the standard, as it has been for the past few millennia. Directives on national level will often not even recognize that electronic archiving exists since there is no standard to be used. Once we have a markup-based format such as .odt standardized, electronic archiving might come into existence.

      In practice your organization will often receive a document by snail mail from an external source and an electronic version might not even exist. To make it electronic you have to scan it and to make it machine-readable you have to OCR it. Then the final result needs to be retrievable 50 years from now, or until the end of time. Yes, many things are actually archived without an expiration date. The cost, measured in money, of archiving something permanently is literally infinite.

      From this perspective we geeks who are used to Moore's law etc. seem pretty darn impatient and narrow-minded. There are factors involved ranging from constructing immortal dinosaur pens to training staff who 'have been doing it their way for the past 30 years' and are hellbent on continuing so until they retire. ...But that too could be seen as an interesting engineering challenge. And that's why it's taking so long; because the project is just so gosh darn gargantuan.
      --
      All rites reversed 2010
    2. Re:How about some *helpful* suggestions by Kjella · · Score: 2, Interesting

      "There is always an easy solution to every human problem--neat,
      plausible, and wrong." -- H.L.Mencken, The Divine Afflatus (1917).

      Ok, you started by identifying one problem - Asian languages. In fact, pretty much every non-US language since you said ASCII and not Latin1. So we can extend that to UTF-8 with no problems, except there's probably a huge table just for the 100000 characters or so, even though the spec is quite short.
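
The "extend that to UTF-8 with no problems" step can be checked in a few lines: pure ASCII text is byte-for-byte identical when encoded as UTF-8, and everything beyond ASCII simply takes two to four bytes per character. This is only an illustration of the encoding itself; the helper name `utf8_byte_lengths` is made up here.

```python
from typing import List, Tuple


def utf8_byte_lengths(text: str) -> List[Tuple[str, int]]:
    """Pair each character with the number of bytes UTF-8 uses for it."""
    return [(ch, len(ch.encode("utf-8"))) for ch in text]


plain = "plain old text"
# An ASCII archive is already a valid UTF-8 archive: same bytes either way.
same_bytes = plain.encode("ascii") == plain.encode("utf-8")

# 'a', e-acute, the euro sign and a musical clef need 1, 2, 3 and 4 bytes.
lengths = utf8_byte_lengths("a\u00e9\u20ac\U0001d11e")
```

So existing ASCII files need no conversion at all. The large character table only matters to whoever has to render the text, not to whoever stores it.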

      But then, you have only characters, which is probably fine for basic text. How about works containing formulas? Uh-oh. You need some kind of math language for that, something like MathML. Of course, if you don't need math, you don't need MathML either.

      But wait... what if it's not all text? What if you're trying to store diagrams, illustrations or graphs which can arguably be just as meaningful as the rest, for example if you're trying to preserve the works of Leonardo da Vinci. And uncompressed bitmaps are so incredibly inefficient, maybe you need a picture spec with some basic picture formats (vector, lossless raster, lossy raster at least), you could call them something like SVG, PNG and JPG. Of course, if you don't need graphics you don't need those.

      But then, maybe you're looking to add references. Yes, you could do plain-text matching but it's much more reliable and maintainable to make anchors and point to them, and they let you build tables of contents and such too.

      Then, maybe there are times where fonts and layout really do matter, for example the lines of a poem? Perhaps we really should have a system to preserve that. Of course you can try to do this by embedding coding into plaintext the way, say, Project Gutenberg does but it's not very user-friendly, it's a bit like having "magic numbers" in code. Most of the time it's easier to just have a file format, and say "and these bits are the plaintext".

      In short, if you have a problem with storing this in OOXML and the solution is to use ASCII, then I think you're solving only a very very small subset of archive problems. If you look at it the other way and say "I want one document format, to archive all my documents of every shape and form" then you'll be lucky if the 738 pages of OpenDocument standard (which is actually a lot more through referring to other standards) will suffice.

      --
      Live today, because you never know what tomorrow brings
    3. Re:How about some *helpful* suggestions by davecb · · Score: 2, Interesting

      We fought a lot with this at Siemens (Sietec) about fifteen years ago, when trying to decide what format to use on stackers full of 12" WORM disks, which were just nicely becoming useful for large-scale archival storage in those days. We needed a format that would outlast the disks, which probably meant 50-100 years assuming normal replacement/turnover.

      We ended up with the bottom level being a WORM standard, which was served out to users via the NFS standard, which was reasonably close to a Unix filesystem, and was usable by Windows clients, and finally we stored the data in quite simple random files with tables of contents, so we could handle multi-page documents.

      In practice we found the data we were storing was almost always images, as that's what businesses wanted to store: scanned images of legal, business and medical documents. As the parent suggested, we used as simple a format as possible, but no simpler (;-))

      For text documents, I recollect we did support some commercial formats, but only ones for which we knew the full specification and had a translator in source form. Our own data was mostly LaTeX, the typesetting language, expressed as ascii characters, and occasionally postscript or pdf, ditto.

      --dave

      --
      davecb@spamcop.net
  11. It's not just about the software... by WIAKywbfatw · · Score: 2, Interesting

    It's not just about the software. It's the hardware, too.

    I'm sure that most of the archive data created today is stored on something like DVDs but, as recently as the early 1990s, the official long-term storage medium for the UK government was Syquest 44MB removable cartridge hard drives.

    I know that I have a working 44MB drive (well, when I last fired it up, which would have been sometime last decade) somewhere in my attic but I doubt that too many of these drives are still in existence.

    I only hope that the data that was once stored on thousands of these was successfully transferred to a more readily accessible storage format and that that new format is just as durable - media these days just seems to disintegrate after a few years.

    --

    "Accept that some days you are the pigeon, and some days you are the statue." - David Brent, Wernham Hogg
  12. I've never understood this argument... by ferrellcat · · Score: 2, Informative

    ...this argument that files and data will one day just magically become inaccessible in the future. I have tape and diskette media for my Commodore Pet machines that goes back to 1978. That's 29 years ago, and guess what? The great majority of this media is STILL READABLE. Furthermore, the tools necessary to transfer any of my old media to modern PCs have been around for well over a decade. Once you have the data on a modern PC the rest can be handled with emulation or virtualization. For someone to complain...

    "If you stored something on a floppy disc just three or four years ago, you'd have a hard time finding a modern computer capable of opening it"

    ...when I can easily run programs I wrote in the SEVENTIES is pure nonsense.
  13. IBM by ushering05401 · · Score: 2, Informative

    If you are going to choose a proprietary vendor to safeguard your data, wouldn't IBM be the obvious choice? They have proven their ability to keep 20-year-old programs running in modern environments without modification.

    It has been a while since I worked on an AS/400 system... so anyone with updated info please feel free to correct me if things have changed.

    It seems like a no-brainer.

    Link: http://en.wikipedia.org/wiki/AS/400

    1. Re:IBM by LiquidCoooled · · Score: 2, Informative

      Actually, MS have done quite well with backwards compatibility.

      I can still double click on .com executable files written well back in the mists of time and run usable programs.

      For example, here is a version of Visicalc from 1981!

      http://www.bricklin.com/history/vcexecutable.htm

      --
      liqbase :: faster than paper
    2. Re:IBM by TheRaven64 · · Score: 2, Insightful

      I have run that same version of Visicalc, in DOSBox, on a PowerPC Mac. Actually, I've run a few programs in that environment that don't run on Windows without the aid of DOSBox. To me, this says that third parties are better than Microsoft themselves for backwards compatibility with Microsoft programs. I wonder how long it will be before WINE has better support for old Windows apps. I think this is already the case for a few win16 programs...

      --
      I am TheRaven on Soylent News
  14. Comment removed by account_deleted · · Score: 4, Funny

    Comment removed based on user account deletion

  15. Use TeX by user1003 · · Score: 2, Interesting

    I wanted to design something that would be still usable in 100 years. (Donald E. Knuth, more than 20 years ago)

    Also, LaTeX will get you nicer documents than any WYSIWYG word processor in less time (once you know it ..). Oh and smaller filesize, too.

  16. Re:Chicken Little by Macthorpe · · Score: 2, Informative

    The video is of the managing director of Microsoft UK, not someone associated with the British library. Hence the caption 'Microsoft UK Managing Director Gordon Frazer running Windows 3.1 on a Vista PC'.

    Yes, that was sarcastic, but you deserved it.

    --
    "It does not do to leave a live dragon out of your calculations, if you live near him." - Tolkien
  17. Re:MS should not own the standard by esmrg · · Score: 3, Informative

    OpenOffice.org does have its own native format; "OpenOffice.org 1.0 Text", extension: .sxw. It was introduced with the original release, but is no longer the default since the introduction of OpenDocument.
    While the GP may or may not have been exactly sure what they were referring to, it doesn't make them wrong.

  18. 1/2 pentabyte = 20 bits? by benhocking · · Score: 5, Funny

    Fine, then you get to be the schmuck who has to organize, sort, label and store about 1/2 a pentabyte of information on paper.

    A pentabyte is 5 bytes, right? How hard is it to store 20 bits on paper? ;)

    (I assume petabyte (10^15 or 2^50, depending on convention) is the word you're looking for.)

    --
    Ben Hocking
    Need a professional organizer?
  19. Bright people don't make tech decisions by Cheesey · · Score: 4, Interesting

    The idea that an institution like the British Library, which is run by people bright enough to make you look like a dead match, would accept such a preposterous idea is insulting.

    Unfortunately, those bright people don't get to make technical decisions.

    The British Library recently introduced SED, an electronic document delivery system. With SED, you can order electronic copies of journal papers and articles from their archives. Great idea! Previously, you had to wait for the documents to come through the post, and that would take a week or so. Now you get them by email in a couple of working days.

    Except that the documents are crippled by Adobe DRM, which imposes the following restrictions:
    • You can only view them using certain specific versions of Acrobat Reader (6 or 7) - the latest version is not recommended.
    • The software only works on Windows 2000 or XP. No Linux support, no Mac support. Vista might work, but again, it's not recommended.
    • You can only look at each document for a limited time, and you can only print it once.
    So, if you want to use the service, you'd better hope that you have (a) the right version of Windows, (b) the right version of Acrobat Reader, (c) a reliable net connection, and, most importantly, (d) a very reliable printer that won't chew up the document. Unless you're a filthy dirty pirate, of course.

    If Adobe managed to convince the British Library to put up with this ridiculous system, I am sure that Microsoft will have no difficulty convincing them about their archive "solution". If SED is anything to go by, it'll be another awful implementation of a great idea.
    --
    >north
    You're an immobile computer, remember?
    1. Re:Bright people don't make tech decisions by innocent_white_lamb · · Score: 2, Interesting

      a very reliable printer that won't chew up the document. Unless you're a filthy dirty pirate, of course.
       
      What about printing it on this?

      --
      If you're a zombie and you know it, bite your friend!
  20. Re:Doesn't matter. by bheer · · Score: 2, Insightful

    Whatever is worth keeping for a long time should be on paper and translated in more than one language.

    Er, even if you translate it into other languages, they'll evolve too. Try reading Old French much? And translation also leaves you with the headache of reconciling various translations and figuring out which is "more correct" (IIRC the Bible has this problem). It would be a much better idea to make redundant copies, to guard against bitrot and store them as physically apart as possible.
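
Redundant copies only guard against bitrot if something actually compares them from time to time. A minimal sketch of that check, assuming at least three copies and assuming the majority is still intact: hash every copy and flag the ones that disagree. The site names and the function `find_rotten_copies` are hypothetical.

```python
import hashlib
from collections import Counter
from typing import Dict, List


def find_rotten_copies(copies: Dict[str, bytes]) -> List[str]:
    """Return the locations whose content differs from the majority digest."""
    digests = {site: hashlib.sha256(blob).hexdigest() for site, blob in copies.items()}
    # Assumption: whichever digest most copies agree on is the good one.
    majority_digest, _count = Counter(digests.values()).most_common(1)[0]
    return sorted(site for site, d in digests.items() if d != majority_digest)


copies = {
    "site_a": b"minutes of the meeting",
    "site_b": b"minutes of the meeting",
    "site_c": b"minutes of the meetinf",  # one flipped character
}
damaged = find_rotten_copies(copies)
```

A flagged copy can then be rewritten from a good one before the next copy decays. With only two copies you can detect a mismatch but not tell which side is right, which is one reason to keep more than two.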

    I doubt that the now common CD/DVD/Blu-ray/HD-DVD will be available in a few centuries.

    And it won't matter. The important stuff would be migrated to archival formats. For example, I keep a copy of DOS and Win3.1 ISOs (about 20MB total) and Norton Commander (3 floppy images!) on a DVDR, along with a copy of Virtual PC. This lets me recreate a Windows 3.1 virtual PC anytime I want. I wouldn't be surprised if I were copying DVDR ISOs to a holographic memory drive in the next ten years.

    As for the next century, most of this material will lose value, but the important stuff will get backed up professionally and successively remastered on new media (esp with things like the UK National Archive). And amateur historians, genealogy buffs and private collectors will have their hands full in the future with stuff that you can't find in the official archives but in people's attics, just like people are fascinated with Stone Age, Roman or Victorian artifacts today.

  21. Re:Little known fact... by Cheesey · · Score: 2, Funny

    Yes, it's true. Sadly, early transcribers of the book left out the stuff they didn't understand. In addition to a number of now-forgotten sections describing the role of evolution in the creation of life, this included the following cryptic verses:

    2:2 And on the seventh day God said :wq and then make.

    2:3 And God watched gcc running and sanctified it, because it would have taken Him at least two weeks to write the whole thing in machine code.

    --
    >north
    You're an immobile computer, remember?
  22. Re:Doesn't matter. by Bazzargh · · Score: 5, Interesting

    And being a government, these files are INCREDIBLY important.

    Why haven't they been converted? Really, all their DIGITAL archives should be in a single format by now.


    No, they shouldn't. You usually want 3 formats:
    - the original format of the document. Whatever whichever idiot happened to write (or record, or video) it in, you absolutely want the original in your records.
    - a searchable format (eg OCR'd text from scanned image docs)
    - a rendered format. (eg an image or pdf, or svg - something open enough that you can continue to show how the doc would have looked). The appropriate rendered format varies. Paper is not an appropriate format for storing CCTV footage, for example ;)

    If you're very, very lucky the original is both searchable and viewable; like, say, HTML. It gets more complicated too, because you often want to store a redacted copy of the document (think of the Onion story 'CIA realise they've been using black highlighter pen all these years') and you want that searchable too, so you have to keep a redacted searchable format too... and of course, some of the records are on actual paper. Have you started worrying about the fading inks in the originals yet?

    BTW you can't restrict the format of the original. Consider an email from a corporate bidding for a govt contract, with attachments. They need to keep those.

    - Mr. E

    PS, posting anon because I have dealings with the national archives, and don't want to speak for my company.

  23. Re:Doesn't matter. by Bazzargh · · Score: 2, Insightful

    Hum now. completely failed to tick the posting anon box :) good job I held back from expressing opinions in there.

  24. Such precise terms as by Anonymous Coward · · Score: 4, Informative

    "Spacing like WP6"? "Calculate incorrect leap year like Excel"?

    Because if you want to include bugs etc, then no, it doesn't support each and every 2007 feature.

    If you mean supporting tables, nested documents, embedded graphs, scripting and so on, yes.

    It may not be "click the same buttons" feature correct nor probably the "run the same VB code" compatible.

    Take a look at some of the people on the board that devised ODF. They include the US National Archives. Print media. Archivists.

    Y'know, people who KNOW DOCUMENTS.

    As to the remainder of your questions, there is a process, it does have to go through committee (else how does everyone else know how to implement the new standard? MS doesn't have this problem since they only want themselves to know their updated standard). It is XML so it is extensible (decode the initialism). The process will take as long as it takes. Much the same as Vista will take as long as it takes to get SP 1 out.

    I don't see how these latter issues are something that is a part of ODF and not any form of standardisation that OfficeXML will have to go through for anyone other than MS to implement...

  25. Re:One thing I'd like to know (ODF question) by a.d.trick · · Score: 4, Informative

    Does the ODF specification support each and every Word/Excel/Powerpoint 2007 feature?
    Thank goodness no. "Auto Space like Word 95"? That's in the OOXML spec (and there's no explanation on how Word 95 does spacing either).

    If not, is it extensible?

    Yeah, it's XML. Also, unlike OOXML, ODF uses namespaces, so you can create a separate standard if you don't want to muck around with ODF.
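
    To show what namespace-based extension looks like in general, here is a small sketch using Python's standard library. The text namespace URI is the one ODF defines; the extension namespace and the review-status attribute are invented for this example, and nothing here says how a particular ODF application treats foreign attributes:

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"  # ODF text namespace
EXT_NS = "http://example.org/my-extension"                  # hypothetical extension namespace

# Ask the serializer to use readable prefixes instead of ns0, ns1, ...
ET.register_namespace("text", TEXT_NS)
ET.register_namespace("ext", EXT_NS)

# An ordinary ODF paragraph element...
para = ET.Element(f"{{{TEXT_NS}}}p")
para.text = "Hello"
# ...carrying one extra attribute that lives in a separate namespace.
para.set(f"{{{EXT_NS}}}review-status", "draft")

xml_text = ET.tostring(para, encoding="unicode")
print(xml_text)

# A reader that only understands ODF can filter on the namespace
# and skip anything it does not recognise.
odf_only = {k: v for k, v in para.attrib.items() if k.startswith(f"{{{TEXT_NS}}}")}
print(odf_only)
```

    Because the extra attribute sits in its own namespace, it cannot collide with names the ODF schema defines now or later.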

    If it is extensible, do changes have to go through some sort of committee to be incorporated? How frequently are changes incorporated? How long is the process?

    It would depend. The thing about changing standards is that it causes problems for all sorts of people. There is a real need for a stable and standardized document format that just doesn't change, or if it does, very slightly.

  26. Re:MS should not own the standard by syousef · · Score: 4, Funny

    There is no such thing as Open Office format.

    Rubbish. I've worked at places with an Open Office format. Basically they open the office to any monkey who turns up for a job interview and a handful of people have to make up for their incompetence.

    --
    These posts express my own personal views, not those of my employer
  27. Re:Doesn't matter. by Corporate+Troll · · Score: 3, Interesting

    For example, I keep a copy of DOS and Win3.1 ISOs (about 20MB total) and Norton Commander (3 floppy images!) on a DVDR, along with a copy of Virtual PC. This lets me recreate a Windows 3.1 virtual PC anytime I want.

    Now.... You can do that now. However, in 100 years, will this be possible? You do not know what the future brings. Let's not even talk about 1000 years and beyond. Now, you backed this stuff up on a DVD and you die tomorrow. Your kids keep the data, and when they die a historian specialising in the 20th century wants to analyse the daily life of a 20th century person. VMWare is long dead, you backed it up... Sure, but his platform can't run it. We're at least 10 operating system versions later, and they run on a new platform. x86 is long forgotten and they moved to quantum computers.

    Perhaps the guy is lucky, and can run an emulator in an emulator in an emulator in an emulator what you backed up. Perhaps....

    I have to this day zip files containing WordPerfect 5.1 files of the letters with a girl I was penpal with (and to whom I ultimately lost my virginity, but that is another story). Those letters, documenting life in the mid eighties to mid nineties, might be interesting to a historian someday. (Historians love the daily lives of long dead people). Will they be able to read it, in 100 years? I don't know, especially in a proprietary format like WordPerfect.

  28. Re:OOXML isn't a solution to the existing problem by Vombatus · · Score: 2, Informative

    For a solution which converts documents to openly specified file formats (not OOXML), see XENA at https://sourceforge.net/projects/xena

    --
    This sig is intentionally blank
  29. Re:Doesn't matter. by ozmanjusri · · Score: 2, Interesting
    Whoever modded this "Funny" is wrong. It should be insightful.

    My copy of Office XP won't activate on any of the computers I currently own (the hardware it was originally activated on is long-dead), and that's only 5 years old.

    --
    "I've got more toys than Teruhisa Kitahara."