Slashdot Mirror


National Archive File Format Time Bomb

geordie_loz writes "The BBC is reporting that the UK National Archive is warning of old formats being a 'ticking time-bomb' where data is going to be lost because of incompatibility in newer versions of software, and software not existing at all. More surprisingly, Microsoft has offered a solution via the OOXML format."

233 comments

  1. OOXML isn't a solution to the existing problem by Anonymous Coward · · Score: 0

    Unless Microsoft is going to write converters for every existing file format in the world. In the past, I've had the most luck with old document formats using open source products like AbiWord.

    1. Re:OOXML isn't a solution to the existing problem by Vombatus · · Score: 2, Informative

      For a solution which converts documents to openly specified file formats (not OOXML), see XENA at https://sourceforge.net/projects/xena

      --
      This sig is intentionally blank
  2. easy fix by Anonymous Coward · · Score: 1, Insightful

    Just make a torrent.

    1. Re:easy fix by jimbug · · Score: 1

      How would that help? From what I understand, the problem is that they will not be able to open the files because they are in an old format. Plus, that would be a 580 terabyte torrent.

      --
      Bite my shiny metal ass.
  3. Tagging beta... by Rufty · · Score: 2, Insightful

    ITSATRAP!

    --
    Red to red, black to black. Switch it on, but stand well back.
  4. MS should not own the standard by BroadbandBradley · · Score: 1

    Don't give in to MS on this one. Some states in the US already have, and it's no better than the standard Word format because it's owned by a private entity. Use the Open Office format if you want to be sure that you won't get the rug pulled out from under you some years down the road.

    1. Re:MS should not own the standard by dvice_null · · Score: 5, Informative

      There is no such thing as Open Office format. Perhaps you mean OpenDocument Format, which is used by several different applications ( http://en.wikipedia.org/wiki/List_of_applications_supporting_OpenDocument ), including OpenOffice.org.

    2. Re:MS should not own the standard by esmrg · · Score: 3, Informative

      OpenOffice.org does have its own native format: "OpenOffice.org 1.0 Text", extension .sxw. It was introduced with the original release, but is no longer the default since the introduction of OpenDocument.
      While the GP may or may not have been exactly sure what they were referring to, it doesn't make them wrong.

    3. Re:MS should not own the standard by jack455 · · Score: 1
      It needs clarification because odf is xml-based.

      MS named theirs "Office Open XML". As far as the XML goes it is by its nature not eXtensible and lacks Language to explain what the application is supposed to Markup.

      From TFA:

      Mr Frazer said Microsoft had shifted its position on file formats.

      "Historically within the IT industry, the prevailing trend was for proprietary file formats. We have worked very hard to embrace open standards, specifically in the area of file formats."

      Embracing Open Standards, when they were/are part of OASIS (the standards group that proposed odf and had it accepted by ISO), then rejecting full support of odf, spreading FUD about odf, and deliberately causing consumer confusion regarding the name, belies that claim. And besides:

      She was speaking at the launch of a partnership with Microsoft to ensure the Archives could read old formats.

      Disingenuous at best. Screw them. Nice astroturf article; congratulations to the National Archives on their well-informed partnership with such a philanthropic company for their launch.

      I look forward to rewriting UK history in 5 years when no one "remembers" how Word-like spacing and WordPerfect whatever formatting apply to extensible markup languages.

    4. Re:MS should not own the standard by aichpvee · · Score: 1

      Definitely shouldn't keep the stuff in microsoft formats, but is ODF really any better? I've not found it to be very consistently handled between apps that supposedly support it. Anyone else having better luck with it?

      --
      The Farewell Tour II
    5. Re:MS should not own the standard by syousef · · Score: 4, Funny

      There is no such thing as Open Office format.

      Rubbish. I've worked at places with an Open Office format. Basically they open the office to any monkey who turns up for a job interview and a handful of people have to make up for their incompetence.

      --
      These posts express my own personal views, not those of my employer
    6. Re:MS should not own the standard by Ravnen · · Score: 1

      The Office Open XML (OOXML) format created by Microsoft is an open standard, Standard ECMA-376. There is no need for Microsoft to support a competing standard such as ODF, unless it becomes the de facto standard, which is rather unlikely.

    7. Re:MS should not own the standard by sproot · · Score: 1

      Why don't I have any mod points (-1 troll)

  5. Idiots by suv4x4 · · Score: 4, Insightful

    The BBC is reporting that the UK National Archive is warning of old formats being a 'ticking time-bomb' where data is going to be lost because of incompatibility in newer versions of software, and software not existing at all. More surprisingly, Microsoft has offered a solution via the OOXML format.

    There are so many idiots in this state of the affairs:

    1. the idiots which decided to build huge archive with undocumented proprietary format
    2. idiots which believe they can't find even a single copy of the software they need
    3. idiots who didn't store a single copy of the software that reads the format, together with the archive (not very far from obvious, is it).
    4. idiots who want to convince other idiots that OOXML is an open format (versus straight XML serialization of the whatever binary DOC was in the source code base at the time in MS)

    1. Re:Idiots by xtracto · · Score: 4, Interesting

      2. idiots which believe they can't find even a single copy of the software they need

      Please give me a link to a copy of the Professional Write 3 (PW) software app. for MSDOS 6.

      Yep, I had that very problem some years ago when I was cleaning my room and found several 5 1/4 disquettes which contained the .pw extension. No way to find the program.

      --
      Ubuntu is an African word meaning 'I can't configure Debian'
    2. Re:Idiots by CastrTroy · · Score: 3, Insightful

      I believe points 2 and 3 can be lumped into 1 format. It's like creating backup tapes, and then throwing out the tape reader. Who thinks these systems up?

      --
      Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
    3. Re:Idiots by Rude+Turnip · · Score: 1

      That's funny, because I found a 3.5" disk (or several, I forget) with Professional Write for DOS while cleaning out my old room at my mom's house...which I promptly threw in the trash.

    4. Re:Idiots by paleo2002 · · Score: 1

      Certainly an open, well-documented format is useful for data storage. But is it really going to be impossible to read a .pdf 100 years from now? Individuals and museums collect old hardware and software. Corporations keep archives of their own products. Programs exist that convert old formats to new. Digital archeology may become the hot growth industry of the 22nd century.

    5. Re:Idiots by tolan-b · · Score: 4, Insightful

      It's not an archive of files in a single format, it's an archive of files in general, many formats, depending on which format the file was originally in.

      The system wasn't thought up any more than a library thinks up all the books it contains.

    6. Re:Idiots by Kjella · · Score: 2, Interesting

      1. the idiots which decided to build huge archive with undocumented proprietary format

      Which seems reasonable at a time when "everyone" has a computer that'll read it, for example when it comes to image viewers there's software covering literally hundreds of formats without issue.

      2. idiots which believe they can't find even a single copy of the software they need

      It's supposed to be an archive, not a "well we'll have to dig up a copy of the software, I'll get back to you in some months."

      3. idiots who didn't store a single copy of the software that reads the format, together with the archive (not very far from obvious, is it).

      Presumably a license issue, AFAIK things like national archives typically only required you to file a copy, not to file the software itself. Probably the laws need to be changed to include a copy of the reader software.

      4. idiots who want to convince other idiots that OOXML is an open format (versus straight XML serialization of the whatever binary DOC was in the source code base at the time in MS)

      If they can make a meaningful interpretation in OOXML it's an improvement, if it's a BLOB in XML or a BLOB in OOXML it doesn't really matter, then you still need the software.

      If Microsoft has taken the job of taking some binary BLOB and make it into something human-readable, OOXML or not, then I say you'd have an easier time converting OOXML to something readable in OpenOffice than not.

      Besides, this might actually be a semi-legitimate use for all the tags in OOXML which says to emulate an old version, because that's what you're doing. Particularly with stupid formatting like inserting line breaks to make pages break where you want them to, it might actually be preferable compared to manually going over them and replacing them with proper format tags.

      --
      Live today, because you never know what tomorrow brings
    7. Re:Idiots by Anonymous Coward · · Score: 0

      That's right. Everybody's an idiot apart from you. If only you were in charge of everything, then the world would be OK.

      We love you,

      The World

    8. Re:Idiots by timeOday · · Score: 2, Insightful

      3. idiots who didn't store a single copy of the software that reads the format, together with the archive (not very far from obvious, is it).
      That's easier said than done. You'd have to keep multiple copies of everything, including hardware, up to the point where you're confident you have a stable standard - probably the power mains - and that's if you're not worried about violating licenses. Of course, with the advent of online apps, there is no way to snapshot the entire ecosystem of servers and software that actually makes an application run (especially since you never had direct access to it in the first place). The only reasonable solution is to pick some standard, such as jpg and ODF, and consolidate on that.
    9. Re:Idiots by tolan-b · · Score: 1, Informative

      1. They didn't. They receive things to archive (in the old sense of the word). They didn't put anything in any format, they receive things in a format and put it in a box.
      2. We're not talking about 1 piece of software. Potentially there are hundreds.
      3. See 2.
      4. I agree, but also RTFA. MS aren't actually getting OOXML in here, they're helping the archive by providing virtualised installs of older OSs (probably all MS OSs) to run the old software needed to access the old data.

      Maybe I can add a couple.

      5. Idiots who proclaim how everyone is an idiot without even reading the article.
      6. Idiots who moderate without even reading the article.

    10. Re:Idiots by mjensen · · Score: 2, Interesting

      Don't have a link, but have Professional Write on CD of "Work software not used anymore"

      Along with Professional File (database product)....

    11. Re:Idiots by Adult+film+producer · · Score: 1

      Programs exist that convert old formats to new. Digital archeology may become the hot growth industry of the 22nd century.

      In a hundreds years microsoft will no longer exist so what do the historians do when they uncover a stack of magnetic tapes/dvds that contains documents in .doc format? (.doc or whatever other format you can think of.) They may have the hardware to transfer the data to newer computers and storage but the secrets to translating .DOC were lost years ago when microsoft went bankrupt..

    12. Re:Idiots by Anonymous Coward · · Score: 0

      The people who expect to be employed in the future to design a new reader, obviously. People aren't quite the idiots that they are taken for.

    13. Re:Idiots by graphicsguy · · Score: 1

      Please give me a link to a copy of the Professional Write 3 (PW) software app. for MSDOS 6. Yep, I had that very problem some years ago when I was cleaning my room and found several 5 1/4 disquettes which contained the .pw extension. No way to find the program.

      Try this.

    14. Re:Idiots by RexRhino · · Score: 1

      In a hundreds years microsoft will no longer exist so what do the historians do when they uncover a stack of magnetic tapes/dvds that contains documents in .doc format? (.doc or whatever other format you can think of.) They may have the hardware to transfer the data to newer computers and storage but the secrets to translating .DOC were lost years ago when microsoft went bankrupt..

      It isn't that complicated. If historians can piece together Egyptian hieroglyphics from the Rosetta Stone, then they certainly can extract some plain text from a .doc file.

      And if they can't figure out a .doc file, they probably won't be able to figure out opendocument any better, because it is just as silly to believe that opendocument will be any more common 1000 years from now than microsoft word documents.
    15. Re:Idiots by bladesjester · · Score: 1

      You beat me to it. :P

      While it may be hard to find the program itself anymore, you can usually find something that can read the files.

      --
      Everything I need to know I learned by killing smart people and eating their brains.
    16. Re:Idiots by Anonymous Coward · · Score: 0

      Here ya go! My finders fee is only $500. Please post your bank account numbers so I may withdraw my fee. Thanks!

    17. Re:Idiots by suv4x4 · · Score: 1

      Yep, I had that very problem some years ago when I was cleaning my room and found several 5 1/4 disquettes which contained the .pw extension. No way to find the program.

      You're suggesting the National Archives have the resources and intelligence (as in, research and know-how) of a single guy who found several 5 1/4 disks while cleaning his room.

      Well, thanks. I laughed a lot.

    18. Re:Idiots by mikael · · Score: 1

      The National Archive is required to maintain a copy of every newspaper, magazine or journal published in the UK. In many cases, magazines came with floppy disks and CD-ROMs containing programs, data and applications submitted by users.
      This is the case especially with computer magazines. Sensible publishers will have used self-unpacking executables and/or the ZIP format.

      Finding a device to read floppy disks and CD-ROMs is straightforward enough. But trying to find the relevant application which runs on current hardware may be a problem.

      Here's a typical story
      Looking for Fastback Plus

      --
      Vintage computer adverts: http://www.vintageadbrowser.com/computers-and-software-ads
    19. Re:Idiots by Reece400 · · Score: 1

      Well duh, they'd keep a copy of the sourcecode stored in open document format as well!

    20. Re:Idiots by uhlume · · Score: 1

      5. Idiots who apparently have never read the OOXML DTD, which, as I recall, includes certain type definitions for backward compatibility with the binary DOC format, but explicitly deprecates them?

      --
      SIERRA TANGO FOXTROT UNIFORM
    21. Re:Idiots by Warbothong · · Score: 1

      National Archives: Excuse me Microsoft, but all of these documents we have won't render properly in the latest version of your software
      Microsoft: Don't worry, we can sell you copies of each of the versions of our Office software used to make them
      National Archives: Oh, they'll all run in Vista? That's OK then
      Microsoft: Actually, they won't. But we can sell you copies of each of the versions of our Windows software that will run them
      National Archives: How will we run all of these different versions of Windows?
      Microsoft: Well we can sell you copies of our virtualisation software to run the different versions of our Windows software to run the different versions of our Office software to open the different versions of our Office files
      National Archives: This seems a little over the top. Now that we have realised how serious this problem is we should take action and use an open format!
      Microsoft: Well we can sell you another copy of our Office software which uses a format we call "Office Open XML"
      National Archives: And this format is open is it? We will know exactly how to render these documents several decades in the future?
      Microsoft: Well, it has "Open" in the name we gave it, so it must be open. And here are the instructions you need to understand the files.
      *forklift truck enters room carrying OOXML spec*
      National Archives: That is really long, we can't be bothered to read all of it. If you say it is open then we believe you

      3 Years Later

      National Archives: Excuse me? Microsoft?
      Microsoft: Yes?
      National Archives: Erm, this new version of your Office software won't display these Open XML files properly
      Microsoft: Well that format is obsolete. We have a new format now
      National Archives: Why have a new format?
      Microsoft: Well, Open XML was just a dump of Office 2007's data structures, and the specification didn't allow any expansion or generalisation of the format so now we have Office 2010 those structures are different and we need a new format to put them in. This new format is open by the way. You can tell because we've called it "Open 2.0", and the 2.0 means better than.
      National Archives: Well how can we use this new format?
      Microsoft: Just use Office 2010
      National Archives: But it won't run on our Vista machines!
      Microsoft: That is because it uses DirectX 12 features that are only available on Windows Vienna so it can have these new 3D window effects we just invented called Window Wobble™ and Cube Space™
      National Archives: But our computers won't run Vienna!
      Microsoft: That's OK, just buy new computers
      National Archives: So what do we do about these Open XML documents?
      Microsoft: You'll need to have a virtual machine to run Office 2007 in to open them
      National Archives: OK, we'll do that. Wow Microsoft, your advice always helps us through!
      Microsoft: You do realise that you'll need to pay for new Vista licenses don't you? The versions you are using aren't allowed to be virtualised
      National Archives: Oh, well that seems reasonable

      20 Years Later

      Gordon Brown: OK, the rioting seems to be under control now that our CCTV-mounted lasers have dealt with those unemployed protestors. How the hell did the economy get into this state?! I'm looking at you and your computer budgets, National Archives
      National Archives: How dare you accuse us of spending all of the country's money without any evidence!
      Gordon Brown: OK, I want to see every invoice, every receipt, everything documenting your spending!
      National Archives: Well, the thing is, they are all "Open Open Open Open Openy Open Honestly-This-Time-It-Is-Open XML" files which won't load on the Windows WeOwnTheWorldNowHaHaHaHa computers we've just deployed

    22. Re:Idiots by iago-vL · · Score: 1

      There's one important distinction: OpenDocument is, well, open. If they have access to our documents in 100 years, they will likely have access to the specifications to the OpenDocument format (hopefully in plaintext and not in .odf..), and will be able to re-implement it to open the files.

    23. Re:Idiots by tjwhaynes · · Score: 1

      If Microsoft has taken the job of taking some binary BLOB and make it into something human-readable, OOXML or not, then I say you'd have an easier time converting OOXML to something readable in OpenOffice than not.

      If it's a text document, then you might be able to parse the OOXML regardless and understand most of the formatting. If it's a spreadsheet, then many of the parts of the OOXML spec are ALSO binary LOBs and you are no better off. If it's something that OOXML doesn't support, you are out of luck. At least ODF provides ways to package other formats along with itself in a transparent fashion, is completely documented and supported by multiple vendors.

      Cheers,
      Toby Haynes

      --
      Anything I post is strictly my own thoughts and doesn't necessarily have anything to do with the opinions of IBM.
    24. Re:Idiots by MightyMartian · · Score: 1

      We have a format that has been around nearly half a century. It is universal (or nearly so), has countless applications that can open it. It can be used to store, via extensions to character sets, data in many international formats. It's not the most efficient method, but can be used to store documents, spreadsheets and databases.

      It's ASCII, of course. There's nothing wrong with using Word, VisiCalc, WordStar or whatever, but just save a bloody text version of your document, and it's guaranteed that something will always be able to open the file.
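
      A minimal sketch of the "save a text version" idea, in Python. The function names and the sample sentence are made up for illustration; the only assumption is that anything outside 7-bit ASCII may be replaced with "?" in the fallback copy.

```python
def is_seven_bit_ascii(data: bytes) -> bool:
    """Return True if every byte is in the 7-bit ASCII range (0x00-0x7F)."""
    return all(byte < 0x80 for byte in data)


def to_plain_ascii(text: str) -> bytes:
    """Encode text as ASCII; characters outside ASCII become '?'."""
    return text.encode("ascii", errors="replace")


if __name__ == "__main__":
    # Hypothetical document text containing one non-ASCII character.
    draft = "Minutes of the meeting: café budget approved."
    fallback = to_plain_ascii(draft)
    print(is_seven_bit_ascii(draft.encode("utf-8")))
    print(is_seven_bit_ascii(fallback))
    print(fallback.decode("ascii"))
```

      The lossy replacement is the price of the lowest common denominator: the fallback copy drops accents and all formatting, so it works best alongside the original file rather than in place of it.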

      --
      The world's burning. Moped Jesus spotted on I50. Details at 11.
    25. Re:Idiots by nneonneo · · Score: 3, Interesting

      Step back, though, and think for a minute about the "house of cards" upon which that Word document rests.

      It rests on
      1) Physical storage medium -- whether this is Flash, Hard Drive, Optical Medium, [NV]RAM, etc., all these technologies may be very difficult to retrieve data from, especially if the level of technology happens to go down in the future (say, global thermonuclear war). Even if data is retrieved, there's no guarantee that it's intact after 1000 years (the dyes in CDs will have decomposed by that time; the Flash drives will have leaked all charge; the hard drives will have randomly demagnetized over that period of time, etc.).
      2) 8 bits per byte, 32 bits for most integers in the file, IEEE specification for floating point numbers, ASCII, Unicode, GB2312, etc. -- encodings for our numbers, letters and even bit-packing of binary data will affect retrievability. It would be difficult, I imagine, for some futuristic person to wander along (with possibly a different language) and attempt to interpret all of that.
      3) File format -- the MSOffice OLE2 format is incredibly complex, perhaps overly so. An OLE2 file takes the form of a miniature filesystem, with a Fragment Allocation Table, 512-4096 (variable) sized blocks, a master Double-Indirection FAT, sub-block allocation, etc. Fragments within the OLE2 container are assembled into "files" or streams in this file system and then parsed. Sure, it makes for wickedly fast saving times (since you can write to changed portions only and add fragments as needed, like a real file system), but it also makes it damn hard to parse, compared to plain text formats like XML or RTF.

      There are many more layers to this system, but that's a basic overview of what a researcher (or someone else) 1000 years from now will have to contend with.

      Sure, if you're looking at just extracting text, you can skip the last layer by simply going in the file and pulling as much out as possible, much like I used to do with corrupted Word documents. However, if you're looking to retrieve images, videos, archival audio, etc. then your job is much harder.
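
      The "skip the last layer" trick described above can be sketched roughly as follows. This is a strings-style scan, not a parser for the OLE2 container: it only checks the 8-byte signature that compound files start with and then pulls out runs of printable characters, stored either as single bytes or as UTF-16LE. The function names and the synthetic sample are assumptions made for the example.

```python
import re

# Signature found at the start of an OLE2 / Compound File Binary container.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_ole2(data: bytes) -> bool:
    """Cheap check: does the data begin with the OLE2 signature?"""
    return data.startswith(OLE2_MAGIC)


def printable_runs(data: bytes, min_len: int = 6) -> list:
    """Return runs of printable text at least min_len characters long.

    Scans for single-byte ASCII runs first, then for UTF-16LE runs
    (printable byte followed by a zero byte). Formatting, images and
    everything else in the container are ignored.
    """
    ascii_pat = re.compile(rb"[\x20-\x7E]{%d,}" % min_len)
    utf16_pat = re.compile(rb"(?:[\x20-\x7E]\x00){%d,}" % min_len)
    runs = [m.group().decode("ascii") for m in ascii_pat.finditer(data)]
    runs += [m.group().decode("utf-16-le") for m in utf16_pat.finditer(data)]
    return runs


if __name__ == "__main__":
    # Synthetic stand-in for a damaged document, not a real .doc file.
    sample = (
        OLE2_MAGIC
        + b"\x00\x01\xff"
        + b"Dear Sir or Madam"
        + b"\x00\x02"
        + "Yours faithfully".encode("utf-16-le")
        + b"\xff\xfe"
    )
    print(looks_like_ole2(sample))
    print(printable_runs(sample))
```

      As the comment says, this only rescues text. Embedded images, audio or video still need the container's allocation tables and streams to be walked properly.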

    26. Re:Idiots by aichpvee · · Score: 1

      Just an FYI, governments don't have to worry about licensing. Especially in situations like this. They have the power of eminent domain. If they needed a license for the software and were in some way not able to pay for a boxed copy and store it in a vault or whatever, they could (and should) just take it.

      --
      The Farewell Tour II
    27. Re:Idiots by Lord_Sintra · · Score: 1

      With almost any kind of text document (not spreadsheets and the like), you can open it up in notepad(++) and access it fine, you just lose most of the formatting.

    28. Re:Idiots by Jay+L · · Score: 1

      There are so many idiots in this state of the affairs:

      You forgot the idiots who don't remember/were never aware of a time when there was no common platform, OS, hardware, or even, for that matter, alphabet encoding, and when nearly all files were saved as a dump of the in-memory structure for efficiency.

      At my old print shop, we used to save all the cool/funny business cards that came through the door. And one of the nicest Helvetica versions I've ever seen ("Helios").

      Someone probably still has all that. On 8" floppies for a CompuGraphic typesetting machine. Who's the idiot who didn't save the CompuGraphic machine? In 100 years, will anyone have software to read the files? How about sixbit, or Fieldata? How about something much more obscure which surely exists?

      (Side note: When my school district sold off its PDP-10s in the late 1980s for scrap, I tried to convince my folks to let me bring one home. Sadly, I failed.)

    29. Re:Idiots by X0563511 · · Score: 1

      People decrypt documents created with One Time Pads using "cribs" - assuming that a particular word is in the cyphertext somewhere.

      If people can break documents explicitly coded to prevent reading, they can certainly reverse engineer a file format.

      In a few hundred years, as you put it, I'm sure an AI will be able to do so extremely quickly.

      --
      For large sets, this will be our guide even unto death, for the LORD will work for each type of data it is applied to...
    30. Re:Idiots by kabz · · Score: 1

      Please send me a copy of Access 2.0 in which format I have quite a lot of IP stored, none of which is easily convertible to anything else.

      And if you suggest using Access 2007, just forget it, I already tried, and the documenter add-on is the biggest POS ever. This is using a legally installed copy of Office 2007 on XP Pro on my wife's desktop.

      I am sticking with .doc format generated by OpenOffice. Ironically I run OOo on Windows, but Office 97 on Linux. Hehe.

      --
      -- "It's not stalking if you're married!" My Wife.
    31. Re:Idiots by Anonymous Coward · · Score: 0

      You are forgetting files encrypted under DRM (Vista?) which would add AES decryption to the recreation task.

    32. Re:Idiots by Anonymous Coward · · Score: 0

      Apparently, you are experiencing some incompatibility with end of line markers.

      Seriously, I am now building an NT 4.0 VM that should be relatively resistant to MS tampering, and will install WP, Office 97 etc., on it precisely to allow myself to access old documents.

      I already posted about not being able to err access old Access 2.0 IP that I have. I do not want this to happen again. I already keep a non-iTunes backup of all my music on a Linux box. We all need to be vigilant about losing data in old formats.

      We should *already* be vigilant. MS Visio is largely a management only app at my office, so lots of documents are closed to me unless exported to PDF.

    33. Re:Idiots by Chandon+Seldon · · Score: 1

      And if they can't figure out a .doc file, they probably won't be able to figure out opendocument any better, because it is just as silly to believe that opendocument will be any more common 1000 years from now than microsoft word documents.

      The question isn't if the format will still be in use, the question is this: How hard will it be to write a converter to whatever the standard is in the future? With ODF, you need to uncompress a zip archive and parse some reasonably simple XML. With OOXML, you need to deal with all sorts of crap - and even if you have the standard, you won't necessarily be able to implement parts of it.
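
      To give a feel for the "uncompress a zip archive and parse some XML" step, here is a rough Python sketch using only the standard library. It reads content.xml from an ODF text document and collects the text of each text:p element. The sample builder is a stripped-down stand-in written for this example (it has no manifest or styles), and the helper names are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by OpenDocument for document structure and text.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def odf_paragraphs(odf_bytes: bytes) -> list:
    """Return the plain text of every text:p element in content.xml."""
    with zipfile.ZipFile(io.BytesIO(odf_bytes)) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    return ["".join(p.itertext()) for p in root.iter(f"{{{TEXT_NS}}}p")]


def make_sample_odt() -> bytes:
    """Build a tiny ODF-like package in memory (illustration only)."""
    content = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        f'<office:document-content xmlns:office="{OFFICE_NS}" '
        f'xmlns:text="{TEXT_NS}">'
        "<office:body><office:text>"
        "<text:p>Cabinet minutes, 1997.</text:p>"
        "<text:p>Second <text:span>paragraph</text:span>.</text:p>"
        "</office:text></office:body></office:document-content>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("mimetype", "application/vnd.oasis.opendocument.text")
        archive.writestr("content.xml", content)
    return buffer.getvalue()


if __name__ == "__main__":
    for paragraph in odf_paragraphs(make_sample_odt()):
        print(paragraph)
```

      A real converter would also have to handle styles, tables, lists and embedded objects, so this only covers the easy part of the job.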

      --
      -- The act of censorship is always worse than whatever is being censored. Always.
    34. Re:Idiots by jelle · · Score: 1

      Until an enterprising young student finds the magical command 'apt-get install antiword' on an archive of this obscure thing called 'a website' with the name slashdot.org, and upon trying discovers the command still works...

      --
      --- Hindsight is 20/20, but walking backwards is not the answer.
    35. Re:Idiots by Anonymous Coward · · Score: 0

      "In a hundreds years microsoft will no longer exist"

      Well, somebody could have said the same in 1907 about that 19-year old company, haemmm... "International Business Machines" but look: there it is, IBM!

      Really, why do you think there won't be a Microsoft in, say, 300 years? I bet there will be a Microsoft, and an IBM and a Coca-Cola, and most of the really big companies of today (sadly I won't be there to collect my money), just like there was a Pope 300 years ago and there's still a Pope today, and for almost the very same reasons: power decoupled from genetic inheritance. It's a no-brainer bet!

      "were lost years ago when microsoft went bankrupt.."

      Well, I think that by the year 1700 the enemies of María Teresa Álvarez de Toledo would have hoped the same: by the XXI century the Alba family will be lost, ruined and forgotten (and it is much, much easier to destroy a family than a big corporation, and ruining a big corporation will only be harder in the future).

    36. Re:Idiots by AnotherDaveB · · Score: 1

      4. idiots who want to convince other idiots that OOXML is an open format (versus straight XML serialization of the whatever binary DOC was in the source code base at the time in MS)
      "The agreement between the National Archives and Microsoft centres on the use of virtualisation." The
    37. Re:Idiots by timeOday · · Score: 2, Insightful

      Just an FYI, governments don't have to worry about licensing. Especially in situations like this. They have the power of eminent domain.
      Think about Valve's Steam software for protecting video games (or any software that requires network activation). Just because you're willing to bypass it doesn't mean you can.
    38. Re:Idiots by aichpvee · · Score: 1

      Do you honestly think Steam can't be cracked? In fact it probably has been, given Valve's inability to program anything securely. If I gave a fuck about Valve or windows gaming I'd probably know, but I don't. Pretty sure any of these documents that are being archived don't rely on phone home activation to be read though.

      --
      The Farewell Tour II
    39. Re:Idiots by ozmanjusri · · Score: 1

      Costs money, but if you need it...
      http://www.file-convert.com/onl_pric.htm

      --
      "I've got more toys than Teruhisa Kitahara."
    40. Re:Idiots by CrossChris · · Score: 1

      There are so many idiots in this state of the affairs

      This is entirely typical of British IT projects. Blair was bribed (with a house) by Bill Gates so no alternative to MS is ever considered. The people responsible for commissioning IT projects are fired if they even suggest anything other than proprietary, MS-based "solutions" to IT problems.

      No British "Government" IT project has ever worked!

      Windows Vista: the end of Microsoft.

    41. Re:Idiots by Kiffer · · Score: 1

      http://www.file-convert.com/free_trial.htm

      You could get the software for free... on an unlimited free trial... which supports all the same formats as the full version but adds in spelling errors...
      If it adds in different spelling errors each time then you could convert the file several times and compare the results to create an accurate file but would you trust it to be correct?
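
      The "convert several times and compare" idea amounts to a majority vote. A rough Python sketch, assuming (and this is only an assumption about how such a trial might behave) that each run substitutes characters without changing the length of the text, and that any given position is wrong in fewer than half of the copies:

```python
from collections import Counter


def majority_vote(copies: list) -> str:
    """Rebuild a text by taking the most common character at each position.

    Only handles equal-length copies, i.e. errors that substitute
    characters rather than insert or delete them.
    """
    if not copies:
        raise ValueError("need at least one copy")
    if len({len(copy) for copy in copies}) != 1:
        raise ValueError("this sketch only handles equal-length copies")
    return "".join(
        Counter(chars).most_common(1)[0][0] for chars in zip(*copies)
    )


if __name__ == "__main__":
    # Three hypothetical conversions, each with one different typo.
    runs = [
        "Dear Jane, I miss yuu.",
        "Dear Jene, I miss you.",
        "Dear Jane, I mess you.",
    ]
    print(majority_vote(runs))
```

      This also shows why you might not trust the result: if the same error turns up in most of the runs, the vote confirms the error.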

      Or you could realise that if you have an archive of thousands of documents that you really want to be able to read, you should be willing to pay for software. Yes, I know the idea of paying for things is abhorrent to many people. I, myself, hate paying for anything on the internet, but if I had all my old love letters on a 3.5" floppy I'd pay the $10 to read them.

      A large archive should be ok with paying a few grand for this sort of software...
      http://www.file-convert.com/fmn_pric.htm

    42. Re:Idiots by Anonymous Coward · · Score: 0

      If only they'd done an "Ask Slashdot", heh?

    43. Re:Idiots by mattpalmer1086 · · Score: 1

      Ummm... no, you're thinking of the British Library. Very roughly, libraries deal with published material. Archives deal with unique material. The National Archives contains the unique records of the UK central government and law courts. Very little private material, although there are occasional "gift" deposits.

    44. Re:Idiots by jZnat · · Score: 1

      I don't mind paying for (and thus supporting financially) Free software, but paying for proprietary, non-free software is just against my typical morals...

      --
      'Yes, firefox is indeed greater than women. Can women block pops up for you? No. Can Firefox show you naked women? Yes.'
    45. Re:Idiots by fritsd · · Score: 1
      Thank you. I'll help you pray that someone, hopefully even in a position of influence at the National Archives, actually ever reads this.

      You exaggerated on the OOXML specification size though; surely you don't need a forklift to carry a 5219 page document? Or just download the 34 Mb thing (part 4 that is).

      --
      To be, or not to be: isn't that quite logical, Slashdot Beta?
    46. Re:Idiots by Anonymous Coward · · Score: 0
  6. Use SGML by Morgaine · · Score: 5, Funny

    It predates Moses, and is quite likely to survive the heat death of the universe.

    --
    "The question of whether machines can think is no more interesting than [] whether submarines can swim" - Dijkstra
    1. Re:Use SGML by Anonymous Coward · · Score: 0

      No, they should just store all text in 7-bit ASCII. Anything can read that!

    2. Re:Use SGML by T.E.D. · · Score: 1

      It predates Moses, and is quite likely...


      Little-known fact: God experimented with MS Office for the Ten Commandments (found in Exodus 20:2-17). Unfortunately, he left revision tracking on. Some clever Jewish hacker figured out the edits and posted the original version in Deuteronomy 5:6-21. Now we are stuck with two different versions.
  7. The big lie... by advocate_one · · Score: 5, Informative
    they keep repeating this everywhere they go... "Open XML"... their format is not Open... it's closed off with licensing and other restrictions... all the really good stuff in the specification has been obfuscated out and hidden behind indirections to the behaviour of legacy apps that only Microsoft know the real ins and outs of... not only that, there's still an easy means for them to merely use XML as a wrapper for binary blobs...

    to give it a proper name, the format is "Microsoft Open Office XML", they deliberately went to a lot of trouble to pick a name that's as easy as possible to confuse with OpenOffice

    --
    Donald 'Duck' Dunn: We had a band powerful enough to turn goat piss into gasoline.
    1. Re:The big lie... by Anonymous Coward · · Score: 0

      And other people keep repeating the lie that it isn't open right behind them. The licensing is permissive and allows any program written by anybody to read and write the formats without prior consent, fees, royalties or considerations. And those tags to which you refer exist for the specific and sole purpose of converting documents that are written in specific versions of applications such as WordPerfect 6.

      Or you can just use OpenOffice which will butcher the format anyway, then you don't need to care about those old and obsoleted features.

      Fucking spedtards, I swear.

    2. Re:The big lie... by Quarters · · Score: 2, Informative

      Their name choice has certainly worked on you. It's not "Microsoft Open Office XML" like you said. It's "Microsoft Office Open XML".

    3. Re:The big lie... by throup · · Score: 1

      to give it a proper name, the format is "Microsoft Open Office XML"
      I believe it is "Office Open XML", although the potential to confuse it with OpenOffice.org is undoubtedly there.
    4. Re:The big lie... by davester666 · · Score: 3, Funny

      they keep repeating this everywhere they go... "Open XML"... their format is not Open... it's closed off with licensing and other restrictions... all the really good stuff in the specification has been obfuscated out and hidden behind indirections to the behaviour of legacy apps that only Microsoft know the real ins and outs of... not only that, there's still an easy means for them to merely use XML as a wrapper for binary blobs...
      You don't understand the format then. Office Open XML is the ultimate in 'open' because they define it so you can embed any other document, in whatever format, within it. Obviously, the UK National Archive must use this 'master' format for storing all their existing documents in.
      --
      Sleep your way to a whiter smile...date a dentist!
    5. Re:The big lie... by hilton_a · · Score: 1

      Second that.

  8. upgrades free money for MS by wizardforce · · Score: 1

    MS benefits a lot from upgrades: that way you end up "needing" to pay for an upgrade down the road regardless of whether you bought a new computer or not. They stand to lose everything if open source is seen to be nearly or just as good by people at large or by the government, so they do just what they are required to, but not enough to weaken future cash streams from upgrades.

    --
    Sigs are too short to say anything truly profound so read the above post instead.
  9. Open Formats by MrSteveSD · · Score: 1

    This is why we should be using open formats, particularly for things that are really complex like video codecs.

  10. Linus is right by Anonymous Coward · · Score: 1, Funny

    I am with Linus on this one.

    1. Re:Linus is right by Anonymous Coward · · Score: 0

      No, RMS is right. Surely, you can't deny he made a better point than Linus on this one?

      Look at the facts man! .. oh yeah, and pass the bong.

  11. Obviously... by Colin+Smith · · Score: 5, Funny

    If you have a problem with proprietary formats you go to Microsoft to solve it for you... The word "DOH" springs to mind.

    Oh yeah, their solution? Virtualised Windows 3.1. And obviously in 15 years you'll have to virtualise Vista in order to run the Win3.1 virtual machine to run Word. And Microsoft will be paid a license for each application and level of virtualisation.

    You couldn't make this stuff up.

    --
    Deleted
    1. Re:Obviously... by joe+155 · · Score: 1

      I wish you weren't modded "funny". What you say is both true and insightful, but it's not funny... it's very, very sad

      --
      *''I can't believe it's not a hyperlink.''
    2. Re:Obviously... by Macthorpe · · Score: 1

      It's not marked 'insightful' because everyone except you laughed. Whether it was intended to be funny is another matter.

      Microsoft are providing virtualisation so they can run old software in order to convert it into newer formats, not to have a load of nested virtualised operating systems like Russian dolls that have to be paid for in perpetuity.

      The idea that an institution like the British Library, which is run by people bright enough to make you look like a dead match, would accept such a preposterous idea is insulting.

      --
      "It does not do to leave a live dragon out of your calculations, if you live near him." - Tolkien
    3. Re:Obviously... by CaptnMArk · · Score: 1

      Conversion doesn't work since there will likely be data loss in each step. Just think .doc -> ...

      You really need open / published formats for archiving.

    4. Re:Obviously... by Macthorpe · · Score: 1

      What are you converting .doc to which causes data loss? I've never experienced any.

      --
      "It does not do to leave a live dragon out of your calculations, if you live near him." - Tolkien
    5. Re:Obviously... by Colin+Smith · · Score: 1

      Microsoft are providing virtualisation so they can run old software in order to convert it into newer formats
      That doesn't make any sense. Microsoft already know the file format; just write a bit of software which will read in the old format and write out the new one. We're talking terabytes of information here. It isn't as if you can just open each file manually and choose "export as".

      The idea that an institution like the British Library, which is run by people bright enough to make you look like a dead match, would accept such a preposterous idea is insulting.
      You seem to have a remarkable faith in institutions. They're planning to use OOXML. I think insulting them is entirely fair.

      Let me quote:

      Adam Farquhar, head of e-architecture at the British Library, praised Microsoft for its adoption of more open standards.
      Stop and consider that quote for a moment. Given your vast experience of the history of IT, does something not quite ring true? The qualification "more" is highly significant.

      He said: "Microsoft has taken tremendous strides forward in addressing this problem. There has been a sea change in attitude."
      Yes... Except for all the proprietary bits in OOXML.

      --
      Deleted
    6. Re:Obviously... by hilton_a · · Score: 1

      "The idea that an institution like the British Library, which is run by people bright enough to make you look like a dead match, would accept such a preposterous idea is insulting." Either that or you're incredibly stupid.

    7. Re:Obviously... by Macthorpe · · Score: 1

      To paraphrase Bender: That was very mean... Sorry, I meant that other thing - pathetic.

      --
      "It does not do to leave a live dragon out of your calculations, if you live near him." - Tolkien
    8. Re:Obviously... by Bert64 · · Score: 1

      Even more ironic is that Microsoft are pretending to offer a solution to the problem they themselves created.
      The National Archives really shouldn't let themselves fall for this, or they will just find themselves in the same situation again in a few years' time.

      --
      http://spamdecoy.net - free throwaway anonymous email - avoid spam!
    9. Re:Obviously... by Anonymous Coward · · Score: 0

      Correct. The media reporting is way off base here. It's not a delivery mechanism for readers at the archives. There is a web site for electronic records, if you care to have a look (www.nationalarchives.gov.uk/ero).

      No lock-in to commercial or platform-dependent formats is used for delivery wherever possible. It will never be acceptable for the archives to only make public information available to those who can afford to buy proprietary software. De facto proprietary standards are sometimes employed, as long as the reader software is freely available and no platform lock-in results.

      The virtual machines are nice to have for some purposes, but don't represent the preservation strategy of the National Archives. This is actually a format-migration strategy, so all the documents, whether WordPerfect, Word 95, whatever, are usually migrated into some archival format (e.g. PDF/A), retaining the originals of course.
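
      For readers wondering what "migrate, but retain the originals" looks like in practice, here is a minimal Python sketch. It is not the National Archives' actual tooling: `migrate_archive`, the directory layout, the suffix list and the `convert` callable (standing in for whatever real converter produces the archival format) are all assumptions for illustration.

```python
import hashlib
import shutil
from pathlib import Path


def migrate_archive(source_dir, target_dir, convert,
                    legacy_suffixes=(".wpd", ".doc")):
    """Keep an untouched copy of every legacy file and write a migrated copy.

    `convert` is a hypothetical callable: raw bytes of a legacy file in,
    bytes of the archival format out. It may raise on unreadable input.
    Returns a manifest of (relative path, SHA-256 of original, status).
    """
    source_dir, target_dir = Path(source_dir), Path(target_dir)
    manifest = []
    for path in sorted(source_dir.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in legacy_suffixes:
            continue
        rel = path.relative_to(source_dir)

        # The original is always retained, whether or not migration works.
        original_copy = target_dir / "originals" / rel
        original_copy.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, original_copy)

        data = path.read_bytes()
        digest = hashlib.sha256(data).hexdigest()
        try:
            migrated = convert(data)
        except Exception as exc:
            # Failed migrations are recorded so someone can inspect the
            # original later, for example in its original software.
            manifest.append((rel.as_posix(), digest, f"failed: {exc}"))
            continue

        out = target_dir / "migrated" / rel.with_suffix(".migrated")
        out.parent.mkdir(parents=True, exist_ok=True)
        out.write_bytes(migrated)
        manifest.append((rel.as_posix(), digest, "ok"))
    return manifest
```

      The manifest of failures is the list of documents you would want to inspect by hand in their original context.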

      Having a virtual machine that runs old versions of common software can be useful. For example, when migrations fail or produce odd results, it becomes possible to check what the document looked like in its original context.

      Posting anonymously as I have links with the National Archives.

    10. Re:Obviously... by Pofy · · Score: 1

      According to Microsoft, any format other than their own new OOXML. The whole reason they created it is, according to themselves, that no other format could handle, and be converted to from, their own old formats.

  12. More surprisingly!? No, UNsurprisingly by erroneus · · Score: 3, Interesting

    No. The obvious solution for the predicted problem of data being unavailable due to being in unsupported proprietary formats is to move it to a widely supported non-proprietary format.

    As "well intentioned" as Microsoft may be, Microsoft's Open XML cannot be anything but proprietary when its code references Windows and Office API functions rather than more precise data format information as with ODF. (For more information about this, you might search out the arguments against making OOXML an ISO standard.)

  13. PDF's by Anonymous Coward · · Score: 0

    As sad as it is, I've run into some PDF's that don't open with newer PDF viewers. Sort of defeated the purpose, I thought.

  14. Trust us, we did it by Anonymous Coward · · Score: 0

    Ah, brilliant. It's partly Microsoft's fault that this problem exists, so their suggestion is to trust them to offer a solution that won't cause this problem later down the line. Especially smart, given this sort of longevity and interoperability would annihilate their business model. Yes, good, excellent. Explicit trust ahoy!

  15. Chicken Little by Nybble's+Byte · · Score: 1, Informative

    The guy sounds like a Microsoft salesman, not someone who should be in that position of responsibility. Look at all the MS software boxes behind the computer. A puppet.

    Carry on.

    1. Re:Chicken Little by Macthorpe · · Score: 2, Informative

      The video is of the managing director of Microsoft UK, not someone associated with the British Library. Hence the caption 'Microsoft UK Managing Director Gordon Frazer running Windows 3.1 on a Vista PC'.

      Yes, that was sarcastic, but you deserved it.

      --
      "It does not do to leave a live dragon out of your calculations, if you live near him." - Tolkien
    2. Re:Chicken Little by Nybble's+Byte · · Score: 0

      Yes, that was sarcastic, but you deserved it.
      Sarcasm noted, but there's no need for the latter part of your sentence, which is just rude.
    3. Re:Chicken Little by Macthorpe · · Score: 1

      There was nothing personal about it, just good-natured ribbing, though I appreciate that doesn't carry over the internet very well.

      --
      "It does not do to leave a live dragon out of your calculations, if you live near him." - Tolkien
  16. Dumb 'solution' by hcdejong · · Score: 1, Insightful

    I can't believe the National Archives partnered with the company that caused this mess in the first place, ie Microsoft.

    Second, why on earth do they think virtualisation is a long-term solution? Sure, you can emulate Windows 95 within Windows XP today, but what happens in another ten years? Another layer gets wrapped around XP? So in 100 years, you're relying on a stack of emulators to access the old software. You better hope Moore's law holds up, because you're going to need it. Also, who will know how Word 95 worked in 10 years, let alone 100?

    IMO translation of the old documents would be a better solution. Translate the documents into a well-documented, open format, and throw away all of the old formatting idiosyncrasies while you're at it. That way, you only have to maintain one way to access the documents with the software-du-jour, instead of having to prop up the entire teetering stack of virtualisation layers.

    1. Re:Dumb 'solution' by Professor_UNIX · · Score: 1, Insightful

      That's a silly argument. You just have to emulate Windows95 on whatever platform you're using 100 years from now, not all the intermediate platforms. For example, high speed computers today can still play old arcade games from 30+ years ago through emulation, but we're not doing it by emulating a Pentium that is emulating an Amiga that is emulating a Commodore 64 that is emulating an arcade machine, we're just emulating the arcade machine. It's not a good solution to file format issues, but virtualization doesn't have to be prohibitively CPU intensive unless you're trying to emulate the latest current-generation alternative architectures like a PowerPC G5 running MacOS X and Virtual PC trying to emulate a Pentium 4 system running Windows XP.

    2. Re:Dumb 'solution' by hcdejong · · Score: 1

      Good point. My idea was to avoid having to build an emulator for a 100 year-old platform [1]. By stacking them, the emulator you're writing only needs to understand software that was written 10 years ago.

      1: I figured that to do this, you need to know how the 100-year-old system works. That's no problem now, since there's still enough of the old hardware (and its documentation) lying around. But some day, those will have turned to dust. Your archive better contain complete information on all the old data formats (in a robust format) and software if you plan on using the old data in 100 years.

    3. Re:Dumb 'solution' by Blo · · Score: 1

      Correct. And given the timespan these organisations plan for (well, at least the National Archives here in the Netherlands, which I work for, does), virtualisation is not an option: sooner or later the platform you're using becomes obsolete, and so any virtualisation software you have stops working. Remember that virtualisation can only reproduce the platform you're running on. That is why emulation is a possible solution, and that's what we're working on at the moment (shameless plug: http://dioscuri.sourceforge.net/). Currently we're using Java and the JVM as an intermediate virtual machine, but the idea is to create an independent virtual layer in the future (see "Key Features", http://dioscuri.sourceforge.net/dioscuri.html), which is the only thing that needs to be adjusted; the emulator should run forever!

    4. Re:Dumb 'solution' by Kazoo+the+Clown · · Score: 1

      So how does a parent OS do a text search of a document that's only readable by a program that's buried inside a VM?

  17. Doesn't open source solve this by wile_e_wonka · · Score: 4, Informative

    It seems to me that this is really a nonproblem--OOo is compatible with lots of "dead" formats (or, can read them at least), as well as many other open source office programs. I can't imagine they're going to begin throwing away this compatibility--it isn't like it takes extra coding (as far as I know). Also, I have found Microsoft Word's "Extract text from any file" to work pretty well (I had a roommate with a corrupted Mac-formatted disk that had his deceased grandmother's journal on it in some old Mac Word file (a format still readable in Word, but the disk was corrupted so I couldn't just open the file). I popped it in my parents' now deceased iMac and the only program I found that opened it was Word, using the "Extract text from any file" function. I emailed him the journal and he thanked me profusely).

    Also--as noted, the OOXML format is a nonsolution for this nonproblem. It seems like it would be a waste of effort--why convert a bunch of files to a format that may die just as quickly as any other format, when you can just leave the file as is and open it in OOo (assuming I'm correct that they won't stop read support for dead formats)?

    Also, it seems to me that no current format or any future format will ever solve this nonproblem because formats will always change as new functionality is continually added. The better solution is to keep this a nonproblem by having open source software that can read old file formats.

    1. Re:Doesn't open source solve this by stsp · · Score: 1

      It seems to me that this is really a nonproblem--OOo is compatible with lots of "dead" formats (or, can read them at least), as well as many other open source office programs. I can't imagine they're going to begin throwing away this compatibility--it isn't like it takes extra coding (as far as I know).
      Well, there is always maintenance work involved. Things change all the time, and so does software. It could well be that in 20 years OOo won't support MS formats anymore for whatever reason, unless people actively work on keeping that support alive in the code base. Sometimes things just stop working. For example, Linux 2.6.17 boots fine on my BP6 motherboard in SMP mode. Linux 2.6.18 only boots with one CPU, else it locks up (I still have to debug this properly). This is a classic dual Celeron motherboard, you would not expect support for it to be dropped, and I don't think someone broke it on purpose. But it happens :(
    2. Re:Doesn't open source solve this by MightyMartian · · Score: 1

      I think the whole problem is somewhat overstated. I've had to work with some godawful formats (mainly mainframe-style), and while it's a real pain in the ass, I don't imagine that, for a skilled programmer, a file in a format lost to antiquity is going to be impossible to break. It might take some effort. I guess the issue is the time and money spent cracking a file, but still, short of a media failure, none of this is going to be truly lost.

      What needs to be done is for all these archival agencies to insist that documents sent to them be in an open, fully documented format. You can't do much about old formats still in place, but there are enough truly open formats (and anyone who works for an archive who insists on using Microsoft's "open" formats should immediately be canned) that you can prevent the problem from happening in the future. With openly published specifications, programmers in the future will be able to easily break open such files.

      --
      The world's burning. Moped Jesus spotted on I50. Details at 11.
    3. Re:Doesn't open source solve this by InakaBoyJoe · · Score: 1

      the disk was corrupted so I couldn't just open the file). I popped it in my parents' now deceased iMac and the only program I found that opened it was Word,

      Huh? iMacs don't have internal disk drives! Microsoft shill!

    4. Re:Doesn't open source solve this by DrXym · · Score: 1
      Your friend's grandmother's journal is one thing. A 150-page report containing multiple columns, break-out boxes, indexes, cross references, images, embedded fields, revisions, possibly multiple languages, embedded objects etc. is quite another.

      Why should the burden be on OpenOffice or any other app to second guess the proprietary format of something like MS Word? At best you might get a passable representation of the original. At worst you get a complete mess which could screw up the rendering and possibly miss out whole chunks that it can't handle. And as time goes on, the burden of supporting old and possibly rare document formats becomes onerous and virtually impossible for the project to bear.

      I too have been saved by OpenOffice, which managed to read an old Mac Word file that even MS Word on the PC would not read. It almost got the rendering right but splattered question marks all over the place instead of some missing character. So the first thing I did was repair it and save a copy in .odt format. I'm glad I could import it, but I simply do not trust OpenOffice to support the format in perpetuity, or the format to receive sufficient testing to ensure it continues to work for ever and ever. So perhaps the same could happen with .odt? Not really, since even if OpenOffice died, there are a growing number of apps that support the same format. Better yet, someone could even extract the text and styling if they had to, since .odt splits presentation from the content.
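
      To make the "extract the text if they had to" point concrete: an .odt file is a ZIP package whose body text lives in `content.xml`, so the Python standard library is enough to pull the words out. This is only a sketch; `odt_paragraphs` is a made-up helper name, and it ignores styling, tables-as-tables and everything else beyond headings and paragraphs.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used for text elements in OpenDocument content.xml.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def odt_paragraphs(path):
    """Return the plain text of each heading and paragraph in an .odt file."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    wanted = {f"{{{TEXT_NS}}}h", f"{{{TEXT_NS}}}p"}
    # itertext() also collects text inside nested spans and links.
    return ["".join(el.itertext()) for el in root.iter() if el.tag in wanted]
```

      Calling `odt_paragraphs("journal.odt")` (file name hypothetical) would return a list of strings, one per heading or paragraph. Paragraphs nested inside other paragraphs, such as text boxes, would show up twice with this naive walk.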

      The solution is to store documents in a well defined open format to begin with. Adopt the new format and write converters for the legacy documents. Then at least there is a common denominator going forward, one that is implemented by multiple applications and interchangeable between them. Even a pretty shoddy standard like HTML shows the sort of revolution that can happen when every vendor adopts it.

    5. Re:Doesn't open source solve this by Anonymous Coward · · Score: 0

      OOo is compatible with lots of "dead" formats (or, can read them at least), as well as many other open source office programs.
      That's what I thought until I found that OOo 2 no longer read my documents written with an ancient version of Star Office. I had to downgrade to OOo 1 to convert them to the new format. I don't remember the specific versions involved, but I certainly remember that it was a major pain in the lower back.
    6. Re:Doesn't open source solve this by jZnat · · Score: 1

      .odt is the file extension for OpenDocument Text, an ISO standard. It is well-documented and widely available, and most modern office suites support OpenDocument. So unless ISO (and everyone else with a copy) loses all their copies of the ODF standard and any related standards, I think we should be safe.

      --
      'Yes, firefox is indeed greater than women. Can women block pops up for you? No. Can Firefox show you naked women? Yes.'
    7. Re:Doesn't open source solve this by RespekMyAthorati · · Score: 1

      The thing I don't get is why changing document standards should be such a big problem. POT (Plain Old Text) contains most of the important information in any document, as the formatting is rarely crucial to understanding it. And almost all common office programs have a "Save As Text" option. So everybody who saves a document, in any format, should also save a backup in ASCII text form, so that most of the information can be recovered no matter what happens later on.

      At worst, you will be left with a huge quantity of POT.

  18. Real Issue by saibot834 · · Score: 1

    I think I've read something that they are already unable to read some data stored on computers in the Ex-German Democratic Republic.

    The only solution IMHO is _open and documented_ interfaces, protocols, programs, data types and hardware. In the future they won't be able to read our disks and files. They can only try to build a machine that reads our disks and files, and for that they need documentation of how they work.

    1. Re:Real Issue by Anonymous Coward · · Score: 0

      Poor John Titor http://www.johntitor.com/ had to travel back in time to take back an IBM computer built in 1975 called a 5100. If only they had used open formats !!!

      Cheers

  19. surprise? by Tom · · Score: 5, Insightful

    What's surprising about that? Someone in MS Spin Control and Public Relations is worth his salary. The story could have exploded into an "avoid MS products if you want your data accessible some years down the road" fiasco (we all know that MS is the worst offender when it comes to changing the document formats, usually undocumented). Instead, it was turned into another push for their next format.

    Brilliant.

    "What, the shit I sold you yesterday stinks? Try this new shit, it's great and it has none of the problems of the old one."

    That's what you hire PR people for.

    --
    Assorted stuff I do sometimes: Lemuria.org
  20. How about some *helpful* suggestions by FreudianNightmare · · Score: 5, Insightful

    Rather than bitching about Microsoft making an offer of 'help' which is just thinly disguised marketing (I mean, come on, par for the course, no?), could we get a discussion about real solutions? I know MS bashing is fun, but come on, we do it on just about every other thread... let's have a day off.

    To kick things off here's one:

    Keep EVERYTHING in the simplest possible format. ASCII would seem sensible, since it's the content we care about, not the formatting (although that wouldn't help our Asiatic brethren much). Then keep decent records of HOW you can read that format, with examples of the software and hardware. Do this bit on PAPER. V. Tough Paper (or rock, or plastic or whatever). Update the explanations every other year, to put them in language the next gen will understand. Maybe also have instructions on how to translate the simple format to less simple things.

    I guess, basically, it's a case of KISS and then *provide a persistent and regularly updated 'Rosetta Stone'* for latecomers to work from.

    As a side branch, this kind of reminds me of discussions I read about a while back of how to warn future generations about Nuclear Waste dumps (y'know, the really nasty stuff with half-lives in the thousands of years range). I don't think anyone ever came up with a decent answer....

    --
    'Speak softly and carry a beagle'
    1. Re:How about some *helpful* suggestions by nogginthenog · · Score: 1

      I've been to the National Archives researching First World War records (fascinating place BTW). These were stored on reel-to-reel tapes (similar to microfiche) that you viewed with a special machine. These are pretty much future proof, other than the fact that they will decay over time. ASCII would not be a good medium as they contain handwritten comments etc.

      We were looking for records of a certain A.J. Wheeler (whose name is carved in the basement wall of my GF's father's house in the Somme area, France). You wouldn't believe how many A.J. Wheelers there were!

    2. Re:How about some *helpful* suggestions by Colin+Smith · · Score: 1
      --
      Deleted
    3. Re:How about some *helpful* suggestions by FreudianNightmare · · Score: 1

      I take your point about ASCII being no good for things where the format is important (i.e. handwriting) but I wouldn't really look to be translating physical media anyway. I might digitise it for ease of access and/or analysis, but for longevity I'd only be looking to improve the chances of the physical medium surviving (that maybe means transferring to new media, maybe chemically improving the existing). Good luck with your hunt.

      --
      'Speak softly and carry a beagle'
    4. Re:How about some *helpful* suggestions by FreudianNightmare · · Score: 1

      Still too complex. I'm not saying you'll be completely able to avoid more complex formats, and obviously where you have to, open's better than proprietary... but as far as possible make it as simple as possible. ASCII's just about the simplest encoding for text I can think of, but I could stand to be corrected on that.

      --
      'Speak softly and carry a beagle'
    5. Re:How about some *helpful* suggestions by Ceriel+Nosforit · · Score: 2, Interesting

      ASCII would seem sensible, since its the content we care about, not the formatting.
      No, the formatting is important as well. Sometimes 'the medium is the message', and that whole bunch of artsy crap we geeks would prefer to ignore. - Just think of it as an engineering challenge in order to make the pain go away.

      You always archive the original (unless you have a batch; then you sample one and call it the original), and that original can be in just about any format, hand-written, coffee-stained, in sanskrit. When scanning a document into an electronic archive the ideal would be to have OCR create a font and layout on the fly while running, so that the electronic version of the original would still look like the dead-tree original, and yet be machine-readable.
      Dead-tree is still the standard, as it has been for the past few millennia. Directives on national level will often not even recognize that electronic archiving exists since there is no standard to be used. Once we have a format of SGML such as .odt standardized, electronic archiving might come into existence.

      In practice your organization will often receive a document per snail-mail from an external source and an electronic version might not even exist. To make it electronic you have to scan it and to make it machine-readable you have to OCR it. Then the final results needs to be retrievable 50 years from now, or until the end of time. Yes, many things are actually archived without an expiration date. The cost, measured in money, of archiving something permanently is literally infinite.

      From this perspective we geeks who are used to Moore's law etc. seem pretty darn impatient and narrow-minded. There are factors involved ranging from constructing immortal dinosaur pens to training staff who 'have been doing it their way for the past 30 years' and are hellbent on continuing so until they retire. ...But that too could be seen as an interesting engineering challenge. And that's why it's taking so long; because the project is just so gosh darn gargantuan.
      --
      All rites reversed 2010
    6. Re:How about some *helpful* suggestions by Kjella · · Score: 2, Interesting

      "There is always an easy solution to every human problem--neat,
      plausible, and wrong." -- H.L.Mencken, The Divine Afflatus (1917).

      Ok, you started by identifying one problem - asian languages. In fact, pretty much every non-US language since you said ASCII and not Latin1. So we can extend that to UTF-8 with no problems, except there's probably a huge table just for the 100000 characters or so, even though the spec is quite short.
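
      A minimal sketch of the ASCII / Latin1 / UTF-8 point, using only Python's built-in codecs. The sample string and the helper name try_encode are made up for illustration.

```python
# Illustrative only: how one short string fares under three encodings.

def try_encode(text, encoding):
    """Return the encoded bytes, or None if the encoding cannot represent the text."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None

sample = "café €5"  # hypothetical sample: one Latin-1 letter, one symbol outside Latin-1

ascii_bytes = try_encode(sample, "ascii")      # None: neither 'é' nor '€' is ASCII
latin1_bytes = try_encode(sample, "latin-1")   # None: 'é' fits, '€' does not
utf8_bytes = try_encode(sample, "utf-8")       # succeeds, using multi-byte sequences

print(ascii_bytes, latin1_bytes, utf8_bytes)
```

      The text survives intact only under UTF-8, at the cost of some characters taking more than one byte.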

      But then, you have only characters, which is probably fine for basic text. How about works containing formulas? Uh-oh. You need some kind of math language for that, something like MathML. Of course, if you don't need math, you don't need MathML either.

      But wait... what if it's not all text? What if you're trying to store diagrams, illustrations or graphs which can arguably be just as meaningful as the rest, for example if you're trying to preserve the works of Leonardo da Vinci. And uncompressed bitmaps are so incredibly inefficient, maybe you need a picture spec with some basic picture formats (vector, lossless scalar, lossy scalar at least), you could call them something like SVG, PNG and JPG. Of course, if you don't need graphics you don't need those.

      But then, maybe you're looking to add references. Yes, you could do plain-text matching but it's much more reliable and maintainable to make anchors and point to them, and they let you build tables of content and such too.

      Then, maybe there are times where fonts and layout really do matter, for example the lines of a poem? Perhaps we really should have a system to preserve that. Of course you can try to do this by embedding coding into plaintext the way say Project Gutenberg does but it's not very user-friendly, it's a bit like having "magic numbers" in code. Most of the time it's easier to just have a file format, and say "and these bits are the plaintext".

      In short, if you have a problem with storing this in OOXML and the solution is to use ASCII, then I think you're solving only a very very small subset of archive problems. If you're looking at the other way and say "I want one document format, to archive all my documents of every shape and form" then you'll be lucky if the 738 pages of OpenDocument standard (which is actually a lot more through referring to other standards) will suffice.

      --
      Live today, because you never know what tomorrow brings
    7. Re:How about some *helpful* suggestions by Colin+Smith · · Score: 1

      I keep my knowledgebases in Wiki format.

      http://pardus-larus.student.utwente.nl/~pardus/projects/zim/

      --
      Deleted
    8. Re:How about some *helpful* suggestions by MightyMartian · · Score: 1

      I've been to the National Archives researching 1st world war records (fascinating place BTW). These were stored on reel to reel tapes (similar to microfiche) that you viewed with a special machine. These are pretty much future proof, other than the fact that they will decay over time. ASCII would not be a good medium as they contain hand written comments etc.
      I think we're talking about electronic documents here, not about other mediums. ASCII isn't perfect, and it still leaves problems as far as binary data (images, sounds, etc.), although I have a hunch that bitmaps, jpegs and MPEG formats will probably have a lot longer lifespan than Word 97.
      --
      The world's burning. Moped Jesus spotted on I50. Details at 11.
    9. Re:How about some *helpful* suggestions by FreudianNightmare · · Score: 1

      Flip Answer: Well, it IS the *UK* national archive. Also, I think you'll find it's the *English* language. Even as mangled by my colonial friends.

      Sarcastic Answer: I maybe thought you had a point until you said the Font matters for... Poems. Really, the font matters for poetry? I think you're kind of missing the point of poems (to make angsty teenagers feel arty, at length, of course...)

      Reasoned Answer: Hmm, you're kind of proving my point. You're trying to find an electronic answer to every single problem before you even get started. It's the wrong approach. You stay as simple as possible and complicate as required. Basically, text can be captured as ASCII or UTF or whatever (it's not that important, as long as it's simple), and anything else can be scanned in the simplest possible BITMAP format. Voila, everything covered.

      I'm kinda posted out now, but remember, the primary goal for this kind of archive is to MAKE IT LAST A LONG TIME IN AN EASILY UNDERSTOOD OR TRANSLATED MEDIUM. Everything else needs to be subservient to that. Really.

      It's a bit different for company archives. The goals there aren't for true longevity in the national interest, just long enough to last till you shred it when the fraud office comes calling...

      --
      'Speak softly and carry a beagle'
    10. Re:How about some *helpful* suggestions by davecb · · Score: 2, Interesting

      We fought a lot with this at Siemens (Sietec) about fifteen years ago, when trying to decide what format to use on stackers full of 12" WORM disks, which were just nicely becoming useful for large-scale archival storage in those days. We needed a format that would outlast the disks, which probably meant 50-100 years assuming normal replacement/turnover.

      We ended up with the bottom level being a WORM standard, which was served out to users via the NFS standard, which was reasonably close to a Unix filesystem, and was usable by Windows clients, and finally we stored the data in quite simple random files with tables of contents, so we could handle multi-page documents.

      In practice we found the data we were storing was almost always images, as that was what businesses wanted to store: scanned images of legal, business and medical documents. As the parent suggested, we used as simple a format as possible, but no simpler (;-))

      For text documents, I recollect we did support some commercial formats, but only ones for which we knew the full specification and had a translator in source form. Our own data was mostly LaTeX, the typesetting language, expressed as ascii characters, and occasionally postscript or pdf, ditto.

      --dave

      --
      davecb@spamcop.net
    11. Re:How about some *helpful* suggestions by Blo · · Score: 1

      Unfortunately (like the Asian example you provide) this approach stops being useful very quickly. Images are also very important, and cannot be easily transformed to a 'simple' format, although you could argue PNGs could be used. However, in the National Archives I currently work at (as a code monkey), a high importance is attached to the look and feel of the original, i.e. they want to preserve the original layout whenever they can. This makes sense, because migration, especially for things like Word documents and databases has a terrible track record over a longer period of time; think 50 - 100 years here. Two or three migrations (and yes, even though using a simple ASCII format seems a solution now, people in 20 years will want something different) on and you've lost important information in the document. One approach we're looking at right now is emulation (and we've got a first version going: http://dioscuri.sourceforge.net/), allowing old software (and documents) to be run on any platform in the future. Of course, this brings along new problems such as storage, updating the emulator, etc.

    12. Re:How about some *helpful* suggestions by fritsd · · Score: 1

      Well if you unzip ODF it's ascii (well, UTF-8). So then all you need is an uncompressed source archive of the unzip program (and documentation of the ZIP file format for re-implementation in a newer computer language than C). ZIP file format is cited in the ODF specification but IMHO it should be described on paper in a well-archived journal :-) Have you never tried to unzip a .odt file? (tip: make a subdirectory for it first). It's nice for .odp presentations, because all the images are in a separate subdirectory. You'd need xsltproc and a small xsl stylesheet to get rid of the forest of tags though.
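
      As a rough sketch of the same idea without xsltproc, Python's standard library can open the ZIP container and strip the tags from content.xml. The function names and the tiny in-memory sample document are hypothetical; a real .odt also carries styles.xml, meta.xml and a manifest, which this ignores.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespaces defined by the OpenDocument specification.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

def odt_paragraphs(odt_file):
    """Return the plain text of each text:p paragraph in an ODF text document."""
    with zipfile.ZipFile(odt_file) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    return ["".join(para.itertext()) for para in root.iter("{%s}p" % TEXT_NS)]

def make_sample_odt():
    """Build a stripped-down stand-in for an .odt file in memory (illustration only)."""
    content = (
        '<office:document-content xmlns:office="%s" xmlns:text="%s">'
        "<office:body><office:text>"
        "<text:p>Hello <text:span>archive</text:span></text:p>"
        "<text:p>Second paragraph</text:p>"
        "</office:text></office:body>"
        "</office:document-content>" % (OFFICE_NS, TEXT_NS)
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("content.xml", content)
    buffer.seek(0)
    return buffer

print(odt_paragraphs(make_sample_odt()))
```

      Passing a path to a real .odt file instead of the in-memory sample should work the same way, since zipfile accepts either.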

      --
      To be, or not to be: isn't that quite logical, Slashdot Beta?
  21. Microsoft lecturing about open standards?!? by daveewart · · Score: 1

    I just can't believe that Microsoft think they can get away with lecturing others about open standards.

    --
    "If you think the problem is bad now, just wait until we've solved it." --- Arthur Kasspe
  22. You can almost hear the slime dripping. by Colin+Smith · · Score: 1

    Microsoft's UK head Gordon Frazer warned of a looming "digital dark age"

    A dark age caused by... Microsoft... Actually they're just doing what comes naturally.

    The real problem seems to be the credulous morons in charge of the National Archives project.

    --
    Deleted
  23. WTF? by Anonymous Coward · · Score: 0

    simple solution - DON'T UPGRADE THE MACHINES - just keep all the old computers and associated stuff for looking at the archives!
    If they have reasonable hardware then it should last a long time.
    Call me a luddite, but if it ain't broke don't fix it.

    1. Re:WTF? by SanityInAnarchy · · Score: 1

      And what do you do when it does break?

      (Not if. When.)

      --
      Don't thank God, thank a doctor!
  24. It's not just about the software... by WIAKywbfatw · · Score: 2, Interesting

    It's not just about the software. It's the hardware, too.

    I'm sure that most of the archive data created today is stored on something like DVDs but, as recently as the early 1990s, the official long-term storage medium for the UK government was Syquest 44MB removable cartridge hard drives.

    I know that I have a working 44MB drive (well, when I last fired it up, which would have been sometime last decade) somewhere in my attic but I doubt that too many of these drives are still in existence.

    I only hope that the data that was once stored on thousands of these was successfully transferred to a more readily accessible storage format and that that new format is just as durable - media these days just seems to disintegrate after a few years.

    --

    "Accept that some days you are the pigeon, and some days you are the statue." - David Brent, Wernham Hogg
    1. Re:It's not just about the software... by iago-vL · · Score: 1
      Wouldn't it be awesome if you ended up with the only Syquest drive in existence, and the government wanted to recover the data? You could rule the world!

      Now pardon me while I go out and start destroying the rest of the Syquest drives. But when you're leader of the world, I call being vice president!

    2. Re:It's not just about the software... by Torvaun · · Score: 1

      Or, you could end up having the drive stolen from you. What do you think is more likely?

      Wrong! It was actually secret option C: accidentally drop the last Syquest drive, thus dooming civilization!

      --
      I see your informative link, and raise you a pithy comment.
  25. I've never understood this arguement... by ferrellcat · · Score: 2, Informative
    ...this argument that files and data will one day just magically become inaccessible in the future. I have tape and diskette media for my Commodore Pet machines that goes back to 1978. That's 29 years ago, and guess what? The great majority of this media is STILL READABLE. Furthermore, the tools necessary to transfer any of my old media to modern PCs have been around for well over a decade. Once you have the data on a modern PC the rest can be handled with emulation or virtualization. For someone to complain...

    "If you stored something on a floppy disc just three or four years ago, you'd have a hard time finding a modern computer capable of opening it"
    ...when I can easily run programs I wrote in the SEVENTIES is pure nonsense.
    1. Re:I've never understood this arguement... by FreudianNightmare · · Score: 1

      Give it a hundred years. Yes, their hyperbole was a bit extreme, but in the long run... Fancy having a go at translating an old wax cylinder with just your modern PC? (No cheating and whipping out that wax cylinder player you have sitting round in the attic.) And also, that said, I went to transfer some stuff from a floppy at work today... ulp, my PC (standard company issue) had no floppy drive.

      --
      'Speak softly and carry a beagle'
    2. Re:I've never understood this arguement... by Colin+Smith · · Score: 1

      Yes. Can I just point out that you are still alive and clearly have all of the equipment you used to have in the seventies... Though I suggest you try the tapes, I suspect they'll be gone.

      --
      Deleted
    3. Re:I've never understood this arguement... by 0123456 · · Score: 1

      The other problem is that getting the data off the tape is the easy part; what do you do with those ZX Spectrum word processor files that you recovered from tape?

      I got some old files from the early 90s off a backup CD recently. Word wouldn't open the Word files, so I had to install an old copy of Word from the Windows 3.0 days and save them out in a format that Open Office could import, and nothing I found could import the Ventura Publisher files. OK, those are mostly in ASCII text, but they're ASCII with formatting and separate style information so, while they could be recovered from that, doing so would be a long and complicated task if I had thousands of them.

    4. Re:I've never understood this arguement... by jimicus · · Score: 1

      Data stored on audio tape, yes?

      Well, let's see:

      1. Good luck getting a cassette player. They exist, but they're getting hard to find now. Won't be long before the only real source is eBay and car boot sales.
      2. Good luck turning the output into something clean enough to read. Analogue tape degrades rapidly, particularly when you're trying to encode data at any sensible resolution. That's why it was only really used in small home microcomputers, and why even then it only lasted a few years.

  26. IBM by ushering05401 · · Score: 2, Informative

    If you are going to choose a proprietary vendor to safeguard your data, wouldn't IBM be the obvious choice? They have proven their ability to keep 20 year old programs running in modern environments without modification.

    It has been a while since I worked on an AS/400 system... so anyone with updated info please feel free to correct me if things have changed.

    It seems like a no-brainer.

    Link: http://en.wikipedia.org/wiki/AS/400

    1. Re:IBM by Anonymous Coward · · Score: 0

      Yeah well, then Microsoft would be a better choice, no?

    2. Re:IBM by Feyr · · Score: 1

      i have a customer who's been told by ibm, with a 2 weeks notice, that they'd have to change their whole network because the firewall module for their as/400 (or something to that effect) would not run after applying the patch, and they had no plan to make it work

      so much for 100% compatibility

    3. Re:IBM by LiquidCoooled · · Score: 2, Informative

      Actually, MS have done quite well with backwards compatibility.

      I can still double click on .com executable files written well back in the mists of time and run usable programs.

      For example, here is a version of Visicalc from 1981!

      http://www.bricklin.com/history/vcexecutable.htm

      --
      liqbase :: faster than paper
    4. Re:IBM by gerrysteele · · Score: 1

      Works perfectly well in DOSbox on MS's sworn enemy platform.

      How is it such an achievement on a platform of their own design?

    5. Re:IBM by TheRaven64 · · Score: 2, Insightful

      I have run that same version of Visicalc, in DOSBox, on a PowerPC Mac. Actually, I've run a few programs in that environment that don't run on Windows without the aid of DOSBox. To me, this says that third parties are better than Microsoft themselves for backwards compatibility with Microsoft programs. I wonder how long it will be before WINE has better support for old Windows apps. I think this is already the case for a few win16 programs...

      --
      I am TheRaven on Soylent News
    6. Re:IBM by toddestan · · Score: 1

      I have run that same version of Visicalc, in DOSBox, on a PowerPC Mac. Actually, I've run a few programs in that environment that don't run on Windows without the aid of DOSBox. To me, this says that third parties are better than Microsoft themselves for backwards compatibility with Microsoft programs.

      That's not backwards compatibility, that's just emulation. I can run C64 programs on Windows too using emulation, but it would be wrong to say Windows is backwards compatible with the C64. Windows is pretty good at backwards compatibility, and a surprisingly large amount of old DOS/Windows stuff will run on it. Though Windows would likely be better off if Microsoft decided to forget about all that legacy stuff, and instead had a compatibility layer like "Classic" in OSX to run that stuff.

    7. Re:IBM by ozmanjusri · · Score: 1
      Windows is pretty good at backwards compatibility, and a surprisingly large amount of old DOS/Windows stuff will run on it.

      That's not backwards compatibility, that's just emulation.

      There is no DOS in Windows XP. What is called the "command prompt" is not really DOS ... it can be thought of as more of a simulation of DOS.
      --
      "I've got more toys than Teruhisa Kitahara."
    8. Re:IBM by sasdrtx · · Score: 1

      Try 40 years. On z/OS, you can easily run any valid program compiled in the mid-60s (for System/360). Except it will usually run faster on a modern machine (or on a simulated system on a PC). There's undoubtedly some code deep in the nucleus (kernel) that hasn't really changed since then.

      Data created in the 60s is also still easily accessible, although it would need to be copied from old to new media from time to time. But even the media has a pretty long life.

      --
      Most people don't even think inside the box.
    9. Re:IBM by Ravnen · · Score: 1
      There are two command prompts in Windows: CMD.EXE, which is the 32-bit (or 64-bit) Windows command prompt, and COMMAND.COM, which is the 16-bit MS-DOS command prompt, and runs via an emulation subsystem on 32-bit (but does not run on 64-bit) Windows.

      The key reason the (32-bit) Windows support for DOS is different from something like DOSBox is simply that it is integrated into the rest of the system. MS-DOS applications see any of the system drives (NTFS, SMB, etc.) as DOS drives, 16-bit Windows applications can create windows and such that are fully integrated into the desktop (because the 16-bit calls are actually translated and passed to the 32-bit Windows API), etc.

      On either Windows or Linux, one can run the overwhelming majority of historical software through various emulators, virtual machines and so on. However, these are far from seamless, and if one frequently uses such legacy software, it can become a nuisance. The subsystem model in Windows, which is used not only for MS-DOS emulation but also for Unix emulation, and formerly for limited OS/2 1.x emulation, is far more seamless.

      The Unix subsystem in Windows would actually be brilliant if Microsoft put more effort into it. It could easily offer an alternative to Linux, but since Microsoft only offer it on the more expensive versions of Windows and appear to allocate very little in the way of resources to it, its usefulness is unfortunately limited. I still prefer it to a virtual machine in most cases, but the quality of the ports to it (e.g. of the GNU development tools and the OpenBSD command-line tools) is quite poor.

    10. Re:IBM by terrywin · · Score: 1

      Note: The firewall module (written by IBM) has been outdated for years and IBM told everyone this way back when...if this shop is still running it, they should have shifted to an alternative by now as was previously recommended by Big Blue (they have had ample time to do so...).

      As far as actual user/business programs are concerned, the parent is correct...code written over 25 years ago on earlier versions of this platform (S/38, S/36) will still run today. Not to mention the fact that IBM changed from a CISC to RISC CPU architecture years ago and in most businesses not one program had to be recompiled!

      It truly is a remarkable platform :)

    11. Re:IBM by Feyr · · Score: 1

      they're not running it anymore :) it's been at least 4 years (probably more) since this incident, but the point was that not everything written by IBM is as easily portable as the OP wanted you to believe

      though it might still very well be a remarkable platform ;)

  27. Doesn't matter. by khasim · · Score: 1

    It's not an archive of files in a single format, it's an archive of files in general, many formats, depending on which format the file was originally in.

    And being a government, these files are INCREDIBLY important.

    Why haven't they been converted? Really, all their DIGITAL archives should be in a single format by now.

    The system wasn't thought up any more than a library thinks up all the books it contains.

    All those books are in a single format. And paper records can last a LOT longer than digital records. They still have the original Constitution and that's more than 200 years old.

    They've found papyrus records that were 2,000 years old.

    It looks like paper is a better choice for keeping records than digital formats.
    1. Re:Doesn't matter. by Anonymous Coward · · Score: 0

      It looks like paper is a better choice for keeping records than digital formats.

      Fine, then you get to be the schmuck who has to organize, sort, label and store about 1/2 a petabyte of information on paper.

      While you're at it, you can do double-duty as the information "search-and-retrieval" system for the database...

      -AC
    2. Re:Doesn't matter. by dimeglio · · Score: 1

      I agree. My biggest concern is also the language and the medium rather than the file format.

      Whatever is worth keeping for a long time should be on paper and translated in more than one language. Who knows, in 700-800 years, the English we know today will likely no longer be the same. I can barely read Shakespeare and that's only 500 years old and not too technical.

      I doubt that the now common CD/DVD/Blu-ray/HD-DVD will be available in a few centuries.

      Then again, maybe no one will likely care about this old stuff anyways. We tend to like to give ourselves a lot more importance than we probably deserve.

      --
      Views expressed do not necessarily reflect those of the author.
    3. Re:Doesn't matter. by c1ay · · Score: 1

      And being a government, these files are INCREDIBLY important. Why haven't they been converted? Really, all their DIGITAL archives should be in a single format by now.

      Yeah, something like ascii. How long will it be before vi is no longer available?

      --

    4. Re:Doesn't matter. by bheer · · Score: 2, Insightful

      Whatever is worth keeping for a long time should be on paper and translated in more than one language.
      Er, even if you translate it into other languages, they'll evolve too. Try reading Old French much? And translation also leaves you with the headache of reconciling various translations and figuring out which is "more correct" (IIRC the Bible has this problem). It would be a much better idea to make redundant copies, to guard against bitrot and store them as physically apart as possible.
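
      One way to make the "redundant copies against bitrot" idea concrete is to compare a checksum of each copy and flag the odd one out. This is only a sketch: the site names and the function find_rotten_copies are invented, and the majority vote assumes at least three copies with most of them still intact.

```python
import hashlib
from collections import Counter

def sha256_hex(data):
    """Hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()

def find_rotten_copies(copies):
    """Given {location: bytes}, return the locations whose content differs from the majority."""
    digests = {location: sha256_hex(data) for location, data in copies.items()}
    majority_digest, _count = Counter(digests.values()).most_common(1)[0]
    return sorted(loc for loc, digest in digests.items() if digest != majority_digest)

# Hypothetical example: three copies held far apart, one with a flipped byte.
copies = {
    "site_a": b"Minutes of the 1972 committee",
    "site_b": b"Minutes of the 1972 committee",
    "site_c": b"Minutes of the 1972 committeE",
}
print(find_rotten_copies(copies))
```

      A damaged copy found this way can be rewritten from any copy that matches the majority digest.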

      I doubt that the now common CD/DVD/Blu-ray/HD-DVD will be available in a few centuries.
      And it won't matter. The important stuff would be migrated to archival formats. For example, I keep a copy of DOS and Win3.1 ISOs (about 20MB total) and Norton Commander (3 floppy images!) on a DVDR, along with a copy of Virtual PC. This lets me recreate a Windows 3.1 virtual PC anytime I want. I wouldn't be surprised if I were copying DVDR ISOs to a holographic memory drive in the next ten years.

      As for the next century, most of this material will lose value, but the important stuff will get backed up professionally and successively remastered on new media (esp with things like the UK National Archive). And amateur historians, genealogy buffs and private collectors will have their hands full in the future with stuff that you can't find in the official archives but in people's attics, just like people are fascinated with Stone Age, Roman or Victorian artifacts today.

    5. Re:Doesn't matter. by TrickyRick · · Score: 1

      If you read the article and watch the video... For some reason they are avoiding converting and trying to keep things in their original format.

    6. Re:Doesn't matter. by Bazzargh · · Score: 5, Interesting

      And being a government, these files are INCREDIBLY important.

      Why haven't they been converted? Really, all their DIGITAL archives should be in a single format by now.


      No, they shouldn't. You usually want 3 formats:
      - the original format of the document. Whatever whichever idiot happened to write (or record, or video) it in, you absolutely want the original in your records.
      - a searchable format (eg OCR'd text from scanned image docs)
      - a rendered format. (eg an image or pdf, or svg - something open enough that you can continue to show how the doc would have looked). The appropriate rendered format varies. Paper is not an appropriate format for storing CCTV footage, for example ;)
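
      A hedged sketch of how those three representations might hang together in a catalogue entry. The class and field names are invented for illustration and do not describe any real archive's schema.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ArchivedRecord:
    """One archived item: the original is mandatory, the derived forms are optional."""
    original: bytes                         # the file exactly as received
    original_format: str                    # e.g. a MIME type or format registry ID
    searchable_text: Optional[str] = None   # e.g. OCR output for a scanned image
    rendered: Optional[bytes] = None        # e.g. a PDF or image showing how it looked
    rendered_format: Optional[str] = None

    def missing_representations(self) -> List[str]:
        """List which derived representations still need to be produced."""
        missing = []
        if self.searchable_text is None:
            missing.append("searchable")
        if self.rendered is None:
            missing.append("rendered")
        return missing

# Hypothetical usage: a freshly scanned letter with no OCR or rendering yet.
scan = ArchivedRecord(original=b"<tiff bytes>", original_format="image/tiff")
print(scan.missing_representations())
```

      Keeping the original mandatory and the derived forms optional means the searchable and rendered versions can be regenerated later with better tools.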

      If you're very, very lucky the original is both searchable and viewable; like, say, HTML. It gets more complicated too, because you often want to store a redacted copy of the document (think of the Onion story 'CIA realise they've been using black highlighter pen all these years') and you want that searchable too, so you have to keep a redacted searchable format too... and of course, some of the records are on actual paper. Have you started worrying about the fading inks in the originals yet?

      BTW you can't restrict the format of the original. Consider an email from a corporate bidding for a govt contract, with attachments. They need to keep those.

      - Mr. E

      PS, posting anon because I have dealings with the national archives, and don't want to speak for my company.

    7. Re:Doesn't matter. by Bazzargh · · Score: 2, Insightful

      Hum now. completely failed to tick the posting anon box :) good job I held back from expressing opinions in there.

    8. Re:Doesn't matter. by eat+here_get+gas · · Score: 0

      [quote]...I doubt that the now common CD/DVD/Blu-ray/HD-DVD will be available in a few centuries....[quote]

      Centuries? Hell, look at Betamax..30 years?

      --
      the significance of a signature is insignificant
    9. Re:Doesn't matter. by Anonymous Coward · · Score: 1, Funny

      Good luck with the 'on-line activation' then...

    10. Re:Doesn't matter. by Corporate+Troll · · Score: 3, Interesting

      For example, I keep a copy of DOS and Win3.1 ISOs (about 20MB total) and Norton Commander (3 floppy images!) on a DVDR, along with a copy of Virtual PC. This lets me recreate a Windows 3.1 virtual PC anytime I want.

      Now.... You can do that now. However, in 100 years, will this be possible? You do not know what the future brings. Let's not even talk about 1000 years and beyond. Now; you backed this stuff up on a DVD and you die tomorrow. Your kids keep the data, and when they die a historian specialising in the 20th century wants to analyse the daily life of a 20th century person. VMWare is long dead, you backed it up... Sure, but his platform can't run it. We're at least 10 operating system versions later, and they run on a new platform. x86 is long forgotten and they moved to quantum computers.

      Perhaps the guy is lucky, and can run an emulator in an emulator in an emulator in an emulator what you backed up. Perhaps....

      I have to this day zip files containing Wordperfect 5.1 files of the letters with a girl I was penpal with (and to whom I ultimately lost my virginity, but that is another story). Those letters, documenting life in the mid eighties to mid nineties might be interesting to a historian someday. (Historians love the daily lives of long dead people). Will they be able to read it, in 100 years? I don't know, especially in a proprietary format like Wordperfect.

    11. Re:Doesn't matter. by yoyoq · · Score: 1

      I've already forgotten who you are and what you said.

    12. Re:Doesn't matter. by bheer · · Score: 1

      Well, the way the UK national archives are structured, anyone (in the UK, I guess) can forward them their letters to the archive and they'll take care of the backups (if you trust them to do it right). Even otherwise, it won't be hard to buy/download a WP 5.1 filter and convert your letters to plain text and then store that.

      > VMWare is long dead, you backed it up... Sure, but his platform can't run it.

      Recreating old platforms is trivial -- a modern 32 bit processor can recreate 16-bit environments without breaking a sweat. Think of emulators for old games, like MAME.

      Also, maybe they won't have to rely on emulation. Just like present-day archaeologists get specialized equipment for their digs, I'm sure a historian from 3007 who really wants to read your letters in the original Wordperfect software will be able to recreate a (for him, primitive) PC. Since the PC platform is pretty well documented in manuals and source for FreeDOS is available (if Microsoft doesn't release the source for MSDOS by then), the only bottleneck is making sure your letters are stored in a future-proof format. Maybe someone can start an internet business offering laser etching on stone tablets :-)

    13. Re:Doesn't matter. by Anonymous Coward · · Score: 0

      think of the Onion story 'CIA realise they've been using black highlighter pen all these years'
      A little help? I can't think of it...
    14. Re:Doesn't matter. by ozmanjusri · · Score: 2, Interesting
      Whoever modded this "Funny" is wrong. It should be insightful.

      My copy of Office XP won't activate on any of the computers I currently own (the hardware it was originally activated on is long-dead), and that's only 5 years old.

      --
      "I've got more toys than Teruhisa Kitahara."
    15. Re:Doesn't matter. by Znork · · Score: 1

      "It looks like paper is a better choice for keeping records than digital formats."

      Not quite the same problem; there are two factors at work here. The first is the medium on which the data is stored, which has to last.

      The second is the encoding of the data. Had they stored the constitution in Word format they could have written it in stone and still have the same trouble reading it.

      Of course, our ancestors thankfully weren't quite as ... challenged... as some members of the current generations.

      Take a clue from the forefathers, store your data in ASCII. Or Unicode.

      Personally I always keep original data in a pure text format. Typeset it afterwards if you wish a nice presentation, otherwise typesetting just gets in the way of both the writing and the reading.

    16. Re:Doesn't matter. by Corporate+Troll · · Score: 1

      Recreating old platforms is trivial

      ...

      I'm sure a historian from 3007 who really wants to read your letters in the original Wordperfect software will be able to recreate a (for him, primitive) PC.

      You are grossly overestimating the capabilities of future archaeologists. You assume that they will have the documentation of old formats, that are stored on old media, that can be read with old computers. Recreate a PC? Okay, no problem... However, they first have to know that the shiny disk they found is a CD, to be read with a 780 nm laser, and that the bits on it are a filesystem called ISO9660 that describe files, each of these files might be anything. In my case they are zip files, so now they have to know the decompression algorithm, inside they find more files, these files are Wordperfect format, which -of course- is not documented. (I didn't even start about endianness of the bits, and other technical subtleties)
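
      To illustrate how much prior knowledge each of those layers needs, here is a small sketch that recognises a blob of bytes by its leading "magic number". The table and the function identify are invented for this example; the signatures are the commonly documented ones, and knowing a file's type is still a long way from being able to decode it.

```python
# Each entry: (byte offset, signature bytes, human-readable name).
SIGNATURES = [
    (32769, b"CD001", "ISO 9660 disc image"),   # volume descriptor in sector 16
    (0, b"PK\x03\x04", "ZIP archive"),          # local file header signature
    (0, b"\xffWPC", "WordPerfect document"),    # WordPerfect 5.x and later
    (0, b"%PDF", "PDF document"),
]

def identify(data):
    """Return a format name for the given bytes, or 'unknown' if no signature matches."""
    for offset, magic, name in SIGNATURES:
        if data[offset:offset + len(magic)] == magic:
            return name
    return "unknown"

# Hypothetical inputs standing in for each layer of the example above.
print(identify(b"\x00" * 32769 + b"CD001"))   # the disc image
print(identify(b"PK\x03\x04rest-of-zip"))     # a zip file found on it
print(identify(b"\xffWPC" + b"\x00" * 12))    # a letter inside the zip
print(identify(b"scrap of plain text"))       # nothing recognised
```

      Every row in that table is knowledge that has to survive alongside the data, which is the point being made about points of failure.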

      So many points of failure....

      This is entirely different from finding a scrap of paper with writings on it.... You assumed in your recreation statement, that our civilization would prevail. Nothing, but nothing guarantees that! A national archive might survive, partially or in totality even if our civilization disappears. Note that I say: *might*.... but at least if it does, the paper documents will be readable.

    17. Re:Doesn't matter. by Steve001 · · Score: 1

      Znork wrote:

      "It looks like paper is a better choice for keeping records than digital formats."

      Not quite the same problem; there are two factors at work here. The first is the medium on which the data is stored, which has to last.

      The second is the encoding of the data. Had they stored the constitution in Word format they could have written it in stone and still have the same trouble reading it.

      True. I see the largest problem with the MS Word format is not that it is proprietary, but that it is not (as far as I know) publicly documented. This makes it difficult for anyone, other than Microsoft, to accurately render a document in the MS Word format. This is not a problem with OpenDocument formats.

      Of course, our ancestors thankfully weren't quite as ... challenged... as some members of the current generations.

      Take a clue from the forefathers, store your data in ASCII. Or Unicode.

      Personally I always keep original data in a pure text format. Typeset it afterwards if you wish a nice presentation, otherwise typesetting just gets in the way of both the writing and the reading.

      I agree about storing your data in various formats. Unless the text is intended for a specific use in a specific medium, it is best to keep it in a form that can easily be formatted for various uses.

      I save my word processing documents in RTF for this reason. It gives me the basic formatting I need, and I can take the same document and use it in my word processor, e-book reader, and PDA without having to reformat it for each use. Plus, it's easy enough to strip out the text of the document by hand if that becomes necessary.

    18. Re:Doesn't matter. by Anonymous Coward · · Score: 0

      Er, even if you translate it into other languages, they'll evolve too. Try reading Old French much?
      Actually, I found it more understandable than a lot of what passes for English these days, but I think that's a problem with my own vocabulary.
    19. Re:Doesn't matter. by Pope · · Score: 1

      Those letters, documenting life in the mid eighties to mid nineties might be interesting to a historian someday. (Historians love the daily lives of long dead people). Will they be able to read it, in 100 years? I don't know, especially in a proprietary format like WordPerfect.

      So print them out? If you're going to keep something for posterity, keep it in a format that's still accessible.
      --
      It doesn't mean much now, it's built for the future.
    20. Re:Doesn't matter. by richlv · · Score: 1

      BTW you can't restrict the format of the original. Consider an email from a corporate bidding for a govt contract, with attachments.

      as an archive, probably not. but as the government, that should be a requirement for any information exchange to be only in completely documented, patent unencumbered and whatnot document formats - well, the correct definition of open document formats would help here.

      of course, it's not a thing you do on a single day, but it should have been started a long time ago, to allow transitioning to these formats.
      --
      Rich
  28. solution by r00t · · Score: 1

    You need to run the original software in an emulator, OS and all.

    That emulator itself needs to be Open Source so that you can port it to future platforms. Otherwise, you'd be faced with running an emulator in an emulator in an emulator in an emulator in an emulator...

    Keeping around multiple conversions certainly doesn't hurt. Converters vary in quality and the resulting conversions will themselves vary in future compatibility.

    1. Re:solution by Anonymous Coward · · Score: 0

      Hmmm, an emulator isn't a solution, but at best a desperate hack.

      How do you make a search engine that can search in those formats only accessible from software running in an emulator?

      Conversions are a must.

  29. OOXML is not a solution by argent · · Score: 1

    OK, the deal is this. Let's say you have a bunch of files in some old format, and a spec for that format, and you need some information out of those files. That spec will be useful to you if - and only if - the cost of implementing that format from the spec is less than the cost of losing those files, AND it's less than the cost of reverse-engineering enough of the format to extract the information you want from the files.

    The OOXML spec is huge (expensive to implement from the spec) and complex, and the meaning of many components can't really be determined without looking at the way Office behaves (so it's incomplete, thus implementing a reader for it will require a fair amount of trial and error). Reverse-engineering Office's format may be much easier, depending on what information you're looking for... just extracting the text strings from a Word document has often been MY preferred method of reverse-engineering it...

    Which means that OOXML is a poor archival format... unless you want to lock people who want to use the archives in the meantime into using Microsoft Office to read them.

    1. Re:OOXML is not a solution by DaleGlass · · Score: 1
      Bizarre.

      OOXML is XML -- if you want to extract plain text from it just feed it through a XML parser and strip all the tags. You can do something similar with Office's format, but the solution will be far less perfect and contain lots of junk.

      In fact, I just tried that. One of my .doc files filtered through strings is unreadable. There are newlines at weird points in the output, some text is outright missing (I imagine that because internally .doc is at least part memory dump, and so the text inside isn't necessarily stored in order), and there's random junk like this in various places. Here you have an excerpt:

      Documentaci
      n, apoyo, formaci
      n y actualizaciones.
      Copia de seguridad.
      Compatibilidad y enlaces.
      FALTA ALGO
      PAGE
      PAGE
      &`#$
      Xkf#
      EXkf#
      TEMA 5 - APLICACIONES DE PROP
      SITO GENERAL Y ESPEC
      FICO
      galmarro
      Normal
      sanleged
      Microsoft Word 9.0


      Here's what it took to get a very readable plaintext out of an OOXML file:

      perl -MXML::Parser -e '$p = new XML::Parser(Handlers => {Char => sub { print $_[1]; }}); $p->parsefile("content.xml");'
      It's pretty much the data in plaintext. Unlike what results from .doc it's actually readable, and could be printed with minimal and very easily automatable reformatting. In fact, looking at OOXML, the document itself is separated from various misc stuff like settings, so parsing content.xml you get just the content.
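
      For comparison, the same tag-stripping idea as the Perl one-liner above can be sketched in Python with the standard library. The sample markup and the function name `xml_to_text` are made up for illustration; a real content.xml would be read from disk and uses its own element names:

```python
import xml.etree.ElementTree as ET


def xml_to_text(xml_source: str) -> str:
    """Return only the character data of an XML document, with all tags stripped."""
    root = ET.fromstring(xml_source)
    # itertext() walks the tree in document order and yields every text node.
    return "".join(root.itertext())


# Hypothetical two-paragraph document standing in for a real content.xml.
sample = "<doc><p>Copia de seguridad.</p><p>Compatibilidad y enlaces.</p></doc>"
print(xml_to_text(sample))
```

      As with the Perl version, this keeps the text and throws away everything else: layout, styles and structure are gone.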
    2. Re:OOXML is not a solution by DaleGlass · · Score: 1

      Ok, just finally realized that OOXML is the MS format and not the Open Office one. Seems like I should get some sleep, heh.

  30. Comment removed by account_deleted · · Score: 4, Funny

    Comment removed based on user account deletion

  31. Use TeX by user1003 · · Score: 2, Interesting

    I wanted to design something that would be still usable in 100 years. (Donald E. Knuth, more than 20 years ago)

    Also, LaTeX will get you nicer documents than any WYSIWYG word processor in less time (once you know it ..). Oh and smaller filesize, too.

  32. 1/2 pentabyte = 20 bits? by benhocking · · Score: 5, Funny

    Fine, then you get to be the schmuck who has to organize, sort, label and store about 1/2 a pentabyte of information on paper.

    A pentabyte is 5 bytes, right? How hard is it to store 20 bits on paper? ;)

    (I assume petabyte (10^15 or 2^50, depending on convention) is the word you're looking for.)

    --
    Ben Hocking
    Need a professional organizer?
    1. Re:1/2 pentabyte = 20 bits? by uzytkownik · · Score: 1

      > (I assume petabyte (10^15 or 2^50, depending on convention) is the word you're looking for.)

      Peta(P) is SI prefix meaning 10^15
      Correct prefix for 2^50 is pebi (Pi)
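
      The gap between the two conventions, and the "pentabyte" arithmetic from the joke above, can be checked with a few lines of Python (a quick sketch; the constant names are just for illustration):

```python
PETABYTE = 10**15   # SI prefix peta (P)
PEBIBYTE = 2**50    # binary prefix pebi (Pi)

# How far apart the two conventions are.
difference = PEBIBYTE - PETABYTE
percent_larger = (PEBIBYTE / PETABYTE - 1) * 100
print(f"{PEBIBYTE:,} - {PETABYTE:,} = {difference:,} bytes ({percent_larger:.1f}% larger)")

# The joke: a "pentabyte" read literally as five bytes.
BITS_PER_BYTE = 8
pentabyte_bits = 5 * BITS_PER_BYTE
half_pentabyte_bits = pentabyte_bits // 2
print(pentabyte_bits, half_pentabyte_bits)
```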

      --
      I've probably left my head... somewhere. Please wait untill I find it.
      Homepage: http://blog.piechotka.com.pl/
    2. Re:1/2 pentabyte = 20 bits? by glitch23 · · Score: 0

      A pentabyte is 5 bytes, right? How hard is it to store 20 bits on paper? ;)

      20 bits probably isn't hard to get stored on paper but when you consider 40 bits (which is what 5 bytes would really be) maybe that's when you start running into problems. Maybe use both sides of the paper maybe just like both sides of a hard disk platter are used?

      --
      this nation, under God, shall have a new birth of freedom. -- Lincoln, Gettysburg Address
    3. Re:1/2 pentabyte = 20 bits? by benhocking · · Score: 1

      20 bits probably isn't hard to get stored on paper but when you consider 40 bits (which is what 5 bytes would really be) maybe that's when you start running into problems.

      Right, 1 pentabyte = 40 bits, which is why I stated that 1/2 pentabyte (taken from the GGP post, and in the subject heading) is 20 bits. 1/2 of 40 is 20. ;)

      --
      Ben Hocking
      Need a professional organizer?
    4. Re:1/2 pentabyte = 20 bits? by glitch23 · · Score: 0

      Correct you are.

      --
      this nation, under God, shall have a new birth of freedom. -- Lincoln, Gettysburg Address
  33. Overestimate? by benhocking · · Score: 1

    You're suggesting the National Archives have the resources and intelligence (as in, research and know-how) of a single guy who found several 5 1/4 disks while cleaning his room.
    So, do you think that's a bit of an overestimate or an underestimate? ;)
    --
    Ben Hocking
    Need a professional organizer?
  34. Lilttle known fact... by Anonymous Coward · · Score: 0

    The book of Genesis was originally done in SGML.

    1. Re:Lilttle known fact... by Cheesey · · Score: 2, Funny

      Yes, it's true. Sadly, early transcribers of the book left out the stuff they didn't understand. In addition to a number of now-forgotten sections describing the role of evolution in the creation of life, this included the following cryptic verses:

      2:2 And on the seventh day God said :wq and then make.

      2:3 And God watched gcc running and sanctified it, because it would have taken Him at least two weeks to write the whole thing in machine code.

      --
      >north
      You're an immobile computer, remember?
    2. Re:Lilttle known fact... by Anonymous Coward · · Score: 0

      Heretic. God used Lisp.

    3. Re:Lilttle known fact... by lidden · · Score: 1

      No.

  35. called "hiring the fox to guard the henhouse"(n/t) by HiThere · · Score: 1

    I believe this is called "hiring the fox to guard the henhouse".

    --

    I think we've pushed this "anyone can grow up to be president" thing too far.
  36. Bright people don't make tech decisions by Cheesey · · Score: 4, Interesting
    The idea that an institution like the British Library, which is run by people bright enough to make you look like a dead match, would accept such a preposterous idea is insulting.

    Unfortunately, those bright people don't get to make technical decisions.

    The British Library recently introduced SED, an electronic document delivery system. With SED, you can order electronic copies of journal papers and articles from their archives. Great idea! Previously, you had to wait for the documents to come through the post, and that would take a week or so. Now you get them by email in a couple of working days.

    Except that the documents are crippled by Adobe DRM, which imposes the following restrictions:
    • You can only view them using certain specific versions of Acrobat Reader (6 or 7) - the latest version is not recommended.
    • The software only works on Windows 2000 or XP. No Linux support, no Mac support. Vista might work, but again, it's not recommended.
    • You can only look at each document for a limited time, and you can only print it once.
    So, if you want to use the service, you'd better hope that you have (a) the right version of Windows, (b) the right version of Acrobat Reader, (c) a reliable net connection, and, most importantly, (d) a very reliable printer that won't chew up the document. Unless you're a filthy dirty pirate, of course.

    If Adobe managed to convince the British Library to put up with this ridiculous system, I am sure that Microsoft will have no difficulty convincing them about their archive "solution". If SED is anything to go by, it'll be another awful implementation of a great idea.
    --
    >north
    You're an immobile computer, remember?
    1. Re:Bright people don't make tech decisions by innocent_white_lamb · · Score: 2, Interesting

      a very reliable printer that won't chew up the document. Unless you're a filthy dirty pirate, of course.
       
      What about printing it on this?

      --
      If you're a zombie and you know it, bite your friend!
    2. Re:Bright people don't make tech decisions by jabuzz · · Score: 1

      Yeah tell me about it. However you can print it once. Let's just assume that I print it to a network attached PostScript printer. Except it really is not a network attached PostScript printer but a small program running on a Linux box saying, thank you very much and saving the entire stream to a file. At which point you can fire up your favourite PostScript distiller and turn it right back into a PDF.

      Oh and by the way you can use Acrobat 8 now.

    3. Re:Bright people don't make tech decisions by Cheesey · · Score: 1

      Oh and by the way you can use Acrobat 8 now.

      Ah, a minor improvement. It did strike me as particularly incompetent that Adobe's DRM scheme did not even work with the latest version of their own product, but then DRM is all about incompatibility and frustration for legitimate users.

      --
      >north
      You're an immobile computer, remember?
    4. Re:Bright people don't make tech decisions by lordtoran · · Score: 1

      KDE already has an integrated virtual PDF printer. In addition, you can tell KPDF (KDE's PDF viewer) not to obey DRM restrictions in the first place.

      --
      Want to hear the voice of GOD? cat /boot/vmlinuz > /dev/dsp
  37. One thing I'd like to know (ODF question) by sid0 · · Score: 1

    I haven't gone through the ODF specification yet, but there's one thing I'd like to know:

    Does the ODF specification support each and every Word/Excel/Powerpoint 2007 feature?
    If not, is it extensible?
    If it is extensible, do changes have to go through some sort of committee to be incorporated? How frequently are changes incorporated? How long is the process?

    1. Re:One thing I'd like to know (ODF question) by a.d.trick · · Score: 4, Informative

      Does the ODF specification support each and every Word/Excel/Powerpoint 2007 feature?
      Thank goodness no. "Auto Space like Word 95"? That's in the OOXML spec (and there's no explanation on how Word 95 does spacing either).

      If not, is it extensible?

      Yeah, it's XML. Also, unlike OOXML, ODF uses namespaces, so you can create a separate standard if you don't want to muck around with ODF.

      If it is extensible, do changes have to go through some sort of committee to be incorporated? How frequently are changes incorporated? How long is the process?

      It would depend. The thing about changing standards is that it causes problems for all sorts of people. There is a real need for a stable and standardized document format that just doesn't change, or if it does, very slightly.

  38. depressing by bytecolor · · Score: 1

    One of the most depressing IT related articles I've read in a long time.

    --
    bytecolor
  39. Re:More surprisingly!? No, UNsurprisingly by MightyMartian · · Score: 1

    It's partly MS's fault, but also partly a whole bunch of organizations' faults. Microsoft isn't the only one with big, ugly proprietary formats. There's still a helluva lot of documents from the age of WordPerfect. The real fault lies in the fact that the old push for standards like ASCII, which was meant to overcome much of this, was ignored in the halcyon days of the personal computer, when companies, whether through dreams of lock-in or simply because they didn't give a damn, ignored decades of work that had produced ASCII, TeX and the like.

    To my mind, the very best way to amend this problem is for the archival agencies to insist that only certain formats be accepted. The applications exist to translate Word or WordPerfect documents to open file formats, and it doesn't take a rocket scientist to do it. Don't bloody well go to companies like Microsoft for solutions.

    --
    The world's burning. Moped Jesus spotted on I50. Details at 11.
  40. Such precise terms as by Anonymous Coward · · Score: 4, Informative

    "Spacing like WP6"? "Caclculate incorrect leap year like Excel"?

    Because if you want to include bugs etc, then no, it doesn't support each and every 2007 feature.
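
    The leap year quip presumably refers to the 1900 date system used by spreadsheets such as Excel, which counts a 29 February 1900 that never existed. A rough sketch of what a reader has to do to turn such serial numbers into real dates (the function name is made up; serial 1 is taken to be 1 January 1900):

```python
import calendar
from datetime import date, timedelta


def excel_serial_to_date(serial: int) -> date:
    """Convert a 1900-system spreadsheet serial number to a real calendar date."""
    if serial < 1:
        raise ValueError("serial must be >= 1")
    if serial == 60:
        # Serial 60 is the phantom 29 February 1900; it has no real date.
        raise ValueError("serial 60 is the nonexistent 29 February 1900")
    if serial < 60:
        return date(1899, 12, 31) + timedelta(days=serial)
    # From serial 61 onward every value is shifted by the phantom day.
    return date(1899, 12, 30) + timedelta(days=serial)


print(calendar.isleap(1900))       # the real calendar: 1900 is not a leap year
print(excel_serial_to_date(59))    # last day before the phantom date
print(excel_serial_to_date(61))    # first day after it
```

    A format that wants to stay compatible has to write this quirk into its spec, which is exactly the kind of thing being complained about.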

    If you mean supporting tables, nested documents, embedded graphs, scripting and so on, yes.

    It may not be "click the same buttons" feature correct nor probably the "run the same VB code" compatible.

    Take a look at some of the people on the board that devised ODF. They include the US National Archives. Print media. Archivists.

    Y'know, people who KNOW DOCUMENTS.

    As to the remainder of your questions, there is a process, it does have to go through committee (else how does everyone else know how to implement the new standard? MS doesn't have this problem since they only want themselves to know their updated standard). It is XML so it is extensible (decode the initialism). The process will take as long as it takes. Much the same as Vista will take as long as it takes to get SP 1 out.

    I don't see how these latter issues are something that is a part of ODF and not any form of standardisation that OfficeXML will have to have to go through for anyone other than MS to implement...

  41. What they need is... by ThePerfGuy · · Score: 1

    ... one good tester, an open mind and a week. There is almost always a simpler/cheaper solution than "adopt MS's newest buzzword" and testers love to find those solutions!

    --
    Scott Barber
    Chief Technologist, PerfTestPlus
    Executive Director, Association for Software Testing
  42. Let's see by Ullteppe · · Score: 1
    Step 1: When you archive a document, make a PDF of it (pretty much anything with a CPU in it will display PDFs these days, I even have homebrew for my PSP that does it). For older text-only documents, ASCII text does the same job. For graphics, make sure to keep converting data to newer file formats (ie. .pcx -> .png)

    Step 2: Make sure to copy the files to new storage media once they become widely available (ie. copy the documents from the tape onto DVDs and hard disks). Continue doing this when new holographic/whatever media become available.

    For not easily-convertible formats (databases, binary code, etc.), make sure to archive the original program, and hunt for emulators that will emulate the appropriate hardware.
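
    Step 2 only helps if the copy is actually identical to the original. A minimal sketch of a copy-and-verify routine, assuming ordinary files on mounted storage (the function names `sha256_of` and `migrate_file` are made up for illustration):

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large archive files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate_file(source: Path, destination_dir: Path) -> Path:
    """Copy a file to new storage and refuse to keep a copy that differs."""
    destination_dir.mkdir(parents=True, exist_ok=True)
    target = destination_dir / source.name
    shutil.copy2(source, target)  # copy2 also preserves timestamps where possible
    if sha256_of(source) != sha256_of(target):
        target.unlink()
        raise IOError(f"checksum mismatch copying {source}")
    return target
```

    Storing the checksum alongside the file also lets you detect silent corruption later, before the next migration.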

    1. Re:Let's see by tqk · · Score: 1

      For not easily-convertible formats (databases, binary code, etc.), make sure to [archive] the original program, and hunt for emulators that will emulate the appropriate hardware.
      What? For databases, do an export. "Binary code" means what, exactly? Executable? Get the source code. "The original program" is pretty much worthless once you can't find the machine and OS that program runs on (emulators, my a$$). Binaries are by definition not portable, including between OSs and over time. Don't store important information in formats that can't be read by multiple, open standard reading tools.

      I agree, this is one of the most depressing stories I've read in a while. None of this problem was necessary and all of it is, and was, avoidable. That they've still not yet learned their lesson and are about to sign up for another round of it is pretty sad.
      --
      "Tongue tied and twisted, just an Earth bound misfit ..." -- Pink Floyd.
    2. Re:Let's see by Ullteppe · · Score: 1
      OK, so maybe databases was a bad example. However, I can see plenty of cases where you might not be able to get the source code (preservation of old commercial games etc.). Don't discount emulators. I can run C64, Amiga, CPM and so on in a very decent fashion using emulators. This also solves the problem of obsolescent media (C64 tapes can be stored on hard disk, for example). I agree that open formats should be used whenever possible. However, I can see that there are some cases where you have to store binaries.

      I think the main lesson to learn from this is that if you are tasked with preserving information, you need to take an active stance and make sure to do conversion and storage on new media all the time. You can't just put your old tapes in a closet and expect to be able to read them in 20 years. This goes for individuals as well; make sure to transfer those old pictures and wedding videos to digital and then back up, back up, back up!

    3. Re:Let's see by jimicus · · Score: 1

      Step 1: When you archive a document, make a PDF of it (pretty much anything with a CPU in it will display PDFs these days, I even have homebrew for my PSP that does it). For older text-only documents, ASCII text does the same job. For graphics, make sure to keep converting data to newer file formats (ie. .pcx -> .png)

      Will it still do so in 20 years time?

      For all we know, PDF will have changed substantially and we'll find that the majority of readers have broken the code which renders really old versions.

    4. Re:Let's see by Ullteppe · · Score: 1

      Maybe, but at least there is a multitude of readers, which bodes well for longevity. But again, the best approach is to continue to re-save your documents in more recent versions of the format. Simply open the document in Acrobat Pro, and re-save it. Being an archivist doesn't mean you should be constantly working on your files.

    5. Re:Let's see by jimicus · · Score: 1

      Being an archivist doesn't mean you should be constantly working on your files.

      You will be if you have any serious number of files and the solution to keeping the file in a sensible format is "Open in Acrobat Pro and re-save it".

    6. Re:Let's see by Ullteppe · · Score: 1

      :-) I'm pretty sure this can be automated.

  43. Time bomb ? by Yvanhoe · · Score: 1

    Good! The word is carefully chosen. It now has a chance to be heard by politicians. Wouldn't there just be a way to link proprietary formats to Al-Qaeda ? Come on ! I'm sure we can !

    --
    The Wise adapts himself to the world. The Fool adapts the world to himself. Therefore, all progress depends on the Fool.
  44. Here in the colonies ... by IchBinEinPenguin · · Score: 1

    ... we already have a solution: http://www.naa.gov.au/recordkeeping/preservation/digital/applications.html

    The Archives' approach to digital preservation relies on converting digital records from their original format into preservation formats. Xena (XML Electronic Normalising of Archives) is the program created by the National Archives to complete these processes.
    Xena converts digital records into two preservation formats.
    * Bitstream version. This is a metadata-wrapped bitstream version of the record, which is considered a secure original copy of the record. This version contains all of the information from the original, but requires access to the original hardware, operating system and application software for performance.
    * Normalised version. This version is also wrapped in metadata. The process of normalising converts the record from its original format into eXtensible Mark-up Language (XML). The XML version is not considered to be an original copy of the record as some information may be lost during the normalisation process. However, the performance of the normalised object is the closest to the original that is currently possible. Xena is being continually improved so, over time, the performance of normalised versions is expected to more closely replicate the original.
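
    To give a feel for what a "metadata-wrapped bitstream" could look like, here is a rough sketch. It is not Xena's actual schema: the element names (`record`, `metadata`, `content`) and both function names are invented for illustration, and the original bytes are simply base64-encoded inside an XML wrapper together with a checksum.

```python
import base64
import hashlib
import xml.etree.ElementTree as ET


def wrap_bitstream(name: str, payload: bytes, original_format: str) -> str:
    """Wrap raw bytes in a small XML envelope (hypothetical layout, not Xena's)."""
    record = ET.Element("record")
    meta = ET.SubElement(record, "metadata")
    ET.SubElement(meta, "filename").text = name
    ET.SubElement(meta, "format").text = original_format
    ET.SubElement(meta, "sha256").text = hashlib.sha256(payload).hexdigest()
    content = ET.SubElement(record, "content", encoding="base64")
    content.text = base64.b64encode(payload).decode("ascii")
    return ET.tostring(record, encoding="unicode")


def unwrap_bitstream(xml_text: str) -> bytes:
    """Recover the original bytes and check them against the stored checksum."""
    record = ET.fromstring(xml_text)
    payload = base64.b64decode(record.findtext("content"))
    expected = record.findtext("metadata/sha256")
    if hashlib.sha256(payload).hexdigest() != expected:
        raise ValueError("payload does not match recorded checksum")
    return payload
```

    The wrapper preserves every bit of the original, but, as the quoted text says, reading the payload still needs the original software or a converter.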

    1. Re:Here in the colonies ... by DavidKlemke · · Score: 1

      I used to work as a Sys Admin for the guys who worked on Xena, and I can say you should be modded far up bringing XENA to our attention.

      Digital obsolescence is a big problem that all Digital Archives are looking into ever since the Domesday problems encountered a while back (google it and you'll see what I mean. The UK has been bitten by this bug before). The National Archives of Australia is really one of the pioneers in this field as they're actively seeking out file formats and making converters for them.

      I can tell you now that they have an incredible set up over there for dealing with all sorts of files. XENA is exceptionally good with pretty much anything text and it's almost there for sound and video. Even though people will scream and shout about the upcoming file format problem they seem to fail to realize that there are so many people working on this issue.

      There are many other projects that are attempting to try and solve this problem but none of them are as complete as XENA is.

    2. Re:Here in the colonies ... by Anonymous Coward · · Score: 0

      Yeah, XENA is pretty good in its 3rd implementation, although there is a lot of coding involved to properly encapsulate a document to meet your metadata requirements (and to view said metadata). I've done this, as well as integrated XENA with a 3rd party EDRMS system (for automated archiving) and it was a hard slog... but nothing a government department couldn't handle.

  45. You have *got* to be kidding me... by NerveGas · · Score: 1


    We live in an age when brand-new, undocumented, *encrypted* file formats are deciphered within days or weeks. You're telling me that in a few decades, NOBODY will be able to figure out a spreadsheet or word-processing document?

    --
    Oh, you're not stuck, you're just unable to let go of the onion rings.
    1. Re:You have *got* to be kidding me... by Verte · · Score: 1

      Ssh! Next thing you know, M$ will have their legal department designing the file formats. Imagine a document format in legalese!

      --
      We at slashdot are scientists, specialists and kernel hackers. Your FUD will be found out.
    2. Re:You have *got* to be kidding me... by Ant+P. · · Score: 1

      No, that part won't be a problem. It's just that doing it without being killed by the DRM chip in their brain is the tricky part.

  46. That's why I said "depending on convention" by benhocking · · Score: 1

    Not everyone accepts the Pebi designation, so I included the phrase "depending on convention" in an effort to bypass arguments on both sides. Obviously, my effort failed. ;)

    --
    Ben Hocking
    Need a professional organizer?
  47. could get funny by AlgorithMan · · Score: 1

    I'm curious to see what happens, if Blai^H^H^Hrown or Bush learn that the UK National Archive has got a time-bomb...

    --
    The MAFIAA is a bunch of mindless jerks who will be the first up against the wall when the revolution comes
  48. This isn't a Problem by JamesRose · · Score: 1

    It's a time bomb, going to explode any second causing massive data loss sending us into an eternal dark age where no one can access old copies of children's TV programs!

    Come on, I know I shouldn't be surprised, you can only expect such FUD from a news company, but still, this is crap. You know what they need to do? They need to keep all the original recordings. Then, in a digital database, they keep digitally standardised copies of the recordings (in the highest quality necessary), along with information about where the original is stored, what format it was originally in, and, finally, where the equipment needed to view the original recording should be. They should also have a store, backed up, with every piece of equipment needed to play back every file in the library. This sounds like overkill, but it is only a good method of backing up; the complications come from how badly standardised things were in the past, and with an effort they can maintain that library. Not only that, but they should be standardising future recordings, so backing up and future proofing can be done more easily in future.
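
    The database described above could be as simple as one table. A minimal sketch using SQLite, where the table, column and function names are all invented for illustration:

```python
import sqlite3

# One row per recording: where the standardised digital copy lives, plus
# where the original and its playback equipment are kept.
SCHEMA = """
CREATE TABLE IF NOT EXISTS recordings (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    digital_copy_path TEXT NOT NULL,
    original_location TEXT NOT NULL,
    original_format TEXT NOT NULL,
    playback_equipment_location TEXT NOT NULL
)
"""


def open_catalog(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the catalog database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def add_recording(conn, title, digital_copy_path, original_location,
                  original_format, playback_equipment_location) -> int:
    """Insert one recording and return its row id."""
    cursor = conn.execute(
        "INSERT INTO recordings (title, digital_copy_path, original_location, "
        "original_format, playback_equipment_location) VALUES (?, ?, ?, ?, ?)",
        (title, digital_copy_path, original_location,
         original_format, playback_equipment_location),
    )
    conn.commit()
    return cursor.lastrowid
```

    Of course the catalog itself then becomes something that has to be migrated and backed up like everything else.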

    See, two minutes and you can think up a simple strategy to preserve all data and make it future proof. In actuality, with a concerted effort over time it can be simple, as past data is safe, and by standardising future data you significantly cut down on future effort. All that and I didn't even need to refer to explosives, terrorists or any scare tactics, weird isn't it.

    That, or stick all the files on a torrent and they'll float round the internet for years :)

  49. Maybe my understanding of an archive is wrong ... by The+Sith+Lord · · Score: 1

    ... but isn't an archive supposed to be future proof ?

  50. MOD PARENT UP by Anonymous Coward · · Score: 0

    +1 Funny if I had modpoints :)

  51. Thanks for explaining why OOXML is not a solution by argent · · Score: 1

    OOXML is XML -- if you want to extract plain text from it just feed it through a XML parser and strip all the tags.

    Precisely my point. If the layout and non-text information in the file matters, then you've thrown it away. If it doesn't, then why are you bothering to put it in the archive?

    You can do something similar with Office's format, but the solution will be far less perfect and contain lots of junk.

    Yes, and (as I noted) I've done the same thing, and it's a relatively crude way of reverse-engineering the format.

    On a spectrum of "what's a good archive format" OOXML is a bit better than older office formats.

    But compared to even something like HTML that's got an open specification that's actually open enough to have multiple independent implementations, and easy enough to implement that you can do it in BASIC for display on a dumb terminal, OOXML is just daft.

  52. And that's why Microsoft calls it OOXML. :p by argent · · Score: 1

    Ok, just finally realized that OOXML is the MS format and not the Open Office one.

    No prize for guessing why they used that name. :p

  53. Are they mad? by Nazlfrag · · Score: 1

    They want to use Microsoft to convert file formats? The same company who can't even save/load their own proprietary formats in their tiny little locked-in world... I think these records are in serious danger of being lost forever.

  54. This is truly funny! by Anonymous Coward · · Score: 0

    The very king of incompatible proprietary formats, where every version of their formats requires upgrading to the next version of software to continue using their format, is promising that all will be compatible with their newest proprietary format!

    George Orwell could not have dreamed up this scenario!

  55. That's actually what they're doing .. by Anonymous Coward · · Score: 0

    It's worth going to one of their lectures, actually.

    No, the amusing bit is that MS has shuffled its way in there and is flogging the single most important threat to longevity of digital information: an ill documented, proprietary standard that only pretends to be open. The problem the British library has is very obvious, so it's not like a sales rep has to think hard - the MS solution on offer is ludicrous to anyone who's been near a standards process and is simply a path to establish credibility for, well, the blatant after-the-facts rigging of standards to claw back the proprietary grip on the market after ODF gave that a good and well needed kicking.

    IMHO, MS is yet again able to use British establishment figures to do its selling for them.

    If I recall correctly, Blair was so impressed by money that he let himself be talked into being present at the UK W2K launch (or the version before that), thus providing a Government endorsement, and it seems history is about to repeat itself.

    You'd think the British Library would know about history..

  56. Why open standard matters by Omega+Blue · · Score: 1

    This example perfectly illustrates the problems with proprietary formats. Once the software that interprets a proprietary format vanishes any information written in it is gone. Okay, it's not gone gone. I am sure you can get a bunch of good cryptanalysts to pore over binary dumps of these files. Eventually they will crack it - if your information is worth the cost, that is.

    This is why we need open standard formats such as ODF and reject pretenders such as OpenXML. Just because the name has "open" in it does not mean the information to completely read and write OpenXML is freely available to the public. This makes OpenXML a proprietary format despite the name.

    OpenXML should be placed where it belongs - the rubbish bin.

  57. store a computer with the backups... by martin · · Score: 1

    This isn't new. For years (decades) many large defense and government projects have had all the source code, documentation etc stored along with a computer all set up with the required software in order to read all this stuff.

    What's new is the fact businesses are starting to realise they have the same problem.

  58. Re:called "hiring the fox to guard the henhouse"(n by caluml · · Score: 1

    AKA The wolf will hire himself out cheaply as a shepherd.

  59. Australian Nat Archives open standards & forma by lucychili · · Score: 1

    http://www.linux.org.au/conf/2007/talk/55.html
    Michael Carden explains it well

  60. Nothing new here, move along by rfc1394 · · Score: 1

    This has been a problem with ALL media that is not readable without technology. Or even if the people who know the language die off; we couldn't read hieroglyphics if we hadn't found the Rosetta Stone.

    Anyone have a wire recorder handy? They were very popular back in the (19)20s and 30s. Oh yes, can someone loan me a Dictaphone or a dictaphone belt? How about a phonograph that plays 78rpm records? How about even having a phonograph? 8 Track tape? Now, as for computer formats, does anyone have any 80 Column punch cards? Teletype or a paper tape reader? 12" magnetic tape reels, or tape drive that reads 7 track coding (as opposed to newer 9 track), presuming that they even have tape any more? Or most of the stuff used with mainframe computers. How about 8 inch or 5 1/4 inch diskette? Got any Zip disks? Now, do you have any .LBR or .ARC archive files? What about EBCDIC, read any files coded using it lately?

    When was the last time you handled a photograph that had a negative? I handle probably a dozen images or more a day when I'm going through digital pictures on my computer, but it's probably been ten years since I had a picture that had a photographic negative. We might have pictures and plates as far back as the 1890s when the camera was first developed, but it's highly unlikely you can get duplicates made, or if you can, it's going to require a specialty photographic processor and is probably expensive. Does anyone even use film anymore for "home movies" or are we using video tape and now video disc? The cost differential between video and film is about 50 to 1, e.g. for $3 you can buy a high-quality tape that will record 2 hours vs. 3 minutes for 8mm, if film is even that cheap; I haven't had to buy 8mm film for twenty years. What happens to those old movies? If we can even view them, it's usually because they have been converted to tape or disc.

    Oh, yes, video tape. Movies are going all disc now, and as a result most video stores are selling their tape collections at low prices ($1 per tape) because the space and cost of disc has become much more advantageous; in the space of three video tapes you can probably store ten or more discs. Which raises the question: if the HD-DVD vs. Blu-ray format war gets settled, shouldn't we expect all videos to go to that format? (Or maybe they'll just release in both; in either case, you'll either need two machines or eventually they'll have to develop a dual-format machine to read both.) Oh yes, I forgot the earlier videodisc format that came out long before CDs.

    The changing of storage formats has caused problems even with open format standards - let alone troubles over files using proprietary or non-standard formats - as we have changed technology. This has been noted for years and is a big problem for non-profits with limited resources - such as libraries - which might have to convert data from one device or file format to another as older systems become obsolete and data is trapped on those systems if not converted. Lots and lots of data produced at significant expense has either been lost or is inaccessible because the systems that coded it are failing as parts become unavailable and machines cannot be maintained, and where they can be, it's a huge expense to do so.

    Paul Robinson - My Blog
    --
    The lessons of history teach us - if they teach us anything - that nobody learns the lessons that history teaches us.
  61. The REAL problem isn't the format, but the media by elrous0 · · Score: 1

    Finding software support for obsolete file formats isn't NEARLY as serious a problem as the media that these files are stored on. Maintaining digital media (optical discs, tapes, floppies, hard drives, etc.) over an archival length of time (500+ years) is likely going to be VERY tough. Simply READING the media will be a challenge in the future. You can open a book from 1,000 years ago and read the thing (you may have to learn a specialized language, but it's easily possible). How are you going to read a hard drive 1,000 years from now--long after the hardware to read it has ceased to be manufactured and there aren't even blueprints still around to make 20th/21st century computers? Getting the bits to make sense after you access them may prove trivial next to the challenge of getting to them in the first place.

    --
    SJW: Someone who has run out of real oppression, and has to fake it.
  62. Embrace and ?????? by LeadSongDog · · Score: 1

    Mr Frazer said Microsoft had shifted its position on file formats. "Historically within the IT industry, the prevailing trend was for proprietary file formats. We have worked very hard to embrace open standards, specifically in the area of file formats."
    Where have we heard this before?
    --
    Oh, I'm sorry sir, I thought you were referring to me, Mr. Wensleydale.
    1. Re:Embrace and ?????? by Anonymous Coward · · Score: 0
  63. Dead Media by LeadSongDog · · Score: 1

    Actually, there are quite a lot of people working with obsolete recording technologies. Even DVD-R ;/)

    See http://www.cedu.niu.edu/blackwell/multimedia/high/library.html for some fascinating lookback, including

    * bonobo trail blazes

    * the Edison electric pen

    * Baird mechanical television

    and my personal favourite

    * Rene Dagron, Pigeon Post Microfilm Balloonist

    --
    Oh, I'm sorry sir, I thought you were referring to me, Mr. Wensleydale.