Slashdot Mirror


Vint Cerf: Data That's Here Today May Be Gone Tomorrow

dcblogs writes "Vinton Cerf is warning that digital things created today — spreadsheets, documents, presentations as well as mountains of scientific data — may not be readable in the years and centuries ahead. Cerf illustrates the problem in a simple way. He runs Microsoft Office 2011 on Macintosh, but it cannot read a 1997 PowerPoint file. 'It doesn't know what it is,' he said. 'I'm not blaming Microsoft,' said Cerf, who is Google's vice president and chief Internet evangelist. 'What I'm saying is that backward compatibility is very hard to preserve over very long periods of time.' He calls it a 'hard problem.'" We're at an interesting spot right now, where we're worried that the internet won't remember everything, and also that it won't forget anything.

267 of 358 comments

  1. XML? by AlphaWolf_HK · · Score: 1

    I think that given MS office and LibreOffice are in XML, it shouldn't be difficult at all to reverse engineer in the future.

    --
    Careful with names containing L slashdot.org/~AiphaWolf_HK slashdot.org/~AlphaWoif_HK slashdot.org/~AiphaWoif_HK
    1. Re:XML? by Nerdfest · · Score: 1, Insightful

      The same applies to any *open* format.

    2. Re:XML? by cheater512 · · Score: 2

      In to a usable document from scratch? Pretty hard. Ever looked at the XML of a moderately complex document?

    3. Re:XML? by ShanghaiBill · · Score: 2

      I think that given MS office and LibreOffice are in XML, it shouldn't be difficult at all to reverse engineer in the future.

      Yes, the problem is not "data" but "data in proprietary formats" ... and even that is becoming less of a problem. A converter to/from almost anything is usually just a google search away. With VMs and emulators, even proprietary binary programs are easier than ever to deal with. I can run any CP/M or C64 program on my desktop Linux computer using free emulators. This was indeed a "hard problem", but today it is mostly solved.

    4. Re:XML? by fuzzyfuzzyfungus · · Score: 2

      I think that given MS office and LibreOffice are in XML, it shouldn't be difficult at all to reverse engineer in the future.

      Binary formats were standard for everything up through Office 2003. Office 2007 (or 2003 with the optional converter pack and some weird bugs) could output something XML-based, though I have a vague memory from the OpenDocument/Office Open XML slugfest that 2007 produced something that deviated from the theoretical ideal of OOXML in some respects, and that full conformity happened at 2010 or 2013. I might be remembering that wrong, but anything before 2003, and a lot from 2003, was definitely binary.

    5. Re:XML? by Why2K · · Score: 1

      They are binary, but at least they are documented: http://msdn.microsoft.com/en-us/library/cc313105(v=office.12).aspx

    6. Re:XML? by belmolis · · Score: 2

      Both have published specifications, so reverse engineering shouldn't be necessary. However, Microsoft's XML includes things that are not defined in the specification. That was one of the objections to giving it status as an open standard.

    7. Re:XML? by Hamsterdan · · Score: 1

      The problem is not just related to the format, but the medium it's stored on. I can still read C64 floppies because I have some drives, but everything I have for my Apple ][ is considered lost until I find both a drive and a working machine.

      --
      I've got better things to do tonight than die.
    8. Re:XML? by KGIII · · Score: 1

      Hell, even the non-open formats are pretty easy to get to a readable level of functionality. They won't contain the markup necessarily and certain features won't be available but, frankly, if we're able to decode all the other ancient languages I'm pretty sure someone will be able to decode these as well.

      Speaking of ancient... Err.. When did Vint go to Google? That's kind of cool that he has but that is, in itself, news to me. I must have missed the announcement as I'm sure there was one.

      --
      "So long and thanks for all the fish."
    9. Re:XML? by gweihir · · Score: 4, Insightful

      Have you seen what some people (and MS) do with XML? And what convoluted structures they use? Coded in binary? With compression and other eminently hard to understand stuff? Most of these things will be readable just as long as the applications that created them are around, but not longer.

      Forget XML. Forget Unicode as well. Plain ASCII is the only thing that works. Simple PDF or PostScript will work also, because the standards and open-source tools to read them will still be around. But nothing as complicated as a MS office document will survive. LibreOffice formats may have a chance, because LibreOffice may still be compilable and runnable (being FOSS), but only because of that and I would not bet on it.

      Incidentally, all my decades old LeTeX documents still compile and can also be read directly. So can my 20 year old ASCII-coded measurement data.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    10. Re:XML? by Anonymous Coward · · Score: 1, Funny

      Holy shit, yeah, you're right - it's totally impossible to strip out the XML tags and be left with readable plain text content!

      I bet nobody could ever decode it!

    11. Re:XML? by Anonymous Coward · · Score: 1

      Hell, even the non-open formats are pretty easy to get to a readable level of functionality

      Hell, ya, as if the real world works like the series "24", where their super efficient CTU lab can identify any type of files, and once identified, they can decrypt anything

    12. Re:XML? by Joce640k · · Score: 1

      I think that given MS office and LibreOffice are in XML, it shouldn't be difficult at all to reverse engineer in the future.

      You know how I know you haven't read the OOXML standard?

      --
      No sig today...
    13. Re:XML? by rvw · · Score: 1

      Holy shit, yeah, you're right - it's totally impossible to strip out the XML tags and be left with readable plain text content!

      I bet nobody could ever decode it!

      Well we could of course describe the entire Windows 95 OS, Office 95 and even Mac OS 8 or something in an XML CDATA tag.

    14. Re:XML? by Dr_Barnowl · · Score: 4, Informative

      Not even Microsoft can implement their Office XML "standard" ; from examination it's pretty much a direct name-for-name serialization of their internal binary structs, with some of the more obvious gaffes like explicitly saying "do this like this old version of Word" hastily renamed to placate ISO. It needs you to implement a whole bunch of specific behaviours if you want it to work in the MS software (things like "if you update this bit, you also have to update this other bit just so or it won't work"), but these aren't documented.

      You've got more of a chance, sure, just because the structs are marked and you don't have to infer where their boundaries are, but it's a far cry from ODF which was designed from the outset to be an open XML format rather than just hastily being bunged together to permit large purchasing bodies (like governments) to tick the "Open format" box on their form.

    15. Re:XML? by Dr_Barnowl · · Score: 1

      After they were forced to, for interoperability purposes ; you can see this from the 6-monthly release dates on the documents, even if the formats haven't changed, it's obvious a court order is compelling them to go to the effort of releasing a new document.

      The document bundle also has over 6000 pages (6,154) ; Excel accounting for the lion's share at over 2600 pages. Coincidentally I think this is about the same size as the initial MOO-XML format submission.

      It's quite a task to re-implement (presuming these documents are clear, concise, and accurate).

    16. Re:XML? by Dr_Barnowl · · Score: 1

      MOO-XML is transparently just a serialization of the internal binary formats of Office produced in response to the threat of large buyers (like governments) insisting on open document formats ; unlike ODF which was designed to be an XML format from scratch. It let their government buyers tick the box and push through the procurement order - "Hey, it's what we use already, so it's definitely compatible, and it supports all that open format jazz - so it's the best value for money, even if this other thing is free."

      The fact that the XML formats are now the default is just the final piece of delicious irony.

    17. Re:XML? by Dr_Barnowl · · Score: 1

      To give him his fair due, he's talking about reverse engineering, presumably in the absence of the standard.

      The markup does make it much easier - you do at least get to see the structs, and the names of their elements, instead of just inferring them by poking around in a hex editor.

      But I'd lay odds that the guy re-implementing ODF from scratch and a few sample documents would be done long before the guy with the MOO-XML documents had recovered from his first nervous breakdown.

    18. Re:XML? by dkf · · Score: 1

      Both have published specifications, so reverse engineering shouldn't be necessary. However, Microsoft's XML includes things that are not defined in the specification. That was one of the objections to giving it status as an open standard.

      Vendor extensions are sometimes a necessary evil, but just how much you object to them depends on how much they impact on the comprehensibility of the document by tools other than the ones by the original vendor. Are they generating those in newly-created documents or are they just there in documents converted from a previous format? The latter, while not nice, would be not a great problem as it would be possible to get them documented as vendor extensions for legacy support (even if it was "guerilla documentation" and not official), but if critical new/current features require lots of vendor extensions then that's highly problematic.

      If a tool can read in the document, throw away all the vendor extensions, and still completely understand the document, those extensions cannot be deeply objectionable. (Very few people get worked up about one-pixel layout tweaks, but putting the actual content inside an extension isn't good at all.)

      --
      "Little does he know, but there is no 'I' in 'Idiot'!"
    19. Re:XML? by gl4ss · · Score: 1

      In to a usable document from scratch? Pretty hard. Ever looked at the XML of a moderately complex document?

      it's doable though.
      if he really wanted to open his powerpoints.. it would be trivial to find sw to open them.

      that's a funny thing now. you can run almost any sw on your modern pc. from practically any system that sold more than 20 000 units.

      --
      world was created 5 seconds before this post as it is.
    20. Re:XML? by Half-pint+HAL · · Score: 2

      Holy shit, yeah, you're right - it's totally impossible to strip out the XML tags and be left with readable plain text content!

      I bet nobody could ever decode it!

      You seem to be assuming a flat-text file with predictable order. Strip the XML out of anything in a tabular format (eg a spreadsheet -- see TFS) and you lose vital data. Blank cells are lost and the tabulated data no longer lines up.

      It gets worse in a filetype with unstructured formatting, eg DTP and slideware. You've got a collection of elements that are only ordered by their metadata. The explanatory labels you want to overlay on top of that image? They're no longer linked to it and you've no way of knowing what they're there for. Multiple news stories on the same page merge into one, and have been divorced from their headlines.

      Readable != useful.
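
      The blank-cell point can be shown with a minimal sketch. The markup below is a hypothetical, simplified spreadsheet format invented for illustration (it is not real OOXML or ODF), and `strip_tags` and `parse_rows` are made-up helper names. The naive tag stripper drops the empty cells, so the columns no longer line up, while a structure-aware parse keeps them.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, simplified spreadsheet markup (NOT real OOXML or ODF).
SHEET = """<sheet>
<row><cell>Name</cell><cell>Q1</cell><cell>Q2</cell></row>
<row><cell>Alice</cell><cell/><cell>7</cell></row>
<row><cell>Bob</cell><cell>3</cell><cell/></row>
</sheet>"""


def strip_tags(xml_text):
    """Naive approach: delete anything that looks like a tag, split the rest."""
    lines = re.sub(r"<[^>]+>", " ", xml_text).splitlines()
    return [line.split() for line in lines if line.strip()]


def parse_rows(xml_text):
    """Structure-aware approach: empty cells survive as empty strings."""
    root = ET.fromstring(xml_text)
    return [[cell.text or "" for cell in row.findall("cell")]
            for row in root.findall("row")]


stripped = strip_tags(SHEET)
parsed = parse_rows(SHEET)
print(stripped)  # Alice's row has only two values: is the 7 for Q1 or Q2?
print(parsed)    # Alice's row keeps three columns, with Q1 empty
```

      In the stripped output, Alice's 7 and Bob's 3 both land in the second position, although they belong to different columns in the original.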

      --
      Got them moderator blues I blieve I walk out the do', With these mod-points I been gettin', I 'most never post no mo'
    21. Re:XML? by gmack · · Score: 1

      It would seem you aren't entirely out of luck. The FC5025 Floppy controller can be combined with the TEAC FD55GFR in order to read Apple II disks.

    22. Re:XML? by DarkOx · · Score: 1

      To be really technical about it no data is lost, but information is. The structure of an xml document describes relationships between its elements.

      --
      Repeal the 17th Amendment TODAY! Also Please Read http://www.gnu.org/philosophy/right-to-read.html
    23. Re:XML? by NJRoadfan · · Score: 1

      Apple II hardware is still readily available. All the schematics and ROM code for the disk controller are online too (typical minimalist Woz design). The bigger problem is file formats, not so much the physical media. In many cases you can read the disk, but not decode the files into something usable.

    24. Re:XML? by cusco · · Score: 1

      Look at a file from that time and see if you can tell me whether FileName.doc was created in Word 1, Word 2, Word 3, Word Perfect 3, Word for Mac (which was a different file format), Word Perfect 4, AMI Pro, Pfs First Choice, or any of the other programs from that time period which would/could use that extension. Or maybe a Wang word processor? Or an IBM DisplayWriter word processor? There are companies that can do that, but it's far from "trivial".

      --
      "Think about how stupid the average person is. Now, realise that half of them are dumber than that." - George Carlin
    25. Re:XML? by drinkypoo · · Score: 1

      It's hard to do by eye and by hand, but presumably digital archaeologists of the future will have access to some sort of pattern-mining software which will bring order from chaos for them.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    26. Re:XML? by jedidiah · · Score: 1

      Even in 1997, I could run a 1985 OS in emulation and therefore the entire tool chain associated with any file format you would care to name. This problem is not nearly as hard as some people make it out to be. Although it's made artificially difficult by the sort of company Vint is trying to make excuses for here.

      If you are really worried about stuff being readable 20 years from now then perhaps you should really act like it.

      This problem didn't just magically appear yesterday.

      --
      A Pirate and a Puritan look the same on a balance sheet.
    27. Re:XML? by jedidiah · · Score: 1

      > Look at a file from that time and see if you can tell me whether FileName.doc was created in...

      Or I could just use simple tools that can tell me the pedigree of a file regardless of what it's named.

      --
      A Pirate and a Puritan look the same on a balance sheet.
    28. Re:XML? by gweihir · · Score: 1

      Indeed. For MOO-XML, they could have wrapped a Base64 encoding of the old format, it would be about as useful.

      I have my misgivings about ODF though. It may still be too complicated.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    29. Re:XML? by gweihir · · Score: 1

      When you have just the binary output and need to reverse-engineer it, yes, very much so. May even be infeasible in practice.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    30. Re:XML? by cusco · · Score: 1

      Such as?

      --
      "Think about how stupid the average person is. Now, realise that half of them are dumber than that." - George Carlin
    31. Re:XML? by cheater512 · · Score: 1

      Well technically there is a nifty Linux command called 'file'. It will detect exactly what file format any file is in pretty much.

    32. Re:XML? by cusco · · Score: 1

      Didn't know such a thing existed. So it would look at FileName.doc and tell me that it was created by Pfs FirstChoice Version 2.1 or Word Perfect 5.1 for AS/400? That's almost worth having a Linux installation around all by itself.

      --
      "Think about how stupid the average person is. Now, realise that half of them are dumber than that." - George Carlin
    33. Re:XML? by cheater512 · · Score: 1

      Yep. Here are a couple of outputs for various files:

      $ file index.php
      index.php: PHP script, ASCII text, with very long lines, with CRLF line terminators
      $ file doc.pdf
      doc.pdf: PDF document, version 1.2
      $ file doc.docx
      doc.docx: Microsoft Word 2007+
      $ file archive.zip
      archive.zip: Zip archive data, at least v2.0 to extract

      It won't tell you the program used to make it, but it will give intricate details on the exact file format.
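
      For anyone curious how that kind of detection works, here is a minimal sketch in Python of the core idea: compare the first bytes of a file against known signatures ("magic numbers"). The four signatures listed are well-known ones; the table and the `sniff` function are illustrative assumptions and not the actual implementation of `file`, which uses a far larger database and many more tests.

```python
# A handful of well-known file signatures. The real `file` utility consults
# a much larger database; this table is only an illustrative subset.
SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"PK\x03\x04", "Zip archive data (also the container used by .docx and .odt)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1",
     "OLE2 compound document (container for legacy binary Office files)"),
    (b"\x89PNG\r\n\x1a\n", "PNG image data"),
]


def sniff(data: bytes) -> str:
    """Return a label for the first matching signature, or 'data' if none match."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "data"


def sniff_path(path: str) -> str:
    """Read only the first few bytes of a file and classify them."""
    with open(path, "rb") as handle:
        return sniff(handle.read(16))


print(sniff(b"%PDF-1.2\n"))               # PDF document
print(sniff(b"no known signature here"))  # data
```

      Note that the file name never enters into it, which is why a misleading `.doc` extension does not matter. A signature only identifies the container, though, so telling apart the many old word processors that shared an extension would need a signature for each of them.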

    34. Re:XML? by mcswell · · Score: 1

      I don't think you understand "intricate." Reverse engineering a data format from 20 years ago ain't like dustin' crops, boy.

    35. Re:XML? by mcswell · · Score: 1

      "Plain ASCII is the only thing that works." Ever try to encode Chinese in ASCII? Sure, you can do it, in the sense that you can encode any 32-bit sequence as a sequence of five 7-bit characters, but that doesn't mean anyone else will be able to figure it out. Including yourself ten years from now.
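
      A minimal Python sketch of that point, assuming nothing beyond the standard codecs: Chinese text cannot be encoded as ASCII directly, and although an ASCII-only escape form exists and round-trips, it is only readable to someone who knows which escaping convention was used. The helper names are made up for illustration.

```python
def can_encode_ascii(text: str) -> bool:
    """True if every character of the text fits in 7-bit ASCII."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


def ascii_roundtrip(text: str):
    """Escape text into pure ASCII using Python's escape convention, then restore it."""
    escaped = text.encode("unicode_escape").decode("ascii")
    restored = escaped.encode("ascii").decode("unicode_escape")
    return escaped, restored


chinese = "中文"
print(can_encode_ascii("plain text"))  # True
print(can_encode_ascii(chinese))       # False
escaped, restored = ascii_roundtrip(chinese)
print(escaped)                         # \u4e2d\u6587
print(restored == chinese)             # True
```

      The escaped form is pure ASCII, but a reader who does not recognise the `\uXXXX` convention sees only backslashes and hex digits.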

    36. Re:XML? by gweihir · · Score: 1

      Actually, it is \latex. Sorry about that.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    37. Re:XML? by gweihir · · Score: 1

      The Chinese have a larger problem, agreed. But Unicode is not going to solve it. They likely need to create a compact, human-readable transliteration in ASCII, or they need to drop their broken and obsolete system. Yes, they are not going to like that, but it is already happening and there is a price to pay for arriving in the modern world.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    38. Re:XML? by Half-pint+HAL · · Score: 1

      Only if you consider "metadata" not to be data, but metadata is data -- data about data.

      --
      Got them moderator blues I blieve I walk out the do', With these mod-points I been gettin', I 'most never post no mo'
    39. Re:XML? by Half-pint+HAL · · Score: 1

      DNA is made up of five elements: carbon, nitrogen, oxygen, hydrogen and phosphorus. If I was to take a single molecule of my DNA and break all the molecular bonds (ie strip out all structural information) and hand you a collection of the resulting atoms, would you have all the "content" of my DNA in any useful sense?

      So please dial back the insults.

      XML may be a structured plain-text representation of the document, but the structure itself has semantics that are not always trivial to decode. In order to "write a piece of software that opens the document and preserves the import information", I would have to decode the semantic links between elements. The whole point of this discussion is that step in the process. Plaintext certainly makes the job easier, but it doesn't make it easy.

      There may be nothing arcane about opening an XML document and doing something with its [NB: no apostrophe] contents, but there may be something very arcane about doing the right thing with its contents. If there's a non-obvious interaction between two elements, for example, as happens in Microsoft's OOXML.

      Speaking of which, why don't you go and download a large PowerPoint presentation in PPTX format from slideshare, open it up in a text editor and then come back and call me a "dumbfuck" again when you find it trivially easy to process....

      --
      Got them moderator blues I blieve I walk out the do', With these mod-points I been gettin', I 'most never post no mo'
    40. Re:XML? by RockDoctor · · Score: 1

      frankly, if we're able to decode all the other ancient languages I'm pretty sure someone will be able to decode these as well.

      We're not able to decode all ancient languages. Some of those with a significant corpus of work remain incomprehensible. One example at the borderline of what should be possible is the Phaistos Disc language. Linear-A remains undeciphered, while its descendant Linear-B has been deciphered. There was probably a more-or-less common language amongst the cities of the "Indus Valley Civilisation", but only scattered fragments of its (syllable-based ?) written language have been found. And with that list, I've not even left the Indo-Aryan language group - probably.

      Speaking of ancient... Err.. When did Vint go to Google?

      I don't remember exactly ; it was a while ago, after he was doing work for NASA on high-latency networks - i.e. Interplanetary Internet. (Wikipedia says he went to Google in 2005, but work on "Interplanetary Internet" has been wobbling on since the early 1980s.)

      --
      Birds are not dinosaur descendants;birds are dinosaurs, for all useful meanings of "birds", "are" and "dinosaurs"
    41. Re:XML? by mcswell · · Score: 1

      Ok, if you don't like that example, try Hangul (the writing system used for Korean) or Perso-Arabic (the writing system used for Arabic, Persian, Urdu, Punjabi, Pashto, and many other languages), or Tifinagh (used for various Berber languages) or Devanagari (Hindi, Marathi,...) or Bengali (Bangla) or Syriac (for Syrian) or Greek (language left as exercise to the student) or Cyrillic (Russian, Ukrainian, and almost any Slavic language except Polish and Czech) or Hebrew or...

      But you get the idea. Even many languages that use a "Latin" writing system have diacritics (accent marks, tildes, etc.) that aren't in ASCII.

  2. My data will be readable by drinkypoo · · Score: 3, Informative

    My data will be readable because I use bog-standard formats. If I get really froggy I use HTML, and you can just strip the tags and read that.

    If his data won't be readable, that's his problem. Anything you want to save for posterity, export it now.

    --
    "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    1. Re:My data will be readable by Anonymous Coward · · Score: 1

      Export it? Sure. Or.... If he took that 1997 PowerPoint, and opened it with each successive Office version, and re-saved it as the latest version, he'd be fine.

      I'm sure there's an automated way to do that with numerous files.

    2. Re:My data will be readable by Bremic · · Score: 5, Insightful

      Until HTML includes DRM and half the stuff you create ends up being unreadable.

      Well, really we are probably good for anything that can be opened in a text editor for a long long while; but the point is there. Anything can be lost to data format shifts.

      As someone who had to re-type an 80-page document because the company stopped using the software the document was created on, didn't have a licence for it, and no converter found online worked - I can say this does happen.

      How many people are going to shell out $600 for software to open something they want to make an edit on? How many are going to just give up and find someone to rekey it, or just give it up as a loss?

      With more and more systems including format locks, in 50+ years historians will likely have a lot of trouble finding out details from today. Kind of like it is now when we go to look at archival film from WWII and find it's all faded into obscurity. We have the same problems, just with different causes. Then it was lack of preservation of a medium with a limited lifespan. Now it's storing stuff in formats that will go away as they are improved upon, blocked, or just forgotten about.

      Sure, if you're in your 20s, or even 30s, you probably haven't realized the copy of your grandfather's photos is sitting on a floppy disk in a proprietary format. But when you get older you may encounter these issues.

    3. Re:My data will be readable by Nutria · · Score: 4, Informative

      Or NASA data from deep space probes that's stored in now-unknown formats on mag tapes from long, long, long gone manufacturers.

      --
      "I don't know, therefore Aliens" Wafflebox1
    4. Re:My data will be readable by starburst · · Score: 2

      From a 2002 slashdot story:

      mccalli writes :
      "Thought people might find this amusing. In 1986, the UK compiled an electronic [copy of the] domesday book. They used BBC Master computers to do it, and the result was put on laserdisc. I actually used this project whilst at school. This article states that nothing can now read these merely 15-year old discs. The original, written approx. 1086, is still doing fine thank you very much."
      Sounds like a good candidate for Bruce Sterling's Dead Media Project. (Speaking of Sterling, the "graying cyberpunk" has an interesting article in the Austin Chronicle on the upcoming SXSW Interactive conference called "Information Wants to be Worthless" -- thanks to reader ag3n7.)

    5. Re:My data will be readable by ganjadude · · Score: 2

      Why didn't you OCR it and then make the edits? There are numerous OCR options that would have fit that need, no?

      --
      have you seen my sig? there are many others like it but none that are the same
    6. Re:My data will be readable by geniice · · Score: 2

      In fairness they did manage to transfer the stuff off the discs and put the stuff without copyright issues online.

    7. Re:My data will be readable by Concerned+Onlooker · · Score: 2

      "How many people are going to shell out $600 for software to open something they want to make an edit on?"

      The upside to this is that when somebody wants to update that nifty company Flash web site and discovers that Flash now costs an arm and a leg, the site gets re-written in html.

      --
      http://www.rootstrikers.org/
    8. Re:My data will be readable by kermidge · · Score: 2

      Well, there are the problems with the medium itself, then there's the format, as you say (ought to be right up a cryptanalyst's alley, tho), then there's the real blocker: number of tracks, head design, and the circuitry that goes with it. Unless there are good documents for the machine's design and building, or one can be found in working order in a museum, you're SOL. It's a big problem that doesn't get much exposure.

    9. Re:My data will be readable by thsths · · Score: 1

      > If his data won't be readable, that's his problem.

      Actually the problem is (as usual) in front of the screen. There are many programs that can read a 1997 PowerPoint file - he just picked the one version of Office that does not.

      Using a format that is designed for compatibility would also solve the problem. PDF is pretty good, but there can be issues with embedded objects. PDF/A seems to be a safe bet, maybe missing some nice features, though.

    10. Re:My data will be readable by Dr_Barnowl · · Score: 1

      The main problem was that the project came too early ; it was innovative.

      It used very uncommon hardware - you needed a 12" analogue / digital laserdisc reader, and you needed an uncommon add-on CPU unit for the BBC Micro.

      Only a few years later, CD-ROM became ubiquitous, with the 700MB basic disc size being more than double one of the single 300MB sides on those 12" discs, although I'm not entirely sure whether the purely digital CD-ROM would have enough storage to cope with encodes of the analogue video tracks. A single DVD-ROM could probably house the whole project, along with a bootable OS and emulator to run it.

    11. Re:My data will be readable by Kjella · · Score: 1

      As someone who had to re-type an 80-page document because the company stopped using the software the document was created on, didn't have a licence for it, and no converter found online worked - I can say this does happen.

      I'm rather surprised that with today's VM/emulation solutions companies haven't figured this one out: unless you've sold the licenses or just leased/rented them in the first place, keep at least one license of your old technical platform. That way you should at least be able to get a copy-pasted version or OCR a copy written to PDF.

      --
      Live today, because you never know what tomorrow brings
    12. Re:My data will be readable by Half-pint+HAL · · Score: 1

      If I get really froggy I use HTML, and you can just strip the tags and read that.

      Only if you're very, very careful, because if you strip the tags, you lose the ALT attributes on your IMG tags, which means you're ditching the plaintext fallback for the non-textual information in the page....
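
      A minimal sketch of the difference, using Python's standard `html.parser` module. The sample page, the `strip_tags` helper and the `TextWithAlt` class are made up for illustration; the point is only that a regex-style tag stripper throws the ALT text away, while a parser that looks at attributes can keep it.

```python
import re
from html.parser import HTMLParser

# Made-up sample page for illustration.
PAGE = ('<p>Sales by region:</p>'
        '<img src="chart.png" alt="Bar chart: North 40, South 25">'
        '<p>End.</p>')


def strip_tags(html):
    """Naive approach: delete every tag, attributes and all."""
    return " ".join(re.sub(r"<[^>]+>", " ", html).split())


class TextWithAlt(HTMLParser):
    """Collect text content plus the alt text of any img element."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            alt = dict(attrs).get("alt")
            if alt:
                self.parts.append(f"[image: {alt}]")

    def handle_data(self, data):
        if data.strip():
            self.parts.append(data.strip())


def extract(html):
    parser = TextWithAlt()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


print(strip_tags(PAGE))  # Sales by region: End.
print(extract(PAGE))     # Sales by region: [image: Bar chart: North 40, South 25] End.
```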

      --
      Got them moderator blues I blieve I walk out the do', With these mod-points I been gettin', I 'most never post no mo'
    13. Re:My data will be readable by Nutria · · Score: 1

      number of tracks, head design, and the circuitry that goes with it

      That's exactly what I was thinking when I referred to long-gone manufacturers. Otherwise, you could toss them on "modern" IBM 9-track drives and pull the data onto modern media for decipherment.

      --
      "I don't know, therefore Aliens" Wafflebox1
    14. Re:My data will be readable by cusco · · Score: 1

      That's why volunteers at the Planetary Society had to pull a computer and drive out of (literally) a computer museum to read the Pioneer data tapes. The tapes themselves were actually in fairly good shape, they had been stored in a controlled environment for all that time, but they were unreadable. (The minuscule storage costs were the excuse that the Bush Madministration used to order the data destroyed; not 'disposed of' but 'destroyed', and they were mightily pissed when NASA management handed them over to the Society.)

      --
      "Think about how stupid the average person is. Now, realise that half of them are dumber than that." - George Carlin
    15. Re:My data will be readable by cusco · · Score: 1

      The law firm that my mom worked for bought an AS/400 with Word Perfect as a networked word processing system (yes, the salescritter should have been shot). Good luck getting any of those documents open, even with Word Perfect for DOS or Word Perfect for Mac. They had interns and Manpower temps re-typing documents for half a year when that thing went away.

      --
      "Think about how stupid the average person is. Now, realise that half of them are dumber than that." - George Carlin
    16. Re:My data will be readable by IndustrialComplex · · Score: 1

      As long as we aren't talking about post WWIII levels of tech, I don't see how this will be a problem.

      Interpreting the data is trivial compared to preserving the data. As long as it isn't encrypted, getting useful data out of a format, even a long dead format on a long dead piece of equipment will be possible. Potentially hard and expensive, but possible.

      Recovering data from formats which have been allowed to deteriorate is a much bigger problem because you aren't dealing with extracting data from a difficult medium, you are dealing with data that is no longer there at all! That's the problem with old tapes and other formats.

      --
      Out of modpoints but really liked a post? 1BDkF6TtmmeZ3yqXbz9yhdYVqRYnwFoXDj
    17. Re:My data will be readable by Nutria · · Score: 1

      Madministration

      I see what you did there!

      I've seen too many people (a) misinterpret the blatantly obvious, in the zealous surety of their rightness, and (b) lie for political advantage.

      Thus, just as I'm withholding judgment on BO's apparent evilness, I withhold judgment on the alleged reason why the tapes were to be destroyed.

      --
      "I don't know, therefore Aliens" Wafflebox1
    18. Re:My data will be readable by anagama · · Score: 1

      Imagine if they didn't have paper copies. They'd have been screwed rather than just annoyed and slightly poorer.

      My business partner recently lost all of her baby pictures from the first two years of her first kid. Not from hardware failure; as best we can figure, it relates to an issue in 2010 where updating iPhoto caused data loss. The Time Machine backup does not extend back before 2010 because the drive was replaced at some point (how many non-tech-savvy users think to back up their backups?). As a result -- they're totally gone.

      In contrast, I have all my baby photos from the 60s and 70s. Some a bit tattered, and instead of thousands you get when people use digital cameras, maybe 50 or so, but because they're on paper I have them. Reading them requires no unavailable technology -- just eyes.

      I love technology in general, but I've been bitten by it. If I really want to make sure I have the best chance of keeping something, I print it out. I don't print out everything -- the nature of digital content is that it allows people to store a huge amount of crap (like thousands of photos only slightly different from each other when one would be sufficient) for almost no cost. But that cheapness makes people devalue the little bit in that pile, that they really don't want to lose. And then they lose it and would pay almost anything to get it back.

      And yes, I know papers and photos fade, but the process is slower. Typically with a computer, it's working and available one second, and gone the next. You have a lot more time to correct poor storage techniques with physical documents. And yes, papers and photos can get burned up -- but you can make offsite backups of these things too.

      --
      What changed under Obama? Nothing Good
    19. Re:My data will be readable by jedidiah · · Score: 1

      It sounds like he could have just saved it to PS in 1997 and have been done with it.

      I have a Project Gutenberg CD from 1994 that's still perfectly usable because the data isn't in some proprietary format. The data migrated off of optical media long ago and resides in various places within backup copies of my media hoard.

      --
      A Pirate and a Puritan look the same on a balance sheet.
    20. Re:My data will be readable by jedidiah · · Score: 1

      The funny thing is that Word Perfect used a markup system and exposed the markup codes to the end user so the markup could be directly manipulated.

      A competent typist from back in the day could probably just read the file directly.

      --
      A Pirate and a Puritan look the same on a balance sheet.
    21. Re:My data will be readable by cusco · · Score: 1

      I missed the Reveal Codes feature for a long time when I had to change to MS Word. The Word equivalent was primitive in comparison.

      The problem apparently was that the AS/400 files couldn't be exported to PC format. (Although knowing IBM it probably could be done, but no one that her office had access to knew how to do it, so they were told that it was impossible.) When the AS/400 went away so did all the data.

      --
      "Think about how stupid the average person is. Now, realise that half of them are dumber than that." - George Carlin
    22. Re:My data will be readable by uninformedLuddite · · Score: 1

      Someone should have lost their life let alone their job for that

      --
      The new right fascists are bilingual. They speak English and Bullshit.
    23. Re:My data will be readable by metaforest · · Score: 1

      Kinda funny. some time ago I met a codger that wanted to speel-in some old wire recordings...

      It was impossible. Too much cost for too little.... he had no idea if any of the reels were blank...

      We could not even determine for sure what mechanical format the reels were. Who made them? (They were unmarked.) What system(s) were they compatible with? There was no clear coding on the spools. (Mechanical dimensions are not searchable... and are ambiguous for that tech.)

      It was a wash. I'm sure those reels are quite readable, but short of engineering a custom machine... they are effectively unreadable.

    24. Re:My data will be readable by kermidge · · Score: 1

      Two of my elder cousins had a wire recorder, taped stuff off radio in the mid- to late-'50s. My uncle had brought the recorder back from a lab he'd worked at during the War. It was a commercial model (company, I can't remember). I thought it was an ingenious bit of engineering at the time (I was what, 8 yrs. old?)

      I don't know the exact kind of stuff needed, but it should be readily possible to differentiate the recorded places on the wire, then move on to the scheme for encoding. It would have to be fairly simple given one dimension to work with. But I can see that to do so usefully would be decidedly un-trivial.

      I'm glad y'all at least gave it a try.

    25. Re:My data will be readable by metaforest · · Score: 1

      When I first researched the project it became clear that using steel wire like that creates some rather heinous mechanical constraints. The electronics are bonehead simple... The mechanical stuff needed to safely transport the wire and bale it properly onto a reel is fiendishly difficult to get right. Failing to get it just right makes any further attempt to read the wire impossible due to tangling. Anyone who has had to de-tangle a fishing reel knows what I mean. Now to add more difficulty, the line is brittle and a little thinner than human hair.

    26. Re:My data will be readable by petermgreen · · Score: 1

      It wasn't an electronic copy of the Domesday Book, it was a project collecting various stuff including photos and videos from schools that was supposedly in the spirit of the Domesday Book and putting it into a newfangled computer-based system.

      The problem with that project was it was ahead of its time and as such needed some pretty esoteric hardware*. Normal computing hardware from that era is still easy enough to find but the esoteric stuff needed for the Domesday system is not.

      * Specifically it used a BBC master (common) with a 6502 second processor card (fairly rare), a SCSI card (very rare) and a specific model of laserdisk player (very rare)

      --
      note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
  3. emulation / virtualization by smash · · Score: 2

    Support emulator/VM developers! Encapsulate your entire machine in a VM and you can run the entire software stack if necessary. Anything you need convenient access to, export to CSV, XML or some other standard format.

    --
    I run: Windows, OS X, Linux, FreeBSD. Just because you have a hammer, doesn't mean everything is a nail.
    1. Re:emulation / virtualization by lister+king+of+smeg · · Score: 1

      and when Unicode and ASCII are replaced?

      --
      ---Saying gnome 3 is better than windows 8 not so much a compliment as it is damning with light praise.
    2. Re:emulation / virtualization by cheater512 · · Score: 1

      How do you encapsulate the VM so it will still work 20 years in the future?

    3. Re:emulation / virtualization by Anonymous Coward · · Score: 5, Funny

      You're very clever, young man, very clever - but it's VMs all the way down!

    4. Re:emulation / virtualization by phantomfive · · Score: 1

      There's a pretty good chance LaTeX will still support them. There's a reason the TeX distribution is like 2 Gigs.....

      --
      "First they came for the slanderers and i said nothing."
    5. Re:emulation / virtualization by fuzzyfuzzyfungus · · Score: 1

      Unicode is a bit sprawling; but ASCII is only 128 characters (unless dealing with the wonderful world of nonstandardized non-Latin extensions or ad-hoc 8-bit extensions-of-convenience is your problem, in which case I'd advise shirking your duties and drinking heavily), making preserving the whole thing even by chiselling it into stone monuments or other archaic methods potentially viable.

    6. Re:emulation / virtualization by Mitchell314 · · Score: 2

      Honestly, reverse engineering ASCII plain text files would be trivial. Not to the average person, but to somebody with a bit of background:
      A) We have software that can use something called frequency analysis to decipher something encoded that has a 1-1 correspondence to something we know (i.e. the English alphabet).

      B) Ignoring software, frequency analysis is something that could be (and before the days of computers, was) done by hand. Hell, some things could be picked out by eye. For one, all files would have a particular byte character that appears near the end of every (well formed) text file, as well as often appearing periodically through the average file. A key indicator of being a newline/carriage return. Also in the bulk of most documents the new line is followed by a particular other character that also appears in a periodic manner. Being the period. And then another character appearing often every so often (on average around 5-6 characters), a good candidate for the space character. I and A also being somewhat easy to pick out (the whole upper/lower case making it a bit harder, but still doable). With a bit more dedication, you can start guessing common words, such as a common letter followed by a less common letter followed by a very common letter ('the' sounds like a good candidate). And then to figure the rest out, compare the average frequencies of characters across many documents to the average frequencies of letters and punctuation in documents we already know. A decent undergrad senior in computer science could write a program to do this. Hell, I took a sophomore level math class that went over this.
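
      A minimal sketch of that first step, in Python. The sample text and the XOR "encoding" are made up for illustration (stand-ins for an unknown 1-1 byte mapping); the only claim is that ranking byte values by frequency is enough to pick out the likely space character:

```python
from collections import Counter

# Made-up English sample; any reasonably long plain text would do.
SAMPLE = (
    "the quick brown fox jumps over the lazy dog. "
    "the rain in spain stays mainly in the plain. "
    "it was the best of times, it was the worst of times.\n"
)

def scramble(text: str, key: int = 0x5A) -> bytes:
    """Hypothetical unknown encoding: a fixed 1-1 mapping of byte values."""
    return bytes(b ^ key for b in text.encode("ascii"))

def rank_bytes(data: bytes) -> list[int]:
    """Byte values ordered from most to least frequent."""
    return [value for value, _count in Counter(data).most_common()]

encoded = scramble(SAMPLE)
ranking = rank_bytes(encoded)

# In ordinary English prose the space is usually the most frequent character,
# so the top-ranked byte is the first candidate for it.
likely_space = ranking[0]
print(f"most frequent encoded byte: 0x{likely_space:02X}")
```

      From there you would compare the rest of the ranking against known letter frequencies, as described above, and refine the guesses by looking for common short words.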

      --
      I read TFA and all I got was this lousy cookie
    7. Re:emulation / virtualization by mrsurb · · Score: 1

      ASCII is even easier than that - because 0-9, a-z and A-Z are represented by sequential binary numbers.
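
      A quick way to see this for yourself, sketched in Python (the helper name is just for illustration):

```python
import string

def is_consecutive(chars: str) -> bool:
    """True if each character's code is exactly one more than the previous one."""
    codes = [ord(c) for c in chars]
    return all(b - a == 1 for a, b in zip(codes, codes[1:]))

for name, chars in [
    ("digits", string.digits),
    ("lowercase", string.ascii_lowercase),
    ("uppercase", string.ascii_uppercase),
]:
    # Print the first and last code of each run and whether the run has no gaps.
    print(name, ord(chars[0]), ord(chars[-1]), is_consecutive(chars))
```

      Once you have guessed one letter in a run, the rest of that run follows by counting.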

    8. Re:emulation / virtualization by geniice · · Score: 2

      There are a few industrial setups where that is pretty much what has happened.

    9. Re:emulation / virtualization by thediv17 · · Score: 1

      Indeed, I have run PalmOS and Windows Mobile VM's on a Windows 2000 VM running inside VirtualBox and it worked fine.

    10. Re:emulation / virtualization by smash · · Score: 1

      Virtualbox has an open source edition. If you think x86 VMs are going anywhere you are mistaken.

      --
      I run: Windows, OS X, Linux, FreeBSD. Just because you have a hammer, doesn't mean everything is a nail.
    11. Re:emulation / virtualization by smash · · Score: 1

      Also.... I already have VMs that run software from 20 years ago (well, 1995 - close enough).

      --
      I run: Windows, OS X, Linux, FreeBSD. Just because you have a hammer, doesn't mean everything is a nail.
    12. Re:emulation / virtualization by smash · · Score: 3, Interesting

      err... plus DosBox is running x86 software I have from 198x...which is 30+ years now.

      --
      I run: Windows, OS X, Linux, FreeBSD. Just because you have a hammer, doesn't mean everything is a nail.
    13. Re:emulation / virtualization by StripedCow · · Score: 1

      Encapsulate your entire machine in a VM and you can run the entire software stack if necessary.

      Yes, but what about my Google doc stuff?
      Can you run Google in a VM?

      --
      If Pandora's box is destined to be opened, *I* want to be the one to open it.
    14. Re:emulation / virtualization by devman · · Score: 1

      You can export all that data in ODF formats.

    15. Re:emulation / virtualization by kfall · · Score: 1

      Also related to the "there are people that do history" below, 'VM curator' is the new librarian... e.g., https://olivearchive.org/

    16. Re:emulation / virtualization by mcswell · · Score: 1

      And if it's not English? You know, one of those 6999 other languages. Or maybe a programming language.

    17. Re:emulation / virtualization by smash · · Score: 1

      This is why i don't trust important stuff to somebody else's cloud.

      --
      I run: Windows, OS X, Linux, FreeBSD. Just because you have a hammer, doesn't mean everything is a nail.
    18. Re:emulation / virtualization by Mitchell314 · · Score: 1

      If what is not English? In practice, in these studies you try to do across many varied documents so outliers don't throw you off. If you are testing a bunch of files that were found in random storage devices in the US, it's safe to assume that English is the majority language. For a different country, a different language. As far as I know, frequency analysis works well with many alphabetical languages.

      --
      I read TFA and all I got was this lousy cookie
    19. Re:emulation / virtualization by Macgrrl · · Score: 1

      *snort*

      Best use of this line I've seen in a long time.

      --
      Sara
      Designer, Gamer, Macgrrl in an XP World
  4. We should have listened by Anonymous Coward · · Score: 5, Insightful

    We're in a difficult spot right now because for years we ignored the warnings about 'proprietary file formats'.

    I'm not blaming Microsoft either. We let Microsoft do this to us of our own free ignorance.

    1. Re:We should have listened by Lehk228 · · Score: 1

      what's this "we" shit all my files are odt, ods, html, tex or txt files. they will be just as accessible in 100 years as they are now.

      --
      Snowden and Manning are heroes.
    2. Re:We should have listened by Tr3vin · · Score: 1

      Without developers maintaining editors/viewers, open formats are only slightly more usable than proprietary ones. 100 years is a really long time from now as far as technology goes. I wouldn't be so quick to say that open formats will still be easily accessible.

    3. Re:We should have listened by Wolfling1 · · Score: 1

      And we're not doing it now with Apple products?

    4. Re:We should have listened by plover · · Score: 2

      Actually, languages have been consolidating and standardizing rapidly with the advent of the printing press, effective and affordable transportation, broadcast media like TV and radio, and the Internet. Diversity of language is rapidly disappearing.

      The way things are going now, there will be only a few dozen languages left at the end of this century, and possibly only a handful after the hundred years that follow.

      Although it's entirely possible that technology will preserve native languages, too. If machine translation becomes as easy as slipping a Babel fish in your ear, people won't feel the need to drop their mother tongue for English or Mandarin.

      No matter what, we'll all still be yelling hateful things at each other, but at least we'll understand the insults the other guy is hurling.

      --
      John
    5. Re:We should have listened by geniice · · Score: 1

      Wrong tense. There are enough converters/emulators around at this point that we can read anything halfway mainstream as long as we can read the hardware it's on.

    6. Re: We should have listened by Guspaz · · Score: 1

      Even in the same language, cursing doesn't always translate. In Quebec French, all the curse words are church terms. Somebody from France probably wouldn't even recognize it as cursing, they'd just wonder why someone was angrily reciting a church-related vocabulary list.

    7. Re:We should have listened by lgw · · Score: 1

      We're in a difficult spot right now because for years we ignored the warnings about 'proprietary file formats'.

      I'm not blaming Microsoft either. We let Microsoft do this to us of our own free ignorance.

      Early file formats were all proprietary, and Microsoft was far from the worst. In the early mainframe days, you didn't really even have the concept of a "text file" (what an old mainframe called a file is what we'd call a partition). One common approach for individual text files was to keep things in the printer queue. No joke - there was no filesystem way to write a small text file to disk, but you could print a file easy enough, with metadata that kept it around in the queue, and there were common tools to let you read them on your terminal directly from the print queue.

      For anything advanced enough that each document is a discrete file on a filesystem, the file converters will outlive the original media. You'll be able to find, say, WordPerfect to Word converters long after the last 5.25" floppy becomes unreadable.

      --
      Socialism: a lie told by totalitarians and believed by fools.
    8. Re:We should have listened by Lehk228 · · Score: 1

      zip is already 24 years old, ASCII is 50 years old. .od* files are zipped ascii XML, it will be parsable as long as anyone is interested in doing so.
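
      A rough sketch of what "parsable" means here, using only the Python standard library. The toy package below is a made-up stand-in, not a valid OpenDocument file: a real .odt has more members and namespaced XML, though the document body does live in a member called content.xml:

```python
import io
import zipfile
import xml.etree.ElementTree as ET

def make_toy_package() -> bytes:
    """Build a tiny zip in memory; a hypothetical stand-in for a real .odt."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("content.xml", "<doc><p>Hello from the past</p></doc>")
    return buf.getvalue()

def extract_text(package: bytes, member: str = "content.xml") -> str:
    """Unzip one member, parse it as XML, and join all of its text nodes."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        root = ET.fromstring(zf.read(member))
    return " ".join(t.strip() for t in root.itertext() if t.strip())

print(extract_text(make_toy_package()))
```

      This throws away all formatting, of course, but it shows that getting the words back needs nothing beyond an unzipper and an XML parser.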

      --
      Snowden and Manning are heroes.
  5. Re:So? by MrBandersnatch · · Score: 5, Insightful

    I think you will find that there's a little known branch of academia called "history" which sometimes takes a curious interest in even the most trivial of past information.....

  6. Yes, backwards compatibility, blah blah blah... by Narcocide · · Score: 5, Insightful

    Yes, you're right I have this ASCII text file created in 1997 and I can't find anything to read it...

    OH WAIT ACTUALLY FUCKING *EVERYTHING* STILL READS IT.

    Stop gargling Microsoft's balls so much and wipe off your chin. Proprietary data formats are THE PROBLEM. Stop trying to redirect public discourse with this thinly veiled bullshit.

    1. Re:Yes, backwards compatibility, blah blah blah... by Nerdfest · · Score: 4, Informative

      Odds are that you don't need to convince Vint Cerf or Google in general about the advantages of open formats.

    2. Re:Yes, backwards compatibility, blah blah blah... by cheater512 · · Score: 1

      But your EBCDIC documents are absolute rubbish now and the tools to convert them aren't commonplace any more.

    3. Re:Yes, backwards compatibility, blah blah blah... by PPH · · Score: 2

      Just Googled "ebcdic to ascii converter"

      About 123,000 results.

      --
      Have gnu, will travel.
    4. Re:Yes, backwards compatibility, blah blah blah... by Anonymous Coward · · Score: 1
    5. Re:Yes, backwards compatibility, blah blah blah... by ImperialSardaukar · · Score: 1

      122,999 of those are empty pages full of malware, or porn (often, both).

    6. Re:Yes, backwards compatibility, blah blah blah... by Narcocide · · Score: 1

      Yes, you're right and maybe this is the part I am having trouble coming to grips with. He seems like the last guy who should be spouting this line of rubbish. I feel like I'm in a bad B-rate horror movie and the body snatchers just got to the President...

    7. Re:Yes, backwards compatibility, blah blah blah... by fuzzyfuzzyfungus · · Score: 2

      But your EBCDIC documents are absolute rubbish now and the tools to convert them aren't commonplace any more.

      How deep are your pockets?

      *IBM Consulting*

    8. Re:Yes, backwards compatibility, blah blah blah... by Nerdfest · · Score: 1

      Yes, the Talk XMPP shutdown and Google Reader are a little disturbing. We're as far as we are with the ubiquity of the internet because of open formats enabling intercommunication and competition between products and services by different providers. That seems to be going away again in favour of platform lock-in with things like iMessage, FaceTime, etc. Google's Hangouts are at least cross platform, but that's really only a mild improvement. You still need to use Google's implementation. I'm just happy I can still use the stuff under Linux for the most part. I'm a little worried about the future, as short sighted greed seems to have taken over.

    9. Re:Yes, backwards compatibility, blah blah blah... by Anonymous Coward · · Score: 5, Insightful

      But your EBCDIC documents are absolute rubbish now and the tools to convert them aren't commonplace any more.

      $ printf "\xC5\xC2\xC3\xC4\xC9\xC3\x25" | iconv -f ebcdic-us -t ascii
      EBCDIC
      $ dpkg -S `which iconv`
      libc-bin: /usr/bin/iconv
      $ apt-cache show libc-bin | grep -e Essential -e Priority
      Essential: yes
      Priority: required

      So we got a program that can convert from EBCDIC-US to ASCII (or UTF-8 or whatever you want) and that program is in an Essential/Required package on any Debian-based system and for some reason you say that "aren't commonplace"?

      Are you on crack?
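
      For anyone not on a Debian box, roughly the same conversion can be sketched with nothing but Python's standard codecs. This assumes the cp037 codec (IBM EBCDIC US/Canada) is an acceptable stand-in for ebcdic-us, and it leaves off the trailing line-feed byte from the printf above:

```python
# The same six EBCDIC bytes as in the printf example, minus the final \x25.
EBCDIC_BYTES = b"\xC5\xC2\xC3\xC4\xC9\xC3"

def ebcdic_to_text(data: bytes, codec: str = "cp037") -> str:
    """Decode EBCDIC bytes to a Python string using a standard-library codec."""
    return data.decode(codec)

print(ebcdic_to_text(EBCDIC_BYTES))
```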

    10. Re:Yes, backwards compatibility, blah blah blah... by aaarrrgggh · · Score: 1

      ...everything except that Zip Drive I saved it on.

    11. Re:Yes, backwards compatibility, blah blah blah... by BrokenHalo · · Score: 1

      Just Googled "ebcdic to ascii converter"

      I don't even need to do that, because I still have one I wrote in Fortran 4 back in the early '70s when I was converting a suite of banking programs to migrate them off Burroughs mainframes. I'm quite sure it'll go through the GNU Compiler Collection without much modification.

    12. Re:Yes, backwards compatibility, blah blah blah... by gweihir · · Score: 1

      But your EBCDIC documents are absolute rubbish now and the tools to convert them aren't commonplace any more.

      Well, they are available in any halfway complete Perl installation as standard. So I would say you have no clue...

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    13. Re:Yes, backwards compatibility, blah blah blah... by chrismcb · · Score: 1

      Proprietary data formats aren't exactly the problem either. Sure your ASCII text file is readable, and probably always will be. But that doesn't mean 100 years from now you'll be able to understand the format of the ASCII data stored there, or that programs still exist to read it... And that is the issue.
      Of course it might be easier to hack an XML file, you still need something to understand the format. Whether it is proprietary, or a standard.

    14. Re:Yes, backwards compatibility, blah blah blah... by RedHackTea · · Score: 1
      With an Open Source file format, I think it'd be pretty easy in 100 years to write a program to -- at the very least -- convert it to a new Open Source format. If you're a programmer, you know that this is trivial. The only thing you have to worry about is losing the specification or lack of examples (which is extremely rare depending on the popularity of the format).
      1. Step 1: Pick a modern language; today that may be Ruby
      2. Step 2a: Read the Open Source specification; this should easily be preserved on Wikipedia or within other "libraries"
      3. Step 2b: Or, if Step 2a does not exist, search for an example program with code (I've never studied BASIC, but I can read it; programming languages have a lot of common factors)
      4. Step 3: Convert to new Open Source format (since making a Viewer/Editor is probably pointless); share on modern versioning system (today that would be Git) with F/OSS license and spread amongst the geeks

      Anyone that says that "Proprietary Formats" aren't the problem is spreading shill -- either from ignorance or from being force-fed by M$ for so long. Proprietary formats may not be the whole problem, but they are 100% part of the problem. If you're not a coder or a geek, then maybe you have an excuse from lack of understanding. Geeks love to preserve stuff and tinker with old technology; read about Atari recently on /.? How many user forums are there for all but the most oddball relics? As long as Wikipedia and our desire to store the past hold up, I don't see any problem. If I could, I'd bet my life on being able to still either read/convert the ODT format or code a program to convert the ODT format to a modern format in 100 years.

      The scary part... this isn't even just documents. This deals with audio files (why I use FLAC/OGG), images, videos, etc., etc....

      --
      The G
    15. Re:Yes, backwards compatibility, blah blah blah... by kermidge · · Score: 1

      A little disturbing? I'd say a whole lot disturbing - locking out open protocols for closed ones disturbs and dis-enfranchises participants; disrupts, rejects, and disables open communication for the poor bargain of yet another closed system of sets of walled-off users. Maybe it makes some kind of short-term business sense, otherwise seems fucking stupid to me.

      The open exchange of ideas is reduced to walled-off ghettos of gossip.

    16. Re:Yes, backwards compatibility, blah blah blah... by felipekk · · Score: 2

      Just Googled "oranges to apples converter"

      About 4,780,000 results

    17. Re:Yes, backwards compatibility, blah blah blah... by ideonexus · · Score: 1

      I think this is more than just Microsoft. It's crazy the lengths I have to go to sometimes if I want to resurrect a 10-year-old game on my modern PC. Switching to 64-bit Windows also killed a number of old programs I used to run in x86--even though they should run in x86 mode, they don't. I agree with you that the vast majority of issues are with proprietary software, but discontinued open-source projects regularly suffer the same fate.

      Kevin Kelly had a good article on this at the Longnow blog, where he makes the argument that the only way to preserve digital data is to perpetually migrate it to new systems and formats. It seems extreme, but I don't know if I see an alternative; otherwise, if not for the work of volunteers we will lose much of our digital history.

      --
      i ~ Celebrating Science, Cyberspace, Speculation
    18. Re:Yes, backwards compatibility, blah blah blah... by karmaflux · · Score: 1

      XMPP would like to have a word with you.

      --

      REM Old programmers don't die. They just GOSUB without RETURN.

    19. Re:Yes, backwards compatibility, blah blah blah... by david_thornley · · Score: 1

      Yeah, but the .txt files in my old Mac 400K floppies aren't easy to read any more, let alone the 5.25" Radio Shack floppies.

      --
      "When you have eliminated the unacceptable, whatever is left, however improbable, must be the truthiness" - Holmes
    20. Re:Yes, backwards compatibility, blah blah blah... by uninformedLuddite · · Score: 1

      I'd like to party with you

      --
      The new right fascists are bilingual. They speak English and Bullshit.
    21. Re:Yes, backwards compatibility, blah blah blah... by BrokenHalo · · Score: 1

      :-)

      Contracting quite often did involve boring jobs, especially at that site. However, that little task at least got me out of COBOL for a day or so (plus time for testing). And they did pay me a lot of money for my time.

    22. Re:Yes, backwards compatibility, blah blah blah... by uninformedLuddite · · Score: 1

      I did a job a few years back in AU on a PDP-11 that is still being used by a very large concern. They still used RT-11, TSX, and were doing all of their in-house WP on LEX-11. I kid you not. It was like stepping through a time warp with VT-52's and 100's running. They never upgraded their stuff as the custom written software was still doing the job required of it after all these years.

      --
      The new right fascists are bilingual. They speak English and Bullshit.
  7. We have an app for that by uberbrainchild · · Score: 1

    If there is a demand to open up and view a certain file type there will always be someone to create an app or website which will either open up the file or convert it to a more compatible format. There are already services out there that convert word to pdf for example oh and I just found an iPhone app for converting files, yay!

    --
    Anveto
    1. Re:We have an app for that by Prof.Phreak · · Score: 1

      By the time YOU care to convert a file and can't... there's no app, and NOBODY but you gives a damn about that file you got.

      --

      "If anything can go wrong, it will." - Murphy

  8. Re:So? by Mitchell314 · · Score: 1

    Man, fuck the future (that's right you historians-not-yet-born). They have all the flying cars and meal-in-a-pill's and immortality clinics and shit. The hell have they done for us to deserve our sympathy? If that means we can make them have to work that much harder to see how life was now, I say do it.

    Now back to my zombie virus work. Anybody got a decent time capsule for me to use?

    --
    I read TFA and all I got was this lousy cookie
  9. DRM and the digital black hole by Neo-Rio-101 · · Score: 4, Interesting

    A perfect example of this is basically the issue of old video games. (I may as well bring this up because it's going to come up)

    Recently, the Internet Archive stored a whole pile of TOSEC collections of games from various old systems (thanks to their DMCA exemption of being an archival repository so that they can legally do this). Data and information that would have otherwise been completely lost into a digital black hole, if it weren't for the fans of the system, and the dedicated teams of people collecting and amassing this software as a hobby.... in breach of copyright.

    The problem with DRM is that without dedicated crackers and pirates, unless the original rights holders are around long enough to resell old titles for that long (which most aren't), old games will simply disappear into a digital copyright black hole and never be seen again. This happens once the computer/console system is old, not sold anymore, and forgotten about, and the media degrades and isn't backed up in some form (in breach of EULA). If people aren't able to collect the software and hang on to it, preserving/duplicating the media while still in copyright, it's going to vanish. Culturally important games of significance will be lost forever, and that, if anything, is as much a crime as it is to pirate software in the first place.
    It's only due to the efforts of an army of swappers/crackers, etc, that most of the old games on old systems were even preserved.

    The steam model on PC is quite good though as it makes a few compromises where you can actually make backups and go offline if you want.
    For old computers and consoles however, this doesn't apply,.... and with some more restrictive attempts to squash the used game market, and force internet-always-connected authentication on upcoming consoles to even play the game... one has to wonder if the game companies deliberately want to squish all traces of their old work, let it disappear into the ether, and to resell you this year's football game which is just like last year's. I fear that this is where we are headed (if we aren't there already)

    --
    READY.
    PRINT ""+-0
    1. Re:DRM and the digital black hole by jeffasselin · · Score: 4, Interesting

      What about online-only games? Will historians in 100 years be able to play WoW and see what the game was like?

      --
      If he explores all forms and substances Straight homeward to their symbol-essences; He shall not die.
    2. Re:DRM and the digital black hole by Mitchell314 · · Score: 2

      Luckily for them, no.

      --
      I read TFA and all I got was this lousy cookie
    3. Re:DRM and the digital black hole by timeOday · · Score: 2

      Nor will they be able to join in World War II to see what that was like. However there is more recorded footage of WoW than WWII for future historians to study.

    4. Re:DRM and the digital black hole by JustOK · · Score: 1

      They can just watch the movie

      --
      rewriting history since 2109
    5. Re:DRM and the digital black hole by Sockatume · · Score: 1

      That'd depend on whether Blizzard turns over server code and whatever authentication they use (or a version of the game without such authentication) to archivists.

      --
      No kidding!!! What do you say at this point?
    6. Re:DRM and the digital black hole by Chris+Mattern · · Score: 1

      Will historians in 100 years be able to play WoW and see what the game was like?

      Historians can't play WoW *now* and see what the game used to be like.

    7. Re:DRM and the digital black hole by gmezero · · Score: 1

      Actually, it's quite possible. There are frequently dedicated fans for MMOs willing to reverse engineer the servers and set up hosts to keep the clients usable. For instance, consider Phantasy Star Online. There is a free private server distribution called Blue Burst that can be configured (along with some LAN trickery) to allow Dreamcast, Xbox, and early PC versions of the game to still authenticate against a server so they can simply boot!

      While PSO fans can get pretty nutty, I'm going to go out on a limb and say WoW fans are even more fanatical and one of them will come up with a solution.

  10. Print Everything! by dohzer · · Score: 1

    Print Everything!
    Problem solved.

  11. "Files that Last" by ddyer-bennet · · Score: 1

    Saw info on a book on this topic today, in fact: http://filesthatlast.com/about/ . Looks interesting so far.

  12. Don't forget DRM by onyxruby · · Score: 4, Insightful

    We're living in what could well be a future dark age for archeologists / historians. Hardly anything is put into a nice hard format (stone is incredibly rare and metal gets stolen) for someone to find. What's left suffers from incompatible file formats, acid-based paper that decomposes, bit rot, cryptography, incompatible technology for data storage and, worst of all, DRM. With DRM you have active measures that try to prevent something from being usable.

    In the old days people stopped use with armed guards, obfuscation and primitive crypto. Today we have servers that are required for operational functionality for many products. With the advent of the cloud you have reasons for storing things where you have a dependency on a third party. How many services that are cloud / server based have come about and gone tits up?

    Even having a large well known brand name doesn't protect you from having a server shut down. Just think of Microsoft's play4sure service that lasted less than a decade. Having a license and a physical disk isn't that helpful when the DRM requires an authentication server that doesn't exist. With the movement to put more and more DRM into the cloud or with SSL certificates (again dependent upon servers and naturally time bombed) this is going to be a problem that will only grow worse.

    Learning to break DRM is far more critical than file formats which require nothing more than a conversion tool.

    1. Re:Don't forget DRM by phantomfive · · Score: 1

      Learning to break DRM is far more critical than file formats which require nothing more than a conversion tool.

      That is utterly a waste of time. It makes me sick to think of how much good effort is wasted jailbreaking the iPhone, when Apple could merely write a few lines of code and none of that would have been necessary. The entire jailbreak community around Apple is compensating for a few lines of code.

      I say that with complete respect for the jailbreakers, but it could be so much better.......

      --
      "First they came for the slanderers and i said nothing."
    2. Re:Don't forget DRM by drinkypoo · · Score: 1

      It makes me sick to think of how much good effort is wasted jailbreaking the iPhone

      If that makes you sick, how do you sleep at night with all the important things going on in the world?

      I am a bit confused at why so many people are willing to spend so much effort on a closed platform when there's a substantially less-closed platform next door that does all the same stuff. I don't think the iDevices are particularly bad or Android all that much better, I just don't get why anyone would feed Apple when they are so much more abusive to the user than Google. And don't give me all that guff about Google services, alternatives exist for all of them and you don't even need a Google account. You'll lose very little (for instance, you're definitely not going to get any kind of assisted location service which will integrate with the device, but if you don't want to be tracked by Google you'll have turned that off and rely on GPS anyway) and you'd have to lose all the same stuff to not be tracked by Apple, if not being tracked is your thing.

      For the record, I use google login and services, but I don't use assisted location. I don't particularly need to leak information about my APs to Google...

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    3. Re:Don't forget DRM by phantomfive · · Score: 1

      If that makes you sick, how do you sleep at night with all the important things going on in the world?

      Same as everyone else, I ignore the problems I can't deal with while I fall into sweet, deep sleep.

      I am a bit confused at why so many people are willing to spend so much effort on a closed platform when there's a substantially less-closed platform next door that does all the same stuff. I don't think the iDevices are particularly bad or Android all that much better, I just don't get why anyone would feed Apple when they are so much more abusive to the user than Google.

      Yeah, it's sad. Although some of them have managed to swing $400k jobs from it, so I guess it kind of worked out?

      --
      "First they came for the slanderers and i said nothing."
  13. *sigh* by MrBandersnatch · · Score: 2

    Digital archival is one of the HARD problems. Over the last 40 years we have already lost more cultural artifacts than were created for the entirety of human history. A great deal of that is useless garbage, of course, but the original moon landing tape? 1000s of government emails revealing exactly what was going on at pivotal times in history?

    The truth is, we need systems for hardcopy; digital is too transient; emulators are a useful stopgap measure but don't protect against the kinds of catastrophic failures that we will likely see over the longer time frame; and we need indexing because someone at some point will want to wade through our digital detritus.

    1. Re: *sigh* by AvitarX · · Score: 1

      Over the last 40 years we have already lost more cultural artifacts than were created for the entirety of human history.

      If this is true, it stands to reason we are creating cultural artifacts at such an increased rate that even if only a small percentage survive, future generations will have a more detailed picture of now than has existed in the past. I believe I read that half of all photographs ever taken were taken in the last few years. It doesn't take a high keep rate for things to be better preserved than from any other time in history.

      --
      Wow, sent an e-mail as suggested when clicking on "use classic" banner, and got a fast response that addressed my msg
    2. Re: *sigh* by Dogtanian · · Score: 1

      If this is true, it stands to reason we are creating cultural artifacts at such an increased rate that even if only a small percentage survive, future generations will have a more detailed picture of now than has existed in the past.

      You hit the nail *right on the head*. I've said exactly the same thing myself in response to the "OMG the present-day is going to be a digital dark age in centuries to come!!!!!!!!!!!!111111111" stories.

      Yeah, we're losing more because we're creating and storing ludicrous amounts of information compared to what we used to. Even if we lose a much higher percentage of that than we did with hardcopied information, and even if only a tiny percentage overall survives, we'll still have way more than we have compared to previous ages.

      Of course, if you're attached to a *specific* (e.g.) photograph, then yes, there are problems associated with digital storage that mean it might be lost, and you may still have to take steps to preserve it- but that's a different issue. There's so much out there- in general- that enough of it *will* survive to provide a representative portrait of our society.

      IMHO we're already at the stage where we're storing too much information (i.e. random crap on Facebook that will be around forever and may bite you on the arse in future rather than being able to healthily move on and leave the past behind like people were able to do in previous generations).

      --
      "Slashdot - News and Chat Sites Deviant". (Click "homepage" link above for details).
  14. This is news? Nope. Not new... by flogger · · Score: 2

    This has been true of all technology in the past and will continue into the future. Just look at film. How many preserved films from 1915 are still around? Just the ones that were recorded into a new format of film, then a newer format of film, then onto VHS, then onto LaserDisc, then DVD, then Blu-ray... (Metropolis, I am looking at you.)

    Within arm's reach, I have floppy disks that contain files created in the Ami Pro word processor... When I say floppy, I am talking about the 5 1/4 inch floppies.
    Technology hardware and software is not stagnant... It will always continue to develop and progress (ignore Windows 8). Data that is worth keeping will get converted. Data that isn't will get left behind. I would not be surprised if, in about 25 years, there is "classic" software as there is classic literature...

    Too much typing.. going back to drinking.....

    --
    ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
    "First things first -- but not necessarily in that order"
    -- The Doctor, "Doctor
    1. Re:This is news? Nope. Not new... by Anonymous Coward · · Score: 1

      This has been true of all technology in the past and will continue into the future. Just look at film. How many preserved films from 1915 are still around? Just the ones that were recorded into a new format of film, then a newer format of film, then onto VHS, then onto LaserDisc, then DVD, then Blu-ray... (Metropolis, I am looking at you.)
       

      A lot of those films were purposely destroyed because they weren't seen as useful anymore. Put yourself in the time (well, anytime before 1970 or so): there's no home media. If you go back before 1945-1950, there's no television either. If a film can't be shown theatrically, there's no way for it to be seen. And there's only so much room in the cinema. Theaters have one screen and there aren't all that many of them. Then there's all the pre-sound films. Who wants to see those again? (People now have little interest in black and white films from the 1950s. Even people in the 1940s had better things to do than watch silent films from 20 years ago.)

      So what do you do with all these old movies? Keep them around and wait for some practical form of home video? Hard to envision that such a thing would be developed--and in fact, it took 50 years from the start of widespread motion pictures for that to happen.

      The problem they had then is the same one we have now: no one cares about stuff until it gets really old. There's more interest in really old films (say from the 1920s) than moderately old films (films from the 1950s that have been lost, mostly B movies). So who wants to keep all this crap for 80+ years in the hope that one day, someone will want it?

    2. Re:This is news? Nope. Not new... by geniice · · Score: 1

      You don't need most of the things on that list. B&W Cellulose acetate film stored at low temps would still be around. In principle Nitrocellulose stored at low enough temps might have survived but you'd need to get postgrads or other expendable people to handle it.

    3. Re:This is news? Nope. Not new... by AliasMarlowe · · Score: 1

      You don't need most of the things on that list. B&W Cellulose acetate film stored at low temps would still be around. In principle Nitrocellulose stored at low enough temps might have survived but you'd need to get postgrads or other expendable people to handle it.

      Even some of the earliest movies have been digitized from ancient film. For example there are collections of shorts by Edison (1899-1902), as well as items like The Little Match Seller (1902), The Great Train Robbery (1903) or both halves of the Chicago-Michigan Football Game (1903). Obviously these are silent and monochrome, and in some cases the original was imperfect.

      --
      Those who can make you believe absurdities can make you commit atrocities. - Voltaire
    4. Re:This is news? Nope. Not new... by NJRoadfan · · Score: 1

      The biggest losses in cultural resources weren't from degradation or the inability to play back the media, but from deliberate wiping of programs off video tape for reuse. For example, there is no known surviving copy of the entire broadcast of the first Super Bowl.

    5. Re:This is news? Nope. Not new... by Dogtanian · · Score: 2

      You put your finger on it. I'd just add what I had planned on saying- that, in general, it's not always obvious what's going to be "useful" and "of interest" to future generations when it isn't practical to keep everything.

      In fact, a lot of things that would be of interest to us- i.e. everyday, mundane life- were never recorded at all, back when film and equipment were quite expensive and the effort and cost would have been saved for documenting "important" occasions. Even at a personal level, if I'd known that something like the Internet would become as important as it has, and that there'd be projects like Wikimedia Commons and the like, I might have photographed more of the things around me in my relatively mundane home town while growing up in the 1980s.

      --
      "Slashdot - News and Chat Sites Deviant". (Click "homepage" link above for details).
    6. Re:This is news? Nope. Not new... by geniice · · Score: 1

      While this happens from time to time there tends to have been an intermediate conversion to something other than nitrocellulose which tends to have exploded by now although a mix of chance and good storage conditions can prevent that.

  15. Tax Records by PPH · · Score: 2

    The IRS wants to audit me, going back several years. I kept the records as required but they are unreadable now.

    Thanks Microsoft!

    --
    Have gnu, will travel.
    1. Re:Tax Records by yuhong · · Score: 1

      If it is Word/Excel, try disabling the file blocks using the registry or in 2010 or later using the UI in the Trust Center.
      See http://support.microsoft.com/kb/922849

  16. One would think by no-body · · Score: 1

    That people in the far future would be getting smarter to accomplish this - probably a tossup - and apart from it, it's very questionable if a far future for humanity even exists, the way "humanity" is behaving these days/years/decades/centuries/millennia....

    Maybe there are smarter robots by then babysitting...

  17. Re:So? by fuzzyfuzzyfungus · · Score: 2

    I think you will find that there's a little known branch of academia called "history" which sometimes takes a curious interest in even the most trivial of past information.....

    Even if you don't care about the historians, I'm sure the lucky people who have the pleasure of handling property deeds at your local governance hive can tell you a story from within the last week or two about needing to pull some rather seriously dusty documents to allow a present-day transaction to go through without incident.

    Many data will, indeed, be of no interest at all, or the same historical interest that neolithic refuse dumps are; but data in the nontrivial-number-of-decades range are still live in more than a few contexts.

  18. Github Flavored Markdown by HalcyonBlue · · Score: 1

    I use Github Flavored Markdown. Thousands of years in the future, archaeologists will no doubt work furiously to decode my etchings upon a stone tablet, which will read: "# IF YOU CAN READ THIS YOU'RE A GEEK #" .

  19. Maybe. by MrEricSir · · Score: 4, Insightful

    XML doesn't magically solve everything in this regard. If there's no good documentation for the format, it's unlikely you'll be able to display everything exactly as intended. Likewise, if the format is hideously complex (see: Microsoft Office Open XML) or there are bugs in the de-facto implementation, it's going to be tricky to reverse engineer.

    I'd also point out that MS Office spits out compressed XML. I believe it's based on ZIP, which is very well documented, but that's yet another hurdle to cross. And then you have to deal with the binary format of the XML itself -- ASCII, UTF8, etc.

    --
    There's no -1 for "I don't get it."
    1. Re:Maybe. by viperidaenz · · Score: 1

      The file being in ZIP format is documented. The character encoding of the XML file is specified in the XML file itself, like all XML files should do.

      From what I've used so far the Open XML formats aren't hideously complex, although I've only been working with XLSX files.
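
      For anyone wondering how much of a hurdle the ZIP-plus-XML layering really is, here is a minimal sketch using only the Python standard library. It builds a toy package in memory rather than opening a real Office file, so the part name "content/document.xml" and the <doc>/<p> elements are made up for illustration and are not the real Open XML layout.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Hypothetical stand-in for an Office-style package: one XML part inside a ZIP.
# The part name and element names are invented for this sketch.
PART_NAME = "content/document.xml"
XML_PART = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<doc><p>Hello, archivist</p><p>caf\u00e9</p></doc>"
)


def build_package():
    """Write the XML part into an in-memory, deflate-compressed ZIP archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(PART_NAME, XML_PART.encode("utf-8"))
    return buf.getvalue()


def extract_paragraphs(package):
    """Unzip the part and parse it; the parser reads the encoding
    from the XML declaration because we hand it raw bytes."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        raw = zf.read(PART_NAME)
    root = ET.fromstring(raw)
    return [p.text for p in root.iter("p")]


paragraphs = extract_paragraphs(build_package())
print(paragraphs)
```

      Getting the text out is the easy part. Knowing what each element is supposed to mean is a separate problem that this does nothing to solve.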

    2. Re:Maybe. by Anonymous Coward · · Score: 1

      Much of the problem revolves around

      A: Hardware is always changing. Newer computers are no longer compatible with Windows 9x, for instance. Five years from now, Win XP won't be compatible with them either.

      So old programs written in BASIC and Fortran and for MS-DOS will no longer run properly, nor will old games, etc., that don't support the hardware. Heck, I tried to get Duke Nukem 3D to run on Windows XP using MS-DOS mode, DOSBox, FreeDOS, and other simulators and it just won't work with my newer sound card and I can't figure out how to make it work.

      B: Intellectual property. Since many of these programs are still under protection for 95+ years and software lifetime is only like 7 or fewer years (often much less) it's not like third parties can take an older operating system or a simulator and modify it to work with newer hardware so that it can run old programs. Not to mention it's hard to get hold of all these old programs to be able to do the testing required to run them. All the newer hardware is proprietary and so we can't simply do what we want with it either in terms of things like writing drivers for older software, etc... and (hardware and other) patents last 20 years, which is like a good three to maybe four generations in terms of technology turnover.

    3. Re:Maybe. by kermidge · · Score: 1

      For A. - as I understand it a good emulator can, for instance, contain a complete virtual 6502 sufficient for doing assembly coding. On the off chance, have you poked around in relevant forums to find out if others have the same problem or are able to get Duke running? I don't know from sound cards so only two ideas come to mind; one, again, hit the forums, and also see if you might have success with converting the sound output to a different format, one that the sound card (driver, really) expects to get.

      Anyway, I think you raise good points, ones I don't see mentioned very often if at all.

    4. Re:Maybe. by wonkey_monkey · · Score: 2

      ZIP format is documented.

      Right now it is. What about the ragtag bunch of misfit librarians who are all that's left after the zombie apocalypse?

      They burned all the books for warmth and to keep the zombies away.

      --
      systemd is Roko's Basilisk.
    5. Re:Maybe. by dkf · · Score: 1

      XML doesn't magically solve everything in this regard.

      There are no silver bullets at all in this area. You can make the container format self-describing, and the tree structure self-describing (which is pretty much what XML gives you), but the hard problem of capturing the actual semantic meanings of the nodes in the tree is just going to remain that. You could attach "semantic meaning descriptors" of course, but who's to say that they're going to be understandable by anyone? (Indeed, where I've seen such things they've typically been less understandable than the original non-semantic nodes and have depended on comprehending a large body of complex documents on the open internet at the same time, which is a total failure mode on many levels.)

      But going for XML (or ASN.1 or JSON or YAML or any number of other tree description schemes) is still better than trying to also pick apart the mess from a horrible custom binary dump. It's at least one less obstacle, as you at least can read what the original generator thought it was sensible to tag the tree as. (Other layers of encoding that are reasonably standardized and so don't add to the problem are using a defined character encoding such as UTF-8 or ASCII, and using a compressed packaging format like ZIP or gzip.)

      --
      "Little does he know, but there is no 'I' in 'Idiot'!"
    6. Re:Maybe. by cyborg_zx · · Score: 1

      For Duke Nukem 3D there are a plethora of source code modifications I suggest you check out. At least one of them will almost certainly do it for you. Running on the original exe may well be a challenge. Surprised SoundBlaster wouldn't work, to be fair, since that at least became the de-facto standard.

    7. Re:Maybe. by Xest · · Score: 1

      I think it's more the point that although you may not be able to get a perfect representation back from the dead, you will at least be able to extract worthwhile information if it's XML.

    8. Re:Maybe. by viperidaenz · · Score: 1

      If all the books have been burnt, it doesn't matter if it's an open standard or not.

  20. Another argument... by FuzzNugget · · Score: 1

    For open source. Save your files in open and/or openly defined, standardized formats and there will always be software that can deal with it.

    But I guess it's difficult for people to hear you explain that to them with their head up their ass.

    1. Re:Another argument... by wvmarle · · Score: 1

      Argument for open standards, yes. Open source, no. You don't need open source for open standards. And open source does not necessarily mean open standards.

  21. Google hire me, I solved this problem in 3 seconds by dicobalt · · Score: 1

    I would solve this by installing a Windows XP VM with a copy of Office XP. Now that I solved Google's hard problem they must now see I am qualified to work there. Google is on a FUD rampage, the likes of which I haven't seen since the great Microsoft FUD storms.

  22. Re:What matters? by codepigeon · · Score: 1

    Ok, so how do you retrieve your photos that you stored on that 8inch floppy disk... 10 years from now?

    That is a gross exaggeration but is an analogy to the point of the article. Without proper protections, all the information, notes, white papers, studies, etc. will be useless if there doesn't exist technology that can read it.

    In a worst case scenario how would humankind rebuild and not forget what was previously learned (e.g. dark ages we already experienced).

  23. Can anyone identify this character set? by Anonymous Coward · · Score: 1

    Still haven't found a description of the character set in which octal 222, 223, and 224 are right single quotation mark, left double quotation mark, and right double quotation mark.

    Anybody know this one?

    1. Re:Can anyone identify this character set? by Dr_Barnowl · · Score: 1

      Just to note, CP-1252 is the standard Western code page for Windows. I know this because I have to make special efforts on all my build scripts to cope with the fact that Windows has so far failed to join the "Just use UTF-8 like every other modern OS" club.
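
      A quick way to check the CP-1252 answer against the question, sketched with Python's built-in "cp1252" codec: decode the three octal byte values and ask the Unicode database what came out. The variable names are just for illustration.

```python
import unicodedata

# The three byte values from the question, written in octal (0x92, 0x93, 0x94).
raw = bytes([0o222, 0o223, 0o224])

# Decode them as Windows code page 1252 and look up the Unicode character names.
decoded = raw.decode("cp1252")
names = [unicodedata.name(ch) for ch in decoded]

# For contrast: in ISO 8859-1 (Latin-1) the same bytes land in the C1 control range.
latin1_categories = [unicodedata.category(ch) for ch in raw.decode("latin-1")]

for byte, name in zip(raw, names):
    print(oct(byte), name)
```

      That contrast with Latin-1 is the usual reason these "smart quotes" turn into garbage when a Windows document is read as if it were ISO 8859-1.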

    2. Re: Can anyone identify this character set? by andy.ruddock · · Score: 1

      It's not, see https://en.wikipedia.org/wiki/Utf-8.
      Characters encode into one to four bytes - the beauty of UTF-8 being that a 'standard' US-ASCII text document is also correctly UTF-8 encoded.

      --
      God: An invisible friend for grown-ups.
    3. Re: Can anyone identify this character set? by voidphoenix · · Score: 1
      It's called UTF-8 because it uses a baseline of 8 bits (one byte, not 2) to represent characters.

      UTF-8 encodes each of the 1,112,064 code points in the Unicode character set using one to four 8-bit bytes (termed "octets" in the Unicode Standard). Code points with lower numerical values (i.e. earlier code positions in the Unicode character set, which tend to occur more frequently) are encoded using fewer bytes. The first 128 characters of Unicode, which correspond one-to-one with ASCII, are encoded using a single octet with the same binary value as ASCII, making valid ASCII text valid UTF-8-encoded Unicode as well.
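
      The one-to-four-byte behaviour and the ASCII compatibility described above are easy to see for yourself. A small sketch in Python follows; the sample characters are arbitrary picks, one from each length class.

```python
# One sample character per UTF-8 length class (arbitrary choices for illustration):
# U+0041 'A', U+00E9 'e' with acute, U+20AC euro sign, U+1F600 an emoji.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

# Number of bytes each code point occupies once encoded as UTF-8.
lengths = [len(ch.encode("utf-8")) for ch in samples]

# Pure ASCII text produces identical bytes under both encodings.
ascii_text = "plain old text"
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")

for ch, n in zip(samples, lengths):
    print(f"U+{ord(ch):04X} takes {n} byte(s)")
print("ASCII bytes identical under UTF-8:", same_bytes)
```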

  24. On the PowerPoint 4.0/95 converters... by yuhong · · Score: 4, Insightful

    MS removed the PowerPoint 4.0/95 converters completely with Office 2007 for Windows and later, and disabled them by default in Office 2003 SP3. And the PowerPoint 4.0 converter (but not 95) was disabled by default instead of fixed with MS09-017.

    On the Mac, they removed them even earlier, when they ported Office to Carbon.

    IMO it would be a good idea for MS to package PP4X32 and PP7X32 from PowerPoint 2003 separately, along with a utility to call the converters of course.

  25. Uh, hello? by DogDude · · Score: 4, Funny

    For a supposedly smart guy, he seems a bit silly:

    He could've just downloaded MS's Powerpoint 97 viewer

    --
    I don't respond to AC's.
    1. Re:Uh, hello? by Narcocide · · Score: 1

      Yes, that's your first clue that schadenfreude is involved here. http://en.wikipedia.org/wiki/Straw_man

    2. Re:Uh, hello? by Yo_mama · · Score: 1

      And what does he do X years from now when that link is broken? What do historians do 20, 50, or 100 years from now?

      For a supposedly Dog Dude, you seem a bit short sighted.

      --
      Never understimate the power of human stupidity -Lazarus Long
  26. Re:He's mistaken by yuhong · · Score: 1

    I think the user was either using PowerPoint 4.0 for Mac or did not upgrade to Office 97 immediately.

  27. libreoffice will open it ! by mejmeeks · · Score: 1

    If not, file a bug and send in the document. The power of freedom ...

  28. Re:He's mistaken by 0123456 · · Score: 1

    Quite likely. I had some old Word for Mac documents of scientific papers I wrote in the 90s, and the only way I was able to recover them a few years ago was to install a Windows 3.1-era copy of Word for Windows.

  29. Wasn't this solved ages ago? by samantha · · Score: 1

    I remember over two decades ago there was talk of making data objects, that is, data that knew how to present an object interface to get at its information. The data would contain its own reader in some ubiquitous language. But wait, we never got a ubiquitous language. Perhaps JavaScript today? But if you want to solve this problem then this is how to solve it. Or perhaps you could just package a converter to convert format XYZ to BSON as being good enough or at least better than today's breakage.

    One thing that really burns me is having my information that I created / entered / caused to be locked up in some proprietary opaque format, especially if owned by one and only one app.

    1. Re:Wasn't this solved ages ago? by gweihir · · Score: 1

      There is a ubiquitous language: ANSI C + the concept of a raster display with x/y coordinates. Nobody cared enough, also because if you use sane formats (ASCII, PostScript, PDF/A), you can already display them everywhere.

      But JavaScript would be a complete fail. It is not even really compatible across browsers.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
  30. Re:He's mistaken by yuhong · · Score: 2

    Have you tried disabling the file blocks first? At least Word for Mac 4.x and 5.x can be read this way.

  31. I have legible pictures over 150 years old by the_rajah · · Score: 2

    Some are glass plate Daguerreotypes. Somehow, I am not too confident that my digital pictures will be legible 150 years from now, unless I make a good quality print on archival paper. Digital files are too easily corrupted and made totally useless. Media formats will change. 8" floppies anyone?

    --


    "Do the Right Thing. It will gratify some people and astound the rest." - Mark Twain
    1. Re:I have legible pictures over 150 years old by AK+Marc · · Score: 1
      I have gifs and jpegs from much older than the document in question, and have no trouble with any of them (and BMPs from before that). I was listening to MP3 in the 1990s. They work fine now.

      What digital picture standard (not raw) do you think you'll have any trouble reading, and roughly when do you think you'd have any trouble with it?

      Media formats will change. 8" floppies anyone?

      So you are worried that your MFM HD from the 1980s won't work, not worried about the .BMP on it not being readable, if you found some way to spin it up? I can't even tell what you are whining about, other than "technology bad."

    2. Re:I have legible pictures over 150 years old by jafac · · Score: 2

      yes - this is a real issue - and ARCHIVED data that is important DOES need to be "spun up" and refreshed to new media.

      If it's hard drives, yes. If it's optical media. . . well that depends. Because some optical media just plain degrades over time. Some is written in special proprietary formats (like Apple's early implementations of CD+R) that you're going to have a hard time reading with CURRENT equipment.

      If your data is archived to tape, and more than 10 years old, I'm afraid you're fucked.

      --

      These are my friends, See how they glisten. See this one shine, how he smiles in the light.
    3. Re:I have legible pictures over 150 years old by AK+Marc · · Score: 1

      Tape backups, as a home solution, are rare, and even in your case, you aren't fucked. But, to keep the Internet nutters from arguing endlessly, yes, once you've saved your bits, they have a shelf life of [insert absurd number here]. I've got an old 3.5" floppy drive running around in a box somewhere, if I had to read a floppy, and the last time I tried, it worked fine, drive and media, and that's at least 10 years old. When I threw my XT out 5 years ago, it was still reading 5.25" without an issue, and I never had a disc go bad that wasn't visibly damaged, even 1980s vintage CD-Rs that were given a 5 year shelf life by a previous generation of nutters.

  32. Re:libreoffice will open it ! by yuhong · · Score: 1

    Note they also sometimes drop support for old formats too:
    https://bugs.freedesktop.org/show_bug.cgi?id=59902

  33. No different than cars by HockeyPuck · · Score: 4, Interesting

    We're still able to restore cars from the 80s and earlier as the cars were fully mechanical or hydraulic. No computers.

    Fast forward to 20 years from now: nobody's going to be carrying the computer boards for a 2004 Toyota Prius or a 2013 Tesla.

    However, you'll still be able to restore your grandfather's '57 Chevy...

    1. Re:No different than cars by AK+Marc · · Score: 3, Informative

      You'll just have to take the Prius ROM on an emulator on your phone, and plug in your phone to drive your car. Easy.

    2. Re:No different than cars by uninformedLuddite · · Score: 1

      We're still able to restore cars from the 80s and earlier as the cars were fully mechanical or hydraulic. No computers.

      We won't be allowed to drive it though. Unless of course it is after the EMP and then we will all be rich cab drivers

      --
      The new right fascists are bilingual. They speak English and Bullshit.
  34. Code should accompany data by michaelmalak · · Score: 4, Interesting

    I presented a solution to this long-standing problem last year to the Denver HTML5 Meetup.

    Code should never be separated from data. This is possible with HTML5, JavaScript, and open source.

    In the presentation, I steal and repurpose Hofstadter's analogy of DNA to an LP vinyl record, which is an information bearer, but useless without its information retriever (the record player). Like the cell of an animal, which contains both DNA and the means to "play" it, I ask why not the same with software?

    My maxim is: data should always carry the code with it to play itself. It was inspired from the field I've spent 50% of my career in: non-destructive testing where, for example, X-Rays and ultrasounds are performed on safety-critical industrial parts with 50-year service lives. If one of those parts fails and kills someone, you're going to want to go back into the old data and find the earliest indication of the flaw or fault and reinspect every other part in the world like it that is still in service. And maybe you need to go back 50 years. Under such a context, not providing the code with the data could be considered an act of gross neglect.

    In my presentation, I use the 1990s-era trick of embedding XSL into an XML file, with the addition of the XSL now being able to use HTML5/JavaScript. Sadly, I've only gotten it to work with Firefox -- the other browsers consider it a security violation.

    1. Re:Code should accompany data by femtobyte · · Score: 1

      From a future data recovery standpoint, how is the "code" any more useful than data? You'd still need to be able to figure out how to execute the code itself --- the code is just an especially complex and capable file format (which likely makes it very difficult to figure out if you've lost the execution instructions). Some file formats are already complete programming languages --- like PostScript. Do you think you could make much sense of a PostScript representation of a document if you started without a PostScript interpreter available (or at least a comprehensive PostScript specification, and a heck of a lot of free time)? Why is JavaScript any more likely to be well-known in the year 2100 than PDF? Or any easier to reverse engineer? Your trick of embedding XSL only "worked" because you were lucky enough for XSL to stick around as a commonly used file format --- or did you have a magical future-seeing crystal ball in the '90's (and then, why didn't you warn anyone about 9/11)?

    2. Re:Code should accompany data by michaelmalak · · Score: 1

      It's an issue of installed base.

      The installed base of any given NDT system is typically less than Qty. 100, often much less. The installed base of HTML5 interpreters is on the order of a billion. The installed base of PowerPoint 97 at its peak was in the tens of millions. To be honest, I think Vint Cerf is complaining a bit much. Anyone (including him) could download the appropriate VMs, archival operating systems, and archival Microsoft Office systems to read PowerPoint 97 and even convert it to a modern format where the file could even be further edited and modified. In his search to provide an example, I don't think Vint Cerf came up with a good one.

      A life-critical system with an installed base of less than 100 where the data must be preserved for 50 years is a better example, and answers your question: why code can be more useful than data. Code from an installed base of 100 is useful if it relies upon software that had an installed base of a billion.

    3. Re:Code should accompany data by femtobyte · · Score: 1

      PDF readers have an install base of pretty much every current computer on the planet. In fact, Microsoft Office has an install base at least able to *read* the formats of pretty much every current computer on the planet. So does Adobe Flash. If your argument is security-through-install-base, then why aren't these approximately as "safe" as JavaScript? I'd consider an MS Office document or a Flash application to be a pretty iffy long-term archival format; however, according to an "install base" criterion, they seem like perfectly good choices. In fact, that's the same logic that encouraged so many people to use Office '97 back in '97: "everyone uses it; stop being silly and bothering me about 'open standards.'" I don't disagree that a wide install base is one ingredient in increasing the probability of format longevity; but I think you'll need more sophisticated criteria than that to not get seriously blindsided by widespread future changes --- "everyone switched to Python 5 a decade ago --- the last time anyone would have been running a JavaScript interpreter, they were still using *magnetic* storage media. Good luck getting one of those working!"

    4. Re:Code should accompany data by michaelmalak · · Score: 1

      The likely long-term viability of PDF does not discredit the long-term viability of JavaScript.

      You agree with Vint Cerf about PowerPoint 97 being an archival format. I disagree with him. As I wrote, I believe software archives will maintain VMs, copies of OS's, and copies of Microsoft Office due to the historical installed base of tens of millions.

      The question is whether you would be able to find a JavaScript interpreter on a search engine in 2100. I believe the answer is "yes," because it was an important (by installed base) piece of software. Admittedly, it won't help you in an apocalyptic scenario.

    5. Re: Code should accompany data by michaelmalak · · Score: 1

      So when distributing an old OpenOffice file I should also put OpenOffice?

      OpenOffice should have an option to save to an XML file that includes an embedded XSL/HTML5/JavaScript viewer program as I described in my presentation.

    6. Re:Code should accompany data by femtobyte · · Score: 1

      By that logic, though, there's nothing particularly special about your concept of "code accompanying the data." PDF is an "important (by installed base) piece of software" --- and whatever archived VMs that still have a working JavaScript interpreter (web browser) will *also* have a working PDF reader. While you're making a big deal of keeping the "code" (XML/JavaScript) with the data, this is actually entirely irrelevant to what the mechanism you propose relies on: constantly maintaining a working "chain" of VMs for VMs for VMs for whatever systems are (were) commonly in use. This approach might well work; after all, storage is cheap these days. However, it is prone to catastrophic failure: if the "chain" of generating new VMs is ever broken, you're left with an extremely complicated and opaque mess of bits that would require re-inventing entire (dead) operating systems to restore.

      An alternate approach to maintaining ever increasing opaque complexity (better not let anything important slip into the cracks) is to try to come up with clever ways to produce formats/archives that would be especially fast/easy to reverse-engineer and bootstrap from first principles if no "living" interpreter could be found. This is a hard problem, if not impossible to find a satisfactory answer, but worth thinking about. If you can popularize such a format, then you'll have both the protection you propose ("eternal" support in nested VMs), *and* a backup plan for recovering information if, over the decades, you misjudge what technological branches will be faithfully preserved for posterity.

      Finally, one potential monkey wrench in the works of your plans to always have a chain of older operating systems: operating systems are becoming increasingly dependent not only on self-contained binaries on a machine, but internet connections to gigantic networks of services. What happens when you try to boot up Windows 2038 in the year 2065 (through a few intermediate virtual machines), only to find that the OS needs to connect to several hundred servers/services that were scrapped twenty years ago? So, for your proposal to work, you also need to force operating system designers not to create any external dependencies --- exactly opposite to the current trend of "cloudifying" everything in sight. Increasing integration with "the cloud" would be fatal to your data-preservation method.

    7. Re:Code should accompany data by michaelmalak · · Score: 1

      I consider PDF to be powerful because it can contain JavaScript, and even embedded mouse-driven interactive animated 3D. I consider PDF to be a lateral alternative to embedding JavaScript in XML as I presented.

      I agree, self-describing formats, such as the Voyager pixel image and the Contact engineering diagrams, are interesting. They solve the extreme of the problem, in a survivalist way. It's the bomb shelter level of planning, whereas it would be reasonable for most people to instead just stock 7-30 days of provisions on a shelf. A shelf is convenient, as is a data file that executes itself against a widely-available interpreter.

      Yes, I see the chain of operating systems as the solution just for the closed source world, and I'm hoping those days are behind us or soon will be.

    8. Re: Code should accompany data by Yo_mama · · Score: 1

      Really? A viewer that works on N^X operating systems, including ones we haven't written yet?

      --
      Never underestimate the power of human stupidity -Lazarus Long
    9. Re: Code should accompany data by michaelmalak · · Score: 1

      A viewer that complies with W3C standards for HTML5/JavaScript.

    10. Re:Code should accompany data by yusing · · Score: 1

      I can retrieve the -essential- information on an LP vinyl record with a piece of paper and a pin.

      I vote for ASCII (or some representation with all-printable characters) for all data that's got to last. That way it can be printed on paper which can potentially last for thousands of years... even buried in a garbage pit. https://en.wikipedia.org/wiki/Oxyrhynchus_Papyri

      At least, until someone invents little disks you can spin over a self-powered stonelike table and they're read back to you.

      --

      "You must try to forget all you have learned. You must begin to dream." -- Sherwood Anderson

  35. see Windows 1250 and 1251 by Doug+Merritt · · Score: 2

    Windows 1250 and 1251 do, and possibly others. It sounds familiar, but my memory is fuzzy, so I just looked around.

    https://en.wikipedia.org/wiki/Windows-1250

    --
    Professional Wild-Eyed Visionary
  36. apt-cache search EBCDIC by Burz · · Score: 1

    Yields 4 results in Ubuntu. You can search reputable open source archives on the web, too.
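    For what it's worth, reading EBCDIC text does not need a consultant. A minimal sketch, assuming the data is plain text in IBM code page 037 (real mainframe records may use a different code page or packed numeric fields, which this does not handle); the sample bytes and helper names are made up for illustration:

```python
# Hypothetical sample: five bytes of EBCDIC (code page 037) text.
SAMPLE = bytes([0xC8, 0xC5, 0xD3, 0xD3, 0xD6])


def ebcdic_to_text(raw: bytes, codec: str = "cp037") -> str:
    """Decode EBCDIC bytes using one of Python's built-in codecs."""
    return raw.decode(codec)


def text_to_ebcdic(text: str, codec: str = "cp037") -> bytes:
    """Encode text back to EBCDIC, e.g. to check a round trip."""
    return text.encode(codec)


if __name__ == "__main__":
    print(ebcdic_to_text(SAMPLE))
```

    If the output looks like garbage, the code page guess is probably wrong; cp500 and cp1140 are other codecs that ship with Python.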

    How deep are your pockets?

    *IBM Consulting*

    Um, really???

  37. Re:So? by ArhcAngel · · Score: 1

    *spoilers*

    --
    "A person is smart. People are dumb, panicky dangerous animals and you know it." - K
  38. real problem is: FEATURE CREEP by bussdriver · · Score: 2

    I've been part of archival problem planning. We went with DVD. Now that I am no longer there, I suspect they are thinking DVD sucks and are moving "forward" when the DVD was more than good enough and those plastic discs will last a century. MPEG-2 files will have open source decoders. Now physical readers will still be a problem... the only solution is to wait as long as possible and then switch to the next long lasting format - but not necessarily the newest one at that time. (which is why moving to Blu-ray is a waste of money.)

    The biggest problem with other formats is the FORMAT; even with something like open office documents, the ODF format will have revisions and new features added and tweaks to the format. version 2, 3 etc. The features and changes that promote the creation of more and more formats are the biggest problem. Just like my above DVD video problem- if you go beyond your needs then you are complicating things with more and more formats.

    TEXT? sucks. we need WORD! Word 1.0? the app sucks... we need WORD 20! (and all versions in between to migrate the old docs...plus labor to deal with conversion issues...)

    Perhaps we need ARCHIVAL formats; like PDF, which has done well apart from the stupid additions Adobe has been making to it. Or just TEXT export... a less bloated output only format without the feature BS problems.

    Thankfully, email remains the same... sort of. although storage of the emails differs greatly; if you want to archive emails you need to pick a close-to-the-source method (and simple storage filesystem-- good luck reading that NTFS formatted disk image in 30 years.)

    1. Re:real problem is: FEATURE CREEP by Stuarticus · · Score: 1

      I'll gladly bet with you that all those DVDs won't last one hundred years. Unless the archival problem planning you were involved in was more than ten years ago, DVD was an insane option; even then it was sketchy at best.

      http://www.thexlab.com/faqs/opticalmedialongevity.html

      --
      If you think someone isn't free to have a different definition of "freedom" you may be a tyrant.
    2. Re:real problem is: FEATURE CREEP by bussdriver · · Score: 1

      DVD-RAM and DVD-R in cartridges, never exposed. Blu-ray wasn't even a name back when we did this. HD storage was a joke and way more expensive at the time. Tapes were expensive but were looked at as an option. Best of all, there was a long list of devices that could read the discs and would be around for a long time. (Sure you can get PATA to SATA adapters, but will you be able to read an old HD's filesystem? probably.)

      Well, I'll bet $1,000,000 they will last 100 years. You can collect the money from my grandchildren :-p

      It doesn't matter as long as it lasts a long enough time to avoid upgrading and migrating data multiple times. It'll last beyond Blu-ray and perhaps beyond its replacement. It may be migrating to HDs today as far as I know... knowing them, probably RAID 5 without a backup (they kept thinking RAID 5 had built-in backup way back then... which BTW was hardware only back then.)

  39. I do blame Microsoft by Darinbob · · Score: 4, Informative

    Seriously, why would Vinton Cerf not blame Microsoft? They have an extremely poor track record with backwards compatibility, and I don't think they even know what forwards compatibility is. If you design the data formats correctly then you can keep things usable for decades (or centuries). Guess what, twenty year old TeX documents still work, and yet Word X won't work with Word X-2. I've pulled runoff documents off of 70's versions of Unix that can still be printed. That says to me that one can deal with compatibility issues.

    This is all intentional on Microsoft's part too. They make money when customers buy new copies of software, so it is in their best financial interests to make sure that customers have significant pressure to upgrade. I remember the solution to an acknowledged bug for Word 97 was to make sure that everyone who was going to read your document had the appropriate Word 97 plug in in their older version of Word. I completely blame Microsoft here.

    This is not that hard a problem, IF the company pays attention to it and gives it even a small amount of priority.

    1. Re:I do blame Microsoft by KiwiSurfer · · Score: 1

      I think Microsoft is doing very well in terms of backward compatibility when you compare certain products of theirs to those offered by other companies. One example I'm sure many people can relate to is Windows. It is the only OS I'm aware of that can run many apps compiled in the '90's on the current version without requiring a lot of hacking. I still play games from the '90's on Windows 8, many of which work fine out of the box without any tweaking. Try that with Apple, Linux, et al. While Microsoft hasn't always done well with all their products (IE, Outlook, et al. come to mind), there are several products (such as Windows) where they are doing very well and they should be acknowledged for that.

    2. Re:I do blame Microsoft by mhotchin · · Score: 2

      To say that MS has a poor record of backwards compatibility is, well, ridiculous. It's only just about *the* most important thing for them, because the majority of their business is with businesses, and if their FooBar app doesn't run, then they don't upgrade.

      No other OS has near the level of compatibility that the MS sequence does.
      http://www.youtube.com/watch?v=vPnehDhGa14

      http://blogs.msdn.com/b/oldnewthing/archive/2006/11/06/999999.aspx
      http://blogs.msdn.com/b/oldnewthing/archive/2003/08/28/54719.aspx

    3. Re:I do blame Microsoft by serviscope_minor · · Score: 2

      No other OS has near the level of compatibility that the MS sequence does.

      Somebody's been drinking the kool-aid.

      There's a small, little known company called IBM selling a type of computer called a "mainframe" which might beg to disagree. You can buy a modern mainframe which will still run your unmodified programs which you wrote on an original System 360. In 1964.

      Microsoft have not even existed as long as that chain of backwards compatibility, and you try getting the original digger to run on Windows 8 (or RT! ha! instruction set changes are no barrier to IBM apparently) without Dosbox.

      --
      SJW n. One who posts facts.
    4. Re:I do blame Microsoft by Darinbob · · Score: 2

      The OS stays compatible in some ways (Windows is not at all unique here). However the Microsoft applications have serious problems in this regard. Maybe some of the competition is not so great either but it's no excuse when Word can't even be compatible with itself. They have changed the file format in Word in fundamental ways several times.

    5. Re:I do blame Microsoft by drinkypoo · · Score: 2

      No other OS has near the level of compatibility that the MS sequence does.

      It's called ANSI C on Unix. Pick up a copy of The UNIX Programming Environment and you can still use the examples verbatim on a Linux machine today. And you can even still use Motif apps, if we're talking about GUI programs. They still work just like they did when they were new, except a hell of a lot faster.

      Oh, you want backwards compatibility for closed-source software? Guess what? Plenty of software craps itself when it does anything interesting on the wrong version of windows. In reality, there's only one way to ensure compatibility, and that's to have your hands on the source — and for it to be worth a crap to begin with.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    6. Re:I do blame Microsoft by Mirar · · Score: 1

      I would blame Microsoft as well.

      It's their problem to invent a format that is readable in the future and in the future create readers that can read all old formats they invented.

      Do that design well and it's an easy problem.

      Don't do it well (they didn't) and it's a hard problem.

      It doesn't even have to be separated in time for it to become a problem - formulas in Excel used to be in the local language, so you couldn't even exchange sheets to another region (or a computer with a different language in the same region). I haven't heard about it, but I kind of bet there were issues with transferring files between PC and Mac for the same Microsoft Office (in the same language) as well.

    7. Re:I do blame Microsoft by Darinbob · · Score: 1

      One thing I found interesting was that the MFC stuff from Microsoft (which I rarely used, a nasty piece of work that is) has some code to automatically save and restore your class states. However all it does is write out a version number and then essentially do a binary dump of the data. If the version number doesn't match when reading it in then it will refuse. There are so many things wrong with that approach I don't know where to start. But having Microsoft present that to their developer customers as an example of how things should be done is telling.
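      To illustrate the alternative to "version number plus raw dump, refuse on mismatch", here is a rough sketch of a reader that tolerates other versions. It is not how MFC works and not anyone's real format; the field names, the version numbers and the "units" default are all hypothetical. The point is only that named fields plus a version let a reader upgrade old records and skip fields it does not know.

```python
import json

CURRENT_VERSION = 2
# Fields this (hypothetical) reader understands.
KNOWN_FIELDS = {"name", "width", "units"}


def save(state: dict) -> str:
    """Write named fields plus a version, instead of a raw memory dump."""
    return json.dumps({"version": CURRENT_VERSION, "fields": state})


def load(text: str) -> dict:
    """Read any version: upgrade old records, ignore unknown newer fields."""
    doc = json.loads(text)
    fields = dict(doc["fields"])
    if doc["version"] < 2:
        # Assumption for this sketch: "units" was added in version 2,
        # and version 1 records were always in millimetres.
        fields.setdefault("units", "mm")
    # Records from a newer writer may carry fields we do not know; drop them
    # rather than refusing to open the file.
    return {key: value for key, value in fields.items() if key in KNOWN_FIELDS}


if __name__ == "__main__":
    old_record = json.dumps({"version": 1, "fields": {"name": "bracket", "width": 40}})
    print(load(old_record))
```

      A real format would also want to preserve unknown fields on re-save instead of dropping them, but that is beyond a sketch.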

      People don't get to see much code written by Microsoft, but what can be seen is very often very naive, poorly written, and violating Microsoft's own standards. Most of this code is samples, though, things to show devs how to use a new DLL for instance. So part of me wonders if the abysmal quality is only because they assign interns to write this stuff and that the smart developers do more important things, but at other times I wonder if this is a more pervasive style internally which would explain the MFC stuff.

  40. Re:He's mistaken by Bing+Tsher+E · · Score: 1

    You don't even have to install Word for Windows from that era. WinWord 2.0 will run as a stand-alone binary. Just the Winword.exe file by itself will run. And it's less than 1.44M in size so you can just have it on a floppy diskette. On any 16 or 32-bit Windows machine, of course. It even includes that era's VBA so you can use the winword.exe binary as a portable 'execution environment' sort of.

  41. Re:What matters? by ganjadude · · Score: 1

    He specifically stated that he re-backs up every year. I don't go that far, but I have data going back as far as the early 90s that started on large floppies, migrated to smaller floppies, then to CD-Rs, and now lives on external hard drives. It isn't too hard to keep formats alive. (Also note that on the hard drives I keep VMs with older OSes able to read formats that I have not found a way to convert, which isn't many.)

    --
    have you seen my sig? there are many others like it but none that are the same
  42. This problem isn't new to anyone by kriston · · Score: 1

    This problem isn't new to anyone. If it's new to you, then you need to get involved in the digital preservation movement.

    http://en.wikipedia.org/wiki/Digital_obsolescence

    --

    Kriston

    1. Re:This problem isn't new to anyone by gweihir · · Score: 1

      Indeed. I remember hearing a very good talk about preserving digital images 15 years ago. They were going for TIFF without any compression as writing software that recovers the image becomes very simple then. .pnm formats in ASCII would also do it.
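      To show how little software an ASCII .pnm format needs, here is a small sketch of a writer and reader for the plain-text greyscale variant ("P2" PGM). It assumes a well-formed file and handles only the basics (the header, "#" comments, whitespace-separated values); the function names are mine, not from any library.

```python
def write_pgm(pixels, maxval=255):
    """Serialize rows of grey values as a plain-text (P2) PGM image."""
    height = len(pixels)
    width = len(pixels[0])
    lines = ["P2", f"{width} {height}", str(maxval)]
    lines += [" ".join(str(value) for value in row) for row in pixels]
    return "\n".join(lines) + "\n"


def read_pgm(text):
    """Parse a plain-text (P2) PGM image back into rows of integers."""
    tokens = []
    for line in text.splitlines():
        # Everything after '#' on a line is a comment.
        tokens.extend(line.split("#", 1)[0].split())
    if not tokens or tokens[0] != "P2":
        raise ValueError("not a plain-text PGM file")
    width, height, _maxval = (int(t) for t in tokens[1:4])
    values = [int(t) for t in tokens[4:4 + width * height]]
    if len(values) != width * height:
        raise ValueError("truncated pixel data")
    return [values[r * width:(r + 1) * width] for r in range(height)]


if __name__ == "__main__":
    image = [[0, 128, 255], [255, 128, 0]]
    print(write_pgm(image))
```

      The whole format fits in a paragraph of documentation, which is the property that makes it attractive for archiving.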

      This problem is not new, and it is solved. It is solved as long as you look at whether a particular product ignores this problem or not _before_ you decide to standardize on it. If people were not so ignorant, Microsoft could have never pulled this stunt that made them countless billions.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
  43. "hard problem" by macraig · · Score: 4, Insightful

    Vint, that's bullshit and you know it. It's nothing more than preserving syntaxes, grammar, file formats. That's not hard, and it only requires someone to create a format conversion ONCE to solve the problem at each stage of the evolution.

    The real problem here is proprietary non-public formats and structures. When the structure of data has been a closely guarded secret and requires reverse engineering that may not even yield a perfect result, THAT is hard.

    1. Re:"hard problem" by gweihir · · Score: 1

      Some say that for MS file formats, not even MS has a spec that is usable. Would explain why documents get mangled when you go from one Word version to another. Pathetic.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
  44. Re:He's mistaken by yuhong · · Score: 1

    WordBasic, actually. What is fun BTW is to unblock Word 6.0/95 formats in 2010 and later and open a file with WordBasic like SCANPROT.DOT.

  45. Re:Code should NEVER accompany data! by lahvak · · Score: 4, Insightful

    No! Fail! You don't get it!

    1) Code is data
    2) Code is data that is especially hard to interpret
    3) One of the main reasons for all this mess is that in all those proprietary formats, data is intermixed with code, and the whole mess is very hard to parse.

    Data should be kept completely isolated, as far away from code as possible. That way, if you cannot interpret the code any more, you will still be able to analyze and parse the data. You know, it is not that hard to construct a record player.

    --
    AccountKiller
  46. He should be blaming Microsoft by gweihir · · Score: 2

    My first LaTeX publications from 20 years back and all my human-readable ASCII scientific data can still be read and used without any problem. Human-readable file formats in the UNIX tradition completely solve this problem.

    This problem is only hard if the people making the data formats are either stupid or do not want their formats to be easily accessible to other applications, as Microsoft does. Of course, others are creating just as fundamentally broken formats for either of the same reasons.

    --
    Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    1. Re:He should be blaming Microsoft by hcs_$reboot · · Score: 2

      Just hex print the MS 97 file and you have a human readable format:

      00007b0 5f00 675f 6f6d 5f6e 7473 7261 5f74 005f
      00007c0 696c 6362 732e 2e6f 0036 5f5f 7270 6e69
      00007d0 6674 635f 6b68 6500 6978 0074 6573 6c74
      00007e0 636f 6c61 0065 626d 7472 776f 0063 706f
      00007f0 6974 646e 7300 7274 636e 7970 7000 7475
      0000800 0073 6177 6e72 0078 5f5f 7473 6361 5f6b
      0000810 6863 5f6b 6166 6c69 6900 7773 7270 6e69
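      For anyone who wants to actually read it: a small sketch that turns such a dump back into bytes and pulls out the printable strings. It assumes the dump is od-style output, i.e. an offset column followed by 16-bit words in little-endian byte order; that is a guess about how it was produced, and the helper names are made up.

```python
# First two lines of the dump above: an offset, then 16-bit words.
DUMP = """\
00007b0 5f00 675f 6f6d 5f6e 7473 7261 5f74 005f
00007c0 696c 6362 732e 2e6f 0036 5f5f 7270 6e69
"""


def decode_words(dump: str) -> bytes:
    """Rebuild the byte stream, assuming little-endian 16-bit words."""
    out = bytearray()
    for line in dump.splitlines():
        fields = line.split()
        for word in fields[1:]:  # fields[0] is the offset column
            out += int(word, 16).to_bytes(2, "little")
    return bytes(out)


def printable_strings(data: bytes, min_len: int = 4):
    """Return NUL-separated runs of printable ASCII of at least min_len bytes."""
    found = []
    for chunk in data.split(b"\x00"):
        if len(chunk) >= min_len and all(32 <= byte < 127 for byte in chunk):
            found.append(chunk.decode("ascii"))
    return found


if __name__ == "__main__":
    print(printable_strings(decode_words(DUMP)))
```

      Under that assumption the first strings come out as `__gmon_start__` and `libc.so.6`, which look like C library symbol names rather than anything from an Office document.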

      --
      Slashdot, fix the reply notifications... You won't get away with it...
    2. Re:He should be blaming Microsoft by gweihir · · Score: 1

      Your standards are too low.

      Or maybe you have been exposed to too many corporate BS documents....

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
  47. the man is out of touch by stenvar · · Score: 2

    You can get emulators for just about every machine you can imagine: PDP-10, PDP-11, DOS, Atari, Amiga, C64, microcontroller, etc. You can get hardware emulators with FPGAs if you like. Almost any important format is documented or has been reverse engineered. Yes, you can easily read 1997 PowerPoint files, even if his weird choice of Office on Mac can't. And that's only with current technology. Give it a few decades and all that can happen behind the scenes and computers will just automatically perform even the most complicated data conversions behind the scenes. "Computer, scan the 1997 floppy and put the data on screen."

    1. Re:the man is out of touch by uninformedLuddite · · Score: 1

      Give it a few decades and all that can happen behind the scenes and computers will just automatically perform even the most complicated data conversions behind the scenes. "Computer, scan the 1997 floppy and put the data on screen."

      Really? Do I need to insert my credit card so that all the different patent and rights holders get their cut as I pass through their digital territory?

      --
      The new right fascists are bilingual. They speak English and Bullshit.
    2. Re:the man is out of touch by stenvar · · Score: 1

      That is entirely up to you. But if you want to read the data, you can.

  48. Re:He's mistaken by Ultracrepidarian · · Score: 1

    I was guessing I wouldn't find Kool-Aid Man for Atari 2600 but, sure enough, it's out there.

  49. Re:So? by wickedskaman · · Score: 2

    Who hurt you? :-(

    --
    Sand's overrated... it's just tiny little rocks.
  50. Not hard by Tough+Love · · Score: 1

    Backward compatibility is not a hard problem, Vint Cerf just isn't very good at it as evidenced by the IPv6 fiasco.

    --
    When all you have is a hammer, every problem starts to look like a thumb.
  51. Darth Cerf is nuts by rs79 · · Score: 1

    What's he doing keeping stuff in MS apps for? Then when they don't work 5 years later he's all like OMG THE NET WILL BREAK.

    Idiot. He knows better. Or should.

    --
    Need Mercedes parts ?
  52. Yes Yes by Greyfox · · Score: 1

    Blah blah digital dark age blah blah gay sex. Anything worth preserving is still written in books. Do we really need to preserve every byte of information on the internet for posterity? Most of us dildos just aren't that interesting. I'm pretty sure the future will survive just fine if every cat video on youtube doesn't last to be reviewed by future generations. If every movie for the past couple-three decades went up in smoke, anything of value lost? Really? Granted, I like to imagine some future-NPR blathering on about how that all-digital rendition of Snoop Dogg's "All my Bitches" in D minor was one of the greatest works of the era, but it probably wouldn't really be all that funny. The world doesn't need to remember you or me or that guy over there. Hell we can't even learn from our history from a few decades ago, much less from some guy who got nailed to a cross a couple thousand years ago (Probably not the one you were thinking of.)

    Sure I am sometimes saddened at the thought of the video games of my youth being lost forever, but even if they weren't it wouldn't recapture the joy I felt upon encountering them at the time. Do you think you are more important than that? Think of the current year and then start going back a decade at a time and name one person you know of from that time. How long before you run out of people you know personally? Before you run out of people you have even heard of? I bet most people can't even make it a century. Millions of men fought in the world wars, many of their stories are still recorded. How many people bother to look at even one? My grandfather recounted a story of seeing the first automobiles in his town, how many people even think of a time when they didn't exist, or the time when they were new to the world? Precious few I reckon.

    If you want to worry about what history will think of THIS time, perhaps you should be a more careful custodian of previous ones.

    --

    I'm trying to teach myself to set people on fire with my mind... Is it hot in here?

    1. Re:Yes Yes by uninformedLuddite · · Score: 1

      If you want to worry about what history will think of THIS time, perhaps you should be a more careful custodian of previous ones.

      I dread to think that archives of youtube twitter and facebook survive into the future. We will be looked back upon as a bunch of retarded idiots.

      --
      The new right fascists are bilingual. They speak English and Bullshit.
  53. It's hard for obscure forms of content by Animats · · Score: 1

    I have simulation programs trapped in Working Model for Mac format. I have 3D animation projects trapped in Softimage 3D for Windows NT. Neither is easily convertible to anything else. (Worse, they're on DAT tapes.)

    Images, video, audio, and text documents are easy to convert because there are modern formats that directly correspond to them. But some things don't translate well.

  54. Re:Code should NEVER accompany data! by michaelmalak · · Score: 1

    In my presentation, you'll see that the strategy of embedding XSL in an XML file has the code in the top half and the data in the bottom half, clearly delineated. They are easily separable. But by having them in a single file, they will not get separated by someone copying them.
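    A rough sketch of what "easily separable" could look like in practice. The combined file below is a made-up example in the spirit of the embedded-XSL trick (element names, the "viewer" id and the readings are hypothetical, not taken from the presentation); the Python part shows that a reader which knows nothing about XSL can still pull out the data half.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"

# Hypothetical combined file: stylesheet ("code") first, records ("data") second.
COMBINED = """<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="#viewer"?>
<!DOCTYPE archive [
  <!ATTLIST xsl:stylesheet id ID #REQUIRED>
]>
<archive>
  <xsl:stylesheet id="viewer" version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="/">
      <html><body>
        <xsl:for-each select="archive/data/reading">
          <p><xsl:value-of select="@part"/>: <xsl:value-of select="."/></p>
        </xsl:for-each>
      </body></html>
    </xsl:template>
  </xsl:stylesheet>
  <data>
    <reading part="A-100">0.82</reading>
    <reading part="A-101">0.79</reading>
  </data>
</archive>
"""


def split_code_and_data(xml_text):
    """Return the embedded stylesheet element and the plain data records."""
    root = ET.fromstring(xml_text)
    stylesheet = root.find(f"{{{XSL_NS}}}stylesheet")
    readings = [(r.get("part"), r.text) for r in root.iter("reading")]
    return stylesheet, readings


if __name__ == "__main__":
    _code, data = split_code_and_data(COMBINED)
    print(data)
```

    Whether a given browser will actually apply the embedded stylesheet is a separate question; the extraction above does not depend on it.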

  55. Old floppies are a problem. by Static · · Score: 1

    Old samplers are rather a victim of that. The hardware is often fine and can still crank out some awesome sounds, but they are often diskette based and storage technology has moved on hundreds of times faster than synthesizer technology.

    The Ensoniq scene has almost abandoned the EPS series because they used double-density drives and DD 3½" floppies haven't been made for years - and HD floppies aren't reliable in DD drives. Nowadays even HD diskettes are losing their stored bits. *All* the people keen to keep the ASR-10 alive have shifted to SCSI solutions because floppies are just not reliable anymore.

    Wade.

    1. Re:Old floppies are a problem. by NJRoadfan · · Score: 1

      Floppy emulators are becoming a popular alternative for use in synths and older computers. I'm finding out that even HD disks aren't reliable in HD drives. Brand new disks with bad sectors are common as media quality took a nosedive. Many disks I have used for sneakernetting to my 486 that were just months old have become unreadable (it's not the drives), good thing there wasn't anything important on them! Meanwhile disks from years ago still work fine.

  56. CSV by VortexCortex · · Score: 1

    Bullshit. You're merely enjoying the consequences of voluntary DRM. If you don't care about your data you'll lose it, just like those pictures you used to draw in crayon that hung on the fridge. If they ARE important then you can keep them and use the data indefinitely.

    I still run the GWBASIC programs, and even 16 bit x86 DOS code I wrote as a child to edit images and color palettes via keyboard in (M)CGA video modes which BIOS still emulates, and OSs like Free DOS can still make use of (Watercolor isn't extinct because Oil paint exists, Platforms are to game makers what Canvas and Paint is to Painters). Hell even my very 1st 386 bootloader can be written to an MBR and booted on a brand new x86-64 system (disable Security Theater Boot). This is NATIVE support. With an emulator, I can even run programs I wrote for my dad's old PDP-8 -- A completely different architecture... 12 bit bytes! I cared enough about the little dinky things I did as a kid to make sure they were preserved across every major storage format change. I can still read the comments my dad thankfully added to some of my code all those years ago -- a valuable lesson indeed; My kids find gramps' snark quite funny. That's several generations of data compatibility for my family's directory tree...

    It's not useful to bitch about compatibility by citing programs created by companies that willfully suck at compatibility. MS DOS requires an emulator, but DR DOS can still be installed on my new systems. It doesn't recognize my sound card, but I can still program a driver for it -- just like I did to get my old custom IR transceiver devices to control my new home theater setup (lights, screen, volume, etc) via my aging Osborne-1's serial port.... It's a functional "conversation-piece" to hear that familiar 5.25" drive access as the signal tables are loaded for TV instead of the stereo. That same data format has been in use now for decades and even works on new hardware w/ Linux via LIRC -- thanks to the kids... old Ozy will give out someday. That's future proof protocol compatibility across several generations of hardware, simultaneously.

    There is NOTHING stopping me from converting the palettes and images created in my PAL_EDIT.COM into a GIMP .PAL / indexed .TGA or .TIFF, or .PNG, etc. I can (and do) frequently convert files in both directions, to go from GIMP to PAL_EDIT.COM to get new images and new "mods" into my really old game "engines". That's the thing about open formats and programs with source code available. (Remember the push back against non-textual network protocols, even in email?) We won this battle already. I wasn't aware anyone had stopped fighting it. This page is written in TEXT. It's JavaScript and HTML... FFS: The 1st damn web page on the Internet still renders.

    The authors can ALWAYS create data converters if they want, the problem is giving up that right and not demanding source code access. If my own data formats can survive the transition from kid to teen to adult and even be shared and passed on to my own kids (who love "real" retro games, BTW, such hipsters), then surely multi-billion dollar companies can do it too. Or, are you implying that despite all that money they are more inept than I can even imagine? If so, that's a pretty big dig at Microsoft there Vint... Bravo. Kind of makes me wonder WTF you're paying them for, eh?

    I expect this kind of BS from you now Vint. I mean, you don't even realize the usefulness of your own contributions to mankind, saying that the Internet is not a human right. Look up human: A characteristic of humans; A human being. It is a human right. It's the right to bear technology. That's what the 2nd amendment is really about, they just worded it wrong, they're imperfect. Just because some old farts can't understand the future the way we do now, doesn't make new technology NOT a human right. The Internet is the equivalent of access to spee

    1. Re:CSV by VortexCortex · · Score: 1

      Tisk. Tisk. I checked to see. I used the emulator to create an MS Excel file in Office 97 format on Windows 98. I went through each version of Office I have, that is to say: Not every version. I successively pulled up the Excel file, converted and re-saved it. It now works in MS Office 2013. To me this just reinforces the idea that continuous duplication across formats is the answer. I still assert that open source programs and open formats are needed, otherwise you could lose access to a program -- I noticed that XP said it wasn't a legit copy, which I know it is.... DRM fail.

      If a company goes out of business and there is no provision for its software to become accessible to others, all the products running that software may become inaccessible, Cerf said. "There are hard, complicated technical and legal problems that will have to be resolved."

      The problem is recognized and there are efforts internationally to address it. Cerf said he's been in meetings about this issue attended by 400 people.

      "It may be that the cloud computing environment will help a lot. It may be able to emulate older hardware on which we can run operating systems and applications," he said.

      Does it take 400 of the finest minds you can muster to simultaneously shout "OPEN SOURCE"? If the company goes out of business, and their code was open source, it really doesn't require any further action to ensure the data will always be usable by the end users, eh? Businesses need to stop using artificial scarcity, stop selling infinitely reproducible bits, and simply Do Work to make money. What happens if the buzzword compliant "Cloud Computing Environment" goes out of business? Why then you can't even try to reverse engineer your data -- It's really fucking gone then, eh, Genius?

  57. longterm readability and backwards compatibility by waterbear · · Score: 1

    There are free/libre software projects with great records in opening up interoperability and keeping backwards compatibility. On the other hand, fashions among proprietary s/w makers seem to change, and about now there is a tendency to stop worrying about existing users and just abandon past formats.

    Any number of folk will say things like "shouldn't be difficult at all to reverse engineer", but that doesn't make anything happen. On the other hand, there are plenty of apostles of the latest version ready to heap abuse on anyone bold enough to ask for backwards compatibility, and that attitude is a big source of problems.

    Longterm readability is helped when software developers take the trouble to maintain backwards compatibility across different versions of popular tools and across competing applications that have broadly similar uses. That doesn't directly help with hardware barriers, but at least it would be good if the number of needless software barriers is kept down.

    [...] Most of these things will be readable just as long as the applications that created them are around, but not longer.
    [...]
    Incidentally, all my decades old LaTeX documents still compile and can also be read directly. So can my 20 year old ASCII-coded measurement data.

  58. Call a spade a spade by ArsenneLupin · · Score: 1

    'I'm not blaming Microsoft,' said Cerf,

    Let's call a spade a spade. It's 100% a problem due to opaque binary formats. Had the document been written in (clean) HTML or plain text, it would have stayed usable without problems.

  59. the internet is filled with thieves by FudRucker · · Score: 1

    a thief, for example: recently i was looking for an Owner's Manual for a Suzuki motorcycle in PDF form. the bike is a few years old so Suzuki does not keep it, and the only website that has it downloadable wants me to both sign up for an account with them and pay for the download. they did not make the owner's manual, so they have no rights to withhold that information either intellectually or materialistically. so i refused to sign up on their lame website and refuse to give them money, and i will keep searching for a free copy

    --
    Politics is Treachery, Religion is Brainwashing
    1. Re:the internet is filled with thieves by Kardos · · Score: 1

      Just request a copy from Suzuki

    2. Re:the internet is filled with thieves by drinkypoo · · Score: 1

      Just request a copy from Suzuki

      GP says Suzuki doesn't have the manual. It's quite plausible. Ford no longer has manuals for pre-Powerstroke diesels, and the printer (Helm) is no longer printing them, so if they run out of backstock for your model year then you need to know which model years are most similar, or you need to go to eBay. When I got the FSM for my 1989 240SX I was able to get it from Nissan, but I heard they ran out a couple years later.

      Personally, when I want a FSM, I go straight to eBay anyway, because it's almost always the cheapest source. But I will often google around as well, because that's almost. Of course, you can often get illicit FSM scans on eBay as well, but you get about what you pay for there. The best thing is to get the real, OE hardcopy.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
  60. Re:Code should NEVER accompany data! by lahvak · · Score: 1

    I guess that makes sense if your data is so complicated that it actually needs XML, but I would still say that for simple data that can be stored in a simple to parse format like csv or tsv, it is better to keep it separate.

    --
    AccountKiller
  61. Future Profession : Data Archeologist by nemesisfixx · · Score: 1

    I was wondering what professions I should keep tabs on, just in case we have that talk about careers with my son-to-be... Being an expert on long-gone and "lost" data formats and collecting their respective tools just seems like a future relic (Oh, and we already keep terabytes of all those myriads of one-time-use programs and utilities we downloaded from 5 years ago, right?)

  62. best safeguard by hduff · · Score: 2

    The best safeguard is the abandonment of all existing proprietary formats to freedom (so anybody can write conversion software) and the proliferation of open formats on an ongoing basis.

    --
    "I believe in Karma. That means I can do bad things to people all day long and I assume they deserve it." : Dogbert
  63. Re:So? by r_a_trip · · Score: 1

    *** Yes, and by god, future historians will care about YOUR spreadsheets and YOUR websites! ***

    Actually they do. Historians are still trying to (painstakingly) find out how people in the Neolithic lived. So yes, having access to YOUR spreadsheets and YOUR websites will be very valuable for historians in say 3000 years.

    *** Egotistical jackass. No one gives a shit about 99.999999% of humanity after they're gone. ***

    Projection? That YOU don't give a shit about humanity, doesn't mean nobody else does.

    --
    # touch universe # chmod +rwx universe # ./universe
  64. Vint is not blaming Microsoft, but I do. by 140Mandak262Jamuna · · Score: 1

    Microsoft from day one has been making its data incompatible with everything else. It was a lean and hungry company back then (it is fat and hungry now), and it was compatible with every existing thing on the import side and incompatible with everything on export. It fought a mean campaign against Samba. It played dirty with Netscape and the web standards. Bugs in IE worked around in IIS and vice versa to make it very very hard to stick to a standard.

    --
    sed -e 's/Chuck Norris/Rajnikant/g' joke > fact
  65. Macwrite... by tekrat · · Score: 1

    I'm having a similar problem. My father had started writing a book on Macintosh 512k using Macwrite. He passed away a decade ago, but, recently I uncovered a box of floppies.

    Needless to say, even reading a floppy on a modern Macintosh is pretty much impossible, and even then, the older Mac documents had a data and resource fork, and recovering data from those early formats is pretty hairy.

    Some of the data can be recovered, but it's unlikely I'll ever be able to completely read the book he was writing -- Unless I find myself a Mac 512 with Macwrite, and then run the text through the serial port to a more modern PC.

    --
    If telephones are outlawed, then only outlaws will have telephones.
  66. How do you read the five inch floppies they're sto by arfonrg · · Score: 1

    Same way the TRS-80 fans do.... Take an old drive with an adapter and read it off once and transfer it to new media.

    Ira Goldklang's site (trs-80.com) has TONS of old software done that way.

    --
    Your thin skin doesn't make me a troll
  67. Cerf is wrong... by arfonrg · · Score: 1

    His position is that the data format is what will prevent data recovery - I postulate that as long as there are bored nerds that perceive a challenge, the old can and will be reverse engineered.

    --
    Your thin skin doesn't make me a troll
  68. Re:Cerf is wrong... part II by arfonrg · · Score: 1

    What WILL cause all of our digital data to finally be lost is media degradation. Every piece of data ever created will eventually be lost because the media it's on finally fails and someone forgot to copy it beforehand. (That or the sun engulfs the Earth before we finally figure out that we have to get off this planet)

    --
    Your thin skin doesn't make me a troll
  69. Re:nasa by arfonrg · · Score: 1

    Put copies online and see how fast some nerds don't decrypt it....

    --
    Your thin skin doesn't make me a troll
  70. Re:nasa by arfonrg · · Score: 1

    (Ignore my "don't" in the above sentence. It made sense in my head but not so much in print.)

    --
    Your thin skin doesn't make me a troll
  71. Re:longterm readability and backwards compatibilit by jedidiah · · Score: 1

    I recently encountered some bit of data that was encoded in a proprietary format but didn't really need to be. Nothing about the data required the extra features available from the proprietary format.

    It turned out that proprietary app X generated a file that couldn't be properly displayed on other copies of the same app without first being converted to a non-proprietary format.

    Some people do really perverse things to avoid giving you data in a reasonable format.

    --
    A Pirate and a Puritan look the same on a balance sheet.
  72. One of the reasons for ODF format by Anonymous Coward · · Score: 1

    The Open Document Format(tm) was intended to ensure that documents have longevity. They looked at what companies like microsoft were doing, with every version 'incompatible' with prior versions. (It's not a random thing either: microsoft goes out of its way to make *certain* that new versions are incompatible with old, so that people are *forced* to upgrade.) The Open Document Format(tm) was created with users in mind such as the Vatican Library, which has a large number of documents over 1000 years old, a good number of documents over 1500 years old, a smaller number of documents over 2000 years old, and less than two dozen shelves full of documents more than 2500 years old. Being able to read old data is important to them. Being able to read old data is an abomination to microsoft. Hence ODF. But microsoft tried to kill ODF with their OOXML, which has proprietary undocumented containers within the XML, which makes reading anything older than 1 version impossible. Thanks again microsoft.

  73. The internet never forgets by Windwraith · · Score: 1

    So, the internet never forgets about that time you got drunk and posted stupid photos, but it forgets everything else? God damn.

  74. Re:Google hire me, I solved this problem in 3 seco by Ant+P. · · Score: 1

    Great. Now make your solution continue to work 20 years from now when the Windows XP activation service ceases to exist, which is what TFA is actually about.

  75. a lifetime ago by mcswell · · Score: 1

    "...software lifetime is only like 7 or fewer years..." Do you have a source for this, or is this your guess?

    I'm not asking to disagree, quite the opposite: for seven years (coincidence) now, I've been arguing for storing grammar data in an XML format precisely because storing it in the programming language of a particular grammar parser means it will be unusable in the not-so-distant future. While I have anecdotes (I once wrote a parser using three programming languages, and all three of them became obsolete within a year or two), I would love to have a study to cite.

  76. Re:Except spoken word changes too by mcswell · · Score: 1

    Rumor has it the Bible is still readable after a couple thousand years. In Greek, Hebrew and Aramaic if you take the time to learn, else in translation.

  77. cc:Mail by mcswell · · Score: 1

    And I have email from the 1990s that I canNOT read today. It's called Lotus cc:Mail. (I could read it if I was willing to pay.)

    "Digital data lasts forever -- or five years, whichever comes first."
              --Jeff Rothenberg, 1997

  78. Re:Google hire me, I solved this problem in 3 seco by dicobalt · · Score: 1

    Windows loader, or an army of lawyers.

  79. Re:What matters? by Macgrrl · · Score: 1

    My husband and I have been writing roleplaying games for nearly 20 years together. Many of the older games in effect only exist as hardcopy because the softcopies are on outdated media like floppy discs and zip cartridges in old versions of PageMaker or Quark that we can no longer open.

    He is keen on storing new games on GoogleDocs but I'm reluctant to trust them to an external 3rd party who has a history of killing services. I have much more faith in storing the content as txt or rtf files moving them from computer to computer as we migrate.

    --
    Sara
    Designer, Gamer, Macgrrl in an XP World
  80. Re:The Print Button by Macgrrl · · Score: 1

    If you want to keep it, you should probably be laser printing it, inkjet ink fades.

    --
    Sara
    Designer, Gamer, Macgrrl in an XP World
  81. Who knows by metaforest · · Score: 1

    We don't know what this means either.... proprietary format... encrypted... and it cost a lot to send it.... alas it never arrived.

    AOAKN HVPKD FNFJU YIDDC
    RQXSR DJHFP GoVFN MIAPX
    PABUZ WYYNP CMPNW HJRZH
    NLXKG MENEK ONOIB AREEQ
    UAOTA RBQRH DJoFM TPZEH
    LKXGH RGGHT JRZCQ FNKTQ
    KLDTS GQIRU AOAKN 27 1525/6

    NURP 40 TW 194
    NURP 37 DK 76

    lib 1625
    ToR 1522 copies sent 2

    signed W. Stot, S(j/g)T.