Slashdot Mirror


Vint Cerf: Data That's Here Today May Be Gone Tomorrow

dcblogs writes "Vinton Cerf is warning that digital things created today — spreadsheets, documents, presentations as well as mountains of scientific data — may not be readable in the years and centuries ahead. Cerf illustrates the problem in a simple way. He runs Microsoft Office 2011 on Macintosh, but it cannot read a 1997 PowerPoint file. 'It doesn't know what it is,' he said. 'I'm not blaming Microsoft,' said Cerf, who is Google's vice president and chief Internet evangelist. 'What I'm saying is that backward compatibility is very hard to preserve over very long periods of time.' He calls it a 'hard problem.'" We're at an interesting spot right now, where we're worried that the internet won't remember everything, and also that it won't forget anything.

63 of 358 comments

  1. My data will be readable by drinkypoo · · Score: 3, Informative

    My data will be readable because I use bog-standard formats. If I get really froggy I use HTML, and you can just strip the tags and read that.

    If his data won't be readable, that's his problem. Anything you want to save for posterity, export it now.
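
A minimal sketch of the "just strip the tags and read that" idea, using only Python's standard library. The class name TextOnly and the function strip_tags are made up for this illustration, and the sketch makes no attempt to handle scripts, styles or layout.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collects the text between tags and discards the tags themselves."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for every run of plain text found between tags.
        self.chunks.append(data)


def strip_tags(html_text):
    """Return the readable text of an HTML string (hypothetical helper)."""
    parser = TextOnly()
    parser.feed(html_text)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.chunks)


if __name__ == "__main__":
    print(strip_tags("<p>Saved for <b>posterity</b> &amp; beyond</p>"))
```

The point of the sketch is that the content survives even if nothing understands the markup any more.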

    --
    "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    1. Re:My data will be readable by Bremic · · Score: 5, Insightful

      Until HTML includes DRM and half the stuff you create ends up being unreadable.

      Well, really we are probably good for anything that can be opened in a text editor for a long long while; but the point is there. Anything can be lost to data format shifts.

      As someone who had to re-type an 80-page document because the company stopped using the software the document was created on and didn't have a licence for it, and no converter found online worked - I can say this does happen.

      How many people are going to shell out $600 for software to open something they want to make an edit on? How many are going to just give up and find someone to rekey it, or just give it up as a loss?

      With more and more systems including format locks, in 50+ years historians will likely have a lot of trouble finding out details from today. Kind of like it is now when we go to look at archival film from WWII and find it's all faded into obscurity. We have the same problems, just with different causes. Then it was lack of preservation of a medium with a limited lifespan. Now it's storing stuff in formats that will go away as they are improved upon, blocked, or just forgotten about.

      Sure, if you're in your 20s, or even 30s, you probably haven't realized the copies of your grandfather's photos are sitting on a floppy disk in a proprietary format. But when you get older you may encounter these issues.

    2. Re:My data will be readable by Nutria · · Score: 4, Informative

      Or NASA data from deep space probes that's stored in now-unknown formats on mag tapes from long, long, long gone manufacturers.

      --
      "I don't know, therefore Aliens" Wafflebox1
    3. Re:My data will be readable by starburst · · Score: 2

      From a 2002 slashdot story:

      mccalli writes :
      "Thought people might find this amusing. In 1986, the UK compiled an electronic [copy of the] domesday book. They used BBC Master computers to do it, and the result was put on laserdisc. I actually used this project whilst at school. This article states that nothing can now read these merely 15-year old discs. The original, written approx. 1086, is still doing fine thank you very much."
      Sounds like a good candidate for Bruce Sterling's Dead Media Project. (Speaking of Sterling, the "graying cyberpunk" has an interesting article in the Austin Chronicle on the upcoming SXSW Interactive conference called "Information Wants to be Worthless" -- thanks to reader ag3n7.)

    4. Re:My data will be readable by ganjadude · · Score: 2

      Why didn't you OCR and then make the edits? There are numerous OCR options that would have fit that need, no?

      --
      have you seen my sig? there are many others like it but none that are the same
    5. Re:My data will be readable by geniice · · Score: 2

      In fairness they did manage to transfer the stuff off the discs and put the stuff without copyright issues online.

    6. Re:My data will be readable by Concerned Onlooker · · Score: 2

      "How many people are going to shell out $600 for software to open something they want to make an edit on?"

      The upside to this is that when somebody wants to update that nifty company Flash web site and discovers that Flash now costs an arm and a leg, the site gets re-written in html.

      --
      http://www.rootstrikers.org/
    7. Re:My data will be readable by kermidge · · Score: 2

      Well, there's the problems with the medium itself, then there's the format, as you say (ought to be right up a cryptanalyst's alley, tho), then there's the real blocker: number of tracks, head design, and the circuitry that goes with it. Unless there are good documents for the machine's design and building, or one can be found in working order in a museum, you're SOL. It's a big problem that doesn't get much exposure.

  2. emulation / virtualization by smash · · Score: 2

    Support emulator/VM developers! Encapsulate your entire machine in a VM and you can run the entire software stack if necessary. Anything you need convenient access to, export to CSV, XML or some other standard format.

    --
    I run: Windows, OS X, Linux, FreeBSD. Just because you have a hammer, doesn't mean everything is a nail.
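
As a rough illustration of the "export to CSV" advice, here is a sketch using Python's standard csv module. The helper names and the sample records are invented for the example; a real export would pull rows out of whatever application holds the data.

```python
import csv
import io


def export_rows(header, rows):
    """Serialize rows to CSV text (hypothetical helper for this sketch)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


def import_rows(csv_text):
    """Read the CSV text back into a list of lists of strings."""
    return list(csv.reader(io.StringIO(csv_text)))


if __name__ == "__main__":
    # Made-up records, including a field with a comma and a quote in it.
    text = export_rows(
        ["title", "year"],
        [["Budget, final", "1997"], ['Notes on "hard problems"', "2013"]],
    )
    print(text)
    print(import_rows(text))
```

The output is plain text that any editor can open, which is the property being argued for.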
    1. Re:emulation / virtualization by Anonymous Coward · · Score: 5, Funny

      You're very clever, young man, very clever - but it's VMs all the way down!

    2. Re:emulation / virtualization by Mitchell314 · · Score: 2

      Honestly, reverse engineering ASCII plain text files would be trivial. Not to the average person, but to somebody with a bit of background:
      A) We have software that can use something called frequency analysis to decipher something encoded that has a 1-1 correspondence to something we know (i.e. the English alphabet).

      B) Ignoring software, frequency analysis is something that could be (and before the days of computers, was) done by hand. Hell, some things could be picked out by eye. For one, all files would have a particular byte character that appears near the end of every (well formed) text file, as well as often appearing periodically through the average file. A key indicator of being a newline/carriage return. Also in the bulk of most documents the new line is preceded by a particular other character that also appears in a periodic manner. Being the period. And then another character appearing every so often (on average every 5-6 characters or so), a good candidate for the space character. I and A also being somewhat easy to pick out (the whole upper/lower case making it a bit harder, but still doable). With a bit more dedication, you can start guessing common words, such as a common letter followed by a less common letter followed by a very common letter ('the' sounds like a good candidate). And then to figure the rest out, compare the average frequencies of characters across many documents to the average frequencies of letters and punctuation in documents we already know. A decent undergrad senior in computer science could write a program to do this. Hell, I took a sophomore level math class that went over this.

      --
      I read TFA and all I got was this lousy cookie
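
A small sketch of the first step described above: count byte frequencies in text stored in an unknown one-to-one encoding and take the most common byte as the likely space. The "unknown encoding" here is simulated by XOR-ing ASCII with an arbitrary key, which is an assumption made purely so the example has something to analyse; the function names are made up too.

```python
from collections import Counter


def rank_bytes(data):
    """Return the distinct byte values in data, most frequent first."""
    return [value for value, _count in Counter(data).most_common()]


def scramble(text, key=0x5A):
    """Stand-in for a forgotten 1-1 character encoding (illustrative only)."""
    return bytes(b ^ key for b in text.encode("ascii"))


SAMPLE = "the quick brown fox jumps over the lazy dog and then the dog sleeps"

if __name__ == "__main__":
    unknown = scramble(SAMPLE)
    likely_space = rank_bytes(unknown)[0]
    # In ordinary English prose the space is usually the most common character.
    print("most common byte value:", hex(likely_space))
```

From there the same ranking can be compared against known letter frequencies to guess the remaining characters, as the comment outlines.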
    3. Re:emulation / virtualization by geniice · · Score: 2

      There are a few industrial setups where that is pretty much what has happened.

    4. Re:emulation / virtualization by smash · · Score: 3, Interesting

      err... plus DosBox is running x86 software I have from 198x...which is 30+ years now.

      --
      I run: Windows, OS X, Linux, FreeBSD. Just because you have a hammer, doesn't mean everything is a nail.
  3. We should have listened by Anonymous Coward · · Score: 5, Insightful

    We're in a difficult spot right now because for years we ignored the warnings about 'proprietary file formats'.

    I'm not blaming Microsoft either. We let Microsoft do this to us of our own free ignorance.

    1. Re:We should have listened by plover · · Score: 2

      Actually, languages have been consolidating and standardizing rapidly with the advent of the printing press, effective and affordable transportation, broadcast media like TV and radio, and the Internet. Diversity of language is rapidly disappearing.

      The way things are going now, there will be only a few dozen languages left at the end of this century, and possibly only a handful after the hundred years that follow.

      Although it's entirely possible that technology will preserve native languages, too. If machine translation becomes as easy as slipping a Babel fish in your ear, people won't feel the need to drop their mother tongue for English or Mandarin.

      No matter what, we'll all still be yelling hateful things at each other, but at least we'll understand the insults the other guy is hurling.

      --
      John
  4. Re:So? by MrBandersnatch · · Score: 5, Insightful

    I think you will find that there's a little known branch of academia called "history" which sometimes takes a curious interest in even the most trivial of past information.....

  5. Yes, backwards compatibility, blah blah blah... by Narcocide · · Score: 5, Insightful

    Yes, you're right I have this ASCII text file created in 1997 and I can't find anything to read it...

    OH WAIT ACTUALLY FUCKING *EVERYTHING* STILL READS IT.

    Stop gargling Microsoft's balls so much and wipe off your chin. Proprietary data formats are THE PROBLEM. Stop trying to redirect public discourse with this thinly veiled bullshit.

    1. Re:Yes, backwards compatibility, blah blah blah... by Nerdfest · · Score: 4, Informative

      Odds are that you don't need to convince Vint Cerf or Google in general about the advantages of open formats.

    2. Re:Yes, backwards compatibility, blah blah blah... by PPH · · Score: 2

      Just Googled "ebcdic to ascii converter"

      About 123,000 results.

      --
      Have gnu, will travel.
    3. Re:Yes, backwards compatibility, blah blah blah... by fuzzyfuzzyfungus · · Score: 2

      But your EBCDIC documents are absolute rubbish now and the tools to convert them aren't commonplace any more.

      How deep are your pockets?

      *IBM Consulting*

    4. Re:Yes, backwards compatibility, blah blah blah... by Anonymous Coward · · Score: 5, Insightful

      But your EBCDIC documents are absolute rubbish now and the tools to convert them aren't commonplace any more.

      $ printf "\xC5\xC2\xC3\xC4\xC9\xC3\x25" | iconv -f ebcdic-us -t ascii
      EBCDIC
      $ dpkg -S `which iconv`
      libc-bin: /usr/bin/iconv
      $ apt-cache show libc-bin | grep -e Essential -e Priority
      Essential: yes
      Priority: required

      So we got a program that can convert from EBCDIC-US to ASCII (or UTF-8 or whatever you want) and that program is in an Essential/Required package on any Debian-based system and for some reason you say that "aren't commonplace"?

      Are you on crack?
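
The same conversion can be sketched in Python, whose standard library ships a cp037 codec (EBCDIC US/Canada). Treating cp037 as equivalent to iconv's ebcdic-us is an assumption that holds for the upper-case letters used here but is not guaranteed for every code point.

```python
# The six letter bytes from the printf example above, minus the trailing 0x25.
EBCDIC_BYTES = bytes([0xC5, 0xC2, 0xC3, 0xC4, 0xC9, 0xC3])


def ebcdic_to_text(data, codec="cp037"):
    """Decode EBCDIC bytes to a Python string (codec choice is an assumption)."""
    return data.decode(codec)


if __name__ == "__main__":
    print(ebcdic_to_text(EBCDIC_BYTES))  # prints EBCDIC
```

No third-party package is involved, which fits the point that converters for old encodings are still commonplace.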

    5. Re:Yes, backwards compatibility, blah blah blah... by felipekk · · Score: 2

      Just Googled "oranges to apples converter"

      About 4,780,000 results

  6. DRM and the digital black hole by Neo-Rio-101 · · Score: 4, Interesting

    A perfect example of this is basically the issue of old video games. (I may as well bring this up because it's going to come up)

    Recently, the Internet Archive stored a whole pile of TOSEC collections of games from various old systems (thanks to their DMCA exemption of being an archival repository so that they can legally do this). Data and information that would have otherwise been completely lost into a digital black hole, if it weren't for the fans of the system, and the dedicated teams of people collecting and amassing this software as a hobby.... in breach of copyright.

    The problem with DRM is that without dedicated crackers and pirates, unless the original rights holders are around long enough to resell old titles for that long (which most aren't), old games will simply disappear into a digital copyright black hole and never be seen again. This happens once the computer/console system is old, not sold anymore, and forgotten about, and the media degrades and isn't backed up in some form (in breach of EULA). If people aren't able to collect the software and hang on to it, preserving/duplicating the media while still in copyright, it's going to vanish. Culturally important games of significance will be lost forever, and that, if anything, is as much a crime as it is to pirate software in the first place.
    It's only due to the efforts of an army of swappers/crackers, etc, that most of the old games on old systems were even preserved.

    The Steam model on PC is quite good though as it makes a few compromises where you can actually make backups and go offline if you want.
    For old computers and consoles however, this doesn't apply,.... and with some more restrictive attempts to squash the used game market, and force internet-always-connected authentication on upcoming consoles to even play the game... one has to wonder if the game companies deliberately want to squish all traces of their old work, let it disappear into the ether, and to resell you this year's football game which is just like last year's. I fear that this is where we are headed (if we aren't there already)

    --
    READY.
    PRINT ""+-0
    1. Re:DRM and the digital black hole by jeffasselin · · Score: 4, Interesting

      What about online-only games? Will historians in 100 years be able to play WoW and see what the game was like?

      --
      If he explores all forms and substances Straight homeward to their symbol-essences; He shall not die.
    2. Re:DRM and the digital black hole by Mitchell314 · · Score: 2

      Luckily for them, no.

      --
      I read TFA and all I got was this lousy cookie
    3. Re:DRM and the digital black hole by timeOday · · Score: 2

      Nor will they be able to join in World War II to see what that was like. However there is more recorded footage of WoW than WWII for future historians to study.

  7. Don't forget DRM by onyxruby · · Score: 4, Insightful

    We're living in what could well be a future dark age for archeologists / historians. Hardly anything is put into a nice hard format (stone is incredibly rare and metal gets stolen) for someone to find. What's left suffers from incompatible file formats, acid based paper that decomposes, bit rot, cryptography, incompatible technology for data storage and worst of all DRM. With DRM you have active measures that try to prevent something from being usable.

    In the old days people stopped use with armed guards, obfuscation and primitive crypto. Today we have servers that are required for operational functionality for many products. With the advent of the cloud you have reasons for storing things where you have a dependency on a third party. How many services that are cloud / server based have come about and gone tits up?

    Even having a large well known brand name doesn't protect you from having a server shut down. Just think of Microsoft's PlaysForSure service that lasted less than a decade. Having a license and a physical disk isn't that helpful when the DRM requires an authentication server that doesn't exist. With the movement to put more and more DRM into the cloud or with SSL certificates (again dependent upon servers and naturally time bombed) this is going to be a problem that will only grow worse.

    Learning to break DRM is far more critical than file formats which require nothing more than a conversion tool.

  8. *sigh* by MrBandersnatch · · Score: 2

    Digital archival is one of the HARD problems. Over the last 40 years we have already lost more cultural artifacts than were created in the entirety of human history. A great deal of that is useless garbage of course but the original moon landing tape? 1000s of government emails revealing exactly what was going on at pivotal times in history?

    The truth is, we need systems for hardcopy; digital is too transient; emulators are a useful stop gap measure but don't protect against the kinds of catastrophic failures that we will likely see over the longer time frame; and we need indexing because someone at some point will want to wade through our digital detritus.

  9. Re:XML? by cheater512 · · Score: 2

    In to a usable document from scratch? Pretty hard. Ever looked at the XML of a moderately complex document?

  10. Re:XML? by ShanghaiBill · · Score: 2

    I think that given MS office and LibreOffice are in XML, it shouldn't be difficult at all to reverse engineer in the future.

    Yes, the problem is not "data" but "data in proprietary formats" ... and even that is becoming less of a problem. A converter to/from almost anything is usually just a google search away. With VMs and emulators, even proprietary binary programs are easier than ever to deal with. I can run any CP/M or C64 program on my desktop Linux computer using free emulators. This was indeed a "hard problem", but today it is mostly solved.

  11. This is news? Nope. Not new... by flogger · · Score: 2

    This has been true of all technology in the past and will continue into the future. Just look at film. How many preserved films from 1915 are still around? Just the ones that were recorded into a new format of film, then a newer format of film, then into a VHS, then into a LaserDisc, then a DVD, then a Blu-ray... (Metropolis, I am looking at you.)

    Within arms reach, I have floppy disks that contain files created in AMI Pro word processors.... When I say floppy, I am talking about the 5 1/4 inch floppies.
    Technology hardware and software is not stagnant... It will always continue to develop and progress (ignore windows 8). Data that is worth keeping will get converted. Data that isn't will get left behind. I would not be surprised that in about 25 years, there will be "classic" software as there is Classic literature...

    Too much typing.. going back to drinking.....

    --
    ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
    "First things first -- but not necessarily in that order"
    -- The Doctor, "Doctor
    1. Re:This is news? Nope. Not new... by Dogtanian · · Score: 2

      You put your finger on it. I'd just add what I had planned on saying- that, in general, it's not always obvious what's going to be "useful" and "of interest" to future generations when it isn't practical to keep everything.

      In fact, a lot of things that would be of interest to us- i.e. everyday, mundane life- were never recorded at all, back when film and equipment were quite expensive and the effort and cost would have been saved for documenting "important" occasions. Even at a personal level, if I'd known that something like the Internet would become as important as it has, and that there'd be projects like Wikimedia Commons and the like, I might have photographed more of the things around me in my relatively mundane home town while growing up in the 1980s.

      --
      "Slashdot - News and Chat Sites Deviant". (Click "homepage" link above for details).
  12. Tax Records by PPH · · Score: 2

    The IRS wants to audit me, going back several years. I kept the records as required but they are unreadable now.

    Thanks Microsoft!

    --
    Have gnu, will travel.
  13. Re:XML? by fuzzyfuzzyfungus · · Score: 2

    I think that given MS office and LibreOffice are in XML, it shouldn't be difficult at all to reverse engineer in the future.

    Binary formats were standard for everything up through Office 2003. Office 2007 (2003 with optional converter pack and some weird bugs) could output something XML based, though I have the vague memory from the OpenDocument/Office Open XML slugfest that 2007 produced something that deviated from the theoretical ideal of OOXML in some respects, and that full conformity happened at 2010 or 2013. I might be remembering that wrong; but anything before 2003, and a lot from 2003 were definitely binary.

  14. Re:So? by fuzzyfuzzyfungus · · Score: 2

    I think you will find that there's a little known branch of academia called "history" which sometimes takes a curious interest in even the most trivial of past information.....

    Even if you don't care about the historians, I'm sure the lucky people who have the pleasure of handling property deeds at your local governance hive can tell you a story from within the last week or two about needing to pull some rather seriously dusty documents to allow a present-day transaction to go through without incident.

    Many data will, indeed, be of no interest at all, or the same historical interest that neolithic refuse dumps are; but data in the nontrivial-number-of-decades range are still live in more than a few contexts.

  15. Maybe. by MrEricSir · · Score: 4, Insightful

    XML doesn't magically solve everything in this regard. If there's no good documentation for the format, it's unlikely you'll be able to display everything exactly as intended. Likewise, if the format is hideously complex (see: Microsoft Office Open XML) or there's bugs in the de-facto implementation, it's going to be tricky to reverse engineer.

    I'd also point out that MS Office spits out compressed XML. I believe it's based on ZIP, which is very well documented, but that's yet another hurdle to cross. And then you have to deal with the binary format of the XML itself -- ASCII, UTF8, etc.

    --
    There's no -1 for "I don't get it."
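
To make the hurdles above concrete, here is a sketch that unpacks XML from a ZIP container and parses it with Python's standard library. The archive is a toy built in memory, and the member name content.xml and the doc/p element names are invented for the illustration; real office packages are far more elaborate, which is the comment's point.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

TOY_XML = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    "<doc><p>hello</p><p>world</p></doc>"
)


def make_toy_package():
    """Build a tiny ZIP archive holding one XML member (illustrative only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("content.xml", TOY_XML)
    return buf.getvalue()


def paragraphs(package, member="content.xml"):
    """Layer 1: undo the ZIP compression. Layer 2: parse the XML inside."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        raw = zf.read(member)  # bytes; the XML declaration names the encoding
    root = ET.fromstring(raw)
    return [p.text for p in root.iter("p")]


if __name__ == "__main__":
    print(paragraphs(make_toy_package()))
```

Each layer (container, character encoding, schema) has to be understood before the text comes out, and only the last one is specific to the document format.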
    1. Re:Maybe. by wonkey_monkey · · Score: 2

      ZIP format is documented.

      Right now it is. What about the ragtag bunch of misfit librarians who are all that's left after the zombie apocalypse?

      They burned all the books for warmth and to keep the zombies away.

      --
      systemd is Roko's Basilisk.
  16. On the PowerPoint 4.0/95 converters... by yuhong · · Score: 4, Insightful

    MS removed the PowerPoint 4.0/95 converters completely with Office 2007 for Windows and later, and disabled them by default in Office 2003 SP3. And the PowerPoint 4.0 converter (but not 95) was disabled by default instead of fixed with MS09-017.

    On the Mac, they removed them even earlier, when they ported Office to Carbon.

    IMO it would be a good idea for MS to package PP4X32 and PP7X32 from PowerPoint 2003 separately, along with a utility to call the converters of course.

  17. Uh, hello? by DogDude · · Score: 4, Funny

    For a supposedly smart guy, he seems a bit silly:

    He could've just downloaded MS's Powerpoint 97 viewer

    --
    I don't respond to AC's.
  18. Re:He's mistaken by yuhong · · Score: 2

    Have you tried disabling the file blocks first? At least Word for Mac 4.x and 5.x can be read this way.

  19. I have legible pictures over 150 years old by the_rajah · · Score: 2

    Some are glass plate Daguerreotypes. Somehow, I am not too confident that my digital pictures will be legible 150 years from now, unless I make a good quality print on archival paper. Digital files are too easily corrupted and made totally useless. Media formats will change. 8" floppies anyone?

    --
    "Do the Right Thing. It will gratify some people and astound the rest." - Mark Twain
    1. Re:I have legible pictures over 150 years old by jafac · · Score: 2

      yes - this is a real issue - and ARCHIVED data that is important DOES need to be "spun up" and refreshed to new media.

      If it's hard drives, yes. If it's optical media. . . well that depends. Because some optical media just plain degrades over time. Some is written in special proprietary formats (like Apple's early implementations of CD+R) that you're going to have a hard time reading with CURRENT equipment.

      If your data is archived to tape, and more than 10 years old, I'm afraid you're fucked.

      --
      These are my friends, See how they glisten. See this one shine, how he smiles in the light.
  20. No different than cars by HockeyPuck · · Score: 4, Interesting

    We're still able to restore cars from the 80s and earlier as the cars were fully mechanical or hydraulic. No computers.

    Fast forward to 20 years from now, nobody's going to be carrying the computer boards for a 2004 Toyota Prius or a 2013 Tesla.

    However, you'll still be able to restore your grandfather's '57 Chevy...

    1. Re:No different than cars by AK Marc · · Score: 3, Informative

      You'll just have to take the Prius ROM on an emulator on your phone, and plug in your phone to drive your car. Easy.

  21. Code should accompany data by michaelmalak · · Score: 4, Interesting

    I presented a solution to this long-standing problem last year to the Denver HTML5 Meetup.

    Code should never be separated from data. This is possible with HTML5, JavaScript, and open source.

    In the presentation, I steal and repurpose Hofstadter's analogy of DNA to an LP vinyl record, which is an information bearer, but useless without its information retriever (the record player). Like the cell of an animal, which contains both DNA and the means to "play" it, I ask why not the same with software?

    My maxim is: data should always carry the code with it to play itself. It was inspired from the field I've spent 50% of my career in: non-destructive testing where, for example, X-Rays and ultrasounds are performed on safety-critical industrial parts with 50-year service lives. If one of those parts fails and kills someone, you're going to want to go back into the old data and find the earliest indication of the flaw or fault and reinspect every other part in the world like it that is still in service. And maybe you need to go back 50 years. Under such a context, not providing the code with the data could be considered an act of gross neglect.

    In my presentation, I use the 1990's era trick of embedding XSL into an XML file, with the addition of the XSL now being able to use HTML5/JavaScript. Sadly, I've only gotten it to work with Firefox -- the other browsers consider it a security violation.

  22. see Windows 1250 and 1251 by Doug Merritt · · Score: 2

    Windows 1250 and 1251 do, and possibly others. It sounds familiar, but my memory is fuzzy, so I just looked around.

    https://en.wikipedia.org/wiki/Windows-1250

    --
    Professional Wild-Eyed Visionary
  23. real problem is: FEATURE CREEP by bussdriver · · Score: 2

    I've been part of archival problem planning. We went with DVD. Now that I am not there, I suspect they are thinking DVD sucks and are moving "forward" when the DVD was more than good enough and those plastic discs will last a century. MPEG-2 files will have open source decoders. Now physical readers will still be a problem... the only solution is to wait as long as possible and then switch to the next long lasting format - but not necessarily the newest one at that time. (which is why moving to Blu-ray is a waste of money.)

    The biggest problem with other formats is the FORMAT; even with something like open office documents, the ODF format will have revisions and new features added and tweaks to the format. version 2, 3 etc. The features and changes that promote the creation of more and more formats is the biggest problem. Just like my above DVD video problem- if you go beyond your needs then you are complicating things with more and more formats.

    TEXT? sucks. we need WORD! Word 1.0? the app sucks... we need WORD 20! (and all versions in between to migrate the old docs...plus labor to deal with conversion issues...)

    Perhaps we need ARCHIVAL formats; like PDF, which has done well, besides the stupid additions Adobe has been making to it. Or just TEXT export... a less bloated output only format without the feature BS problems.

    Thankfully, email remains the same... sort of. although storage of the emails differs greatly; if you want to archive emails you need to pick a close-to-the-source method (and simple storage filesystem-- good luck reading that NTFS formatted disk image in 30 years.)

  24. Re:XML? by belmolis · · Score: 2

    Both have published specifications, so reverse engineering shouldn't be necessary. However, Microsoft's XML includes things that are not defined in the specification. That was one of the objections to giving it status as an open standard.

  25. I do blame Microsoft by Darinbob · · Score: 4, Informative

    Seriously, why would Vinton Cerf not blame Microsoft? They have an extremely poor track record with backwards compatibility, and I don't think they even know what forwards compatibility is. If you design the data formats correctly then you can keep things usable for decades (or centuries). Guess what, twenty year old TeX documents still work, and yet Word X won't work with Word X-2. I've pulled runoff documents off of 70's versions of Unix that can still be printed. That says to me that one can deal with compatibility issues.

    This is all intentional on Microsoft's part too. They make money when customers buy new copies of software, so it is in their best financial interests to make sure that customers have significant pressure to upgrade. I remember the solution to an acknowledged bug for Word 97 was to make sure that everyone who was going to read your document had the appropriate Word 97 plug in in their older version of Word. I completely blame Microsoft here.

    This is not that hard a problem, IF the company pays attention to it and gives it even a small amount of priority.

    1. Re:I do blame Microsoft by mhotchin · · Score: 2

      To say that MS has a poor record of backwards compatibility is, well, ridiculous. It's only just about *the* most important thing for them, because the majority of their business is with businesses, and if their FooBar app doesn't run, then they don't upgrade.

      No other OS has near the level of compatibility that the MS sequence does.
      http://www.youtube.com/watch?v=vPnehDhGa14

      http://blogs.msdn.com/b/oldnewthing/archive/2006/11/06/999999.aspx
      http://blogs.msdn.com/b/oldnewthing/archive/2003/08/28/54719.aspx

    2. Re:I do blame Microsoft by serviscope_minor · · Score: 2

      No other OS has near the level of compatibility that the MS sequence does.

      Somebody's been drinking the kool-aid.

      There's a small, little known company called IBM selling a type of computer called a "mainframe" which might beg to disagree. You can buy a modern mainframe which will still run your unmodified programs which you wrote on an original System 360. In 1964.

      Microsoft have not even existed as long as that chain of backwards compatibility, and you try getting the original digger to run on Windows 8 (or RT! ha! instruction set changes are no barrier to IBM apparently) without Dosbox.

      --
      SJW n. One who posts facts.
    3. Re:I do blame Microsoft by Darinbob · · Score: 2

      The OS stays compatible in some ways (Windows is not at all unique here). However the Microsoft applications have serious problems in this regard. Maybe some of the competition is not so great either but it's no excuse when Word can't even be compatible with itself. They have changed the file format in Word in fundamental ways several times.

    4. Re:I do blame Microsoft by drinkypoo · · Score: 2

      No other OS has near the level of compatibility that the MS sequence does.

      It's called ANSI C on Unix. Pick up a copy of The UNIX Programming Environment and you can still use the examples verbatim on a Linux machine today. And you can even still use Motif apps, if we're talking about GUI programs. They still work just like they did when they were new, except a hell of a lot faster.

      Oh, you want backwards compatibility for closed-source software? Guess what? Plenty of software craps itself when it does anything interesting on the wrong version of windows. In reality, there's only one way to ensure compatibility, and that's to have your hands on the source — and for it to be worth a crap to begin with.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
  26. "hard problem" by macraig · · Score: 4, Insightful

    Vint, that's bullshit and you know it. It's nothing more than preserving syntaxes, grammar, file formats. That's not hard, and it only requires someone to create a format conversion ONCE to solve the problem at each stage of the evolution.

    The real problem here is proprietary non-public formats and structures. When the structure of data has been a closely guarded secret and requires reverse engineering that may not even yield a perfect result, THAT is hard.
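
    A minimal sketch of the "write each conversion ONCE" idea, assuming the formats are documented. The record formats, version numbers and function names here are all hypothetical, made up for illustration:

```python
# Hypothetical formats: v1 is the text "name;year", v2 is a dict,
# v3 is the same dict with an explicit schema tag added.

def v1_to_v2(record):
    """Written once, when v2 replaced v1."""
    name, year = record.split(";")
    return {"name": name, "year": int(year)}

def v2_to_v3(record):
    """Written once, when v3 replaced v2."""
    return {"schema": 3, **record}

# One converter per step of the format's evolution.
CONVERTERS = {1: v1_to_v2, 2: v2_to_v3}
LATEST = 3

def upgrade(record, version):
    """Walk an old record forward one step at a time to the latest format."""
    while version < LATEST:
        record = CONVERTERS[version](record)
        version += 1
    return record
```

    Calling upgrade("slides.ppt;1997", 1) runs both steps in order. Nobody ever has to write a direct v1-to-v3 converter, but every step in the chain has to be kept around.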

  27. Re:Code should NEVER accompany data! by lahvak · · Score: 4, Insightful

    No! Fail! You don't get it!

    1) Code is data
    2) Code is data that is especially hard to interpret
    3) One of the main reasons for all this mess is that in all those proprietary formats, data is intermixed with code, and the whole mess is very hard to parse.

    Data should be kept completely isolated, as far away from code as possible. That way, if you cannot interpret the code any more, you will still be able to analyze and parse the data. You know, it is not that hard to construct a record player.

    --
    AccountKiller
  28. He should be blaming Microsoft by gweihir · · Score: 2

    My first LaTeX publications from 20 years back and all my human-readable ASCII scientific data can still be read and used without any problem. Human-readable file formats in the UNIX tradition completely solve this problem.

    This problem is only hard if the people making the data formats are either stupid or do not want their formats to be easily accessible to other applications, as Microsoft does. Of course, others are creating just as fundamentally broken formats for either of the same reasons.

    --
    Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    1. Re:He should be blaming Microsoft by hcs_$reboot · · Score: 2

      Just hex print the MS 97 file and you have a human readable format:

      00007b0 5f00 675f 6f6d 5f6e 7473 7261 5f74 005f
      00007c0 696c 6362 732e 2e6f 0036 5f5f 7270 6e69
      00007d0 6674 635f 6b68 6500 6978 0074 6573 6c74
      00007e0 636f 6c61 0065 626d 7472 776f 0063 706f
      00007f0 6974 646e 7300 7274 636e 7970 7000 7475
      0000800 0073 6177 6e72 0078 5f5f 7473 6361 5f6b
      0000810 6863 5f6b 6166 6c69 6900 7773 7270 6e69
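
      A rough sketch of going back the other way, assuming the dump above is in the default `hexdump` layout (an offset followed by 16-bit little-endian words). The function names are made up for illustration:

```python
import re

# First two lines of the dump quoted above.
DUMP = """\
00007b0 5f00 675f 6f6d 5f6e 7473 7261 5f74 005f
00007c0 696c 6362 732e 2e6f 0036 5f5f 7270 6e69
"""

def dump_to_bytes(dump):
    """Rebuild the raw bytes, assuming 16-bit little-endian words."""
    out = bytearray()
    for line in dump.splitlines():
        fields = line.split()
        for word in fields[1:]:  # fields[0] is the offset column
            out += int(word, 16).to_bytes(2, "little")
    return bytes(out)

def printable_runs(data):
    """Return runs of four or more printable ASCII characters."""
    return [m.decode("ascii") for m in re.findall(rb"[\x20-\x7e]{4,}", data)]

print(printable_runs(dump_to_bytes(DUMP)))
```

      Under that assumption, the text that falls out of these two lines is "__gmon_start__", "libc.so.6" and "__prin". It is readable, but it does not tell you much on its own.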

      --
      Slashdot, fix the reply notifications... You won't get away with it...
  29. the man is out of touch by stenvar · · Score: 2

    You can get emulators for just about every machine you can imagine: PDP-10, PDP-11, DOS, Atari, Amiga, C64, microcontroller, etc. You can get hardware emulators with FPGAs if you like. Almost any important format is documented or has been reverse engineered. Yes, you can easily read 1997 PowerPoint files, even if his weird choice of Office on Mac can't. And that's only with current technology. Give it a few decades and computers will just automatically perform even the most complicated data conversions behind the scenes. "Computer, scan the 1997 floppy and put the data on screen."

  30. Re:XML? by gweihir · · Score: 4, Insightful

    Have you seen what some people (and MS) do with XML? And what convoluted structures they use? Coded in binary? With compression and other eminently hard to understand stuff? Most of these things will be readable just as long as the applications that created them are around, but not longer.

    Forget XML. Forget Unicode as well. Plain ASCII is the only thing that works. Simple PDF or PostScript will work also, because the standards and open-source tools to read them will still be around. But nothing as complicated as an MS Office document will survive. LibreOffice formats may have a chance, because LibreOffice may still be compilable and runnable (being FOSS), but only because of that and I would not bet on it.

    Incidentally, all my decades-old LaTeX documents still compile and can also be read directly. So can my 20-year-old ASCII-coded measurement data.

    --
    Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
  31. Re:So? by wickedskaman · · Score: 2

    Who hurt you? :-(

    --
    Sand's overrated... it's just tiny little rocks.
  32. Re:XML? by Dr_Barnowl · · Score: 4, Informative

    Not even Microsoft can implement their Office XML "standard"; from examination it's pretty much a direct name-for-name serialization of their internal binary structs, with some of the more obvious gaffes like explicitly saying "do this like this old version of Word" hastily renamed to placate ISO. It needs you to implement a whole bunch of specific behaviours if you want it to work in the MS software (things like "if you update this bit, you also have to update this other bit just so or it won't work"), but these aren't documented.

    You've got more of a chance, sure, just because the structs are marked and you don't have to infer where their boundaries are, but it's a far cry from ODF which was designed from the outset to be an open XML format rather than just hastily being bunged together to permit large purchasing bodies (like governments) to tick the "Open format" box on their form.

  33. Re:XML? by Half-pint+HAL · · Score: 2

    Holy shit, yeah, you're right - it's totally impossible to strip out the XML tags and be left with readable plain text content!

    I bet nobody could ever decode it!

    You seem to be assuming a flat-text file with predictable order. Strip the XML out of anything in a tabular format (eg a spreadsheet -- see TFS) and you lose vital data. Blank cells are lost and the tabulated data no longer lines up.

    It gets worse in a filetype with unstructured formatting, eg DTP and slideware. You've got a collection of elements that are only ordered by their metadata. The explanatory labels you want to overlay on top of that image? They're no longer linked to it and you've no way of knowing what they're there for. Multiple news stories on the same page merge into one, and have been divorced from their headlines.

    Readable != useful.
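
    A small sketch of the blank-cell problem. The markup is a made-up, simplified table, not any real Office or ODF schema, and the function names are hypothetical:

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical spreadsheet markup with two blank cells.
SHEET = (
    "<table>"
    "<row><cell>name</cell><cell>phone</cell><cell>city</cell></row>"
    "<row><cell>Ann</cell><cell/><cell>Oslo</cell></row>"
    "<row><cell>Bob</cell><cell>555-0100</cell><cell/></row>"
    "</table>"
)

def strip_tags(xml_text):
    """Throw the tags away and keep whatever text is left."""
    return re.sub(r"<[^>]+>", " ", xml_text).split()

def parse_rows(xml_text):
    """Keep the structure: one list per row, empty string for a blank cell."""
    root = ET.fromstring(xml_text)
    return [[cell.text or "" for cell in row] for row in root]

print(strip_tags(SHEET))
print(parse_rows(SHEET))
```

    After stripping, the seven words that remain no longer say whether "Oslo" was a phone number or a city. The parsed version keeps three cells in every row, so the columns still line up.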

    --
    Got them moderator blues I blieve I walk out the do', With these mod-points I been gettin', I 'most never post no mo'
  34. best safeguard by hduff · · Score: 2

    The best safeguard is the abandonment of all existing proprietary formats to freedom (so anybody can write conversion software) and the proliferation of open formats on an ongoing basis.

    --
    "I believe in Karma. That means I can do bad things to people all day long and I assume they deserve it." : Dogbert