Vint Cerf: Data That's Here Today May Be Gone Tomorrow
dcblogs writes "Vinton Cerf is warning that digital things created today — spreadsheets, documents, presentations as well as mountains of scientific data — may not be readable in the years and centuries ahead. Cerf illustrates the problem in a simple way. He runs Microsoft Office 2011 on Macintosh, but it cannot read a 1997 PowerPoint file. 'It doesn't know what it is,' he said. 'I'm not blaming Microsoft,' said Cerf, who is Google's vice president and chief Internet evangelist. 'What I'm saying is that backward compatibility is very hard to preserve over very long periods of time.' He calls it a 'hard problem.'"
We're at an interesting spot right now, where we're worried that the internet won't remember everything, and also that it won't forget anything.
I think that given MS Office and LibreOffice are in XML, it shouldn't be difficult at all to reverse engineer in the future.
Careful with names containing L slashdot.org/~AiphaWolf_HK slashdot.org/~AlphaWoif_HK slashdot.org/~AiphaWoif_HK
My data will be readable because I use bog-standard formats. If I get really froggy I use HTML, and you can just strip the tags and read that.
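For what it's worth, "just strip the tags" really is only a few lines with a standard library parser. A minimal sketch in Python, assuming well-behaved HTML; the class and function names are made up for illustration:

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collects only the text nodes of an HTML document."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for every run of text between tags.
        self.chunks.append(data)


def strip_tags(html):
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    # Collapse the whitespace left behind once the markup is gone.
    return " ".join("".join(parser.chunks).split())


sample = "<h1>Notes</h1>\n<p>Still <b>readable</b> later.</p>"
print(strip_tags(sample))
```

Note that this keeps the contents of script and style elements as text too, so a real page would need a little more filtering.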
If his data won't be readable, that's his problem. Anything you want to save for posterity, export it now.
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
What's a Macintosh?
Whatever it is, I bet if he used LaTeX+Beamer he wouldn't have this problem. Whether it was authored in 1997 or 2011, it almost certainly would still work on a "Macintosh". Maybe he could learn a thing or two from Donald Knuth and Leslie Lamport, and stop playing around with the rugrats at Google.
Support emulator/VM developers! Encapsulate your entire machine in a VM and you can run the entire software stack if necessary. Anything you need convenient access to, export to CSV, XML or some other standard format.
I run: Windows, OS X, Linux, FreeBSD. Just because you have a hammer, doesn't mean everything is a nail.
We're in a difficult spot right now because for years we ignored the warnings about 'proprietary file formats'.
I'm not blaming Microsoft either. We let Microsoft do this to us of our own free ignorance.
I think you will find that there's a little known branch of academia called "history" which sometimes takes a curious interest in even the most trivial of past information.....
Yes, you're right I have this ASCII text file created in 1997 and I can't find anything to read it...
OH WAIT ACTUALLY FUCKING *EVERYTHING* STILL READS IT.
Stop gargling Microsoft's balls so much and wipe off your chin. Proprietary data formats are THE PROBLEM. Stop trying to redirect public discourse with this thinly veiled bullshit.
If there is a demand to open up and view a certain file type there will always be someone to create an app or website which will either open up the file or convert it to a more compatible format. There are already services out there that convert word to pdf for example oh and I just found an iPhone app for converting files, yay!
Anveto
Man, fuck the future (that's right you historians-not-yet-born). They have all the flying cars and meal-in-a-pill's and immortality clinics and shit. The hell have they done for us to deserve our sympathy? If that means we can make them have to work that much harder to see how life was now, I say do it.
Now back to my zombie virus work. Anybody got a decent time capsule for me to use?
I read TFA and all I got was this lousy cookie
A perfect example of this is basically the issue of old video games. (I may as well bring this up because it's going to come up)
Recently, the Internet Archive stored a whole pile of TOSEC collections of games from various old systems (thanks to their DMCA exemption of being an archival repository so that they can legally do this). Data and information that would have otherwise been completely lost into a digital black hole, if it weren't for the fans of the system, and the dedicated teams of people collecting and amassing this software as a hobby.... in breach of copyright.
The problem with DRM is that without dedicated crackers and pirates, unless the original rights holders are around long enough to resell old titles for that long (which most aren't), old games will simply disappear into a digital copyright black hole and never be seen again. This happens once the computer/console system is old, not sold anymore, and forgotten about, and the media degrades and isn't backed up in some form (in breach of EULA). If people aren't able to collect the software and hang on to it, preserving/duplicating the media while still in copyright, it's going to vanish. Culturally important games of significance will be lost forever, and that, if anything, is as much a crime as it is to pirate software in the first place.
It's only due to the efforts of an army of swappers/crackers, etc, that most of the old games on old systems were even preserved.
The Steam model on PC is quite good though as it makes a few compromises where you can actually make backups and go offline if you want.
For old computers and consoles however, this doesn't apply... and with some more restrictive attempts to squash the used game market, and force internet-always-connected authentication on upcoming consoles to even play the game... one has to wonder if the game companies deliberately want to squish all traces of their old work, let it disappear into the ether, and to resell you this year's football game which is just like last year's. I fear that this is where we are headed (if we aren't there already)
READY.
PRINT ""+-0
Print Everything!
Problem solved.
Saw info on a book on this topic today, in fact: http://filesthatlast.com/about/ . Looks interesting so far.
We're living in what could well be a future dark age for archeologists / historians. Hardly anything is put into a nice hard format (stone is incredibly rare and metal gets stolen) for someone to find. What's left suffers from incompatible file formats, acid based paper that decomposes, bit rot, cryptography, incompatible technology for data storage and worst of all DRM. With DRM you have active measures that try to prevent something from being usable.
In the old days people stopped use with armed guards, obfuscation and primitive crypto. Today we have servers that are required for operational functionality for many products. With the advent of the cloud you have reasons for storing things where you have a dependency on a third party. How many services that are cloud / server based have come about and gone tits up?
Even having a large well known brand name doesn't protect you from having a server shut down. Just think of Microsoft's play4sure service that lasted less than a decade. Having a license and a physical disk isn't that helpful when the DRM requires an authentication server that doesn't exist. With the movement to put more and more DRM into the cloud or with SSL certificates (again dependent upon servers and naturally time bombed) this is going to be a problem that will only grow worse.
Learning to break DRM is far more critical than file formats which require nothing more than a conversion tool.
Digital archival is one of the HARD problems. Over the last 40 years we have already lost more cultural artifacts than were created for the entirety of human history. A great deal of that is useless garbage of course but the original moon landing tape? 1000s of government emails revealing exactly what was going on at pivotal times in history?
The truth is, we need systems for hardcopy; digital is too transient; emulators are a useful stop gap measure but don't protect against the kinds of catastrophic failures that we will likely see over the longer time frame; and we need indexing because someone at some point will want to wade through our digital detritus.
with office 2007 pro, maybe if you were not using the student license you would have PowerPoint as well
What really matters besides photographs? I back mine up on a number of offsite solutions that I control ( hard drives ) and then re-backup every year on slightly newer hardware. I also rotate through a variety of online cloud solutions and all that stuff to make sure I am backed up on whatever is the current popular services. Okay I realize more than photographs matter but that's what matters to me. I don't see it being a huge issue.
Vint Cerf jumped the shark a long time ago - when ICANN became more of a money making venture than something to make the Internet better.
I've got email from the 1990s I can still read today.
The gifs and jpgs from back then are still viewable today, and will likely be viewable in 20 years.
You have to keep migrating data off your old storage media _hardware_. And that can be a problem if you don't actually have enough bandwidth for your archive size.
I have plenty of old powerpoint 97 presentations ( and 98 since there was no 97 for mac ) and they work just fine with Office 2011.
Maybe if he had some plugins or something that were OS 9 or OS X PPC specific he couldn't load it?
Anyway who cares... fire up SheepShaver in a virtual machine or OS X 10.4 in Virtualbox and launch it from there.
I think he's making a mountain out of a mole hill here... That I can download any Atari 2600 or Commodore 64 product ever made and run it on anything from my Mac to my PC to my Android phone tells me that we're not at risk of losing anything any time soon.
Also, why didn't he write it in LaTeX like he should have in the first place? :P
This has been true of all technology in the past and will continue into the future. Just look at film. How many preserved films from 1915 are still around? Just the ones that were recorded into a new format of film, then a newer format of film, then into a VHS, then into a LaserDisc, then a DVD, then a Blu-ray... (Metropolis, I am looking at you.)
Within arms reach, I have Floppy drives that contain files created in AMI Pro word processors.... When I say Floppy, I am talking about the 5 1/4 inch floppies.
Technology hardware and software is not stagnant... It will always continue to develop and progress (ignore windows 8). Data that is worth keeping will get converted. Data that isn't will get left behind. I would not be surprised that in about 25 years, there will be "classic" software as there is Classic literature...
Too much typing.. going back to drinking.....
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
"First things first -- but not necessarily in that order"
-- The Doctor, "Doctor
The IRS wants to audit me, going back several years. I kept the records as required but they are unreadable now.
Thanks Microsoft!
Have gnu, will travel.
Man, don't be like that. If we're nice to the future, they might give us time machines!
That people in the far future would be getting smarter to accomplish this - probably a tossup - and apart from it, it's very questionable if a far future for humanity even exists, the way "humanity" is behaving these days/years/decades/centuries/millennia....
Maybe there are smarter robots by then babysitting...
+1 Underrated.
I think you will find that there's a little known branch of academia called "history" which sometimes takes a curious interest in even the most trivial of past information.....
Even if you don't care about the historians, I'm sure the lucky people who have the pleasure of handling property deeds at your local governance hive can tell you a story from within the last week or two about needing to pull some rather seriously dusty documents to allow a present-day transaction to go through without incident.
Many data will, indeed, be of no interest at all, or the same historical interest that neolithic refuse dumps are; but data in the nontrivial-number-of-decades range are still live in more than a few contexts.
I use Github Flavored Markdown. Thousands of years in the future, archaeologists will no doubt work furiously to decode my etchings upon a stone tablet, which will read: "# IF YOU CAN READ THIS YOU'RE A GEEK #" .
XML doesn't magically solve everything in this regard. If there's no good documentation for the format, it's unlikely you'll be able to display everything exactly as intended. Likewise, if the format is hideously complex (see: Microsoft Office Open XML) or there's bugs in the de-facto implementation, it's going to be tricky to reverse engineer.
I'd also point out that MS Office spits out compressed XML. I believe it's based on ZIP, which is very well documented, but that's yet another hurdle to cross. And then you have to deal with the binary format of the XML itself -- ASCII, UTF8, etc.
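To put the ZIP hurdle in perspective, here is a rough sketch in Python that builds a tiny ZIP-plus-XML package in memory and pulls the text back out. The member name and element names are invented for the demo and are NOT the real Office Open XML layout; only the container idea (XML parts inside a ZIP) comes from the point above.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def build_demo_package():
    """Create a small ZIP archive holding one XML part (hypothetical layout)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(
            "content/slide1.xml",
            '<?xml version="1.0" encoding="UTF-8"?>'
            "<slide><text>Hello from 2013</text></slide>",
        )
    return buf.getvalue()


def extract_text(package_bytes):
    """Return the non-empty text nodes found in every .xml member."""
    texts = []
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as zf:
        for name in zf.namelist():
            if name.endswith(".xml"):
                # The XML declaration tells the parser which encoding to use.
                root = ET.fromstring(zf.read(name))
                texts.extend(t.strip() for t in root.itertext() if t.strip())
    return texts


print(extract_text(build_demo_package()))
```

Getting the raw text out is the easy part. Laying it out exactly as the original application intended is where the documentation (or lack of it) matters.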
There's no -1 for "I don't get it."
For open source. Save your files in open and/or openly defined, standardized formats and there will always be software that can deal with it.
But I guess it's difficult for people to hear you explain that to them with their head up their ass.
I would solve this by installing a Windows XP VM with a copy of Office XP. Now that I solved Google's hard problem they must now see I am qualified to work there. Google is on a FUD rampage of which the likes I haven't seen since the great Microsoft FUD storms.
Doesn't he know about the magic of the cloud
that Apple, Photobucket, Flickr, and others who cannot be trusted
any further than they can be tossed by a trebuchet
have promised us ?
Still haven't found a description of the character set in which octal 222, 223, and 224 are right single quotation mark, left double quotation mark, and right double quotation mark.
Anybody know this one?
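One way to check a guess: octal 222, 223 and 224 are bytes 0x92, 0x93 and 0x94, and the Windows code pages are the usual suspects. A small Python sketch, assuming the data came from a Windows-125x code page (the helper name is made up):

```python
import unicodedata

# Octal 222, 223, 224 are the same bytes as 0x92, 0x93, 0x94.
raw = bytes([0o222, 0o223, 0o224])


def decode_names(data, encoding):
    """Decode the bytes and return the Unicode name of each character."""
    return [unicodedata.name(ch) for ch in data.decode(encoding)]


for codec in ("cp1252", "cp1250"):
    print(codec, decode_names(raw, codec))
```

Both code pages put the curly quotes at those three positions, so these bytes alone won't tell you which of the two you have.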
MS removed the PowerPoint 4.0/95 converters completely with Office 2007 for Windows and later, and disabled them by default in Office 2003 SP3. And the PowerPoint 4.0 converter (but not 95) was disabled by default instead of fixed with MS09-017.
On the Mac, they removed them even earlier, when they ported Office to Carbon.
IMO it would be a good idea for MS to package PP4X32 and PP7X32 from PowerPoint 2003 separately, along with a utility to call the converters of course.
For a supposedly smart guy, he seems a bit silly:
He could've just downloaded MS's Powerpoint 97 viewer
I don't respond to AC's.
If not, file a bug and send in the document. The power of freedom ...
Even language changes over a few hundred years or so. Might not be a problem for some things like technical documents, but in terms of presenting information of cultural significance most of it still gets lost with time. Slang, flowery speech, idioms, references to current events, jokes, etc., a lot of it loses its "zing" after around 50 years or so if not within a single generation. There may be a few notable exceptions to this (some classical works are still funny, like much of the Canterbury Tales), but even then you sometimes need a dictionary or thesaurus to get the jokes or at least the general gist of them.
I remember over two decades ago there was talk of making data objects, that is, data that knew how to present an object interface to get at its information. The data would contain its own reader in some ubiquitous language. But wait, we never got a ubiquitous language. Perhaps JavaScript today? But if you want to solve this problem then this is how to solve it. Or perhaps you could just package a converter to convert format XYZ to BSON as being good enough or at least better than today's breakage.
One thing that really burns me is having my information that I created / entered / caused to be locked up in some proprietary opaque format, especially if owned by one and only one app.
Some are glass plate Daguerreotypes. Somehow, I am not too confident that my digital pictures will be legible 150 years from now, unless I make a good quality print on archival paper. Digital files are too easily corrupted and made totally useless. Media formats will change. 8" floppies anyone?
"Do the Right Thing. It will gratify some people and astound the rest." - Mark Twain
As long as the description of the file format is preserved -- and this can still be done with paper documents -- then we can simply translate or convert the old information into new forms or formats. Nothing ever need be lost.
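As a toy illustration of that claim, here is a Python sketch where a written-down record layout is all it takes to convert a binary blob into JSON. The layout (a 2-byte year, a 4-byte float and an 8-byte space-padded ASCII label, little-endian) is entirely hypothetical, as are the field names:

```python
import json
import struct

# Hypothetical format description, short enough to keep on paper:
# little-endian, uint16 year, float32 reading, 8-byte ASCII label.
RECORD_LAYOUT = "<Hf8s"


def decode_records(blob):
    """Turn a blob of fixed-size records into a list of plain dicts."""
    size = struct.calcsize(RECORD_LAYOUT)
    records = []
    for offset in range(0, len(blob) - size + 1, size):
        year, reading, label = struct.unpack_from(RECORD_LAYOUT, blob, offset)
        records.append(
            {
                "year": year,
                "reading": round(reading, 3),
                "label": label.decode("ascii").rstrip(),
            }
        )
    return records


# Two made-up "legacy" records written with the same layout.
legacy = struct.pack(RECORD_LAYOUT, 1997, 1.5, b"alpha   ") + struct.pack(
    RECORD_LAYOUT, 2011, 2.25, b"beta    "
)
print(json.dumps(decode_records(legacy)))
```

Real formats are far messier than this, of course, which is roughly what the rest of the thread is arguing about.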
The concern is only illusory.
We're still able to restore cars from the 80s and earlier as the cars were fully mechanical or hydraulic. No computers.
Fast forward to 20yrs from now, nobody's going to be carrying the computer boards for a 2004 Toyota Prius or a 2013 Tesla.
However, you'll still be able to restore your grandfather's '57 Chevy...
I presented a solution to this long-standing problem last year to the Denver HTML5 Meetup.
Code should never be separated from data. This is possible with HTML5, JavaScript, and open source.
In the presentation, I steal and repurpose Hofstadter's analogy of DNA to an LP vinyl record, which is an information bearer, but useless without its information retriever (the record player). Like the cell of an animal, which contains both DNA and the means to "play" it, I ask why not the same with software?
My maxim is: data should always carry the code with it to play itself. It was inspired from the field I've spent 50% of my career in: non-destructive testing where, for example, X-Rays and ultrasounds are performed on safety-critical industrial parts with 50-year service lives. If one of those parts fails and kills someone, you're going to want to go back into the old data and find the earliest indication of the flaw or fault and reinspect every other part in the world like it that is still in service. And maybe you need to go back 50 years. Under such a context, not providing the code with the data could be considered an act of gross neglect.
In my presentation, I use the 1990's era trick of embedding XSL into an XML file, with the addition of the XSL now being able to use HTML5/JavaScript. Sadly, I've only gotten it to work with Firefox -- the other browsers consider it a security violation.
https://en.wikipedia.org/wiki/Windows-1250
Professional Wild-Eyed Visionary
Yields 4 results in Ubuntu. You can search reputable open source archives on the web, too.
How deep are your pockets?
*IBM Consulting*
Um, really???
some said use Latex or VIM/Notepad plain ASCII
*spoilers*
"A person is smart. People are dumb, panicky dangerous animals and you know it." - K
I've been part of archival problem planning. We went with DVD. Now that I am not there, I suspect they are thinking DVD sucks and are moving "forward" when the DVD was more than good enough and those plastic discs will last a century. MPEG-2 files will have open source decoders. Now physical readers will still be a problem... the only solution is to wait as long as possible and then switch to the next long lasting format - but not necessarily the newest one at that time. (which is why moving to Blu-ray is a waste of money.)
The biggest problem with other formats is the FORMAT; even with something like open office documents, the ODF format will have revisions and new features added and tweaks to the format. version 2, 3 etc. The features and changes that promote the creation of more and more formats is the biggest problem. Just like my above DVD video problem- if you go beyond your needs then you are complicating things with more and more formats.
TEXT? sucks. we need WORD! Word 1.0? the app sucks... we need WORD 20! (and all versions in between to migrate the old docs...plus labor to deal with conversion issues...)
Perhaps we need ARCHIVAL formats; like PDF, which has done fine besides the stupid additions Adobe has been making to it. Or just TEXT export... a less bloated output only format without the feature BS problems.
Thankfully, email remains the same... sort of. although storage of the emails differs greatly; if you want to archive emails you need to pick a close-to-the-source method (and simple storage filesystem-- good luck reading that NTFS formatted disk image in 30 years.)
Democracy Now! - uncensored, anti-establishment news
Use it. We're way too obsessed with saving everything. If it's not worth paper and ink, we won't miss it in 100 years.
Seriously, why would Vint Cerf not blame Microsoft? They have an extremely poor track record with backwards compatibility, and I don't think they even know what forwards compatibility is. If you design the data formats correctly then you can keep things usable for decades (or centuries). Guess what, twenty year old TeX documents still work, and yet Word X won't work with Word X-2. I've pulled runoff documents off of 70's versions of Unix that can still be printed. That says to me that one can deal with compatibility issues.
This is all intentional on Microsoft's part too. They make money when customers buy new copies of software, so it is in their best financial interests to make sure that customers have significant pressure to upgrade. I remember the solution to an acknowledged bug for Word 97 was to make sure that everyone who was going to read your document had the appropriate Word 97 plug in in their older version of Word. I completely blame Microsoft here.
This is not that hard a problem, IF the company pays attention to it and gives it even a small amount of priority.
This problem isn't new to anyone. If it's new to you, then you need to get involved in the digital preservation movement.
http://en.wikipedia.org/wiki/Digital_obsolescence
Kriston
Vint, that's bullshit and you know it. It's nothing more than preserving syntaxes, grammar, file formats. That's not hard, and it only requires someone to create a format conversion ONCE to solve the problem at each stage of the evolution.
The real problem here is proprietary non-public formats and structures. When the structure of data has been a closely guarded secret and requires reverse engineering that may not even yield a perfect result, THAT is hard.
No! Fail! You don't get it!
1) Code is data
2) Code is data that is especially hard to interpret
3) One of the main reasons of all this mess is that in all those proprietary formats, data is intermixed with code, and the whole mess is very hard to parse.
Data should be kept completely isolated, as far away from code as possible. That way, if you cannot interpret the code any more, you will still be able to analyze and parse the data. You know, it is not that hard to construct a record player.
AccountKiller
My first LaTeX publications from 20 years back and all my human-readable ASCII scientific data can still be read and used without any problem. Human-readable file formats in the UNIX tradition completely solve this problem.
This problem is only hard if the people making the data formats are either stupid or do not want their formats to be easily accessible to other applications, as Microsoft does. Of course, others are creating just as fundamentally broken formats for either of the same reasons.
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
Guys this is NOT Microsoft specific. This has come up with NASA and old telemetry stored on tape and programs stored on punch cards.
The problem is that to store documents for long periods of time you have two methods
1) photograph on microfiche
2) print on high quality low acid paper.
For pictures you have one method only and that is to get your photograph put on a negative.
That is it
This is because there is no official electronic backup standard that has been proven to last. None, zippo, goose egg.
Don't complain about Microsoft, or any other company. We (the public) have not demanded permanence for our electronic records, so we have none.
So if you want your data to stick around, get it printed.
I have a habit of telling people what they do not want to hear, and people do not really want to hear this. If you do not make your data permanent, by the methods which I have specified, it will be gone very quickly.
Chip
Try telling a Syrian villager that you are very worried about not being able to open 1997 Powerpoint presentations 100 years from now.
You can get emulators for just about every machine you can imagine: PDP-10, PDP-11, DOS, Atari, Amiga, C64, microcontroller, etc. You can get hardware emulators with FPGAs if you like. Almost any important format is documented or has been reverse engineered. Yes, you can easily read 1997 PowerPoint files, even if his weird choice of Office on Mac can't. And that's only with current technology. Give it a few decades and all that can happen behind the scenes and computers will just automatically perform even the most complicated data conversions behind the scenes. "Computer, scan the 1997 floppy and put the data on screen."
Who hurt you? :-(
Sand's overrated... it's just tiny little rocks.
Backward compatibility is not a hard problem, Vint Cerf just isn't very good at it as evidenced by the IPv6 fiasco.
When all you have is a hammer, every problem starts to look like a thumb.
What's he doing keeping stuff in MS apps for? Then when they don't work 5 years later he's all like OMG THE NET WILL BREAK.
Idiot. He knows better. Or should.
Need Mercedes parts ?
Sure I am sometimes saddened at the thought of the video games of my youth being lost forever, but even if they weren't it wouldn't recapture the joy I felt upon encountering them at the time. Do you think you are more important than that? Think of the current year and then start going back a decade at a time and name one person you know of from that time. How long before you run out of people you know personally? Before you run out of people you have even heard of? I bet most people can't even make it a century. Millions of men fought in the world wars, many of their stories are still recorded. How many people bother to look at even one? My grandfather recounted a story of seeing the first automobiles in his town, how many people even think of a time when they didn't exist, or the time when they were new to the world? Precious few I reckon.
If you want to worry about what history will think of THIS time, perhaps you should be a more careful custodian of previous ones.
I'm trying to teach myself to set people on fire with my mind... Is it hot in here?
I have simulation programs trapped in Working Model for Mac format. I have 3D animation projects trapped in Softimage 3D for Windows NT. Neither is easily convertible to anything else. (Worse, they're on DAT tapes.)
Images, video, audio, and text documents are easy to convert because there are modern formats that directly correspond to them. But some things don't translate well.
Or we could all use The National Archives Digital Preservation tool called Xena which converts proprietary non-public formats to open source formats see https://sourceforge.net/projects/dpsp/?source=directory - it is how the Archive are dealing with the problem.
In my presentation, you'll see that the strategy of embedding XSL in an XML file has the code in the top half and the data in the bottom half, clearly delineated. They are easily separable. But by having them in a single file, they will not get separated by someone copying them.
Again a reason that closed formats for storage of data should not be legal.
The only way to ensure users the right to their own data and historians access to old data is to store data in open formats.
Whether software should be "free" or not can in my opinion be discussed, but I find it hard to understand that the world cannot see that the use of closed formats for storage is similar to encrypting the user's data without asking.
Give us the freedom of plain paper back !
Old samplers are rather a victim of that. The hardware is often fine and can still crank out some awesome sounds, but they are often diskette based and storage technology has moved on hundreds of times faster than synthesizer technology.
The Ensoniq scene has almost abandoned the EPS series because they used double-density drives and DD 3½" floppies haven't been made for years - and HD floppies aren't reliable in DD drives. Nowadays even HD diskettes are losing their stored bits. *All* the people keen to keep the ASR-10 alive have shifted to SCSI solutions because floppies are just not reliable anymore.
Wade.
I guess that justifies the stash of all those old computers I'm hoarding...
Bullshit. You're merely enjoying the consequences of voluntary DRM. If you don't care about your data you'll lose it, just like those pictures you used to draw in crayon that hung on the fridge. If they ARE important then you can keep them and use the data indefinitely.
I still run the GWBASIC programs, and even 16 bit x86 DOS code I wrote as a child to edit images and color palettes via keyboard in (M)CGA video modes which BIOS still emulates, and OSs like Free DOS can still make use of (Watercolor isn't extinct because Oil paint exists, Platforms are to game makers what Canvas and Paint is to Painters). Hell even my very 1st 386 bootloader can be written to an MBR and booted on a brand new x86-64 system (disable Security Theater Boot). This is NATIVE support. With an emulator, I can even run programs I wrote for my dad's old PDP-8 -- A completely different architecture... 12 bit bytes!. I cared enough about the little dinky things I did as a kid to make sure they were preserved across every major storage format change. I can still read the comments my dad thankfully added to some of my code all those years ago -- a valuable lesson indeed; My kids find gramps' snark quite funny. That's several generations of data compatibility for my family's directory tree...
It's not useful to bitch about compatibility by citing programs created by companies that willfully suck at compatibility. MS DOS requires an emulator, but DR DOS can still be installed on my new systems. Though it doesn't recognize my sound card I can still program a driver for it though -- just like I did to get my old custom IR transceiver devices to control my new home theater setup (lights, screen, volume, etc) via my aging Osborne-1's serial port.... It's a functional "conversation-piece" to hear that familiar 5.25" drive access as the signal tables are loaded for TV instead of the stereo. That same data format which has been in use now for decades and even works on new hardware w/ Linux via LIRC now -- thanks to the kids... old Ozy will give out someday. That's a future proof protocol compatibility across several generations of hardware, simultaneously.
There is NOTHING stopping me from converting the palettes and images created in my PAL_EDIT.COM into a GIMP .PAL / indexed .TGA or .TIFF, or .PNG, etc. I can (and do) frequently convert files in both directions, to go from GIMP to PAL_EDIT.COM to get new images and new "mods" into my really old game "engines". That's the thing about open formats and programs with source code available. (Remember the pushback against non-textual network protocols, even in email?) We won this battle already. I wasn't aware anyone had stopped fighting it. This page is written in TEXT. It's JavaScript and HTML... FFS: The 1st damn web page on the Internet still renders.
The authors can ALWAYS create data converters if they want, the problem is giving up that right and not demanding source code access. If my own data formats can survive the transition from kid to teen to adult and even be shared and passed on to my own kids (who love "real" retro games, BTW, such hipsters), then surely multi-billion dollar companies can do it too. Or, are you implying that despite all that money they are more inept than I can even imagine? If so, that's a pretty big dig at Microsoft there Vint... Bravo. Kind of makes me wonder WTF you're paying them for, eh?
I expect this kind of BS from you now Vint. I mean, you don't even realize the usefulness of your own contributions to mankind, Saying that the Internet is not a human right. Look up human: A characteristic of humans; A human being. It is a human right. It's the right to bear technology. That's what the 2nd amendment is really about, they just worded it wrong, they're imperfect. Just because some old farts can't understand the future the way we do now, doesn't make new technology NOT a human right. The Internet is the equivalent of access to spee
Vint Cerf should learn when to shut the fuck up.
By definition they're NOT Daguerreotypes.
Daguerreotypes are single, positive images on silvered copper plates. Anything on a glass plate from the late 1850s up to the 1880s will be a negative image produced by the "wet plate" process (unless it's a positive "lantern slide", which is just a positive copy of a glass negative!). Wet plate was a development of the original negative process, the Calotype, which itself in many ways was superior to the contemporaneous and inflexible Daguerreotype.
Of course, 8" floppies are dangerously modern - I see your floppies and raise Mag Tape and Punched media (tape and card). :-)
There are free/libre software projects with great records in opening up interoperability and keeping backwards compatibility. On the other hand, fashions among proprietary s/w makers seem to change, and about now there is a tendency to stop worrying about existing users and just abandon past formats.
Any number of folk will say things like "shouldn't be difficult at all to reverse engineer", but that doesn't make anything happen. On the other hand, there are plenty of apostles of the latest version ready to heap abuse on anyone bold enough to ask for backwards compatibility, and that attitude is a big source of problems.
Longterm readability is helped when software developers take the trouble to maintain backwards compatibility across different versions of popular tools and across competing applications that have broadly similar uses. That doesn't directly help with hardware barriers, but at least it would be good if the number of needless software barriers is kept down.
[...] Most of these things will be readable just as long as the applications that created them are around, but not longer.
[...]
Incidentally, all my decades-old LaTeX documents still compile and can also be read directly. So can my 20-year-old ASCII-coded measurement data.
"I'm not blaming Microsoft,' said Cerf,
Let's call a spade a spade. It's 100% a problem due to opaque binary formats. Had the document been written in (clean) HTML or plain text, it would have stayed usable without problems.
a thief for example is, recently i was looking for an Owner's Manual for a Suzuki motorcycle in PDF form. the bike is a few years old so Suzuki does not keep it, and the only website that has it downloadable wants me to both sign up for an account with them and pay for the download. they did not make the owner's manual, so they have no rights to withhold that information either intellectually or materialistically. so i refused to sign up on their lame website and refuse to give them money, and i will keep searching for a free copy
Politics is Treachery, Religion is Brainwashing
Yes, and by god, future historians will care about YOUR spreadsheets and YOUR websites! Egotistical jackass. No one gives a shit about 99.999999% of humanity after they're gone.
I guess that makes sense if your data is so complicated that it actually needs XML, but I would still say that for simple data that can be stored in a simple to parse format like csv or tsv, it is better to keep it separate.
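To make the csv point concrete, here is a minimal sketch in Python using only the standard library. The column names and values are invented for illustration; nothing here comes from a real data set. It shows that flat tabular data round-trips through plain CSV text that stays readable in any text editor.

```python
import csv
import io

# Hypothetical measurement data, made up for this example.
rows = [
    {"sample": "a1", "temperature_c": "21.5"},
    {"sample": "a2", "temperature_c": "22.1"},
]

def to_csv(records):
    """Serialize a list of dicts to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["sample", "temperature_c"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()

def from_csv(text):
    """Parse CSV text back into a list of dicts (all values are strings)."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]

csv_text = to_csv(rows)
print(csv_text)
```

Note that everything comes back as strings; CSV carries no type information, so units and types still have to be written down somewhere alongside the file.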
AccountKiller
To a historian any information is good. Imagine how happy a historian of the future will be to be able to run a map reduce on petabytes of documents from now. No information is without interest to a historian if it can provide insight. Rummy summed it up well with his known unknowns and unknown unknowns. All data can tell a story; it would be nice to make it sing.
NASA has this same problem. They are unable to interpret most of the data from the Viking missions in the 70s. They have tapes of the data but they lost the documentation on how to interpret the 1s and 0s.
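A small sketch of why lost documentation matters, in Python. The four bytes below are an arbitrary example and have nothing to do with any actual Viking data; the point is only that the same bits yield different values depending on which layout you assume.

```python
import struct

# Four arbitrary bytes, chosen for illustration only.
raw = bytes([0x42, 0x28, 0x00, 0x00])

# Without the format documentation, each of these readings is equally plausible.
as_be_float = struct.unpack(">f", raw)[0]   # one big-endian 32-bit float
as_be_uint = struct.unpack(">I", raw)[0]    # one big-endian 32-bit unsigned int
as_le_uint = struct.unpack("<I", raw)[0]    # one little-endian 32-bit unsigned int
as_be_shorts = struct.unpack(">HH", raw)    # two big-endian 16-bit unsigned ints

print(as_be_float, as_be_uint, as_le_uint, as_be_shorts)
```

Nothing in the bytes themselves says which reading is the intended one, which is why the description of the format has to be archived together with the data.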
I'm just a realist without the delusion that somehow someone in the future will care deeply about my digital feces.
Another pointless story from a pointless old man... Wake me up when Slashdot posts a good article.
captcha: feebler
I was wondering what professions I should keep tabs on, just in case we have that talk about careers with my son-to-be... Being an expert on long-gone and "lost" data formats and collecting their respective tools just seems like a future relic (Oh, and we already keep terabytes of all those myriads of one-time-use programs and utilities we downloaded from 5 years ago, right?)
The best safeguard is the abandonment of all existing proprietary formats to freedom (so anybody can write conversion software) and the proliferation of open formats on an ongoing basis.
"I believe in Karma. That means I can do bad things to people all day long and I assume they deserve it." : Dogbert
*** Yes, and by god, future historians will care about YOUR spreadsheets and YOUR websites! ***
Actually they do. Historians are still trying to (painstakingly) find out how people in the Neolithic lived. So yes, having access to YOUR spreadsheets and YOUR websites will be very valuable for historians in say 3000 years.
*** Egotistical jackass. No one gives a shit about 99.999999% of humanity after they're gone. ***
Projection? That YOU don't give a shit about humanity, doesn't mean nobody else does.
# touch universe # chmod +rwx universe #
Microsoft from day one has been making its data incompatible with everything else. It was a lean and hungry company back then (it is fat and hungry now), and it was compatible with every existing thing on the import side and incompatible with everything on export. It fought a mean campaign against Samba. It played dirty with Netscape and the web standards. Bugs in IE were worked around in IIS and vice versa, making it very very hard to stick to a standard.
sed -e 's/Chuck Norris/Rajnikant/g' joke > fact
I'm having a similar problem. My father had started writing a book on a Macintosh 512K using MacWrite. He passed away a decade ago, but I recently uncovered a box of floppies.
Needless to say, even reading a floppy on a modern Macintosh is pretty much impossible, and even then, the older Mac documents had a data and resource fork, and recovering data from those early formats is pretty hairy.
Some of the data can be recovered, but it's unlikely I'll ever be able to completely read the book he was writing -- unless I find myself a Mac 512K with MacWrite, and then run the text through the serial port to a more modern PC.
If telephones are outlawed, then only outlaws will have telephones.
It works in the IETF, why shouldn't it work anywhere?
What can't be expressed by ASCII is anyway too complex so you shouldn't do it
Stop making Microsoft out to be the technology leaders of the world. Microsoft exists to make money for themselves.
If we can't read our own files then we can only blame ourselves for choosing file formats whose structure we have no documentation for.
Same way the TRS-80 fans do.... Take an old drive with an adapter and read it off once and transfer it to new media.
Ira Goldklang's site (trs-80.com) has TONS of old software done that way.
Your thin skin doesn't make me a troll
His position is that the data format is what will prevent data recovery - I postulate that as long as there are bored nerds that perceive a challenge, the old can and will be reverse engineered.
What WILL cause all of our digital data to finally be lost is media degradation. Every piece of data ever created will eventually be lost because the media it's on finally fails and someone forgot to copy it beforehand. (That or the sun engulfs the Earth before we finally figure out that we have to get off this planet)
I recently encountered some bit of data that was encoded in a proprietary format but didn't really need to be. Nothing about the data required the extra features available from the proprietary format.
It turned out that a file from proprietary app X generated a file that couldn't be properly displayed on other copies of the same app without first being converted to a non-proprietary format.
Some people do really perverse things to avoid giving you data in a reasonable format.
A Pirate and a Puritan look the same on a balance sheet.
The Open Document Format(tm) was intended to ensure that documents have longevity. They looked at what companies like microsoft were doing, with every version 'incompatible' with prior versions. (It's not a random thing either, microsoft goes out of its way to make *certain* that new versions are incompatible with old, so that people are *forced* to upgrade.) The Open Document Format(tm) was created with users in mind such as the Vatican Library, which has a large number of documents over 1000 years old, a good number of documents over 1500 years old, a smaller number of documents over 2000 years old, and less than two dozen shelves full of documents more than 2500 years old. Being able to read old data is important to them. Being able to read old data is an abomination to microsoft. Hence ODF. But microsoft tried to kill ODF with their OOXML which has proprietary undocumented containers within the XML, which makes reading anything older than 1 version impossible. Thanks again microsoft.
In the future decompiling programs will be easier. It's already possible, although tedious. Having the source code to a program would make backward compatibility, modification and porting to new platforms possible where it wasn't before.
A theoretical tool for decompiling old console games will be known as a triplicator (redundant duplicator). An emulator, debugger, compiler, video recorder and more combined together. Just playing a game would automatically generate a wide variety of metadata, generic labels and identify game data formats. Most of a game is tied to the interface, so being able to glance at snapshots, gamepad input, routine parameters and more while looking at the assembly language would be very useful. And triplicator could generate some pseudo code (or C with inline assembly, or any higher level language), which would be easier to work from than raw assembly language. Recompiling sections on the fly (combined with saved states to avoid lock-ups), to get feedback for identifying variable names and what routines are for. After a certain amount gets decompiled, it can be recompiled with SDL and run natively instead of inside an emulator (playthrough of the game is recorded, so triplicator could use that as a script to playtest the SDL port automatically, finding any differences). The end result is perfect source code, a native port, extracted data and hundreds of megabytes of documentation. All created in a fraction of the amount of time it would take to use separate tools (weeks instead of years, probably).
So, the internet never forgets about that time you got drunk and posted stupid photos, but it forgets everything else? God damn.
If Google themselves gave more than 3 months notice. Years from now? Might not be readable 3 months from now!
Put everything in the cloud Google says! And you got 3 months to get it out...
If you have a pile of data and you don't keep the tools to read it, then you are the fool. Keeping original tools should have been the smart move, or updating files that were valuable. Sorry, I have old software and hardware to cover my past, why don't more people do this? Sounds like a niche market for someone with old hardware or emulators.
Great. Now make your solution continue to work 20 years from now when the Windows XP activation service ceases to exist, which is what TFA is actually about.
"...software lifetime is only like 7 or fewer years..." Do you have a source for this, or is this your guess?
I'm not asking to disagree, quite the opposite: for seven years (coincidence) now, I've been arguing for storing grammar data in an XML format precisely because storing it in the programming language of a particular grammar parser means it will be unusable in the not-so-distant future. While I have anecdotes (I once wrote a parser using three programming languages, and all three of them became obsolete within a year or two), I would love to have a study to cite.
And I have email from the 1990s that I canNOT read today. It's called Lotus cc:Mail. (I could read it if I was willing to pay.)
"Digital data lasts forever -- or five years, whichever comes first."
--Jeff Rothenberg, 1997
Office file formats, no matter what office suite or version, were never meant to be archival formats. They were more like savegames, little "memory dumps" allowing you to continue the game where you left off, no more no less. In fact some early systems even just dumped the memory onto diskette (e.g. the Canon Cat). That's why such formats have non-portable options like OLE objects which are nearly impossible to open on another computer. If such a file ever moves from one computer to another you are screwed.
If you want to have something you want to be able to read in a few years or send to someone else, you must use archival formats. Those formats must be as trivially simple as possible. Possible candidates for archiving "printed" documents are TIFF (bitmap format, supports multiple pages) and archival PDF. Be sure to include a dump of the text in a separate text file so it's trivial to search. You don't need to change things in your archive. If you want a newer version re-create it again.
Never ever ever store data in file formats you cannot read yourself. Complex (binary) file formats are acceptable only as long as they don't have to be backed up. That's why SQL-Servers tend to store their dumps as simple text files.
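As a hedged illustration of the text-dump idea, here is a minimal Python sketch. It uses SQLite from the standard library as a stand-in, which is an assumption on my part; the comment above does not name a specific database. The table and its contents are invented for the example.

```python
import sqlite3

# Build a tiny throwaway database in memory (hypothetical table and row).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO notes (body) VALUES (?)", ("hello, future reader",))
conn.commit()

# iterdump() yields the database as plain SQL statements, one string each.
dump_text = "\n".join(conn.iterdump())
print(dump_text)

# The dump is ordinary text: it can be read by eye, or replayed into a
# fresh database to rebuild the data.
restored = sqlite3.connect(":memory:")
restored.executescript(dump_text)
restored_rows = restored.execute("SELECT body FROM notes").fetchall()
```

The binary database file is what you work with day to day; the text dump is what goes into the backup, since it remains readable even if no program that understands the binary file survives.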
Windows loader, or an army of lawyers.
Ah, a news bulletin from the land of "Where the Hell have you been for the last 30 years, asleep?"
We don't know what this means either.... proprietary format... encrypted... and it cost a lot to send it.... alas it never arrived.
AOAKN HVPKD FNFJU YIDDC
RQXSR DJHFP GoVFN MIAPX
PABUZ WYYNP CMPNW HJRZH
NLXKG MENEK ONOIB AREEQ
UAOTA RBQRH DJoFM TPZEH
LKXGH RGGHT JRZCQ FNKTQ
KLDTS GQIRU AOAKN 27 1525/6
NURP 40 TW 194
NURP 37 DK 76
lib 1625
ToR 1522 copies sent 2
signed W. Stot, S(j/g)T.