Retro Machines Key to Rescuing Old Data
SimilarityEngine writes "New Scientist report on the virtues of old kit. From the article:
'Today's stylish PCs may perform billions of calculations a second and store tens of billions of bytes of data, but for many, they have got nothing on the 32, 48 or 64-kilobyte machines that were the giants of the early 1980s.
This renewed interest in old-school computing is more than just a trip down memory-chip lane. Early computers are a part of our technological heritage, and also offer a unique perspective on how today's machines work. And within growing collections of original computers and home-made replicas, and the anecdote-filled web pages and blogs devoted to them, lies the equipment and expertise that will one day help unlock our past by reading countless computer files stored in outmoded formats.'"
There seems to be growing interest in the Commodore community. On the irc.eskimo.com #c64friends channel, there's a bunch of people developing software and hardware for the C64 and 128. There's even one guy working entirely in the CP/M mode of the 128. Since I had to pack my 128 system up to move, I haven't done anything with it lately, but after the new computer room is set up in the house, I'll be back in full swing. 16MHz 65c02 processor, 16MB RAM, 2GB HDD... it's not your father's Commodore.
-- Liberalism is a mental disorder.
I just gave a speech to a bunch of progressive groups in Kentucky Saturday that included a screed on data loss. Twenty-two years after it was filed, for example, a lawsuit on fair taxation and coal reserves finally made it through the courts. My question was: how good a job are we doing preserving the records and data for those cases that take 30 or 50 years, like tobacco or asbestos? I'm looking ahead to the lawsuits on global warming.
If you want to see the talk:
http://www.hollowground.net/tecactv
wh
You'd be amazed at what we've got running under Hercules... there's a lot of computing history being lost because people threw away old round tapes, thinking "Oh, we'll never run THAT again". A guy used an emulator to rescue old census data from Africa (was the story reported here? It wasn't that long ago), and that kind of thing will only be seen more as time goes on.
If you know of old IBM mainframe software on tape, drop me a note; chances are I can recover it. I've got 9-track and 3480 cartridge tape drives on a PC just for that purpose.
Disinfect the GNU General Public Virus!
I regularly read 9-track tapes written in the late 60s.
The tapes I have the most problems with are actually from about 1984-1987 or so...Memorex and BASF switched to a binder (the stuff that keeps the oxide on the tape) in those years that tends to migrate to the surface, making the tape stick to the read/write head and preventing it from reading correctly. There are ways of correcting the problem long enough to read the data, but I haven't been able to try any of them (the best, supposedly, is to run the tape through the same process used to freeze-dry food commercially).
Disinfect the GNU General Public Virus!
The universal format for documentation, I believe, is the printed hard-copy document. Think of it this way: If we received the Rosetta Stone, or bits of the Torah or Quran, on some electronic media, would we have been able to get the content off - especially if it was encrypted somehow?
I think the only universal format is the printed page, which requires no "special equipment" to read (it might not be interpretable, but it can easily be recognised as a document) whereas a computer-recorded pile of numbers, while perhaps recognisable as having meaningful content, will probably, in the future, have no context in which to extract its meaning. Consider this: you receive some piece of hardware in the future which you realise stores binary data. Is it numbers? Is it a program? Is it sample data from atmospheric noise collection? All you know is there is binary data, and you don't even know if it is stored in 8-bit blocks, 16-bit blocks, 3-bit blocks, or whatever. You don't know if it's in ASCII or some weird encoding of, say, Farsi. You might try running some statistical analysis on it to see if it's some kind of language, but against what do you compare the 'glyphs' of the numbers? When you see a stone like the Rosetta Stone, it's obvious what you've got; when you've got a list of numbers, there is no way to tell what it is other than a list of numbers.
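For anyone who wants to try the "statistical analysis" idea, here is a rough Python sketch, nothing more. It assumes the data has already been split into 8-bit bytes (itself a guess, as noted above), and the function name `shannon_entropy` and the two sample inputs are made up for illustration. It measures how evenly the byte values are spread: repetitive text scores low, evenly spread values score the maximum of 8 bits per byte.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of the data in bits per byte (0.0 if empty)."""
    if not data:
        return 0.0
    counts = Counter(data)          # how often each byte value occurs
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical samples: repetitive English text vs. every byte value equally often.
english = b"the quick brown fox jumps over the lazy dog " * 20
uniform = bytes(range(256)) * 4

print("english:", shannon_entropy(english))
print("uniform:", shannon_entropy(uniform))
```

A low score hints that there is structure in there somewhere, but it says nothing about what the 'glyphs' mean, which is exactly the problem.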
This is a great danger of the digital age, in my opinion, and it is good that there is still expertise floating around about the "old" equipment. But remember, the "old" equipment is still less than a century old: what will happen in 100 more years? 400? I have this nagging concern that data integrity of digital media will not last the thousands of years that printed material lasted for future generations. I think this is why I really don't like the idea of digitising the libraries, or even digitising photography.
Definitely something to consider for all those folks concerned with "the best data format" and if .DOC or .PDF or XML or whatever is better.
The best format is one that contains enough information to clue the interpreters in on how to interpret it rather than relying on something else. Right now, all digital documents are merely a string of numbers, and a string of numbers is not sufficient to contain meaning to interpret itself - those numbers rely on some interpreter to receive meaning. (As an exercise to prove this, take any file on your computer and look at it in a debugger or, on various systems, a hex editor, and then in a program that will use the contents of any file as raw image or audio data. It might not be rendered sensibly (I don't know that I'd want to listen to the "song" that, say, Firefox would be), but there is no effective way to tell if the string of numbers has meaning by using trial and error.)
A printed document unequivocally has more information than this - a schematic diagram is different than a picture of an apple is different than a poem... and while we may not know 'apple' or the language of the poem or have the capability to understand the diagram, we know that those things aren't, say, a random paint splatter.
So, again, while I applaud the efforts of these guys for writing down their knowledge, if they don't do it in a "universal" format, who will be around to interpret their blogs and digital records in 1000 years?
"There are a dozen opinions on a matter until you know the truth. Then there is only one." - CS Lewis (paraphrase)
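If you don't have a hex editor handy, the same exercise can be sketched in a few lines of Python. This is only an illustration: the helper name `interpretations` and the 8-byte sample are invented here, and the handful of decodings shown are arbitrary picks from the endless ways the same bytes could be read.

```python
import struct


def interpretations(raw: bytes) -> dict:
    """Decode the same 8 bytes several ways; the bytes do not say which is 'right'."""
    if len(raw) != 8:
        raise ValueError("this sketch expects exactly 8 bytes")
    return {
        "hex": raw.hex(" "),                         # what a hex editor shows
        "latin1_text": raw.decode("latin-1"),        # as 8-bit characters
        "uint16_le": struct.unpack("<4H", raw),      # four little-endian 16-bit ints
        "uint32_be": struct.unpack(">2I", raw),      # two big-endian 32-bit ints
        "float64_le": struct.unpack("<d", raw)[0],   # one little-endian double
    }


# Hypothetical sample: eight bytes that happen to be readable as text.
sample = b"Commodor"
for name, value in interpretations(sample).items():
    print(f"{name:12} {value!r}")
```

Every one of those readings is "valid" as far as the machine is concerned. Only outside knowledge tells you the text one was intended.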
CDs are not as durable as many think. Check this article for a wake-up call.
http://www.rense.com/general52/themythofthe100year.htm
From the article:
"But an investigation by a Dutch personal computer magazine, PC Active, has shown that some CD-Rs are unreadable in as little as two years, because the dyes in the CD's recording layer fade."
Until they get to the patient records archives at the CDC or even a local hospital's TB clinic. Then they can learn a whole hell of a lot about how a disease used to spread and its epidemiological characteristics in a society that doesn't have "modern" medicine to control it.
I worked for Georgia's Division of Public Health in the 1990s. One of the most interesting projects I worked on was to recover data from the Medical College of Georgia's TB clinic. It was all on 9-track tape and was recorded from 1966 to 1973. The doctor who wrote the software was in his late 70s when I met him. He still understood the data encoding that he created for his clinic's dinosaur computer system and was working independently to import it all into a PC-based database. The concept of relational data was practically alien for minicomputers of the era; the way he had to encode the clinic's data to build statistical models out of it was fascinating, but it would have been lost forever if the original coder weren't still alive.
This is not my sandwich.
Tapes are relatively easy, as the 64 can read most of them. The hard part is that some disk formats are hard to come by. The Commodore PET has several different format drives; the most popular are the 4040/2031, which a Commodore 64 can read, but the 512k single-sided 8050 and double-sided 8250/SFD-1001 disks are another matter, both using quad-density drives (in no way related to the PC HD format) and GCR encoding to increase capacity. These drives (unless you are a hardware whiz) communicate exclusively using IEEE-488, so a PET/CBM or B128 is best employed.
I myself use the PC-to-PET interface, the C2N232, with related software to get the files from the PET to the PC; from there it's a matter of some home-spun Chipmunk BASIC programs to get the files tidied up and in ASCII.
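To give a flavour of the tidying-up step, here is a minimal sketch in Python rather than Chipmunk BASIC. It is not the actual conversion program, and it makes assumptions that will not hold for every file: that the input is plain sequential text in the lower/upper-case ("shifted") PETSCII character set, that lines end in a carriage return, and that anything else can be replaced with "?". The function name `petscii_to_ascii` is made up for this example.

```python
def petscii_to_ascii(data: bytes) -> str:
    """Convert shifted-mode PETSCII text to ASCII (assumption: plain text only)."""
    out = []
    for b in data:
        if b == 0x0D:
            out.append("\n")              # carriage return used as line ending
        elif 0x41 <= b <= 0x5A:
            out.append(chr(b + 0x20))     # these codes are lowercase in shifted mode
        elif 0xC1 <= b <= 0xDA:
            out.append(chr(b - 0x80))     # these codes are uppercase in shifted mode
        elif 0x20 <= b <= 0x40:
            out.append(chr(b))            # space, digits, punctuation match ASCII
        else:
            out.append("?")               # graphics/control codes: no ASCII equivalent
    return "".join(out)


# Hypothetical sample: "Hello" followed by a carriage return, as PETSCII bytes.
print(petscii_to_ascii(bytes([0xC8, 0x45, 0x4C, 0x4C, 0x4F, 0x0D])))
```

Real files need more care than this (different character-set modes, tokenised BASIC programs, word processor formats), which is where knowing the quirks comes in.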
To be consistently successful at it you have to not only have the tools but knowledge of the various disk and file formats and system quirks that you are dealing with, which will help you get around the unexpected.
I've had requests to help convert 64 related software, but have passed on that as I am not into real time programming work (some sort of lighting program on a cartridge) but there are others up to that challenge.
Same goes for other platforms, like old 400k Mac disks, which use a variable-speed drive and can only be read, IIRC, on a 68k Mac using System 6 or lower. There are also the protected disks or those that were recorded with utilities to improve speed or capacity (which makes the disks/tapes differ from any known standard format). Not everything can be done with an emulator.
"Enjoy what you're doing! If it becomes drudgery, you're doing it wrong!" - Jim Butterfield