Software Archaeology
Plug1 writes "Salon (day pass needed) has an article about preserving software for historical purposes. It discusses source code archiving, and the effect the DMCA is having on attempts to catalog and analyze legacy code. It will be a shame if in the future a wealth of information is locked away because knowledge of the underlying technology is lost."
10 print "Hello World"
20 goto 10
Those were the days....
Omnis amans amens
That the DMCA DOES NOT APPLY outside the USA. However, hardware Digital Restriction Management DOES.
I really don't want strong crypto keeping me out of stuff that I OWN, or MY CONTENT.
It'd be a neat experiment to create a Linux driver that emulates TCPA chips so that stupid software thinks you're auth'ed.
This would explain the pyramids, if in the past IP laws of ancient cultures prevented sharing of ideas.
Who could ever forget the awesome software company Central Point Software? Their PC Tools and famous Copy2PC were high quality, and very useful products. Anyone that was anybody had Copy2PC, a program that could copy nearly ANY copy protected floppy disk. They even came out with a floppy controller that did the same thing.
July 30, 2003 | For Grady Booch, the nightmare goes something like this: Deep in the future, a team of archaeologists stumble onto a rare cache of 20th century art, a major assortment of works thought lost to the ravages of time.
The only problem, of course, is that they don't know it. All the images are recorded in an obsolete digital format, JPEG, and nobody knows how to unscramble the data. As a result, the hard disk containing said artwork spends its days not in a museum but as a coffee coaster in some college professor's crowded office.
"It might seem silly now, but put yourself 1,000 years in the future," says Booch, chief scientist at IBM's Rational Software subsidiary. "It's not too hard to imagine."
In an industry where one man's clever C code is another man's Linear B, Booch already knows the frustration of playing software archaeologist. As co-developer of the Unified Modeling Language (UML), a mid-1990s effort to create a common "blueprint" notation for object-oriented software programs, he's spent the last 10 years laboring to spare future programmers the same torment.
It's an uphill battle on a hill that is only growing steeper. With new programs replacing old and no major company or institution playing the central role of source-code archivist, the amount of software history currently circling the memory hole is scarily large. And even if there were a central institution, recent changes to the copyright code have made the transfer of source code from old media to new forms of storage a dicey prospect, legally. Add it all up, and you have the ideal makings for what some are already calling the "digital dark age."
"Things are going to be lost not because people don't want to save them or because the original creators don't want to save them, but because they can't save them," says Brewster Kahle, founder of the Internet Archive, an institution that has lobbied for a safe harbor within the Digital Millennium Copyright Act to shield institutions looking to archive source code.
For Booch, the barriers to software preservation aren't so much legal as educational. Most developers have come to accept the evolvable nature of software programs. What is lacking is the ability to examine static source-code snapshots with a scholarly, comparative eye. In the interest of encouraging that skill, Booch this fall will lead a seminar on software archaeology and preservation at the newly reopened Computer History Museum in Mountain View, Calif.
"Our industry has had a major effect in changing the world," says Booch, talking over the phone from his Denver, Colo., office. "It would be great if we could preserve the artifacts and interview the architects while they're still alive."
Booch isn't alone. Now that the hysteria surrounding Y2K has faded, developers are free to worry about legacy code again. One increasingly common worry is what to do with it? For every modern offshoot of DOS/Windows, Unix and Macintosh OS evolving with the marketplace, a dozen ghost programs lurk inside yellowed engineering pads, punch-card stacks and slowly degaussing magnetic memories. Even if programmers could get their hands on these programs and find a way to preserve and update their contents, a new question emerges: How do you qualitatively analyze those contents on a historical basis?
"It's funny," says Dave Thomas, a Dallas software consultant and co-author, with Andrew Hunt, of "The Pragmatic Programmer," a 1999 book on software design methods. "Colleges spend a lot of time teaching people how to write code, but very few teach them how to read code. When you think about it, we programmers spend most of our time reading code, not writing code."
To help fill the gap, Thomas served as cohost of the 2001 Software Archaeology: Understanding Large Systems workshop, hosted by Object Oriented Programming,
If you're going to preserve software, doesn't it make sense to preserve the hardware to run it on as well? Emulation is less than perfect.
"It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
I can hardly see DOS or the like being useful in the future, can you?
I have over 70 freaks, do you?
If the problem is that knowledge of the underlying foundations of technology is being lost, it is because of the concept of abstraction, of which .Net is the latest and greatest incarnation.
It really all started when some engineers decided that machine code was too hard and invented assembler. Nowadays it's not even necessary to know what a bit is or how an ALU works to make programs. Just point and click and you've got yourself a brand spanking new database app courtesy of VB.
No one ought to knock VB because it really is the best tool for what it does, but it also lowers the barrier to entry for would-be programmers. This can only lead to worse programs.
The most fundamental concept in computer science is logic, not algorithms (or worse, programming languages). If a 'programmer' hasn't written a program in a low-level language like C or assembler, the hiring manager should beware. Without hands-on experience with the fundamentals of computer science, that person is lacking at the most basic level, regardless of whether he knows 1 language or 50 languages. He is handicapped.
It's a good thing to abstract, but it's also important to remember and study the bases of our science.
Indiana Jones and the Raiders of the Lost Archive
Unless I am mistaken Salon, like most websites trying to make some money, is having financial problems.
... without their permission isn't that plagiarism?
They changed to a registration/fee based model, but allowed 1 day passes for whatever reason.
Nothing can hurt them more than being slashdotted by a bunch of people using a day pass.
someone has already copied the contents of the article into a comment which is good because it saves them bandwidth, but
This is why things like the DMCA and DRM come about - people thoughtlessly violating other people's copyrights/etc, and/or taking their services for granted.
I'm no better than anyone else, I do the same thing.
I guess my point is: either support the people who provide services you enjoy (music, video, news, web content, porn, whatever), or quit complaining when they finally start defending themselves.
no comment
10 print "Hello World"
20 beep
30 goto 10
Even years ago I was much more 1337 than yu0 !
This probably falls under the category of fair use.
If it doesn't, then there is still the matter of the government (the US at least) being able to do whatever it pleases with copyrighted material. In this case the government's authority to copy what it wants is a good thing.
The Library of Congress is already making archival copies of copyrighted music, and it is going to continue this despite any hypothetical protestations of the RIAA. Why? Because it is deemed necessary for the preservation of culture. It will ultimately be the government that has the authority to do the kind of backup that is necessary to preserve our programming heritage.
It is our job as citizens to open the government's eyes to the need to copy this code before the technology that will allow us to do so becomes obsolete and otherwise unusable. Like any other technology, programming will continue to advance, but it is important to remember the simpler roots of the technology in order to provide the kind of perspective that lets us know where we've been and where we might be going.
It's the burning of the library of Alexandria all over again. This time, on the fires of corporate profit. Just remember, as we slide into another dark age, you're the ones that used Microsoft Office!
the preceding comment is my own and in no way reflects the opinion of the Joint Chiefs of Staff
So, I should be saving the 200 lbs of DEC VMS manuals, our old VAX, all the tapes, and keep our TU-85 tape drive under service contract? How much is this all worth? Do you have any idea how much it costs to keep that hardware running? If you want to keep the code, what is the point if you don't have hardware to run it on, unless you're going to develop some emulator? Don't get me wrong, I think it's a horrible shame that all those hours of engineering to develop the hardware and software are finally being trashed. There are some amazingly great ideas that were used to make that stuff. But at what cost do you preserve it?
CD's degrade over time, their lifetime is estimated to be 100 years maximum. CD-R's can become unusable after a couple of days of being exposed to mountain sun, and will probably not last more than 15 years. In the meantime, the computer equipment will develop to a point where CD's are not needed any more, because there is better technology available. So it will become necessary to store the devices that were used to read them (i.e. whole computers). But these devices are partly made of stuff that decomposes over time, like rubber in bearings etc. Conserving data is not as easy as it seems. I wonder whether it'd be more efficient to print out the source codes on acid-free paper and store them like books - or perhaps microfiches - in a number of locations around the world.
where's all that Karma?
It'll just beget a new academic field: Nerdiology.
Consider conferences on Geek Culture someday, where Prof. Bipperton Fusslebeak delivers a sad, academic commentary on contemporary culture:
"An Analysis of the Correlation between Increased Use of Open Source Software, and Slashdot Posts Centered Around Deviant Sexual Behaviors in the Post-.Com Era".
Get thee glass eyes, and, like a scurvy politician, seem to see things thou dost not.--King Lear
"The only problem, of course, is that they don't know it. All the images are recorded in an obsolete digital format, JPEG, and nobody knows how to unscramble the data."
I'm doing my part to make sure that the porn images of the Internet don't meet this similar fate. I have recorded my voice describing each of the images in my collection, and encoded it into the open-source OGG format. Much of the recording has consisted of little more than "Mmmmmmmmmmm, yeah baby", but I think that speaks volumes.
...places like The Underdogs are so crucially important, at least on the gaming side of things. They're a truly indispensable repository of old games you can't find anywhere anymore, for Mac and PC alike.
SNACKS ARE AWESOME
I went looking for it again a couple of years ago, but it has been lost. It was written in a language which no longer exists: OPS-4. Even the original source code has disappeared. All that is left is a partial port, to another language which no longer exists (OPS-5). Here is a brief description by the author.
Looking at the source code for the partial port gives some of the feel of the game:
A number of years ago Scientific American had an article lamenting the loss of intellectual assets with the inevitable degradation of old software, documentation, media, computers, and the like. Yet the same issue had another article on changes in the canned-goods industry (the rise of new canning technologies). While the first article bitterly mourned the loss of software-related knowledge and assets, the second article made no such mention of the corresponding loss of canning-related knowledge and assets.
Why is obsolete software technology worth preserving where obsolete manufacturing technologies are not? In 100 years, will we really need access to the billions of JPEGs that were spewed out by digital cameras everywhere? I am not arguing for ignoring history (even though those that learn from history are also doomed to repeat it), but I am wondering about the double standard. What realms of human knowledge and invention are worth saving, and which are not?
BTW, for the record, I still have old documents and applications from my Mac 128k and I might even have a paper tape copy of an old APL program that I wrote 25 years ago. But then I am a certified packrat.
Two wrongs don't make a right, but three lefts do.
This article reminds me of a joke one of my CS professors told us (I hope I remember it right):
The year was 2015. Joe, a programmer, was getting up in years and decided he wanted to have his body frozen after he died. He made the arrangements, and when the time came, he was frozen and placed in a government facility. Time passed, and he was forgotten.
Jump ahead a few centuries... suddenly Joe finds himself conscious again! He is on a lab table surrounded by strange looking people in uniforms. Their leader, speaking through a translator, welcomes Joe back to life.
Joe is amazed! There are so many questions he wants to ask, but first he says, "Why did you bring me back to life?"
The leader answers, "Well, the year is 9999. Y10k is coming up, and your file says you know Cobol."
I was wondering how they were going to use an aged Harrison Ford in the next Indiana movie! Obviously, he will have become a "software archeologist," and thus never have to leave his cubicle.
*snaps whip*
"Fetch me another Mountain Dew, Shorty!"
If you have the source code for something then you have no cause to fear the DMCA, since you don't need to decrypt it. And if you don't have the source code, where is the value? Is there really any value in running lotus 123 for the Apple//? Perhaps if you have an Apple//, but so what? You cannot "fly over the code" from any height (as was mentioned in the article) because you don't have any code to fly over. You have an executable, and the "structure" there is quite different than looking at source code.
If you want source code for DOS, hit freedos.org and download it. It's not Microsoft's source, but so what? It does the very same job and, in many cases, it's superior to the original. Works that have value will be replicated and emulated; works that have no value simply have no value - where is the need (or logic) in "preserving" them?
Take the Domesday Project (in the UK) as an example. An Acorn Archimedes laserdisc full of content relating to life in the 20th century. The problem came when they wanted to get the data off... and couldn't easily find a compatible laserdisc reader.
Of course, the format of the data is an issue. But if you can't get the data off the media, then the format of it isn't going to matter in the slightest.
Avantslash - View Slashdot cleanly on your mobile phone.
One of my favourite bits in 'All Tomorrow's Parties' (If memory serves - it's a while since I read Gibson) is where the computer shop keeper explains that 'real bright people' building computer systems like to buy stuff from our era. ;)
He goes on to explain that they use these 'ancient' systems to understand and gain insight into current systems, adding that nothing really changes, just gets added to (and that no one really understands the full system).
I believe Gibson's insight will be proven real, and that Software Archaeology is *essential* for the future, DMCA or no DMCA.
The alternative is stagnation in the evolution of computer systems. This cannot happen, although it might in America.
The part/parts of the World that don't succumb to DMCA fever will become the new tech leaders (and probably a great immigration target for us lot!)
They have a section for software where they are getting old software from the likes of Macromedia and others for preservation. Haven't seen any source code listed, but it's still a good service for history.
the history of the world
I responded to this above once already, but because this is dear to my heart, I'll do it again. Of course Salon isn't going to care if anyone prints out a copy and tapes it to their cube wall. But if a Web site grabs the text and posts it in a place like Slashdot, that deprives us of literally thousands of readers. Many of those readers might otherwise watch an ad and grab the daypass, which is good for our financial health, and some percentage of other readers might even subscribe, which is even better for us.
Technically, it's copyright infringement, but Salon isn't going to devote resources to suing Slashdot or Slashdot readers. If we were going to go that route, we'd start with the Freerepublic assholes, who actively want us to go bankrupt and do everything they can to help us down that road. To slashdot readers, the best appeal I can make is simple.
We want to make a living at what we do, so we can keep doing it. I want to keep paying great technology writers like Rachel Chalmers and Sam Williams to do interesting stories. If we convince enough readers to watch our ads or subscribe, we'll pull off this magic trick. So basically, the way I see it, any time a Slashdot reader posts the full text of a story on Slashdot, it's a vote against our survival, which is ironic, since you wouldn't be posting the stories if you didn't think there was some merit in them, right?
Editor, Salon Business & Technology
Salon.com
If people can reverse-engineer Microsoft's file formats without help, why wouldn't they be able to work out a JPEG, or an MP3?
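For what it's worth, the very first step of that kind of detective work is easy to sketch. Most formats announce themselves with a few fixed "magic" bytes at the start of the file, so a future archaeologist could at least sort a pile of unknown files by type. This is only a rough Python illustration: the `guess_format` helper and the `MAGIC_NUMBERS` table are names made up here, and the table is a tiny subset of real signatures. Actually decoding the compressed data inside is a far bigger job.

```python
# Leading "magic" bytes for a few common formats. The signatures are the
# well-known ones, but this table is a small illustrative subset only.
MAGIC_NUMBERS = {
    b"\xff\xd8\xff": "JPEG",
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"GIF87a": "GIF",
    b"GIF89a": "GIF",
    b"ID3": "MP3 (with ID3v2 tag)",
    b"%PDF": "PDF",
}


def guess_format(data: bytes) -> str:
    """Guess a file format from its first few bytes (hypothetical helper)."""
    for magic, name in MAGIC_NUMBERS.items():
        if data.startswith(magic):
            return name
    return "unknown"


# The start of a typical JFIF-flavoured JPEG file.
print(guess_format(b"\xff\xd8\xff\xe0\x00\x10JFIF"))
```

Knowing that a file is a JPEG says nothing about how to undo the compression, but it does tell you which specification to go looking for.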
Get your own free personal location tracker
What about CD-R's exposed to mountain dew?
Actually, that's not quite what's happening.
IDSA spiders are trolling around and seeing an FTP/web site with "video game" in the text and offering files like pacman.zip and streetfighter2.zip for download.
C&D notices are automatically being sent. None of it has to do with the DMCA, but with regular old copyright law, since the IDSA assumes the games are being put up for download.
Whatshisface (who had the big manual site and shut it down) just couldn't be bothered to explain to anyone at the IDSA what the files are.
I don't think any manufacturer really gives a shit about people collecting/trading/photocopying the service and operation manuals, or even schematics for out-of-production machines.
I don't need no instructions to know how to rock!!!!
"It might seem silly now but put yourself 1,000 years in the future," says Booch, chief scientist at IBM's Rational Software subsidiary. "It's not too hard to imagine."
This assumes that (a) humans will still be drinking coffee 1,000 years from now, (b) we will still have college professors and (c) they will still have need of drink coasters.
I believe that 1,000 years from now we will consume our caffeine in pill form only, be schooled by robots and will obtain our liquids from intravenous bags.
Microsoft is already doing this. Each new version of an MS operating system or office product generally includes a pretty much unedited copy of all prior editions of the software. So they are preserving history.
With each new version, the software gets bigger and bigger and bigger. It is an archaeological wonder in itself. Another name for this coding style is bloat. Linux has many of the same things going on.
This argument about the need to preserve prior formats has been around for quite awhile. The truth of the matter is that software is largely an evolutionary process. Most file formats build upon the past, so there is a tendency for software to naturally preserve its path.
Of course, for Grady Booch, who wants to be recognized as an intellectual giant a thousand years from now, the main question is whether his name will evoke the same awe as, say, Euclid and Archimedes. He is, after all, one of the trinity of OO modeling approaches.
This is a good argument for mandatory source code deposit. To get a copyright on code, you should have to deposit a copy of the source with the Library of Congress. The Library of Congress has the authority to require this, but currently they only require a printout of the first ten and last ten pages, because they didn't want to store all the paper. That should change.
After all, in five years Salon.com may be gone from the web, and since neither Google nor the Internet Archive have a paid subscription, this story will be forever lost to the ages.
So kudos for reposting this valuable information to Slashdot! Without the efforts of others like you, internet surfers in generations to come might never understand the importance of, well, the efforts of others like you.
The difficulty of future generations being able to decipher our data without a guide is high, but the task is not impossible. The best example is hieroglyphics. Until the discovery of the Rosetta stone, Egyptian hieroglyphics were impossible to read. After, it was so much easier. On the other hand, there is no Rosetta stone for Mayan glyphs. Although it has taken longer to decipher, slowly the Mayan symbols are being translated. It took 100 years longer, but it is being done.
Well, there's spam egg sausage and spam, that's not got much spam in it.
Conserving data is not as easy as it seems. I wonder whether it'd be more efficient to print out the source codes on acid-free paper and store them like books - or perhaps microfiches - in a number of locations around the world.
One modern 80GB hard disk.
80GB = 80,000,000,000 bytes = 80,000,000,000 ASCII characters.
One standard printed US-letter-sized page is 80 x 60 characters, or 4800 characters.
80,000,000,000 characters / (4800 characters/page) = 16,666,667 pages (rounded off).
This is potentially just the data on Joe Schmoes Best Buy laptop. Now consider that the amount of data generated by humans is something like terabytes per day...
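The page count above is easy to double-check with a few lines of Python. This is just a sketch that keeps the same assumptions as the comment (one byte per ASCII character, 80 x 60 characters per page, a decimal 80 GB disk); the names `pages_needed` and `CHARS_PER_PAGE` are made up for illustration.

```python
# Assumptions carried over from the comment above: 1 byte = 1 ASCII
# character, and one printed page holds 80 columns x 60 lines.
CHARS_PER_PAGE = 80 * 60  # 4800 characters


def pages_needed(num_bytes: int, chars_per_page: int = CHARS_PER_PAGE) -> int:
    """Return the pages needed to print num_bytes, rounding up a partial last page."""
    # Integer ceiling division avoids any floating-point rounding.
    return (num_bytes + chars_per_page - 1) // chars_per_page


disk_bytes = 80 * 10**9  # "80GB" read as decimal gigabytes
print(pages_needed(disk_bytes))  # 16666667
```

Swapping in a different page layout or disk size is a one-line change, which makes it easy to see how quickly the paper archive gets out of hand.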
Healthcare article at Kuro5hin
For hundreds of years, after the science of creating corrective eye lenses was invented in Venice, Italy, the process of grinding and shaping the lenses was kept a very profitable secret. People who could not afford to pay for this very expensive Intellectual Property generally just went without. Sure. You could get magnifying lenses, but not lenses that corrected for nearsightedness.
Those of you of moderate to low income (I'm talking... making less than 7 figures per year, to put it in perspective with pre-Renaissance nobility), who require corrective eye lenses, imagine yourself unable to beg, borrow, or steal a pair of glasses for yourself. Even crude ones.
Eventually, the secret got out, and now we have a global multi-billion dollar industry.
In other words, the very concept of IP is just plain evil.
These are my friends, See how they glisten. See this one shine, how he smiles in the light.
I work in data capture for a pharma company. We're required by law to keep *RAW DATA* for the patentable lifetime of a drug, which could be 40 years in some cases. Doesn't sound too bad, but our raw data needs our application to browse it. That application needs our infrastructure - which is huge - it doesn't work as a standalone. That infrastructure only works on a particular set of hardware. There isn't an easy answer. We could say we'd bodge it and export to XML, but what about those ECG graphical traces that are in a proprietary format with annotations? It's really difficult and it's very tempting to say "print the whole lot out on several trees and put it in the paper archive"...
I am not worried about today's file formats becoming lost to people 200 years from now. In the future, when someone downloads version 32.2.0 of the kernel, they will have an option to include modules that add support to all applications for ancient file formats, really old file formats, and old file formats. Each one could take up a few hundred megabytes... but on the hardware of the future, that'll be like 640k today.
The only thing we need to do is maintain our compliance to standards! Because barring the end of the world, HTML and other standards will never die. They'll just get turned into kernel options with a default of NO.
no thanks
I have some old AutoCAD 3 files from high school, a hopelessly optimistic design for an automatic vacuum cleaner, if I recall.
My dad still has a program he wrote on punch cards someplace.
That's the trouble, isn't it? Even if the data survives, the hardware to read it might not.
- None can love freedom heartily, but good men; the rest love not freedom, but license. -- John Milton
Although the issues involved in this case are slightly different, the term 'Software Archaeology' (or at least 'Programmer Archaeologist') might come from Vernor Vinge's book 'A Deepness in the Sky'.
In that book, code-as-data is taken to an extreme, and the best programmers have the title "Programmer Archaeologist", since they spend little time writing new code; instead they look through old code to find something written for a similar situation. It isn't that old programmers are better-- it's that the software contains facts and information that are of value.
Whereas on Star Trek someone might look through an ancient captain's log to learn about a bizarre planet/new race/weird disease/strange technology, in Vinge's book that sort of specialized information is stored in the source code for software that was written at the time to deal with the situation.
Bryan Bergeron gives a fairly decent treatment of the whole data loss issue in his book Dark Ages II: When the Digital Data Die. Although, this could be a lot of hysteria over nothing. As I recall, in Asimov's Foundation's Edge, Trevize comes across some ancient computers, and they just fire up and start working beautifully right away after centuries of disuse. Heheh, if only this were the case. The hard drive on the HP I got last Christmas already crapped out.