Slashdot Mirror


Software Archaeology

Plug1 writes "Salon (day pass needed) has an article about preserving software for historical purposes. It discusses source code archiving, and the effect the DMCA is having on attempts to catalog and analyze legacy code. It will be a shame if in the future a wealth of information is locked away because knowledge of the underlying technology is lost."

35 of 434 comments

  1. Please understand... by Creepy+Crawler · · Score: 5, Insightful

    That the DMCA DOES NOT APPLY outside the USA. However, hardware Digital Restriction Management DOES.

    I really don't want strong crypto keeping me out of stuff that I OWN, or MY CONTENT.

    It'd be a neat experiment to create a Linux driver that emulates TCPA chips so that stupid software thinks you're auth'ed.

    --
    1. Re:Please understand... by Creepy+Crawler · · Score: 3, Insightful

      It's a signature/encryption mechanism. Wait till MS requires it ON. They could even make it so the whole fucking partition is encrypted by a software key that YOU CAN'T GET.

      And once MS requires it, how's Linux going to fit in there? I'd figure that MS TCPA computers would have to be signed to even speak to other MS machines. We can't have traffic going out of the network that isn't validated for internal traffic.

      --
    2. Re:Please understand... by Lazar+Dobrescu · · Score: 5, Insightful
      This is not the only problem the article addresses, though. As it is now, there are already tons of old file formats for which the software needed to read them is nearly impossible (or totally impossible) to find. Documents written in those file formats could contain useful, or at least interesting, content, but we can't get to that content.

      We are talking here about file formats 30 years old, or even less. Try to imagine what will happen in 200 years. Most of our history will be written to electronic media, and for people living 200 years from now, the file formats used for that media will very probably be undecipherable.

      What is the solution? Some say that we need to convert all documents to a more recent file format every x years. That will really become a pain in the ass as the number of archives grows.

      Another trick could be to describe the file format in full and attach that description to every file. That, of course, brings up the problem of what file format to use for that description... (will even plain ASCII files still exist in 200 years? Maybe not, but I think it is reasonable to expect that people will at least still have an idea of how to read them...)

      Comparing this to the problem of dead languages gives a good idea of the repercussions... There are already countless ancient documents that we cannot decipher because the language they were written in is lost. People spend their whole lives trying to understand a dead language. But with computers, we're not talking about something that happened 4000 years ago, but 30 years ago... That means that in the course of your lifetime, you could see file formats go obsolete three times!

      Someone will need to find a solution for this, and preferably before the problem happens for real...
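
      The "attach the description to every file" idea can be sketched in a few lines. This is only an illustration under stated assumptions: the marker line and the two header fields are invented here and are not any real standard. The point is that the header and the description are plain ASCII that a person could read in a hex dump, while the payload stays untouched.

```python
from typing import Tuple

# Hypothetical marker line, invented for this sketch; not a real standard.
MAGIC = b"SELF-DESCRIBING-FILE v1\n"


def wrap(payload: bytes, description: str) -> bytes:
    """Prefix a payload with a plain-ASCII description of its own format."""
    desc = description.encode("ascii")
    header = (
        MAGIC
        + b"description-bytes: %d\n" % len(desc)
        + b"payload-bytes: %d\n" % len(payload)
        + b"\n"  # blank line ends the header
    )
    return header + desc + payload


def unwrap(blob: bytes) -> Tuple[str, bytes]:
    """Recover (description, payload) from a blob produced by wrap()."""
    if not blob.startswith(MAGIC):
        raise ValueError("not a self-describing file")
    # The header never contains a blank line, so the first one ends it.
    head, _, rest = blob[len(MAGIC):].partition(b"\n\n")
    fields = dict(line.split(b": ", 1) for line in head.split(b"\n"))
    n_desc = int(fields[b"description-bytes"])
    n_payload = int(fields[b"payload-bytes"])
    description = rest[:n_desc].decode("ascii")
    payload = rest[n_desc:n_desc + n_payload]
    return description, payload
```

      Lengths are stored explicitly so the payload can contain any bytes at all, including blank lines. It does nothing about the deeper problem raised above: the reader still has to know ASCII and English to make use of the description.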

    3. Re:Please understand... by Ominous+Coward · · Score: 3, Insightful

      What we need to do is have a large book that describes all of the file formats. ASCII encoding, JPEG encoding, etc.

      The real worry I'd have is how someone will be able to get the stuff off of the media if the directory and interface standards change. Will their advanced computers even be able to read the disk to see that goatse.jpg is on that disk? Even if they had the algorithm to decode the image, they might not see it's there.

      --
      Ceci n'est pas une sig.
    4. Re:Please understand... by Sique · · Score: 4, Insightful

      The Rosetta Stone contained the message in Ancient Greek (a dead but widely known language at the time of decipherment), Demotic and hieroglyphic. Even though Coptic, the late descendant of the Egyptian language, was at least known to some specialists and people able to read Ancient Greek were abundant at the time (and still are), it took about 25 years to decipher the hieroglyphic texts.

      And this was with a language which itself mapped very easily to the letters (every consonant mapped to a letter, vowels omitted).

      The rules which encode a file may be much more complicated. Just look at the most common compression methods (run-length encoding, for instance): they add another layer on top of the already encoded contents. And they remove something very important for decipherment, the redundancy, from the data. Then there is the fact that the subjects stored in files are much more diverse. We have not only language; we have music and graphics, 3D data and cryptographic certificates, configuration files and program binaries.

      Just knowing what a file is about, and thus having an idea of how to get started, can prove to be more complicated than any decipherment of archaeological texts.
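
      To make the run-length point concrete, here is a minimal sketch of one possible scheme, assuming a (count, value) byte-pair layout chosen for this example; real formats vary, which is exactly the trouble. Long runs of a repeated byte, the redundancy a decipherer would latch onto, collapse into two bytes each.

```python
from itertools import groupby


def rle_encode(data: bytes) -> bytes:
    """Encode data as (count, value) byte pairs; runs are capped at 255."""
    out = bytearray()
    for value, run in groupby(data):
        remaining = len(list(run))
        while remaining > 0:
            chunk = min(remaining, 255)  # a count must fit in one byte
            out += bytes([chunk, value])
            remaining -= chunk
    return bytes(out)


def rle_decode(encoded: bytes) -> bytes:
    """Invert rle_encode(); only works if you already know the pair layout."""
    out = bytearray()
    for i in range(0, len(encoded), 2):
        count, value = encoded[i], encoded[i + 1]
        out += bytes([value]) * count
    return bytes(out)


sample = b"A" * 8 + b"B" * 4 + b"C" * 12
packed = rle_encode(sample)
print(len(sample), "bytes in,", len(packed), "bytes out:", packed)
```

      The 24-byte sample packs into 6 bytes. Someone who finds only those 6 bytes, with no hint that odd positions are counts and even positions are values, has far less pattern to work from than the original text offered.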

      --
      .sig: Sique *sigh*
    5. Re:Please understand... by DiscoDave_25 · · Score: 5, Insightful

      It's not just the file format that will be the problem (although MS aren't helping in that respect) but simply ensuring that the media the file is written on can be read. Physical media degrade and the hardware to read them becomes obsolete. An example of this was the BBC's Domesday disc, which contained a huge amount of information (for those days) on a LaserDisc that is today virtually unreadable. Thankfully this has recently been transferred onto DVD before ALL the readers died, but just because someone can understand HOW to read a file doesn't mean they'll be able to access it in the first place.

    6. Re:Please understand... by 4of12 · · Score: 4, Insightful

      an archaeologist of tomorrow can figure out ascii.

      To be sure.

      And will they be able to figure out PowerPoint?

      And how about Secure PowerPoint 2005 with automatic DocuSafe technology that incorporates encryption with a public key that is automatically downloaded over the network from microsoft.com after your VISA card number has been authenticated with citibank.com?

      No, tomorrow's archaeologists will miss out on the whole indecipherable morass that is today's data formats.

      Documents and presentations will look indistinguishable from random noise.

      And, honestly, a lot of what gets attached in those formats looks that way already to me in 2003.

      --
      "Provided by the management for your protection."
    7. Re:Please understand... by ncc74656 · · Score: 1, Insightful
      and handguns can be used for things other than shooting people

      Ted Kennedy's car has killed more people than my guns.

      --
      20 January 2017: the End of an Error.
    8. Re:Please understand... by micromoog · · Score: 2, Insightful
      Thanks for the well-articulated response. My original point still stands, however . . . to state it more clearly:
      • the 1% of non-killing-people applications of handguns are used to justify the other 99%
      • the 1% of non-piracy applications of KaZaa are used to justify the other 99%
      • the 1% of non-DRM-related applications of TCPA will be used to justify the other 99%.
      Everyone I know that's involved in "sports shooting" also considers it to be practice for that mythical day that the evil man breaks in and tries to kill the whole family. And finally, with the minor exception of non-lethal weapons, police officers are trained to shoot to kill (specifically to shoot at the center of the body mass). There's none of this "shooting in the leg" going on, at least not on purpose.
  2. Nostalgia by jbottero · · Score: 0, Insightful

    What real use other than nostalgia would this serve? And, personally, I don't think too many people will care that much if obsolete software is deconstructed.

  3. Re:Just a thought you guys.... by CastrTroy · · Score: 2, Insightful

    Plagiarism would be if he copied the article and claimed it was his own. However, this could constitute copyright infringement. I'm not sure how it works. You're allowed to copy sections (small?) from a book and put them in an essay, as long as you specify where they came from. Why would you not be allowed to post something from somewhere else as long as you specified where it came from?

    --

    Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
  4. Fair Use by yorkrj · · Score: 2, Insightful

    This probably falls under the category of fair use.

    If it doesn't, then there is still the matter of the government (the US at least) being able to do whatever it pleases with copyrighted material. In this case the government's authority to copy what it wants is a good thing.

    The Library of Congress is already making archival copies of copyrighted music and it is going to continue this despite any hypothetical protestations of the RIAA. Why? Because it is deemed necessary for the preservation of culture. It will ultimately be the government that has the authority to do the kind of backup that is necessary to preserve our programming heritage.

    It is our job as citizens to open the government's eyes to the need to copy this code before the technology that will allow us to do so becomes obsolete and otherwise unusable. Like any other technology, programming will continue to advance, but it is important to remember the simpler roots of the technology in order to provide the kind of perspective that lets us know where we've been and where we might be going.

  5. Re:Heh... by __past__ · · Score: 2, Insightful
    DOS is still useful now, for a limited problem domain, but that's not the point.

    Software development as an art/craft/science/whatever you think it is has evolved rapidly. There are "fashions" in code - try reading 20 year old C code: the language itself hasn't really changed much, but you will immediately notice the difference. People have tried things that failed, and have found interesting solutions that are now forgotten. This will all be lost.

    What would literature be like if we didn't have access to the classics? Or architecture? There is a lot of knowledge that is worth preserving.

    And, of course, digging in old software is way cool.

  6. Re:Just a thought you guys.... by MarkLR · · Score: 2, Insightful

    Wouldn't they want the link?

    Assuming that people go to the site to read the article (as opposed to reading it here from the comment in which the whole article was posted), it would drive up the number of ads served, which would be a good thing. I would think.

  7. Re:full article text, no pass required by mozumder · · Score: 5, Insightful

    You know, it really isn't fair use to repost an entire article from another website.

  8. Re:Explain the Pyramids? by KalvinB · · Score: 4, Insightful

    There's also the problem of grave robbers and that whole burning of the great library thing.

    The Egyptians could very well have written down the instructions for building them. There have been numerous opportunities for that information to have been destroyed. Or they may have viewed their construction as too sacred and only passed down information on a need-to-know basis.

    Our problem is that we charge for rocks and lack the motivation. We just assume we couldn't build such things as they did but never really bother to try.

    Ben

  9. Re:Knuth is only one foundation that won't be lost by binaryDigit · · Score: 2, Insightful

    No one ought to knock VB because it really is the best tool for what it does, but it also lowers the barrier to entry for would-be programmers. This can only lead to worse programs.

    This is coming from someone who started in assembler and has been programming for over 20 years now (primarily various assembly, C, C++), but I completely disagree with that statement. It's all in the context. Applications are about solving problems and if VB is the best tool for a particular problem, then it and the programmer who uses it don't necessarily lead to "worse programs". What leads to bad programs are things like bad programmers (regardless of background), poorly defined or undefined requirements, lack of resources, etc. I've met the gamut of programmers, high level and low level, and the common thread is the individual's ability to understand a problem and use the tools at their disposal to solve it. Obviously if you're looking for someone to code a compiler for you, you are going to avoid the VB guy who thinks C is no different than assembler. By the same token, I've seen apps written by assembler/C guys that were basically useless because, while the code may be good, the app itself didn't solve the problem (or did it in a very poor way).

    In this day and age, the apps are way too large and there are too many specialties/languages/environments to simply discount anyone because they never happened to program in C/assembler.

  10. Another red herring from salon? by poptones · · Score: 4, Insightful
    In one part of the article they mention losing "structure" of programs and talk about source code, then they talk about "losing" old code like the original DOS - for which, so far as I know, there is no publicly available archive of source code. The same goes for Lotus 1-2-3, another piece of code mentioned in the article. This is just more fatalistic nonsense people spew when criticising the DMCA. Yeah, it's a bad law, but this nonsense about "losing old works" is just that.

    If you have the source code for something then you have no cause to fear the DMCA, since you don't need to decrypt it. And if you don't have the source code, where is the value? Is there really any value in running Lotus 1-2-3 for the Apple//? Perhaps if you have an Apple//, but so what? You cannot "fly over the code" from any height (as was mentioned in the article) because you don't have any code to fly over. You have an executable, and the "structure" there is quite different from what you see when looking at source code.

    If you want source code for DOS, hit freedos.org and download it. It's not Microsoft's source, but so what? It does the very same job and, in many cases, it's superior to the original. Works that have value will be replicated and emulated; works that have no value simply have no value - where is the need (or logic) in "preserving" them?

  11. Formats not the problem by Mr_Silver · · Score: 3, Insightful
    I don't think the format issue is that big a problem. A large number of closed formats have been reverse engineered to a point where you can extract the pertinent information. Your biggest problem is availability of the hardware.

    Take the Domesday Project (in the UK) as an example. An Acorn Archimedes LaserDisc full of content relating to life in the 20th century. The problem came when they wanted to get the data off... and couldn't easily find a compatible LaserDisc reader.

    Of course, the format of the data is an issue. But if you can't get the data off the media, then the format of it isn't going to matter in the slightest.

    --
    Avantslash - View Slashdot cleanly on your mobile phone.
  12. Re:Storage of old data / hardware by linuxtelephony · · Score: 2, Insightful

    On my last move I had to "retire" a couple of 11/725s and most of my "wall of orange". It was a sad day, but I had moved those heavy monsters far too many times and there just wasn't room this last time. One worked, the other was parts, had DECnet and a coax ethernet, not to mention dual tape drives and a removable platter (I think it was 26 meg removable, and 26 meg internal, it's been a while).

    You're right, those things cost money to keep going. And for what? A novelty? These things weren't doing any work or anything for me. I ended up buying them for their documentation. Then, when they were no longer needed, do you know how hard it is to keep the wife happy when she wants to decorate, and can find nothing that goes with an orange wall? :)

    The sad thing is these were not "interesting" enough for any of the "computer museums" or "computer history" places I was able to contact. I even tried to give them away on Craigslist in San Francisco to anyone who would pick them up. In the end, they were trashed because absolutely no one wanted them.

    --
    . 62,400 repetitions make one truth -- Brave New World, Aldous Huxley
  13. Re:full article text, no pass required by Andrew+Leonard · · Score: 4, Insightful

    At least with jay-walking, no matter how many times you do it, the road will still be there. But if you post the full text of Salon stories without either subscribing or getting the FREE day-pass, eventually we will no longer be able to pay fine writers like Sam Williams and Rachel Chalmers to write the stories that Slashdot readers like to read.

    --

    Editor, Salon Business & Technology

    Salon.com

  14. Re:Knuth is only one foundation that won't be lost by Kaa · · Score: 5, Insightful

    The most fundamental concept in computer science is logic, not algorithms (or worse programming languages). If a 'programmer' hasn't written a program in a low level language like C or assembler, the hiring manager should beware. Without hands-on experience with the fundamentals of computer science that person is lacking at the most basic level, regardless of whether he knows 1 language or 50 languages. He is handicapped.

    Bullshit.

    "Computer science is about computers in the same way astronomy is about telescopes" --Edsger Dijkstra

    Programming isn't about knowing how to twiddle bits in registers or even how to leverage strengths of a particular processor.

    Programming is about dealing with complex problems which can be solved by manipulation of information. I would say that the quality a programmer needs most of all is not logic or math, but just the ability to hold and manipulate large and complicated structures inside his head. And no, it doesn't have anything to do with assembler, low-level languages, ALUs, bits, etc. etc.

    --

    Kaa
    Kaa's Law: In any sufficiently large group of people most are idiots.
  15. Re:Explain the Pyramids? by Ominous+Coward · · Score: 2, Insightful

    What's really amazing about the pyramids is not really their size so much as the fact that they're nearly perfectly square, to within under 1% error. Also, that they're aligned North-South nearly perfectly as well. The ancients were much more clever than we typically give credit.

    --
    Ceci n'est pas une sig.
  16. Re:full article text, no pass required by Andrew+Leonard · · Score: 3, Insightful

    You watch an ad to get a day pass. Advertisers pay to sponsor the daypass. The more people use the daypass, the more valuable that sponsorship, and the more we can charge for it.

    --

    Editor, Salon Business & Technology

    Salon.com

  17. Bah. by mblase · · Score: 1, Insightful

    Without hands-on experience with the fundamentals of computer science that person is lacking at the most basic level

    That's like saying that a journalist is lacking in his ability to write if he's not fully competent in Latin. Just because someone doesn't know how to allocate memory doesn't mean he can't code in a language that does it for him automatically.

  18. Re:full article text, no pass required by Seek_1 · · Score: 2, Insightful

    I get the same thing.

    Having the Day-Pass system is only useful if it actually works.

  19. Bloatware by yintercept · · Score: 2, Insightful

    Microsoft is already doing this. Each new version of an MS operating system or office product generally includes a pretty much unedited copy of all prior editions of the software. So they are preserving history.

    With each new version, the software gets bigger and bigger and bigger. It is an archaeological wonder in itself. Another name for this coding style is bloat. Linux has many of the same things going on.

    This argument about the need to preserve prior formats has been around for quite a while. The truth of the matter is that software is largely an evolutionary process. Most file formats build upon the past, so there is a tendency for software to naturally preserve its path.

    Of course, for Grady Booch, who wants to be recognized as an intellectual giant a thousand years from now, the main question is whether his name will invoke the same awe as, say, Euclid and Archimedes. He is, after all, one of the trinity of OO modeling approaches.

  20. Mandatory source code deposit by Animats · · Score: 3, Insightful

    This is a good argument for mandatory source code deposit. To get a copyright on code, you should have to deposit a copy of the source with the Library of Congress. The Library of Congress has the authority to require this, but currently they only require a printout of the first ten and last ten pages, because they didn't want to store all the paper. That should change.

  21. Re:Knuth is only one foundation that won't be lost by Tokerat · · Score: 2, Insightful

    I would say that the quality a programmer needs most of all is not logic or math, but just the ability to hold and manipulate large and complicated structures inside his head.
    ...and without the logic and math and technical skills to properly implement such a thing, you end up with slow, buggy-by-design code, which ends up costing more to maintain and is a big waste of time. I would never hire someone who has only worked in VB and Java, for example.
    --
    CAn'T CompreHend SARcaSm?
  22. Re:full article text, no pass required by gr8_phk · · Score: 2, Insightful
    I like the Salon format. Read the intro, and if it's interesting, sit through an ad for the rest. Unfortunately, that ad wouldn't work in my browser (an old Mozilla with some features turned off). Then I saw the full text here at /. and had 2 thoughts: 1) This is not good. and 2) Great I can read it. [in that order actually] In my case, I don't feel bad because I couldn't get to the full article on Salon. In general, I'd have to agree that it's not right.

    What if the software archaeologists don't have the required plugin?

  23. Re:full article text, no pass required by cK-Gunslinger · · Score: 2, Insightful

    You know, my first response to this is "tough cookies." I don't see any other popular sites using this forced-ad-viewing method. If they did, I would just delete my bookmarks for them.

    Any entity that begins to implement anti-consumer actions in order to stay afloat is doomed to begin with (RIAA, SCO, etc.). If you can't stay out of the red by simply providing your service with a *reasonable* amount of revenue-generating methods, then that should tell you that either:

    a) You need better revenue-generating methods
    or
    b) Your service isn't profitable

    Like most online entities in trouble, you assume (a) and look for alternate ways to get paid. Unfortunately, instead of finding better "quality" services, you sacrifice your customers' resources (time, effort, patience, etc.) instead. Eventually, you cross that fine line between mild nuisance and "not worth the effort."

    I find your recent actions "not worth the effort" and will not be visiting your site. But hey, that's just one netizen. What harm can that do, right?

  24. Re:Other technologies go obsolete too, So what? by Anonymous Coward · · Score: 1, Insightful

    Why is obsolete software technology worth preserving where obsolete manufacturing technologies are not?

    Who's making this double standard you speak of? Have you read the article? Perhaps you could point out the bits where the author states that manufacturing technologies aren't worthy of recording?

    In 100 years, will we really need access to the billions of JPEGs that were spewed out by digital cameras everywhere?

    Yes. If you don't understand this, then you don't understand why history is so important. History gives us a sense of our past - it provides a link to where we came from, and gives us a better understanding of who we are.

    More importantly, even if we don't need every picture, we at least need some of them, which is what the article says - if we lose the ability to decode JPEGs, then not only do we lose the ability to view "unimportant" (in quotes because this is very subjective) JPEGs, but we also lose the ability to view the ones that ARE important.

    There is no double standard.

  25. Re:here's an easy howto: by pmz · · Score: 2, Insightful

    Conserving data is not as easy as it seems. I wonder whether it'd be more efficient to print out the source codes on acid-free paper and store them like books - or perhaps microfiches - in a number of locations around the world.

    One modern 80GB hard disk.

    80GB = 80,000,000,000 bytes = 80,000,000,000 ASCII characters.

    One standard printed US-letter-sized page is 80 x 60 characters, or 4800 characters.

    80,000,000,000 characters / (4800 characters/page) = 16,666,667 pages (rounded off).

    This is potentially just the data on Joe Schmoe's Best Buy laptop. Now consider that the amount of data generated by humans is something like terabytes per day...
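
    The arithmetic above is easy to check, and to rerun for other disk sizes. The disk size and the 80 x 60 page come from the figures above; the 500-sheet ream and single-sided printing are assumptions added here just to give the page count a physical scale.

```python
import math

DISK_BYTES = 80_000_000_000      # one 80 GB disk, one ASCII character per byte
CHARS_PER_PAGE = 80 * 60         # 80 columns x 60 lines = 4800 characters
SHEETS_PER_REAM = 500            # assumption: standard ream, printed single-sided

# Round up: a partly filled last page is still a page.
pages_needed = math.ceil(DISK_BYTES / CHARS_PER_PAGE)
reams_needed = math.ceil(pages_needed / SHEETS_PER_REAM)

print(f"{pages_needed:,} pages, about {reams_needed:,} reams of paper")
```

    That comes to 16,666,667 pages, or 33,334 reams, for a single disk.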

  26. Re:Other technologies go obsolete too, So what? by jafac · · Score: 3, Insightful

    That same SciAm article mentioned the impending loss of archived data from NASA, data collected from satellites launched at a cost of hundreds of millions of dollars.

    Some of this data is useless today. In the future, someone might find it useful. Do we allow this data to degrade, and then possibly launch a new satellite to collect new data? (That's if it's even possible; in some cases it's not. How do you gather climate data from the 1970s?)

    The main problem is that the tape backup companies no longer support the old tape drives, and new tape drives don't support the old tapes and tape formats.

    Funny thing is, 5 years ago, I was there with everyone else saying that we should put this data on CD-ROM, because that format will never, ever, ever go away. Now, I'm not so sure - if they ever straighten out the DVD standard, I can see a future, 10 years from now, when you won't be able to buy a new device that can read a CD-ROM.

    --

    These are my friends, See how they glisten. See this one shine, how he smiles in the light.
  27. Anyone have a spare 8" floppy drive? by spun · · Score: 3, Insightful

    I have some old AutoCAD 3 files from high school, a hopelessly optimistic design for an automatic vacuum cleaner, if I recall.

    My dad still has a program he wrote on punch cards someplace.

    That's the trouble, isn't it? Even if the data survives, the hardware to read it might not.

    --
    - None can love freedom heartily, but good men; the rest love not freedom, but license. -- John Milton