Dealing with Inherited Data and Code?

Posted by Cliff on Tuesday November 9, 2004 @04:32PM from the organizing-the-informational-glut dept.

bhima asks: "Recently I have inherited an embedded project which was developed and maintained by a recently acquired company. The 'technology transfer' consisted of me traveling to their facility for two weeks of special high intensity training and returning with a couple of hard drives, equivalent DVD-ROMs, 200 kilograms of paper and a stack of tape backups. These contain a lot of interesting and important data, but it is in every conceivable format: hundreds of megabytes of Outlook PST files, Adobe PageMaker & Illustrator (4 different versions for Mac & PC), Gerber files, Microsoft Office files (every version ever), Visio Files, Tiffs, Jpegs, AutoDesk Files, Pro-E files. To top it all off, they used no concurrent versioning system for their firmware, so I have hundreds of tar.gz files that are snapshots of code, plus the resulting binary for the version represented by each tar file. We have a student translating all of the CAD data to our system, but that's only part of the story. Is there an easy way to get the firmware into CVS or Subversion? What's the best way to organize all of this data so that it's actually usable?"

34 comments

Shove it into CVS by TheSHAD0W · 2004-11-09 16:40 · Score: 4, Informative

I had a similar problem, and by taking the code snapshots in order and shoving them into CVS, it was a great help in figuring out what changes were done when and for what reason. Obviously not as good as a commented changelog per file, but better than nothing.
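A minimal Python sketch of putting the snapshots in order when their file names don't sort cleanly. It assumes each tarball kept its members' modification times (repacked archives may not have) and dates each snapshot by its newest regular file; `snapshot_date` and `order_snapshots` are hypothetical helper names:

```python
import tarfile
from pathlib import Path

def snapshot_date(path):
    """Return the newest regular-file mtime in a tarball (seconds since the epoch)."""
    with tarfile.open(path, "r:*") as tar:
        # Directory entries are skipped: their mtimes are the likeliest to change on repack.
        times = [m.mtime for m in tar.getmembers() if m.isfile()]
    return max(times, default=0)

def order_snapshots(directory):
    """Return the *.tar.gz files in a directory, oldest snapshot first."""
    return sorted(Path(directory).glob("*.tar.gz"), key=snapshot_date)
```

For example, `for tb in order_snapshots("snapshots"): print(tb.name)` lists the archives in the order they would be committed.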
Depends on your time requirements by Aloekak · 2004-11-09 16:48 · Score: 5, Insightful

What's the best way to organize all of this data so that it's actually usable?

It really depends on your time requirements.

Personally, I hate writing documentation, but if you have time, you really need to write a migration plan. Basically, you need to write down everything you have and what you want to do with it.

The migration plan should list all the milestones and even the individual steps. This sounds like a big project, not something you can cram into your system in a day or two. It might seem tedious, but if you spend at least a few days organizing your thoughts and planning this, you'll save a lot of time later. The plan should probably also be passed up the chain, which means it needs to be readable, so you can make sure you're doing everything with the data and documentation that management wants.
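Writing down everything you have can start with a machine-generated inventory. A minimal sketch, assuming the drives, DVDs and tapes have first been copied into one directory tree; `inventory` is a hypothetical helper that tallies files and bytes per extension (note that `.tar.gz` is counted as `.gz`):

```python
import os
from collections import Counter

def inventory(root):
    """Tally file counts and total bytes per lower-cased extension under root."""
    counts, sizes = Counter(), Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower() or "(none)"
            counts[ext] += 1
            sizes[ext] += os.path.getsize(os.path.join(dirpath, name))
    return counts, sizes
```

Calling `counts.most_common()` on the result puts the biggest piles first, which is one reasonable order for the plan's milestones.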

Sounds like you'll be having fun for a while :)
Mission Impossible by Darth_Burrito · 2004-11-09 16:57 · Score: 4, Interesting

From the description, it seems like you might be looking for a little sympathy. You've got it.

Without knowing the constraints or anything about the project, it's hard to give specific pointers. My advice would be to first prioritize the information. Figure out what you need to know first, when you need to know it, and where you are likely to find it. If there are particular constraints that will be impossible to meet, honestly determine that this is the case and report your findings.

If you have a lot of time, you might want to consider setting up some kind of document management system (DMS) as a sort of knowledge base. If you don't have a DMS, you can probably find one on SourceForge. I checked, and the first one that popped up was called Owl Intranet Engine.

If you don't have a lot of time, select a point in the problem that you think is both understandable and has great potential to shed light on other aspects of the problem, and then dive in. Think of it like mapping unknown territory. Look for a mountain you can climb, scramble to the top, and then use your perch as a vantage point to see everything within range. (This is how I design software - don't tell.)
Start over. by bergeron76 · 2004-11-09 17:16 · Score: 0

Take the concept and redesign it from the ground up. Create a 'historical research team' that will dig into the 'archives' and figure out how to do anything you can't easily redesign.

Good luck.

--
Don't think that a small group of dedicated individuals can't change the world. It's the only thing that ever has.
1. Re:Start over. by bigsteve@dstc · 2004-11-09 17:36 · Score: 4, Insightful
  
  Given you don't know what the current codebase is like, that is BAD advice. If the code is moderately well written, the chances are that a total rewrite would not improve things much. In short, it would be a waste of time / money. Don't confuse poor software engineering processes with bad code.
Fuck 'em! by kinema · 2004-11-09 18:05 · Score: 5, Funny

Burn it all.
1. Re:Fuck 'em! by cakefool · 2004-11-11 07:30 · Score: 1
  
  Seriously? Threaten to burn it; then, when someone comes running, save the important bits.
2. Re:Fuck 'em! by cakefool · 2004-11-12 03:40 · Score: 1
  
  Well, that serves me right: my hard disk just blew its controller.
Fire--and lots of it... by mkcmkc · 2004-11-09 18:06 · Score: 4, Insightful

My advice would be to consider that you're starting the project from scratch. The dumpster full of stuff you inherited from the previous project can (and probably should) be mined for requirements and possible implementation ideas, but as a working base for further development, it's worse than worthless. Certainly not something you'd want to put into CVS.
Management types usually seem to think that source code per se is a precious commodity. You read hysterical quotes in the trade rags all the time about the dire effect of source code being stolen, etc. Serious practitioners know that source code by itself is virtually worthless--you need access to, and the good will of, the people that designed and implemented it. That's what's precious.
(Aside from being stupid and evil, software patents are pointless for this very reason. Even source code copyright is barely worthwhile.)
Mike

--
"Not an actor, but he plays one on TV."
1. Re:Fire--and lots of it... by dubl-u · 2004-11-09 18:22 · Score: 5, Insightful
  
  I agree heartily with about 98% of this, especially this part:
  
  Serious practitioners know that source code by itself is virtually worthless--you need access to, and the good will of, the people that designed and implemented it. That's what's precious.
  
  But if you do have to start reverse-engineering the product, the source code can be useful. Assuming that you can get it to build and run in a debugger, that is.
  
  Or, if the code base contains a good automated test suite, that makes it very much worth the effort. Then you can trace the what of the code back to the why of the tests.
  
  My advice would be to consider that you're starting the project from scratch.
  
  The problem with this is that there are probably a number of executives who think that by buying the pseudo-tangible assets, they've gotten a big leg up on a from-scratch project. I think your advice is accurate, but the poster is going to have a hell of a time getting the execs to share that expectation. And unless he does, it's going to be a long slog of insane deadlines and disappointed bosses.
  
  Personally, I'd consider this a fantastic time to update my resume.
Possibly not as bad as it looks by JavaRob · 2004-11-09 18:46 · Score: 4, Insightful

Obviously I don't know the particulars of this project, but I've been in similar situations before.

My advice: don't worry about most of it. Don't throw it away(!), but don't go loading every revision from the past 3 years into CVS and converting every document to a readable, searchable format.

If the project was at a milestone (and the last code snapshot you have was fully tested), just load that into CVS and work from there. If it was in active development, maybe take 5 snapshots and commit them in order, reviewing the diffs to get a sense of the direction things were heading.
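Choosing those 5 snapshots can be done mechanically. A sketch, assuming the snapshot list is already sorted oldest to newest; `pick_evenly` is a hypothetical helper that always keeps the newest snapshot, so the last commit is the code you will actually work from:

```python
def pick_evenly(snapshots, count=5):
    """Choose `count` snapshots spread evenly from the oldest to the newest."""
    if count >= len(snapshots):
        return list(snapshots)
    if count == 1:
        return [snapshots[-1]]
    last = len(snapshots) - 1
    # Evenly spaced positions rounded to indices; the spacing exceeds 1 here,
    # so no index is picked twice and both endpoints are always included.
    indices = [round(i * last / (count - 1)) for i in range(count)]
    return [snapshots[i] for i in indices]
```

Committing only these keeps the history short while still showing the overall direction in the diffs.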

If you can also get some tips about where to find the details on the features/changes that were "next up", that's also good -- but DO NOT take the time to read through the earlier documents and discussions. All of that has changed by now; you'll just get confused. 99% of those docs are talking about software or features that didn't exist yet, and probably don't exist in the same form now, either. Do you have real change dates on the files? If you do, that helps -- there may be a few documents that were actively updated and used (risk assessments, to-do lists, etc.) that might be worth skimming. But software development is a ceaseless process of change, so anything older than a few months is basically guaranteed to be obsolete and useless. Developers and managers keep this stuff as a CYA measure, or because "one of these days" they are going to update it and make it useful again.
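If the change dates did survive (worth checking, since many copy tools reset them), finding the few recently touched documents is a short script. A sketch; `recently_changed` is a hypothetical helper, and the 90-day default is an assumed stand-in for "a few months":

```python
import os

def recently_changed(root, window_days=90):
    """Return files modified within `window_days` of the newest file under root.

    The window is measured from the newest file in the archive rather than from
    today, because the archive stopped changing when the project was handed over.
    """
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            entries.append((os.path.getmtime(path), path))
    if not entries:
        return []
    newest = max(mtime for mtime, _ in entries)
    cutoff = newest - window_days * 86400
    # Newest first, so the documents that were still "live" come to the top.
    return [path for mtime, path in sorted(entries, reverse=True) if mtime >= cutoff]
```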

Going forward, your best way to understand what the software does now is by talking through it with the people you have access to, and using it (reading and commenting the code when you aren't sure what's going on). Your best way to understand what functionality should be added next depends on where your company wants to go with it (which may not match up with the other company's plans...).
1. Re:Possibly not as bad as it looks by Dr.+Manhattan · 2004-11-10 02:26 · Score: 2, Informative
  
  Going forward, your best way to understand what the software does now is by talking through it with the people you have access to, and using it (reading and commenting the code when you aren't sure what's going on).
  My group inherited a bunch of code from another group; almost the worst possible situation (the original product was a prototype that had been shoved into production, the code was meant to be 'portable' but was never actually ported, and hence full of gotchas, etc.).
  Ever read 'The Art of Unix Programming'? When he said, "The combination of threads, remote-procedure-call interfaces, and heavyweight object-oriented design is especially dangerous... if you are ever invited onto a project that is supposed to feature all three, fleeing in terror might well be an appropriate reaction.", he wasn't kidding. One of our guys found 1500 lines of code that didn't do anything.
  The key thing is to figure out where the joints are. Find the interfaces, the ways different pieces talk to each other. Understanding this is usually the key to how the whole code is organized. It tells you how the authors thought about it. And it also tells you what parts can be incrementally replaced without having to throw out the whole shebang.
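One cheap way to start finding those joints in firmware, assuming it is written in C (the question doesn't say), is to count how many files include each local header: headers with a high fan-in are usually the interfaces everything else leans on. A minimal Python sketch; `include_graph` and `fan_in` are hypothetical names, and reading the files from disk is left out:

```python
import re
from collections import defaultdict

# Matches local includes such as: #include "uart.h"  (system <...> includes are skipped)
INCLUDE_RE = re.compile(r'^\s*#\s*include\s+"([^"]+)"', re.MULTILINE)

def include_graph(sources):
    """Map each file name to the local headers it includes; `sources` is {name: text}."""
    return {name: INCLUDE_RE.findall(text) for name, text in sources.items()}

def fan_in(graph):
    """Rank headers by how many files include them, most-included first."""
    counts = defaultdict(int)
    for includes in graph.values():
        for header in set(includes):
            counts[header] += 1
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

A dict built with `{str(p): p.read_text(errors="replace") for p in Path("src").rglob("*.[ch]")}` (using `pathlib.Path`) is one way to feed it a source tree.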
  
  --
  PHEM - party like it's 1997-2003!
2. Re:Possibly not as bad as it looks by JavaRob · 2004-11-10 05:11 · Score: 1
  
  The key thing is to figure out where the joints are. Find the interfaces, the ways different pieces talk to each other. Understanding this is usually the key to how the whole code is organized. It tells you how the authors thought about it. And it also tells you what parts can be incrementally replaced without having to throw out the whole shebang.
  
  That's great advice -- right, once you have control over the interfaces you can do lots of things. You can put a good-interface wrapper around a clump of spaghetti code with a poor interface, then replace that whole section at your leisure. Etc. etc. Make sure you have good tests written, and you can just *drop* those 1500 lines of code that you suspect aren't doing anything -- if your tests are good and they pass, you were right. This is where a good IDE really shines as well -- it can tell you what methods are never called, what parameters are never used, and so on; just cleaning out that junk makes any code much more manageable.
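A minimal sketch of the wrapper idea, in Python for brevity even though the firmware almost certainly isn't. `legacy_set_baud` and `SerialPort` are hypothetical stand-ins for a messy inherited routine and the clean interface placed in front of it:

```python
def legacy_set_baud(port, rate_code, flags, reserved):
    """Hypothetical inherited routine: magic rate codes, 0 on success, -1 on error."""
    if rate_code not in (0, 1, 2):
        return -1
    return 0

class SerialPort:
    """Clean interface in front of the legacy call; callers never see rate codes."""

    _RATE_CODES = {9600: 0, 19200: 1, 38400: 2}

    def __init__(self, port):
        self.port = port

    def set_baud(self, baud):
        code = self._RATE_CODES.get(baud)
        if code is None:
            raise ValueError(f"unsupported baud rate: {baud}")
        # All legacy quirks (flags, reserved argument, return codes) stay in here,
        # so this is the only body to change when the old routine is replaced.
        if legacy_set_baud(self.port, code, 0, 0) != 0:
            raise RuntimeError("legacy_set_baud failed")
```

Once callers only go through `SerialPort.set_baud`, the legacy routine can be rewritten or deleted behind it without touching them.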
  
  Good reading suggestion for those interested in this: Martin Fowler's Refactoring.com. All of these refactoring patterns have names that I don't remember, and there are plenty more strategies discussed there that'll make your eyes light up if you've been in this situation before.
3. Re:Possibly not as bad as it looks by bhima · 2004-11-10 05:35 · Score: 1
  
  What's weird is that we've had the most trouble with the extreme ends (time-wise) of this whole project. We need to provide spares for the devices that are no longer in production. And the devices that were "mostly finished development and heading to series production" will probably have to be redesigned from the ground up. The devices in series production all have unreleased firmware slated to go into production in 2005, but no one really remembers or cares, so I've put off the validation until someone whines about it.
  I work in an industry where various government bodies can knock on the door at any time and ask to review our records, so the idea of throwing something away is akin to committing suicide with a rusty spork. What's totally bizarre, in my mind, is that the previous owner's reaction to this was simply never to put certain types of problems down on paper or in any electronic format, so the why of many changes is lost...
  
  --
  Nothing in the world is more dangerous than sincere ignorance and conscientious stupidity.
Whoa there by JavaRob · 2004-11-09 19:04 · Score: 5, Insightful

I always wonder about the code quality I'd get out of developers who make these comments...

Source code *can* be worthless, and it *can* be extremely valuable. It all depends on the talent and good sense of the developers who came before. If the code is well-organized (even if it's not well-commented!), it's probably well worth it to keep it. Even if moderately heavy refactoring is required, you're still starting with a WORKING product. [I think -- hard to tell from the description]

In a business environment, that is *way* better than starting off with nothing. Look at Mozilla -- sure they got a sweet browser out eventually, years and years after scrapping the original Netscape browser and starting from scratch. But if they'd been a real company selling a line of browsers as their business, that decision would have destroyed them.

If you inherit good code, celebrate and learn. If you inherit bad code, write automated tests and refactor until you can understand what's going on. It'll be painful for a bit, but you'll be better off. Only if you inherit really abhorrent, non-functional software is a ground-up rewrite really the best choice.
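One common way to get automated tests around code you don't yet understand is a characterization (or "golden master") test: record what the existing code does today, then check that the refactored code still does the same. A minimal sketch; `record_golden` and `check_against_golden` are hypothetical helpers, and it assumes the code under test behaves like a pure function whose results survive a JSON round trip (tuples, for example, come back as lists):

```python
import json

def record_golden(func, inputs, path):
    """Run the existing code over sample inputs and save its outputs as the reference."""
    results = [[arg, func(arg)] for arg in inputs]
    with open(path, "w") as fh:
        json.dump(results, fh)

def check_against_golden(func, path):
    """Return the inputs for which `func` no longer matches the recorded output."""
    with open(path) as fh:
        results = json.load(fh)
    return [arg for arg, expected in results if func(arg) != expected]
```

The recorded outputs say nothing about whether the old behavior was correct, only that refactoring hasn't changed it.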
1. Re:Whoa there by Chembryl · 2004-11-10 00:52 · Score: 1
  
  Exceedingly good advice.
  I can think of more than one instance where my current employers could have benefited from it.
  
  --
  - This and all my posts are public domain. I am a Physicist. I am not your Physicist. This is not Physically advice
2. Re:Whoa there by bhima · 2004-11-10 06:10 · Score: 2, Insightful
  
  Every young developer I have ever hired says that, and I've always put it down to the self-confidence or arrogance it takes to be good at what they do.
  The simple fact is that in my business the path that begins with throwing ANY electronic document away ends in either unemployment, court or worse.
  We have more than 20 devices that are no longer in production that we provide spares for, another 10 or so that are obsolete, 15 in series production and 7 in the development pipeline.
  Like you said, in most cases it's a lot easier to refactor than to write from scratch, and in the rest you still use the old code to write the new...
  
  --
  Nothing in the world is more dangerous than sincere ignorance and conscientious stupidity.
As you do... by slittle · 2004-11-09 19:11 · Score: 4, Funny

for two weeks of Special High Intensity Training and returning with...

hundreds of megabytes of Outlook PST files, Adobe PageMaker & Illustrator (4 different versions for Mac & PC), Gerber files, Microsoft Office files (every version ever), Visio Files, Tiffs, Jpegs, AutoDesk Files, Pro-E files.

Yep, I'd say you inherited a pile of shit.

--
Opportunity knocks. Karma hunts you down.
By date and timestamp by awerg · 2004-11-09 19:15 · Score: 3, Interesting

Aside from the obvious horror and sense of doom that you must be feeling, I would start by organizing everything by date and timestamp.

Here is what I did in a previous project.

1. I created a wiki website on my laptop and put all the files in one directory. I used a wiki that was purely file-based rather than tied to a database.
2. I read every single thing from the beginning and wrote my thoughts in the wiki (including links to important docs). Management gave you all the stuff to be read, right?
3. Then I converted everything to PDF files, with the date and timestamp in the file names.

When I was done, I had a storable, printable and searchable trail of how we ended up where we are today.
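The date-stamped naming in step 3 is easy to script. A sketch, assuming the original files' modification times are trustworthy; `dated_name` is a hypothetical helper, and it uses UTC so the names don't shift with the time zone of whoever runs it:

```python
import os
from datetime import datetime, timezone

def dated_name(path, ext=None):
    """Prefix a file name with its mtime, e.g. spec.doc -> 2003-07-14_1530_spec.doc.

    Pass ext=".pdf" to name the converted copy while keeping the original's date.
    """
    stamp = datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)
    base = os.path.basename(path)
    if ext is not None:
        base = os.path.splitext(base)[0] + ext
    return stamp.strftime("%Y-%m-%d_%H%M_") + base
```

Because the stamp comes first and is zero-padded, sorting the converted files by name also sorts them chronologically.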

Be sure you tell management how long it will take you to organize all the stuff they gave you.

I hope this helps.

--
-- Andy
What we did in a similar situation... by the_ed_dawg · 2004-11-09 20:11 · Score: 3, Interesting

When I was interning at a sensor company, we had a similar situation. Basically, the new product development lab was asked to build an updated version of one of our production machines, which sounds simple enough. The original was designed in 1979 and completed in 1984, and the original staff was no longer working there for various reasons (mostly due to retirement and a neurotic obsession with all things Macintosh -- no trolling, just wait until later).
We had a drawer full of old schematics in no particular order, a printout of some Pascal code, and a good intuition about how the machine worked because it had been in constant use since 1984. However, careful examination showed that the inside of the machine (an 8-foot tall rack) consisted of wire-wrapped boards of TTL logic with ribbon cables soldered into the motherboard of an old Mac SE. We couldn't tell if the schematics were accurate at all because of the rat's nest of wires inside. The source code was nowhere to be found on the Mac, so we didn't know if the printout of code had been modified. So many optical and mechanical parts on the machine were obsolete that we couldn't find datasheets without some research from the manufacturers. It didn't help that the guy who built it was eventually fired because he was artificially providing job security by making convoluted designs without adequate documentation.
What did we do? We spent the first month reading the schematics and drawing our own block diagrams, down to as low a level as we could. Then we tried correlating those features with the source code to determine how much of the code made sense. However, the most important thing we did was keep immaculate notes, as we went, about why the original machine was designed the way it was.
We took the "why" and as much existing hardware/software as we could. When we couldn't take their work directly, or it had been obsoleted by significantly advanced technology, we replaced it with something similar. By the time I left, we had managed to sort through the mess enough to send a board out for layout, which I think is pretty good, since they only put one full-time engineer and an intern on it.
I guess what I'm saying is that we couldn't do it quickly, so it's likely you won't either. Get the latest version into CVS and start reading. Take good notes and draw lots of diagrams. I'd probably start with the hardware first, because otherwise the source is likely to make zero sense. Hire a good intern to import all the electronic documentation, because interns are cheaper anyway and will probably surprise you with some of the things they find. Good luck!
When in doubt, find an old priest and a young priest...

--
There are two types of people: those prepared for the zombie apocalypse and those who will be eaten.
1. Re:What we did in a similar situation... by bhima · 2004-11-10 05:58 · Score: 2, Informative
  
  "artificially providing job security"... While I always try to attribute things like this to stupidity or poor work habits, I have found several "events" in each device's documentation or source that look almost as if one thread in a giant tapestry had been singled out and purposely removed. Usually it's subtle, and a few weeks of archaeological work (and a little assembly coding) resolves it. But it always leaves an "on purpose" taste in my mouth. Still, it's fun in a perverse way, and I would not be averse to being a full-time reverse engineer (if I ever have to find another job).
  The first thing I did was hire summer students to convert all of the electronic documents, some of which was challenging (as I described in another reply) but doable. All the kids are back in school now, so it's a lot quieter in my office & labs.
  You're right, though: some days I do feel like I need a priest and a good deal of the refactoring juice.
  
  --
  Nothing in the world is more dangerous than sincere ignorance and conscientious stupidity.
Kids these days... by Anonymous Coward · 2004-11-09 21:41 · Score: 0

Inherited data and code? Kids these days... Back in my day we inherited only physical stuff. And we were grateful!
SubVersion by DarkDust · 2004-11-09 21:54 · Score: 3, Interesting

The versioning should be quite easy to do with Subversion:

First, create a new repository (or just create a new directory in an existing repository; that'd be easier).

Then unpack the first tar.gz into that directory, "svn add" everything and commit.

Next step is to unpack the next tar.gz into the same directory, do an "svn status" and "svn add" all files that have a question mark as status. Commit. Repeat.

This can even be scripted quite easily.

This procedure does have one problem, though: you won't catch files that were deleted. You could write a script that compares the sorted file listings of consecutive tarballs, listing all files that were present in 1.tar.gz but are missing from 2.tar.gz and thus have to be deleted.
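The whole loop, including the deletion check, might be scripted along these lines. This is a sketch under stated assumptions: the tarballs are passed oldest first as absolute paths, their member paths stay the same from one snapshot to the next (no version-numbered top-level directory; `tar --strip-components=1` is one fix if there is one), and the commands run inside an existing Subversion working copy. `member_files`, `plan_commits`, `import_commands` and `run` are hypothetical names; `svn add --force .` picks up every unversioned item, including new directories:

```python
import os
import subprocess
import tarfile

def member_files(tarball):
    """List the regular-file paths stored in one snapshot tarball."""
    with tarfile.open(tarball, "r:*") as tar:
        return [m.name for m in tar.getmembers() if m.isfile()]

def plan_commits(snapshots):
    """For (label, files) pairs, oldest first, yield (label, added, deleted).

    Files present in both snapshots are not listed: `svn commit` picks up
    their content changes on its own.
    """
    previous = set()
    for label, files in snapshots:
        current = set(files)
        yield label, sorted(current - previous), sorted(previous - current)
        previous = current

def import_commands(tarballs, listings):
    """Build the command lines for each snapshot; `listings` maps tarball -> files."""
    commands = []
    steps = [(tb, listings[tb]) for tb in tarballs]
    for tarball, added, deleted in plan_commits(steps):
        commands.append(["tar", "-xzf", tarball])  # unpack over the working copy
        if deleted:
            commands.append(["svn", "delete", *deleted])  # also removes them from disk
        if added:
            commands.append(["svn", "add", "--force", "."])  # adds all unversioned items
        commands.append(["svn", "commit", "-m",
                         "Import snapshot " + os.path.basename(tarball)])
    return commands

def run(commands, working_copy):
    """Execute the commands inside the working copy, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, cwd=working_copy, check=True)
```

Printing the result of `import_commands(tbs, {tb: member_files(tb) for tb in tbs})` is a cheap dry run before handing it to `run`. Directories that disappear between snapshots are not removed, since only file listings are compared.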

Subversion is probably the better choice here anyway, simply because if you're talking about firmware, I assume there are some binary files, and SVN handles binary files far more efficiently than CVS. Plus, SVN versions the repository as a whole, not just single files. So with the method described above, each revision in the Subversion repository would map to the exact contents of the tar.gz you used to create that revision.
1. Re:SubVersion by bhima · 2004-11-10 05:18 · Score: 1
  
  In the weeks that have gone by since I submitted the question, this is what I accomplished most easily (I assume because I am most comfortable with source code). It also allowed me to convert the random line endings to consistent UNIX endings, remove all the errant tabs, and run the source through a source formatter (this removed a remarkable number of differences in various files) before checking it into Subversion. That covers the what of the firmware changes.
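The line-ending and tab cleanup can be scripted before the first commit, so whitespace churn doesn't swamp the real differences between snapshots. A sketch; `normalize_source` and `normalize_file` are hypothetical helpers, and the tab width of 8 is a guess that should match whatever the original editors used:

```python
from pathlib import Path

def normalize_source(text, tab_width=8):
    """Convert CRLF and bare CR line endings to LF and expand tabs to spaces."""
    lines = text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    return "\n".join(line.expandtabs(tab_width) for line in lines)

def normalize_file(path, tab_width=8):
    """Rewrite one source file in place."""
    p = Path(path)
    # latin-1 maps every byte to a code point, so odd legacy bytes survive the round trip.
    text = p.read_bytes().decode("latin-1")
    p.write_bytes(normalize_source(text, tab_width).encode("latin-1"))
```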
  The why is mostly contained within Outlook files, which I have at least completely converted to the current version or *.OST files. However, the only thing that reads these files is Outlook, and I find that very, very scary (given the number of corrupt files I went through). What I'm thinking of doing now is converting them to something Mozilla-ish, on the theory that it would be more common and safer.
  Then there is all of the Adobe stuff, the Mac stuff and the Adobe-on-Mac stuff. That has been very challenging to update, mostly because the newest Adobe apps only import older files up to a point, so I (OK, my student) have wound up using three different versions and my old Mac Cube to do a three-step update.
  
  --
  Nothing in the world is more dangerous than sincere ignorance and conscientious stupidity.
2. Re:SubVersion by dubl-u · 2004-11-10 07:57 · Score: 1
  
  This procedure does have one problem, though: you won't catch when a file got deleted. You could do a script that compares the (sorted ?) file listings of the tar.gz that lists all files that were present in 1.tar.gz but are missing in 2.tar.gz and thus have to be deleted.
  
  I use rsync for this. I've never used SVN, but for CVS I have two directories: the real version and the CVS version. You can have rsync update the CVS directory from the real one and delete anything that no longer exists in the real version, while excluding the CVS directories.
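Where rsync isn't available, a rough Python equivalent of `rsync -a --delete --exclude=CVS real/ cvs/` is sketched below. It is a simplification: `mirror` is a hypothetical helper, it copies every file rather than only changed ones, and it leaves directories that vanished from the source in place. It returns the deleted files so they can be handed to `cvs remove` or `svn delete`:

```python
import os
import shutil

ADMIN_DIRS = {"CVS", ".svn"}  # version-control bookkeeping, never copied or deleted

def mirror(src, dst):
    """Make the files under dst match src; return relative paths deleted from dst."""
    for root, dirs, files in os.walk(src):
        dirs[:] = [d for d in dirs if d not in ADMIN_DIRS]
        target = os.path.join(dst, os.path.relpath(root, src))
        os.makedirs(target, exist_ok=True)
        for name in files:
            shutil.copy2(os.path.join(root, name), os.path.join(target, name))

    deleted = []
    for root, dirs, files in os.walk(dst):
        dirs[:] = [d for d in dirs if d not in ADMIN_DIRS]
        rel = os.path.relpath(root, dst)
        for name in files:
            if not os.path.exists(os.path.join(src, rel, name)):
                os.remove(os.path.join(root, name))
                deleted.append(os.path.normpath(os.path.join(rel, name)))
    return sorted(deleted)
```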
Save your company a step by jayayeem · 2004-11-10 01:51 · Score: 1

While you are processing all this information, have it translated into Hindi.

--
I metamoderate, therefore I am
Poor Guy! by QEDog · 2004-11-10 02:56 · Score: 1

We have a student translating all of the CAD data to our system
So it is true what some researchers say: students ARE slave labor. Poor guy!

--
"There is no teacher but the enemy."-Mazer Rackham
1. Re:Poor Guy! by bhima · 2004-11-10 05:20 · Score: 1
  
  That, my Friend, is the whole point of students!
  Besides, I'm not a Pro-E jockey and don't want to be. He, on the other hand, did want to be when he started, and has learned an important life lesson.
  
  --
  Nothing in the world is more dangerous than sincere ignorance and conscientious stupidity.
Google Desktop by Anonymous Coward · 2004-11-10 05:03 · Score: 0

Any mileage in using Google Desktop to at least make the Office documents and Outlook PSTs searchable? Obviously this will not help in any way with everything else.

You have my sincerest sympathy.
Make it Production Ready by HeyLaughingBoy · 2004-11-10 07:00 · Score: 2, Insightful

Assuming the hardware and software are debugged and ready for production, start with the most recent stuff. Put it (source code, hardware schematics, mechanical drawings, compiler & build tools, etc.) into version control and label it Release 1.0. Now you have a starting point that consists of everything you need to build this thingie and sell it. That should be your first concern.
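One way to make that Release 1.0 baseline checkable later is to commit a checksum manifest alongside it (in Subversion, the label itself is typically an `svn copy` into a `tags/` directory). A sketch; `build_manifest` is a hypothetical helper whose output uses the `hash  path` layout that `sha256sum -c` reads:

```python
import hashlib
import os

def build_manifest(root):
    """Return "sha256  relative/path" lines for every file under root, sorted by path."""
    lines = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d not in {".svn", "CVS"}]  # skip VCS admin dirs
        for name in filenames:
            path = os.path.join(dirpath, name)
            digest = hashlib.sha256()
            with open(path, "rb") as fh:
                for chunk in iter(lambda: fh.read(1 << 20), b""):  # 1 MiB at a time
                    digest.update(chunk)
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            lines.append(f"{digest.hexdigest()}  {rel}")
    return sorted(lines, key=lambda line: line.split("  ", 1)[1])
```

Writing the lines to a file such as `MANIFEST.sha256` and running `sha256sum -c MANIFEST.sha256` from the same directory re-checks every file.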

It's a lot less important to be able to go back and look at earlier changes (note I said *less* important; not unimportant).

Next, start from the earliest archives and try to find actual requirements/specs. The point here is that now that you can build these things, you need to be able to test them and fix bugs, add features, etc.

After that, you can consider checking through everything else to see if it's worthwhile adding to the Product File, but at this time you may be into diminishing returns. Product requirements and current code are the two most important things to capture out of the total mess of files.
I recently ditched Outlook... here's how by Phil+John · 2004-11-10 09:28 · Score: 1

I tried a few methods. Mozilla's import strips any HTML emails down to a pseudo-text-only format (the text bit of an RTF file, I believe).

I then set up a simple IMAP server and tried to drag the emails from Outlook into the IMAP folder I mounted. Big mistake: Outlook "hangs" after a random number of messages (always fewer than 30), so I ditched that.

outlook2mac from littlemachines looked good and was only $10.00 but I wanted something free.

In the end I downloaded a 30-day trial of CommuniGate Pro from Stalker Software plus their MAPI Outlook connector. I set up the account, checked "convert outlook rtf into html" and copied the stuff across. I then connected to the IMAP server that comes with CommuniGate Pro using Thunderbird and copied the stuff to its folders (thus converting it to mbox).
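The last hop, from the IMAP server into mbox, can also be done with Python's standard library instead of a mail client. A sketch, assuming the server accepts IMAP over SSL on the default port; the host, credentials and folder are placeholders, and `fetch_folder` and `save_to_mbox` are hypothetical helpers:

```python
import imaplib
import mailbox

def fetch_folder(host, user, password, folder="INBOX"):
    """Download every message in one IMAP folder as raw bytes, read-only."""
    conn = imaplib.IMAP4_SSL(host)
    try:
        conn.login(user, password)
        conn.select(folder, readonly=True)
        _typ, data = conn.search(None, "ALL")
        messages = []
        for num in data[0].split():
            _typ, parts = conn.fetch(num, "(RFC822)")
            messages.append(parts[0][1])
        return messages
    finally:
        conn.logout()

def save_to_mbox(raw_messages, path):
    """Append raw RFC 822 messages to an mbox file that Thunderbird can open."""
    box = mailbox.mbox(path)
    box.lock()
    try:
        for raw in raw_messages:
            # IMAP delivers CRLF line endings; mbox files conventionally use LF.
            box.add(raw.replace(b"\r\n", b"\n"))
        box.flush()
    finally:
        box.unlock()
        box.close()
```

For example, `save_to_mbox(fetch_folder("mail.example.com", "user", "secret"), "Inbox.mbox")`. Folder names containing spaces may need quoting when passed to `select`.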

Convoluted but it did the trick!

--
I am NaN
Load into a document management system by sushi · 2004-11-10 11:24 · Score: 1

This doesn't apply to the source code, which should obviously go into a version control system, but for all the other documentation:

I'd look at bulk-loading everything into an electronic document management system (EDMS). It will build a full-text index over all the various formats, which at least lets you search across them, apply metadata to profile them, etc.
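To show what a full-text index buys you, here is a toy version. It is a sketch, not a substitute for an EDMS: it assumes the text has already been extracted from the PST, Office and PageMaker files (which is the hard part), and `build_index` and `search` are hypothetical helpers:

```python
import re
from collections import defaultdict

def build_index(documents):
    """Map each lower-cased word to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return index

def search(index, query):
    """Return the names of documents containing every word in the query, sorted."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    return sorted(set.intersection(*(index.get(w, set()) for w in words)))
```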

No point in re-inventing the wheel by starting from scratch... use the information that already exists.

--
--- cut: Eat well, exercise, die anyway.
Google by aero6dof · 2004-11-10 12:49 · Score: 1

Google sells a search appliance which can index a wide range of formats. Buy it, scan your files, and query it while building up your knowledge of the project state.