Dealing with Inherited Data and Code?
bhima asks: "Recently I have inherited an embedded project which was developed and maintained by a recently acquired company. The 'technology transfer' consisted of me traveling to their facility for two weeks of special high intensity training and returning with a couple of hard drives, equivalent DVD-ROMs, 200 kilograms of paper and a stack of tape backups. These contain a lot of interesting and important data but it is in every conceivable format: hundreds of megabytes of Outlook PST files, Adobe PageMaker & Illustrator (4 different versions for Mac & PC), Gerber files, Microsoft Office files (every version ever), Visio files, TIFFs, JPEGs, AutoDesk files, Pro-E files. To top it all off they used no concurrent versioning system for their firmware so I have hundreds of tar.gz files that are snapshots of code, plus the resultant binary record for the version represented by each tar file. We have a student translating all of the CAD data to our system, but that's only part of the story. Is there an easy way to get the firmware into CVS or Subversion? What's the best way to organize all of this data so that it's actually usable?"
From the description, it seems like you might be looking for a little sympathy. You've got it.
Without knowing the constraints or anything about the project, it's hard to give specific pointers. My advice would be to first prioritize the information. Figure out what you need to know first, when you need to know it, and where you are likely to find it. If there are particular constraints that will be impossible to meet, establish honestly that this is the case and report your findings.
If you have a lot of time, you might want to consider setting up some kind of document management system as a sort of knowledge base. If you don't have a DMS, you can probably find one on SourceForge. I checked and the first one that popped up was called Owl Intranet Engine.
If you don't have a lot of time, select a point in the problem that you think is both understandable and provides a great potential to shed light on other aspects of the problem and then dive in. Think of it like you're mapping an unknown territory. Look for a mountain you can climb and scramble to the top and then use your perch as a vantage point to see everything within range. (This is how I design software - don't tell).
Aside from the obvious horror and sense of doom that you must be feeling, I would start by organizing everything by date and timestamp.
Here is what I did in a previous project.
1. I created a wiki website on my laptop and put all the files in one directory. I used a wiki that was not tied to a database, but only file based.
2. I read every single thing from the beginning and wrote my thoughts in the wiki (including links to important docs). Management gave you all the stuff to be read, right?
3. Then I converted everything to pdf files with the date-timestamp in the naming convention.
When I was done, I had a storable, printable and searchable trail of how we ended up where we are today.
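For anyone who wants to try the date-and-timestamp idea before converting anything, here is a minimal sketch in Python. It only walks a directory and lists every file, oldest first, with a timestamp prefix that could be used in a naming convention. The function name dated_inventory and the YYYYMMDD-HHMMSS prefix format are my own assumptions, not anything from the original project, and file modification times are only as trustworthy as the copies you were handed.

```python
import os
from datetime import datetime, timezone
from pathlib import Path


def dated_inventory(root):
    """Return (timestamp_prefix, relative_path) pairs for all files, oldest first.

    Hypothetical helper: the prefix format is an illustrative choice.
    """
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            # Modification time in seconds since the epoch.
            mtime = path.stat().st_mtime
            entries.append((mtime, path.relative_to(root).as_posix()))
    entries.sort()  # oldest first, ties broken by path
    return [
        (datetime.fromtimestamp(m, tz=timezone.utc).strftime("%Y%m%d-%H%M%S"), rel)
        for m, rel in entries
    ]
```

Calling dated_inventory on the directory holding the dump and printing each pair gives a rough chronological index that can be pasted into the wiki as a starting point.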
Be sure you tell management how long it will take you to organize all the stuff they gave you.
I hope this helps.
-- Andy
We had a drawer full of old schematics in no particular order, a printout of some Pascal code, and a good intuition about how the machine worked because it had been in constant use since 1984. However, careful examination showed that the inside of the machine (an 8-foot-tall rack) consisted of wire-wrapped boards of TTL logic with ribbon cables soldered into the motherboard of an old Mac SE. We couldn't tell if the schematics were accurate at all because of the rat's nest of wires inside. The source code was nowhere to be found on the Mac, so we didn't know if the printout of code had been modified. So many optical and mechanical parts on the machine were obsolete that we couldn't find datasheets without some research from the manufacturers. It didn't help that the guy who built it was eventually fired because he was artificially providing job security by making convoluted designs without adequate documentation.
What did we do? We spent the first month reading the schematics and drawing our own block diagrams down as low as we could go. Then, we tried correlating those features with the source code to determine how much of the code made sense. However, the most important thing that we did was keep immaculate notes of the process as we went through it about why the original machine was designed the way it was.
We took the "why" and as much existing hardware/software as we could. When we couldn't take their work directly or it had been obsoleted by significantly advanced technology, we replaced it with something similar. By the time I left, we had managed to sort through the mess enough to send a board out for layout, which I think is pretty good since they only put one engineer on it full time and an intern.
I guess what I'm saying is that we couldn't do it quickly, so it's likely you won't either. Get the latest version into CVS and start reading. Take good notes and draw lots of diagrams. I'd probably start with the hardware first because otherwise the source is likely to make zero sense. Hire a good intern to import all the electronic documentation because they're cheaper anyway and will probably surprise you with some of the things they find. Good luck!
When in doubt, find an old priest and a young priest...
There are two types of people: those prepared for the zombie apocalypse and those who will be eaten.
The versioning should be quite easy with Subversion:
First, create a new repository (or just create a new directory in an existing repository, that'd be easier).
Then unpack the first tar.gz into that directory, "svn add" everything and commit.
Next step is to unpack the next tar.gz into the same directory, do an "svn status" and "svn add" all files that have a question mark as status. Commit. Repeat.
This can even be scripted quite easily.
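As a rough idea of what such a script might look like, here is a sketch in Python that drives the svn command line. It assumes the svn client is installed, that the working directory is already a checked-out working copy, that every archive unpacks to the same relative paths (no version-numbered top-level directory), and that "svn status" prints the status code in column 1 with the path starting at column 9, as recent clients do. The function names are made up for illustration.

```python
import subprocess
import tarfile


def unversioned_paths(status_output):
    """Return the paths that `svn status` marks with '?' (not yet versioned)."""
    paths = []
    for line in status_output.splitlines():
        # Assumption: status code in column 1, path starting at column 9.
        if line.startswith("?"):
            paths.append(line[8:])
    return paths


def import_snapshot(tarball, workdir, message):
    """Unpack one snapshot over the working copy, add new files, commit."""
    with tarfile.open(tarball, "r:gz") as tar:
        tar.extractall(workdir)  # only do this with archives you trust
    status = subprocess.run(
        ["svn", "status"], cwd=workdir, check=True,
        capture_output=True, text=True,
    ).stdout
    for path in unversioned_paths(status):
        # Adding an unversioned directory adds its contents recursively.
        subprocess.run(["svn", "add", path], cwd=workdir, check=True)
    subprocess.run(["svn", "commit", "-m", message], cwd=workdir, check=True)
```

Calling import_snapshot once per archive, oldest first, with the archive name as the commit message would give one revision per snapshot. Try it on a scratch repository first.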
This procedure does have one problem, though: it won't catch files that were deleted between snapshots, because unpacking over the old tree leaves them in place. You could write a script that compares the sorted file listings of two consecutive archives and reports every file that is present in 1.tar.gz but missing from 2.tar.gz; those are the files to remove with "svn delete" before committing.
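The listing comparison could be sketched like this in Python, using the standard tarfile module. It assumes both archives are gzip-compressed and use the same path layout, so that a file keeps the same name from one snapshot to the next; the function names are hypothetical.

```python
import tarfile


def tar_file_set(tarball):
    """Return the set of regular-file paths stored in a .tar.gz archive."""
    with tarfile.open(tarball, "r:gz") as tar:
        return {member.name for member in tar.getmembers() if member.isfile()}


def removed_between(older, newer):
    """Return sorted paths present in `older` but missing from `newer`."""
    return sorted(tar_file_set(older) - tar_file_set(newer))
```

Each path returned by removed_between("1.tar.gz", "2.tar.gz") is a candidate for "svn delete" in the working copy before the second snapshot is committed.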
Subversion is probably the better choice here anyway, simply because if you're talking about firmware then I assume there are some binary files, and SVN handles binary files far more efficiently than CVS. Plus, SVN versions the repository as a whole, not just single files. So with the method described above, each revision in the Subversion repository would map to the exact content of the tar.gz that you used to create that revision.