Ask Slashdot: Taming a Wild, One-Man Codebase?
New submitter tavi.g writes "Working for an ISP, along with my main job (networking) I get to create some useful code (Bash and Python) that's running on various internal machines. Among them: glue scripts, Cisco interaction / automation tools, backup tools, alerting tools, IP-to-Serial OOB stuff, even a couple of web applications (LAMPython and CherryPy). Code has piled up (maybe over 20,000 lines) and I need a way to reliably work on it and deploy it. So far I've used headers at the beginning of the scripts, but now I'm migrating the code over to Bazaar with TracBzr, because it seems best for my situation. My question for the Slashdot community is: in the case of a single developer (for now), multiple machines, and a small-ish user base, what would be your suggestions for code versioning and deployment, considering that there are no real test environments and most code just goes into production? This is relevant because, lacking a test environment, I got used to immediate feedback from the scripts, since they were in production, and now a versioning system would mean going through proper deployment/rollback in order to get real feedback."
Rectify the lack of a testbed.
'cos there's nothing more likely to cause immediate termination of your employment than a bit of rogue code taking down the bread of the business.
Test it first.
Operation Guillotine is in effect.
I don't understand how code versioning has to be coupled with deployment? You have no test environment, as you said... so just make releases and deploy them manually. Since you are going straight to production, you had better be there in person to roll it back if you screwed up. Right? So, SVN should be all you need...
"Here Lies Philip J. Fry, named for his uncle, to carry on his spirit"
Given the situation you describe, it won't be long before the whole system falls into corruption. Your only hope is to save two lines from every script on a USB stick, then flood the rest.
Everything is better with chainsaws.
My question for the Slashdot community is: in the case of a single developer (for now), multiple machines, and a small-ish user base, what would be your suggestions for code versioning and deployment, considering that there are no real test environments and most code just goes into production?
The simple answer is, "Whatever works best for you." You're the only developer for these projects. Unless your manager is giving you direction on a specific process or requirements, it's your ball game. You know how you work best -- pick your tools accordingly.
#fuckbeta #iamslashdot #dicemustdie
1. Buy or get a machine to host SVN for version control. I work on my wife's company website and some basic management tools. SVN has saved my bacon multiple times when I thought I had lost some code.
2. Get a pre-production server and test your code! Sounds like you're living in the wild west and that shit flies until something goes horribly wrong and you're the guy who gets blamed.
Hold up, wait a minute, let me put some pimpin in it
You can still change everything in place. Then you can run the script and get feedback. When it works, you commit. When it doesn't, you remove the problem, check and commit.
Or you can make your changes, review them and commit them, then do a run. When you have a problem, you commit again.
It is not because you use a versioning system that you need extra formality. You can still work the way you used to, but now you have an extra safety measure due to the versioning system.
Using Trac is a way to better organise your problems. The main thing I can say about using Trac effectively is that you always need to have a browser window open on it, and when you have an idea, notice something, or have a problem, enter it immediately. Afterwards, take your time to look at new and open problems, classify them and process them.
You say that "now a versioning system would mean going through proper deployment/rollback in order to get real feedback."
But then, no, it wouldn't.
Storing your code in a versioning system means exactly that: you store your code in a versioning system, nothing more, nothing less.
I'm starting to be an old fart, so you can believe me when I tell you I've already been in your position.
Back then I used CVS and it didn't change my deployment procedures in the slightest. The only difference was that I had all those scripts in a single convenient place and could look through the history when I found a regression or wanted to see how I had done something in the past.
The most naive approach is to just go on working the way you do now, except that when you are confident in a script or set of scripts you check them in for posterity. You mainly develop on your own desktop and push your scripts to the servers with an rsync-based script. One step up from this, you use a CM tool (say, Puppet), so instead of pushing to the servers you push to the puppetmaster and then run `puppet agent --test` on the servers: that way configuration becomes code, and with it comes repeatability.
This could fill almost a novel, but the basic idea stays the same: SCM is SCM is SCM; nothing more, nothing less.
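To make the rsync-based push mentioned above a bit more concrete, here is a minimal Python sketch. Everything specific in it is an assumption for illustration: the host names, the paths and the `build_rsync_command` / `push` helpers are made up, and it presumes `rsync` and key-based SSH access are already in place. It defaults to a dry run, so nothing is copied until you explicitly ask for it.

```python
import subprocess

# Hypothetical inventory: replace with your own machines and paths.
HOSTS = ["backup01.example.net", "alert01.example.net"]
SOURCE_DIR = "scripts/"               # trailing slash: copy the directory contents
TARGET_DIR = "/opt/netops/scripts/"


def build_rsync_command(host, source=SOURCE_DIR, target=TARGET_DIR, dry_run=True):
    """Return the rsync argument list for pushing `source` to `host`."""
    # -a keeps permissions and timestamps, -z compresses in transit,
    # --delete removes files on the target that no longer exist in the source.
    cmd = ["rsync", "-az", "--delete"]
    if dry_run:
        cmd.append("--dry-run")       # report what would change, copy nothing
    cmd += [source, f"{host}:{target}"]
    return cmd


def push(hosts=HOSTS, dry_run=True, runner=subprocess.run):
    """Push the checked-in scripts to every host, stopping at the first failure."""
    for host in hosts:
        runner(build_rsync_command(host, dry_run=dry_run), check=True)
```

Calling `push()` only does a dry run; `push(dry_run=False)` does the real copy. The `runner` parameter exists so the loop can be exercised without touching any server.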
Quick! Rename all the files f1, f2, f3 etc, rename all the variables i1, i2, i3, etc and remove all whitespace.
Keep a translation sheet on you at all times. Suddenly, you're irreplaceable.
(:-) for the humor impaired. This is actually a riff on a joke from WKRP, when an engineer said he was replacing all the color-coded wiring with black wires for job security. (B.t.w. the engineer was played by one of the writers of the show)
All ideas^H^H^H^H^Hprocesses in this post are Patent Pending. (as well as the process of patenting all postings)
Most of you who have seen this may have read it in the Jargon File. It's relevant. The short answer is "you don't":
The Story of Mel, a Real Programmer
This was posted to USENET by its author, Ed Nather (utastro!nather), on May 21, 1983.
A recent article devoted to the *macho* side of programming made the bald and unvarnished statement:
Real Programmers write in FORTRAN.
Maybe they do now,
in this decadent era of
Lite beer, hand calculators, and "user-friendly" software
but back in the Good Old Days,
when the term "software" sounded funny
and Real Computers were made out of drums and vacuum tubes,
Real Programmers wrote in machine code.
Not FORTRAN. Not RATFOR. Not, even, assembly language.
Machine Code.
Raw, unadorned, inscrutable hexadecimal numbers.
Directly.
Lest a whole new generation of programmers
grow up in ignorance of this glorious past,
I feel duty-bound to describe,
as best I can through the generation gap,
how a Real Programmer wrote code.
I'll call him Mel,
because that was his name.
I first met Mel when I went to work for Royal McBee Computer Corp.,
a now-defunct subsidiary of the typewriter company.
The firm manufactured the LGP-30,
a small, cheap (by the standards of the day)
drum-memory computer,
and had just started to manufacture
the RPC-4000, a much-improved,
bigger, better, faster --- drum-memory computer.
Cores cost too much,
and weren't here to stay, anyway.
(That's why you haven't heard of the company,
or the computer.)
I had been hired to write a FORTRAN compiler
for this new marvel and Mel was my guide to its wonders.
Mel didn't approve of compilers.
"If a program can't rewrite its own code",
he asked, "what good is it?"
Mel had written,
in hexadecimal,
the most popular computer program the company owned.
It ran on the LGP-30
and played blackjack with potential customers
at computer shows.
Its effect was always dramatic.
The LGP-30 booth was packed at every show,
and the IBM salesmen stood around
talking to each other.
Whether or not this actually sold computers
was a question we never discussed.
Mel's job was to re-write
Before it gets out of hand, I'd look to set up four things.
1. Set up a proper split environment. Even if you don't have the hardware for it, set it up in such a way that when the hardware becomes available, you can move it appropriately. That being, a standard dev -> qa -> stress -> prod infrastructure.
2. Set up a good revision control. I've started to really enjoy using GIT for this, as there's other software like gitolite that can give you fine-grained access control to your repositories. However, feel free to use subversion or any other well contained revision control platform.
3. Set up a good method for deployment. My suggestion? Try puppet. It's free, and it's powerful, and if you get it configured, adding new systems to it is exceedingly easy to do.
4. Packaging for your deployment. If you are installing a bunch of software (scripts, job control, etc) package it and give it a revision, then it's easy to upgrade systems with the 'new package', or revert it to the 'previous package' instead of having to manually copy around files or (re)editing them.
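The upgrade / revert idea in point 4 can be sketched without a real package manager. The following is only a stand-in, not how RPM or dpkg work: it assumes each revision is unpacked into its own directory under a hypothetical `releases/` folder and that a `current` symlink selects the active one. The `activate` helper and the directory layout are invented for this example, and it assumes a POSIX system.

```python
import os
from pathlib import Path


def activate(root, version):
    """Point root/current at root/releases/<version>; return the previous version."""
    root = Path(root)
    release = root / "releases" / version
    if not release.is_dir():
        raise FileNotFoundError(f"no such release: {release}")

    current = root / "current"
    previous = os.path.basename(os.readlink(current)) if current.is_symlink() else None

    # Build the new link under a temporary name, then rename it over the old
    # one. On POSIX the rename is atomic, so the link never goes missing.
    tmp = root / "current.tmp"
    if tmp.is_symlink() or tmp.exists():
        tmp.unlink()
    tmp.symlink_to(release.resolve())
    os.replace(tmp, current)
    return previous
```

Upgrading is `activate(root, "1.1")`; reverting is calling `activate` again with whatever version the first call returned.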
Hope that helps.
Yeah, that's interesting actually, I just ran into this myself. We're putting a project together, and when something breaks I end up doing small fixes and losing the changes across deployments (we only have 3 active, so it's very small). But I feel your pain. I'm not totally convinced that a full SVN system is necessary, but once you break down the problems it likely is. Given your closed infrastructure, you may want to consider adding some phone-home features to your scripts, something intelligent enough to update smoothly, either automatically or manually. Make things easy for yourself so they're not difficult to work with, and you will be encouraging yourself (and others) to use them.
The absolute best advice I can give is to keep it simple. There are a million different ways to do it; try not to do a massive migration of everything all at once, or you may find out later that some minute bug is hindering everything you do.
Lastly, plan what you want it to look like and how; it will save you weeks of work.
Good leaders run toward problems, bad leaders hide from them.
Yes, set up a test environment. And implement some kind of versioning system, even if it's just "cp current_code old_code". You should always be able to fall back if you have a botched deployment.
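A slightly more careful version of "cp current_code old_code" might look like the sketch below. The function names and the timestamped backup naming are assumptions for illustration; it simply copies the live directory aside before replacing it, so a botched deployment can be undone.

```python
import shutil
import time
from pathlib import Path


def deploy_with_backup(new_code, live_dir, backup_root):
    """Copy live_dir aside, then replace it with new_code. Returns the backup path."""
    live_dir, backup_root = Path(live_dir), Path(backup_root)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup = backup_root / f"{live_dir.name}-{stamp}"
    shutil.copytree(live_dir, backup)      # the "cp current_code old_code" step
    shutil.rmtree(live_dir)
    shutil.copytree(new_code, live_dir)
    return backup


def roll_back(backup, live_dir):
    """Restore live_dir from a backup made by deploy_with_backup."""
    shutil.rmtree(live_dir)
    shutil.copytree(backup, live_dir)
```

Note that the live directory is briefly absent during the swap, and two deployments within the same second would collide on the backup name, so this suits small script collections rather than busy services.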
But one of the best things you can do is to start writing documentation. I like to write my documentation assuming it will be my replacement reading it, and so I try to include everything. Justify every unusual implementation detail, and explain why each task was done the way it was. List bugs, and any code you had to write to work around them. The best part of documenting your project will be that as you work through it, you'll find things that no longer make sense and make them better.
git
Yes!!! Create git repos of all those various parts on some central git server. Create backups of those repos periodically, like a sane person...
Git really doesn't require a ton of understanding to "just start using git" competently. It's not going to trash whatever you have in place; it's mathematically proven to *not* lose data.
Also, freaking set up a dev server already! (That's like 2 machines, or a private, 3rd party git repo (bitbucket is what I use) and a dedicated test/dev machine).
PS: I don't reply to ACs.
A great deal of the version wrangling you are facing is best done with a tool like Git.
The bigger problem (development discipline) is much harder to fix.
You want something to track changes, deploy changes, and test software. Bazaar will track your changes.
Chef is open source infrastructure management. The central server maintains a searchable database of your nodes and all of the scripts (recipes) that run on them. The nodes query this database and run the scripts that they are supposed to. This is similar to your environment now. You can also check your chef-repo into scm. This allows you to mess around with production and only commit back into scm when you are fairly certain that it works.
Jenkins has a similar setup but each node is ostensibly there to build and test software although we have used it for deployment and integration testing.
Chef & Jenkins can definitely help in deploying code and maintaining your infrastructure, but you will need to take responsibility for testing your code somewhere along the process, whether on commit with Jenkins or on deploy with unit or other tests. I definitely feel the value after investing time to learn these powerful tools.
I keep it in a Mercurial repository and use symlinks into the repository to deploy it. I also make free use of Mercurial's subrepo feature for tools that others wrote that are not yet found as packages on the Linux distributions I use.
Yes, there is still a testing issue. For most of this code it's not a big deal because I'm the only user. I test it as I write it with a few simple hand tests and then it's good to go.
If I were doing this for something where the code mattered to other people I would just add unit tests for various subsections as made sense. I would also start sectioning off the tools and making them into separate repositories of their own. I'd also make much sparer use of the sub-repo feature and instead have deployment scripts that handled making sure the correct version was in place.
You still need test environments though for integration testing. And as the code grows, ad-hoc test environments stop being very practical. You should dedicate a VM or two (or even a machine or two) to replicating miniature versions of the real-world setups the code is expected to work in.
Lastly, it's never too early to start using source control on your code. 98% of my code is under source control, even most stuff I think is 'throwaway' or ad-hoc.
I would also strongly recommend Mercurial (or git (if you must)) over Bazaar. It's faster, and the mental model those two tools encourage is a much more accurate representation of what they're really doing. Bazaar lets you pretend that branching is still a big deal and takes some effort to resolve. It lets you continue to think in the model of centralized source systems even though it's not. You will be doing yourself a huge favor in productivity (yes, even for a single developer) to not use it and go for something that doesn't let you pretend anymore. Of those tools, I think Mercurial has a far more carefully thought out and better set of commands and options than git does.
Need a Python, C++, Unix, Linux develop
Forget that you're a lone programmer. Set up a proper environment anyway.
This is going to seem like hard work, but once you've done the upfront effort, it will pay dividends.
Do *everything* that you'd do if you were a team. There are plenty of books / web sites on the subject.
Pick a version control system -- since you're starting from scratch, Git or Mercurial. Get your code into it.
Pick a continuous build system -- Jenkins is popular and free.
Write one unit test, and make Jenkins run it as part of the build process.
Decide on some sort of repository for your build artefacts.
Establish an integration testing box, and have your CI system deploy to that every build. Ideally use something like Puppet for this, and also use Puppet on your production machines.
Write one integration test, and make Jenkins run it after deployment.
You can dedicate a server to all of this, several servers, run it all on your laptop or in VMs; it really doesn't matter. But think ahead so that you can move it to dedicated machines later if you need to.
Lots of work, but now you have a nice, confidence inspiring build / code management system.
Once that's going, you can decide how to fix your lack of tests. One approach is to take a few weeks just writing tests. Another is to write tests as the need arises -- for new code as you write it; to demonstrate bugs before you fix them. Or somewhere in between.
Python isn't my area, but there is probably an ecosystem of pythonesque tools for a lot of this stuff. pyUnit, code coverage tools, etc.
You will have problems unit testing, since you won't have designed the code for testability. The choice is, live with fewer tests than might otherwise be possible, or refactor your design into something more unit testable. (IOC is unit testing's best friend)
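As a small illustration of "write one unit test" and of why inversion of control helps, here is a sketch using Python's standard `unittest` module. The `interface_is_up` function is hypothetical, and the canned device output is only modeled loosely on what a router might print, not taken from any real device. The point is that the code which talks to the hardware is passed in, so the test can substitute a fake.

```python
import unittest


def interface_is_up(interface, run_command):
    """Return True if the device reports `interface` as up.

    `run_command` is injected: in production it would talk to the router,
    in tests it is a plain function returning canned output.
    """
    output = run_command(f"show interface {interface}")
    first_line = output.splitlines()[0] if output else ""
    return "is up" in first_line


class InterfaceIsUpTest(unittest.TestCase):
    def test_up(self):
        fake = lambda cmd: "GigabitEthernet0/1 is up, line protocol is up\n"
        self.assertTrue(interface_is_up("GigabitEthernet0/1", fake))

    def test_admin_down(self):
        fake = lambda cmd: (
            "GigabitEthernet0/1 is administratively down, line protocol is down\n"
        )
        self.assertFalse(interface_is_up("GigabitEthernet0/1", fake))

    def test_empty_output(self):
        self.assertFalse(interface_is_up("GigabitEthernet0/1", lambda cmd: ""))
```

Saved in a file, this runs with `python -m unittest`, which is also something a CI job such as Jenkins can invoke on every commit.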
Just get one of the inexpensive commercial subs for GitHub. This solves all sorts of issues. Remote backup, robust version system, issue tracking etc.
You need to fire this cowboy. He doesn't think he needs to test his scripts.
I know he seems irreplaceable. That should be a big red flag.
John McAfee 'It was like that time I hired that Bangkok prostitute; to do my taxes, while I fucked my accountant'
Proclaim yourself the most interesting coder thinkgeek style.
I don't often test my code, but when I do, I do it in production.
If I'm the people who run the company, I start firing people. If I'm the developer, I run like hell before anybody realizes what a complete mess I've made.
No versioning, no test environment, live changes in production ... these are warning signs of something which has been cobbled together, and which continues working by sheer dumb luck.
I had a developer once who edited a live production environment without telling anybody and broke it even worse -- he very quickly found himself with no access to the machines and being told that we no longer trusted him with a production environment.
Having worked in highly regulated industries where the stakes are really high, I've had it drilled into me that you simply have no room whatsoever to be doing this kind of thing that ad hoc.
Glad you're starting to use something. But the risk to your employer of all of your stuff tanking and becoming something you can't recover is just too great. From the sounds of it, if you get abducted by aliens or hit by a bus, your company would come to a screeching halt.
Lost at C:>. Found at C.
[...]the test environment would have to be a clone of the production environments. Good luck with that with the described environment![...]
There is stuff like Puppet (for declaratively deploying "services") and Vagrant to provision Virtualbox guests.
Downsides:
Because I never, ever want to rely on anything you build this way. You are headed for a disaster, unless you 1) set up a test environment, and 2) use a revision control system.
Really, anything less than that is just a complete waste of everyone's time.
... it's mathematically proven to *not* lose data.
I love git and use it on a daily basis, but you can't mathematically prove that it won't lose data. It is written by humans, and I have encountered bugs in it. You also still have to deal with manual merges, which are error prone. I've also had my local repo get in weird states that are very difficult to get out of. When this happens, I always copy out all my changes because I'm afraid of losing anything.
"how would a substantial fraction data representative of real data be created if the real data contains people's shipping addresses or other PII?"
Do you really have to ask? You either clutter the fields or clutter their relationships:
Exhibit A:
* John Doe | Lexington Av.
* Betty Lamarr | Main St.
becomes
* John Doe | Main St.
* Betty Lamarr | Lexington Av.
Exhibit B:
* John Doe | Lexington Av.
becomes
* Nhjo Ode | Aevtginon Lx.
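Both exhibits can be sketched in a few lines of Python. The function names are invented for this example, and the letter scrambler is a simplification: it shuffles letters within each word and does not reproduce the capitalization shown in Exhibit B. A random shuffle can also leave some rows unchanged by chance, so treat this as a starting point rather than real anonymization.

```python
import random


def shuffle_relationships(rows, rng):
    """Exhibit A: keep every name and address, but break who lives where."""
    names = [name for name, _ in rows]
    addresses = [address for _, address in rows]
    rng.shuffle(addresses)
    return list(zip(names, addresses))


def scramble_letters(text, rng):
    """Exhibit B: shuffle the letters inside each word, keeping word lengths."""
    words = []
    for word in text.split(" "):
        letters = list(word)
        rng.shuffle(letters)
        words.append("".join(letters))
    return " ".join(words)
```

Passing in a seeded `random.Random` instance as `rng` makes the scrambled test data reproducible between runs.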
This is real. The solution is to manage expectations. If people know that the tests just show functionality and not scalability, and that scalability testing is required (when warranted), you should be good. More importantly if the decision makers know this, you are good.
Scrub the data. Addresses are not personal information, though; the fact that a specific person lives there might be. Open a phone book (if you can find one nowadays): it has reams of addresses as well as phone numbers tied to real people. This is public knowledge. Personal information involves things more like name, age, finances, medical records, etc.
For the stuff that is real personal information, randomizing names to create fake people tied to real addresses is not hard at all (real addresses are often necessary when systems tie into others where shipping or location are requirements). You can take real information, put it in a can, and scramble it to make fake people. I think testers should be proficient enough to be able to generate this kind of data.
As to one other comment made by the OP:
Versioning systems do no such thing if you don't use them that way. If you want a "proper deployment and rollback cycle" you can do that. Or not. But at least you'll be able to go back in time to find the code that actually worked if you need to. No coder should work without the safety net of version control. Whether it be CVS, SVN, GIT, it matters less what it is than whether you have one or not. Pick one and use it.
-- I ignore anonymous replies to my comments and postings.
> One guy of the caliber of a Stallman or a Torvalds will probably do much better than a team of Visual Source Safe users, even if that guy has no source control system.
Linus Torvalds, author of Git?
Richard Stallman, author of GNU diff; without which many revision control systems wouldn't work?