Ask Slashdot: Taming a Wild, One-Man Codebase?
New submitter tavi.g writes "Working for an ISP, along with my main job (networking) I get to create some useful code (Bash and Python) that's running on various internal machines. Among them: glue scripts, Cisco interaction/automation tools, backup tools, alerting tools, IP-to-serial OOB stuff, even a couple of web applications (LAMPython and CherryPy). Code has piled up (maybe over 20,000 lines) and I need a way to reliably work on it and deploy it. So far I have used headers at the beginning of the scripts, but now I'm migrating the code over to Bazaar with TracBzr, because it seems best for my situation. My question for the Slashdot community is: in the case of a single developer (for now), multiple machines, and a small-ish user base, what would be your suggestions for code versioning and deployment, considering that there are no real test environments and most code just goes into production? This is relevant because, lacking a test environment, I got used to immediate feedback from the scripts, since they were in production, and now a versioning system would mean going through proper deployment/rollback in order to get real feedback."
Rectify the lack of a testbed.
'Cos there's nothing more likely to get your employment terminated on the spot than a bit of rogue code taking down the bread and butter of the business.
Test it first.
git
I don't understand why code versioning has to be coupled with deployment. You have no test environment, as you said, so just make releases and deploy them manually. Since you are going straight to production, you had better be there in person to roll it back if you screw up, right? So SVN should be all you need...
"Here Lies Philip J. Fry, named for his uncle, to carry on his spirit"
Given the situation you describe, it won't be long before the whole system falls into corruption. Your only hope is to save two lines from every script on a USB stick, then flood the rest.
My question for the Slashdot community is: in the case of a single developer (for now), multiple machines, and a small-ish user base, what would be your suggestions for code versioning and deployment, considering that there are no real test environments and most code just goes into production?
The simple answer is, "Whatever works best for you." You're the only developer for these projects. Unless your manager is giving you direction on a specific process or requirements, it's your ball game. You know how you work best -- pick your tools accordingly.
1. Buy or get a machine to host SVN for version control. I work on my wife's company website and some basic management tools, and SVN has saved my bacon multiple times when I thought I had lost some code.
2. Get a pre-production server and test your code! Sounds like you're living in the wild west and that shit flies until something goes horribly wrong and you're the guy who gets blamed.
You can still change everything in place. Then you can run the script and get feedback. When it works, you commit. When it doesn't, you fix the problem, check again, and commit.
Or you can make your changes, review them and commit them, then do a run. When you find a problem, you fix it and commit again.
Using a versioning system doesn't mean you need extra formality. You can still work the way you used to, but now you have an extra safety net in the versioning system.
Trac is a way to better organise your problems. The main thing I can say about using Trac effectively is that you always need to have a browser window open on it, and when you have an idea, notice something, or hit a problem, enter it immediately. Afterwards, take your time to look at new and open tickets, classify them and process them.
You say that "now a versioning system would mean going through proper deployment/rollback in order to get real feedback."
But then, no, it wouldn't.
Storing your code in a versioning system means just that: you store your code in a versioning system, nothing more, nothing less.
I'm starting to be an old fart, so you can believe me when I tell you I've already been in your position.
Back then I used CVS, and it didn't change my deployment procedures in the slightest. The only difference was that I had all those scripts in a single convenient place, and I could look through past history when I found a regression or wanted to see how I had done something before.
The most naive approach is to keep working just the way you are now, except that when you are confident in a script or set of scripts, you check them in for posterity. You mainly develop on your own desktop and push your scripts to the servers with an rsync-based script. One step up from this, you use a CM tool (say, Puppet), so instead of pushing to the servers you push to the puppetmaster and then run `puppet agent --test` on the servers: that way configuration becomes code, and therefore repeatable.
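The "rsync-based script" step could look something like the Python sketch below. The server names and both directory paths are hypothetical placeholders, and the command runner is passed in as a parameter so you can dry-run or test the loop without touching real machines.

```python
import subprocess

# Hypothetical values: adjust to your own servers and directory layout.
SERVERS = ["noc-tools-01", "noc-tools-02"]
LOCAL_DIR = "scripts/"            # trailing slash: sync the directory's contents
REMOTE_DIR = "/opt/netscripts/"


def rsync_command(host, local_dir=LOCAL_DIR, remote_dir=REMOTE_DIR, dry_run=False):
    """Build the rsync argument list that pushes local_dir to host:remote_dir."""
    cmd = ["rsync", "-az"]        # -a: recursive, keep perms/times/links; -z: compress
    if dry_run:
        cmd.append("--dry-run")   # report what would change, touch nothing
    cmd += [local_dir, f"{host}:{remote_dir}"]
    return cmd


def push_all(servers=SERVERS, dry_run=False, run=subprocess.run):
    """Push to every server in order, stopping at the first failure (check=True)."""
    for host in servers:
        run(rsync_command(host, dry_run=dry_run), check=True)
```

Calling `push_all(dry_run=True)` first shows what would change on each box; dropping `dry_run` does the real push.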
The subject could fill almost a novel, but the basic idea is the same: SCM is SCM is SCM; nothing more, nothing less.
You are going to put yourself through a heap of misery changing over to something you have to learn all over again. Best stay where you are with Python.
Check it into subversion. You can get your build/packaging tools to embed the svn revision into the artifact.
For all the git lovers out there: r564 is so much easier for a human to deal with than a large hex string, and most git advantages don't really apply to a single developer.
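Embedding the revision could be as simple as the sketch below. It assumes a hypothetical `@REVISION@` placeholder in your scripts and uses Subversion's `svnversion` command, which prints strings such as `564`, `564M` (locally modified) or `4123:4168MS` (mixed revisions, modified, switched). Anything else, such as "exported", is rejected rather than silently embedded.

```python
import re
import subprocess

PLACEHOLDER = "@REVISION@"  # hypothetical marker you put in your scripts


def working_copy_revision(path="."):
    """Ask svnversion for the working-copy revision, e.g. '564' or '564M'."""
    out = subprocess.run(["svnversion", path], capture_output=True, text=True, check=True)
    return out.stdout.strip()


def embed_revision(source_text, revision):
    """Replace every placeholder with the revision string after sanity-checking it."""
    # Digits (optionally a min:max range) followed by svnversion's M/S/P flags.
    if not re.fullmatch(r"[0-9:]+[MSP]*", revision):
        raise ValueError(f"unexpected revision string: {revision!r}")
    return source_text.replace(PLACEHOLDER, revision)
```

A build step would read a script, call `embed_revision(text, working_copy_revision())`, and write the result into the artifact. A trailing `M` in production is a useful warning that someone deployed uncommitted changes.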
Quick! Rename all the files f1, f2, f3 etc, rename all the variables i1, i2, i3, etc and remove all whitespace.
Keep a translation sheet on you at all times. Suddenly, you're irreplaceable.
(:-) for the humor impaired. This is actually a riff on a joke from WKRP, when an engineer said he was replacing all the color-coded wiring with black wires for job security. (B.t.w. the engineer was played by one of the writers of the show)
Most of you who have seen this may have read it in the Jargon File. It's relevant. The short answer is "you don't":
The Story of Mel, a Real Programmer
This was posted to USENET by its author, Ed Nather (utastro!nather), on May 21, 1983.
A recent article devoted to the *macho* side of programming made the bald and unvarnished statement:
Real Programmers write in FORTRAN.
Maybe they do now,
in this decadent era of
Lite beer, hand calculators, and "user-friendly" software
but back in the Good Old Days,
when the term "software" sounded funny
and Real Computers were made out of drums and vacuum tubes,
Real Programmers wrote in machine code.
Not FORTRAN. Not RATFOR. Not, even, assembly language.
Machine Code.
Raw, unadorned, inscrutable hexadecimal numbers.
Directly.
Lest a whole new generation of programmers
grow up in ignorance of this glorious past,
I feel duty-bound to describe,
as best I can through the generation gap,
how a Real Programmer wrote code.
I'll call him Mel,
because that was his name.
I first met Mel when I went to work for Royal McBee Computer Corp.,
a now-defunct subsidiary of the typewriter company.
The firm manufactured the LGP-30,
a small, cheap (by the standards of the day)
drum-memory computer,
and had just started to manufacture
the RPC-4000, a much-improved,
bigger, better, faster --- drum-memory computer.
Cores cost too much,
and weren't here to stay, anyway.
(That's why you haven't heard of the company,
or the computer.)
I had been hired to write a FORTRAN compiler
for this new marvel and Mel was my guide to its wonders.
Mel didn't approve of compilers.
"If a program can't rewrite its own code",
he asked, "what good is it?"
Mel had written,
in hexadecimal,
the most popular computer program the company owned.
It ran on the LGP-30
and played blackjack with potential customers
at computer shows.
Its effect was always dramatic.
The LGP-30 booth was packed at every show,
and the IBM salesmen stood around
talking to each other.
Whether or not this actually sold computers
was a question we never discussed.
Mel's job was to re-write
Create a git repository on 'production' and then a clone on your development machine. (Or a clone on a test machine would really be better, which you then clone to development.)
Do your development, check in, and then pull to test and execute there; if all goes well, pull to prod and execute there.
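The development -> test -> production flow can be written down as data, so the promotion order lives in one place. This is a minimal sketch under assumed names: the host names and the `/opt/netscripts` checkout path are hypothetical, and `--ff-only` is a choice that makes `git pull` refuse to create surprise merge commits on the servers.

```python
# Hypothetical stage names, hosts and checkout path; adapt to your own machines.
PROMOTION_ORDER = ["development", "test", "production"]
HOSTS = {
    "development": ["my-desktop"],
    "test": ["test-vm-01"],
    "production": ["noc-01", "noc-02"],
}
REPO_PATH = "/opt/netscripts"


def next_stage(stage):
    """Where a change goes after it has been verified on `stage` (None after production)."""
    i = PROMOTION_ORDER.index(stage)  # raises ValueError for unknown stage names
    return PROMOTION_ORDER[i + 1] if i + 1 < len(PROMOTION_ORDER) else None


def pull_commands(stage, branch="master"):
    """Per-host ssh commands that fast-forward each checkout of `stage`."""
    return {
        host: ["ssh", host, f"git -C {REPO_PATH} pull --ff-only origin {branch}"]
        for host in HOSTS[stage]
    }
```

When test and production are still the same hardware, only the `HOSTS` mapping changes later; the promotion order stays as it is.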
Before it gets out of hand, I'd look to set up four things.
1. Set up a proper split environment. Even if you don't have the hardware for it, set it up in such a way that when the hardware becomes available, you can move things over appropriately: a standard dev -> qa -> stress -> prod infrastructure.
2. Set up good revision control. I've started to really enjoy using Git for this, as there's other software like gitolite that can give you fine-grained access control to your repositories. However, feel free to use Subversion or any other well-contained revision control platform.
3. Set up a good method for deployment. My suggestion? Try puppet. It's free, and it's powerful, and if you get it configured, adding new systems to it is exceedingly easy to do.
4. Packaging for your deployment. If you are installing a bunch of software (scripts, job control, etc) package it and give it a revision, then it's easy to upgrade systems with the 'new package', or revert it to the 'previous package' instead of having to manually copy around files or (re)editing them.
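Real packages (.deb or .rpm) are the proper tool here; the sketch below swaps in a simpler technique with the same upgrade/revert property: each release is copied into its own versioned directory, and a `current` symlink is switched atomically between them. It assumes release names sort in install order (for example zero-padded revisions like `r0564`, or dates); all paths are hypothetical.

```python
import os
import shutil
from pathlib import Path


def switch_to(root, version):
    """Atomically repoint root/current at releases/<version>."""
    root = Path(root)
    tmp = root / "current.tmp"
    if tmp.is_symlink() or tmp.exists():
        tmp.unlink()
    tmp.symlink_to(Path("releases") / version)  # relative link inside root
    os.replace(tmp, root / "current")           # rename over the old link in one step


def install_release(root, version, source_dir):
    """Copy source_dir into root/releases/<version>, then make it current."""
    target = Path(root) / "releases" / version
    if target.exists():
        raise FileExistsError(f"release {version} already installed")
    shutil.copytree(source_dir, target)  # also creates root/releases if needed
    switch_to(root, version)
    return target


def installed_versions(root):
    return sorted(p.name for p in (Path(root) / "releases").iterdir())


def rollback(root):
    """Point current at the release installed just before the active one."""
    active = Path(os.readlink(Path(root) / "current")).name
    versions = installed_versions(root)
    i = versions.index(active)
    if i == 0:
        raise RuntimeError("no earlier release to roll back to")
    switch_to(root, versions[i - 1])
    return versions[i - 1]
```

Cron jobs and users always call scripts through `current/`, so an upgrade or a revert is a single symlink change and never a half-copied directory.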
Hope that helps.
Yeah, that's interesting actually; I just ran into this myself. We're putting a project together, and when something breaks I end up doing small fixes and losing the changes across deployments (we only have 3 active, so it's very small). So I feel your pain. I'm not totally convinced that a full SVN system is necessary, but once you break down the problems it likely is. Given your closed infrastructure, you may want to consider adding some phone-home features to your scripts: something intelligent enough to update itself smoothly, either automatically or manually. Make things easy for yourself so they're not difficult to work with, and you will be encouraging yourself (and others) to use them.
The absolute best advice I can give is to keep it simple. There are a million different ways to do it; try not to do a massive migration of everything all at once, or you may find out later that some minute bug is hindering everything you do.
Lastly, plan what you want it to look like and how; it will save you weeks of work.
Use Git for source control.
Use Vagrant to create virtualized testing environments (via headless Virtualbox) that you can ssh into, develop in, and test... all running directly within your laptop. Use Puppet or Chef to create recipes for all of your servers, and you can virtualize all of them in a "pretend" network of virtual machines. All of that can be checked into Git too.
Store your central Git repositories on GitHub or some other reliable place (you can stick them on one of your own servers too). Code in your virtual machine, commit, and push up to the central Git repo. Then pull it down to your live servers to update them.
You can use Puppet in server/client mode to automate the deployment of server configuration changes out to your live machines also.
And if you want to get REALLY fancy, just throw a third set of machines in there and use that as your "staging" environment, where changes go after your virtualized environment, but before your live environment (mimicking the live environment as closely as possible).
I know there are plenty of open-source tools out there, but I still prefer Perforce. Also, recently (as of February) Perforce opened up its 2-user license to 20 users/20 workspaces! This is fantastic news!
Check in your mainline (or migrate it) to Perforce under /depot/mainline.
Integrate /depot/mainline to a nonexistent branch /depot/testing/VERSION, and check that in.
Integrate /depot/testing/VERSION to a nonexistent branch /depot/release/VERSION, and check that in.
Now with P4V, moving changesets from mainline to testing is as simple as drag and drop. Move changesets from mainline to testing, then from testing (along with the changes found in testing) to release, and drag those changes back to mainline. (Dragging performs the 'integrate' step.) You have now come full circle: you have two places where you can make changes, plus a release snapshot.
Now get VirtualBox, because it supports snapshots. Set up Perforce on a VM and take a snapshot. Then sync from Perforce, run your tests, and deploy as needed. Afterwards, revert to the snapshot taken right after you installed Perforce.
Then you can make packaging/deployment scripts that only work on release branches.
Yes, set up a test environment. And implement some kind of versioning system, even if it's just "cp current_code old_code". You should always be able to fall back if you have a botched deployment.
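Even the "cp current_code old_code" fallback is worth scripting so it happens every time. A minimal sketch, assuming the code lives in one directory and backups go to a separate (hypothetical) backup root; each backup gets a timestamped name, so an earlier copy is never overwritten.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup(code_dir, backup_root, now=None):
    """Copy code_dir to backup_root/<name>-<YYYYmmdd-HHMMSS> and return the new path."""
    now = now or datetime.now()
    code_dir = Path(code_dir)
    dest = Path(backup_root) / f"{code_dir.name}-{now:%Y%m%d-%H%M%S}"
    shutil.copytree(code_dir, dest)  # raises if dest exists, so nothing is overwritten
    return dest


def restore(backup_dir, code_dir):
    """Throw away the botched deployment and put the backup back in place."""
    code_dir = Path(code_dir)
    if code_dir.exists():
        shutil.rmtree(code_dir)
    shutil.copytree(backup_dir, code_dir)
```

Run `backup()` as the first step of every deployment; if the new code misbehaves, `restore()` with the path it returned gets you back where you started.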
But one of the best things you can do is to start writing documentation. I like to write my documentation assuming it will be my replacement reading it, and so I try to include everything. Justify every unusual implementation detail, and explain why each task was done the way it was. List bugs, and any code you had to write to work around them. The best part of documenting your project will be that as you work through it, you'll find things that no longer make sense and make them better.
If you work on a single server, install RCS. You only need to learn ci and co to start.
If you work on many boxes, you need a network-friendly tool.
The obvious ones are Git and Mercurial (CVS too).
Simple cp works too.
More important may be version tags and date/time hints in the scripts.
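Version and date hints only help if something can read them back. The sketch below assumes a hypothetical header convention (`# Version:` and `# Date:` comment lines near the top of each Bash or Python script) and extracts it, for example to compare what is installed on two machines.

```python
import re

# Hypothetical header convention near the top of each script:
#   # Version: 1.4
#   # Date: 2013-03-01 14:20
HEADER = re.compile(r"^#[ \t]*(Version|Date):[ \t]*(.+?)[ \t]*$", re.MULTILINE)


def read_header(text):
    """Return e.g. {'Version': '1.4', 'Date': '2013-03-01 14:20'} (empty if no header)."""
    return {key: value for key, value in HEADER.findall(text)}
```

Because both Bash and Python use `#` for comments, the same reader works for either kind of script.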
A great deal of the version wrangling you are facing is best done with a tool like Git.
The bigger problem (development discipline) is much harder to fix.
You want something to track changes, deploy changes, and test software. Bazaar will track your changes.
Chef is open-source infrastructure management. The central server maintains a searchable database of your nodes and all of the scripts (recipes) that run on them. The nodes query this database and run the scripts that they are supposed to. This is similar to your environment now. You can also check your chef-repo into SCM. This allows you to mess around with production and only commit back into SCM when you are fairly certain that it works.
Jenkins has a similar setup but each node is ostensibly there to build and test software although we have used it for deployment and integration testing.
Chef & Jenkins can definitely help in deploying code and maintaining your infrastructure but you will need to take responsibility for testing your code somewhere along the process whether it be with on-commit with Jenkins or on deploy with unit or other tests. I definitely feel the value after investing time to learn these powerful tools.
I keep it in a Mercurial repository and use symlinks into the repository to deploy it. I also make free use of Mercurial's subrepo feature for tools that others wrote that are not yet found as packages on the Linux distributions I use.
Yes, there is still a testing issue. For most of this code it's not a big deal because I'm the only user. I test it as I write it with a few simple hand tests and then it's good to go.
If I were doing this for something where the code mattered to other people I would just add unit tests for various subsections as made sense. I would also start sectioning off the tools and making them into separate repositories of their own. I'd also make much sparer use of the sub-repo feature and instead have deployment scripts that handled making sure the correct version was in place.
You still need test environments though for integration testing. And as the code grows, ad-hoc test environments stop being very practical. You should dedicate a VM or two (or even a machine or two) to replicating miniature versions of the real-world setups the code is expected to work in.
Lastly, it's never too early to start using source control on your code. 98% of my code is under source control, even most stuff I think is 'throwaway' or ad-hoc.
I would also strongly recommend Mercurial (or git (if you must)) over Bazaar. It's faster, and the mental model those two tools encourage is a much more accurate representation of what they're really doing. Bazaar lets you pretend that branching is still a big deal and takes some effort to resolve. It lets you continue to think in the model of centralized source systems even though it's not. You will be doing yourself a huge favor in productivity (yes, even for a single developer) to not use it and go for something that doesn't let you pretend anymore. Of those tools, I think Mercurial has a far more carefully thought out and better set of commands and options than git does.
and now a versioning system would mean going through proper deployment/rollback in order to get real feedback.
not true. using a versioning system does not necessitate testing. just to be clear, testing is always necessary, and not enforced by any versioning system. you can use svn or git or cvs to keep versions of your files so when you do your testing on the production environment (shame on you) you won't have a stack of the same files with extensions like .bak, .bak.bak, .old, .delete, .undo, etc. sitting on your server.
test because it's the right thing, the proper thing, to do. not because you think some tech you choose to use is forcing you. you should be forcing you.
Keep the files per project in whatever production directory you want and start a Git repository in it. Version numbers are irrelevant and only a nuisance; you now have every version of every file, with any (commit) comment you want! Then add a scripted backup (such as FTP) to a central location, of course, to recover from disasters if your production files get damaged.
Add version number if you start rolling out to multiple sites.
It's possible to exchange files between git repositories, or merge back changes made in another production system.
Forget that you're a lone programmer. Set up a proper environment anyway.
This is going to seem like hard work, but once you've done the upfront effort, it will pay dividends.
Do *everything* that you'd do if you were a team. There are plenty of books / web sites on the subject.
Pick a version control system -- since you're starting from scratch, Git or Mercurial. Get your code into it.
Pick a continuous build system -- Jenkins is popular and free.
Write one unit test, and make Jenkins run it as part of the build process.
Decide on some sort of repository for your build artefacts.
Establish an integration testing box, and have your CI system deploy to that every build. Ideally use something like Puppet for this, and also use Puppet on your production machines.
Write one integration test, and make Jenkins run it after deployment.
You can dedicate a server to all of this, several servers, run it all on your laptop or in VMs; it really doesn't matter. But think ahead so that you can move it to dedicated machines later if you need to.
Lots of work, but now you have a nice, confidence inspiring build / code management system.
Once that's going, you can decide how to fix your lack of tests. One approach is to take a few weeks just writing tests. Another is to write tests as the need arises -- for new code as you write it; to demonstrate bugs before you fix them. Or somewhere in between.
Python isn't my area, but there is probably an ecosystem of pythonesque tools for a lot of this stuff. pyUnit, code coverage tools, etc.
You will have problems unit testing, since you won't have designed the code for testability. The choice is to live with fewer tests than might otherwise be possible, or to refactor your design into something more unit-testable. (IoC, inversion of control, is unit testing's best friend.)
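Here is what "inversion of control" can mean for a typical glue script: instead of calling `subprocess` directly, the function receives the thing that runs commands, so a test can hand in a fake. A minimal sketch; the ping-based reachability check and its Linux-style `-c 1` flag are illustrative assumptions, not code from the submitter.

```python
import subprocess


def run_command(args):
    """Default runner: execute a command for real and return its stdout."""
    return subprocess.run(args, capture_output=True, text=True, check=True).stdout


def reachable_hosts(hosts, runner=run_command):
    """Return the hosts that answer a single ping.

    The runner is passed in rather than hard-coded (inversion of control),
    so a test can hand in a fake and never touch the network."""
    alive = []
    for host in hosts:
        try:
            runner(["ping", "-c", "1", host])
        except subprocess.CalledProcessError:
            continue  # non-zero exit: treat the host as unreachable
        alive.append(host)
    return alive
```

Production code calls `reachable_hosts(hosts)` unchanged; only tests pass `runner=...`.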
Just get one of the inexpensive commercial subs for GitHub. This solves all sorts of issues. Remote backup, robust version system, issue tracking etc.
I almost posted this exact question about 9 months ago. I ended up using git + GitHub for version control. There are enough comments posted already about version control so all I'll say is that even working by myself, using git for real branching is why I still have a job.
Python's unittest package is really great, at least for the small (10k lines of code) project I'm working on. Using no third-party code at all, you can set up a testing package for yourself so all you have to do is type "python test" before you commit to your repo (make a Python package called "test" and put a "__main__.py" module in it that calls "unittest.main()"). Python is great for glue code, so you can write Python unit tests for your Bash code as well as your Python code, and make it all automatic using unittest's test discovery (name your test files "test_*.py").
Perhaps unit tests aren't as good as having a whole box dedicated to testing the entire environment, but it's *dead simple* to maintain, and its simplicity will encourage you to test before every commit (or at least before you merge your topic branches back into master).
Take it out back and shoot it. If it's rabid, there is no cure.
What's with the Bazaar hate, people? The summary says he's already migrating to Bazaar, so there's no reason to say 'switch to my XXX version control system because I use it, thus it's better for you too'. All the free distributed version control systems (Git, Mercurial, Bazaar, etc.) have the same feature sets (with slightly different names), and none of them have to be used in a distributed fashion. Like RCS, CVS, SVN, and everything else, you'll mainly be using two commands: check in and check out. The other main commands you'll use are tag, revert, and diff. All the control systems are equally easy to set up and maintain.
Neither RCS nor CVS tracks a change across multiple files as a single unit (a feature called atomic commits). They should be dropped from consideration for that alone.
I would recommend Bazaar for a beginner, as it has excellent step-by-step tutorials for many use cases and has good GUI tools (which reduce the learning curve until one's ready for custom command-line scripts). SVN is fine, but a distributed control system would provide more flexibility as you add more people.
Personally, I try to stay away from Git and Mercurial because of their communities. They tend to have a lot of people saying 'Git/Mercurial is the best version control system ever because it can do [some feature] and the [others] can't.' They never even try to explain how their distributed system is better than any other distributed system. Those other ones simply don't exist in their warped fanboy world. The less popular products tend to have a truer view of the world (and thus, in my view, are better overall).
Ignore any claims of speed. 20,000 lines of code isn't a large project. It might be large for you personally, but in the field of software development it is a largish small project. Large projects have millions of lines. While you're under a million lines and don't have a lot of binary files under version control (the commercial systems handle binary files much better than all the OSS version control software), ignore speed. The minor differences won't add up to the time you spend figuring out which ones are faster under your work conditions.
One last point, you don't need a dedicated computer to host your source code. You're not using 486s are you?
Use Jenkins for deployment. You can automate the entire process. For example, imagine automatically deploying after checking in a revision that contains the word "***DEPLOY***" in the commit comment.
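The commit-message trigger could be a small gate script that a Jenkins job runs before its deploy step. A minimal sketch, assuming a Git working copy (for SVN, `svn log -r HEAD` would be the equivalent source of the message); the marker string is the one from the comment above, and the function names are made up for illustration.

```python
import subprocess

DEPLOY_MARKER = "***DEPLOY***"  # marker text from the comment above


def should_deploy(commit_message, marker=DEPLOY_MARKER):
    """True if the commit message explicitly asks for a deployment."""
    return marker in commit_message


def last_commit_message(repo="."):
    """Full message (subject and body) of the most recent commit in a Git working copy."""
    result = subprocess.run(["git", "-C", repo, "log", "-1", "--format=%B"],
                            capture_output=True, text=True, check=True)
    return result.stdout
```

The job would check `should_deploy(last_commit_message())` and skip the deployment step when it is False, so ordinary commits still get built and tested without touching production.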
You need to fire this cowboy. He doesn't think he needs to test his scripts.
I know he seems irreplaceable. That should be a big red flag.
Proclaim yourself the most interesting coder, ThinkGeek style:
I don't often test my code, but when I do, I do it in production.
First, please try to test your code. Unless you can formally prove it's right, testing is the only way to get most of the bugs out of the code. Working test-first can substantially improve code quality, in function and in form, as you can refactor safely with tests in place. Try writing some mock-ups for things outside your own code.
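A mock-up for "things outside your own code" might look like this, using the standard library's `unittest.mock`. The `session` object with a `send_command(str) -> str` method is a hypothetical interface (in production it might wrap an SSH or Telnet connection to a Cisco box); the sample output lines follow the usual IOS "show interface" status-line format.

```python
import unittest
from unittest import mock


def interface_is_up(session, name):
    """Ask the device for 'show interface <name>' and check the first status line.

    `session` is anything with a send_command(str) -> str method (hypothetical)."""
    output = session.send_command(f"show interface {name}")
    first_line = output.splitlines()[0] if output else ""
    return "line protocol is up" in first_line


class InterfaceTest(unittest.TestCase):
    def test_up(self):
        session = mock.Mock()  # stands in for the real router connection
        session.send_command.return_value = (
            "GigabitEthernet0/1 is up, line protocol is up\n  Hardware is ...")
        self.assertTrue(interface_is_up(session, "Gi0/1"))
        session.send_command.assert_called_once_with("show interface Gi0/1")

    def test_down(self):
        session = mock.Mock()
        session.send_command.return_value = (
            "GigabitEthernet0/1 is down, line protocol is down")
        self.assertFalse(interface_is_up(session, "Gi0/1"))
```

The tests run in milliseconds with no router attached, and the mock also verifies that the exact command string was sent.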
I would choose a distributed versioning system. Not so much because it's distributed, but because the best-known ones (Git, Mercurial, Bazaar) behave much better than SVN. The merging algorithms are better, and checking out/in with SVN is really, really slow with many small files, as it transfers one file after another. Bonus point: working on a local copy is fast, as no network is going to slow you down.
you are having too much fun dabbling and playing
If I'm the people who run the company, I start firing people. If I'm the developer, I run like hell before anybody realizes what a complete mess I've made.
No versioning, no test environment, live changes in production ... these are warning signs of something which has been cobbled together, and which continues working by sheer dumb luck.
I had a developer once who edited a live production environment without telling anybody and broke it even worse -- he very quickly found himself with no access to the machines and being told that we no longer trusted him with a production environment.
Having worked in highly regulated industries where the stakes are really high, I've had it drilled into me that you simply have no room whatsoever to be doing this kind of thing that ad hoc.
Glad you're starting to use something. But the risk to your employer of all of your stuff tanking and becoming something you can't recover is just too great. From the sounds of it, if you get abducted by aliens or hit by a bus, your company would come to a screeching halt.
If you use Debian, you can set up your own repository with reprepro. It's really easy to deploy stuff using apt.
[...]the test environment would have to be a clone of the production environments. Good luck with that with the described environment![...]
There is stuff like Puppet (for declaratively deploying "services") and Vagrant to provision Virtualbox guests.
Downsides:
I personally keep all my admin config files, scripts, etc under RCS control. I want per file granularity of comments describing changes. git, mercurial & company solve a different problem than what you have doing system administration.
I'm working on a project right now that will probably use RCS for tracking individual files and Mercurial for tracking the project at the release level. I use RCS a lot during development: write some code, test it, check it in, write some more, test it, check it in. It saves time if I find something doesn't work as expected. For one code I currently have 51 versions saved, with comments describing the changes and the reasons for making them. It's scientific research code, so there are lots of experiments with different methods of doing things.
I commit everything to SVN, then use Jenkins to manage updates. Once you create the Jenkins job, all you have to do in the future is run it, and you can string jobs together so that if a change needs to be pushed to a number of servers, it is still one click.
My personal programming hero, D. Richard Hipp, works with a very small team on SQLite (which you may have heard of). He uses his own, home-grown SCM called fossil. It probably doesn't scale to a zillion contributors but, like all of Hipp's work that I'm aware of, it's super clean and easy to use. Sounds pretty great for your use case.
And, as other people on this thread have already said: your habit of throwing stuff into production without testing it is similar to playing Russian Roulette with your company. Stop that. Stop that right now.
Because I never, ever want to rely on anything you build this way. You are headed for a disaster, unless you 1) set up a test environment, and 2) use a revision control system.
Really, anything less than that is just a complete waste of everyone's time.
You won't need source control until you start using it and then you'll wonder how you ever lived without it. Then you'll start making wild changes because, hey, you have source control now so you can always roll back. This quickly leads to needing a testing environment.
dude don't worry; within a few weeks you'll be able to deal with all the spaghetti nonsense, since you'll eventually just learn the crazy, non-standard, and ridiculous code anyway...
and hey, once you're fired or quit, it won't be your problem anyway. So, short answer, there's NEVER a good time/reason to do tests unless required by regulations.
Bitbucket supports both Git and Mercurial and has free accounts for unlimited private repositories. In addition to version control you get issue tracking, wikis etc.
With Mercurial you can also use the -r564 syntax, and as with Git you don't need a repository server running anywhere (whether on the same machine or a distant server). So if your only objection to Git is "I don't want large hex strings", then use Mercurial, not Subversion. Really.
As far as "most git advantages don't really apply to a single developer" is concerned: once you get used to high-speed versioning, full local history, rollback, topic branching and merging, trust me, you never want to go back to SVN, even when you're developing alone. Never.
Check in all major changes individually. This allows for nice rollback and for later analysis if something goes wrong. Also, if you fuck up something while editing you can always roll back. SVN is quite easy after some usage experience and it can be scripted. Don't use the graphical B.S. tools for SVN. Use proper commit comments.
Also, Check in all important documents (e.g. important configurations, network topology plans, setup procedures etc). Protect the repository well, as it will probably contain passwords (I know that is not optimal, but the God Of the Mighty Dollar dictates that).
..he should continue that risky practice. The bozo will only ask him "why things have slowed down", if he does some proper testing. Apparently, it is not (yet) necessary.
There are situations where you can do this - especially if you have excellent people who can quickly patch the problems up. And if you don't operate these commercialware clusterfucks from HP to manage your network.
This is probably a small telecoms operation. Not a billion-dollar bank. I once heard that you need a VP to sign off the smallest change on an Exxon mainframe. That is the opposite end of insanity.
Nobody in the commercial world ever proves code correct. They only test it to various degrees. Same with hardware. And SVN will be the perfect match for this guy. After all, a one-man team will do all the checkins and merge ops.
Maybe this is a very competent guy with zero communication and bullshitting overhead? Maybe he is a true genius and most of his decisions are actually sound? Maybe his architecture makes sense? Maybe he does not need the corporate bullshit bureaucracy?
One guy of the caliber of a Stallman or a Torvalds will probably do much better than a team of Visual SourceSafe users, even if that guy has no source control system.
a test db box for testing your SQL scripts [...] can have the exact same software, OS and patches, and with equivalent database configuration and schemas, but on lower-cost hardware and with a fraction of the data.
I too maintain a test environment, but I've run into two problems with creating a useful "fraction of the data". First, testing code on a fraction has led to misconceptions about how it scales to a far larger data set. Second, how would you create a substantial sample that is representative of the real data if the real data contains people's shipping addresses or other PII?
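One common answer to the PII question, sketched here as an assumption rather than anything described in this thread, is to pseudonymize the sensitive columns before copying production rows into the test box. The field names, the `SECRET` key and the `pseudonymize` helper are hypothetical. A keyed hash (HMAC) maps the same real value to the same token every time, so duplicates and joins on those columns still behave roughly like production; it does not preserve value formats or lengths, which may or may not matter for your tests.

```python
import hashlib
import hmac

# Hypothetical key: keep it off the test box so tokens can't be brute-forced there.
SECRET = b"replace-me"

# Hypothetical PII column names; adjust to the real schema.
PII_FIELDS = ("name", "street", "city", "email")


def pseudonymize(row, secret=SECRET, fields=PII_FIELDS):
    """Return a copy of `row` with PII fields replaced by stable tokens."""
    clean = dict(row)  # never mutate the production row
    for field in fields:
        if clean.get(field) is not None:
            digest = hmac.new(secret, str(clean[field]).encode("utf-8"), hashlib.sha256)
            # Same input value -> same token, so joins and duplicates survive.
            clean[field] = "%s-%s" % (field, digest.hexdigest()[:12])
    return clean


if __name__ == "__main__":
    print(pseudonymize({"id": 7, "name": "Ann Smith", "city": "Cluj", "total": 12.5}))
```

Non-PII columns (ids, totals, timestamps) pass through unchanged, so the row count and value distributions of the sample stay realistic.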
...also kill that noisy crow. He could have used an air rifle (SVN) to scare it off, but if he can use a nuke, it will be more fun. And it will also take care of all the ants.
ggdG
I prefer bzr, coming from SVN. It actually intrinsically supports various workflow models, including a very SVN-like central model.
When our corporate dev team of ~20 tried git, it was an unmitigated disaster. Since we also moved to a "branch per feature" model at the same time, this only exacerbated the problems.
May you never experience the "joys" of your devs hoarding commits locally because the DVCS allows this, thus reducing collaboration.
Or having the complex/confusing git push system cause a dev to rewrite history on the central git server and effectively "lose" another dev's previous commits by unhooking them from the branch history. (Why is the ability to rewrite commit history and orphan commits in branches on the central server considered a feature in git? God only knows...)
Or having your devs spend more time with git's two-phase commit crap and so forth when all they wanted was to mimic the effects of a simple "svn commit" operation (hahaha, silly devs, committing should be *complicated*!), leaving them uncertain whether their crap is actually committed, committed locally, or actually pushed to the central server.
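For the history-rewriting complaint specifically, git does let the central repository refuse such pushes: the built-in `receive.denyNonFastForwards` setting does this, and a `pre-receive` hook can do the same check explicitly. The Python below is a minimal sketch of such a hook, assuming it runs on the central server; the function names are hypothetical, and ancestry is checked with `git merge-base --is-ancestor`, which exits with 0 when the old tip is an ancestor of the new one.

```python
import subprocess
import sys


def git_is_ancestor(old, new):
    # `git merge-base --is-ancestor A B` exits with 0 when A is an ancestor of B.
    return subprocess.call(["git", "merge-base", "--is-ancestor", old, new]) == 0


def rejected_updates(lines, is_ancestor=git_is_ancestor):
    """Return branch refs whose update would drop existing commits.

    Each input line is "<old-id> <new-id> <refname>", which is what git
    feeds a pre-receive hook on stdin.
    """
    rejected = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3 or not parts[2].startswith("refs/heads/"):
            continue  # only branches are checked in this sketch
        old, new, ref = parts
        if set(old) == {"0"} or set(new) == {"0"}:
            continue  # all-zero id: branch created or deleted (deletion is let through here)
        if not is_ancestor(old, new):
            rejected.append(ref)
    return rejected


def main():
    bad = rejected_updates(sys.stdin)
    for ref in bad:
        print("rejected non-fast-forward push to " + ref, file=sys.stderr)
    return 1 if bad else 0  # any non-zero exit makes git refuse the whole push
```

An executable `hooks/pre-receive` file on the central repository would import this and end with `sys.exit(main())`.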
BTW, hope you think monotonically increasing revision numbers for commits in SVN are worthless, because git has no way to do anything like that. BZR has an approximation of revision numbers, on a per branch basis, though. Also, hope you have no use for having an empty directory checked into your repo, because git doesn't support that either. Naturally, you probably have no use for having renames be a supported operation in your VCS, because git doesn't do that either (it uses heuristics instead, though bzr *does* support true file rename capability/tracking).
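On the empty-directory point, the usual workaround is a convention, not a git feature: drop an empty placeholder file (often named `.gitkeep`) into each directory that must exist. A minimal Python sketch, assuming you want every empty directory in a working tree kept; `add_placeholders` is a hypothetical helper name.

```python
import os


def add_placeholders(root, name=".gitkeep"):
    """Create an empty placeholder file in every empty directory under root.

    Returns the list of placeholder paths that were created.
    """
    created = []
    for dirpath, dirnames, filenames in os.walk(root):
        empty = not dirnames and not filenames
        if ".git" in dirnames:
            dirnames.remove(".git")  # prune: never descend into git's own metadata
        if empty:
            path = os.path.join(dirpath, name)
            open(path, "a").close()  # create (or touch) the empty file
            created.append(path)
    return created


if __name__ == "__main__":
    for path in add_placeholders("."):
        print("created", path)
```

Running it a second time creates nothing, since every formerly empty directory now holds the placeholder.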
Of course, the list goes on and on. Eventually, the team decided to go back to using SVN. Unfortunately, while there is an SVN-to-git importer, there's no way back. We decided it was better to write off years' worth of commit history to get back to a useful VCS, so we merged all our git branches, stomped them flat, exported, and committed a rev 0 into a brand new SVN repo.
Thus ended that debacle.
Now, my view is extremely unpopular around the internet. Git zealots abound who are loath to acknowledge that any other VCS could be superior to git in any way, under any circumstance. They just don't seem to understand that not every project is the Linux kernel, which requires coordinating thousands of devs around the world in a distributed fashion.
That said, branching & file renaming blows in SVN. BZR does all those things very well, and "bzr commit" actually works like an SVN user would expect, etc, etc. I suggest you give it a try.
The scripts are irrelevant if not run in the real environment,
Well, that's an oxymoron
It does not mean what you think it means.
Cold fire is an oxymoron, or dry water, or dumb genius. An oxymoron is an inherently contradictory combination of terms.
20K lines? Tiny!
(1) You have job security. (2) You have delivered what is needed so far. Now what happens? Your organisation could carry on as-is. It has worked in the past, but you might get eaten by a rogue killer whale. Oho! A /real/ risk of you going AWOL for (100 genuine reasons). So how do you suggest this is handled? In-line comments? Development wiki? etc.
IMHO, I would try to list the things that could go wrong... then put them into categories such that responsibility for each category can be handed to whoever.
Two days later, ask whoever is supposed to be 'in charge of foo' how they intend to deal with a certain detail. (This is Health and Safety... gone mad.) If you ask people who are happily fumbling about their own business, ask to see the legal details.
I don't want to pick nits here because Luis is giving out a lot of very valid information and observations here. Just want to take it one step further.
Mirror the production environment DB with an identical amount of data. The data doesn't have to match row-for-row, but the test environment DB should have the same scale as the production DB. Here's why: if you want to run an alter table or alter index command, you want a sense of how much of an impact that is going to have on the database before running the operation in production. If you only have 10k rows in the table in test and run the command, it'll complete pretty quickly. Run that against a 2-million-row table in production and, depending on the vendor, it could mean a table lock that takes down your production environment until it finishes (could be hours). Been there. Done that. It ain't pretty.
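To see the scale effect before touching production, one option is to fill a test table with synthetic rows at production scale and time the schema change there. The Python sketch below uses SQLite in memory purely as a stand-in; the `orders` table, its columns and the row counts are hypothetical, and SQLite's locking behaviour is not the same as a production server's, so treat it as an illustration of cost growing with row count, not a benchmark of your vendor.

```python
import random
import sqlite3
import string
import time


def build_test_table(conn, rows, seed=0):
    """Fill a hypothetical `orders` table with `rows` rows of made-up data."""
    rng = random.Random(seed)
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
    conn.executemany(
        "INSERT INTO orders (customer, total) VALUES (?, ?)",
        (("".join(rng.choices(string.ascii_lowercase, k=8)), rng.uniform(1, 500))
         for _ in range(rows)),
    )
    conn.commit()


def time_index_build(rows):
    """Return (row count, seconds spent building an index) for a table of `rows` rows."""
    conn = sqlite3.connect(":memory:")
    build_test_table(conn, rows)
    start = time.perf_counter()
    conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")
    elapsed = time.perf_counter() - start
    count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
    conn.close()
    return count, elapsed


if __name__ == "__main__":
    # Replace these with your real test and production row counts.
    for n in (10_000, 200_000):
        count, secs = time_index_build(n)
        print("%9d rows: index built in %.3f s" % (count, secs))
```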
Seth
$5/month hosted VPS on Linux = awesome!
I've done a little bit of environment taming in my day.
Everybody's already told you the "right" things to do. They're all right. Thing is, you need to get there somehow, and you're looking for a path from here to there. At least, I think that's what you're asking.
You already have bazaar. Good tool. Don't worry about bzr versus cvs versus hg right now. You picked something. Run with it.
I suggest a quick shell script that replaces your editor with "edit; check-in; offer to push". Create another quick script (call it "oops") that asks you whether you need a local or global revert, then issues the relevant commands. Push those scripts to all machines (maybe as their own bzr project). Now, you basically have the same process with a much larger safety net. This isn't software that's being released to the world. Don't worry about version numbers or branching. If you EVER have to change a file, throw it into version control.
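A rough take on those two helpers, written in Python rather than shell and under several assumptions: the wrapper and its function names are hypothetical, `bzr add` is assumed to be harmless for files that are already versioned, `bzr push` is assumed to have a remembered push location, and "global" is interpreted here as "restore the file from the revision before the tip (`-r -2`), commit that, and push it". Adjust the commands to whatever your rollback actually needs.

```python
import os
import shlex
import subprocess
import sys


def checkin_commands(path, message):
    # Version the file if it isn't already, then commit just this file.
    return [["bzr", "add", path], ["bzr", "commit", "-m", message, path]]


def oops_commands(path, scope):
    if scope == "local":
        # Throw away uncommitted edits to this file on this machine only.
        return [["bzr", "revert", path]]
    if scope == "global":
        # Restore the file as of the revision before the tip, record that as
        # a new commit, and push it so the other machines can pick it up.
        return [
            ["bzr", "revert", "-r", "-2", path],
            ["bzr", "commit", "-m", "oops: roll back " + path, path],
            ["bzr", "push"],
        ]
    raise ValueError("scope must be 'local' or 'global'")


def run_all(commands):
    # Stop at the first failing command and report it.
    for cmd in commands:
        if subprocess.call(cmd) != 0:
            print("failed:", " ".join(cmd), file=sys.stderr)
            return 1
    return 0


def edit(path):
    editor = shlex.split(os.environ.get("EDITOR", "vi"))
    subprocess.call(editor + [path])
    message = input("Commit message: ").strip() or "edit " + path
    status = run_all(checkin_commands(path, message))
    if status == 0 and input("Push now? [y/N] ").strip().lower() == "y":
        status = run_all([["bzr", "push"]])
    return status


def oops(path):
    scope = input("Revert [local] edits or the last [global] commit? ").strip()
    return run_all(oops_commands(path, scope))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: helpers.py edit FILE   or   helpers.py oops FILE
    action, target = sys.argv[1], sys.argv[2]
    sys.exit(edit(target) if action == "edit" else oops(target))
```

Keeping the command lists in small functions makes it easy to print them as a dry run before trusting the script on a production box.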
Now, you can start to synchronize your scripts. With an environment that wild, you probably have a script that's almost-but-not-quite the same running on a bunch of machines. Maybe there's a hard-coded hostname or directory or something. Come up with a version that's more universal (Use big "if hostname = foo" blocks if you have to), and get that new universal script added to a project and pushed to all machines. Once they're all using it, you can slowly clean it up.
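One way to keep the "big if hostname = foo blocks" manageable is to pull the per-host differences into a single table keyed by hostname, with everything else shared. A minimal Python sketch; the hostnames, setting names and values are hypothetical placeholders for whatever actually differs between your copies.

```python
import socket

# Hypothetical settings shared by every copy of the script.
DEFAULTS = {"backup_dir": "/var/backups", "router": "core-rtr1"}

# Hypothetical per-host differences found while merging near-identical copies.
OVERRIDES = {
    "foo": {"backup_dir": "/srv/backups"},
    "bar": {"router": "edge-rtr2"},
}


def settings_for(hostname):
    """Return the defaults with any overrides for this host applied."""
    settings = dict(DEFAULTS)  # copy so DEFAULTS itself is never modified
    # Match on the short name so "foo.example.net" picks up the "foo" entry.
    settings.update(OVERRIDES.get(hostname.split(".")[0], {}))
    return settings


if __name__ == "__main__":
    print(settings_for(socket.gethostname()))
```

As the copies converge, entries disappear from `OVERRIDES`, which is a fairly direct measure of the clean-up.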
Cool. You have unified scripts. Now let's talk about those configuration files. I'll bet that they're also 99% identical across all of your machines. Get them all into the same project (call them config-machine1, config-machine2, etc.). Get them as identical as possible. Now, think about how you might handle differences. For a quick fix, I like the "magic comment" ("## BEGIN foo.sh MANAGED SECTION" and "## END foo.sh SECTION") and a perl script that looks for those strings. m4 also works well, and isn't too hard to learn.
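The "magic comment" approach can be sketched in a few lines. This is a Python stand-in for the Perl script mentioned above, assuming the marker lines look exactly like the ones quoted (`## BEGIN foo.sh MANAGED SECTION` / `## END foo.sh SECTION`), each on its own line; the function names are hypothetical.

```python
def _find_marker(lines, marker, start=0):
    # Markers must sit on their own line; surrounding whitespace is ignored.
    for i in range(start, len(lines)):
        if lines[i].strip() == marker:
            return i
    raise ValueError("marker not found: " + marker)


def replace_managed_section(text, name, body):
    """Swap out everything between the BEGIN/END markers for `name`.

    Lines outside the markers (the machine-specific part) are left untouched.
    """
    lines = text.splitlines()
    begin = _find_marker(lines, "## BEGIN %s MANAGED SECTION" % name)
    end = _find_marker(lines, "## END %s SECTION" % name, begin + 1)
    new_lines = lines[: begin + 1] + body.splitlines() + lines[end:]
    return "\n".join(new_lines) + "\n"
```

A deploy step would read each machine's config, call `replace_managed_section(text, "foo.sh", shared_block)` with the common block from version control, and write the result back, leaving the hand-maintained lines alone.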
"Okay, smart guy. I have all of the common config scripts, but I have a bunch of single-purpose machines and scripts, too!" Yup. Awesome. Get them into version control, too. You never know when that machine's going to suddenly die or your boss will break out in a fit of generosity and get you that second server for load balancing. (Hey! It could happen.) When it does, setup will be a lot easier if you have all of the config files in a project.
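For those one-off machines, a small snapshot step can copy each config file into a per-machine directory inside the working tree, keeping its original path, so it can then be added and committed. A minimal Python sketch; the `config-<hostname>` layout follows the naming suggested above, while `snapshot_configs` and the example paths are hypothetical.

```python
import os
import shutil


def snapshot_configs(paths, repo_dir, hostname):
    """Copy each config file into <repo_dir>/config-<hostname>/, keeping its path.

    Returns the destination paths so the caller can add and commit them.
    """
    dest_root = os.path.join(repo_dir, "config-" + hostname)
    copied = []
    for src in paths:
        rel = os.path.abspath(src).lstrip(os.sep)  # /etc/foo.conf -> etc/foo.conf
        dest = os.path.join(dest_root, rel)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        shutil.copy2(src, dest)  # copy2 also keeps timestamps and permission bits
        copied.append(dest)
    return copied


if __name__ == "__main__":
    import socket
    # Hypothetical list of files worth keeping for this box.
    wanted = [p for p in ("/etc/hosts", "/etc/resolv.conf") if os.path.exists(p)]
    for path in snapshot_configs(wanted, ".", socket.gethostname()):
        print("copied", path)
```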
At this point in the game, you'll be pretty comfortable with version control. You'll have been burned once or twice, and it'll have saved your butt a few times. You'll have some experience, and you'll be kicking yourself for the way that you first set it up. Now's the time to revisit those decisions. Is it time to split up some projects or roll some together? Maybe git or hg might make more sense. Maybe you hate your life and your coworkers so much that you want to go to Perforce, ClearCase, or some other commercial software. You'll have the experience to design it right.