Ask Slashdot: Selecting a Version Control System For an Inexperienced Team
An anonymous reader writes: I have been programming in Python for quite a while, but so far I have not used a version control system. For a new project, a lot more people (10-15) are expected to contribute to the code base, many of them have never written a single line of Python but C, LabVIEW or Java instead. This is a company decision that can be seen as a Python vs. LabVIEW comparison — if successful the company is willing to migrate all code to Python. The code will be mostly geared towards data acquisition and data analysis leading to reports. At the moment I have the feeling, that managing that data (=measurements + reports) might be done within the version control system since this would generate an audit trail on the fly. So far I have been trying to select a version control system, based on google I guess it should be git or mercurial. I get the feeling, that they are quite similar for basic things. I expect, that the differences will show up when more sophisticated topics/problems are addressed — so to pick one I would have to learn both — what are your suggestions? Read below for more specifics.
These are the requirements I can see so far:
- __Server_running_locally__ (as opposed to in the cloud) on windows (IT departments choice, non-negotiable)
- Good/easy to use Windows clients (IT departments choice / company policy, again non-negotiable)
- Use windows credentials (maybe, single sign on)
- Open source server/client (personal preference)
- Well established Project that will not disappear/ get unmaintained within a foreseeable future
- Do basic test on the code (Syntax errors, pytest/nose/or alike with coverage (of tests), check coding style)
- email notifications
- good documentation
- reasonable price for 5 — 10 users : free — 500€
Things that would be great ...
- web interface (like github) would be nice
- integration of bug tracking / bug reports
- possibility to do and print out a code review
- some kind of jupyter / ipython integration
Things I am not sure I will need but seem to be a good idea at the time of writing...
- Include other files/ file types for measurement data, documentation and user manuals (docx, xml, xlsx, gz, ...)
- When thinking about measurement data /reports it would be great to have digital signatures (--> FDA compliant). I know this is extremely hard, if this exists I would love it, if not I am fine. Somehow this feels like mixed document/version control, but I would love to have data + code + text = report at the same place to easily find implications of a bug — which data has to be re-evaluated and so on.
- __Server_running_locally__ (as opposed to in the cloud) on windows (IT departments choice, non-negotiable)
- Good/easy to use Windows clients (IT departments choice / company policy, again non-negotiable)
- Use windows credentials (maybe, single sign on)
- Open source server/client (personal preference)
- Well established Project that will not disappear/ get unmaintained within a foreseeable future
- Do basic test on the code (Syntax errors, pytest/nose/or alike with coverage (of tests), check coding style)
- email notifications
- good documentation
- reasonable price for 5 — 10 users : free — 500€
Things that would be great ...
- web interface (like github) would be nice
- integration of bug tracking / bug reports
- possibility to do and print out a code review
- some kind of jupyter / ipython integration
Things I am not sure I will need but seem to be a good idea at the time of writing...
- Include other files/ file types for measurement data, documentation and user manuals (docx, xml, xlsx, gz, ...)
- When thinking about measurement data /reports it would be great to have digital signatures (--> FDA compliant). I know this is extremely hard, if this exists I would love it, if not I am fine. Somehow this feels like mixed document/version control, but I would love to have data + code + text = report at the same place to easily find implications of a bug — which data has to be re-evaluated and so on.
As far as I can tell, you're describing the classic CVS or Subversion small team setup. You can run a server on the network (via Apache, or via SSH), run ViewCVS, set up checkin hooks, and give your clients a nice client like TortoiseCVS/TortoiseSVN built into Windows Explorer.
If you want integration with bug tracking tools, then have a look at Bugzilla and Bonsai.
All your users need to know about, is check in, and checkout, so the cognitive overhead is low.
It would take one engineer half a day to set all this stuff up on a spare machine, and you could try it out fairly quickly.
And best of all, this setup is gratis as well as Free. This has worked really nicely for me in both an academic and a commercial environment.
Oh God, stay the fuck away from SourceSafe.
SourceSafe is an absolutely terrible choice, since it is actively user-hostile, and has the alarming habit of eating your source code at the worst possible time. Rational Clearcase is almost as bad.
First we need to take a step back and figure out what you are actually doing. You have pulled up with a "software version control" bandwagon and everyone just jumped on without looking to see if it would take you where you wanted to go.
Are you wanting to keep track of the versions of your code or the reports generated by that code or the data that the code used to generate the reports? Each type of information is best suited for a different kind of versioning system. Are the reports generated only by the code or are they written by humans? Trying to use a code versioning system to keep track of modifications to reports or data is a loosing game. Don't make the mistake of thinking every problem is a nail just because you have a hammer.
I was with a small team of very experienced developers, and even for us going to git had a bunch of surprises. For me it's not so much the UI tools, it's understanding what's going on, and why git does what it does.
That's what I mean when I say there's no simple formula of "do these 3 commands to do this, those 2 commands to do that". You have to understand WHY the commands are doing what they are doing.
That's certainly a common view of Git, but after using it for the last few years, I think that a lot of the problems that beginners have with it are happening because of this assumption. That is, when a developer asks how to merge their code into the shared Git repo for the first time, the wise old Git gurus point them at a site that explains how Git works at the molecular level, called The Git Book. This is almost never helpful, because your average Joe C. Programmer doesn't have time in his schedule to read an entire book, and even if he reads it over the weekend instead of, you know, having a life, he just ends up with his head full of crazy circles-and-arrows diagrams, which, divorced from any concrete, hands-on practice, only serves to confuse the issue more.
What the inexperienced Gitsperson actually needs at that point is a short and to-the-point workflow that he can use to get his goddamn code in the goddamn repo, like (commands for illustration purposes only, I use a Fischer Price GUI): "git clone MyRepo; git switch master; git pull; git branch MyFeature; git switch MyFeature; [implement the code changes]; git commit; git push; git switch master; git pull; git merge MyFeature; [fix conflicts, resolve, commit again if necessary]; git push". And for the love of God, Newbie, please don't try to use "rebase", you'll just cripple our entire product at 5:30 pm on a Friday.
There's documentation of that kind out there, admittedly, but it's really hard to find among all the indistinguishable-from-autogenerated-prank-nonsense man pages and fifteen-part seminars on how the version hashing algorithm works.