Ask Slashdot: Selecting a Version Control System For an Inexperienced Team
An anonymous reader writes: I have been programming in Python for quite a while, but so far I have not used a version control system. For a new project, a lot more people (10-15) are expected to contribute to the code base; many of them have never written a single line of Python, only C, LabVIEW or Java. This is a company decision that can be seen as a Python vs. LabVIEW comparison: if it is successful, the company is willing to migrate all code to Python. The code will be mostly geared towards data acquisition and data analysis leading to reports. At the moment I have the feeling that managing that data (= measurements + reports) might be done within the version control system, since this would generate an audit trail on the fly. So far I have been trying to select a version control system; based on Google, I guess it should be Git or Mercurial. I get the feeling that they are quite similar for basic things. I expect that the differences will show up when more sophisticated topics/problems are addressed, so to pick one I would have to learn both. What are your suggestions? Read below for more specifics.
These are the requirements I can see so far:
- Server running locally (as opposed to in the cloud) on Windows (IT department's choice, non-negotiable)
- Good/easy-to-use Windows clients (IT department's choice / company policy, again non-negotiable)
- Use Windows credentials (maybe single sign-on)
- Open source server/client (personal preference)
- Well-established project that will not disappear or become unmaintained in the foreseeable future
- Run basic tests on the code (syntax errors, pytest/nose or similar with test coverage, coding-style checks)
- Email notifications
- Good documentation
- Reasonable price for 5-10 users: free to 500€
Things that would be great ...
- Web interface (like GitHub)
- Integration of bug tracking / bug reports
- Possibility to do and print out a code review
- Some kind of Jupyter / IPython integration
Things I am not sure I will need but seem to be a good idea at the time of writing...
- Include other files / file types for measurement data, documentation and user manuals (docx, xml, xlsx, gz, ...)
- When thinking about measurement data / reports, it would be great to have digital signatures (--> FDA compliant). I know this is extremely hard; if it exists I would love it, and if not I am fine. Somehow this feels like mixed document/version control, but I would love to have data + code + text = report in the same place, to easily find the implications of a bug: which data has to be re-evaluated and so on.
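The "syntax errors" part of the basic-tests requirement above can be sketched with nothing but the Python standard library. This is only a minimal illustration, not a replacement for pytest or a style checker; the function name `find_syntax_errors` and the idea of scanning a whole checkout directory are assumptions made for the example, not something taken from the question.

```python
import ast
from pathlib import Path


def find_syntax_errors(root):
    """Return (path, line, message) for each .py file under root that fails to parse.

    Hypothetical helper: it only parses the files, it never executes them.
    """
    errors = []
    for path in sorted(Path(root).rglob("*.py")):
        try:
            ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        except SyntaxError as exc:
            errors.append((str(path), exc.lineno, exc.msg))
    return errors


if __name__ == "__main__":
    # Scan the current directory and report every file that does not parse.
    for file_name, line, message in find_syntax_errors("."):
        print(f"{file_name}:{line}: {message}")
```

A check like this could be called from a hook or a build job before the real test suite runs; where exactly it would be wired in depends on which server ends up being chosen.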
Oh God, stay the fuck away from SourceSafe.
SourceSafe is an absolutely terrible choice, since it is actively user-hostile, and has the alarming habit of eating your source code at the worst possible time. Rational Clearcase is almost as bad.
It is perfectly possible to branch in SVN and manage it. Git is better for branching and developing in complex and large team environments. But this is not the case here. They probably have max 3 guys maintaining and max 3 guys on a development branch. SVN is more than capable of handling that.
I'll start by answering your question. Use GIT. It's the most widely supported system at this time and it works really well.
Next let me be a typical slashdot asshole who makes abrasive comments that may be well intended but will come off as being a dick. I'll explain that I already see endless problems coming from this.
If you're working with a team of 10-15 developers who all lack experience with version control, you have a major problem with out-of-date programmers and you're throwing them into a hell called Python. If you generally accomplish projects using C and LabView, the developers you have more than likely lack a modern development skill set and coding in a language like Python will produce some of the worst code ever written. If C is like shooting yourself in the leg and C++ is like blowing the whole damned leg off, Python is like dropping a nuke. You will have an endless supply of options for writing terribly bad code in the worst ways possible. The only redeeming feature will be it will have nice uniform spacing.
I would highly recommend doing what always works best: hire a Python developer with good Git skills who can lay most of the foundation of the project and create a uniform set of coding standards, then bring the other developers on 3 at a time and perform constant code review. Focus heavily on test-driven development and use a system like Scrum for lifecycle management. If you want to teach old dogs new tricks, don't just throw them in the fire and tell them to figure it out. The programming paradigms are so drastically different between your old method and the new one that, without some sort of leader with experience, it will turn out to be a disaster and a jungle of crap code. I personally avoid Python projects not because the language is bad, but because they tend to be like this.
You should of course know by now that if you are traditionally a LabView shop, you're going to sacrifice a massive number of really important features to save a buck. Python has great support for multi-threading, but it's not an awesome environment for event-driven programming like LabView is. You can of course accomplish all the same things, but even with the thousands of toolkits/libraries out there, you'll have to write the entire underlying architecture yourselves and you'll lose almost all the visualization you've come to depend on.
I was with a small team of very experienced developers, and even for us going to git had a bunch of surprises. For me it's not so much the UI tools, it's understanding what's going on, and why git does what it does.
That's what I mean when I say there's no simple formula of "do these 3 commands to do this, those 2 commands to do that". You have to understand WHY the commands are doing what they are doing.
That's certainly a common view of Git, but after using it for the last few years, I think that a lot of the problems that beginners have with it are happening because of this assumption. That is, when a developer asks how to merge their code into the shared Git repo for the first time, the wise old Git gurus point them at a site that explains how Git works at the molecular level, called The Git Book. This is almost never helpful, because your average Joe C. Programmer doesn't have time in his schedule to read an entire book, and even if he reads it over the weekend instead of, you know, having a life, he just ends up with his head full of crazy circles-and-arrows diagrams, which, divorced from any concrete, hands-on practice, only serves to confuse the issue more.
What the inexperienced Gitsperson actually needs at that point is a short and to-the-point workflow that he can use to get his goddamn code in the goddamn repo, like (commands for illustration purposes only, I use a Fisher-Price GUI): "git clone MyRepo; git switch master; git pull; git branch MyFeature; git switch MyFeature; [implement the code changes]; git commit; git push; git switch master; git pull; git merge MyFeature; [fix conflicts, resolve, commit again if necessary]; git push". And for the love of God, Newbie, please don't try to use "rebase", you'll just cripple our entire product at 5:30 pm on a Friday.
There's documentation of that kind out there, admittedly, but it's really hard to find among all the indistinguishable-from-autogenerated-prank-nonsense man pages and fifteen-part seminars on how the version hashing algorithm works.
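Since the project is Python anyway, the short workflow quoted above could be kept as a small cheat-sheet script. What follows is a rough sketch under some assumptions: the repository is already cloned, the shared branch is called "master" as in the comment, and the remote is named "origin". The function names `feature_workflow` and `run` and the commit message are made up for illustration. It also uses `git switch -c` and `git push -u origin <branch>` in place of the separate branch/switch and bare push steps, because a brand-new branch normally has no upstream to push to yet.

```python
import shlex
import subprocess


def feature_workflow(feature, main="master"):
    """Return the git commands for one feature branch, in order.

    Hypothetical helper; assumes an existing clone with a remote named "origin".
    """
    return [
        ["git", "switch", main],
        ["git", "pull"],
        ["git", "switch", "-c", feature],  # create the branch and switch to it
        # ... implement the code changes here ...
        ["git", "commit", "-a", "-m", f"Implement {feature}"],
        ["git", "push", "-u", "origin", feature],  # first push sets the upstream
        ["git", "switch", main],
        ["git", "pull"],
        ["git", "merge", feature],  # fix conflicts and commit again if necessary
        ["git", "push"],
    ]


def run(commands, repo_dir=".", dry_run=True):
    """Print each command; execute it inside repo_dir only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, cwd=repo_dir, check=True)


if __name__ == "__main__":
    # Dry run: only prints the commands, nothing is executed.
    run(feature_workflow("MyFeature"))
```

With `dry_run=True` this is just a printed checklist. Actually executing it in one go would skip the "implement the code changes" step in the middle, so for newcomers the printed list is probably the more useful half.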
I'll second the recommendation of Mercurial. There's also EasyMercurial, a nice little dead-simple GUI front end that lets you do a handful of the most common things (checkpoints, history overview, version comparisons, reversions, branching and merges) without having to learn much about the details. Very nice for beginners to get their feet wet with version control systems, and if/when they need something more powerful they can always use the command-line tools directly, or migrate to a more feature-rich GUI. But honestly, for a bunch of people without prior experience even the heavily restricted feature set will be a huge step forward.
There's probably similar "beginner" GUIs available for most of the major VCSes, but EasyMercurial is the only one I can vouch for. I would also lean towards recommending a distributed VCS that offers easy branching and merging for a bunch of VCS beginners - they seem to offer far less conceptual/procedural overhead to the "lone wolf" work flow, and thus are more likely to actually get used effectively.
--- Most topics have many sides worth arguing, allow me to take one opposite you.
I never thought I would see a recommendation for CVS in 2015. The OP is on the right track looking at git and mercurial to start with. The only problem with his requirements is the Windows server. Maybe a virtual machine running on the Windows server would be acceptable to IT? While it is possible to run a git or mercurial server on Windows, there are a lot of good tools that would give the "things that would be great" that are not supported on Windows. On the client side, TortoiseGit and TortoiseHg are available, giving the same Explorer integration as TortoiseSVN/TortoiseCVS.