Ask Slashdot: How Do You Assess the Status of an Open Source Project?
Chrisq writes: "Our software landscape includes a number of open source components, and we currently assume that these components will follow the same life-cycle as commercial products: they will have a beta or test phase, a supported phase, and finally reach the end of life. In fact, a clear statement that support is ended is unusual. The statement by Apache that Struts 1 has reached end of life is almost unique. What we usually find is:
- Projects that appear to be obviously inactive, having had no updates for years
- Projects that are obviously not going to be used in any new deployments because the standard language, library, or platform now has the capability built in
- Projects that are rapidly losing developers to some more-trendy alternative project
- Projects whose status is unclear, with some releases and statements in the forums that they are 'definitely alive,' but which seem to have lost direction or momentum.
- Projects that have had no updates but are highly stable and do what is necessary, but are risky because they may not interoperate with future upgrades to other components.
By the treating Open Source in the same way as commercial software we only start registering risks when there is an official announcement. We have no metric we can use to accurately gauge the state of an open source component — but there are a number of components that we have a 'bad feeling' about. Are there any standard ways of assessing the status of an open source project? Do you use the same stages for open source as commercial components? How do you incorporate these in a software landscape to indicate at-risk components and dependencies?"
sourceforge, github, and other major OSI project hosts feature both last updated dates and when a project is discontinued often times notices stating so. Ultimately, some responsibility is placed on the author(s) & maybe even the community for managing this. Search engine rankings take care of the rest. And of course, there is no way to bat 100% here, some will be missed with this and just about any other method.
Try and find someone looking for help using it online. See what people say to them. If there are lots of recent problems and responses that don't seem to suggest using other products, its likely in a good state to use.
If no one is looking for help using the library, its either not in use, or way too easy to use (has that ever happened?).
One thing to look out for is that if something works well, it might not need updates very often (or at all, depending on what it is). Don't abandon something simply because its old, or not being updated. Now, it its not being updated, has lots of open issues, and no users, thats a problem.
You can also look for some issues/tickets, and see the response times on them.
This isn't a problem that is unique to open source. Several commercial libraries that we have used in the past have entered the twilight zone where the developer is neglecting them, and refuses to release any sort of roadmap or EOL announcement. Eventually, you just have to make your own call based on how much work it will be to move to a new library vs the risk of staying with the current one. At least with open source if you get stuck with a dead library you can choose to take over maintaining it on your own either as a long term strategy or a short-term stop-gap until you can move onto something else.
Some open source projects are in a better state than others, but in my experience it is a good idea to treat all of them as if they can stop working at any time, and manage that risk. In other words, have a contingency plan ready. In some cases you may be able to fx a broken bit of software yourself (or hire a company to do this). In other cases there are alternative software products you can switch to. Or simply accept the fact that whatever it is you've put together will stop working some day (obviously nothing mission critical). The last option may sound scary, especially to managers, but often it's better to have something rather than nothing, even if its for a limited amount of time.
If construction was anything like programming, an incorrectly fitted lock would bring down the entire building...
One metric yielding interesting results is the concept of "technical debt", as introduced by Martin Fowler. Sonar Source, for example, measures this metric very well. A project that has seen neither increase ( recently taken risk ) nor decrease ( recent moves toward stabilization ) may very well be dead. I recently used it upon our own software of 580 KSLOC. The interesting conclusion: core stable, some utilities half dead or worse, much life springing up at the functional fringes. This also holds for e.g tomcat. The tactical and strategical conclusions one may draw from such considerations are fascinating.
Religous speak to God. Insane are spoken to by God. When all shut up, one can finally hear Shostakovich in peace
The first thing I do with regard to investigating any OSS is to find their developer list and skim the last few months of it. It's a good way to see the level of activity, responsiveness, and how cohesive or combative the core is.
Most really usefull software needs maintenance, or at least reviews to verify none is needed, on a routine basis. This is usually dull, thankless work. In business, it is often done by old codgers (like me before i retired) that are well paid for very little actual work. It is a vital function, that was supposed to have been covered in open source by users paying for the service.
In many cases this seems to have worked out well with large organizations footing the bill. iBM, HP, AT&T etc, have staff people who kept the components they need working. Their priorities aren't yours.
Do we need a system for keeping codgers comfortable and personal use software working?
I've had a couple of cases where I needed a feature, that there had been lots of requests for, in existing software whose development had slowed or stopped. I offered to hire the developer, bounty style, but they weren't interested.
I hired professional programmers to add the feature or make necessary changes to the existing code. I then submitted the code as patches to the original developer, hoping that he would accept the patches and make it so I didn't have to patch and compile everytime there was an update or distro change. My patches were always GPL and there were no restrictions on them, so if the developer didn't like the style or specific implementation, they could use my patch as a starting point or model and change whatever they chose.
In all cases, the developers have not incorporated the patch. In most cases, they have done nothing at all. I'd likely have been better off just buying Windows COTS.
Another good technique is to search Stackoverflow for questions about the project you are considering. Look at both the number of questions asked and the quality of the answers. Especially look for questions like "Should I be using XYZ?" and "XYZ vs {Alternative to XYZ}".
Stackoverflow is moderated somewhat like Slashdot, so the best answers will usually bubble to the top.
Sometimes, a program can be dead because it's obsolete. Others can appear dead because they have simply been completed.
For example, I'd guess that xclock hasn't been updated in many years... but it's still widely used for testing X11.
of course, if you're using it and you have the source code, then its not dead - except the old project page might no longer point to the currently updated project site (ie your fork).
All the FOSS sites need a 'takeover' policy for dead projects that is more than just fork. That link says to contact the abandoned project admin and ask to be added to the project to continue it, and if they do not respond, create a new project site with the old code. Personally, I think if they do not respond, then the site should try to contact them - if they still do not respond (after a suitably lengthy time) then it should re-assign you as the new owner. They could rate-limit takeover requests to 1 a year per project without incurring much inconvenience to project admins. Alternatively they could mandate a minimum of 2 admins per project and give a list of "non-exec" admins that are simply there for such contingency purposes.
For example, I see Fuppes project on sourceforge, it works well but needs a tweak or two to make it work great - and I'm willing to do the work, but the admin doesn't seem to be around anymore. I could fork it, but I'd much rather keep continuity of the original project. We have way too many forks anyway (usually because Oracle took over the project :) ).
Yeah you want to be careful with activity metrics. Awk hasn't seen many updates in the last two years. Mostly because it hasn't NEEDED much in the last ten or twenty years. That means it's already rock solid, not that it should be avoided.
0) If the project does what you need today, USE IT. Don't get so bound up in "future-proofing" your technology stack that you get paralyzed looking for "the perfect product that will do exactly what we want forever and never let us down."
1) Define your standard software stack. Mandate that all software written internally using open source components use these standard components & versions, or coordinate making a new version available to all projects if there's a particular new feature of a new version that is absolutely mandatory;
2) Always, always, always, download source for the version of the package you're installing (even if you just grab binary-only distributions to install & run), and archive it for posterity in some location YOU control and backup - DO NOT rely on "the internet" to help you find an old version of software; this allows you to fix (or hire someone to fix) any problem you have down the road in case of real critical issues where no active project maintainers can be found/hired/worked with.
3) Every few months (we shoot for ~6 months), review your stack and grab the latest versions of each component and make it available in your dev / testing environments;
4) If a component starts getting stale (no updates for 2 or more of these cycles), we'll start thinking about replacements for that component, and investigate likely alternatives, and bump this item up into the "needs monitoring" risk category - no production impact yet, but as soon as you need to release a patch of that production version using the outdated component, you're gonna be in trouble.
5) Periodically (nightly if you have resources - get something like jenkins or similar for this sort of thing) ensure that you can build these components from source successfully. Especially as they get 'stale,' you'll run into issues - system libs, headers, etc. will change over time, and there will come a point where you are no longer able to build the software without code modification. At that point, if any of your software is still using the version, then you should start raising alarms and bump the risk level up to "severe." This could cripple your production env.
6) If a crisis comes up and a dead project is the culprit... well, we've got the code and can always modify it ourselves, if we haven't found any suitable alternative.
There's really no magic to it - just make sure that developers aren't downloading "every version under the sun," and ensure that the versions you're using are reproducible, available, and actively managed on your end. Risk management is paramount.
An important difference with open source is that, if a component you rely on is abandoned, you have the choice of maintaining it yourself. I'm not suggesting you want to take over development of large projects, but in some cases this is a real option. It's especially relevant to the last category mentioned in the post: "Projects that have had no updates but are highly stable and do what is necessary, but are risky because they may not interoperate with future upgrades to other components." If you're using a library that's stable and does what you want, and your only concern is keeping it working when other things change in the future, that may be quite easy to do yourself.
"I'm too busy to research this and form an educated opinion, but I do have time to tell everyone my uninformed opinion."
the typical example that i give here is "python htmltmpl". htmltmpl was written to solve a very specific problem: minimalist templating of HTML by allowing dictionaries of key-value pairs to substitute into HTML (value text replaces the key when named) and to do likewise for lists of dictionaries in order to e.g. create tables.
very very simple.
the problem is this: the actual scope of the work required means that the actual programming required was extremely straightforward. i.e. it was done, completed - problem solved. the scope of the work required is clear; the scope of the work required does not change; the scope of the work required does not *NEED* to change.
therein lies the problem, namely that the fact that python-htmltmpl has quotes not had any development quotes means that, as far as sourceforge is concerned, the project is "dead". look at the release dates - 2001 for god's sake!
http://htmltmpl.sourceforge.net/
the point is: just because a project hasn't had any development done on it, that DOES NOT automatically mean that it doesn't do the job. correlation != causation. python-htmltmpl *clearly* does the job it's intended to do.
i mention this case specifically because i have seen a large number of HTML "templating" languages come and go. the php-inspired one which used syntax. zope with the dreadful and insane embedding of python in templates and templates in python. many many more, all of which caused me to despair when i saw them, so much so that i was inspired to talk at one UKUUG conference at some length about best practices of keeping programming languages declarative i.e. *never* embedding programming languages into HTML (even if it's php).
and once you follow the sanity-restoring rule of keeping a programming language declarative (e.g. in the php case beginning the file with as the last two characters and AT NO POINT EVER NOT FOR ANY REASON WHATSOEVER FALLING BACK TO OR PERMITTING STATIC HTML TO BE OUTPUT IMPLICITLY)... ... once you follow that rule, then you find that you need a templating system such as php-htmltmpl or any of the others that exist. and, once you've looked closely at what you actually need out of an HTML templating language, then actually, htmltmpl provides a *really* good very simple system which covers pretty much everything you'll need. need to do an expression which is a mixture of variables and HTML? generate it explicitly in php, put it into the array - don't for god's sake try to use a god-awful mix of print, echo, dots and christ knows what else. just.. don't.
so i'm putting this out there because in certain cases, what you find is that the code that you need appears "dead", but that's not actually the case: the failure of sourceforget and github by their "metrics" have relegated perfectly good and *completed* code to obscurity.
you are therefore encouraged to participate in *unfinished* projects, with their constant changes, moving targets and massive contributions which may or may not be correctly managed, because it is those projects that have "99% activity". does that sound like a good thing to you?
AFAIK zlib is still the best if you measure the speed/compression ratio.
But technically the best way to get speed over compression is no compression at all (infinitely fast / 1).
No, because you also have to consider disk I/O time, and CPU time is relatively cheap, so on-the-fly compression is faster than no compression for many types of data.
NB: The message above might reflect my opinion right now, but not necessarily tomorrow or next year.