Clearcase Problems with Linux?
joecooler asks: "I work for an ASIC company in the verification group. We use VCS and Vera to write and run simulations, Clearcase for revision control, and LSF to manage our server farm. At my instigation my employer has begun to move to Linux PCs for our simulation server farm instead of the much more expensive and much slower Sun Solaris machines. Everything has been working well and everyone has been very pleased with the performance except for one 'small' problem - every two weeks or so we will suddenly see all jobs running on Linux machines crash. After much pain we have been able to isolate this to an issue with Clearcase returning files 'slowly' to the Linux machines, causing VCS compiles to die. Has anyone else had issues with Clearcase and Linux running on a PC? If so, how did you debug this and isolate the exact source of the problem? Is this solvable, or is it one of the mysteries of networking?"
Clearcase is returning files slowly? I don't believe it!
No personal offense intended. We've wrestled with the same problems ourselves and have ultimately decided to look at alternatives to clearcase.
There are a couple of big problems. The biggest one is that Clearcase requires you to use a modified Linux kernel, and they only provide stable modifications for a handful of older, stale kernels. If you want to keep up with security updates, you are on your own. If you want to update to a newer kernel that solves some device driver problem, forget it. If your product depends on you using a custom kernel like ours does, you are totally screwed. Unless Rational finds some way to make their product work without requiring specific kernel versions, it will never be a good fit with Linux. Your stability problems may be caused by not using a Rational-approved kernel.
The second huge problem with Clearcase is not Linux-specific: it has to do with Clearcase's architecture. Clearcase requires each client to use a proprietary NFS-like filesystem (MVFS) in order to interface nicely with the server. MVFS has a very high overhead both in terms of network traffic and server CPU time. It has poor security, poor performance, and poor reliability. Even on Solaris it's ugly, and on Rational's second-tier systems such as Linux and Irix it's even worse. Imagine trying to maintain an entire closed-source network filesystem codebase just for one application. That's the problem that Clearcase's development team faces, and I guess I can't fault them for not doing it well.
Clearcase's architecture realistically limits your clients to being on the same local network with a persistent, always-on connection. In addition, the server needs to be a very expensive top-end Solaris box. Also, if you want to support remote development, you either have to wrestle with the unfriendly, unpolished "snapshot views" configuration or shell out huge dollars for a MultiSite license and a dedicated person to support it.
If you are unfortunate enough to be stuck with an older or poorly performing network, Clearcase can be unusable. You absolutely must have high-bandwidth, low-latency paths between your Clearcase server, build platforms, and clients. It sounds offhand like this may be your problem. Put in a direct (no hops) 100BaseT line between a Linux client and the Clearcase server, make sure the Clearcase server isn't under heavy load from other people, and rerun your tests.
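If you want a number to compare before and after changing the network path, here is a rough Python sketch that times how long each file under a directory takes to open and read, then lists the slow ones. The function names and the 1-second threshold are made up for illustration. Nothing here is a Clearcase tool; it just walks whatever mounted path you point it at.

```python
import os
import time


def time_reads(root):
    """Return a list of (path, seconds) for reading every file under root."""
    timings = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            start = time.perf_counter()
            try:
                with open(path, "rb") as handle:
                    while handle.read(1 << 20):  # read in 1 MiB chunks
                        pass
            except OSError:
                continue  # unreadable file: skip it rather than abort the run
            timings.append((path, time.perf_counter() - start))
    return timings


def slow_files(timings, threshold_seconds=1.0):
    """Return the entries slower than the threshold, slowest first."""
    slow = [entry for entry in timings if entry[1] > threshold_seconds]
    return sorted(slow, key=lambda entry: entry[1], reverse=True)
```

Run `slow_files(time_reads(path_to_your_view))` once over the existing network and once over the direct line. Keep in mind that a second pass over the same files may be served from a cache, so compare like with like.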
Rational encourages you to use Clearcase to manage your entire build operation, and to version binaries and object files as well as source. This does have some benefits, but it makes already bad performance become downright abysmal and makes it very difficult to switch products once you realize Clearcase is no longer the right fit for your organization.
Finally, Rational appears to be completely ignoring these shortcomings with Clearcase on Unix. Over the last couple of years they have ported Clearcase to Windows and rewritten all of the administration tools. However, the second-generation admin tools are WINDOWS ONLY. If you want to use tools that don't suck, you need a Windows box. I find it incredible that Rational had a cross-platform product, and when they had the opportunity to make cross-platform tools using any number of high-quality cross-platform libraries, they chose to go with one platform only. I've asked when the next-generation tools will be ported back to Unix/Linux, and they have no plans to do that. I love the command line as much as any card-carrying Unix geek, but I demand the best tools for the job. I don't like being on Rational's second-class platform.
To me, this underscores the fact that sales and marketing are running the show over at Rational. Rational acquires products so they can lock in customers, and then they scale back development and move on to the next product. Unfortunately, people using Clearcase on Unix have invested so much time integrating Clearcase into their workflow that the costs of changing to a different SCM platform are unbearable. Yet, if you look around, you will find competitors like Perforce and BitKeeper offering better products for orders of magnitude less in license and maintenance fees. These competing products scale better, can be used over the internet easily, don't require a custom kernel (!!!), and require substantially less dedicated support staff to maintain.
Shop around. Moving to Linux might be a good time to use something that works better and costs less than maintaining clearcase, even in the short term.
Sorry to have ranted a little off subject to your original question.
One thing you might try is switching your build machines to use snapshot views. This reduces the network overhead and allows for a more disconnected style of operation. It's a huge win for compile farms where you only want to pull recent files and rarely if ever commit changes back. Doing this may solve your reliability issues as well as speed up compiles.
Do a strace of cleartool. After every file there is a 6-second sleep. So we just link in a glibc with sleep() overridden to return immediately instead of sleeping. Works like a charm.
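If you want to check your own trace for the same thing, a small Python sketch like this can pull the requested sleep times out of saved strace output. It assumes the sleeps show up as nanosleep or clock_nanosleep calls (which is how glibc's sleep() reaches the kernel on Linux) and that your strace prints the timespec as either `{6, 0}` or `{tv_sec=6, tv_nsec=0}`. The sample lines below are invented, not taken from a real cleartool trace.

```python
import re

# Matches both the older "{6, 0}" and the newer "{tv_sec=6, tv_nsec=0}"
# timespec renderings, for nanosleep and clock_nanosleep. These formats are
# assumptions about what your strace version prints.
_SLEEP_RE = re.compile(
    r"(?:clock_)?nanosleep\([^{]*\{(?:tv_sec=)?(\d+),\s*(?:tv_nsec=)?(\d+)\}"
)


def requested_sleeps(strace_text):
    """Return the requested sleep durations, in seconds, one per matching line."""
    durations = []
    for line in strace_text.splitlines():
        match = _SLEEP_RE.search(line)
        if match:
            seconds, nanoseconds = int(match.group(1)), int(match.group(2))
            durations.append(seconds + nanoseconds / 1e9)
    return durations


# Invented sample trace for illustration only.
sample = """\
open("foo.v", O_RDONLY) = 3
nanosleep({6, 0}, NULL) = 0
open("bar.v", O_RDONLY) = 4
nanosleep({tv_sec=6, tv_nsec=0}, NULL) = 0
"""
print(requested_sleeps(sample))       # [6.0, 6.0]
print(sum(requested_sleeps(sample)))  # 12.0
```

Note that this reports what the process asked for, not how long it actually slept. Running strace with -T adds the elapsed time per call if you need that.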
We've supposedly opened a call with Rational, but I haven't heard anything.
I can make the glibc binaries we use available on my website if anyone is interested and doesn't want to go through the effort of recompiling glibc themselves.
Now why would they be sleeping for 6 seconds when it doesn't appear to be necessary:
1: conspiracy theory-- M$ told them to
2: inept programming-- deadlock in their code, ahh, just put a sleep in to fix it
3: smart programmer-- It's review time, need to make this faster, I know, change that sleep(6) to sleep(5).
4: problem with Linux NFS... no can't be!
ClearCase servers do not require MVFS; only the client machines that access the VOBs through dynamic views do. To use MVFS on Linux, you need to insmod a kernel module that supports it. This module can be relinked to support a slightly different kernel version, but if any kernel structure sizes have changed, it will not work.
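As a sanity check before loading a relinked module, you can at least compare the kernel release the module was built for against the one you are running. On current kernels, `modinfo -F vermagic <module>` prints a string whose first field is that release. Whether your MVFS module exposes it the same way is an assumption on my part. The function names and the sample strings in this Python sketch are made up.

```python
import platform


def module_matches_kernel(vermagic, kernel_release):
    """True if the module's vermagic string names exactly this kernel release."""
    fields = vermagic.split()
    return bool(fields) and fields[0] == kernel_release


def check_before_insmod(vermagic):
    """Compare a module's vermagic against the kernel this script runs on."""
    return module_matches_kernel(vermagic, platform.release())


# Made-up vermagic strings for illustration.
built_for = "5.15.0-91-generic SMP mod_unload modversions"
print(module_matches_kernel(built_for, "5.15.0-91-generic"))  # True
print(module_matches_kernel(built_for, "5.15.0-92-generic"))  # False
```

A matching release string is necessary but not sufficient. It says nothing about whether structure layouts line up on a kernel that was patched or configured differently.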