Clearcase Problems with Linux?
joecooler asks: "I work for an ASIC company in the verification group. We use VCS and Vera to write and run simulations, Clearcase for revision control, and LSF to manage our server farm. At my instigation my employer has begun to move to Linux PC's for our simulation server farm instead of the much more expensive and much slower Solaris Sun machines. Everything has been working well and everyone has been very pleased with the performance except for one 'small' problem - every two weeks or so we will suddenly see all jobs running on Linux machines crash. After much pain we have been able to isolate this to an issue with Clearcase returning files 'slowly' to the Linux machines, causing VCS compiles to die. Has anyone else had issues with Clearcase and Linux running on a PC? If so, how did you debug this and isolate the exact source of the problem? Is this solvable, or is it one of the mysteries of networking?"
IIRC, we had that problem at a former place of employment once.
You could've hired me.
You do not specify that you are using dynamic views, but it sounds like you are... Try using snapshot views instead. Another (ugly) idea is a preemptive reboot.
Rational customer support is always very friendly too... Have you called yet? Check the Knowledge Base too...
Clearcase is returning files slowly? I don't believe it!
No personal offense intended. We've wrestled with the same problems ourselves and have ultimately decided to look at alternatives to clearcase.
There are a couple of big problems. The biggest one is that clearcase requires you to use a modified Linux kernel, and they only provide stable modifications for a handful of older, stale kernels. If you want to keep up with security updates, you are on your own. If you want to update to a newer kernel that solves some device driver problem, forget it. If your product depends on you using a custom kernel like ours does, you are totally screwed. Unless Rational finds some way to make their product work without requiring specific kernel versions, it will never be a good fit with Linux. Your stability problems may be caused by not using a Rational-approved kernel.
The second huge problem with clearcase is not Linux-specific: it has to do with clearcase's architecture. Clearcase requires each client to use a proprietary NFS-like filesystem (MVFS) in order to interface nicely with the server. MVFS has a very high overhead both in terms of network traffic and server CPU time. It has poor security, poor performance, and poor reliability. Even on Solaris it's ugly, and on Rational's second-tier systems such as Linux and Irix it's even worse. Imagine trying to maintain an entire closed-source network filesystem codebase just for one application. That's the problem that clearcase's development team faces, and I guess I can't fault them for not doing it well.
Clearcase's architecture realistically limits your clients to being on the same local network with a persistent, always-on connection. In addition, the server needs to be a very expensive top-end solaris box. Also, if you want to support remote development you either have to wrestle with the unfriendly, unpolished "snapshot views" configuration or shell out huge dollars for a multisite license and a dedicated person to support it.
If you are unfortunate enough to be stuck with an older or poorly performing network, clearcase can be unusable. You absolutely must have high bandwidth, low latency paths between your clearcase server, build platforms, and clients. It sounds offhand like this may be your problem. Put in a direct (no hops) 100BaseT line between a Linux client and the clearcase server, make sure the clearcase server isn't under heavy load from other people, and rerun your tests.
Rational encourages you to use clearcase to manage your entire build operation, and version binaries and object files as well as source. This does have some benefits, but it makes already bad performance downright abysmal and makes it very difficult to switch products once you realize Clearcase is no longer the right fit for your organization.
Finally, Rational appears to be completely ignoring these shortcomings with clearcase on Unix. Over the last couple of years they have ported Clearcase to Windows and rewritten all of the administration tools. However, the second-generation admin tools are WINDOWS ONLY. If you want to use tools that don't suck, you need a Windows box. I find it incredible that Rational had a cross-platform product, and when they had the opportunity to make cross-platform tools using any number of high quality cross-platform libraries, they chose to go with one platform only. I've asked when the next generation tools will be ported back to Unix/Linux, and they have no plans to do that. I love the command line as much as any card-carrying unix geek, but I demand the best tools for the job. I don't like being on Rational's second-class platform.
To me, this underscores the fact that sales and marketing are running the show over at Rational. Rational acquires products so they can lock in customers, and then they scale back development and move on to the next product. Unfortunately people using clearcase on unix have invested so much time integrating clearcase into their workflow that the costs of changing to a different SCM platform are unbearable. Yet, if you look around, you will find competitors like Perforce and BitKeeper offering better products for orders of magnitude less in license/maintenance fees. These competing products scale better, can be used over the internet easily, don't require a custom kernel (!!!), and require substantially less dedicated support staff to maintain.
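For the "rerun your tests" step, it helps to have a number to compare before and after the network change. Here is a minimal sketch in Python that reads every file under a directory and reports the ones that take longer than a threshold. The function name `time_reads`, the 1-second default threshold, and the idea of pointing it at a view path are my own assumptions for illustration, not anything from Rational; treat it as a rough probe, not a benchmark.

```python
import os
import sys
import time


def time_reads(root, slow_threshold_s=1.0, chunk_size=1 << 16):
    """Read every file under root and return (path, seconds, bytes)
    for each file whose full read took at least slow_threshold_s.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    slow = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            start = time.monotonic()
            nbytes = 0
            try:
                with open(path, "rb") as fh:
                    while True:
                        chunk = fh.read(chunk_size)
                        if not chunk:
                            break
                        nbytes += len(chunk)
            except OSError as exc:
                # An I/O error coming back from the view is worth seeing too.
                print(f"ERROR {path}: {exc}", file=sys.stderr)
                continue
            elapsed = time.monotonic() - start
            if elapsed >= slow_threshold_s:
                slow.append((path, elapsed, nbytes))
    return slow
```

Call it as `time_reads("/path/to/your/view/and/vob")` from a Linux client and again after moving that client onto the direct line. Keep in mind that a second run over the same files may be served from cache, so compare like with like.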
Shop around. Moving to Linux might be a good time to use something that works better and costs less than maintaining clearcase, even in the short term.
Sorry to have ranted a little off subject to your original question.
One thing you might try is switching your build machines to use snapshot views. This reduces the network overhead and allows for a more disconnected style of operation. It's a huge win for compile farms where you only want to pull recent files and rarely if ever commit changes back. Doing this may solve your reliability issues as well as speed up compiles.
What you are describing is classic, textbook Clearcase behaviour. It's not known for speed or stability. It's most likely to be a bug in the kernel patches you're required to install.
The horrible problem (that you don't mention in your post) is that because changes aren't atomic, any time the system crashes, your repository could be left in a corrupt state. At this point it takes a Clearcase-trained admin to unwedge it, which could take a while.
In any event, don't beat yourself up over it; it's not likely to be something that your IT department is able to fix.
"...Is this solvable, or is it one of the mysteries of networking?"
How about it's one of those "Linux Networking problems."
For the love of god, go use FreeBSD.
It's a decent OS, not some crap slapped together like Linux.
Looks like you aren't sure what the problem is. Just don't panic; these things do happen once in a while. Do read the manual properly.
-- Live Long And Prosper
Now, it's true that one had to handle checkins and checkouts from a Sun box, but, as the build farms mounted the exported views read-only, what's the big deal? Is it really necessary to integrate the source control system that tightly with the Linux-based development environment?
I use ClearCase on Linux where I work and haven't had any major problems (except that no one here can quite figure out how to get the Linux automounter to work with ClearCase and I'm too lazy to try to figure it out myself).
That said, I'm no big fan of ClearCase. It seems needlessly complex and sluggish, has limited platform support (compared to CVS, which is what we used to use and would basically run on anything you could compile it on), and I think there's something just wrong about having a version control system have modules that run in kernel mode.
Do a strace of cleartool. After every file they have a 6 second sleep. So we just link in a glibc with sleep(6) overridden to not sleep. Works like a charm.
We've supposedly opened a call with Rational, but I haven't heard anything.
I can make the glibc binaries available that we use on my website if anyone is interested and doesn't want to go through the effort of recompiling glibc themselves.
Now why would they be sleeping for 6 seconds when it doesn't appear to be necessary:
1: conspiracy theory-- M$ told them to
2: inept programming-- deadlock in their code, ahh, just put a sleep in to fix it
3: smart programmer-- It's review time, need to make this faster, I know, change that sleep(6) to sleep(5).
4: problem with Linux NFS... no can't be!
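If you want to confirm the sleep on your own machines before swapping glibc, one way is to capture `strace -T` output from cleartool (the `-T` option appends the time spent in each system call in angle brackets) and total up the long sleeps. The sketch below is only that, a sketch: the function name `find_long_sleeps` is made up, the sample lines in the comment are illustrative rather than taken from a real trace, and whether the pause shows up as `nanosleep` or `clock_nanosleep` depends on your libc and strace versions.

```python
import re

# Matches lines in `strace -T` format, for example:
#   nanosleep({6, 0}, NULL) = 0 <6.000131>
# An optional leading PID (as printed with `strace -f`) is skipped.
LINE_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?(\w+)\(.*<(\d+\.\d+)>\s*$")

# System calls treated as "sleeping". This set is an assumption;
# extend it if your trace shows the delay under another call.
SLEEP_CALLS = {"nanosleep", "clock_nanosleep"}


def find_long_sleeps(lines, min_seconds=1.0):
    """Return (syscall, seconds) for each sleep call of min_seconds or more."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        name, seconds = match.group(1), float(match.group(2))
        if name in SLEEP_CALLS and seconds >= min_seconds:
            hits.append((name, seconds))
    return hits
```

Feed it the lines of a saved trace file and sum the second element of each tuple to see how much wall-clock time is going to fixed sleeps rather than to the network.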
If you can drop ClearCase, do it before you're more entrenched than you already are. If you think you're past the point of no return, I'll perform some last-rites for you, but really there's no such thing as a barrier to a total conversion.
God help you my son. There are better alternatives out there. Try Perforce.
People posting rants about how bad the software is get +5 Informatives (twice even), people suggesting open source alternatives get insightfuls, and an actual cool hack to get around lazy/stupid programming that ANSWERS the question posed and involves actually getting down into the nitty gritty hidden details of how Linux handles system calls and ways to make bad programs behave using some neat coding goes UNMODDED!? Sheesh people!
Why do my Mod points always expire when nothing interesting is going on...
Maxim: People cannot follow directions.
Increases in truth directly with the length of time spent explaining them
I don't mean to be a troll, but I am curious about something you said. You stated that Clearcase uses a custom kernel and that this is bad, which I agree with. But then you say that the software that you are developing, in Clearcase, requires a customized kernel.
My question is, why does your software require a custom kernel, especially if you think that the use of a custom kernel is a bad idea?
Is this solvable, or is it one of the mysteries of networking?
Here we go again. Developers, using crappy software and not understanding their own tools or what they are developing, immediately turn around and start blaming the network. Then they convince some management moron that the problem must be the network and some poor network engineer has to spend the next three days figuring out what the problem really is. Then after days of diagnostics and poring over sniffer traces, debugging this and testing that, the problem is finally found in the application's config file. It's always some brilliant entry like: RequireReboot=3 days or Network Optimization=1 #Values range from 1 to 10.
If you can ping, telnet, ftp, www, vnc, ssh and whatever else and your one application ain't working right, IT'S YOUR APPLICATION. DON'T BLAME THE NETWORK!!!!!!!!
I have been burned heavily by ClearCase in the past.
There is a simple rule: maintain a separation of Church and State.
i.e. Decouple your build system from your SCM system. Use Perforce as your SCM and make or jam for your build system.
MVFS has to sleep with the kernel to do what it does; it's very expensive, requires dedicated administrators, and the VOBs corrupt regularly. Winking in objects using ClearMake just ties you deeper and deeper into ClearCase hell.
Bite the bullet, rip it out and free yourself.
and hire someone who knows how to tune a Solaris box...
Remember, with Linux, you get what you pay for...
That's why it's free...
We had a similar problem; this may help.
We were using RH 7.2, and when doing a build of our Java source code it would regularly crash and then hang the JVM. We found out it was a problem with the very complex tables used by Clearcase that the automounter could not understand.
We solved the problem by updating the system with automounter 4 rc1 and installing the latest version of libc6, 2.2.4.
The 6 second sleep issue was corrected many months ago. If you have an active support contract, download the new version to correct it. (And yes, I use Clearcase LT with snapshot views.)
Don't want to put down the commercial software bashers, but who do you call to support your 'solutions'? Rational has good customer support available anytime. We now have CCase clients running on the 7 platforms we port our software to, and clean integration with CQuest, a defect tracking product. Rational's products are expensive, but management sees their solutions as supportable even when turnover (and RIFs) dilute the company's knowledge base.
Not trolling, on to my suggestion. One thing you can try is mounting your dynamic view from your server directly to your Linux box. This is what we use for unsupported kernel levels, assuming your server is *nix. Create, or modify, a dynamic view to be MVFS exportable with the '-nca' flag. Then when you use the command 'cleartool lsview -long ' you will see 'View export ID (registry): 1'. On your CCase server, then use the command '/usr/atria/etc/export_mvfs -I 1 /view//' to share this MVFS drive. Example: /usr/atria/etc/export_mvfs -I 1 /view/john/vobs/test. At this point you can mount this view/VOB combination as if it were a shared NFS drive.
Good luck, John
Give up on Clearcase, and switch to a less screwed up system. Best I know of are Perforce and Bitkeeper. Google uses Perforce internally, and they're pretty bright people...
That is all.
--
The Homepage of Carl Laurence Gonsalves