Slashdot Mirror


Clearcase Problems with Linux?

joecooler asks: "I work for an ASIC company in the verification group. We use VCS and Vera to write and run simulations, Clearcase for revision control, and LSF to manage our server farm. At my instigation my employer has begun to move to Linux PC's for our simulation server farm instead of the much more expensive and much slower Solaris Sun machines. Everything has been working well and everyone has been very pleased with the performance except for one 'small' problem - every two weeks or so we will suddenly see all jobs running on Linux machines crash. After much pain we have been able to isolate this to an issue with Clearcase returning files 'slowly' to the Linux machines, causing VCS compiles to die. Has anyone else had issues with Clearcase and Linux running on a PC? If so, how did you debug this and isolate the exact source of the problem? Is this solvable, or is it one of the mysteries of networking?"

5 of 32 comments

  1. Are you kidding me?! by Outland+Traveller · · Score: 5, Informative

    Clearcase is returning files slowly? I don't believe it!

    No personal offense intended. We've wrestled with the same problems ourselves and have ultimately decided to look at alternatives to clearcase.

    There's a couple big problems. The biggest one is that clearcase requires you to use a modified linux kernel, and they only provide stable modifications for a handful of older, stale kernels. If you want to keep up with security updates, you are on your own. If you want to update to a newer kernel that solves some device driver problem, forget it. If your product depends on you using a custom kernel like ours does, you are totally screwed. Unless rational finds some way to make their product work without requiring specific kernel versions, it will never be a good fit with Linux. Your stability problems may be caused by not using a Rational-approved kernel.

    The second huge problem with clearcase is not Linux-specific: it has to do with clearcase's architecture. Clearcase requires each client to use a proprietary NFS-like filesystem (MVFS) in order to interface nicely with the server. MVFS has a very high overhead both in terms of network traffic and server CPU time. It has poor security, poor performance, and poor reliability. Even on solaris it's ugly, and on rational's second tier systems such as Linux and Irix it's even worse. Imagine trying to maintain an entire closed-source network filesystem codebase just for one application. That's the problem that clearcase's development team faces, and I guess I can't fault them for not doing it well.

    Clearcase's architecture realistically limits your clients to being on the same local network with a persistent, always-on connection. In addition, the server needs to be a very expensive top-end solaris box. Also, if you want to support remote development you either have to wrestle with the unfriendly, unpolished "snapshot views" configuration or shell out huge dollars for a multisite license and a dedicated person to support it.

    If you are unfortunate enough to be stuck with an older or poorly performing network, clearcase can be unusable. You absolutely must have high bandwidth, low latency paths between your clearcase server, build platforms, and clients. It sounds offhand like this may be your problem. Put in a direct (no hops) 100bT line between a linux client and the clearcase server, make sure the clearcase server isn't under heavy load from other people, and rerun your tests.
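One rough way to compare the two setups when you rerun your tests is to time raw file reads through the view before and after the network change. This is only a minimal sketch: it assumes bash and GNU date (for the %N nanosecond format), and time_reads is a hypothetical helper name, not anything shipped with ClearCase.

```shell
# Print "<milliseconds> <path>" for each file given on the command line.
# Assumes GNU date, which supports %N (nanoseconds).
time_reads() {
    local f start end
    for f in "$@"; do
        start=$(date +%s%N)
        # Read the whole file and discard it; skip files that cannot be read.
        cat -- "$f" > /dev/null || continue
        end=$(date +%s%N)
        printf '%d %s\n' $(( (end - start) / 1000000 )) "$f"
    done
}
```

Piping the output through `sort -n | tail` lists the slowest files. Repeated reads of the same file may be served from a local cache, so the first pass over a fresh set of files is the number to look at.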

    Rational encourages you to use clearcase to manage your entire build operation, and version binaries and object files as well as source. This does have some benefits, but it makes already bad performance downright abysmal and makes it very difficult to switch products once you realize Clearcase is no longer the right fit for your organization.

    Finally, Rational appears to be completely ignoring these shortcomings with clearcase on Unix. Over the last couple years they have ported Clearcase to Windows and rewritten all of the administration tools. However, the second-generation admin tools are WINDOWS ONLY. If you want to use tools that don't suck, you need a Windows box. I find it incredible that rational had a cross-platform product, and when they had the opportunity to make cross platform tools using any number of high quality cross-platform libraries, they chose to go with one platform only. I've asked when the next generation tools will be ported back to Unix/Linux, and they have no plans to do that. I love the command line as much as any card-carrying unix geek, but I demand the best tools for the job. I don't like being on rational's second-class platform.

    To me, this underscores the fact that sales and marketing are running the show over at Rational. Rational acquires products so they can lock in customers, and then they scale back development and move on to the next product. Unfortunately people using clearcase on unix have invested so much time integrating clearcase into their workflow that the costs of changing to a different SCM platform are unbearable. Yet, if you look around, you will find competitors like Perforce and BitKeeper offering better products at orders of magnitude less license/maintenance fees. These competing products scale better, can be used over the internet easily, don't require a custom kernel (!!!), and require substantially less dedicated support staff to maintain.

    Shop around. Moving to Linux might be a good time to use something that works better and costs less than maintaining clearcase, even in the short term.

  2. responding to my own post by Outland+Traveller · · Score: 5, Informative

    Sorry to have ranted a little off subject to your original question.

    One thing you might try is switching your build machines to use snapshot views. This reduces the network overhead and allows for a more disconnected style of operation. It's a huge win for compile-farms where you only want to pull recent files and rarely if ever commit changes back. Doing this may solve your reliability issues as well as speed up compiles.

  3. They're sleeping by XavierPenguin · · Score: 5, Informative

    Do a strace of cleartool. After every file they have a 6 second sleep. So we just link in a glibc with sleep(6) overridden to not sleep. Works like a charm.
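To put a number on how much wall-clock time those sleeps cost, one option is to record the trace with strace's -T flag, which appends the time spent in each syscall as <seconds> at the end of the line, and then total the sleep calls. A minimal sketch, assuming the log was written with something like `strace -f -T -o ct.log cleartool update`; the names sleep_time_summary and ct.log are hypothetical.

```shell
# Total the time spent in nanosleep-style syscalls in a strace -T log.
# $1: path to the log file written by strace -o.
sleep_time_summary() {
    awk '
        # Match nanosleep( and clock_nanosleep( lines that end in <seconds>.
        /nanosleep\(/ && /<[0-9.]+>$/ {
            t = $NF              # last field looks like <6.000123>
            gsub(/[<>]/, "", t)  # strip the angle brackets
            total += t
            calls++
        }
        END { printf "%d sleep calls, %.1f seconds total\n", calls, total }
    ' "$1"
}
```

If the total is close to six seconds times the number of files loaded, the sleeps account for most of the runtime.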

    We've supposedly opened a call with Rational, but I haven't heard anything.

    I can make the glibc binaries available that we use on my website if anyone is interested and doesn't want to go through the effort of recompiling glibc themselves.

    Now why would they be sleeping for 6 seconds when it doesn't appear to be necessary:
    1: conspiracy theory-- M$ told them to
    2: inept programming-- deadlock in their code, ahh, just put a sleep in to fix it
    3: smart programmer-- It's review time, need to make this faster, I know, change that sleep(6) to sleep(5).
    4: problem with Linux NFS... no can't be!

    1. Re:They're sleeping by XavierPenguin · · Score: 5, Informative

      A little more info from my notes in case anyone is interested. YMMV and don't blame me if your views get trashed, but we haven't seen any problems with this approach:

      To generate your own glibc for use by cleartool:

      - extract the src (this is for RedHat)

      mkdir my_glibc; cd my_glibc
      rpm2cpio | cpio -iumd
      tar jxf glibc-...-tar.bz2

      - edit sysdeps/unix/sysv/linux/sleep.c to just return 0 if seconds==6

      - make build dir within glibc-2.2... dir created by extract

      - from within build dir, configure and make

      cd my_glibc/glibc-2.2..../build
      ../configure --enable-add-ons=yes --without-cvs
      make

      - put the pieces together

      mkdir ~/myct
      cp my_glibc/glibc-2.2..../libc.so.X ~/myct

      create a wrapper script to execute cleartool using this glibc:

      #!/bin/bash
      # Put the patched libc first on the loader path, then hand off to cleartool.
      export LD_LIBRARY_PATH=~/myct:$LD_LIBRARY_PATH
      exec cleartool "$@"

      - use it

      ~/myct/ct update

      Here's a stack trace taken while cleartool is making the sleep call. It shows that their sysutl_nfs_flush function is indeed calling sleep(6); luckily I've overridden sleep(6) to return immediately:

      #0 0x409a9f01 in __libc_nanosleep () from /home/xp/glibc-2.2.4/build/libc.so.6
      #1 0x409a9e82 in __sleep (seconds=6) at ../sysdeps/unix/sysv/linux/sleep.c:85
      #2 0x40815c3d in sysutl_nfs_flush () from /usr/atria/shlib/libatriaks.so
      #3 0x40815beb in sysutl_nfs_flush () from /usr/atria/shlib/libatriaks.so
      #4 0x40815beb in sysutl_nfs_flush () from /usr/atria/shlib/libatriaks.so
      #5 0x40815beb in sysutl_nfs_flush () from /usr/atria/shlib/libatriaks.so
      #6 0x40815beb in sysutl_nfs_flush () from /usr/atria/shlib/libatriaks.so
      #7 0x40815beb in sysutl_nfs_flush () from /usr/atria/shlib/libatriaks.so
      #8 0x40815beb in sysutl_nfs_flush () from /usr/atria/shlib/libatriaks.so
      #9 0x407fc15f in fileutl_walk_tree_any () from /usr/atria/shlib/libatriaks.so
      #10 0x407fc389 in fileutl_walk_tree () from /usr/atria/shlib/libatriaks.so
      #11 0x407fdf69 in fileutl_cp () from /usr/atria/shlib/libatriaks.so
      #12 0x406d0482 in ws_copy_file () from /usr/atria/shlib/libatriaview.so
      #13 0x406d40a6 in ws_add_wso_file () from /usr/atria/shlib/libatriaview.so
      #14 0x406d44e9 in ws_add_wso () from /usr/atria/shlib/libatriaview.so
      #15 0x406d70e6 in ws_load_one_object () from /usr/atria/shlib/libatriaview.so
      #16 0x406d63fb in ws_load_dir_ents () from /usr/atria/shlib/libatriaview.so
      #17 0x406d72a0 in ws_load_one_object () from /usr/atria/shlib/libatriaview.so
      #18 0x406d63fb in ws_load_dir_ents () from /usr/atria/shlib/libatriaview.so
      #19 0x406d72a0 in ws_load_one_object () from /usr/atria/shlib/libatriaview.so
      #20 0x406d7678 in ws_load_one_scope () from /usr/atria/shlib/libatriaview.so
      #21 0x406d9c34 in ws_load_scopes () from /usr/atria/shlib/libatriaview.so
      #22 0x40120062 in cmd_update_subr () from /usr/atria/shlib/libatriacmd.so
      #23 0x4011f86c in cmd_update () from /usr/atria/shlib/libatriacmd.so
      #24 0x40050013 in cmdsyn_update () from /usr/atria/shlib/libatriacmdsyn.so
      #25 0x4002feea in cmdsyn_do_command () from /usr/atria/shlib/libatriacmdsyn.so
      #26 0x400300cd in cmdsyn_execv_dispatch () from /usr/atria/shlib/libatriacmdsyn.so
      #27 0x4044b92e in tool_main () from /usr/atria/shlib/libatriatool.so
      #28 0x080499cc in main ()

  4. Re:What's with all the custom kernel crap? by Anonymous Coward · · Score: 1, Informative

    ClearCase servers do not require MVFS, just the client machine that accesses the VOBs or dynamic views. On Linux, to use MVFS you need to insmod a kernel module that supports it. This module can be re-linked to support a slightly different kernel version; however, if any kernel structure sizes change it will not work.