Slashdot Mirror


Linux Kernel Archives Struggles With Git

NewsFiend writes "In May, Slashdot discussed KernelTrap's interesting feature about the Linux Kernel Archives, which had recently upgraded to multiple 4-way dual-core Opterons with 24 gigabytes of RAM and 10 terabytes of disk space. KernelTrap has now followed up with kernel.org to learn how the new hardware has been working. Evidently the new servers have been performing flawlessly, but the addition of Linus Torvalds' new source control system, git, is causing some heartache: it has increased the number of files being archived sevenfold."

6 of 45 comments

  1. This is normal. by A beautiful mind · · Score: 4, Insightful

    Git is focused on trading more file space for less bandwidth. This matters to a lot of scattered developers who can afford 1-2 GB more on a hard drive, but for whom 200-300 MB more would suck on a DSL or dial-up connection.

    --
    It takes a man to suffer ignorance and smile
    Be yourself no matter what they say
  2. See? I Told You So! by Larry McVoy · · Score: 3, Funny

    Bow before my might, l1nux l00s3rs!

  3. Filesystem? by RealBorg · · Score: 5, Interesting

    Maybe kernel.org should finally consider moving to a more appropriate filesystem than ext3, preferably reiserfs, since it is optimized to handle a lot of small files. Tail packing not only saves disk space but, more importantly, a lot of memory in the block cache.

    1. Re:Filesystem? by Yenya · · Score: 4, Informative
      Disclaimer: I run one of the kernel.org mirrors.

      Ext3 vs. Reiser is not an issue here. FWIW, I use XFS on my mirror volume, and I have also noticed how the git repository increases load on my server. See the CPU usage graph of ftp.linux.cz - look especially at the yearly graph and see how the CPU system time has been increasing for the last two months.

      The problem is in rsync: when mirroring the remote repository, it has to stat(2) every local and remote file, so the directory trees have to be read into RAM. Hashed or tree-based directories (reiserfs or XFS) can even be slower than plain linear ext3 directories, because you have to read the whole directory anyway, so a linear read is faster.
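
      To make the per-file cost concrete, here is a rough Python sketch of the kind of tree scan a mirroring tool has to do before it can decide what changed. It is not rsync's actual code; `scan_tree` and `make_tree` are hypothetical names, and the two-hex-digit subdirectories only loosely imitate how git spreads objects around. The point is that the work grows with the number of files, not with how many bytes they hold.

```python
import os
import tempfile


def scan_tree(root):
    """Walk root and lstat every file, one stat-family call per file.

    Returns (files_seen, total_bytes).
    """
    files_seen = 0
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            files_seen += 1
            total_bytes += st.st_size
    return files_seen, total_bytes


def make_tree(root, n_files, size):
    """Hypothetical helper: create n_files files of `size` bytes each,
    spread over up to 16 subdirectories."""
    for i in range(n_files):
        sub = os.path.join(root, "%02x" % (i % 16))
        os.makedirs(sub, exist_ok=True)
        with open(os.path.join(sub, "obj%d" % i), "wb") as f:
            f.write(b"x" * size)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as few, tempfile.TemporaryDirectory() as many:
        make_tree(few, 10, 700)   # 10 files, 7000 bytes in total
        make_tree(many, 70, 100)  # 70 files, the same 7000 bytes
        print(scan_tree(few))
        print(scan_tree(many))
```

      Both trees hold the same amount of data, but the second one needs seven times as many stat calls, which is the kind of sevenfold jump in file count the story describes.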

      --
      -Yenya
      --
      While Linux is larger than Emacs, at least Linux has the excuse that it has to be. --Linus
  4. 10 TB by jo42 · · Score: 3, Funny
    That's a pretty decent sized pr0n collection they gots there...

    Kernel sources take up, what, only a handful of gigabytes?

  5. Re:why blame git? by rossifer · · Score: 4, Insightful

    (wouldn't it be cool to store data from your SQL tables in easy-to-parse flat files for instance? That would make recovery and manipulation a lot simpler)

    *snicker*

    *laugh*

    *great rolling peals of laughter*

    *sigh*

    *wipes tear from eye*

    You haven't done much work that actually required databases (or that would massively benefit from a relational programming model). The whole point of moving from flat files to a database is so that the data is stored already parsed, recovery is done by a tool provided by the db vendor, and manipulation is done within rules (constraints) that prevent "programming accidents" (bugs) or "pilot error" (users) from breaking relationships between parts of your data. That eliminates most of the need for recovery right there.
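
    A small sketch of what "manipulation within rules" means in practice. It uses Python's built-in sqlite3 module rather than PostgreSQL, purely so it runs without a server, and the `changeset` and `file_rev` tables are made up for illustration. The foreign key makes the database itself refuse a file revision that points at a changeset that does not exist.

```python
import sqlite3


def build_db():
    """Create an in-memory database with two related (hypothetical) tables."""
    conn = sqlite3.connect(":memory:")
    # SQLite leaves foreign key enforcement off unless asked.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE changeset ("
        " id INTEGER PRIMARY KEY,"
        " author TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE file_rev ("
        " id INTEGER PRIMARY KEY,"
        " changeset_id INTEGER NOT NULL REFERENCES changeset(id),"
        " path TEXT NOT NULL)"
    )
    return conn


def try_insert_orphan(conn):
    """Try to store a file revision whose changeset does not exist.

    Returns True if the row was accepted, False if the constraint rejected it.
    """
    try:
        conn.execute(
            "INSERT INTO file_rev (changeset_id, path) VALUES (?, ?)",
            (999, "Makefile"),
        )
    except sqlite3.IntegrityError:
        return False
    return True


if __name__ == "__main__":
    conn = build_db()
    print(try_insert_orphan(conn))
```

    With flat files, nothing stops a buggy script from writing that orphaned record; here the mistake is rejected before it ever reaches the data.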

    CM systems get much more powerful and IMHO, simpler, when you start using a decent database as the backend. As for distributed work, there are plenty of good databases that inexpensively and easily fit onto any modern workstation (PostgreSQL is my personal favorite) that can act as a local backing store, giving you fully detached functionality and the benefits of a relationally organized system.

    Regards,
    Ross