Linux Kernel Archives Struggles With Git
NewsFiend writes "In May, Slashdot discussed Kerneltrap's interesting feature about the Linux Kernel Archives, which had recently upgraded to multiple 4-way dual-core Opterons with 24 gigabytes of RAM and 10 terabytes of disk space. KernelTrap has now followed up with kernel.org to learn how the new hardware has been working. Evidently the new servers have been performing flawlessly, but the addition of Linus Torvalds' new source control system, git, is causing some heartache by having increased the number of files being archived sevenfold."
Git is focused on trading more disk space for less bandwidth. This is important for a lot of scattered developers who can afford 1-2 GB more on a hard drive, but for whom an extra 200-300 MB would suck on a DSL or dialup connection.
It takes a man to suffer ignorance and smile
Be yourself no matter what they say
There's no reason a modern filesystem shouldn't be able to efficiently store billions of files, basically working as a high-speed key/value database that happens to work with all the existing Unix filesystem tools (ls, grep, etc).
I didn't read the article (duh, this is slashdot) but I hope this improves one of the Linux filesystems so it can support 7x as many files with the same performance. That opens up a lot of possibilities for easy development (wouldn't it be cool to store data from your SQL tables in easy-to-parse flat files for instance? That would make recovery and manipulation a lot simpler).
But yeah git makes a LOT of files!
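For anyone wondering what "filesystem as key/value database" looks like in practice, here is a rough Python sketch of a content-addressed store with one file per value. It borrows only the directory layout idea from git's loose objects (a subdirectory named after the first two hex digits of the hash); it does not reproduce git's real object format, and the names `put`, `get`, and `count_files` are made up for illustration.

```python
import hashlib
import os
import tempfile

def put(root, data):
    """Store bytes under a key derived from their SHA-1 and return the key."""
    key = hashlib.sha1(data).hexdigest()
    # Fan out on the first two hex digits so no single directory grows huge.
    subdir = os.path.join(root, key[:2])
    os.makedirs(subdir, exist_ok=True)
    path = os.path.join(subdir, key[2:])
    if not os.path.exists(path):  # identical content is stored only once
        with open(path, "wb") as f:
            f.write(data)
    return key

def get(root, key):
    """Read back the bytes stored under a key."""
    with open(os.path.join(root, key[:2], key[2:]), "rb") as f:
        return f.read()

def count_files(root):
    """Count every regular file below root."""
    return sum(len(files) for _, _, files in os.walk(root))

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        keys = [put(root, ("revision %d\n" % i).encode()) for i in range(1000)]
        assert get(root, keys[0]) == b"revision 0\n"
        print(count_files(root), "files for 1000 values")
```

Every distinct value becomes its own file, which is exactly why the file count on the server balloons.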
Then he would be able to Git-R-Done
Overrated / Underrated : Moderation
Remind me again why switching from free bitkeeper was such a great idea??
`grep -r`ing source code under Subversion takes much longer than with CVS, due to all the .svn files.
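One workaround is to search with something that never descends into the metadata directories. A minimal Python sketch, assuming the slowdown comes from the pristine copies kept under each `.svn` directory; `grep_tree` and its `skip_dirs` parameter are hypothetical names, not part of any existing tool.

```python
import os
import re

def grep_tree(root, pattern, skip_dirs=(".svn", "CVS", ".git")):
    """Yield (path, line_number, line) for matches, never entering skip_dirs."""
    regex = re.compile(pattern)
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk from entering those directories.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "r", errors="replace") as f:
                    for number, line in enumerate(f, 1):
                        if regex.search(line):
                            yield path, number, line.rstrip("\n")
            except OSError:
                # Unreadable files are skipped rather than aborting the search.
                continue

if __name__ == "__main__":
    for path, number, line in grep_tree(".", r"TODO"):
        print("%s:%d:%s" % (path, number, line))
```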
I think Linus should kiss and make up with BitKeeper.
"Who are in control, they are not in control of anything - they don't even control themselves!" - Glen Beck
(slightly) offtopic. wasn't reiser4 supposed to have 'plugin' support, so things like version control could be built directly into the file system? the prospect of being able to type, say:
touch bar
echo 'foo' > bar
revisions bar
output of revision history
cp bar/revision/1 bar-version-1.0.backup
granted yes, the storage requirements and cpu usage might be horrific, but i think something like this is inevitable in file systems, and certainly i welcome the day it becomes a reality.
- tristan
I've been struggling with stupid gits for years now. (Da-dum-dum). Thank you! I'll be here all week.
Don't just game, Dungeoneer
Aren't file system scalability issues why people start using databases?
Sounds like a software engineering issue.
On the flip-side, if kernel.org is using XFS, JFS, Reiserfs (I doubt they'd risk Reiser4 yet) or any other very high-performance filesystem, then maybe the problem is one of organization.
It is rare that you actually need large numbers of files holding very small amounts of data or metadata. What is probably wanted is a virtual layer that allows the software to see those many small files, but where the files are bundled together to be more efficient to access in this kind of a setting.
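A bare-bones Python sketch of that bundling idea: many small blobs are appended to one file, and an in-memory index maps each name to its offset and length so callers can still fetch them individually. The names `write_bundle` and `read_member` are invented here, and a real layer would also need to persist the index and handle updates.

```python
import os
import tempfile

def write_bundle(bundle_path, items):
    """Append each (name, bytes) pair to one file; return {name: (offset, length)}."""
    index = {}
    with open(bundle_path, "wb") as out:
        for name, data in items:
            index[name] = (out.tell(), len(data))
            out.write(data)
    return index

def read_member(bundle_path, index, name):
    """Fetch one logical file out of the bundle using the index."""
    offset, length = index[name]
    with open(bundle_path, "rb") as f:
        f.seek(offset)
        return f.read(length)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "bundle.bin")
        items = [("obj%04d" % i, ("payload %d" % i).encode()) for i in range(5000)]
        index = write_bundle(path, items)
        assert read_member(path, index, "obj0042") == b"payload 42"
        print(len(index), "logical files,", len(os.listdir(d)), "file on disk")
```

The filesystem then sees a single large file instead of thousands of tiny ones, at the cost of needing special tools to look inside it.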
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
Bow before my might, l1nux l00s3rs!
Maybe kernel.org should finally consider moving to a more appropriate filesystem than ext3, preferably reiserfs, since it is optimized to handle a lot of small files. Tail packing not only saves disk space but, more importantly, a lot of memory in the block cache.
Kernel sources take up, what, only a handful of gigabytes?
So the rate of files being archived was multiplied by 128?
http://kerneltrap.org/node/5070
this interview with the maintainers has a comment from somebody who claims he asked by email and got the reply that ext3 is used
if that's not good enough, perhaps guess from "At this time, the servers run Fedora Core and use the 2.6 kernel provided by RedHat." that they might be using ext3, which is the default.
Are you reading this man?
You're responsible for all the world's problems! The linux kernel, bitrot on my cds, war in Iraq, Guantanamo Bay, and now git!
Come on Linus, clean up your act!
(Sorry if this offends *anyone*)
Anything is possible, except skiing through revolving doors.
We did some tests comparing reiser3, xfs, and ext3 with the dir_index option on 2.6 kernels. We were writing thousands (ok tens of thousands) of small files into a couple of directories (specialized app, you don't want to know.)
When directories got large, ext3 with the hash lookups (between 800 and 1500 creations per second on newish hardware) ran much faster than xfs, oh and several orders of magnitude faster than ext3 without the directory hashing. reiser3 was slower than xfs.
We were thinking of going with xfs anyways, because it was so attractive that the directories would shrink when files were deleted (whereas ext3 directories stay big, with holes in them), but xfs would crash on us after a couple of days. So in March we chose ext3. We have approximately 9 million files in a single file system at the moment. It seems to work ok, but the system crashes every three weeks or so. We think we might have tortured it too much, and can reasonably keep only about 2 million files on-line, so we'll see if that helps.
of course, ymmv.
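If you want to try this kind of test yourself, here is a rough Python sketch that creates many small files in one directory and reports creations per second. It is not the harness described above: the function name, file naming, and 64-byte payload are assumptions, it measures whatever filesystem holds the temporary directory, and it never calls fsync, so the page cache will flatter the numbers.

```python
import os
import tempfile
import time

def create_small_files(directory, count, payload=b"x" * 64):
    """Create `count` small files in one directory; return creations per second."""
    start = time.perf_counter()
    for i in range(count):
        with open(os.path.join(directory, "f%08d" % i), "wb") as f:
            f.write(payload)
    elapsed = time.perf_counter() - start
    return count / elapsed if elapsed > 0 else float("inf")

if __name__ == "__main__":
    # Point tempfile at the filesystem under test, e.g. via the TMPDIR variable.
    with tempfile.TemporaryDirectory() as d:
        rate = create_small_files(d, 10000)
        print("%.0f creations/second" % rate)
```

Run it with increasing counts to see where a given filesystem's directory lookups start to hurt.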
you don't want to know
But I _do_ want to know, waaahh!
I wonder if JFS would work, though maybe it is too slow. Some xfs filesystems that I had got corrupted, but even after power outages, etc., my jfs partitions just keep plugging away.