The Linux Kernel Archives
Jeremy Andrews writes "KernelTrap offers an interesting look at the history behind the Linux Kernel Archives, home of the Linux kernel. They start from the beginning in 1997, when kernel.org ran on a generic 'white box PC' using a shared T1, to the present, where it runs on multiple quad Opterons, each with 24 gigabytes of RAM, 10 terabytes of disk space, and a gigabit link to the internet. Much of the article is based on an interview with Peter Anvin, and it also includes quotes from Linus Torvalds, Paul Vixie of Internet Systems Consortium, Inc., who donates the bandwidth, and Matt Taggart of Hewlett-Packard, who donated the hardware."
Very interesting...will have to check this out.
2006: generic "white box PC" using a shared T1 -- AGAIN
...especially having dealt with something like this (on a much smaller scale) recently.
We were having bandwidth limitations on RubyForge; it was getting up to 80 GB per month at the end of 2004. Mirroring out releases helped get usage back down to 15 GB per month. Many thanks to our mirror providers!
The Army reading list
"The normal bandwidth used by kernel.org is between 150 to 200 megabits per second"...
... "There has been discussion about making the logs available in an anonymized form, but it's not the top priority."
"When asked about viewing the actual access logs, Peter explained that although they do occasionally get requests from various sorts of researchers, they generally don't make them available for privacy reasons."
Perhaps the anonymous logs should be sold to pay for some of this juicy bandwidth they're consuming?
Haydn.
Time is an illusion. Lunchtime doubly so. - Douglas Adams
This was a great article! I can attest that there is quite a difference with the new hardware: I got a 500KBps download last night while downloading rc3-mm2.
Can we please have the same kind of article about slashdot hardware?
"think of it as evolution in action"
The 'kernel.org' domain name was picked because by that time in 1997 the more logical seeming Linux dot names were already taken. The Transmeta domain was intentionally not used to avoid creating the false perception that Transmeta owned Linux.
I wonder what would have happened with Transmeta and Linux if they had used the Transmeta domain to host the kernel archives. Would IBM have gotten involved with Linux? Would SCO have sued Transmeta instead of IBM? Would Linus have left Transmeta?
Seeing as Google has thousands of boxes, my estimate would be that the combined Google services pump out over 10Gb/s, rather than just 1Gb/s.
Referring to 32-bit systems, Peter noted, "we learned that the Linux load average rolls over at 1024. And we actually found this out empirically."
Can you even get the server to TELL you what the load is when it's that high?? That's INSANE!
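For anyone wondering where a limit like 1024 could come from, here is a rough sketch, not the kernel's actual code. It assumes the load average is kept as a fixed-point number with 11 fractional bits and is updated by multiplying it by a decay factor of up to 11 bits, all in a 32-bit unsigned word. Under those assumptions the intermediate product can only hold about 2^(32-11-11) = 1024 whole units before it wraps. The names calc_load and steady_state and the mask parameter are made up for the illustration.

```python
FSHIFT = 11                # assumed: bits of fractional precision
FIXED_1 = 1 << FSHIFT      # 1.0 in fixed point
EXP_1 = 1884               # assumed: 1-minute decay factor in fixed point
MASK32 = 0xFFFFFFFF        # emulates a 32-bit unsigned long


def calc_load(load, exp, active, mask=MASK32):
    """One exponential-decay update, wrapping like a fixed-width integer."""
    load = (load * exp) & mask                        # may wrap here
    load = (load + active * (FIXED_1 - exp)) & mask   # or here
    return load >> FSHIFT


def steady_state(n_tasks, steps=2000, mask=MASK32):
    """Feed a constant number of runnable tasks and return the reported load."""
    active = n_tasks * FIXED_1
    load = 0
    for _ in range(steps):
        load = calc_load(load, EXP_1, active, mask)
    return load / FIXED_1


if __name__ == "__main__":
    print("500 tasks, 32-bit:", steady_state(500))
    print("2000 tasks, 32-bit:", steady_state(2000))
    print("2000 tasks, 64-bit:", steady_state(2000, mask=(1 << 64) - 1))
```

With the 32-bit mask the reported value can never reach 1024, however many tasks are runnable, because anything that survives the mask and the shift is below 2^21 in fixed point. With a 64-bit word the same loop settles near the real task count.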
teeker
It doesn't surprise me that being linked from slashdot is just a minor effect. A kernel package is tens of megabytes, while a single visit will likely consume less than 100KB.
DON'T USE APACHE.
This was suggested. The kernel.org people didn't seem interested in it. Those light HTTP servers are probably good for lots of small static HTML files. kernel.org is not like that: it needs to serve 20+ MB files and CD ISOs. Your benchmarks don't measure that. I'd bet the kernel.org people know what they're using, and they'd have switched if it were really useful.
I don't have occasion to use rsync, and I'm not too familiar with its design, but I think it syncs directories by checksumming the files in them to see if they differ. So Peter is saying above that the server's bottleneck is checksumming. I would think that on a server like this, checksums could be cached. Why checksum a stable file more than once? Once you have a checksum for linux-2.6.0.tar.bz2, why calculate it again?
This would require a bit of bookkeeping when files change, but wouldn't it be worth it on such a busy system? (Or am I confused?)
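To make the bookkeeping concrete, here is a minimal sketch of the caching idea. It is not how rsync itself is implemented, and the ChecksumCache class is hypothetical. It assumes a file can be treated as unchanged when its size and modification time are unchanged, and it rehashes only when either one differs.

```python
import hashlib
import os


class ChecksumCache:
    """Hypothetical cache: path -> ((size, mtime_ns), hex digest)."""

    def __init__(self):
        self._cache = {}
        self.computed = 0  # how many times a file was actually hashed

    def checksum(self, path):
        st = os.stat(path)
        key = (st.st_size, st.st_mtime_ns)
        hit = self._cache.get(path)
        if hit is not None and hit[0] == key:
            return hit[1]  # file looks unchanged: reuse the stored digest
        digest = self._hash_file(path)
        self._cache[path] = (key, digest)
        self.computed += 1
        return digest

    @staticmethod
    def _hash_file(path, chunk_size=1 << 20):
        # Read in chunks so large tarballs and ISOs are not loaded whole.
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        return h.hexdigest()
```

The weak spot is the assumption itself: a file rewritten with the same size and the same timestamp would be served a stale checksum, so a real server would need a stricter invalidation rule.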