The Linux Kernel Archives
Jeremy Andrews writes "KernelTrap offers an interesting look at the history behind the Linux Kernel Archives, home of the Linux kernel. They start from the beginning in 1997, when kernel.org ran on a generic "white box PC" using a shared T1, to the present where it runs on multiple quad Opterons each with 24 gigabytes of RAM, 10 terabytes of disk space, and a gigabit link to the internet. Much of the article is based on an interview with Peter Anvin, and it also includes quotes from Linus Torvalds, Paul Vixie of Internet Systems Consortium, Inc., which donates the bandwidth, and Matt Taggart of Hewlett-Packard, which donated the hardware."
Boy what I could do with that and BitTorrent... *rubs palms together*
Yes, I was very surprised to discover that, in the past 8 years, kernel.org has had to expand their server performance and their bandwidth allocation. Why didn't they just keep the T1 and Pentium-66? I guess I will have to RTFA to find out why.
...especially having dealt with something like this (on a much smaller scale) recently.
We were having bandwidth limitations on RubyForge; it was getting up to 80 GB per month at the end of 2004. Mirroring out releases helped get usage back down to 15 GB per month. Many thanks to our mirror providers!
The Army reading list
Good grief, that's a lot of pipe! Saturating a PAIR of gig links? Certainly tends to make one stop and consider how many people are actually USING Linux nowadays. Good to see!
Thinking outside my Head
Way too slow.
Mod +5 funny -5 irreverent
Err... I see ads... Is there a chance they break even on bandwidth if they get traffic of the level Slashdot gives instead of losing $$?
My little site.
"The normal bandwidth used by kernel.org is between 150 to 200 megabits per second"...
... "There has been discussion about making the logs available in an anonymized form, but it's not the top priority."
"When asked about viewing the actual access logs, Peter explained that although they do occasionally get requests from various sorts of researchers, they generally don't make them available for privacy reasons."
Perhaps the anonymous logs should be sold to pay for some of this juicy bandwidth they're consuming?
Haydn.
Time is an illusion. Lunchtime doubly so. - Douglas Adams
This was a great article! I can attest that there is quite a difference with the new hardware: I got a 500KBps download last night while downloading rc3-mm2.
Can we please have the same kind of article about slashdot hardware?
"think of it as evolution in action"
The 'kernel.org' domain name was picked because by that time in 1997 the more logical seeming Linux dot names were already taken. The Transmeta domain was intentionally not used to avoid creating the false perception that Transmeta owned Linux.
I wonder what would have happened with Transmeta and Linux if they had used the Transmeta domain to host the kernel archives. Would IBM have gotten involved with Linux? Would SCO have sued Transmeta instead of IBM? Would Linus have left Transmeta?
When Linus Torvalds purchased his first computer on which he began writing the Linux kernel, the state-of-the-art PC with 4 megabytes of RAM and running at 33 megahertz was too expensive for him to buy outright.
Oh my god, it's a diesel!
-1, Disagree is not a valid option. Troll, Flamebait and Offtopic are not a substitute.
...it runs on multiple quad Opterons each with 24 gigabytes of RAM, 10 terabytes of disk space, and a gigabit link to the internet...
Do I smell a challenge?
I'm in awe of that box. It just pushes so much data, all the time. And 1000Mb/s of bandwidth?! That's more bandwidth than Google!*
* I strongly suspect this not to be true.
Get your own free personal location tracker
Exactly. Why is copy-pasting a paragraph from the linked article, plus crap formatting, rated "Interesting"?
I don't think the linked article is slashdotted (it wasn't the last time I checked).
It takes a man to suffer ignorance and smile
Be yourself no matter what they say
Referring to 32-bit systems, Peter noted, "we learned that the Linux load average rolls over at 1024. And we actually found this out empirically."
Can you even get the server to TELL you what the load is when it's that high?? That's INSANE!
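One plausible explanation for the 1024 figure, sketched under assumptions: the kernel of that era kept load averages as fixed-point numbers with 11 fractional bits and updated them by multiplying with a decay constant that is also scaled by 2^11. On a 32-bit unsigned long that leaves only 10 bits for the integer part of the intermediate product. The constants FSHIFT and EXP_1 below are quoted from memory of the kernel headers, and `calc_load` and `steady_state` are illustrative Python names, not kernel code.

```python
# Assumed kernel constants (from memory of the scheduler headers).
FSHIFT = 11                 # number of fractional bits
FIXED_1 = 1 << FSHIFT       # 1.0 in fixed point
EXP_1 = 1884                # decay factor for the 1-minute average

MASK32 = 0xFFFFFFFF         # emulate a 32-bit unsigned long


def calc_load(load, exp, active, mask=MASK32):
    """One fixed-point update step, wrapping at the width given by mask."""
    load = (load * exp) & mask                       # product may wrap here
    load = (load + active * (FIXED_1 - exp)) & mask
    return load >> FSHIFT


def steady_state(n_tasks, steps=2000, mask=MASK32):
    """Run the update until it settles and return the load as a float."""
    load = 0
    active = n_tasks * FIXED_1
    for _ in range(steps):
        load = calc_load(load, EXP_1, active, mask)
    return load / FIXED_1
```

With 100 runnable tasks the simulated average settles near 100. With 2000 tasks the 32-bit version can never report 1024 or more, because only 21 bits survive the shift, while passing a 64-bit mask lets it settle near 2000.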
teeker
As we have some figures for the numbers of machines in the early days and surely we have the traffic figures for then as well...
We should be able to make a reasonable guess at the number of machines out there with Linux on them...
Donald 'Duck' Dunn: We had a band powerful enough to turn goat piss into gasoline.
I'm really happy to see HP giving so much support. I'll definitely remember this the next time someone asks my opinion about what server hardware to buy.
It doesn't surprise me that being linked from slashdot is just a minor effect. A kernel package is tens of megabytes, while a single visit will likely consume less than 100KB.
see a Text Widget
...is get themselves mentioned on Slashdot on the same day that there's a simultaneous release of a major distribution and a Linux kernel.
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
Probably not.
If we're savvy enough to need a kernel tarball, we're savvy enough to run AdBlock, no?
Do not meddle in the affairs of geeks for they are subtle and quick to anger
Unless we're on a corporate link where we can't use anything but g*d*f*ing IE because the tech department is lazy, pathetically understaffed, stupid, incompetent, or REALLY pathetically understaffed and run by someone who doesn't quite care....
My little site.
However, in this post from hpa, it looks like the tools are not ready.
Love salty crackers? catchy electronica? Try !
DONT USE APACHE.
This was suggested. The kernel.org people didn't seem to have interest in it. Those light HTTP servers are probably good for lots of small static HTML files. kernel.org is not like that: it needs to serve 20+ MB files and CD ISOs. Your benchmarks don't measure that. I bet the kernel.org people know what they're using, and they'd have switched if it were really useful.
I agree. What's worse is that HP really sucks at making computers. They're so focused on price that they tend to put low-quality parts in their machines...
it DOES run linux, and they've got several so you can't ask us to imagine a beowulf cluster either...
Does it run MSDOS?
FGD 135
I worked at Globix when we offered free bandwidth to kernel.org. In the beginning, when things were going well and we had hundreds of millions of dollars to spend, we used this to leverage our position in the open source community. Of course, when the bubble burst, Globix tried to get rid of all the free riders first. It was done very selectively, though. While some were cut loose as fast as possible (like kernel.org), others were kept because they had better connections to some of the executives. I don't want to name names, but I'll say this much: one of them is one of the better-known nudie magazines. It was quite a ride to work at Globix. They are still around, barely.
If it's an announcement of a new kernel, it is likely that at least some percent of the /. crowd will download the new kernel.
It takes a man to suffer ignorance and smile
Be yourself no matter what they say
"where it runs on multiple quad Opterons each with 24 gigabytes of RAM, 10 terabytes of disk space, and a gigabit link to the internet"
Can you imagine a cluster of these... no wait...
I don't have occasion to use rsync, and I'm not too familiar with its design, but I think it synchs directories by checksumming the files in them to see if they differ. So Peter is saying above that the server's bottleneck is checksumming. I would think that on a server like this, checksums could be cached - why checksum a stable file more than once? Once you have a checksum for linux-2.6.0.tar.bz2, why calculate it again?
This would require a bit of bookkeeping when files change, but wouldn't it be worth it on such a busy system? (Or am I confused?)
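The caching idea can be sketched roughly as follows. This is only an illustration of the bookkeeping, not how rsync itself works: `ChecksumCache` is a hypothetical class, and it assumes that an unchanged size and modification time mean an unchanged file.

```python
import hashlib
import os


class ChecksumCache:
    """Hypothetical whole-file checksum cache keyed on size and mtime."""

    def __init__(self):
        self._cache = {}   # path -> ((size, mtime_ns), hex digest)
        self.hits = 0      # number of lookups answered without rereading

    def checksum(self, path):
        st = os.stat(path)
        key = (st.st_size, st.st_mtime_ns)
        entry = self._cache.get(path)
        if entry is not None and entry[0] == key:
            self.hits += 1
            return entry[1]
        # File is new or looks changed: read it and recompute the digest.
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        digest = h.hexdigest()
        self._cache[path] = (key, digest)
        return digest
```

A stable tarball is then read once, and later requests cost one `stat()` call. The trade-off is that a file rewritten with the same size and timestamp would go unnoticed.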
...I guess that:
a) kernel.org doesn't think the 'enterprise readiness' of RH Enterprise Linux is that great, (vs what Fedora offers) even in what should be considered one of the most mission-critical sites in the Linux ecosystem (or that the difference with Fedora is worth paying for)
b) No one at RH is bright enough to be embarrassed by this and offer kernel.org some free licenses...
I'm really happy to see HP giving so much support. I'll definitely remember this the next time someone asks my opinion about what server hardware to buy.
I believe that was the reaction HP's marketing department also expected. Admittedly, providing the hardware was a very nice gesture, but in reality, it's a brilliant marketing move.
Furthermore, I hope you will take other factors and datapoints into consideration when someone asks you for your advice, though. The servers donated were relatively high-end -- they might or might not be reflective of all HP hardware.
I'm Trappped at Berkeley.
We have, indeed, considered that, but it'd not really buy us anything. Earlier Apaches would sit on a lot of memory while serving large files, but current versions just have a thread sitting in sendfile(), which is just about as lightweight as you get.
Sure, the startup cost of the transaction is higher than for a lightweight HTTP server, but the startup cost of the transaction isn't a big deal for us, and we appreciate the flexibility that Apache offers.
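For readers unfamiliar with sendfile(), here is a minimal sketch of the idea in Python rather than Apache's C: the serving thread hands the open file to the kernel instead of copying it through its own buffers. `send_file` is a hypothetical helper name, and `socket.sendfile()` falls back to ordinary `send()` calls on platforms without `os.sendfile`.

```python
import socket


def send_file(sock, path):
    """Send the whole file at path over a connected stream socket.

    Returns the number of bytes sent. Where the OS supports it, the
    copy happens inside the kernel via sendfile(2).
    """
    with open(path, "rb") as f:
        return sock.sendfile(f)
```

The thread spends nearly all of the transfer blocked in that one call, which is why the per-connection memory cost stays small even for large tarballs and ISOs.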
That's where we need some kind of link between CVS and a peer-to-peer system.
Please name the last president that hasn't placed "cronies" in high ranking positions. I doubt you will find one after 1970.
1970???? Try 1829 (Andrew Jackson).
And don't forget that Kennedy put his brother in as Attorney General.
"I don't know, therefore Aliens" Wafflebox1
the torrent would sure be a releaf.
... resist ... urge ... to ... ask ... why ... bit ... torrent ... would ... put ... leaves ... back ... on trees.
Must
"I don't know, therefore Aliens" Wafflebox1
At least on Linux, the load average includes
processes waiting in uninterruptible sleep.
That would be disk I/O, mostly.
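A rough way to see this on a Linux box is to compare /proc/loadavg with the number of processes currently in state D (uninterruptible sleep). The sketch below assumes the usual /proc text formats; the function names are made up for illustration, and it counts processes only, not individual threads.

```python
import os


def parse_loadavg(text):
    """Parse the contents of /proc/loadavg into a dict."""
    fields = text.split()
    one, five, fifteen = (float(x) for x in fields[:3])
    runnable, total = (int(x) for x in fields[3].split("/"))
    return {"1min": one, "5min": five, "15min": fifteen,
            "runnable": runnable, "total": total}


def task_state(stat_line):
    """Return the state letter from a /proc/<pid>/stat line.

    The command name sits in parentheses and may itself contain spaces
    or parentheses, so the state is located after the last ')'.
    """
    return stat_line[stat_line.rindex(")") + 2]


def count_tasks_in_state(state, proc="/proc"):
    """Count processes whose current state letter equals state."""
    count = 0
    for name in os.listdir(proc):
        if not name.isdigit():
            continue
        try:
            with open(os.path.join(proc, name, "stat")) as f:
                if task_state(f.read()) == state:
                    count += 1
        except (OSError, ValueError, IndexError):
            continue   # process exited or line was unreadable
    return count
```

On a busy file server, `count_tasks_in_state("D")` would typically account for much of the gap between the load average and the number of runnable tasks.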
Without having ever looked at the apache code I'd like to ask anyways:
Doesn't apache do some additional stuff per request that others don't?
Things like check whether authentication should be requested and the like?
I might be mistaken but if not - don't these things amount to a significant overhead, esp. when talking.. uh.. "lots" of requests per second?
Can't you plug in a Keychain?
I know where my wife works they won't let her use USB ports, which is a real bind when it comes to taking files to work sometimes.
Do not meddle in the affairs of geeks for they are subtle and quick to anger
Err... I think they've got the DNS passworded or something. I'm not that attached to any of the data I use at work, seeing as I'm a temp / contractor / whatever, so I don't give a damn what kind of spyware is on my machine... I like Firefox quite a bit more, but I get an NT security style login prompt every time I try to load it up. I guess I could track down whoever has this password and use the other browser, but I spend so much time messing with red tape I can't imagine spending any more for software I want to use on my own.
My little site.
What's different about rsync is that it does not ordinarily use a single file checksum (and therefore copy whole files if changed). Instead, to save bandwidth, it uses a more sophisticated system to ensure that only changed parts of a file are transmitted - and it detects changed parts by comparing (many) checksums, I believe. The report sums it up like this:
(Disclaimer, I have only skimmed the rsync report and that was some time ago, but I am a longtime and happy rsync user.)
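To make the "many checksums" idea concrete, here is a simplified sketch of a weak rolling checksum in the style the rsync report describes, plus a block matcher built on it. It is an illustration under assumptions, not rsync's actual code: SHA-256 stands in for the strong hash, the function names are made up, and real rsync jumps ahead by a whole block after a match instead of sliding byte by byte.

```python
import hashlib

M = 1 << 16   # both checksum halves are kept modulo 2^16


def weak_checksum(block):
    """Return (a, b): a is the byte sum, b a position-weighted sum."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a, b, out_byte, in_byte, n):
    """Slide an n-byte window one byte to the right in O(1)."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b


def matching_offsets(old, new, n):
    """Find (offset in new, offset in old) pairs of identical n-byte blocks."""
    index = {}
    for k in range(0, len(old) - n + 1, n):
        blk = old[k:k + n]
        strong = hashlib.sha256(blk).digest()
        index.setdefault(weak_checksum(blk), []).append((strong, k))
    matches = []
    if len(new) < n:
        return matches
    a, b = weak_checksum(new[:n])
    pos = 0
    while True:
        # Only compute the expensive strong hash when the weak one hits.
        for strong, k in index.get((a, b), ()):
            if hashlib.sha256(new[pos:pos + n]).digest() == strong:
                matches.append((pos, k))
                break
        if pos + n >= len(new):
            break
        a, b = roll(a, b, new[pos], new[pos + n], n)
        pos += 1
    return matches
```

Only the regions of `new` not covered by a match would need to be sent over the wire, which is where the bandwidth saving comes from. The cost is that every window position needs a checksum comparison, which fits the observation that checksumming is the server's bottleneck.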
you had me at #!