Domain: sourceforge.net
Stories and comments across the archive that link to sourceforge.net.
Stories · 1,414
-
Hannah Montana Linux
boisepunk writes "OS News has reported on what some would consider an abomination. Hannah Montana Linux is reportedly a Linux Distro made specifically for fans of Hannah Montana. It is speculated that Disney may shut this down." -
The "Doctor Who" Model of Open Source
Glyn Moody writes "Open source projects are generally fine when there's a long-term leader like Linus; but what happens when nobody is able or willing to run things for extended periods? Peter Murray-Rust explains how the open chemistry group known as the Blue Obelisk has evolved what he calls the 'Doctor Who Model of Open Source': 'You'll recall that every few years something fatal happens to the Doctor and you think he is going to die and there will never be another series. Then he regenerates. The new Doctor has a different personality, a different philosophy (though always on the side of good). It is never clear how long any Doctor will remain unregenerated or who will come after him. And this is a common theme in the Blue Obelisk.' Could other open source projects learn from this experience as long-term leaders start to move on?" -
Better Tools For Disabled Geeks?
layabout writes "We've seen tremendous advances in user interfaces over the past few years. Unfortunately, those UIs and supporting infrastructure exclude the disabled. In the same timeframe there has been virtually no advance in accessibility capabilities. It's the same old sticky keys, unicorn stick, speech recognition, text-to-speech that kind-of, sort-of, works except when you need to work with real applications. Depending on whose numbers you use, anywhere from 60,000 to 100,000 keyboard users are injured every year, some temporarily, some permanently. In time, almost 100% of keyboard users will have trouble typing and using many if not all mobile computing devices. My question to Slashdot: Given that some form of disability is almost inevitable, what's keeping you from volunteering and working with geeks who are already disabled? By spending time now building the interfaces and tools that will enable them to use computers more easily, you will also be ensuring your own ability to use them in the future." Follow the link for more background on this reader's query.
This question is aimed mostly at the kind of disability we are susceptible to and I have been living with for the past 15 years. Even though we have speech recognition, it doesn't solve any problem except writing text. There have been a couple of attempts at making speech recognition more useful to programmers [0], but they have failed. The needs are clear:
[1] A working full-vocabulary, continuous recognition system on Linux.
[2] Tools that don't expect you to "speak the keyboard."
[3] Tools that let you edit as well as create code.
So why don't more geeks work on securing their own future, or at the very least, work to help their fellow geeks to stay on the economic ladder?
[0] VoiceCode and VR-Mode: VoiceCode is an amazing piece of work. It makes it possible for a disabled programmer to generate Python code very quickly. Unfortunately, it does not solve the editing problem. Even more unfortunately, it's hand-wearingly complicated to set up and get working. VR-Mode makes it possible to use Naturally Speaking's "Select and Say" mode in Emacs (that is, if you can get it to work). It seems to have drifted into non-functionality as Emacs has moved forward.
[1] Naturally Speaking works well, is reasonably cheap, and works somewhat under Wine today. If we can make it work reliably under Wine, it solves the problem in months rather than decades. Other tools such as Sphinx 1-4 are great IVR systems if you have a vocabulary and grammar under 15,000 words. In contrast, Naturally Speaking's working vocabulary is in the 100,000-word range. Any disabled user will choose Naturally Speaking because it works so much better than the nearest alternative. We have people who are injured now and need these tools. They can't afford to wait 10 years or more for an OSS solution.
[2] "Speaking the keyboard" refers to speech user interfaces developed by people who don't use speech recognition. They expect you to say too much, which creates a vocal form of RSI — see [3]. Listen to what disabled users do, not to what you think they should speak.
[3] See VoiceCode in [0]. Unfortunately, today's tools are only for writing code, not correcting code. Code correction is a very different process and must be spoken in a different way: "change index" instead of "search forward left bracket leave mark search forward right bracket copy region." This is also an example of "speaking the keyboard." -
PLplot Notes Its 10,000th Commit
iliketrash writes "From the PLplot development team is the announcement of their 10,000th commit: 'PLplot is a cross-platform software package for creating scientific plots that has been in continuous development since its inception 17 years ago. On May 23, 2009 the PLplot developers quietly celebrated our ten thousandth commit since our initial software repository was populated back in May 1992. This longevity puts PLplot in some select company amongst open-source software projects. We may even be unique within this group because all PLplot development has been done by volunteers in their spare time. The enthusiasm for PLplot development continues; we have averaged more than 100 commits per month over the last year which is double our 17-year average, and we are looking forward to the celebration of our next ten thousand commits!'" -
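The commit-rate claim in the PLplot announcement can be checked with a little arithmetic. The sketch below uses only the figures quoted in the announcement (10,000 commits between May 1992 and May 2009) and assumes the period is exactly 17 years; the variable names are illustrative.

```python
# Figures taken from the announcement: 10,000 commits between May 1992 and May 2009.
total_commits = 10_000
months = (2009 - 1992) * 12  # 17 years, i.e. 204 months

average_per_month = total_commits / months
print(f"17-year average: {average_per_month:.1f} commits/month")
print(f"Double that:     {2 * average_per_month:.1f} commits/month")
```

The long-run average works out to roughly 49 commits per month, so a recent rate of "more than 100" is indeed about double it.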
Beginning Python Visualization
aceydacey writes "Sometimes a picture is worth a thousand words. Beginning Python Visualization: Crafting Visual Transformation Scripts, published in February 2009 by Apress, shows how Python and its related tools can be used to turn raw data into visual representations that communicate effectively. The author is Shai Vaingast, a professional engineer and engineering manager who needed to train scientists and engineers to do this kind of programming work. He was looking for a tutorial and reference work and, unable to find a suitable text, wound up writing his first book. He writes in the easy and clear style of someone comfortable and engaged with the subject matter." Keep reading for the rest of aceydacey's review.
Title: Beginning Python Visualization: Crafting Visual Transformation Scripts
Author: Shai Vaingast
Pages: 363
Publisher: Apress
Rating: 9/10
Reviewer: aceydacey
ISBN: 1430218436
Summary: learn how to process, organize, and visualize data from various sources using the Python language
The book uses several very specific examples that illustrate general principles.
The first example uses GPS data. With Python one can extract data from GPS receivers, load it into the computer, and manipulate it as needed, including creating graphs and charts. In this section he shows how to use CSV (comma-separated values) as a most useful file format. He shows how to extract data from real-world GPS devices and import it via serial ports and the PySerial module. It would be easy for the reader to duplicate and extend this project.
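To give a feel for the kind of script described above, here is a minimal sketch using only the standard library. It is not taken from the book. It assumes the receiver emits NMEA 0183 GGA sentences (the review does not name the format), and the sample sentence and the helper names `to_degrees` and `parse_gga` are hypothetical.

```python
import csv
import io

# Hypothetical sample data: one NMEA 0183 GGA sentence, which is comma separated.
# The review does not say which GPS format the book actually uses.
SAMPLE = "$GPGGA,123519,4807.038,N,01131.000,E,1,08,0.9,545.4,M,46.9,M,,*47\n"

def to_degrees(value, hemisphere):
    """Convert NMEA ddmm.mmmm (or dddmm.mmmm) to signed decimal degrees."""
    dot = value.index(".")
    degrees = int(value[:dot - 2])      # everything before the two minute digits
    minutes = float(value[dot - 2:])    # mm.mmmm
    decimal = degrees + minutes / 60.0
    return -decimal if hemisphere in ("S", "W") else decimal

def parse_gga(stream):
    """Yield (latitude, longitude, altitude_m) for each GGA row in the stream."""
    for row in csv.reader(stream):
        if row and row[0] == "$GPGGA":
            yield (to_degrees(row[2], row[3]),
                   to_degrees(row[4], row[5]),
                   float(row[9]))

fixes = list(parse_gga(io.StringIO(SAMPLE)))
print(fixes)
```

Here the text comes from an in-memory string; with a real receiver the same parser could be fed lines read from a serial port, which is where a module such as PySerial would come in.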
The heart of the book is coverage of useful examples utilizing MatPlotLib, NumPy and SciPy. These related tools are easy to use and fully integrated with Python. MatPlotLib is for plotting data and graphs, including interactive graphs and image files. NumPy is a powerful math library comparable to commercial tools like MatLab, and SciPy extends NumPy for the sciences. Examples are numerous and include signal analysis using Fourier transforms.
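As a rough illustration of signal analysis with a Fourier transform, the sketch below finds the dominant frequency of a test signal. It is not from the book: it uses a naive standard-library transform so that it runs without extra packages (NumPy's `numpy.fft.fft` computes the same transform far faster), and the 5 Hz signal and 64 Hz sample rate are made-up values.

```python
import cmath
import math

def dft(samples):
    """Naive O(n^2) discrete Fourier transform of a sequence of numbers."""
    n = len(samples)
    return [sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

# Hypothetical test signal: a 5 Hz sine wave sampled at 64 Hz for one second.
sample_rate = 64
signal = [math.sin(2 * math.pi * 5 * t / sample_rate) for t in range(sample_rate)]

spectrum = dft(signal)
# For a real-valued signal only the first half of the bins is independent.
magnitudes = [abs(c) for c in spectrum[: sample_rate // 2]]
peak_bin = max(range(len(magnitudes)), key=magnitudes.__getitem__)
peak_hz = peak_bin * sample_rate / len(signal)
print(peak_hz)
```

The strongest bin corresponds to the frequency of the sine wave that was put in, which is the basic idea behind picking out periodic components in measured data.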
There is also a section on Image Processing using PIL, the Python Imaging Library. This is used for relatively simple image cropping and sizing and also for bit by bit image processing. Interpolation and curve fitting are also well covered. For anyone wanting an introduction to graphical analysis of statistical data, this would be an excellent resource.
The author is obviously a professional in this field. He has a knack for good organizational style and a pragmatic approach to the work. In the book he says "Most of the time, research is organized chaos. The emphasis, however, should be on organized, not chaos." A real value I got from the book is a better understanding of data files, format, and organization as well as methods and guidelines for selecting file formats and storing and organizing data to enable fast and efficient data processing. It is obvious that this book was written by a practicing engineer.
The theme of the book is that Python can be an all purpose environment for data manipulation and visualization, using nothing but free and open source tools that are easily integrated and scriptable without using multiple programming languages. The book should be an invaluable tool for scientists and engineers but it is also easily accessible to anyone interested in math and data analysis. There is no need for an advanced math background. While, as a matter of full disclosure, I have undergraduate degrees in Math and Physics, I feel the book should be easily accessible to anyone with a solid high school math background who is seriously interested in the subject. The book contains a short introductory tutorial on the basics of Python so anyone familiar with programming in any language should be fine.
The book is an easy read from front to back, and I am sure it will also be a good reference resource for the future. The writing style is very clear and unforced and I found surprisingly few errors. While the Python world has a surplus of introductory and general books, books covering this kind of specific domain are especially welcome, and we could use more on other topics by competent authors.
At 363 pages the book is a surprisingly fast read. Its methodology is to use specific, short code examples to make all the key points. Most of the code samples are well selected, short and written in clear, concise Python. This is not the kind of book that overwhelms you with massive amounts of code. Either the book was well edited or else it was written by an exceptionally lucid thinker, or both.
So, if you want to learn how to process, organize, and visualize data from various sources using the Python language, I recommend this book to you. I have also posted a podcast of an interview with the author at Python411.
You can purchase Beginning Python Visualization: Crafting Visual Transformation Scripts from amazon.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page. -
Java Program Uses Neural Networks To Monitor Games
tr0p writes "Java developers have used the open source Neuroph neural network framework to monitor video game players while they play and then provide helpful situational awareness, such as audio cues when a power-up is ready or on-the-fly macros for combo attacks. The developers have published an article describing many of the technical details of their implementation. 'There are two different types of neural networks used by DotA AutoScript. The first type is a simple binary image classifier. It uses Neuroph's "Multi-Layer Perceptron" class to model a neural network with an input neurons layer, one hidden neurons layer, and an output neurons layer. Exposing an image to the input layer neurons causes the output layer neurons to produce the probability of a match for each of the images it has been trained to identify; one trained image per output neuron.'" -
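The network shape described in the quoted article (an input layer, one hidden layer, and one output neuron per trained image) can be sketched as a plain forward pass. This is not Neuroph's API and not the DotA AutoScript code: the layer sizes, the sigmoid activation, and the random, untrained weights are all assumptions made for illustration, so the scores it prints carry no meaning.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def make_layer(n_inputs, n_neurons, rng):
    """Each neuron is a list of n_inputs weights followed by one bias term."""
    return [[rng.uniform(-1, 1) for _ in range(n_inputs + 1)] for _ in range(n_neurons)]

def forward(layer, inputs):
    """Weighted sum plus bias for every neuron, squashed into the range (0, 1)."""
    return [sigmoid(sum(w * x for w, x in zip(neuron, inputs)) + neuron[-1])
            for neuron in layer]

rng = random.Random(0)
n_pixels, n_hidden, n_images = 16, 8, 3  # hypothetical sizes
hidden = make_layer(n_pixels, n_hidden, rng)
output = make_layer(n_hidden, n_images, rng)

pixels = [rng.random() for _ in range(n_pixels)]  # stand-in for a 4x4 grayscale image
scores = forward(output, forward(hidden, pixels))
best = max(range(n_images), key=scores.__getitem__)
print(scores, best)
```

After training, each output score would be read as the match likelihood for one known image, and the highest-scoring neuron identifies the best match.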
The Long-Term Impact of Jacobsen v. Katzer
snydeq writes "Lawyer Jonathan Moskin has called into question the long-term impact last year's Java Model Railroad Interface court ruling will have on open source adoption among corporate entities. For many, the case in question, Jacobsen v. Katzer, has represented a boon for open source, laying down a legal foundation for the protection of open source developers. But as Moskin sees it, the ruling 'enables a set of potentially onerous monetary remedies for failures to comply with even modest license terms, and it subjects a potentially larger community of intellectual property users to liability.' In other words, in Moskin's eyes, Jacobsen v. Katzer could make firms wary of using open source software because they fear that someone in the food chain has violated a copyright, thus exposing them to lawsuit. It should be noted that Moskin's firm has represented Microsoft in anti-trust litigation before the European Union." -
Open Source Shooter Nexuiz 2.5 Released
Michael writes "A new version of Nexuiz, a GPL-licensed, first-person shooter, has been released. There are over 3,000 changes in Nexuiz 2.5, including new maps, new game-modes, enhanced graphics, new audio, and other major changes. Phoronix has posted a preview of this Nexuiz 2.5 release, with screenshots showing the impressive graphics and how it has raised the bar for open-source gaming. Details about the Nexuiz project are available at SourceForge." -
ScummVM 0.13.0 Delivers New Adventure Games
KingofGnG writes "The classics, by definition, never go out of fashion, let alone if they are the graphic adventures of past decades. The preferred tool of true adventurers is ScummVM, software that works as an interpreter between data files of such adventures and modern operating systems. 6 months after the release of version 0.12.0, developers have now delivered a new main release of the virtual machine, which includes novelties both for the interface and supported games." -
How Long Should an Open Source Project Support Users?
Ubuntu Kitten writes "Since October the community-generated database of cards known to work with Ndiswrapper has been down. This is apparently due to an on-going site redesign, but right now the usual URL simply directs to a stock Sourceforge page. Without the database, the software's usability is severely diminished but this raises an interesting question: Is an open source project obliged to provide support for its users? If so, for how long should the support last? Web servers cost money, especially for popular sites. While developers can sometimes find sponsorship, is it possible to get sponsorship simply for infrastructure and user services?" -
Scripting In Commodore BASIC For Windows & Linux
SomeoneGotMyNick writes "Someone more nostalgic than I am, and with a lot of time on their hands, had created a scripting language based on Commodore BASIC for Mac OS X. They recently finished a version that works on Windows and Linux. You can pass the text of a BASIC program as a parameter to the program. I found it odd that it took 1.8 MB of source code to compile to an interpreter that used to fit in 8K of ROM space. If this ever becomes popular, perhaps we'll see Obfuscated CBM BASIC contests." In a similar vein, in the comments someone points out what is essentially an open source AmigaOS Classic. -
An Open Source Legal Breakthrough
jammag writes "Open source advocate Bruce Perens writes in Datamation about a major court victory for open source: 'An appeals court has erased most of the doubt around Open Source licensing, permanently, in a decision that was extremely favorable toward projects like GNU, Creative Commons, Wikipedia, and Linux.' The case, Jacobsen v. Katzer, revolved around free software coded by Bob Jacobsen that Katzer used in a proprietary application and then patented. When Katzer started sending invoices to Jacobsen (for what was essentially Jacobsen's own work), Jacobsen took the case to court and scored a victory that — for the first time — lays down a legal foundation for the protection of open source developers. The case hasn't generated as many headlines as it should." -
Thomson Reuters Sues Over Open-Source Endnote-Alike Zotero
Noksagt writes "Thomson Reuters, the owner of the Endnote reference management software, has filed a $10 million lawsuit and a request for injunction against the Commonwealth of Virginia. Virginia's George Mason University develops Zotero, a free and open source plugin to Mozilla Firefox that researchers may use to manage citations. Thomson alleges that GMU's Center for History and New Media reverse engineered Endnote and that the beta version of Zotero can convert (in violation of the Endnote EULA) the proprietary style files that are used by Endnote to format citations into the open CSL file format." -
Classic Shooters Heretic and Hexen Released Under GPL
phanboy_iv writes "Fans of both of the Raven classics, Heretic and Hexen, have been trying for almost a decade to convince Raven Software to release engine source code for the games under the GPL, much like the DOOM engine on which both of them are based. Well, they finally did it! Source code is available at Sourceforge. Both of these games have had the source available for a while, but under a restrictive license that hindered ports and modifications. Now, thanks to dedicated fans, that's no longer a problem." -
Strong Court Ruling Upholds the Artistic License
dilute writes "The US Court of Appeals for the Federal Circuit (an authoritative court that normally deals with patent law) has issued a strong ruling (PDF) upholding the Artistic License in a copyright dispute between the developers of the Java Model Railroad Interface (JMRI), and Kamind, a company that used portions of DecoderPro to develop a competing product. The product at issue was DecoderPro, an open source project released on SourceForge under the Artistic License, for interfacing with model railroad control chips. Kamind used a number of DecoderPro files in developing its product, Decoder Commander. However, Kamind did not comply with the Artistic License in a number of respects, including attribution, copyright notices, tracked changes or availability of the underlying standard version." Read on for more, below. Dilute continues: "The lower court denied relief, saying that the Artistic License merely imposed 'contractual' promises, and that a violation did not constitute copyright infringement (any contract-based relief would probably have been meaningless). In a strong ruling, the Federal Circuit found that the Artistic License is legally enforceable, that its terms constituted 'conditions' for reliance on the license, and consequently that a violation of those conditions would put the violating product outside the license and thus make the violator a copyright infringer, potentially liable for an injunction. The case lays out a clear and compelling description of the rationale for open source, and reflects a complete willingness by the court to lend the force of law to these licenses." Reader ruphus13 points to Lawrence Lessig's commentary on the ruling; Lessig calls it "huge and important news," and notes that the reasoning is generalizable to the GPL and other Free software licenses, as well. -
Diagramming Tool For SQL Select Statements
alxtoth writes "Snowflake is a new BSD-licensed tool that parses SQL Select statements and generates a diagram. It shows parts of the underlying SQL directly in the diagram. For example: x=30, GROUP BY (year), SUM (sales), HAVING MIN (age) > 18. The primary reason for the tool was to avoid Cartesian joins and loops in SQL written by hand, with many joined tables. The database will execute such a statement, if syntactically correct, resulting in runaway queries that can bring the database down. If you sit close to the DBAs, you can hear them screaming... " -
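One way to see why a missing join condition produces a Cartesian join is to treat tables as nodes and join conditions as edges: if the graph falls into more than one connected piece, the pieces are multiplied together. The sketch below illustrates that idea only. It is not how Snowflake works internally (the summary does not say), and the function name and the example query are hypothetical.

```python
def join_components(tables, join_pairs):
    """Group tables into connected components of the join graph (union-find)."""
    parent = {t: t for t in tables}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving keeps lookups short
            t = parent[t]
        return t

    for a, b in join_pairs:
        parent[find(a)] = find(b)

    groups = {}
    for t in tables:
        groups.setdefault(find(t), []).append(t)
    return list(groups.values())

# Hypothetical query:
#   SELECT ... FROM sales s, customers c, regions r WHERE s.cust_id = c.id
# Nothing ties "regions" to the other two tables.
tables = ["sales", "customers", "regions"]
join_pairs = [("sales", "customers")]

components = join_components(tables, join_pairs)
if len(components) > 1:
    print("Possible Cartesian join between:", components)
```

Adding a condition that links `regions` to either of the other tables brings the graph back to a single component, and the warning disappears.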
EFF Releases Tool For Testing ISP Interference
Placid notes that the EFF has announced Switzerland, a tool for testing if your ISP is interfering with your Net connection (e.g. by resetting BitTorrent transfers). It's command-line only at this point. Of course the tool is FOSS, and you can contribute to it via its SourceForge project. From the announcement: "Developed by the Electronic Frontier Foundation, Switzerland is an open source software tool for testing the integrity of data communications over networks, ISPs, and firewalls. It will spot IP packets which are forged or modified between clients, inform you, and give you copies of the modified packets." -
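The core idea in the announcement, spotting packets that were forged or modified in transit, can be illustrated by comparing what one end sent with what the other end received. The sketch below is a simplification and not Switzerland's actual protocol or code: the use of SHA-256 fingerprints, the function names, and the sample payloads are all assumptions.

```python
import hashlib

def fingerprint(payload: bytes) -> str:
    """Short, comparable summary of a packet payload."""
    return hashlib.sha256(payload).hexdigest()

def compare(sent, received):
    """Return (forged_or_modified, dropped) given the packets seen at each end."""
    sent_hashes = {fingerprint(p) for p in sent}
    received_hashes = {fingerprint(p) for p in received}
    forged = [p for p in received if fingerprint(p) not in sent_hashes]
    dropped = [p for p in sent if fingerprint(p) not in received_hashes]
    return forged, dropped

# Hypothetical traffic: the middle packet was replaced somewhere along the path.
sent = [b"piece 1", b"piece 2", b"piece 3"]
received = [b"piece 1", b"RST", b"piece 3"]

forged, dropped = compare(sent, received)
print("arrived but never sent:", forged)
print("sent but never arrived:", dropped)
```

Exchanging fingerprints rather than full packets keeps the comparison cheap, while the receiver can still hand over copies of any packet whose fingerprint the sender does not recognize.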
Getting Inked for Tux at OSCON
OSCON isn't just a gathering for talks on topics like Creating Location-aware Web 2.0 Applications on an Open Source Geospatial Platform and fightin' words from the stage; it's also an excuse for some interesting social gatherings, like this year's Community Choice awards (organized and sponsored by the corporate overlords at SourceForge, as you might recall, and with Slashdot's own special category), at which, among other festive activities, attendees were offered the chance to get open-source-related tattoos. There are shots of some of these up on the SourceForge Community pages, and — with some overlap — even more in this set at Flickr. (My pasty bicep^h^h^h^h^h shoulder is the one now adorned with a circled head of a happy Tux ala IBM; I was expecting it to hurt more than it actually did.) Anyone with techie tattoos, please disclose below. -
Slashdot Discussion System Updates
This week we have a few new functions for you comment readers guaranteed to amaze and enchant. Or at least to make your day a little more efficient. The biggest update is that the system should remember what comments you've already read (for a few weeks anyway) but there's some other less interesting stuff as well. Hit the link below to read more.
So D2 now remembers what you have read. This will mostly be useful to readers who use the key bindings to navigate -- we didn't really want to guess if you've read something, but if you use the WASD keys to navigate, moving on from a comment flags it as read. Read comments are slightly faded, and if you re-enter a discussion a few hours later, it should remember what you've read.
We've simplified comment retrieval as well. If you get to the 'End' of a discussion and try to get more comments (either by clicking one of the various 'More' links, or by pressing a keybinding like S or D that tells us to move on to the next comment) a dialog box will show up asking you if you would like to lower your threshold. So if you normally read at Score:4, and read to the end of the Score:4 comments, it will offer to lower your threshold to Score:3 either for all time, or just for this page. This means you don't need to constantly raise and lower your threshold to handle discussions of different sizes. This works really nicely.
Lastly is a user preference in the pref pane labeled 'Collapse Comments After Reading.' I'm actually considering making this one on by default but I'm open to feedback. It does what it says -- after you've navigated off a comment (using the keybindings again), it collapses the comment you just left. This makes it very easy to keep your place in a discussion as it grows. This is especially useful in discussions where you want to leave a tab open for several hours, or else come back later and figure out what's new.
There are undoubtedly bugs: feel free to email me or post them to the bug tracker. Thanks to pudge for hacking all this stuff too. Especially the bugs -- he wrote those first.
-
Using AI With GCC to Speed Up Mobile Design
Atlasite writes "The WSJ is reporting on an EU project called Milepost aimed at integrating AI inside GCC. The project partners, which include IBM, the University of Edinburgh, and the French research institute INRIA, announced their preliminary results at the recent GCC Summit: they were able to increase the performance of GCC by 10% in just one month's work. The GCC Summit paper is provided [PDF]." -
Brightnets are Owner Free File Systems
elucido writes "OFF, or the Owner-Free Filesystem is a distributed filesystem in which everything is stored in reference to randomized data blocks, as opposed to a 1:1 copy of the original data being inserted. The creators of the Owner-Free Filesystem have coined a new term to define the network: A brightnet. Nobody shares any copyrighted files, and therefore nobody needs to hide away. OFF provides a platform through which data can be stored (publicly or otherwise) in a discreet, distributed manner. The system allows for personal privacy because data (blocks) being transferred from peer to peer do not bear any relation to the original data. Incidentally, no data passing through the network can be considered copyrighted because the means by which it is represented is truly random." Their main wiki page discusses a bit of what this means and how it might work as well. I've been saying that we need this for many years now, if only because we all have 10 gigs free on our machines and if we could RAID the internet we'd need fewer hard drives. -
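The summary above does not say how a file is "stored in reference to randomized data blocks". One common way to get that property, assumed here purely for illustration, is to XOR each source block with one or more random blocks and keep only the result plus the list of blocks needed to undo it. The block size and function names below are hypothetical.

```python
import os
from typing import Sequence

BLOCK_SIZE = 16  # tiny block size, chosen only to keep the demo readable


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR any number of equal-length blocks together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        if len(block) != len(out):
            raise ValueError("all blocks must have the same length")
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def store(source: bytes, randomizers: Sequence[bytes]) -> bytes:
    """Return the block that would be published instead of the source."""
    return xor_blocks(source, *randomizers)


def retrieve(stored: bytes, randomizers: Sequence[bytes]) -> bytes:
    """Recover the source block; XOR with the same blocks undoes itself."""
    return xor_blocks(stored, *randomizers)


source = b"sixteen byte msg"  # exactly BLOCK_SIZE bytes
randomizers = [os.urandom(BLOCK_SIZE), os.urandom(BLOCK_SIZE)]
stored = store(source, randomizers)
```

In this sketch no single block on the network is a copy of the original; a reader needs the stored block and every randomizer block to rebuild it.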
Sourceforge.net Blocked In Mainland China
gzipped_tar contributed a link to Moonlight Blog, which says that "SourceForge, the world's largest development and download repository of Open Source code and applications, appears to be blocked in Mainland China. The current blocking may be related to the recent anti-China protests of Beijing Olympic Games, which will begin on 8 August. Some days before, a very popular free source code editor in SourceForge named Notepad++ start to boycott Beijing 2008. The project's developer said that the action is not against Chinese people, but against Chinese government's repression against Tibetan unrest earlier in this year. SF.net has once been banned by China in 2002. However, the ban was lifted later in 2003." gzipped_tar adds: "As a SourceForge user in Beijing, I can confirm this first-hand. I also tried traceroute to sourceforge.net, only to find the connection being dropped at a Beijing ISP's gateway router. It appears that the projects' respective homepages are available even if they are hosted by SF, but the summary and download pages are blocked." (As you probably know, Slashdot and Sourceforge share a corporate overlord.) -
Tru64 Unix Advanced File System (AdvFS) Now GPL
melios writes "In a move that could help boost the scalability of Linux for grids and other advanced 64-bit multiprocessor applications, HP has released its Tru64 Unix Advanced File System (AdvFS) source code to the open source community. Source code, design documentation, and test suites for AdvFS are available on SourceForge." -
Wine 1.0 — Uncorked After 15 Years
pshuke writes "After 15 years of development, Wine version 1.0 has been released. Wine is an Open Source implementation of the Windows API on top of X, OpenGL, and Unix. While perfect Windows compatibility has not yet been achieved, full support for Photoshop CS2, Excel Viewer 2003, Word Viewer 2003 and PowerPoint Viewer 2003 was among the goals set for the release. For further information about supported applications, head over to the appdb. Get it (source) while it's hot." -
Community Choice Award "Most Likely to be Shut Down By Govt"
Last week we took nominations for a Slashdot category at the SourceForge Community Choice awards. Our category was 'Most Likely to be Shut Down By Government Agency'. Your nominations were tallied, and we arbitrarily selected a few that we think are the best. Today is the day where you can at long last determine the winner, using the incredibly scientifically accurate Slashdot Poll. Our nominees are Truecrypt, EFF Patent Busting, GNU Software Radio, WikiLeaks, Cryptome.org, Tor, Freenet, and CowboyNeal. -
Games Come to Pidgin
Tovok7 writes "Free software instant messengers have long lacked support for playing games with your friends. The wait is finally over, because today Pidgin Games was released. It comes as a set of plugins for the popular instant messenger Pidgin and runs under Linux and Windows. What makes Pidgin Games special is that it is written in the new programming language Vala, which has a C#-like syntax but compiles to pure C." -
Nominations Open For "Most Likely to be Shut Down By Government"
The corporate overlords at SourceForge asked me to name a Slashdot category for their upcoming Community Choice Awards and to let you guys select the winner. I have named my category "Most Likely to be Shut Down by a Government Agency." We're going to run this like we do an Ask Slashdot call for questions — post your nominations into the comments here. Use moderation to send up good ideas. In the upcoming days we'll post another story where you can vote on the actual winner. Nominations need to include the project name, a link to some sort of official website, and a paragraph of why you think they deserve to win. The project that wins will gain fame, notoriety, and maybe a cease and desist order that they could print out and frame if they had that kind of time. -
Help Slashdot Test Our New Data Center
After many years of living in California, Slashdot is preparing to move to a new data center in Chicago, and we need your help. We have our new site running a dump of our database from a few days ago. You can hit it at beta.slashdot.org. Please go there, post comments, submit stories, and do whatever you do normally. Or maybe abnormally — run crawlers, write poll spamming robots or something. If you find any crazy issues, please submit them to our sourceforge tracker. If you're curious, the new system features 18 2x quad-core 2.3 GHz webservers each with 8 gigs of RAM, and 4 quad-core 2.3 GHz databases with 16 gigs of RAM. -
MiniOn ARM Microcontroller Programming System
profdc9 writes "For the past six months or so I have been working on the MiniOn, a network-enabled microcontroller programming system, similar in idea to the Basic Stamp and Arduino that hobbyists are fond of, but programmable and accessible through a Web browser and TELNET, requiring no installed development software. It uses the cheap, readily available LPC2000 ARM7TDMI microcontrollers, and the easy-to-interface Microchip ENC28J60 for Ethernet. The MiniOn firmware is written using only the free WinARM development tools (Linux tools work also) for those who wish to improve the MiniOn. I have already implemented an MP3 streaming server and a web-based graphical oscilloscope in MiniOnBasic. The MiniOn should hopefully lower the barriers and costs to getting started learning about embedded systems, and provide a non-proprietary method of data acquisition." -
Unexpected Slashdot Downtime
Netcraft confirmed it ... Slashdot was dying for several hours (along with SourceForge, which shares a corporate overlord and router). Some planned downtime from our provider apparently didn't come back up quite as planned. Sorry for the inconvenience. On the upside, we're moving to a new network and hardware soon, so the site should be much faster and more stable rsn. -
Linus Denounces NDISWrapper, Denies It GPL Status
eldavojohn writes "On message boards, Linus Torvalds was explaining why NDISWrapper is not eligible to be released under the GPL even though the project claims to be. Linus remarked, "Ndiswrapper itself is *not* compatible with the GPL. Trying to claim that ndiswrapper somehow itself is GPL'd even though it then loads modules that aren't is stupid and pointless. Clearly it just re-exports those GPLONLY functions to code that is *not* GPL'd." This all sprang up when someone restricted NDISWrapper's access to GPL-only symbols, thereby breaking the utility. Linus merely replied that "If it loads non-GPL modules, it shouldn't be able to use GPLONLY symbols." As you may know, NDISWrapper implements the Windows kernel API and then loads Windows binaries for a number of devices and runs them natively to avoid the cost and complication of emulation." -
Preload Drastically Boosts Linux Performance
Nemilar writes "Preload is a Linux daemon that stores commonly-used libraries and binaries in memory to speed up access times, similar to the Windows Vista SuperFetch function. This article examines Preload and gives some insight into how much performance is gained for its total resource cost, and discusses basic installation and configuration to get you started." -
AMD Open Sources the AMD Performance Library
bluephone writes "Today AMD announced that they're now opening the source to the AMD Performance Library (APL) under the Apache license. The newly opened code is now hosted at SourceForge (the corporate overlord of Slashdot) under its new name, Framewave. Phoronix says, "The AMD Performance Library / Framewave covers a multitude of operations from simple math operations to media processing and optimizations for multi-core environments." No word as to if it does your laundry. The SourceForge page says that while Framewave is 'sponsored' by AMD, it is "very much an open-source venture. While AMD will continue to participate in and contribute to the project, third-party developers are welcome and encouraged to implement all or part of the code base and/or to create derivative works." Being Apache licensed, it's quite open, so this doesn't seem to be mere lip service." -
Author of ATSC Capture and Edit Tool Tries to Revoke GPL
The author of the ATSC capture and edit tool has announced that he is attempting to revoke the licensing of his product under the GNU General Public License (GPL). Unfortunately, it appears that the GPL does not allow this particular action. Of course, in this heyday of lawyers and trigger-happy litigators, who can tell? What successes have others had in trying to take something they once released under the GPL and make it private? And the more pressing question: why? -
Open Source Speech Recognition
bedahr writes "The first version of the open source speech recognition suite simon was released. It uses the Julius large vocabulary continuous speech recognition engine to do the actual recognition and the HTK toolkit to maintain the language model. These components are united under an easy-to-use graphical user interface. Simon can import dictionaries directly from wiktionary (a subproject of wikipedia) or from files formatted in the HADIFIX or HTK format, and grammar structures directly from personal texts. It also provides means to train the language model with new samples and add new words." -
Tools For Understanding Code?
ewhac writes "Having just recently taken a new job, I find myself confronted with an enormous pile of existing, unfamiliar code written for a (somewhat) unfamiliar platform — and an implicit expectation that I'll grok it all Real Soon Now. Simply firing up an editor and reading through it has proven unequal to the task. I'm familiar with cscope, but it doesn't really seem to analyze program structure; it's just a very fancy 'grep' package with a rudimentary understanding of C syntax. A new-ish tool called ncc looks promising, as it appears to be based on an actual C/C++ parser, but the UI is clunky, and there doesn't appear to be any facility for integrating/communicating with an editor. What sorts of tools do you use for effectively analyzing and understanding a large code base?" -
Comcast Promising Ultra-Fast Internet
Espectr0 writes "Comcast's CEO Brian Roberts gave The Associated Press a preview of his speech for the Consumer Electronics show, and said that Comcast expects to demonstrate a technology that delivers up to 160 megabits of data per second over cable. At that speed you could download a high-definition copy of 'Batman Begins' in four minutes. The technology, DOCSIS 3.0, will start rolling out this year." Here's a note about Cisco's announcement of their DOCSIS 3.0 cable modem. -
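The "four minutes" figure above is easy to check. The sketch below works out how much data a 160 megabit-per-second link moves in that time, assuming decimal units (1 megabit = 1,000,000 bits, 1 GB = 1,000,000,000 bytes) and no protocol overhead; the function name is mine.

```python
def transfer_size_gb(rate_mbps: float, seconds: float) -> float:
    """Return the data moved, in decimal gigabytes, at a sustained rate."""
    bits = rate_mbps * 1_000_000 * seconds
    return bits / 8 / 1_000_000_000


# 160 Mbit/s sustained for four minutes
size_gb = transfer_size_gb(160, 4 * 60)
```

That comes to 4.8 GB, so the claim implies a high-definition film file of roughly that size.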
Slashdot's Setup, Part 2- Software
Today we have Part 2 in our exciting 2 part series about the infrastructure that powers Slashdot. Last week Uriah told us all about the hardware powering the system. This week, Jamie McCarthy picks up the story and tells us about the software... from pound to memcached to mysql and more. Hit that link and read on.
The software side of Slashdot takes over at the point where our load balancers -- described in Friday's hardware story -- hand off your incoming HTTP request to our pound servers.
Pound is a reverse proxy, which means it doesn't service the request itself, it just chooses which web server to hand it off to. We run 6 pounds, one for HTTPS traffic and the other 5 for regular HTTP. (Didn't know we support HTTPS, did ya? It's one of the perks for subscribers: you get to read Slashdot on the same webhead that admins use, which is always going to be responsive even during a crush of traffic -- because if it isn't, Rob's going to breathe down our necks!)
The pounds send traffic to one of the 16 apaches on our 16 webheads -- 15 regular, and the 1 HTTPS. Now, pound itself is so undemanding that we run it side-by-side with the apaches. The HTTPS pound handles SSL itself, handing off a plaintext HTTP request to its machine's apache, so the apache it redirects traffic to doesn't need mod_ssl compiled in. One less headache! Of our other 15 webheads, 5 also run a pound, not to distribute load but just for redundancy.
(Trivia: pound normally adds an X-Forwarded-For header, which Slash::Apache substitutes for the (internal) IP of pound itself. But sometimes if you use a proxy on the internet to do something bad, it will send us an X-Forwarded-For header too, which we use to try to track abuse. So we patched pound to insert a special X-Forward-Pound header, so it doesn't overwrite what may come from an abuser's proxy.)
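The header trick in the trivia note can be sketched like this. It is not the pound patch itself: the function names are hypothetical, and I am assuming the X-Forward-Pound header carries the address of whatever machine connected to pound. Header names are treated as case-sensitive here for brevity, which real HTTP handling would not do.

```python
from typing import Dict, List, Optional, Tuple


def add_proxy_header(headers: Dict[str, str], peer_ip: str) -> Dict[str, str]:
    """Record the connecting address without touching X-Forwarded-For.

    Any X-Forward-Pound value sent by the client is overwritten, so the
    backend can trust that header even though it cannot trust the rest.
    """
    out = dict(headers)
    out["X-Forward-Pound"] = peer_ip
    return out


def addresses_for_abuse_tracking(
    headers: Dict[str, str],
) -> Tuple[Optional[str], List[str]]:
    """Return (address seen by the proxy, addresses claimed upstream)."""
    trusted = headers.get("X-Forward-Pound")
    claimed = [
        part.strip()
        for part in headers.get("X-Forwarded-For", "").split(",")
        if part.strip()
    ]
    return trusted, claimed
```

With this split, the backend keeps both pieces of evidence: the address the proxy actually saw, and whatever chain an outside proxy reported.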
The other 15 webheads are segregated by type. This segregation is mostly what pound is for. We have 2 webheads for static (.shtml) requests, 4 for the dynamic homepage, 6 for dynamic comment-delivery pages (comments, article, pollBooth.pl), and 3 for all other dynamic scripts (ajax, tags, bookmarks, firehose). We segregate partly so that if there's a performance problem or a DDoS on a specific page, the rest of the site will remain functional. We're constantly changing the code and this sets up "performance firewalls" for when us silly coders decide to write infinite loops.
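As a rough illustration of that segregation, here is a sketch of routing by request type. The pool sizes come from the paragraph above; the host names, the script names other than pollBooth.pl, and the round-robin choice within a pool are assumptions, and pound itself is configured with URL patterns rather than Python.

```python
import itertools
from typing import Dict, Iterator, List

# Pool sizes are from the text; host names are made up for illustration.
POOLS: Dict[str, List[str]] = {
    "static": [f"static{i}" for i in range(1, 3)],      # 2 webheads
    "index": [f"index{i}" for i in range(1, 5)],        # 4 webheads
    "comments": [f"comments{i}" for i in range(1, 7)],  # 6 webheads
    "other": [f"other{i}" for i in range(1, 4)],        # 3 webheads
}
_next_host: Dict[str, Iterator[str]] = {
    name: itertools.cycle(hosts) for name, hosts in POOLS.items()
}

# Assumed script names for the comment-delivery pages.
COMMENT_SCRIPTS = {"comments.pl", "article.pl", "pollBooth.pl"}


def classify(url: str) -> str:
    """Map a request URL to the name of the pool that should serve it."""
    path = url.split("?", 1)[0]
    if path.endswith(".shtml"):
        return "static"
    script = path.rsplit("/", 1)[-1]
    if script in ("", "index.pl"):
        return "index"
    if script in COMMENT_SCRIPTS:
        return "comments"
    return "other"


def route(url: str) -> str:
    """Pick the next host, round-robin, from the pool for this URL."""
    return next(_next_host[classify(url)])
```

Because each request type has its own pool, a runaway script in one pool can exhaust only its own webheads, which is the "performance firewall" idea.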
But we also segregate for efficiency reasons like httpd-level caching, and MaxClients tuning. Our webhead bottleneck is CPU, not RAM. We run MaxClients that might seem absurdly low (5-15 for dynamic webheads, 25 for static) but our philosophy is if we're not turning over requests quickly anyway, something's wrong, and stacking up more requests won't help the CPU chew through them any faster.
All the webheads run the same software, which they mount from a /usr/local exported by a read-only NFS machine. Everyone I've ever met outside of this company gives an involuntary shudder when NFS is mentioned, and yet we haven't had any problems since shortly after it was set up (2002-ish). I attribute this to a combination of our brilliant sysadmins and the fact that we only export read-only. The backend task that writes to /usr/local (to update index.shtml every minute, for example) runs on the NFS server itself.
The apaches are version 1.3, because there's never been a reason for us to switch to 2.0. We compile in mod_perl, and lingerd to free up RAM during delivery, but the only other nonstandard module we use is mod_auth_useragent to keep unfriendly bots away. Slash does make extensive use of each phase of the request loop (largely so we can send our 403's to out-of-control bots using a minimum of resources, and so your page is fully on its way while we write to the logging DB).
Slash, of course, is the open-source perl code that runs Slashdot. If you're thinking of playing around with it, grab a recent copy from CVS: it's been years since we got around to a tarball release. The various scripts that handle web requests access the database through Slash's SQL API, implemented on top of DBD::mysql (now maintained, incidentally, by one of the original Slash 1.0 coders) and of course DBI.pm. The most interesting parts of this layer might be:
(a) We don't use Apache::DBI. We use connect_cached, but actually our main connection cache is the global objects that hold the connections. Some small chunks of data are so frequently used that we keep them around in those objects.
(b) We almost never use statement handles. We have eleven ways of doing a SELECT and the differences are mostly how we massage the results into the perl data structure they return.
(c) We don't use placeholders. Originally because DBD::mysql didn't take advantage of them, and now because we think any speed increase in a reasonably-optimized web app should be a trivial payoff for non-self-documenting argument order. Discuss!
(d) We built in replication support. A database object requested as a reader picks a random slave to read from for the duration of your HTTP request (or the backend task). We can weight them manually, and we have a task that reweights them automatically. (If we do something stupid and wedge a slave's replication thread, every Slash process, across 17 machines, starts throttling back its connections to that machine within 10 seconds. This was originally written to handle slave DBs getting bogged down by load, but with our new faster DBs, that just never happens, so if a slave falls behind, one of us probably typed something dumb at the mysql> prompt.)
(e) We bolted on memcached support. Why bolted-on? Because back when we first tried memcached, we got a huge performance boost by caching our three big data types (users, stories, comment text) and we're pretty sure additional caching would provide minimal benefit at this point. Memcached's main use is to get and set data objects, and Slash doesn't really bottleneck that way.
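The reader selection in (d) can be sketched as a weighted random pick plus a reweighting step. This is not Slash's code: the function names, the linear falloff, and the 30-second lag ceiling are all assumptions made for the example.

```python
import random
from typing import Dict, Optional


def pick_reader(weights: Dict[str, float],
                rng: Optional[random.Random] = None) -> str:
    """Choose one replica at random, in proportion to its weight."""
    live = {host: w for host, w in weights.items() if w > 0}
    if not live:
        raise RuntimeError("no readable replica available")
    hosts = list(live)
    chooser = rng if rng is not None else random
    return chooser.choices(hosts, weights=[live[h] for h in hosts], k=1)[0]


def reweight(weights: Dict[str, float],
             lag_seconds: Dict[str, float],
             max_lag: float = 30.0) -> Dict[str, float]:
    """Scale each weight down as replication lag grows.

    A replica at or beyond max_lag (an assumed ceiling) gets weight 0
    and is skipped by pick_reader until it catches up.
    """
    return {
        host: w * max(0.0, 1.0 - lag_seconds.get(host, 0.0) / max_lag)
        for host, w in weights.items()
    }
```

Calling pick_reader once at the start of a request and reusing the result matches the "for the duration of your HTTP request" behavior, so a single page never mixes data from replicas at different points in the replication stream.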
Slash 1.0 was written way back in early 2000 with decent support for get and set methods to abstract objects out of a database (getDescriptions, subclassed _wheresql) -- but over the years we've only used them a few times. Most data types that are candidates to be objectified either are processed in large numbers (like tags and comments), in ways that would be difficult to do efficiently by subclassing, or have complicated table structures and pre- and post-processing (like users) that would make any generic objectification code pretty complicated. So most data access is done through get and set methods written custom for each data type, or, just as often, through methods that perform one specific update or select.
Overall, we're pretty happy with the database side of things. Most tables are fairly well normalized, not fully but mostly, and we've found this improves performance in most cases. Even on a fairly large site like Slashdot, with modern hardware and a little thinking ahead, we're able to push code and schema changes live quickly. Thanks to running multiple-master replication, we can keep the site fully live even during blocking queries like ALTER TABLE. After changes go live, we can find performance problem spots and optimize (which usually means caching, caching, caching, and occasionally multi-pass log processing for things like detecting abuse and picking users out of a hat who get mod points).
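To make "caching, caching, caching" concrete, here is a minimal cache-aside sketch for one data type, written the way a custom get and set pair might look. The DictCache class is an in-process stand-in for a memcached client, and every name here is hypothetical rather than taken from Slash.

```python
from typing import Any, Callable, Dict


class DictCache:
    """In-process stand-in for a memcached client (get/set/delete only)."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def get(self, key: str) -> Any:
        return self._data.get(key)

    def set(self, key: str, value: Any) -> None:
        self._data[key] = value

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


def get_user(uid: int, cache: DictCache,
             load_from_db: Callable[[int], Dict[str, Any]]) -> Dict[str, Any]:
    """Read through the cache; only a miss reaches the database."""
    key = f"user:{uid}"
    user = cache.get(key)
    if user is None:
        user = load_from_db(uid)
        cache.set(key, user)
    return user


def set_user(uid: int, fields: Dict[str, Any], cache: DictCache,
             save_to_db: Callable[[int, Dict[str, Any]], None]) -> None:
    """Write to the database, then drop the stale cached copy."""
    save_to_db(uid, fields)
    cache.delete(f"user:{uid}")
```

Deleting the key on write, rather than updating it in place, keeps the setter simple: the next read repopulates the cache from the database.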
In fact, I'll go further than "pretty happy." Writing a database-backed web site has changed dramatically over the past seven years. The database used to be the bottleneck: centralized, hard to expand, slow. Now even a cheap DB server can run a pretty big site if you code defensively, and thanks to Moore's Law, memcached, and improvements in open-source database software, that part of the scaling issue isn't really a problem until you're practically the size of eBay. It's an exciting time to be coding web applications.
-
Slashdot's Setup, Part 2- Software
Today we have Part 2 in our exciting 2 part series about the infrastructure that powers Slashdot. Last week Uriah told us all about the hardware powering the system. This week, Jamie McCarthy picks up the story and tells us about the software... from pound to memcached to mysql and more. Hit that link and read on.The software side of Slashdot takes over at the point where our load balancers -- described in Friday's hardware story -- hand off your incoming HTTP request to our pound servers.
Pound is a reverse proxy, which means it doesn't service the request itself, it just chooses which web server to hand it off to. We run 6 pounds, one for HTTPS traffic and the other 5 for regular HTTP. (Didn't know we support HTTPS, did ya? It's one of the perks for subscribers: you get to read Slashdot on the same webhead that admins use, which is always going to be responsive even during a crush of traffic -- because if it isn't, Rob's going to breathe down our necks!)
The pounds send traffic to one of the 16 apaches on our 16 webheads -- 15 regular, and the 1 HTTPS. Now, pound itself is so undemanding that we run it side-by-side with the apaches. The HTTPS pound handles SSL itself, handing off a plaintext HTTP request to its machine's apache, so the apache it redirects traffic to doesn't need mod_ssl compiled in. One less headache! Of our other 15 webheads, 5 also run a pound, not to distribute load but just for redundancy.
(Trivia: pound normally adds an X-Forwarded-For header, which Slash::Apache substitutes for the (internal) IP of pound itself. But sometimes if you use a proxy on the internet to do something bad, it will send us an X-Forwarded-For header too, which we use to try to track abuse. So we patched pound to insert a special X-Forward-Pound header, so it doesn't overwrite what may come from an abuser's proxy.)
The other 15 webheads are segregated by type. This segregation is mostly what pound is for. We have 2 webheads for static (.shtml) requests, 4 for the dynamic homepage, 6 for dynamic comment-delivery pages (comments, article, pollBooth.pl), and 3 for all other dynamic scripts (ajax, tags, bookmarks, firehose). We segregate partly so that if there's a performance problem or a DDoS on a specific page, the rest of the site will remain functional. We're constantly changing the code and this sets up "performance firewalls" for when us silly coders decide to write infinite loops.
But we also segregate for efficiency reasons like httpd-level caching, and MaxClients tuning. Our webhead bottleneck is CPU, not RAM. We run MaxClients that might seem absurdly low (5-15 for dynamic webheads, 25 for static) but our philosophy is if we're not turning over requests quickly anyway, something's wrong, and stacking up more requests won't help the CPU chew through them any faster.
All the webheads run the same software, which they mount from a /usr/local exported by a read-only NFS machine. Everyone I've ever met outside of this company gives an involuntary shudder when NFS is mentioned, and yet we haven't had any problems since shortly after it was set up (2002-ish). I attribute this to a combination of our brilliant sysadmins and the fact that we only export read-only. The backend task that writes to /usr/local (to update index.shtml every minute, for example) runs on the NFS server itself.
The apaches are version 1.3, because there's never been a reason for us to switch to 2.0. We compile in mod_perl, plus lingerd to free up RAM during delivery, but the only other nonstandard module we use is mod_auth_useragent to keep unfriendly bots away. Slash does make extensive use of each phase of the request loop (largely so we can send our 403s to out-of-control bots using a minimum of resources, and so your page is fully on its way while we write to the logging DB).
Slash, of course, is the open-source perl code that runs Slashdot. If you're thinking of playing around with it, grab a recent copy from CVS: it's been years since we got around to a tarball release. The various scripts that handle web requests access the database through Slash's SQL API, implemented on top of DBD::mysql (now maintained, incidentally, by one of the original Slash 1.0 coders) and of course DBI.pm. The most interesting parts of this layer might be:
(a) We don't use Apache::DBI. We use connect_cached, but actually our main connection cache is the global objects that hold the connections. Some small chunks of data are so frequently used that we keep them around in those objects.
(b) We almost never use statement handles. We have eleven ways of doing a SELECT and the differences are mostly how we massage the results into the perl data structure they return.
(c) We don't use placeholders. Originally that was because DBD::mysql didn't take advantage of them; now it's because we think any speed increase in a reasonably-optimized web app would be a trivial payoff for non-self-documenting argument order. Discuss!
(d) We built in replication support. A database object requested as a reader picks a random slave to read from for the duration of your HTTP request (or the backend task). We can weight them manually, and we have a task that reweights them automatically. (If we do something stupid and wedge a slave's replication thread, every Slash process, across 17 machines, starts throttling back its connections to that machine within 10 seconds. This was originally written to handle slave DBs getting bogged down by load, but with our new faster DBs, that just never happens, so if a slave falls behind, one of us probably typed something dumb at the mysql> prompt.)
(e) We bolted on memcached support. Why bolted-on? Because back when we first tried memcached, we got a huge performance boost by caching our three big data types (users, stories, comment text) and we're pretty sure additional caching would provide minimal benefit at this point. Memcached's main use is to get and set data objects, and Slash doesn't really bottleneck that way.
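Point (d) can be sketched as a weighted random choice plus a periodic reweighting step. This is Python rather than Slash's Perl, and everything named here (ReaderPool, the lag-based formula, the 30-second cutoff) is an assumption for illustration; the article says only that weights exist, can be set by hand, and are adjusted automatically.

```python
import random


class ReaderPool:
    """Pick a read-only database by weight; hypothetical illustration."""

    def __init__(self, base_weights, rng=None):
        self.base = dict(base_weights)       # weights set by hand
        self.effective = dict(base_weights)  # weights after health checks
        self.rng = rng or random.Random()

    def reweight(self, lag_seconds, max_lag=30.0):
        """Scale each weight down as replication lag grows.

        A reader with no lag report, or one at or beyond max_lag, gets zero.
        """
        for name, base in self.base.items():
            lag = lag_seconds.get(name)
            if lag is None or lag >= max_lag:
                self.effective[name] = 0.0
            else:
                self.effective[name] = base * (1.0 - lag / max_lag)

    def pick(self):
        """Return one reader name, chosen in proportion to its weight."""
        live = [(name, w) for name, w in self.effective.items() if w > 0]
        if not live:
            raise RuntimeError("no readable database available")
        names, weights = zip(*live)
        return self.rng.choices(names, weights=weights, k=1)[0]
```

A caller would invoke pick() once at the start of an HTTP request or backend task and reuse that reader until it finishes, while a separate periodic job calls reweight() with fresh lag figures.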
Slash 1.0 was written way back in early 2000 with decent support for get and set methods to abstract objects out of a database (getDescriptions, subclassed _wheresql) -- but over the years we've only used them a few times. Most data types that are candidates to be objectified either are processed in large numbers (like tags and comments), in ways that would be difficult to do efficiently by subclassing, or have complicated table structures and pre- and post-processing (like users) that would make any generic objectification code pretty complicated. So most data access is done through get and set methods written custom for each data type, or, just as often, through methods that perform one specific update or select.
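A custom getter for one data type, with a cache in front of the database, might look roughly like the following. This is a Python sketch under loud assumptions: a plain dict stands in for memcached, a caller-supplied function stands in for the SELECT, and the class, method, and key names are invented rather than taken from Slash.

```python
class StoryStore:
    """Hand-written accessor for one data type; hypothetical illustration."""

    def __init__(self, fetch_row, cache=None):
        # fetch_row: callable taking a story id and returning a dict or None.
        # It stands in for the real database query.
        self.fetch_row = fetch_row
        # A dict stands in for a memcached client here.
        self.cache = {} if cache is None else cache

    def get_story(self, sid):
        """Return the story, asking the database only on a cache miss."""
        key = f"story:{sid}"
        cached = self.cache.get(key)
        if cached is not None:
            return cached
        row = self.fetch_row(sid)
        if row is not None:
            self.cache[key] = row
        return row

    def invalidate(self, sid):
        """Drop the cached copy, for example after an update."""
        self.cache.pop(f"story:{sid}", None)
```

The point of writing it per data type is that each getter can do its own pre- and post-processing without a generic object layer getting in the way.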
Overall, we're pretty happy with the database side of things. Most tables are fairly well normalized, not fully but mostly, and we've found this improves performance in most cases. Even on a fairly large site like Slashdot, with modern hardware and a little thinking ahead, we're able to push code and schema changes live quickly. Thanks to running multiple-master replication, we can keep the site fully live even during blocking queries like ALTER TABLE. After changes go live, we can find performance problem spots and optimize (which usually means caching, caching, caching, and occasionally multi-pass log processing for things like detecting abuse and picking users out of a hat who get mod points).
In fact, I'll go further than "pretty happy." Writing a database-backed web site has changed dramatically over the past seven years. The database used to be the bottleneck: centralized, hard to expand, slow. Now even a cheap DB server can run a pretty big site if you code defensively, and thanks to Moore's Law, memcached, and improvements in open-source database software, that part of the scaling issue isn't really a problem until you're practically the size of eBay. It's an exciting time to be coding web applications.
-
Slashdot's Setup, Part 2- Software
Today we have Part 2 in our exciting 2 part series about the infrastructure that powers Slashdot. Last week Uriah told us all about the hardware powering the system. This week, Jamie McCarthy picks up the story and tells us about the software... from pound to memcached to mysql and more. Hit that link and read on.The software side of Slashdot takes over at the point where our load balancers -- described in Friday's hardware story -- hand off your incoming HTTP request to our pound servers.
Pound is a reverse proxy, which means it doesn't service the request itself, it just chooses which web server to hand it off to. We run 6 pounds, one for HTTPS traffic and the other 5 for regular HTTP. (Didn't know we support HTTPS, did ya? It's one of the perks for subscribers: you get to read Slashdot on the same webhead that admins use, which is always going to be responsive even during a crush of traffic -- because if it isn't, Rob's going to breathe down our necks!)
The pounds send traffic to one of the 16 apaches on our 16 webheads -- 15 regular, and the 1 HTTPS. Now, pound itself is so undemanding that we run it side-by-side with the apaches. The HTTPS pound handles SSL itself, handing off a plaintext HTTP request to its machine's apache, so the apache it redirects traffic to doesn't need mod_ssl compiled in. One less headache! Of our other 15 webheads, 5 also run a pound, not to distribute load but just for redundancy.
(Trivia: pound normally adds an X-Forwarded-For header, which Slash::Apache substitutes for the (internal) IP of pound itself. But sometimes if you use a proxy on the internet to do something bad, it will send us an X-Forwarded-For header too, which we use to try to track abuse. So we patched pound to insert a special X-Forward-Pound header, so it doesn't overwrite what may come from an abuser's proxy.)
The other 15 webheads are segregated by type. This segregation is mostly what pound is for. We have 2 webheads for static (.shtml) requests, 4 for the dynamic homepage, 6 for dynamic comment-delivery pages (comments, article, pollBooth.pl), and 3 for all other dynamic scripts (ajax, tags, bookmarks, firehose). We segregate partly so that if there's a performance problem or a DDoS on a specific page, the rest of the site will remain functional. We're constantly changing the code and this sets up "performance firewalls" for when us silly coders decide to write infinite loops.
But we also segregate for efficiency reasons like httpd-level caching, and MaxClients tuning. Our webhead bottleneck is CPU, not RAM. We run MaxClients that might seem absurdly low (5-15 for dynamic webheads, 25 for static) but our philosophy is if we're not turning over requests quickly anyway, something's wrong, and stacking up more requests won't help the CPU chew through them any faster.
All the webheads run the same software, which they mount from a /usr/local exported by a read-only NFS machine. Everyone I've ever met outside of this company gives an involuntary shudder when NFS is mentioned, and yet we haven't had any problems since shortly after it was set up (2002-ish). I attribute this to a combination of our brilliant sysadmins and the fact that we only export read-only. The backend task that writes to /usr/local (to update index.shtml every minute, for example) runs on the NFS server itself.
The apaches are versions 1.3, because there's never been a reason for us to switch to 2.0. We compile in mod_perl, and lingerd to free up RAM during delivery, but the only other nonstandard module we use is mod_auth_useragent to keep unfriendly bots away. Slash does make extensive use of each phase of the request loop (largely so we can send our 403's to out-of-control bots using a minimum of resources, and so your page is fully on its way while we write to the logging DB).
Slash, of course, is the open-source perl code that runs Slashdot. If you're thinking of playing around with it, grab a recent copy from CVS: it's been years since we got around to a tarball release. The various scripts that handle web requests access the database through Slash's SQL API, implemented on top of DBD::mysql (now maintained, incidentally, by one of the original Slash 1.0 coders) and of course DBI.pm. The most interesting parts of this layer might be:
(a) We don't use Apache::DBI. We use connect_cached, but actually our main connection cache is the global objects that hold the connections. Some small chunks of data are so frequently used that we keep them around in those objects.
(b) We almost never use statement handles. We have eleven ways of doing a SELECT and the differences are mostly how we massage the results into the perl data structure they return.
(c) We don't use placeholders. Originally because DBD::mysql didn't take advantage of them, and now because we think any speed increase in a reasonably-optimized web app should be a trivial payoff for non-self-documenting argument order. Discuss!
(d) We built in replication support. A database object requested as a reader picks a random slave to read from for the duration of your HTTP request (or the backend task). We can weight them manually, and we have a task that reweights them automatically. (If we do something stupid and wedge a slave's replication thread, every Slash process, across 17 machines, starts throttling back its connections to that machine within 10 seconds. This was originally written to handle slave DBs getting bogged down by load, but with our new faster DBs, that just never happens, so if a slave falls behind, one of us probably typed something dumb at the mysql> prompt.)
(e) We bolted on memcached support. Why bolted-on? Because back when we first tried memcached, we got a huge performance boost by caching our three big data types (users, stories, comment text) and we're pretty sure additional caching would provide minimal benefit at this point. Memcached's main use is to get and set data objects, and Slash doesn't really bottleneck that way.
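The reader selection in (d) amounts to a weighted random choice whose weights can be changed while the site is running. Below is a minimal sketch in Python under stated assumptions: the class name `ReaderPool`, its methods, and the replica names are hypothetical, and Slash's real implementation (in Perl) may differ in every detail.

```python
import random


class ReaderPool:
    """Weighted random choice among read replicas (illustrative only)."""

    def __init__(self, weights, rng=None):
        # weights: mapping of replica name -> non-negative weight
        self.weights = dict(weights)
        self.rng = rng or random.Random()

    def pick(self):
        """Choose one replica; callers keep it for the whole request."""
        names = [n for n, w in self.weights.items() if w > 0]
        if not names:
            raise RuntimeError("no readable replica available")
        chosen_weights = [self.weights[n] for n in names]
        return self.rng.choices(names, weights=chosen_weights, k=1)[0]

    def reweight(self, name, weight):
        """Set a replica's weight; 0 takes it out of rotation."""
        self.weights[name] = max(0.0, weight)


# Usage: a monitoring task lowers the weight of a lagging replica.
pool = ReaderPool({"db-reader-1": 1.0, "db-reader-2": 1.0})
pool.reweight("db-reader-2", 0)
reader = pool.pick()
```

Picking once per request rather than once per query means a single page is built from one replica's view of the data.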
Slash 1.0 was written way back in early 2000 with decent support for get and set methods to abstract objects out of a database (getDescriptions, subclassed _wheresql) -- but over the years we've only used them a few times. Most data types that are candidates to be objectified either are processed in large numbers (like tags and comments), in ways that would be difficult to do efficiently by subclassing, or have complicated table structures and pre- and post-processing (like users) that would make any generic objectification code pretty complicated. So most data access is done through get and set methods written custom for each data type, or, just as often, through methods that perform one specific update or select.
Overall, we're pretty happy with the database side of things. Most tables are fairly well normalized, not fully but mostly, and we've found this improves performance in most cases. Even on a fairly large site like Slashdot, with modern hardware and a little thinking ahead, we're able to push code and schema changes live quickly. Thanks to running multiple-master replication, we can keep the site fully live even during blocking queries like ALTER TABLE. After changes go live, we can find performance problem spots and optimize (which usually means caching, caching, caching, and occasionally multi-pass log processing for things like detecting abuse and picking users out of a hat who get mod points).
In fact, I'll go further than "pretty happy." Writing a database-backed web site has changed dramatically over the past seven years. The database used to be the bottleneck: centralized, hard to expand, slow. Now even a cheap DB server can run a pretty big site if you code defensively, and thanks to Moore's Law, memcached, and improvements in open-source database software, that part of the scaling issue isn't really a problem until you're practically the size of eBay. It's an exciting time to be coding web applications.
-
Status Report From the Open Source Games Community
qubodup writes "Free Gamer, an open source gaming blog, has recently become a hub for open source artists, developers and gamers. In its forums, the GPU-hungry Classical Java RPG and the Neverball-killer irrlamb have found their second home. So have sub-communities like extremist free gamers, who insist that games not only be free software but also contain free content, and who want to build a knowledge base of existing free games. There are also free content artists, who address an old problem of open source games and want to supply graphics and sound for projects in need of game media." -
Bossie Awards Honor Open Source Software
The Alliance writes "InfoWorld has announced the 2007 Bossie Awards for the Best of Open-Source Software. Awards were given to 36 winners across 6 categories. Honorees include (among others) SpamAssassin, ClamAV and Nessus in security, Wireshark and Azureus Vuze in networking, and ZFS for storage. Interestingly, they split the operating system winners across two distributions, with CentOS winning for server OS and Ubuntu for desktop." -
Ophcrack Says Your Password Is Insecure
javipas writes "An insightful article at Jeff Atwood's Coding Horror reveals the power inside Ophcrack, an Open Source program that is capable of discovering virtually any password in Windows operating systems. The article explains how passwords get stored on Windows using hash functions, and how Ophcrack can generate immense tables of words and letter combinations that are compared to the password we want to obtain. The program is available for Windows, Mac OS and Linux, but be careful: the generated tables that Ophcrack uses are really big, and you should allow up to 15 Gbytes to store these tables."