Domain: apache.org
Stories and comments across the archive that link to apache.org.
Stories · 484
-
Samsung Amps Up Its Multi-Window Android Upgrade
DeviceGuru writes "New multiwindow, multitasking features in Samsung's recent Jellybean update to the Galaxy Note 10.1 have pushed the user interface of Android tablets into new territory, adding MS Windows-like capabilities that are sure to delight many users — and aggravate others. Although some observers have warned of the dangers of forking Android, Samsung's efforts to extend Android and its ecosystem can be defended as being consistent with Google's master plan for the Android system, most of which is released under ASLv2. And remember: unlike Apple, Android device makers, and the wireless carriers who offer Android smartphones to their customers, need ways to differentiate their products." -
Cassandra NoSQL Database 1.2 Released
Billly Gates writes "The Apache Foundation released version 1.2 of Cassandra today which is becoming quite popular for those wanting more performance than a traditional RDBMS. You can grab a copy from this list of mirrors. This release includes virtual nodes for backup and recovery. Another added feature is 'atomic batches,' where patches can be reapplied if one of them fails. They've also added support for integrating into Hadoop. Although Cassandra does not directly support MapReduce, it can more easily integrate with other NoSQL databases that use it with this release." -
Open webOS Adopts Apache Cordova for Hardware Access
In their December newsletter, Open webOS announced that they've ditched the webOS-specific hardware interface that was part of Enyo 1.x for the Cordova project (formerly PhoneGap). Combined with the portable Enyo 2.0 framework, applications written for webOS are now portable to other platforms (and the other way around). There were also a number of other under-the-hood improvements: "This month we completed and delivered the pluggable keyboard project, WebAppMgr separation and upgrading to Qt 4.8.3. Work continues as planned on upgrading Qt5/webkit2 (more details next month). Also, the complete rewrite of mediaServer has been completed and is now undergoing internal QA testing, look for this to hit the repos in the coming weeks." -
Cloud Version of OpenOffice In the Works
An anonymous reader writes "The Apache Foundation revealed in Sinsheim, Germany their plans for a cloud version of OpenOffice.org based on HTML5. Chinese and German engineers use OpenOffice in 'headless' mode as a base." -
OpenOffice Is Now, Officially, Apache OpenOffice
rbowen writes "Apache OpenOffice has graduated from the Incubator, and now is officially a top-level project at the Apache Software Foundation." From the announcement: "As with all Apache software, Apache OpenOffice software is released under the Apache License v2.0, and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. Information on Apache OpenOffice source code, documentation, mailing lists, related resources, and ways to participate are available at http://openoffice.apache.org." (Download mirror on Sourceforge, too.) -
Advertisers Blast Microsoft Over IE Default Privacy Settings
theodp writes "GeekWire reports that Microsoft is sticking to its decision to implement 'Do-Not-Track' as the default for IE 10, despite drawing the ire of corporate America, the Apache Software Foundation, and the FTC Chairman. Representatives of a veritable Who's Who of Corporate America — e.g., GM, IBM, BofA, Walmart, Merck, Allstate, AT&T, Motorola — signed off on a letter blasting Microsoft for its choice. 'By presenting Do Not Track with a default on,' the alliance argues, 'Microsoft is making the wrong choice for consumers.' The group reminds Microsoft that Apache — whose Platinum Sponsors have branded Microsoft's actions a deliberate abuse of open standards and designed its software to ignore the 'do-not-track' setting if the browser reaching it is IE 10. It also claims that the FTC Chairman, formerly supportive of Microsoft's privacy efforts, now recognizes 'the harm to consumers that Microsoft's decision could create.'" -
NSA Mimics Google, Angers Senate
An anonymous reader writes "In a bizarre turn of events, the Senate would prefer that the DoD use software not written by the government for the government. Quoting: 'Like Google, the agency needed a way of storing and retrieving massive amounts of data across an army of servers, but it also needed extra tools for protecting all that data from prying eyes. They added 'cell level' software controls that could separate various classifications of data, ensuring that each user could only access the information they were authorized to access. It was a key part of the NSA’s effort to improve the security of its own networks. But the NSA also saw the database as something that could improve security across the federal government — and beyond. Last September, the agency open sourced its Google mimic, releasing the code as the Accumulo project. It's a common open source story — except that the Senate Armed Services Committee wants to put the brakes on the project. In a bill recently introduced on Capitol Hill, the committee questions whether Accumulo runs afoul of a government policy that prevents federal agencies from building their own software when they have access to commercial alternatives. The bill could ban the Department of Defense from using the NSA's database — and it could force the NSA to meld the project's security tools with other open source projects that mimic Google's BigTable.'" -
SourceForge Allura Submitted To the Apache Software Foundation Incubator
rbowen writes "The software that powers the SourceForge developer tools (SourceForge is owned by the same corporate overlords as Slashdot) has been submitted to the Apache Software Foundation Incubator. The SourceForge Blog reads: 'By submitting Allura to the Apache Incubator, we hope to draw an even wider community of developers who can advance the feature set and tailor the framework to their needs. With the flexibility and extensibility Allura allows, developers are free to use any number of the popular source code management tools, including: Git, SVN, or Mercurial. We are indeed willing to turn our own open source platform into a tool that everyone can use and extend, and we believe Apache is the best place to steward the process.'" -
Apache OpenOffice Releases Version 3.4
An anonymous reader sends word that Apache OpenOffice 3.4 has been released (download). This is the first release since OpenOffice became a project at the Apache Software Foundation. The release notes list all of the improvements, the highlights of which The H has summarized: "According to its developers, Apache OpenOffice (AOO) 3.4.0, the first update since OpenOffice.org 3.3.0 from January 2011, now starts up faster than its predecessor and introduces a number of new features such as support for documents secured using AES256 encryption. The Linear Programming solver in the Calc spreadsheet program has been replaced with the CoinMP C-API library from the Computational Infrastructure for Operations Research (COIN-OR) project. As in LibreOffice 3.4.0, the DataPilot functionality has been renamed to Pivot Table, and now supports an unlimited number of fields. A new 'Quote all text cells' CSV (Comma Separated Values) export option has been also added to Calc. Other changes include improved ODF 1.2 encryption and Unix Printing support and various enhancements to the Impress presentation and Draw sketching programs." -
Is It Time For NoSQL 2.0?
New submitter rescrv writes "Key-value stores (like Cassandra, Redis and DynamoDB) have been replacing traditional databases in many demanding web applications (e.g. Twitter, Google, Facebook, LinkedIn, and others). But for the most part, the differences between existing NoSQL systems come down to the choice of well-studied implementation techniques; in particular, they all provide a similar API that achieves high performance and scalability by limiting applications to simple operations like GET and PUT. HyperDex, a new key-value store developed at Cornell, stands out in the NoSQL spectrum with its unique design. HyperDex employs a multi-dimensional hash function to enable efficient search operations; that is, objects may be retrieved without using the key (PDF) under which they are stored. Other systems employ indexing techniques to enable search, or enumerate all objects in the system. In contrast, HyperDex's design enables applications to retrieve search results directly from servers in the system. The results are impressive. Preliminary benchmark results on the project website show that HyperDex provides significant performance improvements over Cassandra and MongoDB. With its unique design and impressive performance, it seems fitting to ask: Is HyperDex the start of NoSQL 2.0?" -
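The multi-dimensional hashing idea in the summary above can be illustrated with a small sketch. This is not HyperDex's code or API: the grid size, the attribute names, and the helpers `axis_coord`, `cell_for`, and `candidate_cells` are all hypothetical, chosen only to show why a search on a non-key attribute needs to touch a slice of the space rather than every object.

```python
import hashlib
from itertools import product

BUCKETS_PER_AXIS = 4  # hypothetical grid resolution, not a HyperDex setting


def axis_coord(value, buckets=BUCKETS_PER_AXIS):
    """Hash one attribute value to a coordinate along its own axis."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def cell_for(obj, attributes):
    """Map an object to one grid cell: one coordinate per attribute."""
    return tuple(axis_coord(obj[name]) for name in attributes)


def candidate_cells(query, attributes, buckets=BUCKETS_PER_AXIS):
    """Cells that could hold a match: fixed on queried axes, free on the rest."""
    axes = []
    for name in attributes:
        if name in query:
            axes.append([axis_coord(query[name], buckets)])
        else:
            axes.append(range(buckets))
    return set(product(*axes))


# Hypothetical schema and record, for illustration only.
ATTRS = ("first", "last", "phone")
alice = {"first": "Alice", "last": "Smith", "phone": "555-0100"}

# Searching by last name alone pins one axis, so only a slice is contacted.
cells = candidate_cells({"last": "Smith"}, ATTRS)
print(len(cells), "of", BUCKETS_PER_AXIS ** len(ATTRS), "cells searched")
```

In this toy layout each cell would be owned by some server, so pinning one of three axes cuts the cells to visit from 64 to 16, and pinning all three leaves exactly one.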
Apache 2.4 Takes Direct Aim At Nginx
darthcamaro writes "The world's most popular web server is out with a major new release today that has one key goal: deliver more performance than ever before. Improved caching, proxy modules as well as new session control are also key highlights of the release. 'We also show that as far as true performance is based (real-world performance as seen by the end-user), 2.4 is as fast, and even faster than some of the servers who may be "better" known as being "fast", like nginx,' Jim Jagielski, ASF President and Apache HTTP Server Project Management Committee, told InternetNews.com." Here's a list of new features in 2.4. -
Hadoop 1.0 Released
darthcamaro writes "There has been a tonne of hype about Big Data and specifically Hadoop in recent years. But until today, Hadoop was not a 1.0 release product. Does it matter? Not really, but it's still a big milestone. The new release includes a new web interface for the Hadoop filesystem, security, and Hbase database support. '"At this point we figured that as a community we can support this release and be compatible for the foreseeable future. That makes this release an ideal candidate to be called 1.0," Arun C. Murthy, vice president of Apache Hadoop, said.'" -
Why Can't We Put a BASIC On the Phone?
theodp writes "In the Sixties, we could put a man on the moon. Nowadays, laments jocastette, America's tech giants can't even put a BASIC on the phone. Woz managed to crank out a BASIC interpreter for the 6502 in the '70s. As did Bill Gates and Paul Allen. So, why — at a time when development has never been easier — can't Google, Apple, and Microsoft manage to support a free BASIC or other programming-for-the-masses development environment on desktops, laptops, tablets and phones?" My limited experience with Android development showed using Java to be obtuse and downright obnoxious to do anything (at least without Eclipse, and even with it doing anything non-standard required digging through horrendous ant buildfiles). And, of course, without a REPL things were even more obnoxious. There is the android-scripting project, but it doesn't provide particularly exhaustive access to the platform. -
ASF Lays Out Its Plan For OpenOffice.org
Thinkcloud writes "In an open letter, the Apache Software Foundation has made its plans for OpenOffice clear, including an Apache-branded OpenOffice suite targeted at developers coming next year." From The H: "The ASF says it does not want to force any vision on the ODF community noting that 'it is impossible to agree upon a single vision for all participants, Apache OpenOffice does not seek to define a single vision, nor does it seek to be the only player' in the large ODF ecosystem. Instead, it wishes to offer a neutral 'collaboration opportunity' and notes that its permissive licensing and development model are 'widely recognised as one of the best ways to ensure open standards, such as ODF, gain traction and adoption.'" -
Canonical Drops CouchDB From Ubuntu One
rsk writes "Since the Ubuntu One desktop synchronization service was launched by Canonical it has always been powered by CouchDB, a popular document-oriented NoSQL data store with a powerful master-master replication architecture that runs in many different environments (servers, mobile devices, etc.). John Lenton, senior engineering manager at Canonical, announced that Canonical would be moving away from CouchDB due to a few unresolvable issues Canonical ran into in production with CouchDB and the scale/requirements of the Ubuntu One service. Instead, says Lenton, Canonical will be moving to a custom data storage abstraction layer (U1DB) that is platform agnostic as well as datastore agnostic; utilizing the native datastore on the host device (e.g. SQLite, MySQL, API layers, 'everything'). U1DB will be complete at some point after the 12.04 release." -
Type Safety Coming To DB Queries
An anonymous reader writes "A new type-safe query language for the popular full-text search platform Solr, called Slashem (a Rogue-like), has just been released. Slashem is implemented as a domain-specific language in Scala, providing compile-time type safety, allowing you to do things like date range queries against date fields but keeping you from trying to do a date range query against a string field. Hopefully this trend catches on, resulting in fewer invalid queries exploding at runtime." -
NSA Makes Contribution To Apache Hadoop Project
An anonymous reader writes "The National Security Agency has submitted a new database, Accumulo, to the Apache Foundation for incubation. Accumulo is based on the original BigTable paper with some extensions such as the ability to provide cell-level security. It appears there are some hurdles that must be cleared concerning copyright before the project could be accepted." -
Apache Warns Web Server Admins of DoS Attack Tool
CWmike writes "Developers of the Apache open-source project warned users of the Web server software on Wednesday that a denial-of-service (DoS) tool is circulating that exploits a bug in the program. 'Apache Killer' showed up last Friday in a post to the 'Full Disclosure' security mailing list. The Apache project said it would release a fix for Apache 2.0 and 2.2 in the next 48 hours. All versions in the 1.3 and 2.0 lines are said to be vulnerable to attack. The group no longer supports the older Apache 1.3. 'The attack can be done remotely and with a modest number of requests can cause very significant memory and CPU usage on the server,' Apache said in an advisory. The bug is not new. Michal Zalewski, a security engineer who works for Google, pointed out that he had brought up the DoS exploitability of Apache more than four-and-a-half years ago. In lieu of a fix, Apache offered steps administrators can take to defend their Web servers until a patch is available." -
Oracle's Java Policies Are Destroying the Community
snydeq writes "Neil McAllister sees Oracle's buggy Java SE 7 release as only the latest misstep in a mounting litany of bad behavior. 'Who was the first to alert the Java community? The Apache Foundation. Oh, the irony. This is the same Apache Foundation that resigned from the Java Community Process executive committee in protest after Oracle repeatedly refused to give it access to the Java Technology Compatibility Kit,' McAllister writes. 'It seems as if Oracle would like nothing better than to stomp Apache and its open source Java efforts clean out of existence.'" -
Java 7 Ships With Severe Bug
Lisandro writes "Lucid Imagination just posted an announcement about a severe bug in the recently released Java 7. Apparently some loops are mis-compiled due to errors in the HotSpot compiler optimizations, which causes programs to fail. This bug affects several Apache projects directly — Apache Lucene Core and Apache Solr have already raised a warning, noting that the bug might be present in Java 6 as well." -
Book Review: Solr 1.4 Enterprise Search Server
MassDosage writes "Solr 1.4 Enterprise Search Server written by David Smiley and Eric Pugh provides in-depth coverage of the open source Solr search server. In some ways this book reads like the missing reference manual for the advanced usage of Solr. It is aimed at readers already familiar with Solr and related search concepts as well as those having some knowledge of programming (specifically Java). The book covers a lot of ground, some of it fairly challenging, and gives those working with Solr a lot of hands-on technical advice on how to use and fine-tune many parts of this powerful application."
Title: Solr 1.4 Enterprise Search Server
Authors: David Smiley and Eric Pugh
Pages: 317
Publisher: Packt Publishing
Rating: 8/10
Reviewer: Mass Dosage
ISBN: 978-1-847195-88-3
Summary: Enhance your search with faceted navigation, result highlighting, fuzzy queries, ranked scoring, and more.
Solr 1.4 Enterprise Search Server starts off with a brief description of what Solr is, how it is related to the Lucene libraries (which it is built around) and how it compares to other technologies such as databases. This book is not an introduction to search and this chapter covers only the basics and assumes the reader already knows what they are getting into or that they will read up on search concepts themselves before reading further. Solr is free, open-source technology licensed under the Apache license and is available here. This book covers the 1.4 version of Solr and was published before this version was actually released so it is a bit patchy in areas which were still undergoing change but the authors point this out very clearly in the text where applicable.
The book provides details on downloading and installing Solr, building it from source and the manifold options available for configuring and tweaking it. A freely available data set from Music Brainz is provided for download along with various code examples and a bundled version of Solr 1.4 which is used as the basis for many of the examples referred to throughout the text. In some ways this dataset is limited as it only allows for fairly simple usages compared with the challenges of indexing and searching large bodies of text. Again, the authors clearly mention these limits and briefly describe how certain concepts would be better applied to other data sources.
The basics of schema design, text analysis, indexing and searching are covered over the next three chapters and these include a wide range of essential search concepts such as tokenizers, stemming, stop-words, synonyms, data import handlers, field qualifiers, filters, scoring, sorting etc. The reader is taken through the process of setting up Solr so it can be used to index data that is to be searched and then how this data can be imported into Solr from a variety of sources like XML and HTML documents, PDFs, databases, CSV files and many others. Using Solr to build search queries is covered with examples that the reader can run via the Solr web interface and provided sample data.
More advanced search techniques are covered next and at this point I felt a lot of what was being discussed went over my head. Perhaps this was because my own search experience hasn't extended very far and the behind-the-scenes algorithms powering search aren't something I've had to directly work with. There were sections here that definitely felt aimed at people with a much more thorough understanding of the theory underpinning search and how a knowledge of mathematics and the data being searched are essential for search algorithm design. Having said this, these chapters felt like they would be really useful to come back to at some point in the future and I'm sure that people working with search on a daily basis would find some useful advice here for how to get the best out of Solr.
Solr provides much more than just indexing and search and the fact that various components are available to do many other common search-related functions is one of its main benefits. These components provide things like the highlighting of search terms in returned results, spell-checking, related documents and so on. The authors cover components which ship with Solr to provide this functionality as well as mentioning a few that are currently separate software projects. One can easily see how all of this would be directly applicable if one was adding search capability to one's own product or web site as there are a lot of wheels that Solr saves you from having to re-invent. The book also mentions the various parts of Solr that can be extended to modify or add new behaviours, which of course is one of the many advantages of its open source nature.
The final three chapters move on to the more practical side of actually using Solr in the "real world" and discuss various deployment options, how it can be monitored using JMX, security, integration and scaling. In addition to Java (which is probably the most powerful and straightforward way of integrating with Solr) support for languages like JavaScript, PHP and Ruby is described. I felt the Ruby section was way too long; maybe one of the authors has a soft spot for the Ruby language? The sections on writing a web crawler and doing autocomplete were far more interesting and probably also more generally applicable. The book wraps up with a thorough discussion on how to scale Solr, from scaling high (optimising a single server through techniques like caching, shingling and clever schema design and indexing strategies) to scaling wide (using multiple Solr servers and replicating or sharding data between them) and scaling deep (a combination of the former two approaches).
On the whole this is a very thorough, detailed book and it is clear that the authors have a lot of experience with Solr and how it is used in practice. This book does not cover a lot of theory and assumes a fair amount of prior knowledge and is definitely aimed at those who need to get their hands dirty and get up and running with Solr in a production environment. The authors have a straightforward, open and honest writing style and aren't afraid of clearly stating where Solr has limitations or imperfections. While the book may have a somewhat steep learning curve, this is isolated to certain chapters which can be skipped and returned to later if necessary. The fact that the writing is concise and to the point means one doesn't have to wade through pages of flowery text before getting to the good bits. If you're seriously thinking about using Solr or are already using it and want to know more so you can take full advantage of it, I would definitely recommend this book.
Full disclosure: I was given a copy of this book free of charge by the publisher for review purposes. They placed no restrictions on what I could say and left me to be as critical as I wanted so the above review is my own honest opinion.
You can purchase Solr 1.4 Enterprise Search Server from amazon.com. -
Does Google Pin Copyright Violations On the ASF?
An anonymous reader writes "Florian Mueller claims to have produced new evidence that he believes supports Oracle's case against Google on the copyright side of the lawsuit. Oracle originally presented one example to the court, and that file was found to have been part of older Android distributions, with an Apache license header. Mueller has just published six more files of that kind and believes the Apache Software Foundation will disown those just like the first one because those were never part of the Apache Harmony code base. Furthermore, various source files from the Sun Java Wireless Toolkit were found in the Android codebase, containing a total of 38 copyright notices that mark them as proprietary and confidential, but Google apparently published their source code regardless." -
Tomcat 7 Finalized
alphadogg writes "The volunteer developers behind Apache Tomcat have released version 7.0.6 of the open-source Java servlet container. 'This is the first stable release of the Tomcat 7 branch,' developer Mark Thomas wrote in an e-mail announcing the release on various Tomcat developer mailing lists. While not a full application server, Tomcat implements the functionality described in the Java Enterprise Edition Web profile specifications. Most notably, it supports version 3.0 of the Servlet API (application programming interface) and version 2.2 of JavaServer Pages, both part of the recently ratified JEE 6. A servlet container manages Java-based applications that can be accessed from a Web browser. One big area of improvement is in configuration management for Web applications. Previous versions required all Web app configuration changes to be entered in a central file called web.xml, a process that led to unwieldy web.xml files as well as security risks." -
Apache To Steward NASA-Built Middleware
itwbennett writes "The Apache Software Foundation announced Wednesday that the Object-Oriented Data Technology (OODT), first developed by NASA's Jet Propulsion Laboratory, has graduated to a top level project. The software 'provides a one-stop toolkit for building up a database, populating a database, setting up a work flow to get data into that database, and then serving out lots of different content from that database,' said Chris Mattmann, vice president of the OODT project. NASA uses the software to manage data from multiple domains, including astrophysics, earth carbon monitoring and land-water use. The National Cancer Institute also uses the software for its Early Detection Research Network, which ties together multiple cancer research databases." -
Apache Subversion To WANdisco, Inc: Get Real
kfogel writes "The Apache Subversion project has just had to remind one of its corporate contributors about the rules of the road. WANdisco, Inc was putting out some very odd press releases and blog posts, implying (among other things) that their company was in some sort of steering position in the open source project. Oops — that's not the Apache Way. The Apache Software Foundation has reminded them of how things work. Meanwhile, one of the founding developers of Subversion, Ben Collins-Sussman, has posted a considerably more caustic take on WANdisco's behavior." -
Apache Resigns From the JCP Executive Committee
iammichael writes "The Apache Software Foundation has resigned its seat on the Java SE/EE Executive Committee due to a long dispute over the licensing restrictions placed on the TCK (test kit validating third-party Java implementations are compatible with the specification)." -
Google Wave Looking To Join Apache Software Foundation
MMacFadden writes "The Google Wave team has officially submitted the open source version of Wave to the Apache Software Foundation as a candidate Incubator project. Google hopes that the wave technology will continue to grow, supported by the new open source community (which is made up of Google and non-Google employees alike). Here is the proposal itself." -
Programming Things I Wish I Knew Earlier
theodp writes "Raw intellect ain't always all it's cracked up to be, advises Ted Dziuba in his introduction to Programming Things I Wish I Knew Earlier, so don't be too stubborn to learn the things that can save you from the headaches of over-engineering. Here's some sample how-to-avoid-over-complicating-things advice: 'If Linux can do it, you shouldn't. Don't use Hadoop MapReduce until you have a solid reason why xargs won't solve your problem. Don't implement your own lockservice when Linux's advisory file locking works just fine. Don't do image processing work with PIL unless you have proven that command-line ImageMagick won't do the job. Modern Linux distributions are capable of a lot, and most hard problems are already solved for you. You just need to know where to look.' Any cautionary tips you'd like to share from your own experience?" -
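The advisory file locking mentioned in the advice above can be sketched with Python's standard `fcntl` module, which wraps the Unix `flock` call. This is a minimal illustration under stated assumptions: it is Unix-only, the helpers `try_lock` and `release` and the lock file name are hypothetical, and it is not taken from Dziuba's article.

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Return an open file holding an exclusive advisory lock, or None if held elsewhere."""
    handle = open(path, "a")
    try:
        # LOCK_NB makes the call fail immediately instead of waiting for the lock.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


def release(handle):
    """Drop the lock and close the file."""
    fcntl.flock(handle, fcntl.LOCK_UN)
    handle.close()


# Hypothetical lock file, created in a throwaway directory for the demo.
lock_path = os.path.join(tempfile.mkdtemp(), "job.lock")

first = try_lock(lock_path)    # acquires the lock
second = try_lock(lock_path)   # a second open of the same file is refused
print("first holds lock:", first is not None, "| second refused:", second is None)
release(first)
```

The lock is advisory: it only coordinates programs that also ask for it, and the kernel releases it automatically if the holder exits, which is much of what a hand-rolled lock service would otherwise have to reimplement.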
BlackBerry Maker To Buy QNX For RTOS & Dev. Suite
Freshly Exhumed writes "Research In Motion, maker of BlackBerry smartphones, said on Friday it will buy QNX Software Systems, makers of Real-Time Operating Systems, for an undisclosed amount as it moves to boost integration of its devices with in-vehicle audio systems. QNX Neutrino is a Unix-like RTOS, and their Momentics development suite is for embedded applications for a wide variety of industries. While RIM has offered somewhat limited support of open source projects on its BlackBerry platform, the future of QNX's Foundry27 development project, which uses the Apache 2.0 license, has not yet been mentioned." -
Digg Says Yes To NoSQL Cassandra DB, Bye To MySQL
donadony writes "After Twitter, it's now Digg that has decided to replace MySQL and most of its infrastructure components and move away from LAMP to a NoSQL architecture based on Cassandra, an open source project that develops a highly scalable second-generation distributed database. Cassandra was open sourced by Facebook in 2008 and is licensed under the Apache License. The reason for this move, as explained by Digg, is the increasing difficulty of building a high-performance, write-intensive application on a data set that is growing quickly, with no end in sight. This growth has forced them into horizontal and vertical partitioning strategies that have eliminated most of the value of a relational database, while still incurring all the overhead." -
How Twitter Is Moving To the Cassandra Database
MyNoSQL has up an interview with Ryan King on how Twitter is transitioning to the Cassandra database. Here's some detailed background on Cassandra, which aims to "bring together Dynamo's fully distributed design and Bigtable's ColumnFamily-based data model." Before settling on Cassandra, the Twitter team looked into: "...HBase, Voldemort, MongoDB, MemcacheDB, Redis, Cassandra, HyperTable, and probably some others I'm forgetting. ... We're currently moving our largest (and most painful to maintain) table — the statuses table, which contains all tweets and retweets. ... Some side notes here about importing. We were originally trying to use the BinaryMemtable interface, but we actually found it to be too fast — it would saturate the backplane of our network. We've switched back to using the Thrift interface for bulk loading (and we still have to throttle it). The whole process takes about a week now. With infinite network bandwidth we could do it in about 7 hours on our current cluster." Relatedly, an anonymous reader notes that the upcoming NoSQL Live conference, which will take place in Boston March 11th, has announced their lineup of speakers and panelists including Ryan King and folks from LinkedIn, StumbleUpon, and Rackspace. -
The Final Release of Apache HTTP Server 1.3
Kyle Hamilton writes "The Apache Software Foundation and the Apache HTTP Server Project are pleased to announce the release of version 1.3.42 of the Apache HTTP Server ('Apache'). This release is intended as the final release of version 1.3 of the Apache HTTP Server, which has reached end of life status. There will be no more full releases of Apache HTTP Server 1.3. However, critical security updates may be made available." -
SpamAssassin 2010 Bug
SEWilco writes "You might want to check your spam folder, as SpamAssassin has a rule which is tending to mark email sent in 2010 as spam. There is some discussion in a bug report. The SpamAssassin Wiki FH_DATE_PAST_20XX page doesn't have discussion, but it was updated today with a different date rule." -
Open Source Solution Breaks World Sorting Records
allenw writes "In a recent blog post, Yahoo's grid computing team announced that Apache Hadoop was used to break the current world sorting records in the annual GraySort contest. It topped the 'Gray' and 'Minute' sorts in the general purpose (Daytona) category. They sorted 1TB in 62 seconds, and 1PB in 16.25 hours. Apache Hadoop is the only open source software to ever win the competition. It also won the Terasort competition last year." -
JaikuEngine Gets Open Sourced
volume4 writes "The switch has been flipped and Jaiku has been moved to App Engine. Google will no longer be developing Jaiku, so the code and the future of Jaiku is in the hands of the open source community. From the Jaiku blog: 'Today, we are open sourcing the Jaiku code base under the Apache License 2.0. The code is available as JaikuEngine on Google Code Project Hosting as of now. Anyone can set up and run their own JaikuEngine instance on Google App Engine.'" We discussed Google's purchase of Jaiku in 2007, and their subsequent decision to halt development a few months ago. -
Microsoft Donates Code To Apache's "Stonehenge" Project
dp619 writes "Several months after joining the Apache Foundation, Microsoft has made its first code contribution to an Apache project. The project, known as Stonehenge, is made up of companies and developers seeking to test the interoperability of Web standards implementations." Reader Da Massive adds a link to coverage at Computer World. -
Software Logging Schemes?
MySkippy writes "I've been a software engineer for just over 10 years, and I've seen a lot of different styles of logging in the applications I've worked on. Some were extremely verbose — about 1 logging line for every 2 lines of code. Others were very lacking, with maybe 1 line in 200 devoted to logging. I personally find that writing debug and informational messages about every 2 to 5 lines works well for debugging an issue, but can become cumbersome when reading through a log for analysis. I like to write warning messages when thresholds or limits are being approached — these tend to be infrequent. I log errors whenever I catch one (but I've never put a 'fatal' message in my code, because if it's truly a fatal error I probably didn't catch it). Recently I came across log4j and log4net and have begun using them both. That brings me to my question: how do the coders on Slashdot handle logging in their code?" -
Slashdot's Setup, Part 2- Software
Today we have Part 2 in our exciting 2 part series about the infrastructure that powers Slashdot. Last week Uriah told us all about the hardware powering the system. This week, Jamie McCarthy picks up the story and tells us about the software... from pound to memcached to mysql and more. Hit that link and read on.
The software side of Slashdot takes over at the point where our load balancers -- described in Friday's hardware story -- hand off your incoming HTTP request to our pound servers.
Pound is a reverse proxy, which means it doesn't service the request itself, it just chooses which web server to hand it off to. We run 6 pounds, one for HTTPS traffic and the other 5 for regular HTTP. (Didn't know we support HTTPS, did ya? It's one of the perks for subscribers: you get to read Slashdot on the same webhead that admins use, which is always going to be responsive even during a crush of traffic -- because if it isn't, Rob's going to breathe down our necks!)
The pounds send traffic to one of the 16 apaches on our 16 webheads -- 15 regular, and the 1 HTTPS. Now, pound itself is so undemanding that we run it side-by-side with the apaches. The HTTPS pound handles SSL itself, handing off a plaintext HTTP request to its machine's apache, so the apache it redirects traffic to doesn't need mod_ssl compiled in. One less headache! Of our other 15 webheads, 5 also run a pound, not to distribute load but just for redundancy.
(Trivia: pound normally adds an X-Forwarded-For header, which Slash::Apache substitutes for the (internal) IP of pound itself. But sometimes if you use a proxy on the internet to do something bad, it will send us an X-Forwarded-For header too, which we use to try to track abuse. So we patched pound to insert a special X-Forward-Pound header, so it doesn't overwrite what may come from an abuser's proxy.)
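The header trick in that trivia note can be sketched in a few lines. This is only an illustration of the idea, not pound's code (pound is written in C) and not what Slash::Apache actually does: the function names are made up, the addresses are documentation examples, and header lookup is treated as case-sensitive to keep things short.

```python
def proxy_forward(headers, client_ip):
    """Return the headers a reverse proxy would pass on to the web server.

    The client's address goes into a dedicated header, so any
    X-Forwarded-For value supplied by an upstream proxy is left untouched.
    """
    forwarded = dict(headers)
    forwarded["X-Forward-Pound"] = client_ip
    return forwarded


def resolve_addresses(headers, peer_ip):
    """Return (client_ip, upstream_claim) as seen by the web server.

    client_ip: the address the reverse proxy saw, falling back to the
        directly connected peer when the dedicated header is absent.
    upstream_claim: whatever an outside proxy put in X-Forwarded-For,
        or None. It is unverified, so it is only a hint for abuse tracking.
    """
    client_ip = headers.get("X-Forward-Pound", peer_ip)
    upstream_claim = headers.get("X-Forwarded-For")
    return client_ip, upstream_claim


# A request that arrived through some outside proxy (hypothetical values).
incoming = {"Host": "slashdot.org", "X-Forwarded-For": "203.0.113.7"}
passed_on = proxy_forward(incoming, "198.51.100.20")
client_ip, upstream_claim = resolve_addresses(passed_on, "10.0.0.5")
print(client_ip, upstream_claim)
```

The point of the separate header is that both pieces of information survive: the address the reverse proxy really saw, and the address an outside proxy claimed.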
The other 15 webheads are segregated by type. This segregation is mostly what pound is for. We have 2 webheads for static (.shtml) requests, 4 for the dynamic homepage, 6 for dynamic comment-delivery pages (comments, article, pollBooth.pl), and 3 for all other dynamic scripts (ajax, tags, bookmarks, firehose). We segregate partly so that if there's a performance problem or a DDoS on a specific page, the rest of the site will remain functional. We're constantly changing the code and this sets up "performance firewalls" for when us silly coders decide to write infinite loops.
But we also segregate for efficiency reasons like httpd-level caching, and MaxClients tuning. Our webhead bottleneck is CPU, not RAM. We run MaxClients that might seem absurdly low (5-15 for dynamic webheads, 25 for static) but our philosophy is if we're not turning over requests quickly anyway, something's wrong, and stacking up more requests won't help the CPU chew through them any faster.
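To make the segregation concrete, here is a rough sketch of the routing decision. Pound does this through its own configuration, which is not shown here; the pool names, the exact script paths, and the `choose_pool` function are assumptions made for illustration. Only the pool sizes and the grouping of page types come from the description above.

```python
# Pool sizes follow the text: 2 static, 4 homepage, 6 comment-delivery,
# 3 for everything else. The pool names themselves are hypothetical.
POOLS = {
    "static": 2,
    "homepage": 4,
    "comments": 6,
    "other": 3,
}

# Assumed URL paths; the text only names the page types.
HOMEPAGE_PATHS = ("/", "/index.pl")
COMMENT_SCRIPTS = ("/comments.pl", "/article.pl", "/pollBooth.pl")


def choose_pool(path):
    """Map a request path to the pool of webheads that should serve it."""
    path = path.split("?", 1)[0]  # ignore the query string
    if path.endswith(".shtml"):
        return "static"
    if path in HOMEPAGE_PATHS:
        return "homepage"
    if path in COMMENT_SCRIPTS:
        return "comments"
    return "other"


for example in ("/faq.shtml", "/", "/article.pl?sid=1", "/ajax.pl"):
    print(example, "->", choose_pool(example))
```

Because each pool is separate, a runaway script in "other" can exhaust its own 3 webheads without touching the machines that serve the homepage or comments.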
All the webheads run the same software, which they mount from a /usr/local exported by a read-only NFS machine. Everyone I've ever met outside of this company gives an involuntary shudder when NFS is mentioned, and yet we haven't had any problems since shortly after it was set up (2002-ish). I attribute this to a combination of our brilliant sysadmins and the fact that we only export read-only. The backend task that writes to /usr/local (to update index.shtml every minute, for example) runs on the NFS server itself.
The apaches are version 1.3, because there's never been a reason for us to switch to 2.0. We compile in mod_perl, and lingerd to free up RAM during delivery, but the only other nonstandard module we use is mod_auth_useragent to keep unfriendly bots away. Slash does make extensive use of each phase of the request loop (largely so we can send our 403's to out-of-control bots using a minimum of resources, and so your page is fully on its way while we write to the logging DB).
Slash, of course, is the open-source perl code that runs Slashdot. If you're thinking of playing around with it, grab a recent copy from CVS: it's been years since we got around to a tarball release. The various scripts that handle web requests access the database through Slash's SQL API, implemented on top of DBD::mysql (now maintained, incidentally, by one of the original Slash 1.0 coders) and of course DBI.pm. The most interesting parts of this layer might be:
(a) We don't use Apache::DBI. We use connect_cached, but actually our main connection cache is the global objects that hold the connections. Some small chunks of data are so frequently used that we keep them around in those objects.
(b) We almost never use statement handles. We have eleven ways of doing a SELECT and the differences are mostly how we massage the results into the perl data structure they return.
(c) We don't use placeholders. Originally because DBD::mysql didn't take advantage of them, and now because we think any speed increase in a reasonably-optimized web app should be a trivial payoff for non-self-documenting argument order. Discuss!
(d) We built in replication support. A database object requested as a reader picks a random slave to read from for the duration of your HTTP request (or the backend task). We can weight them manually, and we have a task that reweights them automatically. (If we do something stupid and wedge a slave's replication thread, every Slash process, across 17 machines, starts throttling back its connections to that machine within 10 seconds. This was originally written to handle slave DBs getting bogged down by load, but with our new faster DBs, that just never happens, so if a slave falls behind, one of us probably typed something dumb at the mysql> prompt.)
(e) We bolted on memcached support. Why bolted-on? Because back when we first tried memcached, we got a huge performance boost by caching our three big data types (users, stories, comment text) and we're pretty sure additional caching would provide minimal benefit at this point. Memcached's main use is to get and set data objects, and Slash doesn't really bottleneck that way.
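The reader selection described in (d) can be sketched as a weighted random choice whose weights drop when a slave falls behind. This is not Slash's implementation (Slash is Perl); the `ReaderPool` class, the linear scaling rule, and the `max_lag` threshold are all assumptions chosen to show the idea.

```python
import random


class ReaderPool:
    """Weighted random choice among read replicas (hypothetical sketch)."""

    def __init__(self, base_weights, rng=None):
        self.base_weights = dict(base_weights)  # weights set by hand
        self.weights = dict(base_weights)       # weights currently in effect
        self.rng = rng or random.Random()

    def pick(self):
        """Choose one replica; a caller would keep it for the whole request."""
        names = [name for name, weight in self.weights.items() if weight > 0]
        if not names:
            raise RuntimeError("no replica available for reads")
        weights = [self.weights[name] for name in names]
        return self.rng.choices(names, weights=weights)[0]

    def reweight(self, lag_seconds, max_lag=30.0):
        """Recompute weights from replication lag (assumed linear scaling)."""
        for name, base in self.base_weights.items():
            lag = lag_seconds.get(name)
            # Unknown lag is treated like a wedged replication thread.
            if lag is None or lag > max_lag:
                self.weights[name] = 0.0
            else:
                self.weights[name] = base * (1.0 - lag / max_lag)


pool = ReaderPool({"db1": 1.0, "db2": 2.0}, rng=random.Random(0))
pool.reweight({"db1": 0.0, "db2": 120.0})  # db2 has fallen far behind
print(pool.pick())
```

In a real setup the reweighting task would run periodically and every web process would pick up the new weights; here both steps are just method calls on one object.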
Slash 1.0 was written way back in early 2000 with decent support for get and set methods to abstract objects out of a database (getDescriptions, subclassed _wheresql) -- but over the years we've only used them a few times. Most data types that are candidates to be objectified either are processed in large numbers (like tags and comments), in ways that would be difficult to do efficiently by subclassing, or have complicated table structures and pre- and post-processing (like users) that would make any generic objectification code pretty complicated. So most data access is done through get and set methods written custom for each data type, or, just as often, through methods that perform one specific update or select.
Overall, we're pretty happy with the database side of things. Most tables are fairly well normalized, not fully but mostly, and we've found this improves performance in most cases. Even on a fairly large site like Slashdot, with modern hardware and a little thinking ahead, we're able to push code and schema changes live quickly. Thanks to running multiple-master replication, we can keep the site fully live even during blocking queries like ALTER TABLE. After changes go live, we can find performance problem spots and optimize (which usually means caching, caching, caching, and occasionally multi-pass log processing for things like detecting abuse and picking users out of a hat who get mod points).
In fact, I'll go further than "pretty happy." Writing a database-backed web site has changed dramatically over the past seven years. The database used to be the bottleneck: centralized, hard to expand, slow. Now even a cheap DB server can run a pretty big site if you code defensively, and thanks to Moore's Law, memcached, and improvements in open-source database software, that part of the scaling issue isn't really a problem until you're practically the size of eBay. It's an exciting time to be coding web applications.
-
Bossie Awards Honor Open Source Software
The Alliance writes "InfoWorld has announced the 2007 Bossie Awards for the Best of Open-Source Software. Awards were given to 36 winners across 6 categories. Honorees include (among others) SpamAssassin, ClamAV and Nessus in security, Wireshark and Azureus Vuze in networking, and ZFS for storage. Interestingly, they split the operating system winners across two distributions, with CentOS winning for server OS and Ubuntu for desktop." -
Java Open Review Project
bvc writes "We launched the Java Open Review Project today. We're reviewing open source Java code all the way from Tomcat down to PetStore looking for bugs and security vulnerabilities. We're using two static analysis tools to do the heavy lifting: the open source tool FindBugs, and the commercial tool Fortify SCA. We can use plenty of human eyes to help sort through the results. We're also soliciting ideas for which projects we should be reviewing next. Please help!" -
Apple Announces New Open Source Efforts
Today Apple announced a few expanded open source efforts. First, beginning with Mac OS X 10.4.7, the Darwin/Mac OS X kernel, known as "xnu", is again available as buildable source for the Intel platform, including EFI utilities. Second, iCal Server, Bonjour, and launchd are moving to Apache 2.0 licensing. And finally, Mac OS Forge has been launched as the successor to OpenDarwin, serving as a conduit for hosting projects such as WebKit that were formerly hosted on the OpenDarwin project's servers. Mac OS Forge is sponsored by Apple. DarwinPorts has already moved to its own servers. Update: 08/08 01:43 GMT by J : The official Apple announcement is now out. Other fun news: Leopard will ship with Ruby on Rails. -
Summer of Code Now Taking Student Applications
chrisd writes "Just wanted to let you know that we've opened up the student application process for the Summer of Code. We've signed up ~100 mentoring organizations this year, including Apache, Postgres, Xiph, The Shmoo Group, Drupal, Gallery and many others. We're accepting applications through May 8th this year." -
Ask Apache Software Chairman Greg Stein
Here's a man who obviously has his finger on the pulse of open source software development. I mean, who hasn't heard of Apache? His work history is interesting, too: He's moved from Microsoft to CollabNet to Google. And he's not shy about speaking his mind about open source, as shown in this ZDNet blog entry. Please try to confine yourself to one question per post. (If you have more than one question, post more than once.) We'll send 10 of the highest-moderated questions to Greg tomorrow and run his answers when we get them back. -
A Webserver on Your Cellphone?
Mad_Rain asks: "I saw over on Make Magazine an article about using your cell phone on the Internet, except instead of browsing the web from your cell, you can serve webpages from your phone. Of course, it uses Apache, Python and a Nokia S60 series cell phone. I can imagine a couple of creative applications for webservers in strange places, but what else can be done with this?" -
Searchable C/C++ DB surpasses 275 million lines
Sembiance writes "I've been working on a C/C++ source code search database for the past year. It has recently surpassed 275 million lines of searchable open source C/C++ code. The search engine is C/C++ syntax aware so you can search for specific elements such as functions, macros, classes, comments, etc. The site is built upon many open source products including: MySQL and Lucene for the database, CodeWorker to parse the code, PHP and Apache for the website and GeSHi for syntax highlighting. I'm currently looking for suggestions on what sort of 'interesting statistics' I could create from 275+ million lines of open source C/C++ code." -