Cassandra NoSQL Database 1.2 Released
Billly Gates writes "The Apache Foundation released version 1.2 of Cassandra today, which is becoming quite popular for those wanting more performance than a traditional RDBMS. You can grab a copy from this list of mirrors. This release includes virtual nodes for backup and recovery. Another added feature is 'atomic batches,' where patches can be reapplied if one of them fails. They've also added support for integrating into Hadoop. Although Cassandra does not directly support MapReduce, it can more easily integrate with other NoSQL databases that use it with this release."
Maybe someone can explain this to me. I've been keeping an eye out for situations where it would make more sense to use a NoSQL solution like Mongo, Couch, etc. for a year or so now, and I just haven't found one.
Under what circumstances do people use a data store that doesn't need data relationships?
I'm not sure if it's a typo or a misunderstanding, but the statement in the summary about atomic batching is hilariously incorrect.
Atomic batching has nothing to do with "patches can be reapplied if one of them fails", but rather the more pedantic yet common case where you want a set of data updates to be batched atomically, where all or none of the changes occur, but nothing in between.
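To make the "all or none" point concrete, here is a minimal in-memory sketch in Python. It is only an illustration of the semantics: `apply_batch`, `debit`, `credit`, and the dict standing in for a table are all hypothetical names, and it does not model how Cassandra implements batches across nodes.

```python
import copy


class BatchError(Exception):
    """Raised when any mutation in a batch fails."""


def apply_batch(store, mutations):
    """Apply every mutation to `store`, or none of them.

    `store` is a plain dict standing in for a table (an assumption for this
    sketch); `mutations` is a list of callables that modify a dict in place.
    """
    working = copy.deepcopy(store)  # mutate a private copy first
    try:
        for mutate in mutations:
            mutate(working)
    except Exception as exc:
        # The original store was never touched, so nothing is half-applied.
        raise BatchError("batch rejected; no changes applied") from exc
    store.clear()
    store.update(working)  # publish all changes together


def debit(account, amount):
    """Hypothetical mutation: subtract `amount`, failing on overdraft."""
    def mutate(data):
        if data[account] < amount:
            raise ValueError("insufficient funds")
        data[account] -= amount
    return mutate


def credit(account, amount):
    """Hypothetical mutation: add `amount` to an account."""
    def mutate(data):
        data[account] += amount
    return mutate
```

If a batch of `[credit("bob", 500), debit("alice", 500)]` fails on the debit, the credit is not left behind either. That is the property the summary got wrong: nothing is "reapplied", the partial state simply never becomes visible.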
Indeed. I love stumbling across these on the Apache website.
With relational databases, you have to express your relationships in your database schema, as well as in your data objects and business logic.
With non-relational databases, you have to express your relationships in your data objects and business logic.
Think about it.
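A rough sketch of that contrast, using Python's standard `sqlite3` module for the relational side and plain dicts for the schemaless side. The `authors`/`posts` tables and both helper functions are made up for illustration; the point is only where the relationship check lives.

```python
import sqlite3

# Relational side: the relationship is declared in the schema.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    """CREATE TABLE posts (
           id INTEGER PRIMARY KEY,
           author_id INTEGER NOT NULL REFERENCES authors(id),
           title TEXT NOT NULL)"""
)
conn.execute("INSERT INTO authors (id, name) VALUES (1, 'ada')")


def insert_post_relational(author_id, title):
    """The database itself rejects a post whose author does not exist."""
    try:
        conn.execute(
            "INSERT INTO posts (author_id, title) VALUES (?, ?)",
            (author_id, title),
        )
        return True
    except sqlite3.IntegrityError:
        return False


# Schemaless side: the same rule exists only in application code.
authors = {1: {"name": "ada"}}
posts = []


def insert_post_schemaless(author_id, title):
    """Nothing but this check stops an orphaned post from being stored."""
    if author_id not in authors:
        return False
    posts.append({"author_id": author_id, "title": title})
    return True
```

Either way the application still has to know that a post belongs to an author; the relational version just states it a second time in the schema and gets enforcement in return.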
The Apache Software Foundation Blog
Wednesday Jan 02, 2013
The Apache Software Foundation Announces Apache Cassandra v1.2
High-performance, super-robust Big Data distributed database introduces support for dense clusters, simplifies application modeling, and improves data cell storage, design, and representation.
Forest Hill, MD –2 January 2013– The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of nearly 150 Open Source projects and initiatives, today announced Apache Cassandra v1.2, the latest version of the highly-scalable, fault-tolerant, Big Data distributed database.
Successfully handling thousands of requests per second, Apache Cassandra powers massive data sets quickly and reliably without compromising performance –whether running in the Cloud or partially on-premise in a hybrid data store. Apache Cassandra is successfully used by an array of organizations that include Adobe, Appscale, Appssavvy, Backupify, Cisco, Clearspring, Cloudtalk, Constant Contact, DataStax, Digg, Digital River, Disney, eBay, Easou, Formspring, Hailo, Hobsons, IBM, Mahalo.com, Morningstar, Netflix, Openwave, OpenX, Palantir, PBS, Plaxo, Rackspace, Reddit, RockYou, Shazam, SimpleGeo, Spotify, Thomson-Reuters, Twitter, Urban Airship, US Government, Walmart Labs, Williams-Sonoma, Inc., and Yakaz.
"We are pleased to announce Cassandra 1.2," said Jonathan Ellis, Vice President of Apache Cassandra. "By improving support for dense clusters —powering multiple terabytes per node— as well as simplifying application modeling, and improving data cell storage/design/representation, systems are able to effortlessly scale petabytes of data."
Highlights of the second-generation, high-performance NoSQL database include clustering across virtual nodes, inter-node communication, atomic batches, and request tracing. Cassandra v1.2 also marks the release of CQL3 (version 3 of the Cassandra Query Language), which simplifies application modeling, allows for more powerful mapping, and alleviates design limitations through more natural representation.
"We are really excited to begin taking advantage of all the new features Apache Cassandra v1.2 has to offer – particularly virtual nodes and atomic batches. Both of these new features will play a central role in future enhancements to our architecture," said Ed Anuff, VP, Mobile Platform at Apigee.
"It's great to see the core of Apache Cassandra continue to evolve," said independent software developer Kelly Sommers. "In Cassandra v1.2 the introduction of vnodes will simplify managing clusters while improving performance when adding and rebuilding nodes. v1.2 also includes many new features, performance improvements and further heap reduction to alleviate the burden on the JVM garbage collector."
"The much anticipated release of Cassandra 1.2 brings with it features that simplify application development. Atomic batches provide a mechanism for developers to ensure transactional integrity across a business process, instead of relying on idempotent operations and retry mechanisms," said Brian O’Neill, Lead Architect at Health Market Science. "Additionally, native support for collections is attractive and a compelling reason to explore CQL 3."
"Apache Cassandra continues to be a leading option for scalability and high availability without compromising performance and, with the improvements provided in v1.2, reinforces our commitment to growth while preserving backwards compatibility," added Ellis.
Availability and Oversight
As with all Apache products, Apache Cassandra v1.2 is released under the Apache License v2.0, and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project’s day-to-day operations, including community development and product releases. Apache Cassandra source code, documentation, and related resources are available at http://cassandra.apache.org/.
There must be something I don't understand. For me the whole point of databases is precisely that they come with SQL to easily do even complex stuff with them.
How can the absence of the only useful feature be a "selling" point? No SQL? No thanks...
It always pays to use relational over NoSQL when you can. But just as in data warehousing, where it makes sense to denormalize for performance reasons, it can make sense to organize the data around specific computations in ways that damage the ability to use SQL.
You won't find any good reason with normal-sized data sets and a normal number of joins. But for computations over large tables that must be joined multiple times in complex ways, where tricks like indexing can't overcome the cost, it can make sense to sacrifice the relational algebra.
I can't believe these assholes are getting in an argument about SQL vs NoSQL. Apples and oranges. NoSQL isn't a complete replacement, nor is an RDBMS the solve-all solution when you need to scale. Sounds like a bunch of DB admins feeling threatened that their jobs are going to be in jeopardy.