Cassandra 0.7 Can Pack 2 Billion Columns Into a Row
angry tapir writes "The cadre of volunteer developers behind the Cassandra distributed database has released the latest version of its open source software. The new Large Row Support feature of Cassandra version 0.7 allows the database to hold up to 2 billion columns per row. Previous versions had no set upper limit on the number of columns, though the maximum amount of data that could be held in a single row was approximately 2GB. That upper limit has been eliminated."
What sorta applications need so many columns? Curious.
This is a feature in need of an application and I can see very few applications.
ought to be enough for everybody
Well good on them for solving an interesting technical problem, but the use cases for this are all bad.
Obvious first use: boss will suggest we optimize the database by using only one gigantic row with two billion columns.
You work for Gillette, don't you.
Cassandra appears to be a multi-dimensional datastore that does not store data in the same fashion as a typical RDBMS. It addresses each value by both a row key and a column name. If you're familiar with Bigtable, then, apparently, it's kinda like that.
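For anyone without a Bigtable background, a rough mental model of the 0.7-era data model is a set of nested maps: column family, then row key, then column name, then value, with each row's columns kept sorted by name and different rows free to have different columns. The `ColumnFamily` class below is a hypothetical illustration in plain Python, not Cassandra's client API, and it ignores super columns, timestamps, and replication.

```python
from collections import defaultdict

# Hypothetical mental model, NOT Cassandra's API:
#   column family -> row key -> column name -> value
class ColumnFamily:
    def __init__(self):
        # row key -> {column name: value}; rows are created on first insert
        self.rows = defaultdict(dict)

    def insert(self, row_key, column, value):
        self.rows[row_key][column] = value

    def get_row(self, row_key):
        # Return the row's columns sorted by column name, as Cassandra
        # returns columns in comparator order. Missing rows give [].
        return sorted(self.rows.get(row_key, {}).items())

users = ColumnFamily()
users.insert("alice", "email", "alice@example.com")
users.insert("alice", "country", "NZ")
users.insert("bob", "country", "US")
users.insert("bob", "twitter", "@bob")  # bob has a column alice lacks

print(users.get_row("alice"))  # [('country', 'NZ'), ('email', 'alice@example.com')]
```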
That just means that they've added even more storage vectors to it than before... Not sure why it made the Slashdot front page...
Not with column store databases such as Cassandra, HBase and BigTable.
I predict that bad things will come of this.
Not that anyone will believe me.
Cassandra, like many of the "NoSQL" type databases, doesn't have classic indexes.
So instead of having an index you typically have a separate table that acts as the index.
Imagine you have a users table. One of the fields is country. Now you want to know all the users in a particular country.
In standard RDBMS-type systems you just scan each row, or have an index that has done that work "ahead of time" as rows are inserted.
In Cassandra the rows of users are distributed, possibly among hundreds of servers. So finding all users with a particular country would require scanning every row, which could take a long time.
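To see why that scan is expensive, here is a toy model of rows spread across a cluster by hashing the row key. Cassandra's RandomPartitioner does hash row keys with MD5, but it places them on a token ring; the modulo mapping in `node_for`, the `NUM_NODES` value, and the other names are simplifications made up for illustration.

```python
import hashlib

NUM_NODES = 100  # hypothetical cluster size

def node_for(row_key: str) -> int:
    # Simplified placement: MD5 the key, then take it modulo the node count.
    # (The real RandomPartitioner maps the hash onto a token ring instead.)
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % NUM_NODES

# Each node holds only the rows whose keys hash to it.
cluster = {n: {} for n in range(NUM_NODES)}

def put_user(user_id: str, country: str) -> None:
    cluster[node_for(user_id)][user_id] = {"country": country}

def users_in_country_by_scan(country: str):
    # With no index, every node has to scan every row it holds.
    nodes_touched = 0
    matches = []
    for rows in cluster.values():
        nodes_touched += 1
        matches.extend(uid for uid, cols in rows.items() if cols["country"] == country)
    return sorted(matches), nodes_touched

put_user("alice", "NZ")
put_user("bob", "US")
put_user("carol", "NZ")
print(users_in_country_by_scan("NZ"))  # (['alice', 'carol'], 100)
```

A lookup by user id touches one node; the "by country" question touches all of them.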
Unlike in an RDBMS, rows don't have a fixed 2D structure and have no real limit on the number of columns they can have. And columns can essentially act as arrays/rows of objects.
So as you design/bang out your application you typically realize you need to know "users by country" for some stupid report. So you create a new table to hold these values. This has one row per country. As users are entered you append to this row. This essentially creates an array like structure. You then lookup the row for a particular country and you now know all the users for that particular country.
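The "one row per country" index pattern can be sketched with plain Python dicts standing in for column families. The names `users`, `users_by_country`, `add_user`, and `users_in` are hypothetical; the point is that the application itself writes the extra index column on every insert, and a lookup then reads a single row instead of scanning the cluster.

```python
from collections import defaultdict

# Hypothetical stand-ins for two column families:
users = {}                            # user_id -> {column: value}
users_by_country = defaultdict(dict)  # country -> {user_id: ""}, one wide row per country

def add_user(user_id: str, country: str) -> None:
    users[user_id] = {"country": country}
    # Denormalize: append a column named after the user to the country's row.
    # The column name carries the information; the value can stay empty.
    users_by_country[country][user_id] = ""

def users_in(country: str) -> list:
    # One row read; column names come back sorted. Unknown countries give [].
    return sorted(users_by_country.get(country, {}))

add_user("alice", "NZ")
add_user("bob", "US")
add_user("carol", "NZ")
print(users_in("NZ"))  # ['alice', 'carol']
print(users_in("FR"))  # []
```

Each user adds one column to exactly one row, so a country with millions of users becomes a row with millions of columns, which is exactly the kind of row that per-row limits constrain.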
Sounds like Cassandra is getting rid of a limitation that could have forced a very large index to span multiple rows.
That we had all of this stuff 30 years ago. They were called 'network' databases, which were pretty much the standard technology before RDBMSs came along and everyone realized how much better relational algebra was for the vast majority of problems. As with many other things, older ideas eventually resurface with new names and a few more features. There are times when this kind of facility is useful; nothing wrong with it. But the vast majority of cases where I've seen people use something like Cassandra or Bigtable were ill-advised. A properly optimized RDBMS with a correctly designed schema can handle all but a few edge cases. Most of the hype these tools are generating comes from a lack of real understanding of how to use databases properly, combined with people believing myths about other technologies, helped along by the industry's short memory. The best part, though, is that when something turns into a giant mess, guys like me can make nice money fixing it. lol.
He doesn't, otherwise it'd be uint64_t and a lather strip!
So I can appreciate that this announcement sounds like News for Nerds, but can someone explain why it Matters that Cassandra can support 2 billion columns?
The article basically says "because you can't execute SQL you need lots of columns". OK, great, why would I want that? The article doesn't tell me. The Cassandra website sure doesn't tell me.
Oracle 11 supports up to 8 fucking EXABYTES of data in an RDBMS that I can execute SQL against. What Cassandra puts in columns, I put in rows.
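For contrast, here is the users-by-country example from earlier in the thread done the relational way: one row per user, with the database maintaining the index itself on every insert. This is a minimal sketch using SQLite through Python's standard `sqlite3` module as a stand-in for Oracle; the table and index names are made up, and the SQL is generic enough that it should look much the same on other RDBMSs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id TEXT PRIMARY KEY, country TEXT NOT NULL)")
# Secondary index: the database keeps it up to date on every insert/update.
conn.execute("CREATE INDEX idx_users_country ON users (country)")
conn.executemany(
    "INSERT INTO users (user_id, country) VALUES (?, ?)",
    [("alice", "NZ"), ("bob", "US"), ("carol", "NZ")],
)
rows = conn.execute(
    "SELECT user_id FROM users WHERE country = ? ORDER BY user_id", ("NZ",)
).fetchall()
print([r[0] for r in rows])  # ['alice', 'carol']
```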
I've scoured this thread like all the other ones on Cassandra for the killer feature, for the "you can do this with Cassandra that you can't do as well with an RDBMS" and I can't find it.
The best I can come up with is "I want to store lots of indexed data, I don't care about transactional integrity, and I don't want to pay Oracle". Is that it? That's fine if it's it, Oracle doesn't come cheap and that can be a deal breaker for new companies, but I just wish someone would spell out that this is the justification for Cassandra's existence.