Cassandra 0.7 Can Pack 2 Billion Columns Into a Row



Posted by timothy on Sunday January 16, 2011 @12:58PM from the but-only-if-they're-really-thin dept.

angry tapir writes "The cadre of volunteer developers behind the Cassandra distributed database have released the latest version of their open source software. The newly added Large Row Support feature of Cassandra version 0.7 allows the database to hold up to 2 billion columns per row. Previous versions had no set limit on the number of columns, though the maximum amount of data that could be held in a single row was approximately 2GB. This upper limit has been eliminated."
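Some quick arithmetic on the numbers in the summary. A "2 billion" ceiling is what you would get if the per-row column count is stored as a signed 32-bit integer; that is an assumption here, not something the summary states. Dividing the old ~2GB row limit (taken here as 2**31 bytes) by that many columns leaves about one byte per column, which is the joke in the dept. line.

```python
# Assumption: the per-row column count is held in a signed 32-bit integer,
# which is where a "2 billion" ceiling would come from.
MAX_COLUMNS = 2**31 - 1

# The older limit quoted in the summary: roughly 2GB of data per row,
# taken here as 2**31 bytes.
OLD_ROW_LIMIT_BYTES = 2 * 1024**3

print(f"max columns per row: {MAX_COLUMNS:,}")
# Filling a 2GB row with that many columns leaves about one byte per column.
print(f"bytes per column at the old limit: {OLD_ROW_LIMIT_BYTES / MAX_COLUMNS:.3f}")
```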

2 of 235 comments

Re:Typical applications? by gratuitous_arp · 2011-01-16 13:20 · Score: 5, Interesting

Apparently the extra columns can be used to do more than just store data. A link in the article explains how large numbers of columns can be useful for querying data (Cassandra doesn't use SQL): http://maxgrinev.com/2010/07/12/do-you-really-need-sql-to-do-it-all-in-cassandra/
So the primary reason for this doesn't seem to be that one's run-of-the-mill database needs more columns.

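One way to picture how extra columns help with querying, as a hypothetical sketch rather than Cassandra's real API: treat a row as a map whose column names are kept sorted, so a single "index" row can answer a range query with one slice. The class `WideRowStore`, its methods, and the `users_by_age` row are made-up names for illustration.

```python
import bisect
from collections import defaultdict


class WideRowStore:
    """Toy in-memory model: row key -> columns kept sorted by name.

    Hypothetical illustration only; not Cassandra's actual API.
    """

    def __init__(self):
        # Each row holds two parallel lists so bisect can find column ranges.
        self._names = defaultdict(list)
        self._values = defaultdict(list)

    def insert(self, row_key, column_name, value):
        names = self._names[row_key]
        values = self._values[row_key]
        i = bisect.bisect_left(names, column_name)
        if i < len(names) and names[i] == column_name:
            values[i] = value  # overwrite an existing column
        else:
            names.insert(i, column_name)
            values.insert(i, value)

    def get_slice(self, row_key, start, finish, count=100):
        """Return up to `count` (name, value) pairs with start <= name <= finish."""
        names = self._names.get(row_key, [])
        values = self._values.get(row_key, [])
        lo = bisect.bisect_left(names, start)
        hi = min(bisect.bisect_right(names, finish), lo + count)
        return list(zip(names[lo:hi], values[lo:hi]))


store = WideRowStore()
for age, user in [(34, "alice"), (27, "bob"), (39, "carol"), (41, "dave")]:
    # One column per user in a single index row; the name encodes the sort key.
    store.insert("users_by_age", f"{age:03d}:{user}", "")

print(store.get_slice("users_by_age", "030", "039~"))
# [('034:alice', ''), ('039:carol', '')]
```

The design choice is to push the sort order into the column names, so "everyone aged 30 to 39" becomes one contiguous read from one row; the more users you index, the wider that single row gets.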
Re:And Oracle supports EXABYTE sized databases by DavidTC · 2011-01-16 18:01 · Score: 4, Interesting

NoSQL stuff is useful in weird, extreme fringe cases where you need to access data in essentially random ways. Digg, Facebook, and Google all use NoSQL databases, and I think the first two use Cassandra.
Specifically, you kinda make your own rows. It's like having permanent multiple JOINs that you can access instantly, from what I understand. (This is what the article is talking about; it's now unlimited.)
Essentially, it's a giant blob of data, and you draw lines on it in advance that are your results, and you can get those results instantly, at the cost of being unable to decide to get other results in real time.
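A minimal sketch of that "draw the lines in advance" idea, assuming plain Python dicts stand in for rows; `voters_by_story`, `stories_by_voter` and `record_vote` are hypothetical names. Each vote is written once per query you expect to ask later, so every planned read is a single lookup with no join.

```python
from collections import defaultdict

# Two hypothetical "index rows" maintained at write time. Each maps a row key
# to a set of column names; reading a planned result is one lookup.
voters_by_story = defaultdict(set)
stories_by_voter = defaultdict(set)


def record_vote(user, story):
    # Write the same fact twice, once per query we expect to ask later.
    voters_by_story[story].add(user)
    stories_by_voter[user].add(story)


record_vote("alice", "cassandra-0.7")
record_vote("bob", "cassandra-0.7")
record_vote("alice", "oracle-exabytes")

print(sorted(voters_by_story["cassandra-0.7"]))  # ['alice', 'bob']
print(sorted(stories_by_voter["alice"]))         # ['cassandra-0.7', 'oracle-exabytes']
```

The trade-off is the one described above: a question nobody planned for (say, "who voted on both stories?") has no precomputed row and has to be assembled from several reads.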
Many of the products let you put them on different servers, so you can have a 'people who have voted for this Digg' table or something on the server that handles that thing.
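How a row ends up on a particular server can be sketched with consistent hashing, which is the general technique behind Cassandra's random partitioner (it hashes row keys with MD5). This is a simplified illustration: deriving each node's token by hashing its name, and the names `Ring`, `md5_token` and `node_for`, are assumptions made for the example, not Cassandra's configuration or API.

```python
import bisect
import hashlib


def md5_token(key: str) -> int:
    # MD5 of the key as a big integer, in the spirit of a random partitioner.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class Ring:
    """Toy consistent-hashing ring: each node owns the range ending at its token."""

    def __init__(self, nodes):
        # Simplification: node tokens come from hashing the node names.
        self._tokens = sorted((md5_token(n), n) for n in nodes)
        self._keys = [t for t, _ in self._tokens]

    def node_for(self, row_key: str) -> str:
        # First node whose token is >= the key's token, wrapping around the ring.
        i = bisect.bisect_left(self._keys, md5_token(row_key))
        return self._tokens[i % len(self._tokens)][1]


ring = Ring(["node-a", "node-b", "node-c"])
for key in ["voters:cassandra-0.7", "voters:oracle-exabytes", "user:alice"]:
    print(key, "->", ring.node_for(key))
```

Because placement depends only on the row key, every column of a row lands on the same server, which is what lets a "voters for this story" row live next to the code that serves that story.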
I'm not entirely sure how it works, but that's basically it. Oh, and the fact that they talk about 'columns' and 'rows' is just utterly stupid naming that confuses everyone. Basically, they tend to keep each column as a file, which allows them to do what I mentioned above: copy needed columns, and only needed columns, to other servers.
It's really weird, and, like I said, only relevant for giant, giant databases. There's no way that Google could do a full-text search on an RDBMS, regardless of whether it fits in Oracle. What it can do is make a 'column' for each word and a 'row' for each URL, put different columns on different servers, and that actually works in the non-relational database they use, when there's no way in hell that would work on an RDBMS.
However, more importantly for Slashdot, a fuckload of fools think that SQL is somehow 'retarded' and that NoSQL is 'awesome, dude', so they like to play with it, usually by spewing out some crap PHP or Perl or something that works about a tenth as well as just using an RDBMS would. If they actually understood how to use an RDBMS, that is.

--
If corporations are people, aren't stockholders guilty of slavery?