Cassandra 0.7 Can Pack 2 Billion Columns Into a Row
angry tapir writes "The cadre of volunteer developers behind the Cassandra distributed database have released the latest version of their open source software, able to hold up to 2 billion columns per row. The newly installed Large Row Support feature of Cassandra version 0.7 allows the database to hold up to 2 billion columns per row. Previous versions had no set upper limit, though the maximum amount of material that could be held in a single row was approximately 2GB. This upper limit has been eliminated."
What sorta applications need so many columns? Curious.
Fuck systemd. Fuck Redhat. Fuck Soylent, too. Wait, scratch the last one.
Now I don't need that 2NF! Just use one column per customer!!!
They should have gone with the uint32_t counter, then they could support up to 4 billion!
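For the record, here is the arithmetic behind the joke. Assume the "2 billion" figure comes from a signed 32-bit count, like a Java int. The summary doesn't say what type Cassandra actually uses, so treat that as a guess. Under that assumption the ceiling is 2^31 - 1, while an unsigned 32-bit counter tops out at 2^32 - 1. A minimal Python sketch, which also shows what a signed counter wraps to one step past the limit:

```python
# Assumption: the "2 billion columns" limit is a signed 32-bit count (Java int).
INT32_MAX = 2**31 - 1   # largest signed 32-bit value
UINT32_MAX = 2**32 - 1  # largest unsigned 32-bit value (the uint32_t suggestion)


def wrap_int32(n: int) -> int:
    """Reduce n to a signed 32-bit value, as two's-complement overflow would."""
    return (n + 2**31) % 2**32 - 2**31


print(f"{INT32_MAX:,}")                  # 2,147,483,647
print(f"{UINT32_MAX:,}")                 # 4,294,967,295
print(f"{wrap_int32(INT32_MAX + 1):,}")  # -2,147,483,648
```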
I don't care if it's 90,000 hectares. That lake was not my doing.
... then you're doing it wrong
If this really matters at all, besides being slightly cool, it will just lead to more bad db design.
You know nothing at all, nor do you consider that it is expedient for us that one man should die for the people, and that the whole nation perish not.
This is a feature in need of an application and I can see very few applications.
What would the SQL statement look like if you wanted to select nearly all of those 2 billion columns except a few?
I couldn't not link to this xkcd comic.
ought to be enough for everybody
In Soviet Russia the government regulates the companies.
Now I can write that application I have been wanting to write forever. Just couldn't find a suitable database because none of them supported two billion columns per row. Oh happy day.
Well good on them for solving an interesting technical problem, but the use cases for this are all bad.
Obvious first use: boss will suggest we optimize the database by using only one gigantic row with two billion columns.
Seriously though, WTF?
...what's the point...
portfolio
Now I can finally shoe-horn my coworkers' Excel spreadsheets into a database.
Cassandra appears to be a multi-dimensional datastore that does not store data in the same fashion as a typical RDBMS. It uses columns and rows both to store sets of data uniquely. If you're familiar with Bigtable, then, apparently, it's kinda like that.
That just means that they've added even more storage vectors to it than before...not sure why it made the Slashdot front page...
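Roughly, the Bigtable-style model is a map of maps. Each row key points to its own set of columns, kept sorted by column name, and each column carries a value plus a write timestamp. Here is a toy in-memory sketch in Python. It assumes simple last-write-wins on timestamps and does not model real tie-breaking or on-disk storage. The class names are made up for illustration:

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class Column:
    value: bytes
    timestamp: int  # write time; used to pick a winner between conflicting writes


@dataclass
class Row:
    names: list = field(default_factory=list)  # column names, kept sorted
    cols: dict = field(default_factory=dict)   # column name -> Column

    def insert(self, name: str, value: bytes, timestamp: int) -> None:
        current = self.cols.get(name)
        if current is None:
            bisect.insort(self.names, name)  # keep names in sorted order
            self.cols[name] = Column(value, timestamp)
        elif timestamp >= current.timestamp:
            # Simplified last-write-wins: a newer (or equal) timestamp replaces the value
            self.cols[name] = Column(value, timestamp)


class ColumnFamily:
    """Row key -> Row. Rows need not share any column names."""

    def __init__(self) -> None:
        self.rows: dict = {}

    def insert(self, row_key: str, name: str, value: bytes, timestamp: int) -> None:
        self.rows.setdefault(row_key, Row()).insert(name, value, timestamp)

    def get_row(self, row_key: str) -> list:
        row = self.rows.get(row_key)
        if row is None:
            return []
        # Columns come back in column-name order, not insertion order
        return [(n, row.cols[n].value) for n in row.names]
```

Two rows in the same column family can hold completely different columns, and a single row can keep growing sideways. That sideways growth is what a per-row column limit caps.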
I'd happily pay you Tuesday for a biopsy today!
By establishing an upper limit on a formerly unlimited limit.
Wow, the first spam comment I have ever seen on /. And not one piece is authentic. I especially like how they made the security icons clickable but not the way they should be.
If I used a sig over again, would anyone notice?
I predict that bad things will come of this.
Not that anyone will believe me.
http://alternatives.rzero.com/
I know why the developers thought this would be a good idea. A feature this mental would be sure to get them free publicity on Slashdot.
Welcome to the new online dating experience, where we match you to someone else based on up to 2 billion traits!
This sounds like pure marketing gibberish, the kind you produce when you can't come up with enough meaningful features to boast about.
I can't even think of a reason why you would need 2 billion columns. If you did, I think the ability to store it is the least of your problems.
As I recall, one of the tasks given to Nedry in the design of the computer systems was to devise a database capable of holding a couple of billion fields to handle the sequencing of DNA strands.
ah yes.... this will put my linear algebra class to good use!
Cassandra like many of the "no sql" type databases doesn't have classic indexes.
So instead of having an index you typically have a separate table that acts as the index.
Imagine you have a users table. One of the fields is country. Now you want to know all the users for a particular country.
In standard RDBMS-type systems you just scan each row, or have an index that has done that "ahead of time" as rows are inserted.
In Cassandra the rows of users are distributed, possibly among hundreds of servers. So scanning for all users in a particular country would require scanning all rows, which could take a long time.
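Here is a toy sketch of why that scan is expensive, assuming rows are placed by hashing the row key. Cassandra's RandomPartitioner does derive tokens from MD5, but it places them on a ring. The plain modulo over a hypothetical NUM_NODES below is a simplification:

```python
import hashlib

NUM_NODES = 4  # hypothetical cluster size


def node_for(row_key: str) -> int:
    """Pick a node by hashing the row key (simplified: modulo, not a token ring)."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % NUM_NODES


def build_cluster(rows: dict) -> list:
    """Distribute {row_key: row} across NUM_NODES in-memory 'servers'."""
    nodes = [dict() for _ in range(NUM_NODES)]
    for key, row in rows.items():
        nodes[node_for(key)][key] = row
    return nodes


def users_in_country_by_scan(nodes: list, country: str) -> list:
    # Without an index, every node must be asked and every row inspected.
    hits = []
    for node in nodes:
        for key, row in node.items():
            if row.get("country") == country:
                hits.append(key)
    return sorted(hits)
```

Looking up one row by key touches one node. A question about a column value touches all of them.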
Unlike in RDBMS-like systems, rows don't have a 2D structure and have no real limit on the number of columns they can have. And columns can essentially be arrays/rows of objects.
So as you design/bang out your application, you typically realize you need to know "users by country" for some stupid report. So you create a new table to hold these values. This has one row per country. As users are entered you append to this row. This essentially creates an array-like structure. You then look up the row for a particular country, and you now know all the users for that particular country.
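The index-row pattern described above might look like this in a minimal Python sketch. Plain dicts stand in for two column families, and users and users_by_country are hypothetical names. Each user id becomes a column name in its country's row:

```python
from collections import defaultdict

users = {}                            # user_id -> {column name: value}
users_by_country = defaultdict(dict)  # country -> {user_id: b""} (the "index row")


def add_user(user_id: str, name: str, country: str) -> None:
    users[user_id] = {"name": name, "country": country}
    # "Append" to the country's index row. The column value is left empty
    # because the column name (the user id) is the useful data.
    users_by_country[country][user_id] = b""


def user_ids_for(country: str) -> list:
    # One row lookup instead of a cluster-wide scan; sorted to mimic
    # columns being kept in name order. .get() avoids creating empty rows.
    return sorted(users_by_country.get(country, {}))
```

Note that a big country's index row gets one column per user. That is where a per-row column limit starts to matter for this pattern.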
Sounds like Cassandra is getting rid of a limitation that could have caused a very large index to require multiple rows.
My bad. I meant "multiple rows".
Here is a link to an introduction to the Cassandra database system. One thing to realize is that Cassandra is one of the new "NoSQL" DBMSs. These operate very differently from an RDBMS such as Oracle or DB2.
'The tyrant will always find pretext for his tyranny.' - Aesop's Fables
That's less than one column per person!
Now if only Excel would follow.
"It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
Man this is great! Now I only need one table and never have to JOIN again. Most of the rows won't use most of the columns but that's what NULL is for, am I right?
I wonder if it's possible to represent a non-Cartesian basis vector space with a DB. Maybe one of the columns is sinusoidally looped, haha, every 32nd insert wraps around itself. Oh, this could be a cool MLK holiday project.
-Michael
Not knocking Cassandra, but basically it means that this metric of "2 billion columns", being completely different from the concept of RDBMS columns, really doesn't mean much from a comparative point of view...
It's kinda like saying "that army of ants will conquer all nations, they have 2 billion soldiers!" :)
That we had all of this stuff 30 years ago. It was called 'network' databases, which were pretty much the standard sort of technology before RDBMS came along and everyone realized how incredibly much better relational algebra was for the vast majority of problems. As with many other things older ideas eventually resurface with new names and a few more features. There are times when this kind of facility is useful. Nothing wrong with it. The vast majority of cases though where I've seen people using something like Cassandra or Big Table were ill advised. A properly optimized RDBMS with correctly designed schema can handle all but a few edge cases. Most of the hype these tools are generating is based on a lack of real understanding of how to properly use databases combined with people believing myths about other technologies and helped along by the industry's short memory span. The best part though is that when something turns into a giant mess guys like me can make nice money fixing the mess. lol.
"I prefer dangerous freedom to peaceful slavery." -- Jefferson
I was having trouble making a table for my new Web 3.0 m-commerce application on lesser databases:
CREATE TABLE peeps( ...
peep1_first_name VARCHAR(255),
peep1_last_name VARCHAR(255),
peep1_address VARCHAR(255),
peep1_address2 VARCHAR(255),
peep1_address3 VARCHAR(255),
peep1_creditcard VARCHAR(255),
peep1_creditcard2 VARCHAR(255),
peep1_creditcard3 VARCHAR(255),
peep2_first_name VARCHAR(255),
peep2_last_name VARCHAR(255),
peep2_address VARCHAR(255),
peep2_address2 VARCHAR(255),
peep2_address3 VARCHAR(255),
peep2_creditcard VARCHAR(255),
peep2_creditcard2 VARCHAR(255),
peep2_creditcard3 VARCHAR(255),
509 Bandwidth Limit Exceeded
I'm not a lawyer, but I play one on the Internet. Blog
So I can appreciate that this announcement sounds like News for Nerds, but can someone explain why it Matters that Cassandra can support 2 billion columns?
The article basically says "because you can't execute SQL you need lots of columns". OK, great, why would I want that? The article doesn't tell me. The Cassandra website sure doesn't tell me.
Oracle 11 supports up to 8 fucking EXABYTES of data in an RDBMS that I can execute SQL against. What Cassandra puts in columns, I put in rows.
I've scoured this thread like all the other ones on Cassandra for the killer feature, for the "you can do this with Cassandra that you can't do as well with an RDBMS" and I can't find it.
The best I can come up with is "I want to store lots of indexed data, I don't care about transactional integrity, and I don't want to pay Oracle". Is that it? That's fine if it's it, Oracle doesn't come cheap and that can be a deal breaker for new companies, but I just wish someone would spell out that this is the justification for Cassandra's existence.
Nice, but can it run Flash?
Good morning, Michael! How are you? I am fine.
Say, I haven't seen you in a while; whatzup with dat?
Are you related to http://slashdot.org/~MichaelKristopeit301 thru http://slashdot.org/~MichaelKristopeit360?
I thought so. Yeah, I know how it is, bro--sixty /. accounts is the absolute minimum one should have, eh, Michael?
Well, it's good to see your friendly posts again ol' buddy!
This is great for those of us in the database community who are purists about only using one row of data.
You've not done much outsourcing, have you?
No sig today...
Post a story that has no value whatsoever and cue the predictable quibbling about how screwed up in the head one must be to build a table with so many columns.
-1 redundant + AC -100 modifier
I can hardly wait to see the first story on The Daily WTF about someone doing a SELECT *... on a 2-billion column row, whether intentionally or not. Bonus points for excluding a LIMIT clause, too.
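The usual way around that trap with a wide row is to read it in pages. You ask for N columns starting at some column name, then resume from the last name you got. Below is a sketch in Python over a sorted list of column names. read_slice and iter_columns are hypothetical helpers. The sketch assumes the start column is inclusive, so each later page asks for one extra column and drops the duplicate:

```python
import bisect


def read_slice(names: list, start: str, count: int) -> list:
    """Return up to `count` column names >= start from a sorted list (hypothetical helper)."""
    i = bisect.bisect_left(names, start)
    return names[i:i + count]


def iter_columns(names: list, page_size: int = 1000):
    """Yield every column name one page at a time instead of in one giant read."""
    start = ""
    first = True
    while True:
        # After the first page the start column was already yielded, so fetch one extra
        page = read_slice(names, start, page_size if first else page_size + 1)
        if not first:
            page = page[1:]  # drop the column we resumed from
        if not page:
            return
        yield from page
        start = page[-1]
        first = False
```

Memory use is bounded by page_size, no matter how many columns the row has.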
That green slime had it coming.
... You're our -2,147,483,648th Column in the User Row!
<blink>This is not a Joke!</blink>
Click here to claim your free ERROR [MEMTABLE-FLUSHER-POOL:7] 2010-01-17 08:16:53,628 DebuggableThreadPoolExecutor.java (line 110) Error in ThreadPoolExecutor!
No seriously, why?
What could possibly necessitate two billion columns per row?
Is this just a "because we can" kind of thing, or is there a practical reason for it?
LK
"Hi. This is my friend, Jack Shit, and you don't know him." - Lord Kano
True story. They were over there in India using some meta data derived from our application dataset to generate a UML model which was generating java source which was compiling to class files one gigabyte in size. We fired the application up for the customer and it never actually finished starting...
http://michaelsmith.id.au
...the most important feature: 2 billion columns per row!
What an absolute moron!
"The likes of Facebook and WhatsApp are free to those whose privacy is of zero value."
Looking at Cassandra from a traditional DBMS viewpoint, I notice one thing: it does not have transactions like a true transactional database.
This reminds me of early MySQL database engines, which also lacked transactional support (but had "huge" speed, until you went multi-user and the lack of row-level locking bit you hard).
There are a lot of applications where the lack of traditional ACID is acceptable, but one has to keep this in mind when designing the application.
(SQL/ACID is an incomplete model anyway, since by itself it has no facility for showing users updates to the data.)
Columns per row?
What happened to "fields per record"?
All that story needs is for the Java classes to be stored as XML inside a database.
Many people have been asking why it needs to support so many columns. This is hugely advantageous for users of Cassandra, because it only supports databases with a single row.
- WHY?
So, what is the label for the 2 billionth column as a base26 number with an initial leading base27 "digit" (i.e. alphabetic name of A-Z[A-Z]* where A is zero except for the first digit where A is 1, AA is 10, AAA is 100)?
Oh, say does that Star-Spangled Banner entwine / The myrtle of Venus with Bacchus's vine?
Developer: "Sure, how many rows did you say that was again?" Asshat: "Umm... 2 billion." Developer: ....
Apparently most geeks, unlike most Greeks, are unfamiliar with who Cassandra was...
I've abandoned my search for truth; now I'm just looking for some useful delusions.
"Previous versions had no set upper limit, though the maximum amount of material that could be held in a single row was approximately 2GB. This upper limit has been eliminated."
Poor writing. I think you meant to say that they exchanged one type of upper limit for another.
... when we built guitar amps that went to eleven.
Have gnu, will travel.
I can't wait to have to fix some 3rd party database app that has millions of columns that was created by some automated tool.
The Kruger Dunning explains most post on
Your script needs tuning.
Why not flip an RDBMS 90 degrees and you have a column-oriented database? Using your rows in your RDBMS as you would use your columns in Cassandra?
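That flip is easy to sketch. Store one RDBMS row per (row key, column name) cell, and a column slice becomes an ordered range query on the primary key. Here is a minimal example with Python's built-in sqlite3. The table and helper names are made up for illustration:

```python
import sqlite3


def make_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # One SQL row per "cell": what a wide row holds sideways, this holds downwards
    conn.execute(
        "CREATE TABLE cells ("
        " row_key TEXT NOT NULL,"
        " column_name TEXT NOT NULL,"
        " value BLOB,"
        " PRIMARY KEY (row_key, column_name))"
    )
    return conn


def put(conn: sqlite3.Connection, row_key: str, column_name: str, value: bytes) -> None:
    # Overwrite semantics: writing the same cell again replaces its value
    conn.execute(
        "INSERT OR REPLACE INTO cells (row_key, column_name, value) VALUES (?, ?, ?)",
        (row_key, column_name, value),
    )


def column_slice(conn: sqlite3.Connection, row_key: str, start: str, count: int) -> list:
    # Up to `count` columns of one "row", in name order, starting at `start`
    cur = conn.execute(
        "SELECT column_name, value FROM cells"
        " WHERE row_key = ? AND column_name >= ?"
        " ORDER BY column_name LIMIT ?",
        (row_key, start, count),
    )
    return cur.fetchall()
```

The composite primary key keeps each row's cells together and ordered by name, so the range query can use the index. That doesn't settle the distribution question, though. This runs on one box.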
PostgreSQL/MySQL can pack more than 2 BILLION ROWS in a single column!
"Don't let fools fool you. They are the clever ones."
And some d*ck out there is going to need 2 billion and 1 columns, it will never end.