How Twitter Is Moving To the Cassandra Database
MyNoSQL has up an interview with Ryan King on how Twitter is transitioning to the Cassandra database. Here's some detailed background on Cassandra, which aims to "bring together Dynamo's fully distributed design and Bigtable's ColumnFamily-based data model." Before settling on Cassandra, the Twitter team looked into: "...HBase, Voldemort, MongoDB, MemcacheDB, Redis, Cassandra, HyperTable, and probably some others I'm forgetting. ... We're currently moving our largest (and most painful to maintain) table — the statuses table, which contains all tweets and retweets. ... Some side notes here about importing. We were originally trying to use the BinaryMemtable interface, but we actually found it to be too fast — it would saturate the backplane of our network. We've switched back to using the Thrift interface for bulk loading (and we still have to throttle it). The whole process takes about a week now. With infinite network bandwidth we could do it in about 7 hours on our current cluster." Relatedly, an anonymous reader notes that the upcoming NoSQL Live conference, which will take place in Boston March 11th, has announced their lineup of speakers and panelists including Ryan King and folks from LinkedIn, StumbleUpon, and Rackspace.
heh heh heh.
Nostalgia's not what it used to be.
I hear Cassandra can even predict when disastrous system failures are going to occur! Unfortunately, for some reason nobody ever believes the warnings.
First time I have ever heard anyone say that a database was too fast. Maybe there are network problems that also need to be addressed.
Why is it that whenever Twitter makes any random change to some part of its infrastructure, we need a front-page story about it?
who cares what twuufter is running off.
The more interesting aspect of this whole 'NoSQL' movement is the belief that achieving some speed improvement over some relational databases makes these systems so much better.
If you don't really need a database to run your 'website', then who cares if you use flat files or an in memory hashmap for all your data needs? Databases are not being replaced by NoSQL in projects that need databases. The projects that may not have ever needed databases may benefit by this NoSQL idea, but if you actually need a database... well, you better be really good at working around all kinds of problems that this will create for you.
I think that relational databases are good at what they do and that many projects may not need them, but if you do need them on the back end, you will end up with them on the back end. Of course there may be some caching/hashmaps/files on the front end, but at the back, stuff will be sorted out within a real datastore.
Is there really a huge issue with rdbms speeds? Well if there is something there, that's what needs to be looked at. If RDBMSs are not fast enough, that's just an opportunity to work more on them to speed them up.
You can't handle the truth.
Twitter's only moving to this new database written in Java because everyone else is.
Mod me down, my New Earth Global Warmingist friends!
I hear Cassandra is really a trojan. Can anyone verify? I don't want a trojan on my computer.....
LedgerSMB: Open source Accounting/ERP
I love how ass backwards twitter has always been with learning how to scale their 90s infrastructure up. I remember when they called out the Ruby community because they didn't understand MySQL replication and memcached.
I guess without a profit model they couldn't use a real RDBMS like Oracle. EFD (Enterprise Flash Drive) support anyone? 11g supports EFD on native SSD block-levels. Write scale? How about 1+ million transactions/sec on a single node Oracle DB using <$100K worth of equipment and licenses? Anyway, I've built HUGE databases for a long time, odds are most of you have interfaced with them. Just because it's free and open-source doesn't make it cheap.
I love FOSS, don't get me wrong, but best-in-class is best-in-class. I only use FOSS when it happens to be best-in-class. I laugh at how none of the requirements included disaster recovery. No single point of failure does not preclude failing at every point simultaneously. EMP bomb at your primary datacenter, anyone?
Until recently I thought the same way: I would never endorse a solution that involves Java. However, I recently came to the same realization that Sun did when they created it. Java is a fantastic way to oversell gobs of expensive hardware. I am a system administrator, so the more hardware it takes to run a solution the better off I am: more machines, more money, and better job security. So I have now fully jumped on the Java bandwagon. Java makes me smile.
Got Code?
Sure - but I think the whole point is that you'd be smiling even more if they were using one of the modern & trendy dynamic languages because you'd likely have 2 - 3 times the amount of hardware to look after. I'm not sure what alternative you would propose that uses less hardware but there actually aren't many that are better than the JVM these days.