How Twitter Is Moving To the Cassandra Database
MyNoSQL has posted an interview with Ryan King on how Twitter is transitioning to the Cassandra database. Here's some detailed background on Cassandra, which aims to "bring together Dynamo's fully distributed design and Bigtable's ColumnFamily-based data model." Before settling on Cassandra, the Twitter team looked into: "...HBase, Voldemort, MongoDB, MemcacheDB, Redis, Cassandra, HyperTable, and probably some others I'm forgetting. ... We're currently moving our largest (and most painful to maintain) table — the statuses table, which contains all tweets and retweets. ... Some side notes here about importing. We were originally trying to use the BinaryMemtable interface, but we actually found it to be too fast — it would saturate the backplane of our network. We've switched back to using the Thrift interface for bulk loading (and we still have to throttle it). The whole process takes about a week now. With infinite network bandwidth we could do it in about 7 hours on our current cluster." Relatedly, an anonymous reader notes that the upcoming NoSQL Live conference, which will take place in Boston on March 11th, has announced its lineup of speakers and panelists, including Ryan King and folks from LinkedIn, StumbleUpon, and Rackspace.
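The interview doesn't show how Twitter throttles its Thrift bulk load, so the following is only a minimal sketch of one common technique for it, a token bucket, under stated assumptions. `write_batch` is a hypothetical stand-in for whatever client call actually sends rows to the cluster, and the rate and batch size are made-up parameters, not Twitter's. The idea is that the loader may burst up to `capacity` rows but averages no more than `rate` rows per second, so it cannot saturate the network the way an unthrottled loader can.

```python
import time
from typing import Callable, Iterable, List


class TokenBucket:
    """Token-bucket limiter: averages `rate` operations per second,
    allowing bursts of up to `capacity` operations."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity          # start with a full bucket
        self.clock = clock              # injectable so the limiter is testable
        self.sleep = sleep
        self.last = clock()

    def _refill(self) -> None:
        # Add tokens for the time elapsed since the last refill, capped at capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def acquire(self, n: float = 1.0) -> None:
        """Block until n tokens are available, then spend them."""
        if n > self.capacity:
            raise ValueError("request exceeds bucket capacity and could never be served")
        while True:
            self._refill()
            if self.tokens >= n:
                self.tokens -= n
                return
            # Sleep just long enough for the missing tokens to accumulate.
            self.sleep((n - self.tokens) / self.rate)


def throttled_load(rows: Iterable[dict],
                   write_batch: Callable[[List[dict]], None],
                   bucket: TokenBucket, batch_size: int = 100) -> int:
    """Send rows in batches, spending one token per row before each write.
    `write_batch` is a hypothetical placeholder for the real client call."""
    batch: List[dict] = []
    sent = 0
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            bucket.acquire(len(batch))
            write_batch(batch)
            sent += len(batch)
            batch = []
    if batch:  # flush the final partial batch
        bucket.acquire(len(batch))
        write_batch(batch)
        sent += len(batch)
    return sent
```

With `rate=100` and `capacity=100`, loading 1,000 rows in batches of 100 sends the first batch immediately and then waits about one second before each of the remaining nine. The same arithmetic explains the week-versus-7-hours remark above: when throughput is capped by a throttle rather than by the cluster, load time scales with the cap, not with the hardware.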
First time I have ever heard anyone say that a database was too fast. Maybe there are network problems that also need to be addressed.
Scaling. If something turns out to be robust and fast enough for Twitter, it is definitely of interest to anyone working on significantly large and busy websites.
There's no disputing matters of taste and color (de gustibus et coloribus non est disputandum).
I think their point is that not everything needs an RDBMS, whereas before, an RDBMS was the go-to method of storing data.
Or: use the right tool for the job. The only difference is, now alternative tools actually exist.
No way. Their architecture is as much "best guess" engineering as Facebook's. I don't think that's what engineering actually is: "Maybe this one will work?"
In the meantime, I have not been able to update my avatar image on Twitter, and a TwitPic-like feature is still a faint glimmer in Twitter's amateur eyes. Speaking of missed opportunities, why drive so much traffic to Twitter parasites such as Bit.ly, TwitPic, TinyURL, Twitition, and TwitLonger?
The real question should be: what in the world are Twitter's engineers actually DOING?
Kriston