Twitter On Scala
machaut writes "Twitter, one of the highest profile Ruby on Rails-backed websites on the Internet, has in the past year started replacing some of their Ruby infrastructure with an emerging language called Scala, developed by Martin Odersky at Switzerland's École Polytechnique Fédérale de Lausanne. Although they still prefer Ruby on Rails for user-facing web applications, Twitter's developers have started replacing Ruby daemon servers with Scala alternatives, and plan eventually to serve API requests, which comprise the majority of their traffic, with Scala instead of Ruby. This week several articles have appeared that discuss this shift at Twitter. A technical interview with three Twitter developers was published on Artima. One of those developers, Alex Payne, Twitter's API lead, gave a talk on this subject at the Web 2.0 Expo this week, which was covered by Technology Review and The Register."
Kidding aside, is this a 'nail' in the coffin of scalable Ruby? 5 years ago people were saying the same thing about PHP scaling, but Facebook has done a rather nice job of making it scale. Twitter was supposed to be the poster child of how awesome Ruby and RoR were.
Difference is, Facebook is still using PHP, Twitter is going to Scala.
Well... just to name a few:
This, of course, does not make Twitter a panacea, but it certainly makes it interesting enough to warrant the occasional Slashdot article.
If I want to use any Java software then I'll use Scala. I see people bashing Scala, saying the languages they know are good enough or they can just use jython/jruby/groovy, but they clearly know little about Scala.
One thing that's nice about Scala that Java, Jython, JRuby, and Groovy all lack is its powerful type system and pattern matching. Once you get used to good pattern matching like in Scala, SML, OCaml, or Haskell you won't want to go back. Plus you get all the benefits of running on the JVM at high speed (unlike all the aforementioned JVM languages except Java itself).
Honestly, you should check out Scala before you bash it. It's a very good choice wherever you might choose Java, which is a good choice for the back end. Twitter's developers are smart and experienced. They didn't choose Scala just to be cool. It is a powerful tool that can get the job done in an elegant way.
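For anyone who hasn't seen it, here is a minimal sketch of the kind of pattern matching meant above. The message types (Text, Follow, Ping) and the 140-character guard are made up for illustration and are not taken from Twitter's code:

```scala
// Hypothetical message types, invented for illustration only.
sealed trait Message
case class Text(user: String, body: String) extends Message
case class Follow(follower: String, followee: String) extends Message
case object Ping extends Message

object PatternMatchingSketch {
  // Because Message is sealed, the compiler can warn when a case is left unhandled.
  def describe(msg: Message): String = msg match {
    // A guard narrows the first case to over-long bodies.
    case Text(user, body) if body.length > 140 => s"$user sent an over-long message"
    // Fields are destructured directly in the pattern.
    case Text(user, body) => s"$user says: $body"
    case Follow(follower, followee) => s"$follower now follows $followee"
    case Ping => "ping"
  }
}
```

The equivalent in Java of that era would typically be a chain of instanceof checks and casts, which is roughly the gap being described.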
Read this and all will become clear:
Event-Based Programming without Inversion of Control
mp3's are only for those with bad memories
Anyone who thinks Ruby on Rails can't scale is as dogmatic in their anti-hype as the original hypers were. The right tool for the right job and all that.
Maybe they use Scala because writing Java code is painful by comparison. Tons of boilerplate, every exception has to be caught in every scope, no pattern matching, no named arguments, and on and on. For people like me, without Scala the JVM wouldn't even be under consideration, though I admit that Java has been more usable since it got generics.
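A rough sketch of two of those points, named arguments and the absence of checked exceptions. QueueConfig and its fields are hypothetical names invented for this example, not anything from a real library:

```scala
import scala.util.Try

// Hypothetical configuration type; a case class gets equals, hashCode,
// toString and a constructor without any hand-written boilerplate.
case class QueueConfig(host: String = "localhost", port: Int = 8080, maxRetries: Int = 3)

object BoilerplateSketch {
  // Named arguments: override only the fields that differ from the defaults.
  val config: QueueConfig = QueueConfig(port = 9090, maxRetries = 5)

  // Scala has no checked exceptions, so no throws clause or forced try/catch.
  // Try captures the NumberFormatException as a value instead.
  def parsePort(raw: String): Option[Int] = Try(raw.trim.toInt).toOption
}
```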
Please get some facts before digging up this long-dead and well-buried "Ruby or Rails doesn't scale" bullshit again.
Says the person getting his/her 'facts' from Obie Fernandez's insane flamebait blog post? Holy crap, dude!
When the Twitter folks wrote their own message queue, there were very limited options on the market. Seeing as Obie Fernandez has failed to even begin to explain, in technical terms (rather than saying "... made me throw up in my mouth"), what's wrong with their implementation, forgive me if I don't consider this damning evidence.
Moreover, if you're going to reference Basecamp, Campfire, Lighthouse, et al., perhaps you should also reference the ridiculous effort and resources that they expend in scaling Rails?
Rabid fanboyism does Rails and Ruby a disservice. I wouldn't touch that community with a 10-foot pole.
From the Scala website:
In what parallel universe is it difficult to build a message queue capable of handling 83 messages per second? I built a fault-tolerant group message passing system 10 years ago that handled 30,000 messages per second on a dinky machine. Hell, Oracle's built-in message queue system can handle more than 83 messages per second with ACID!
I will never, ever, ever understand the engineering choices of the Twitter team.
pageviews do not suddenly get easier to service because that page has a video on it.
No, they get considerably harder. Hulu, if I remember correctly, dynamically alters the bitrate to compensate for slow connections and improve the quality on faster ones. It also puts a load on the server that's throwing out the video regardless of the bitrate.
You're using Alexa's rank rather than their pageviews, which shows a considerably different picture. Also, unless I'm missing something, LinkedIn is written in Java, not Ruby on Rails; it just had a rather high-profile experiment with Ruby on Rails and a Facebook app.
Twitter is not a trivial application to scale, considering the wide disparity in listener-to-follower ratios, that views are dynamically generated by interpolating many-to-many message streams, and that each message is persistent forever.
As an analogy, it's like managing an IRC server with persistent messages that are full-text indexed, with one channel per user, where an unlimited number of users can join each other's channels. When you join a new user's channel, your chat log is automatically (and quickly) re-woven with messages from that channel according to the relative time series of these messages. And there's a global channel that everyone can watch to see what any user in any channel is saying at any time.
Now do this, all the while avoiding netsplits (i.e. missing messages), allowing retraction of almost any message, recent or historical, and ensuring the channel history (eventually) reflects that change. And handle sudden bursts of activity among unpredictable sets of channels because they're all attending the same conference, or a burst of network-wide high activity because people are watching the World Cup or Obama's inauguration.
The point is that, while the idea is simple, the variability of use and disparity of activity is what makes life interesting; the messaging & DB architecture that works well for recent activity, for example, doesn't help for having reasonable persistent random-access to historical messages.
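To make the "re-weaving" in the analogy concrete, here is a deliberately naive sketch that merges per-user message streams by timestamp at read time and drops retracted messages. Every name in it (Msg, TimelineSketch, timeline) is hypothetical, and it is not a description of how Twitter actually does this:

```scala
// Hypothetical types for illustration; nothing here reflects Twitter's real schema.
case class Msg(id: Long, author: String, timestamp: Long, text: String)

object TimelineSketch {
  // Merge the streams of every followed author, newest first,
  // skipping any message whose id has been retracted.
  def timeline(
      streams: Map[String, Seq[Msg]],
      following: Set[String],
      retracted: Set[Long],
      limit: Int
  ): Seq[Msg] =
    following.toSeq
      .flatMap(author => streams.getOrElse(author, Seq.empty))
      .filterNot(m => retracted.contains(m.id))
      .sortBy(m => -m.timestamp)
      .take(limit)
}
```

This version reads and sorts every message from every followed user on each request, so its cost grows with the full history being followed. That is fine for a toy and is exactly where the disparity between users starts to hurt.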
In all, Twitter has gotten a *lot* more reliable the past several months than it was a year ago.
-Stu