Slashdot Mirror


Twitter On Scala

machaut writes "Twitter, one of the highest profile Ruby on Rails-backed websites on the Internet, has in the past year started replacing some of their Ruby infrastructure with an emerging language called Scala, developed by Martin Odersky at Switzerland's École Polytechnique Fédérale de Lausanne. Although they still prefer Ruby on Rails for user-facing web applications, Twitter's developers have started replacing Ruby daemon servers with Scala alternatives, and plan eventually to serve API requests, which comprise the majority of their traffic, with Scala instead of Ruby. This week several articles have appeared that discuss this shift at Twitter. A technical interview with three Twitter developers was published on Artima. One of those developers, Alex Payne, Twitter's API lead, gave a talk on this subject at the Web 2.0 Expo this week, which was covered by Technology Review and The Register."

34 of 324 comments

  1. Should have used PHP. by 0100010001010011 · · Score: 4, Interesting

    Kidding aside, is this a 'nail' in the coffin of scalable Ruby? 5 years ago people were saying the same thing about PHP scaling but Facebook has done a rather nice job of making it scale. Twitter was supposed to be the poster child of how awesome Ruby and RoR were.

    Difference is, Facebook is still using PHP, Twitter is going to Scala.

    1. Re:Should have used PHP. by julesh · · Score: 5, Insightful

      Difference is, Facebook is still using PHP, Twitter is going to Scala.

      PHP was a mature environment when Facebook was launched. RoR was (and still is to a certain extent) a fad environment, popular primarily because of its differentness. People who build sites on a platform because it's the latest thing are less likely to stick with that platform than people who choose a platform that has a solid reputation but is boring. Scala, at a guess, is going to be the next fad platform. Like Ruby, it has some interesting ideas behind it, but it needs a lot of development before we can consider it a stable platform for serious applications, I think.

    2. Re:Should have used PHP. by mini+me · · Score: 4, Insightful

      While Facebook uses PHP where Twitter uses Rails, Facebook uses a plethora of languages to make the whole system work. So Twitter really isn't going to Scala any more than Facebook is going to Erlang. Which is to say that they use the best tool for the job, not one tool for every job.

    3. Re:Should have used PHP. by K.+S.+Kyosuke · · Score: 4, Insightful

      Ruby is a language. Languages usually don't have problems with scalability.

      --
      Ezekiel 23:20
    4. Re:Should have used PHP. by tcopeland · · Score: 5, Insightful

      > Ruby is a language. Languages usually don't have problems with scalability.

      Quite right. An application with 8 million users will have scalability challenges regardless of what type of language opcodes are being executed. At some point it's all about architecture.

    5. Re:Should have used PHP. by iluvcapra · · Score: 4, Insightful

      RoR was (and still is to a certain extent) a fad environment, popular primarily because of its differentness.

      Huh, I generally use it because it has really good ORM and migrations, and I really like the syntax (coming from Objective-C it's pretty slick). I also used the PHP language when I was starting out, but one day it tried to insist that $myArr[0] and $myArr["0"] actually pointed to the same object, and I have refused to deal with it ever since; I also got tired of typing str_sub_case_insensitive_for_real_safe(haystack, needle) -- or is it needle, haystack? And is this one of those prank functions that fails to substitute the value but still returns a value that evals true? Or if I leave out one of those underscores, am I in fact calling a function that behaves almost exactly the same way but fails under difficult-to-reproduce circumstances? Maybe they've fixed this and the other sundry atrocities? Maybe they've stopped trying to make it into Perl, as compiled by a C++ compiler, and tried fashioning it into an actual dynamic language? I know, I know, some people like PHP, but I think arguments for the superiority of PHP over Ruby (or Python or Scala or Lisp or WebObjects or Perl6 or really anything else) are going to rest completely on the skills of the Zend interpreter writers, and almost never on the quality/readability/maintainability of the code, or the ease of the development process. You can write good safe code in PHP, that is true, but it isn't very ergonomic.

      You know, RoR is really good at replacing those old Paradox and FMP database systems. I can see how Facebook might prefer PHP, but people trying to replace little inventory/business-process systems generally only need to support a few dozen users, and don't have an army of developers to keep them running. The Universe is big enough to accommodate the utility of Ruby on Rails and the Twitter developers' stupidity.

      --
      Don't blame me, I voted for Baltar.
    6. Re:Should have used PHP. by caramelcarrot · · Score: 4, Informative

      Facebook doesn't use PHP for the backend, it's mostly C++, Python and Erlang.

    7. Re:Should have used PHP. by TeXMaster · · Score: 3, Interesting

      The problem is that most of those compiler/interpreters suck enormously.

      Exactly. MRI (Matz' Ruby Interpreter) is known to have some serious scalability issues. Interestingly, one of the main issues with MRI comes from the way gcc compiles the big delegator switch in MRI's core, with a large sparse stack that causes ridiculous memory consumption (and sometimes even leaks). There's a set of 8 patches (the MBARI patchset) that drastically improve the situation. The reduced memory footprint and the much smaller stack also give a noticeable speed increase.

      The good news is, these patches are progressively being merged upstream, so it's very likely that future MRI versions will be much better.

      --
      "I'm never quite so stupid as when I'm being smart" (Linus van Pelt)
    8. Re:Should have used PHP. by lotzmana · · Score: 3, Interesting

      I agree, but wish to add a comment about vertical and horizontal scaling.

      Ruby and Python have poor multi-threading. They don't scale well on multi-CPU platforms.

      from the interview:
      Robey Pointer: Green threads don't use the actual operating system's kernel threads.

      So, a Ruby application can't scale well vertically -- one can't just upgrade the machine with more CPUs for example.

      At the same time, no language is inherently prohibiting horizontal scaling, if application design provides for it -- adding more machines onto which the application can run in parallel.

      Twitter could've been designed to permit horizontal scaling. Regrettably the article didn't say much about this approach. They are improving the vertical scalability of the application by switching to first-class threads (via the JVM), but are they not eventually going to hit the limits for vertical scaling?!
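      To make the horizontal-scaling point concrete, here is a minimal Python sketch of hash-based sharding, where each user is pinned to one of several machines. It is an illustration only, not anything Twitter is known to do; the `shard_for` helper, the shard count, and the user names are all assumptions made up for the example.

```python
import hashlib

# Hypothetical number of machines the application is spread across.
NUM_SHARDS = 4


def shard_for(user_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a user to a shard index using a stable hash of the user id."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce modulo the shard count.
    return int.from_bytes(digest[:8], "big") % num_shards


# The same user always lands on the same shard, whichever process computes it.
assignments = {name: shard_for(name) for name in ["alice", "bob", "carol"]}
```

      Adding capacity then means adding shards rather than buying a bigger machine, although plain modulo hashing moves most users when the shard count changes, which is why real systems often prefer consistent hashing.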

    9. Re:Should have used PHP. by K.+S.+Kyosuke · · Score: 3, Insightful

      No, my logic is that Ruby is essentially a member of a well-researched class of languages, for which (the class, not Ruby) high-performance VMs have already been developed (Cincom Smalltalk, Gemstone, Self93, Strongtalk...), but this development was always expensive (this was also the case of Java and .NET, obviously). IMO, your shit(ty?) analogy does not apply here.

      --
      Ezekiel 23:20
    10. Re:Should have used PHP. by hhr · · Score: 4, Informative

      The article is about Ruby on Rails. Ruby on Rails is not just a language. It is a language and a web framework. Frameworks very much affect your scalability.

    11. Re:Should have used PHP. by DuckDodgers · · Score: 3, Informative

      Scala has the significant advantage that it's built on Java and interoperable with Java. Scala source code compiles directly into .class files. You get the speed of the JVM (which is acceptably quick these days), the ability to easily call Java APIs from within Scala, and the ability to run your Scala code on any machine with the JVM.

      It's popular to dislike Java, and even as a well paid Java developer I'm not a huge fan of the language. But Java still is extremely common, and you can even write Java code for your Scala code to use while you're learning Scala.

      Scala also keeps Java's strong static typing and adds functional language features. I don't think it needs any development at all to be adapted for mainstream use.

      On the other hand, as a C++ developer I found learning Java to be child's play. The learning curve from Java to Scala, for me at least, is noticeably steeper. If anything kneecaps Scala I suspect it will be the barrier to entry, not the language itself.

    12. Re:Should have used PHP. by kv9 · · Score: 4, Insightful

      I don't get this buzzword-bingo bullshit about Twitter (or whatever the fuck site-du-jour is) in regards to concurrency and scalability. this is not a complex application, it's something that you code one afternoon (in Java/PHP) then throw it in a rack full of HTTP server nodes, a load balancer (shit, even RR-DNS will do) and a RAMSAN for the DB. that's it. stop the drama.

  2. Who gives a shit about twitter? by snowraver1 · · Score: 5, Insightful

    Seriously.

    --
    Copyright 2010. All rights reserved. This comment may not be copied in any way including, but not limited to caching.
  3. Useful! by castorvx · · Score: 5, Insightful

    Twitter using new-and-fancy programming languages has a way of load testing them for all of us.

    I'm not sure I'd have the balls to take a 5 year old development platform/framework and drop it into something that sees so much traffic. Hopefully they share their experiences in some form.

  4. Re:Proving that.. by AKAImBatman · · Score: 3, Insightful

    Twitter's developers care more about being cool and hip

    Not to be too pedantic, but doesn't that sum up Twitter as a whole?

  5. Good thinking, by geekoid · · Score: 3, Funny

    replace one language that wasn't tested on that scale and replace it with another one that wasn't tested on that scale.

    Good thinking~

    Oh look, twitter is down..again.

    --
    The Kruger Dunning explains most post on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
    1. Re:Good thinking, by The+Slashdolt · · Score: 3, Interesting

      Read this and all will become clear:
      Event-Based Programming without Inversion of Control

      --
      mp3's are only for those with bad memories
    2. Re:Good thinking, by burris · · Score: 4, Interesting

      Maybe they use Scala because writing Java code is painful by comparison. Tons of boilerplate, every exception has to be caught in every scope, no pattern matching, no named arguments, and on and on. For people like me, without Scala the JVM wouldn't even be under consideration, though I admit that Java has been more usable since it got generics.

  6. Re:Scala seems to be Java+/- by geekoid · · Score: 3, Insightful

    It's new! It's hot! it's what all the kids are doing! It's what crappy programmers can pretend to adapt to instead of writing solid code!

    weeee!!!

    --
    The Kruger Dunning explains most post on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
  7. Re:Proving that.. by joe_bruin · · Score: 3, Insightful

    Twitter's developers care more about being cool and hip and using the latest tool so that they remain popular, than they do about having a site that stays up 7 days a week.

    Exactly. Scalability problems arise from poor implementation, not from language choices. Scalable platforms have been implemented in the past with PHP, ASP, Perl, C, Java, and I'm sure with Ruby, Python, or your favorite new language. Twitter is a massive-scale site, they should be looking at deep engineering, not a buzzword platform that promises easy scalability for dummies.

    Scala may help them alleviate problems they've hit in the Rails framework. What will help them with the problems they hit in Scala?

  8. Re:Scala seems to be Java+/- by A.K.A_Magnet · · Score: 4, Insightful

    I read between the lines that you call C or C++ solid-code, and if I'm not mistaken, you will find that the kids are doing Scala because the code is more solid. Scala benefits from a typing system close to OCaml's which makes Scala code very, very solid -- especially if you keep away Java specifics (such as nullable objects) in your code and take special care when interacting with Java libs that may do so.

    If I'm mistaken and you're not talking about C/C++, I hope you are not talking about dynamic languages which offer no guarantee whatsoever; you know as a developer I enjoy actually spending my time on working on the business side of my application -- and how to make it scalable, rather than working on low-level specifics and on testing if every pointer is null before dereferencing it. A type system that does this for me (which Scala or ML's parametrized type Option allows) is bliss.

    Now, I'm not going to enumerate every language under the sun to see what code you call solid, I guess your answer would be that the code is solid whatever the language it's written in. In the end, it all comes down to binary instructions, right? The question is: how many guarantees do the tools give you? In the case of Scala's compiler, it gives you a lot AND offers you a very enjoyable, lightweight yet powerful syntax.
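    For readers who have not met an Option type, the idea can be roughly sketched in Python. This is only an approximation: Scala's compiler enforces the distinction, whereas Python would need an external type checker such as mypy to do so. The `Some`, `Nothing`, and `Option` names mirror the concept but are defined here for illustration, and `find_display_name` is a hypothetical lookup.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Some:
    """Wraps a value that is definitely present."""
    value: str


@dataclass(frozen=True)
class Nothing:
    """Marks an absent value explicitly instead of using None."""


# Hypothetical alias: an Option is either Some(value) or Nothing().
Option = Union[Some, Nothing]


def find_display_name(users: Dict[str, str], handle: str) -> Option:
    """Look up a handle; absence is visible in the return type."""
    if handle in users:
        return Some(users[handle])
    return Nothing()


def greeting(result: Option) -> str:
    """Handle both cases before touching the wrapped value."""
    if isinstance(result, Some):
        return "Hello, " + result.value
    return "Hello, stranger"


users = {"alice": "Alice"}
known = greeting(find_display_name(users, "alice"))
unknown = greeting(find_display_name(users, "nobody"))
```

    The missing case is a value of its own type rather than a null that may or may not be checked, which is the guarantee being praised above.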

  9. Aw, Java and Python had a baby! by gpig · · Score: 5, Funny

    Isn't she cute :)

  10. Scala is great by burris · · Score: 5, Interesting

    If I want to use any Java software then I'll use Scala. I see people bashing Scala, saying the languages they know are good enough or they can just use jython/jruby/groovy, but they clearly know little about Scala.

    One thing that's nice about Scala that Java, Jython, JRuby, and Groovy all lack is its powerful type system and pattern matching. Once you get used to good pattern matching like in Scala, SML, OCaml, or Haskell you won't want to go back. Plus you get all the benefits of running on the JVM at high speed (unlike all the aforementioned JVM languages, except Java itself.)

    Honestly, you should check out Scala before you bash it. It's a very good choice wherever you might choose Java, which is a good choice for the back end. Twitter's developers are smart and experienced. They didn't choose Scala just to be cool. It is a powerful tool that can get the job done in an elegant way.

  11. Mod down by Anonymous Coward · · Score: 5, Funny

    OP is just a twitter sock puppet.

  12. Re:Scala? Not for me! by Anonymous Coward · · Score: 4, Funny

    You know, at first I thought this post was off-topic, because it's an incredibly pointless life update that none of us could possibly care about.

    Then I realized that the story is about Twitter, and suddenly I think it's the most relevant post so far.

  13. There you go again! by Jane+Q.+Public · · Score: 5, Informative

    Ruby does not have a problem scaling. Neither, for that matter, does even Rails. (As the companies that run Basecamp, Campfire, LinkedIn, Lighthouse, and many others will tell you.)

    The fact is that the Twitter folks tried to write their own message queue in Ruby, when there was absolutely no reason to do so: there were plenty of pre-made message queues already available for Ruby, and already optimized. Not only did they choose to write their own, unnecessarily, they did it badly.

    And not only that, but Alex Payne has a hidden agenda: he is trying to push Scala to boost interest in the book about Scala he just wrote!

    Please get some facts before digging up this long-dead and well-buried "Ruby or Rails doesn't scale" bullshit again.

    1. Re:There you go again! by Radhruin · · Score: 5, Interesting

      Anyone who thinks Ruby on Rails can't scale is as dogmatic in their anti-hype as the original hypers were. The right tool for the right job and all that.

    2. Re:There you go again! by tieTYT · · Score: 4, Informative

      Jane Q. Public: Either you didn't read the comments of that blog or you're spreading FUD. Here is a comment from Alex Payne from that article:

      Hoo boy. First of all, I hope you've had a chance to read my general reply to the articles about my Web 2.0 Expo talk [1] and this response to a vocal member of the Ruby community [2]. I sound like a pretty unreasonable guy filtered through the tech press and Reddit comments, but I hope less so in my own words.

      Secondly, the quote at the top of your post is from my coworker, Steve Jenson, who's been participating in the discussion on this post.

      On JRuby: as Steve said, we can't actually boot our main Rails app on JRuby. That's a blocker. Incidentally, if you know of anyone who has a large JRuby deployment, we'd be interested in that first-hand experience. If you don't, it might be a little early to say it would solve all our problems.

      It's also incorrect to say that the way JRuby and Scala make use of the JVM is exactly the same. Much like our other decisions haven't been arbitrary, our decision to use Scala over other JVM-hosted languages was based on investigation.

      On our culture: if you'd like to know about how we write code, or how our code has evolved over time, just ask us. We're all on Twitter, of course, but most of the engineers also have blogs and publish their email addresses. There's no need to speculate. Just ask. There's not a "raging debate" internally because we make our engineering decisions like engineers: we experiment, and base our decisions on the results of those experiments.

      It's definitely true that Starling and Evented Starling are relatively immature queuing systems. I was eager to get them out of our stack. So, as Steve said, we put all the MQ's you think we'd try through their paces not too long ago, and we knocked one after another over in straightforward benchmarks. Some, like RabbitMQ, just up and died. Others chugged on, but slowly. Where we ran into issues, we contacted experts and applied best practices, but in the end, we found that Kestrel fit our particular use cases better and more reliably. This was not the hypothesis we had going into those benchmarks, but it's what the data bore out.

      We get a lot of speculation to the tune of "why haven't those idiots tried x, it's so obvious!" Generally, we have tried x, as well as y and z. Funnily enough, I was actually pushing to get us on RabbitMQ, but our benchmarks showed that it just wouldn't work for us, which is a shame, because it advertises some sexy features.

      Personally, I'm extremely NIH-averse; I research open source and commercial solutions before cutting a new path. In the case of our MQ, one of our engineers actually wrote Kestrel in his free time, so it was a bit more like we adopted an existing open source project than rolled our own. Pretty much the last thing we want to be doing is focusing on problems outside our domain. As it so happens, though, moving messages around quickly is our business. I don't think it's crazy-go-nuts that we've spent some time on an MQ.

      I hope my colleagues and I have been able to answer some of your questions. As I said, in the future, please consider emailing us so we can share our experience. Then, we can have a public discussion about facts, not speculation. Perhaps, as commenter sethladd suggested, the onus is on us to produce a whitepaper or presentation about our findings so as to stave off such speculation. Time constraints are the main reason why we haven't done so.

      [1] http://al3x.net/2009/04/04/reasoned-technical-discussion.html
      [2] http://blog.obiefernandez.com/content/2009/04/my-reasoned-response-about-scala-at-twitter.html#IDComment18212539

  14. Here is an interesting discussion on alternatives by tieTYT · · Score: 5, Informative

    http://unlimitednovelty.com/2009/04/twitter-blaming-ruby-for-their-mistakes.html

    This blog post takes the attitude that Twitter didn't move to Scala because ROR had a problem, but because the in-house messaging system Twitter created performed poorly. The author does not work at Twitter but many of the Twitter developers (including Alex Payne) respond in the comments. I found the article to be very interesting and the comments even more so. They give a sense of how much research Twitter did before this change.

  15. Re:Proving that.. by burris · · Score: 3, Informative

    According to the rebuttals in the comments of the blog post in one of my sibling posts here, part of Twitter's scalability problem was poor implementation of the Ruby interpreter. Lots of small objects cause the heap to get fragmented and eventually it runs out of memory. Java interpreters have better GC and you can swap out different GC algorithms in some of them.

    Why does everyone assume the people at Twitter are a bunch of newbies who don't know about deep engineering? Is it just because their analysis didn't lead them to your preferred buzzword?

  16. Psht. by kkrajewski · · Score: 5, Funny

    I program in PDP-11 assembly, which is then translated into C, compiled into Java bytecode, and executed on a JVM. I call it Assemblacava, and it's the wave of the future.

  17. Actually, this is pretty complex by Stu+Charlton · · Score: 5, Interesting

    Twitter is not a trivial application to scale, considering the wide disparity in listeners to follower ratios, that views are dynamically generated by interpolating many-to-many message streams, and that each message is persistent forever.

    As an analogy, it's like managing an IRC server with persistent messages that are full-text indexed, with one channel per user, where an unlimited number of users can join each other's channels. When you join a new user's channel, your chat log is automatically (and quickly) re-woven with messages from that channel according to the relative time series of these messages. And there's a global channel that everyone can watch to see what any user in any channel is saying at any time.

    Now do this, all the while avoiding netsplits (i.e. missing messages), allowing retraction of almost any message, recent or historical, and ensuring the channel history (eventually) reflects that change. And handle sudden bursts of activity among unpredictable sets of channels because they're all attending the same conference, or a burst of network-wide high activity because people are watching the World Cup or Obama's inauguration.

    The point is that, while the idea is simple, the variability of use and disparity of activity is what makes life interesting; the messaging & DB architecture that works well for recent activity, for example, doesn't help for having reasonable persistent random-access to historical messages.

    In all, Twitter has gotten a *lot* more reliable the past several months than it was a year ago.
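    The "re-weaving" step in the analogy can be sketched as a k-way merge of per-user streams that are each already sorted by time. This is a toy Python illustration, not Twitter's architecture: the `Message` fields and the `merged_timeline` helper are assumptions for the example, and it ignores persistence, retractions, and scale entirely.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Message(NamedTuple):
    timestamp: int  # hypothetical field: seconds since some epoch
    author: str
    text: str


def merged_timeline(streams: Iterable[Iterable[Message]]) -> Iterator[Message]:
    """Lazily merge streams that are each already sorted by timestamp."""
    return heapq.merge(*streams, key=lambda m: m.timestamp)


alice = [Message(1, "alice", "hello"), Message(4, "alice", "lunch?")]
bob = [Message(2, "bob", "hi"), Message(3, "bob", "at the conference")]

# Following both users yields one view ordered by time.
timeline = list(merged_timeline([alice, bob]))
```

    The merge itself is cheap; the hard part described above is doing it for every reader, with wildly different follower counts, while the underlying streams keep changing.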

    --
    -Stu
  18. scala vs erlang by xkcd150 · · Score: 3, Informative

    i figured a lot of people would mention erlang, and thought someone might be interested in this writeup i read the other day http://yarivsblog.com/articles/2008/05/18/erlang-vs-scala/