Slashdot Mirror


Cassandra Rewritten In C++, Ten Times Faster

urdak writes: At Cassandra Summit opening today, Avi Kivity and Dor Laor (who had previously written KVM and OSv) announced ScyllaDB — an open-source C++ rewrite of Cassandra, the popular NoSQL database. ScyllaDB claims to achieve a whopping 10 times more throughput per node than the original Java code, with sub-millisecond 99%ile latency. They even measured 1 million transactions per second on a single node. The performance of the new code is attributed to writing it in Seastar — a C++ framework for writing complex asynchronous applications with optimal performance on modern hardware.

16 of 341 comments (clear)

  1. %ile? Are We Texting? by Anonymous Coward · · Score: 5, Insightful

    Seriously. WTF?

  2. Garbage collected virtual machines! by Anonymous Coward · · Score: 5, Insightful

    Almost as fast as native! Maybe even faster for some tasks!

    sure

    1. Re:Garbage collected virtual machines! by Luke+Wilson · · Score: 5, Insightful

      Most of what they've done seems to be rearchitecting, not getting a simple speed boost from using an unmanaged language. They're bypassing the OS to get more locality and cache retention. Those problems would not be addressed by merely rewriting in C++.

      For one, they've replaced the OS network stack with an in-process one, where each thread gets its own NIC queue so they can have "zero-copy, zero-lock, and zero-context-switch[es]"

      They're also keeping more data in memory and eschewing relying on the the OS file cache. It seems like they're taking every opportunity to use the in memory representation to avoid using sstables. They try harder than Cassandra to update instead of invalidate that cache on writes.

    2. Re: Garbage collected virtual machines! by O('_')O_Bush · · Score: 4, Insightful

      It is a reasonable assumption that at most people writing in C++ made it past the first week of a CS101.

      --
      while(1) attack(People.Sandy);
  3. Because it was written in Seastar or C++ by Anonymous Coward · · Score: 2, Insightful

    This is the trademark reason why Java shouldn't be used in performance sensitive environments in the first place.

    As for would it have been any faster if it was written in C or straight ASM, probably not worth chasing down that extra 1%. Generally the justification for straight C or ASM is to remove runtime bloat, and you'd first have to give up using any frameworks to get there.

    Just to remind potential programmers. Lean C before you learn any other programming language, otherwise you will not understand why your code's performance is terrible.

    1. Re: Because it was written in Seastar or C++ by Anonymous Coward · · Score: 5, Insightful

      Cassandra is nothing to sneeze out since it outperforms other db-engines (which are written in C, like MySQL).

      Cassandra and MySQL are very different types of databases designed to handle different tasks. It's like saying a hammer is better than the saw without mentioning what job needs to be done with it.

    2. Re:Because it was written in Seastar or C++ by Dutch+Gun · · Score: 4, Insightful

      Just to remind potential programmers. Lean C before you learn any other programming language, otherwise you will not understand why your code's performance is terrible.

      It may not be apparent even then. Java looks an awful lot like C++ at the code level. So... what's different? Java (and other managed languages like C#) have a bunch of neat features like reflection and automatic memory management, which inherently comes at the cost of runtime efficiency. Simply learning C or C++ won't point out exactly why those languages are so much faster than managed languages. You can write nearly the same code in C++, Java, and C#, and you'll see C++ win performance benchmarks - at least in all but the most contrived examples.

      Among the more significant differences are that C++ compilers are extremely good at optimizing, and C++ code generally compiles down to better cache-coherent structures than other languages. The difference is in the language itself, which adheres to a zero-cost principle, in that you don't pay for features you don't use. A lot of C++ abstractions are eliminated *entirely* at runtime, and are only used to protect the code's integrity during the compilation phase. We were told for years that native-equivalent performance was just around the corner or even already here, and it just never really happened outside of small, contrived benchmarks.

      I don't think it's necessary to always learn C or C++ first, although I do think it's worthwhile to learn it at some point, simply because there's a lot of it out there. I'm primarily a C++ programmer myself, but I tend to be a bit more pragmatic about language preference. Use the language that's right for the job. For example, C is a *horrible* choice if you're writing a simple application that needs to do a bunch of string processing. In many cases, high performance isn't even a consideration, rather than correctness, security, and development speed.

      --
      Irony: Agile development has too much intertia to be abandoned now.
    3. Re:Because it was written in Seastar or C++ by fyngyrz · · Score: 4, Insightful

      For example, C is a *horrible* choice if you're writing a simple application that needs to do a bunch of string processing. In many cases, high performance isn't even a consideration, rather than correctness, security, and development speed.

      That is only true if you haven't written a string processing library. Which pretty much anyone who is going to address tasks like this will do, presuming they just don't go out and find one already written. Same thing for lists, dictionaries, trees, GEOdata, IPs, etc. Whatever. There's nothing that says one has to use C's built-in model for strings, either. Make a better one. It was one of the first things I did, and I did it in assembler, as soon as I ran into the convention of an EOT embedded in the actual text being the end marker -- I thought it was stupid then, and I didn't think a zero was any smarter when C first came to my attention lo those many decades ago. It's also a bear trap anyone can throw a bear into with regard to vulnerabilities -- one that can be entirely obviated by a decent string handling module.

      C isn't a bad language to do *anything* in. It's just a language that requires you to be competent, or better, and to address it through the lens of that competence in order to get enough out of it to make the result and the effort expended worth the candle. And no, if the programmer doesn't write in such a way as to almost always create generally reusable components, I'd not be willing to apply the appellation "competent" to the programmer.

      C's key inherent characteristics are portability, leanness and close-to-the-metal speed. It doesn't hold your hand. It's a language for experienced, skilled programmers when we're talking about creating actual products that are expected to perform in the wild. Lean code isn't nearly the issue it used to be, but it's still "nice" to have.

      --
      I've fallen off your lawn, and I can't get up.
    4. Re:Because it was written in Seastar or C++ by phantomfive · · Score: 3, Insightful

      I would say that 95% of all people I know in person, who learned C first and not: Assembler, Pascal, SmallTalk, Lisp are extremely bad on advanced language concepts like functional or oo programming. Most of them shifted to scripting and operating servers and don't "code". A minority is doing embedded programming in C++ which mainly looks like C.

      Almost no one learns to program in assembler, Pascal, SmallTalk, or Lisp as their first language these days. It's all Python now, or Java.

      --
      "First they came for the slanderers and i said nothing."
  4. I find it depressing... by Anonymous Coward · · Score: 3, Insightful

    I find it depressing that so little attention is paid to efficient computing. People now just throw memory and cycles at problems because they can with passable results. But I wonder how much more we could get out of our machines if software was carefully crafted from bottom to top.

  5. Re:It's a miracle! C++ makes disks spin 10x faster by garethjrowlands · · Score: 5, Insightful

    Databases used to be disk bound, sure. But these days we have huge RAM caches and SSDs - no spinning disks. It's very common for the vast majority of requests to be served entirely from cache. Read the guys' site - it looks like they know what they're doing.

    Imagine if Redis was ten times slower or ten times faster. It would matter.

  6. Rewrites are easier than the first strike by angel'o'sphere · · Score: 4, Insightful

    Wow, two years ago everyone here told us that NoSQL is evil and tried to convince us that we should stick to MySQL.

    Now everyone tells us Java is evil, because a rewrite in C++ is faster.

    What a surprise.

    If I would rewrite Cassandra from scratch, in Java, it also would be faster than the actual code.

    Why? Because all the learning the original team did over a course of a decade I can reuse and improve on.

    Keep in mind, the rewrite uses a new framework and new concepts for concurrency. Concurrency is one of the core areas where computing in future will certainly make lots of progress.

    I for my part I'm waiting for a Lucene rewrite, regardless in what language. Probably the worst OSS code I have ever see ... actually the worst code regardless of OSS or closed source.

    --
    Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
    1. Re:Rewrites are easier than the first strike by Zero__Kelvin · · Score: 1, Insightful

      "If I would rewrite Cassandra from scratch, in Java, it also would be faster than the actual code."

      Great (In the unlikely event you could actually single-handedly rewrite it at all.) Now make it 10 times faster. Nice Red Herring post though!

      --
      Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
  7. Re:Lies! by Anonymous Coward · · Score: 2, Insightful

    And back in 1997 I remember telling a C.S. prof. that java was running like a narcoticized slug. True I was running a 66 MHz '486 at home and the university labs were Sparc 1+'s (and since the profs were running some kind of global process 'lightly' on each, they ran slower than molasses in January anyway), but Java seemed to slow them even more. He told me all the stuff about just in time compilers, byte code yadda yadda. In the end, Java is a flavor of the month from 1997. I like javascript though. Its a crap language, has lots of problems, not that fast, but as a built-in on every browser, it allows more to be done in this medium than 'bolt-ons', is generally cross platform, and easy enough to code that even kids can learn it fast and easy.

  8. Re:Lies! by jandersen · · Score: 2, Insightful

    The claim was that Java was no longer slow thank to JIT, with HotSpot making it possible for Java code to run faster than equivalent code written in C or C++.

    Really? Sounds a bit rich to claim that an interpreted language would be faster than a compiled one, but I suppose if your interpreted program calls into some really well-written libraries, and you compiled program doesn't....

    Be that as it may, I don't think it is all relevant any more. In many practical situations, Java is fast enough, and the fact that it defines and complies with a huge number of valuable standards - and is portable across HW and OS - is the main selling point. It is not a bad language to work with, and there are many practical applications for it. Good enough for the job at hand is, well, good enough.

  9. Re: Lies! by Anonymous Coward · · Score: 2, Insightful

    Read the information on their website. They provide quite a lot of detail. This was hardly a rewrite of cassandra in C++. It is a completely different database system implementing the same protocol as Cassandra. The internal architecture is different. The caching subsystem is different. Threading model is different. The feature set is a fraction of original Cassandra. And none of the things they did there are really exclusively available in C++. It could have been just as well written on Java or C# and still get all the benefits.