Slashdot Mirror


Cassandra Rewritten In C++, Ten Times Faster

urdak writes: At Cassandra Summit opening today, Avi Kivity and Dor Laor (who had previously written KVM and OSv) announced ScyllaDB — an open-source C++ rewrite of Cassandra, the popular NoSQL database. ScyllaDB claims to achieve a whopping 10 times more throughput per node than the original Java code, with sub-millisecond 99%ile latency. They even measured 1 million transactions per second on a single node. The performance of the new code is attributed to writing it in Seastar — a C++ framework for writing complex asynchronous applications with optimal performance on modern hardware.

11 of 341 comments

  1. First post by Anonymous Coward · · Score: 5, Funny

    Because it was written in Seastar

  2. %ile? Are We Texting? by Anonymous Coward · · Score: 5, Insightful

    Seriously. WTF?

  3. Lies! by Anonymous Coward · · Score: 5, Funny

    That is a lie!

    I think they mean the C++ port is 10X SLOWER than Java.

    Java is faster than C and C++, everyone knows that!

    Maybe if they ran the code on a Java interpreter, written in Java, running on a Java interpreter...

    More recursive use of Java == more speed!

    Why slow a system down with all that C++ bloatware?

    1. Re:Lies! by narcc · · Score: 5, Informative

      It comes from an old (15+ years) defense of Java. The claim was that Java was no longer slow thanks to JIT compilation, with HotSpot making it possible for Java code to run faster than equivalent code written in C or C++.

      OP is playing the part of a turn-of-the-century die-hard Java zealot cracking under the harsh light of reality, desperately clinging to their long-cherished beliefs.

  4. Garbage collected virtual machines! by Anonymous Coward · · Score: 5, Insightful

    Almost as fast as native! Maybe even faster for some tasks!

    sure

    1. Re:Garbage collected virtual machines! by fragMasterFlash · · Score: 5, Interesting

      "C++ is my favorite garbage collected language because it generates so little garbage"

      -Bjarne Stroustrup

    2. Re:Garbage collected virtual machines! by Luke+Wilson · · Score: 5, Insightful

      Most of what they've done seems to be rearchitecting, not getting a simple speed boost from using an unmanaged language. They're bypassing the OS to get more locality and cache retention. Those problems would not be addressed by merely rewriting in C++.

      For one, they've replaced the OS network stack with an in-process one in which each thread gets its own NIC queue, so they can have "zero-copy, zero-lock, and zero-context-switch[es]".
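
The per-thread, shared-nothing idea can be sketched in a few lines of C++. This is a minimal illustration under stated assumptions, not ScyllaDB or Seastar code: `owner_shard` and `count_sharded` are hypothetical names, `std::hash` stands in for whatever partitioner a real engine uses, and plain `std::thread` workers stand in for per-core event loops. Every key is routed to exactly one shard, and each worker only touches its own map, so the shard data needs no locks.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical partitioner: decides which shard (thread/core) owns a key.
std::size_t owner_shard(const std::string& key, std::size_t num_shards) {
    return std::hash<std::string>{}(key) % num_shards;
}

// Counts occurrences of each key with one worker thread per shard.
std::unordered_map<std::string, int>
count_sharded(const std::vector<std::string>& keys, std::size_t num_shards) {
    assert(num_shards > 0);

    // Route every key to the inbox of the shard that owns it.
    std::vector<std::vector<std::string>> inbox(num_shards);
    for (const auto& k : keys) inbox[owner_shard(k, num_shards)].push_back(k);

    // Each shard has a private map; no map is ever touched by two threads,
    // so no mutex is needed around the data.
    std::vector<std::unordered_map<std::string, int>> local(num_shards);
    std::vector<std::thread> workers;
    for (std::size_t s = 0; s < num_shards; ++s) {
        workers.emplace_back([&inbox, &local, s] {
            for (const auto& k : inbox[s]) ++local[s][k];
        });
    }
    for (auto& t : workers) t.join();

    // A key lives on exactly one shard, so the per-shard maps are disjoint
    // and can simply be merged.
    std::unordered_map<std::string, int> result;
    for (const auto& m : local) result.insert(m.begin(), m.end());
    return result;
}
```

For example, `count_sharded({"a", "b", "a"}, 4)` counts `a` twice and `b` once regardless of which shards the keys hash to. A real shared-nothing server would also need cross-core message passing for requests that arrive on a core that does not own the key; that part is omitted here.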

      They're also keeping more data in memory rather than relying on the OS file cache. It seems like they're taking every opportunity to use the in-memory representation to avoid touching sstables, and they try harder than Cassandra to update that cache, rather than invalidate it, on writes.
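
The difference between updating and invalidating a cache on writes can be shown with a small sketch. Everything here is hypothetical: `BackingStore` is a stand-in for on-disk sstables that simply counts how often it is read, and `Cache` is a plain `std::unordered_map`, not Cassandra's or ScyllaDB's actual row cache. With `WritePolicy::Update`, a write refreshes the cached entry so the next read is a hit; with `WritePolicy::Invalidate`, the write drops the entry and the next read has to go back to storage.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for on-disk storage (e.g. sstables); counts reads.
struct BackingStore {
    std::unordered_map<std::string, std::string> data;
    int reads = 0;

    std::optional<std::string> read(const std::string& k) {
        ++reads;
        auto it = data.find(k);
        if (it == data.end()) return std::nullopt;
        return it->second;
    }
};

enum class WritePolicy { Invalidate, Update };

class Cache {
public:
    Cache(BackingStore& store, WritePolicy policy) : store_(store), policy_(policy) {}

    void write(const std::string& k, const std::string& v) {
        store_.data[k] = v;              // the write always reaches storage
        if (policy_ == WritePolicy::Update)
            cache_[k] = v;               // keep the hot entry valid in memory
        else
            cache_.erase(k);             // next read must go back to storage
    }

    std::optional<std::string> read(const std::string& k) {
        if (auto it = cache_.find(k); it != cache_.end()) return it->second;  // hit
        auto v = store_.read(k);         // miss: fall back to storage
        if (v) cache_[k] = *v;           // populate the cache on a miss
        return v;
    }

private:
    BackingStore& store_;
    WritePolicy policy_;
    std::unordered_map<std::string, std::string> cache_;
};
```

After two writes to the same key, a read through the `Update` cache performs no storage reads, while the `Invalidate` cache performs one before the entry is cached again. Updating only pays off if recently written data is likely to be read again soon.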

    3. Re:Garbage collected virtual machines! by IamTheRealMike · · Score: 5, Informative

      The headline is rather misleading. This isn't just a plain port of the code from Java to C++ to get a magical 10x speedup. Amongst other things, they appear to be running an entire TCP stack in userspace and using special kernel drivers to avoid interrupts. This is the same team that produced OSv, an entirely new kernel written in C++ that gets massive speedups over Linux, partly by doing things like not using memory virtualisation at all. Fast but unsafe. These guys are hard-core in a far more advanced way than just "hey, let's switch languages".
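
Avoiding interrupts generally means polling: instead of sleeping until the kernel signals that data has arrived, a thread repeatedly checks a queue and handles whatever is there. The sketch below is a user-space analogy, not a kernel driver and not Seastar's implementation: `SpscRing` is a hypothetical single-producer/single-consumer ring buffer using `std::atomic` acquire/release ordering, and `poll_once` drains it without ever blocking. A poll-mode network stack applies the same pattern to NIC receive queues.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical lock-free ring for exactly one producer and one consumer.
// Holds at most N - 1 items (one slot is kept empty to tell full from empty).
template <typename T, std::size_t N>
class SpscRing {
    static_assert(N >= 2, "ring needs room for at least one item");

public:
    // Producer side: returns false instead of blocking when the ring is full.
    bool try_push(const T& v) {
        std::size_t head = head_.load(std::memory_order_relaxed);
        std::size_t next = (head + 1) % N;
        if (next == tail_.load(std::memory_order_acquire)) return false;  // full
        buf_[head] = v;
        head_.store(next, std::memory_order_release);  // publish the item
        return true;
    }

    // Consumer side: returns nullopt instead of blocking when the ring is empty.
    std::optional<T> try_pop() {
        std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire)) return std::nullopt;  // empty
        T v = buf_[tail];
        tail_.store((tail + 1) % N, std::memory_order_release);  // free the slot
        return v;
    }

private:
    std::array<T, N> buf_{};
    std::atomic<std::size_t> head_{0};  // written only by the producer
    std::atomic<std::size_t> tail_{0};  // written only by the consumer
};

// One polling pass: handle everything that has arrived, never sleep.
// Returns the number of items handled.
template <typename T, std::size_t N, typename F>
std::size_t poll_once(SpscRing<T, N>& ring, F&& handle) {
    std::size_t n = 0;
    while (auto item = ring.try_pop()) {
        handle(*item);
        ++n;
    }
    return n;
}
```

An event-loop thread would call `poll_once` repeatedly between its other tasks. The cost is a core that spins even when there is no traffic, which is the trade-off such designs accept in exchange for lower latency.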

  5. Re: Because it was written in Seastar or C++ by Anonymous Coward · · Score: 5, Insightful

    Cassandra is nothing to sneeze at, since it outperforms other database engines (which are written in C, like MySQL).

    Cassandra and MySQL are very different types of databases designed to handle different tasks. It's like saying a hammer is better than a saw without mentioning what job needs to be done with it.

  6. Re:It's a miracle! C++ makes disks spin 10x faster by garethjrowlands · · Score: 5, Insightful

    Databases used to be disk bound, sure. But these days we have huge RAM caches and SSDs - no spinning disks. It's very common for the vast majority of requests to be served entirely from cache. Read the guys' site - it looks like they know what they're doing.

    Imagine if Redis were ten times slower or ten times faster. It would matter.

  7. Re:Because it was written in Seastar or C++ by angel'o'sphere · · Score: 5, Interesting

    I would say that 95% of all the people I know in person who learned C first, and not Assembler, Pascal, Smalltalk or Lisp, are extremely bad at advanced language concepts like functional or OO programming. Most of them shifted to scripting and operating servers and don't "code". A minority do embedded programming in C++ that mostly looks like C.

    The idea that learning C first has any advantage is complete bollocks, a /. myth.

    I started with C in 1987 on Sun Solaris (after 6 years of Assembler, Pascal and BASIC); in 1989 I switched to C++. I never looked back.

    Only masochists would look back at C of that period.

    ANSI C is much better ... but still: when I see a self proclaimed C genius with 30 years experience program Java or C++ ... shudder.

    --
    Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.