Cassandra Rewritten In C++, Ten Times Faster
urdak writes: At the opening of Cassandra Summit today, Avi Kivity and Dor Laor (who had previously written KVM and OSv) announced ScyllaDB, an open-source C++ rewrite of Cassandra, the popular NoSQL database. ScyllaDB claims to achieve a whopping 10 times more throughput per node than the original Java code, with sub-millisecond 99th-percentile latency. They even measured 1 million transactions per second on a single node. The performance of the new code is attributed to writing it in Seastar, a C++ framework for writing complex asynchronous applications with optimal performance on modern hardware.
Seriously. WTF?
Almost as fast as native! Maybe even faster for some tasks!
sure
This is the trademark reason why Java shouldn't be used in performance-sensitive environments in the first place.
As for whether it would have been any faster written in C or straight ASM: probably not worth chasing down that extra 1%. Generally the justification for straight C or ASM is to remove runtime bloat, and you'd first have to give up using any frameworks to get there.
Just a reminder to potential programmers: learn C before you learn any other programming language, otherwise you will not understand why your code's performance is terrible.
I find it depressing that so little attention is paid to efficient computing. People now just throw memory and cycles at problems because they can with passable results. But I wonder how much more we could get out of our machines if software was carefully crafted from bottom to top.
Databases used to be disk bound, sure. But these days we have huge RAM caches and SSDs - no spinning disks. It's very common for the vast majority of requests to be served entirely from cache. Read the guys' site - it looks like they know what they're doing.
Imagine if Redis was ten times slower or ten times faster. It would matter.
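To put rough numbers on the cache argument above, here is a back-of-the-envelope sketch in Python. Every figure in it (50 microseconds of CPU work per request, a 10 ms spinning-disk access, a 100 microsecond SSD read, the hit ratios) is an assumption picked for illustration, not a measurement from ScyllaDB or Cassandra, and the function names are made up for this example. The model is deliberately simple: every request pays the CPU cost, and only cache misses pay the storage cost.

```python
def mean_service_time_us(hit_ratio, cpu_us, storage_us):
    """Mean time per request in microseconds under the simple model:
    every request costs cpu_us, and only cache misses add storage_us."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return cpu_us + (1.0 - hit_ratio) * storage_us


def cpu_share(hit_ratio, cpu_us, storage_us):
    """Fraction of the mean request time that is spent on CPU work."""
    return cpu_us / mean_service_time_us(hit_ratio, cpu_us, storage_us)


def speedup_from_faster_code(hit_ratio, cpu_us, storage_us, code_speedup):
    """Overall speedup if only the CPU part gets code_speedup times faster."""
    before = mean_service_time_us(hit_ratio, cpu_us, storage_us)
    after = mean_service_time_us(hit_ratio, cpu_us / code_speedup, storage_us)
    return before / after


if __name__ == "__main__":
    CPU_US = 50.0  # assumed CPU cost per request
    # name -> (assumed cache hit ratio, assumed storage access time in us)
    scenarios = {
        "spinning disk, small cache": (0.50, 10_000.0),
        "SSD, large RAM cache": (0.99, 100.0),
    }
    for name, (hit, storage_us) in scenarios.items():
        share = cpu_share(hit, CPU_US, storage_us)
        gain = speedup_from_faster_code(hit, CPU_US, storage_us, 10.0)
        print(f"{name}: CPU share {share:.1%}, overall gain from 10x faster code {gain:.2f}x")
```

With these assumed numbers, making the code ten times faster buys roughly 1% in the disk-bound case and about 8.5x in the mostly-cached SSD case, which is the whole point: once storage stops dominating, the efficiency of the code is what is left.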
Wow, two years ago everyone here told us that NoSQL is evil and tried to convince us that we should stick to MySQL.
Now everyone tells us Java is evil, because a rewrite in C++ is faster.
What a surprise.
If I rewrote Cassandra from scratch, in Java, it would also be faster than the current code.
Why? Because I can reuse and improve on everything the original team learned over the course of a decade.
Keep in mind, the rewrite uses a new framework and new concepts for concurrency. Concurrency is one of the core areas where computing will certainly make lots of progress in the future.
For my part, I'm waiting for a Lucene rewrite, regardless of the language. Probably the worst OSS code I have ever seen ... actually the worst code, regardless of OSS or closed source.
And back in 1997 I remember telling a C.S. prof that Java was running like a narcoticized slug. True, I was running a 66 MHz '486 at home and the university labs were Sparc 1+'s (and since the profs were running some kind of global process 'lightly' on each, they ran slower than molasses in January anyway), but Java seemed to slow them even more. He told me all the stuff about just-in-time compilers, byte code, yadda yadda. In the end, Java is a flavor of the month from 1997. I like JavaScript though. It's a crap language, has lots of problems, and is not that fast, but as a built-in on every browser it allows more to be done in this medium than 'bolt-ons', is generally cross-platform, and is easy enough to code that even kids can learn it quickly.
The claim was that Java was no longer slow thanks to JIT, with HotSpot making it possible for Java code to run faster than equivalent code written in C or C++.
Really? Sounds a bit rich to claim that an interpreted language would be faster than a compiled one, but I suppose if your interpreted program calls into some really well-written libraries, and your compiled program doesn't....
Be that as it may, I don't think it is all that relevant any more. In many practical situations, Java is fast enough, and the fact that it defines and complies with a huge number of valuable standards, and is portable across HW and OS, is the main selling point. It is not a bad language to work with, and there are many practical applications for it. Good enough for the job at hand is, well, good enough.
Read the information on their website. They provide quite a lot of detail. This was hardly a rewrite of Cassandra in C++. It is a completely different database system implementing the same protocol as Cassandra. The internal architecture is different. The caching subsystem is different. The threading model is different. The feature set is a fraction of the original Cassandra's. And none of the things they did there are really exclusive to C++. It could just as well have been written in Java or C# and still gotten all the benefits.