Slashdot Mirror


Cassandra Rewritten In C++, Ten Times Faster

urdak writes: At Cassandra Summit opening today, Avi Kivity and Dor Laor (who had previously written KVM and OSv) announced ScyllaDB — an open-source C++ rewrite of Cassandra, the popular NoSQL database. ScyllaDB claims to achieve a whopping 10 times more throughput per node than the original Java code, with sub-millisecond 99%ile latency. They even measured 1 million transactions per second on a single node. The performance of the new code is attributed to writing it in Seastar — a C++ framework for writing complex asynchronous applications with optimal performance on modern hardware.

51 of 341 comments

  1. First post by Anonymous Coward · · Score: 5, Funny

    Because it was written in Seastar

  2. %ile? Are We Texting? by Anonymous Coward · · Score: 5, Insightful

    Seriously. WTF?

    1. Re:%ile? Are We Texting? by RightwingNutjob · · Score: 4, Funny

      Well, let's see. % means it's a conversion code, l means the converted quantity is a long, i means it's an integer, so a long integer, but e means it's a float to be converted to exponential notation. But it was supposed to be an integer. Does not compute.

  3. Lies! by Anonymous Coward · · Score: 5, Funny

    That is a lie!

    I think they mean the C++ port is 10X SLOWER than Java.

    Java is faster than C,C++ everyone knows that!

    Maybe if they ran the code on a java interpreter, written in java, running on a java interpreter...

    More recursive use of java == more speed!

    Why slow a system down with all that C++ bloatware?

    1. Re:Lies! by narcc · · Score: 5, Informative

      It comes from an old (15+ years) defense of Java. The claim was that Java was no longer slow thanks to JIT, with HotSpot making it possible for Java code to run faster than equivalent code written in C or C++.

      OP is playing the part of a turn-of-the-century die-hard Java zealot cracking under the harsh light of reality, desperately clinging to their long-cherished beliefs.

    2. Re:Lies! by Anonymous Coward · · Score: 2, Insightful

      And back in 1997 I remember telling a C.S. prof. that java was running like a narcoticized slug. True I was running a 66 MHz '486 at home and the university labs were Sparc 1+'s (and since the profs were running some kind of global process 'lightly' on each, they ran slower than molasses in January anyway), but Java seemed to slow them even more. He told me all the stuff about just in time compilers, byte code yadda yadda. In the end, Java is a flavor of the month from 1997. I like javascript though. It's a crap language, has lots of problems, not that fast, but as a built-in on every browser, it allows more to be done in this medium than 'bolt-ons', is generally cross platform, and easy enough to code that even kids can learn it fast and easy.

    3. Re:Lies! by Anonymous Coward · · Score: 3, Informative

      Not all true. Over the years I have compared "slow" languages Lisp, Java, and .Net to the "fast" C. For various odd reasons the slow languages were faster each time.

      The modern Jit compilers have a huge advantage over C because they can do whole of program optimization and they can aggressively inline. Sure, one can declare C methods inline, but I compared Java and .Net to real production code where the programmers forgot to. So in practice the slow languages really were faster. And in-lined routines kill binary compatibility, particularly for access methods of opaque types -- not a problem for Java/.Net. Modern garbage collection is often actually faster than malloc/free, and certainly faster than any reference counting schemes.

      Sure, there are some idiot restrictions in Java that make .Net faster, such as no structs and no way to turn off array bounds checking. But C is a technology that was out of date when it was first introduced 30 years ago. If any C compiler can beat .Net for some particular case it will be by a few tens of percent at most.

      If this speed increase is real, it is nothing to do with C vs Java.

    4. Re:Lies! by jandersen · · Score: 2, Insightful

      The claim was that Java was no longer slow thanks to JIT, with HotSpot making it possible for Java code to run faster than equivalent code written in C or C++.

      Really? Sounds a bit rich to claim that an interpreted language would be faster than a compiled one, but I suppose if your interpreted program calls into some really well-written libraries, and your compiled program doesn't....

      Be that as it may, I don't think it is at all relevant any more. In many practical situations, Java is fast enough, and the fact that it defines and complies with a huge number of valuable standards - and is portable across HW and OS - is the main selling point. It is not a bad language to work with, and there are many practical applications for it. Good enough for the job at hand is, well, good enough.

    5. Re:Lies! by Dr_Barnowl · · Score: 4, Informative

      Yeah, it's more to do with using a framework that helps with the aggressive use of computer resources than being in one language over another.

      Some of the latency gains might be down to C++ vs Java, but the throughput is probably because the CPU is less idle.

    6. Re:Lies! by phantomfive · · Score: 4, Informative

      Really? Sounds a bit rich to claim that an interpreted language would be faster than a compiled one,

      The reasoning is because any bottleneck in code will be in a loop (or recursion, or whatever).
      Java is roughly only interpreted on the first iteration of a loop, when it gets compiled by JIT. After that, it's assembly code, just like C.
      Add to that, there are some optimizations that can be done at run-time by the JIT that can't be done at compile time.

      These are typically the reasons people claim Java is faster than C or C++.

      Also, it seems the Java creators at Sun were really competitive and got upset when people said their language was slower than C++, so they spent a lot of time optimizing the efficiency of their standard library, more than the C++ compiler writers of the time.

      --
      "First they came for the slanderers and i said nothing."
    7. Re:Lies! by TheRaven64 · · Score: 4, Informative

      Why are you talking about an interpreted program? We're specifically talking about JIT-compiled Java. Modern JITs use trace-based optimisation, which means that they will generate straight-line binary code for hot paths that span multiple method calls and returns. This is something that an AoT-compiled implementation can't do without a lot of profiling information. A JIT compiler can also optimise based on assumptions that are true for one phase of the program, then throw away the result if it stops being true for a later phase.

      There are also other trades. For example, if you're writing memory-safe C++ and sharing pointers across threads, then you're going to be using std::shared_ptr, which performs an atomic operation (MESI bus traffic) on every assignment. In a typical JVM, copying pointers doesn't require atomic operations, but the cost of this is the GC pass. Depending on your workload, the GC cost can be a lot cheaper, a lot more expensive, or about the same as doing it correctly with smart pointers.
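
      The std::shared_ptr cost described above can be made visible with a small sketch. The Node type and the copy_in_threads helper are hypothetical names invented for this illustration; the only library behaviour relied on is that copying or destroying a std::shared_ptr updates the shared reference count in a thread-safe way.

```cpp
#include <memory>
#include <thread>
#include <vector>

// Hypothetical payload type, used only for this illustration.
struct Node {
    int value = 42;
};

// Each worker repeatedly copies the shared pointer. Every copy performs an
// atomic increment of the control block's reference count, and every
// destruction performs an atomic decrement.
inline long copy_in_threads(const std::shared_ptr<Node>& root,
                            int thread_count, int copies_per_thread) {
    std::vector<std::thread> workers;
    for (int t = 0; t < thread_count; ++t) {
        workers.emplace_back([&root, copies_per_thread] {
            for (int i = 0; i < copies_per_thread; ++i) {
                std::shared_ptr<Node> local = root;  // atomic ++refcount
                (void)local->value;
            }                                        // atomic --refcount
        });
    }
    for (auto& w : workers) w.join();
    // All temporary copies are gone, so only the caller's reference remains.
    return root.use_count();
}
```

      Because the count is maintained atomically, it is back to exactly one after the threads join, however the copies interleaved. That correctness is what the bus traffic pays for; a tracing collector defers the same bookkeeping to the GC pass instead.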

      Unfortunately, a big part of the current 'Java is slow' claim is from idiots who don't understand that different GC implementations are all on a spectrum trading throughput for latency and who then build big distributed systems where tail latency in the edge nodes is important, then run a throughput-optimised stop-the-world collector on the edges and wonder why it sucks.

      --
      I am TheRaven on Soylent News
    8. Re:Lies! by gbjbaanb · · Score: 2

      Except....

      The modern JIT compilers do not do the kind of performance optimisations that they could, in theory, do. It simply costs too much in development time to cater for all the combinations and possibilities, or it costs much more in CPU time to calculate the optimisations than it would save in executing the slower code.

      GC is faster than malloc for allocating, but when it comes to deallocating.. it's a lot slower. Obviously, it has to do a lot more like compact the heap which is a pretty slow operation.

      Most functions in C are inlined - the compiler has plenty of time to optimise a C program (ie the compile stage is a lot slower) that it can decide which functions are better inlined (based on size and/or call quantity).

      Now you don't tend to notice much performance difference because modern CPUs are sitting spinning their wheels with cycles to spare - so the slower Java/.NET code still runs roughly as fast. But when you get server-side code that heavily uses the hardware - like this Database system - then you are really going to notice the difference.

      There's one killer way to know if Java/.NET code is in fact slower than C++ native code: if the companies that produce it decide to create a native compiled version of their language. Which Microsoft has finally decided to do - .NET Native is building .NET source using the old C++ backend technology to produce native binaries. Microsoft says this makes their .NET programs run 30% faster. Still not as fast as C++ due to design limitations such as greater memory usage than a C++ program, but still - faster than old-style bytecode .NET.

      The proof that C/C++ is faster than Java/.NET doesn't get more damning than that.

    9. Re:Lies! by NostalgiaForInfinity · · Score: 3, Interesting

      It comes from an old (15+ years) defense of Java. The claim was that Java was no longer slow thanks to JIT, with HotSpot making it possible for Java code to run faster than equivalent code written in C or C++.

      High performance software requires several things, among them good native code generation and good libraries. Java used to have neither, then it got the JIT. Unfortunately, Java's semantics and built-in data types make writing high performance software in it really hard.

      C++ started out with good native code generation, and its standard library and built in data types make writing high performance software a bit easier if you know what you are doing. Most C++ programmers don't know what they are doing, though, so their software ends up bloated and inefficient anyway.

    10. Re:Lies! by SQLGuru · · Score: 2

      And *that* is why I think JavaScript should be the first language that new devs learn. I agree it sucks as a language, but it is SOOOO accessible with plenty of examples (good and bad) online. But the barrier to entry is essentially nil. Almost any computer (and certainly any computer released since Windows 95) comes with everything you need to get started --- a text editor and a browser (and modern browsers also include developer tools).

      I'll take my offtopic mod back to my cave, now.

    11. Re: Lies! by Anonymous Coward · · Score: 2, Insightful

      Read the information on their website. They provide quite a lot of detail. This was hardly a rewrite of Cassandra in C++. It is a completely different database system implementing the same protocol as Cassandra. The internal architecture is different. The caching subsystem is different. Threading model is different. The feature set is a fraction of original Cassandra. And none of the things they did there are really exclusively available in C++. It could have been just as well written in Java or C# and still get all the benefits.

    12. Re:Lies! by T.E.D. · · Score: 2

      Still, there's no good reason for it to be that much faster, unless the Java was also incredibly crappily written (which is quite likely). They are both compiled languages, written at about the same level. Something seriously wrong must have been going on in that code.

      Note: This is from a professional C++ developer, who also happens to have done his Master's thesis on Compiler Construction. Admittedly, I don't care that much about Java, except in theory. Have only used it a few times. But either there was something going on with that Java program I'm seriously missing, or the rewrite speedup wasn't primarily about language.

  4. Garbage collected virtual machines! by Anonymous Coward · · Score: 5, Insightful

    Almost as fast as native! Maybe even faster for some tasks!

    sure

    1. Re:Garbage collected virtual machines! by fragMasterFlash · · Score: 5, Interesting

      "C++ is my favorite garbage collected language because it generates so little garbage"

      -Bjarne Stroustrup

    2. Re:Garbage collected virtual machines! by Luke+Wilson · · Score: 5, Insightful

      Most of what they've done seems to be rearchitecting, not getting a simple speed boost from using an unmanaged language. They're bypassing the OS to get more locality and cache retention. Those problems would not be addressed by merely rewriting in C++.

      For one, they've replaced the OS network stack with an in-process one, where each thread gets its own NIC queue so they can have "zero-copy, zero-lock, and zero-context-switch[es]"

      They're also keeping more data in memory and eschewing relying on the OS file cache. It seems like they're taking every opportunity to use the in memory representation to avoid using sstables. They try harder than Cassandra to update instead of invalidate that cache on writes.
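
      The per-thread ownership idea described above can be sketched in plain standard C++. This is not Seastar's API: the Shard type, the count_sharded function, and the modulo routing rule are all assumptions made for the illustration. The point is only that data touched by exactly one thread needs no lock.

```cpp
#include <cstddef>
#include <cstdint>
#include <thread>
#include <unordered_map>
#include <vector>

// One shard per worker thread. A shard is only ever touched by its owner,
// so no mutex or atomic is needed on the data itself.
struct Shard {
    std::vector<std::uint64_t> inbox;                       // keys routed here
    std::unordered_map<std::uint64_t, std::uint64_t> hits;  // per-key counters
};

// Route each key to a shard by key modulo shard count, then let every thread
// drain its own inbox. shard_count must be greater than zero.
inline std::vector<Shard> count_sharded(const std::vector<std::uint64_t>& keys,
                                        std::size_t shard_count) {
    std::vector<Shard> shards(shard_count);
    for (std::uint64_t k : keys) shards[k % shard_count].inbox.push_back(k);

    std::vector<std::thread> workers;
    for (std::size_t s = 0; s < shard_count; ++s) {
        workers.emplace_back([&shards, s] {
            Shard& mine = shards[s];  // exclusive ownership, nothing shared
            for (std::uint64_t k : mine.inbox) ++mine.hits[k];
        });
    }
    for (auto& w : workers) w.join();
    return shards;
}
```

      A real shared-nothing server would also pin each thread to a core and feed each shard from its own NIC queue; the sketch leaves both out.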

    3. Re:Garbage collected virtual machines! by IamTheRealMike · · Score: 5, Informative

      The headline is rather misleading. This isn't just a plain port of the code from Java to C++ to get a magical 10x speedup. Amongst other things they appear to be running an entire TCP stack in userspace and using special kernel drivers to avoid interrupts. This is the same team that produced OSv, an entirely new kernel written in C++ that gets massive speedups over Linux ..... partly by doing things like not using memory virtualisation at all. Fast but unsafe. These guys are hard core in a way more advanced way than just "hey let's switch languages".

    4. Re: Garbage collected virtual machines! by O('_')O_Bush · · Score: 4, Insightful

      It is a reasonable assumption that most people writing in C++ made it past the first week of a CS101.

      --
      while(1) attack(People.Sandy);
    5. Re:Garbage collected virtual machines! by DrXym · · Score: 2
      John Carmack writes games for a living. Games have to bend or break every rule to get as much performance out of the software. e.g. writing a square root approximation instead of calling a more expensive but proper square root function. It does not mean that those techniques are universally applicable or even represent good programming practice.
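
      One widely circulated instance of this kind of trick is the fast inverse square root found in the Quake III Arena source. The sketch below is a modern restatement, not necessarily what the commenter had in mind; the function name fast_inv_sqrt is chosen here, and std::memcpy stands in for the original pointer cast to avoid undefined behaviour.

```cpp
#include <cstdint>
#include <cstring>

// Approximates 1/sqrt(x) for positive, finite x using the well-known
// bit-level initial guess followed by one Newton-Raphson step.
inline float fast_inv_sqrt(float x) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "needs 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);   // reinterpret the float's bits
    bits = 0x5f3759dfu - (bits >> 1);      // magic-constant initial guess
    float y;
    std::memcpy(&y, &bits, sizeof y);
    return y * (1.5f - 0.5f * x * y * y);  // one Newton-Raphson refinement
}
```

      The result is only accurate to a fraction of a percent, which is fine for normalising lighting vectors and unacceptable for, say, financial code. An approximate square root follows as x * fast_inv_sqrt(x).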

      As for Java vs C++, I expect that most enterprises put stability and portability well above raw speed when developing software. Modern JVMs do just in time compilation meaning the performance differential is fairly slight particularly for IO / database bound middleware.

  5. Because it was written in Seastar or C++ by Anonymous Coward · · Score: 2, Insightful

    This is the trademark reason why Java shouldn't be used in performance sensitive environments in the first place.

    As for would it have been any faster if it was written in C or straight ASM, probably not worth chasing down that extra 1%. Generally the justification for straight C or ASM is to remove runtime bloat, and you'd first have to give up using any frameworks to get there.

    Just to remind potential programmers. Learn C before you learn any other programming language, otherwise you will not understand why your code's performance is terrible.

    1. Re:Because it was written in Seastar or C++ by Anonymous Coward · · Score: 2, Funny

      But, but... Java is enterprisey!

    2. Re: Because it was written in Seastar or C++ by Anonymous Coward · · Score: 2, Interesting

      Cassandra is nothing to sneeze at since it outperforms other db-engines (which are written in C, like MySQL).

      Anyhow, you use the right tool for the job, and the big question is: would ScyllaDB even exist if Cassandra wasn't written first?

    3. Re: Because it was written in Seastar or C++ by Anonymous Coward · · Score: 5, Insightful

      Cassandra is nothing to sneeze at since it outperforms other db-engines (which are written in C, like MySQL).

      Cassandra and MySQL are very different types of databases designed to handle different tasks. It's like saying a hammer is better than the saw without mentioning what job needs to be done with it.

    4. Re:Because it was written in Seastar or C++ by Dutch+Gun · · Score: 4, Insightful

      Just to remind potential programmers. Learn C before you learn any other programming language, otherwise you will not understand why your code's performance is terrible.

      It may not be apparent even then. Java looks an awful lot like C++ at the code level. So... what's different? Java (and other managed languages like C#) have a bunch of neat features like reflection and automatic memory management, which inherently comes at the cost of runtime efficiency. Simply learning C or C++ won't point out exactly why those languages are so much faster than managed languages. You can write nearly the same code in C++, Java, and C#, and you'll see C++ win performance benchmarks - at least in all but the most contrived examples.

      Among the more significant differences are that C++ compilers are extremely good at optimizing, and C++ code generally compiles down to better cache-coherent structures than other languages. The difference is in the language itself, which adheres to a zero-cost principle, in that you don't pay for features you don't use. A lot of C++ abstractions are eliminated *entirely* at runtime, and are only used to protect the code's integrity during the compilation phase. We were told for years that native-equivalent performance was just around the corner or even already here, and it just never really happened outside of small, contrived benchmarks.
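
      A minimal sketch of an abstraction that exists only at compile time, assuming hypothetical unit types (Meters, Seconds, MetersPerSecond) and a hypothetical speed function that are not from the original comment:

```cpp
#include <type_traits>

// Hypothetical unit wrappers: distinct types at compile time, plain doubles
// at run time.
struct Meters { double value; };
struct Seconds { double value; };
struct MetersPerSecond { double value; };

// Only a length divided by a time is accepted; swapping the two arguments
// fails to compile.
constexpr MetersPerSecond speed(Meters d, Seconds t) {
    return MetersPerSecond{d.value / t.value};
}

// The wrappers add no storage and no runtime bookkeeping.
static_assert(sizeof(Meters) == sizeof(double), "wrapper adds no storage");
static_assert(std::is_trivially_copyable<Meters>::value, "copies like a double");
static_assert(speed(Meters{100.0}, Seconds{9.58}).value > 10.0,
              "can be evaluated entirely at compile time");
```

      The type check protects the call site, yet an optimising compiler would typically emit the same division it would for two bare doubles.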

      I don't think it's necessary to always learn C or C++ first, although I do think it's worthwhile to learn it at some point, simply because there's a lot of it out there. I'm primarily a C++ programmer myself, but I tend to be a bit more pragmatic about language preference. Use the language that's right for the job. For example, C is a *horrible* choice if you're writing a simple application that needs to do a bunch of string processing. In many cases, high performance isn't even a consideration; correctness, security, and development speed matter more.

      --
      Irony: Agile development has too much inertia to be abandoned now.
    5. Re:Because it was written in Seastar or C++ by angel'o'sphere · · Score: 5, Interesting

      I would say that 95% of all people I know in person, who learned C first and not: Assembler, Pascal, SmallTalk, Lisp are extremely bad on advanced language concepts like functional or oo programming. Most of them shifted to scripting and operating servers and don't "code". A minority is doing embedded programming in C++ which mainly looks like C.

      The idea that learning C first has any advantage is complete bollocks, a /. myth.

      I started with C in 1987 ... on Sun Solaris (after 6 years Assembler, Pascal and BASIC) ... 1989 I switched to C++. I never looked back.

      Only masochists would look back at C of that period.

      ANSI C is much better ... but still: when I see a self proclaimed C genius with 30 years experience program Java or C++ ... shudder.

      --
      Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
    6. Re:Because it was written in Seastar or C++ by fyngyrz · · Score: 4, Insightful

      For example, C is a *horrible* choice if you're writing a simple application that needs to do a bunch of string processing. In many cases, high performance isn't even a consideration; correctness, security, and development speed matter more.

      That is only true if you haven't written a string processing library. Which pretty much anyone who is going to address tasks like this will do, presuming they just don't go out and find one already written. Same thing for lists, dictionaries, trees, GEOdata, IPs, etc. Whatever. There's nothing that says one has to use C's built-in model for strings, either. Make a better one. It was one of the first things I did, and I did it in assembler, as soon as I ran into the convention of an EOT embedded in the actual text being the end marker -- I thought it was stupid then, and I didn't think a zero was any smarter when C first came to my attention lo those many decades ago. It's also a bear trap anyone can throw a bear into with regard to vulnerabilities -- one that can be entirely obviated by a decent string handling module.
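
      As a rough sketch of what such a string module might start from, here is a counted string with an explicit length and a bounds-checked append. The CountedString type, its 64-byte capacity, and cs_append are hypothetical names for this illustration, written in C-style code that also compiles as C++.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical fixed-capacity counted string: the length is stored
// explicitly, so embedded zero bytes are allowed and no scan for a
// terminator is ever needed.
struct CountedString {
    static constexpr std::size_t kCapacity = 64;
    std::size_t length = 0;
    char data[kCapacity] = {};
};

// Appends n bytes from src. Returns false and leaves dst untouched when the
// result would not fit, instead of writing past the end of the buffer.
inline bool cs_append(CountedString& dst, const char* src, std::size_t n) {
    if (n > CountedString::kCapacity - dst.length) return false;
    std::memcpy(dst.data + dst.length, src, n);
    dst.length += n;
    return true;
}
```

      Every write goes through one checked function, which is the property that removes the classic overflow class of bugs; a production version would add growth, slicing, and comparison on the same footing.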

      C isn't a bad language to do *anything* in. It's just a language that requires you to be competent, or better, and to address it through the lens of that competence in order to get enough out of it to make the result and the effort expended worth the candle. And no, if the programmer doesn't write in such a way as to almost always create generally reusable components, I'd not be willing to apply the appellation "competent" to the programmer.

      C's key inherent characteristics are portability, leanness and close-to-the-metal speed. It doesn't hold your hand. It's a language for experienced, skilled programmers when we're talking about creating actual products that are expected to perform in the wild. Lean code isn't nearly the issue it used to be, but it's still "nice" to have.

      --
      I've fallen off your lawn, and I can't get up.
    7. Re:Because it was written in Seastar or C++ by UnknownSoldier · · Score: 2

      > Among the more significant differences are that C++ compilers are extremely good at optimizing,

      LOL. No they aren't

      Mike Acton gave an excellent talk, "Code Clinic 2015: How to Write Code the Compiler Can Actually Optimize", where he picked an integer sequence and optimized the run time needed to calculate it. Techniques include memoization and common sub-term recognition. For 20 values, pre-optimization time was 31 seconds; post-optimization time was 0.01 seconds.
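
      For readers unfamiliar with memoization, here is a minimal sketch. The sequence used in the talk is not given here, so the Fibonacci numbers stand in for it; fib_naive and fib_memo are names made up for this example.

```cpp
#include <cstdint>
#include <unordered_map>

// Naive version: recomputes the same sub-terms exponentially many times.
inline std::uint64_t fib_naive(unsigned n) {
    return n < 2 ? n : fib_naive(n - 1) + fib_naive(n - 2);
}

// Memoized version: each term is computed once and then looked up.
inline std::uint64_t fib_memo(unsigned n,
                              std::unordered_map<unsigned, std::uint64_t>& cache) {
    if (n < 2) return n;
    auto it = cache.find(n);
    if (it != cache.end()) return it->second;
    std::uint64_t value = fib_memo(n - 1, cache) + fib_memo(n - 2, cache);
    cache[n] = value;
    return value;
}
```

      The compiler will not make this transformation on its own, because it changes the algorithm rather than the instructions; that is the kind of restructuring the talk leaves to the programmer.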

      Linked above.

    8. Re:Because it was written in Seastar or C++ by RightwingNutjob · · Score: 2

      Lean code is always an issue. If your code incurs a 2x to 10x overhead associated with the virtual machine, that's 2-10x the hardware you need to spend money on to achieve the same throughput as before, and 2-10x the electric bill for compute-intensive applications. If you're nowhere near the limit of your box, you don't notice. If you've got rooms upon rooms of computers doing the same thing, and you're writing your code in not C/C++, then you're wasting money.

    9. Re:Because it was written in Seastar or C++ by phantomfive · · Score: 3, Insightful

      I would say that 95% of all people I know in person, who learned C first and not: Assembler, Pascal, SmallTalk, Lisp are extremely bad on advanced language concepts like functional or oo programming. Most of them shifted to scripting and operating servers and don't "code". A minority is doing embedded programming in C++ which mainly looks like C.

      Almost no one learns to program in assembler, Pascal, SmallTalk, or Lisp as their first language these days. It's all Python now, or Java.

      --
      "First they came for the slanderers and i said nothing."
    10. Re:Because it was written in Seastar or C++ by Dutch+Gun · · Score: 2

      I'm not trying to slam C. You can do just about anything in C - that's one of its strengths. I'm just pointing out that it's not the *optimal* choice for certain types of tasks, in my opinion. C has advantages in its relative simplicity, portability, and power. Moreover, it works very well as a "least common denominator" language, in that nearly every other language can easily interop with it because of its stable ABI. This is why nearly every OS and many widely-used libraries are written in pure C.

      That being said, it's very easy to make simple mistakes in C that can cause significant security or stability issues. The notion that you simply need to be "competent" in C to avoid making critical mistakes is a fallacy. ALL programmers make mistakes, as we're only human. There's really no getting around the fact that the language is inherently dangerous.

      One of C++'s strengths is that it allows you to create zero-cost or near-zero-cost abstractions that prevent the programmer from making those mistakes. C simply doesn't have those same mechanisms. Of course, this comes at the cost of language complexity.

      There's a reason we have a wide variety of languages available - they all have different strengths and weaknesses. As for anyone who believes a single language to be the end-all and be-all, I'd submit they simply don't have enough experience in other languages to make that judgement.

      --
      Irony: Agile development has too much inertia to be abandoned now.
    11. Re:Because it was written in Seastar or C++ by Dutch+Gun · · Score: 2

      You have to consider that compilers are also going to perform a wide variety of micro-optimizations that humans simply couldn't do on a massive scale, over millions of lines of code. No one would argue that a compiler can radically restructure your algorithms during optimization, because it doesn't know which side-effects are acceptable and which are not. So, yes, human programmers still need to be aware of how to structure code for best results on a given platform.

      Of course, you can always find specific examples to highlight where optimizing compilers fail badly, and then build examples to highlight that deficiency. I took a quick look at that video, and to be honest, it seems a bit contrived, although interesting enough that I may watch it in full later. Thanks for the link.

      --
      Irony: Agile development has too much inertia to be abandoned now.
    12. Re:Because it was written in Seastar or C++ by UnknownSoldier · · Score: 2

      The points are two fold:

      1. Naive use of algorithms and OOP without understanding the data flow will always be slower than understanding and optimizing for the (data) cache usage.

      Pitfalls of Object Oriented Programming

      2. C/C++ compilers do a really shitty job of optimizing even trivial code.

      CppCon 2014: Mike Acton "Data-Oriented Design and C++"

      Mike demonstrates a simple example where a bool member flag is used as a test. MSVC does a horrible job at O2; Clang does a much better job, but still crappy. (Note: Using a different compiler backend on MSVC wasn't even an option until recently.)

      Even a slightly more complicated example blows up:

      struct Foo
      {
          bool m_NeedParentUpdate;
          int Bar( int count );
          int Baz( int count );
      };
       
      int
      Foo::Bar( int count )
      {
          int value = 0;
          for( int i=0; i < count; i++ )
          {
              if( m_NeedParentUpdate )
              {
                  value++;
              }
          }
          return value;
      }
       
      int
      Foo::Baz( int count )
      {
          int value = 0;
          for( int i=0; i < count; i++ )
          {
              if( Bar( count ) > 0 )
              {
                  value++;
              }
          }
          return value;
      }

      *Ugh.*

      Why am I forced to remove ghost reads and writes and manually hoist common static evaluation out of loops?? Why can't the compiler deduce this information?

      As Mike says, the compiler is not a magic wand. It is really good at only a _few_ transformations; it is really stupid about most other ones.
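
      For comparison, here is a sketch of what the hand-hoisted version of the example above might look like. It keeps the Foo, Bar, and Baz names from the comment but marks the methods const, which is a change made here on the assumption that neither method is meant to modify the object.

```cpp
struct Foo {
    bool m_NeedParentUpdate;
    int Bar(int count) const;
    int Baz(int count) const;
};

// The flag cannot change inside the loop, so it is read once. With the test
// hoisted, the loop collapses to "count iterations or none".
int Foo::Bar(int count) const {
    if (!m_NeedParentUpdate || count <= 0) return 0;
    return count;
}

// Bar(count) is loop-invariant: evaluate it once instead of count times.
int Foo::Baz(int count) const {
    if (count <= 0) return 0;
    return Bar(count) > 0 ? count : 0;
}
```

      Both functions return the same values as the loops they replace, but Baz drops from quadratic to constant work. A compiler may not do this unaided: m_NeedParentUpdate is read through this, so unless Bar is visible for inlining it cannot assume the call leaves the flag unchanged.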

    13. Re:Because it was written in Seastar or C++ by dgatwood · · Score: 2

      I would say that 95% of all people I know in person, who learned C first and not: Assembler, Pascal, SmallTalk, Lisp are extremely bad on advanced language concepts like functional or oo programming.

      Not sure why people learning Pascal, assembler, or Lisp first would be better at OO. There's nothing OO about any of those. I would turn that around and say that 95% of programmers are bad at OO programming, period, regardless of what language they started with. Most folks frequently forget what are, IMO, some of the most basic rules of OO:

      1. A class should be mostly self-contained. If you have two or three classes that are tightly coupled, you should probably merge them into a single class unless there are data structure reasons not to do so.

      2. A subclass should be used only when all of the following are true:

        • You need two objects that behave very differently, but share some common behaviors
        • The differences are easily achieved by replacing a few specific chunks of functionality that are reused in multiple places
        • The classes need to coexist

        Otherwise, if the behavior differences are minor, or if the functionality you're changing requires hundreds of tweaks, each of which is different from the next, you're probably better off using a property that switches from one behavior to the other and just using "if" statements.

      3. Never create a subclass until you are ready to create at least two subclasses at the same time. Otherwise, you will invariably get the boundary between subclass and superclass wrong and will have to refactor it again anyway.
      4. When you've created two or more classes that are substantially similar, that's the appropriate time to step back and ask yourself if you should create a superclass and fold the common functionality into it.
      5. If your subclasses are more than about one or two levels deep (unless you're just using subclasses because your language lacks categories/class extensions), you almost certainly have a serious design problem.
      6. Document your methods, their side effects, and their expected calling conventions early so that when you refactor it into a couple of subclasses, you can be certain that they all obey the same rules.
      --

      Check out my sci-fi/humor trilogy at PatriotsBooks.

    14. Re:Because it was written in Seastar or C++ by ZeroConcept · · Score: 2, Interesting

      1. C is not portable, it's tied to the architectures/OSs/APIs the programmer chose to target at write time.
      2. Leanness and close-to-the-metal speed are irrelevant in most business scenarios (time to market rules, cores and memory are commodity, see ABAP and related monsters successfully running most of the world transactions regardless of C).
      3. C is not a language meant to implement business solutions, it's a wrapper for ASM for idiots who can't write ASM themselves. (rhetorical)
      4. Writing string processing libraries is tough stuff; text can have different endings. (rhetorical)
      5. You haven't done anything really complicated that requires your focus to shift away from "bare-metal" to "time-to-value", by your own logic ASM is better than C.

    15. Re:Because it was written in Seastar or C++ by goose-incarnated · · Score: 2

      I would turn that around and say that 95% of programmers are bad at OO programming, period, regardless of what language they started with.

      That's because 95% of written OO solutions don't fit an OO domain. There is this myth that OO is the best we have, but in reality OO is very counter-intuitive to the human brain. Most OO solutions would be better off structured. The human brain handles that much better than OO.

      --
      I'm a minority race. Save your vitriol for white people.
    16. Re:Because it was written in Seastar or C++ by omfgnosis · · Score: 2

      Your characterization of functional programming is pretty astonishing, to me as a person who was most recently employed writing and maintaining software in Clojure. The following are all surely idioms, features or possibilities of one or another FP language/approach, but none of them are essential to FP:

      1. "creating a list of items and combining operators, then magically evaluating those combinators all at once and getting a cake" — I think the reality is that this can be overdone (and sometimes it is), but for people who live and breathe the abstractions that come with a FP language, it tends to reflect the expressive qualities of core concepts and operators-as-functions. (reduce + 0 args) is just a whole lot more expressive, when familiar, than total = 0; for (val in args): total += val; return total.

      2. "lazy evaluation" — I'm not really sure this needs to be specific to FP, but it does really shine with abstractions often associated with FP (e.g. map, reduce, filter, and so on).

      3. "macros" — I cherry-picked this away from the C preprocessor reference to contrast with lispy/homoiconic macros, which are hardly the nightmare of metaprogramming in languages with every-which-way syntax, but even when macros are sane they tend to be strongly discouraged.

      4. "monads" — I see monads far more in imperative code than functional, so I don't even. (I'm talking about the lovely Promise pattern, which is excellent.) Then again, I don't write Haskell.
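
      To make the comparison in point 1 concrete, here is a minimal sketch in TypeScript (a language choice of mine, not the poster's). The function names sumWithLoop and sumWithReduce are hypothetical; only Array.prototype.reduce is a real API.

      ```typescript
      // Imperative version: the accumulator is managed by hand.
      function sumWithLoop(args: number[]): number {
        let total = 0;
        for (const val of args) {
          total += val;
        }
        return total;
      }

      // Functional version: roughly what Clojure's (reduce + 0 args) expresses.
      function sumWithReduce(args: number[]): number {
        return args.reduce((acc, val) => acc + val, 0);
      }

      const sampleArgs = [1, 2, 3, 4];
      console.log(sumWithLoop(sampleArgs));   // 10
      console.log(sumWithReduce(sampleArgs)); // 10
      ```

      Both give the same answer; the reduce form just names the operation instead of spelling out the bookkeeping.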

      I'd characterize FP differently: a function of a given set of arguments should always return the same value, without side-effects; in non-pure languages, add the caveat unless you have a good reason, which should be identified clearly in code; with that caveat, you can kinda do FP in any language (though it may be horribly inefficient).

      In other words, stateful code is hard to reason about, and the greatest care should be taken to make it clear when it cannot or must not be avoided. This should be true in any programming environment.
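
      A rough illustration of that distinction, again as a TypeScript sketch. Everything here (idCounter, nextIdStateful, nextIdPure) is a made-up name for the example, not taken from any library.

      ```typescript
      // Stateful: the result depends on hidden state that changes on every call.
      let idCounter = 0;
      function nextIdStateful(): number {
        idCounter += 1; // side effect: mutates a module-level variable
        return idCounter;
      }

      // Pure: the same argument always produces the same result, nothing else changes.
      function nextIdPure(current: number): number {
        return current + 1;
      }

      console.log(nextIdPure(5) === nextIdPure(5)); // true, always
      ```

      Calling nextIdStateful() twice gives two different answers for the same (empty) argument list, which is exactly the part a reader has to track in their head.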

      You're right that people tend to mentally model in terms of objects—that is, the combination of data and action. But it doesn't have to be stateful by default, and there's a lot to gain from reversing that.

      - - -

      A little anecdote: for kinda silly reasons, I've been doing some one-off work in Node.js after spending a good long while in Clojure. It came time for me to reverse an array, and I wrote code like the following: var foo = bar.reverse();. With almost no real JS work for over a year, I could not recall the following:

      1. Will bar be reversed?
      2. Does Array#reverse actually return a reversed array value, or something else?

      I couldn't know what that very simple code does, simply by reading it. I had to go evaluate it! It turns out that Array#reverse is a bit unusual in JS as it is both stateful (it reversed bar) and it sort of looks like it returns a value (it returned a reference to bar). That code which should do an obvious thing actually caused a bug, because I had to make guesses about state, and I guessed wrong.
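
      The behavior described above can be reproduced in a few lines. This is a sketch using the same variable names as the anecdote; the non-mutating variant with slice() is my addition, one common way to avoid the surprise.

      ```typescript
      const bar = [1, 2, 3];
      const foo = bar.reverse(); // reverses bar in place and returns that same array

      console.log(foo === bar);  // true: foo is a reference to bar, not a copy
      console.log(bar);          // [ 3, 2, 1 ]

      // Non-mutating alternative: copy first, then reverse the copy.
      const original = [1, 2, 3];
      const reversedCopy = original.slice().reverse();
      console.log(original);     // [ 1, 2, 3 ]
      console.log(reversedCopy); // [ 3, 2, 1 ]
      ```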

      - - -

      There are reasons to choose a stateful implementation—performance, expressiveness, interop with stateful environments or libraries—but it's not a given, and not even always consistent with a normal mental model, for statefulness to be assumed.

  6. Out of points or would mod up by Wrexs0ul · · Score: 2

    Sans sarcasm I would've also accepted: "duh"

    --
    --- Need web hosting?
  7. Now returns null pointers in half the time! by jlowery · · Score: 2, Funny

    They also boosted performance by never freeing memory, too!

    --
    If you post it, they will read.
    1. Re:Now returns null pointers in half the time! by The+Evil+Atheist · · Score: 2

      The 90s called. They want their joke back.

      --
      Those who do not learn from commit history are doomed to regress it.
  8. In other News by s.petry · · Score: 4, Funny

    Oracle has just launched a new series of patent infringement lawsuits. Oracle's allegations include reverse engineering Java to improve the speed of applications like Cassandra, and benchmarking Java without permission. They are seeking an immediate cease and desist order, in addition to immediate financial relief for sustaining PPS (more commonly known as Poopy Pants Syndrome).

    --

    -The wise argue that there are few absolutes, the fool argues that there are no probabilities.

  9. It's a miracle! C++ makes disks spin 10x faster! by iamacat · · Score: 2, Interesting

    Databases are usually I/O bound and improvement of storage structure/network protocol is more important than spot optimization of code. A more likely statement is that ScyllaDB performed ten times faster than Cassandra in one particular benchmark for which Cassandra has not been specifically optimized yet, and is ten percent faster in an average case.

    In either case, good luck maintaining speed and stability after 5 releases when you implement every corner case of every feature and have to deal with legacy support.

  10. I find it depressing... by Anonymous Coward · · Score: 3, Insightful

    I find it depressing that so little attention is paid to efficient computing. People now just throw memory and cycles at problems because they can with passable results. But I wonder how much more we could get out of our machines if software was carefully crafted from bottom to top.

  11. Re:It's a miracle! C++ makes disks spin 10x faster by garethjrowlands · · Score: 5, Insightful

    Databases used to be disk bound, sure. But these days we have huge RAM caches and SSDs - no spinning disks. It's very common for the vast majority of requests to be served entirely from cache. Read the guys' site - it looks like they know what they're doing.

    Imagine if Redis was ten times slower or ten times faster. It would matter.

  12. Rewrites are easier than the first strike by angel'o'sphere · · Score: 4, Insightful

    Wow, two years ago everyone here told us that NoSQL is evil and tried to convince us that we should stick to MySQL.

    Now everyone tells us Java is evil, because a rewrite in C++ is faster.

    What a surprise.

    If I rewrote Cassandra from scratch, in Java, it would also be faster than the current code.

    Why? Because all the learning the original team did over the course of a decade I can reuse and improve on.

    Keep in mind, the rewrite uses a new framework and new concepts for concurrency. Concurrency is one of the core areas where computing in future will certainly make lots of progress.

    For my part, I'm waiting for a Lucene rewrite, regardless of the language. Probably the worst OSS code I have ever seen ... actually the worst code regardless of OSS or closed source.

    --
    Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
    1. Re:Rewrites are easier than the first strike by pestilence669 · · Score: 2

      I do agree that a language-to-language rewrite would yield impressive gains... but that's not the whole of it. Cassandra is an edge case ... and yes, the Lucene code could use some love (contribute some patches??)

      C++ isn't necessarily the best choice for everything, just like a McLaren F1 isn't the optimum choice to pick up groceries. But if your requirements dictate that performance is a chief priority, it most certainly is.

      I've written many Java and C++ systems at scale. Java simply does not excel at maximizing the use of system resources, predictable real-time performance, or high uptime. Stomp your feet all you want and pretend it's not true if you like. Java trades off performance to provide features to developers that they cannot override. Fact.

      Java is fine for 99% of most everything ever written. Honest. Cookbook blog: great! That 1% though where every bit matters, that's when you take off the training wheels and code as low as you can tolerate or afford.

      What Java zealots in the Cassandra and Hadoop communities kept boasting was the idea that vertical performance doesn't matter anymore. Solve all of the problems with JVM unreliability and poor performance under the umbrella of big data and more hardware. This makes sense at a few dozen servers. It's insane when you start considering scale at 100s or 1000s.

      I hope DataStax considers making Cassandra more cost effective. The simplest way is to get rid of the JVM and give me a machine code binary. I'd really like to throw 128GB of RAM to my nodes, but Java won't let me.

    2. Re:Rewrites are easier than the first strike by Ace17 · · Score: 2

      The sole action of rewriting a piece of code doesn't magically make it faster. Nor does it make it cleaner, or more stable. You have to put something more into the equation. Like a new programming language, or a new team. You might want to have a look at Joel Spolsky's post named "Things You Should Never Do", explaining why a big rewrite is almost always a bad idea.

  13. But is it web scale? MongoDB is web scale... by Anonymous Coward · · Score: 2, Funny

    I will only use MongoDB because it is web scale.

  14. Re:It's a miracle! C++ makes disks spin 10x faster by SQL+Error · · Score: 2

    Yes. It's now easy to scale to a million or more IOPS on a single server. That makes the CPU the bottleneck again.