Slashdot Mirror


OCaml vs. C++ for Dynamic Programming

jcr13 writes "OCaml is nearly as fast as (or sometimes even faster than) C, right? At least according to the Computer Language Shootout [alternate] (OCaml supporters often point to these shootout results). My results on a real-world programming problem (optimizing a garden layout using dynamic programming) disagree. On one particular problem instance (a garden of size 7x3), my C++ implementation finished in 1 second, while the OCaml implementation was still running after 16 minutes. Bear in mind that my OCaml implementation was dramatically faster than my equivalent Haskell code. It seems that if you program using a functional style in OCaml (which I did, using map, filter, and other recursive structures in place of loops), it is quite slow. However, most of the shootout OCaml programs rely heavily on OCaml's imperative features (unlike Haskell, OCaml doesn't force you to be a functional purist). If you write OCaml code that is isomorphic to C code, it will be fast---what about if you use OCaml the way it was meant to be used?"

12 of 161 comments

  1. Hmmm ... by crmartin · · Score: 4, Interesting

    That difference is so dramatic that I wonder if you made a mistake in your functional implementation? Or is there something specific about your dynamic program that makes trouble?

    Dynamic programming depends basically on memoization (not "memorization", before someone complains about my typo) which inherently means preserving some state. If you don't preserve state, it becomes a good old, likely exponential time, recursive program. Any chance your implementation is not memoizing?
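    To make the point concrete, here is a minimal OCaml sketch of the same recurrence with and without a memo table. It counts lattice paths on a grid, which is a stand-in and not the garden problem from the story; the names paths_naive, paths_memo and the call counters are made up for illustration.

```ocaml
(* Count monotone lattice paths from (r, c) down to an edge of the grid:
   a textbook recurrence with heavily overlapping subproblems.
   Hypothetical example, not the garden problem from the story. *)
let naive_calls = ref 0

let rec paths_naive r c =
  incr naive_calls;
  if r = 0 || c = 0 then 1
  else paths_naive (r - 1) c + paths_naive r (c - 1)

(* Same recurrence, but each (r, c) pair is evaluated at most once. *)
let memo_calls = ref 0
let table : (int * int, int) Hashtbl.t = Hashtbl.create 64

let rec paths_memo r c =
  match Hashtbl.find_opt table (r, c) with
  | Some v -> v
  | None ->
    incr memo_calls;
    let v =
      if r = 0 || c = 0 then 1
      else paths_memo (r - 1) c + paths_memo r (c - 1)
    in
    Hashtbl.add table (r, c) v;
    v

let () =
  let a = paths_naive 12 12 in
  let b = paths_memo 12 12 in
  Printf.printf "naive: %d (%d calls)\nmemo:  %d (%d evaluations)\n"
    a !naive_calls b !memo_calls
```

    The preserved state is the hashtable: without it, the number of calls grows with the answer itself, while with it the work is bounded by the number of distinct (r, c) pairs.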

  2. My realworld results differ by cthulhuology · · Score: 2, Interesting

    You know I've implemented some real world applications recently for a contract job, and the Ocaml applications are actually faster than the C++ equivalents using the STL. So your mileage may vary based on your problem set (or Ocamlfu as the case may be). As for how Ocaml is supposed to be programmed, there's a reason Ocaml supports imperative programming, because you should use the form that is most efficient for your problem. Some programs benefit from a functional approach (and it helps if you implement properly tail-recursive functions, and make intelligent use of arrays and other block data structures). So one can argue Ocaml is not particularly functional, because it more pragmatically allows for multiple styles of programming. You can do functional programming in C++ actually, but depending on the optimizer you end up with stack issues. My experience with maintaining and extending the Ocaml programs over large C++ code bases is a world of difference. Ocaml wins hands down. Even extending the language to support 3rd party libraries doesn't place sufficient barriers to maintenance. But ymmv... as with all things.

    1. Re:My realworld results differ by srussell · · Score: 2, Interesting
      They are indeed a part of the language, and definitely a new concept, but monads aren't nearly as confusing as people seem to think, certainly not more confusing than objects
      It isn't that monads are confusing, but that they're a contagion. Now, I'm not a very experienced Haskell programmer; in fact, I'm not a very experienced functional programmer, so I'm probably just making some dumb mistakes... but almost every application that I write goes through the following evolution:
      1. Application starts out at a high level, with function declarations, as I work the problem out. This is extremely elegant and natural, and defines the problem on paper nicely.
      2. Application gains some data structures and function definitions, as I start "filling in the blanks."
      3. At some point, I discover that I can't get around some problem without using monads.
      4. Trigger frantic rewrite of almost all of the code as one seemingly trivial function using monads suddenly requires all the functions in the call stack to use monads.
      Here's a really simple example that is so annoying, I'm convinced there must be a way to do it that I don't know about:

      -- Assume 'qsort' such that:
      qsort :: (a -> a -> Int) -> [a] -> [a]
      -- So that:
      qsort (\x y -> if x<y then -1 else if x>y then 1 else 0) [3,2,1]
      -- Then an array randomizer could be:
      qsort (getRand) somearray
      But not if you want to use randomIO, which provides non-seeded (or, rather, IO seeded) numbers. Once you get that IO monad in there, it makes it impossible to use any standard non-IO aware higher order functions.

      Higher order functions are one of the cool Haskell features, and monads severely restrict their use.

      --- SER

  3. Can you also link to the Haskell code? by shapr · · Score: 2, Interesting

    I'd like to see the Haskell sources for comparison.
    http://minorgems.sf.net/Haskell.hs doesn't exist, though the C++ and OCaml code are there.

    --

    Shae Erisson - ScannedInAvian.com
  4. Re:Other languages... by Anonymous Coward · · Score: 1, Interesting

    D has no support though. At least when compared to other modern languages (C/C++, C#, Java, Perl, Python, etc.). It's really not all that different from C++. It has some better syntax and extra features, but it's simply non-standard and the tiny improvements aren't worth the risk of using something unproven. It is a dead end.

  5. lookup table by jefu · · Score: 2, Interesting
    Looking at the ocaml code and in particular at the functions that handle the memoizing, I'm wondering how big those memo tables are getting. If they get big, it seems quite possible that the overhead of doing the lookup (or, if the lookup is fast, of doing the insert) might well be a problem. It should be easy enough to just look and see how big the table is getting.

    In the same kind of vein, has the code been profiled? I'd quite like to see where the time is going.
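    A rough way to check the table size and the lookup cost, as a sketch only: the table layout, the triple keys, and the names insert, assoc_counting and scanned below are assumptions for illustration, not taken from the posted Garden.ml.

```ocaml
(* Association-list memo table in the style the comment describes.
   Keys and values here are made up for illustration. *)
let table : ((int * int * int) * int) list ref = ref []

let insert key value = table := (key, value) :: !table

(* Like List.assoc, but counts how many entries it had to inspect. *)
let scanned = ref 0

let rec assoc_counting key = function
  | [] -> raise Not_found
  | (k, v) :: rest ->
    incr scanned;
    if k = key then v else assoc_counting key rest

let () =
  for i = 0 to 999 do
    insert (i, i, i) i
  done;
  let v = assoc_counting (0, 0, 0) !table in
  Printf.printf "entries: %d, value: %d, entries scanned: %d\n"
    (List.length !table) v !scanned
```

    Insertion at the head is constant time, so with this layout the cost sits in the lookup: the oldest entries are found only after walking past everything inserted later.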

  6. Re:Speed alone by ameline · · Score: 2, Interesting

    I know you think you're joking, but you're actually right -- that's what the good programmers do. In my application, the performance critical routines (on the order of a dozen or so) are hand coded in vector assembler (Altivec on Mac, and MMX and SSE on Intel), and are about 3 to 4 times faster than the most optimized C or C++ implementation of those algorithms. If you have code that is vectorizable, and especially if it is doing saturated small integer math, (blending, resampling etc), you can do way better than any existing compilers by hand coding in assembler.

    --
    Ian Ameline
  7. More efficient ocaml version by Anonymous Coward · · Score: 3, Interesting

    I see nothing wrong in your C++ version, while your ocaml version clearly sucks: you are memoizing using a complex key, and an association list, meaning that accessing memoized information costs a lot.

    If you are concerned by performance, you should use a complete cache, like in your C version.
    FYI, I uploaded an ocaml translation of your C code. It doesn't use mutable state except for memoizing, and uses pattern-matching on lists and recursion rather than for loops, but otherwise it closely follows your code. Performance should be very similar.

    http://wwwfun.kurims.kyoto-u.ac.jp/~garrigue/garden2.ml

    1. Re:More efficient ocaml version by jdh30 · · Score: 2, Interesting

      Interesting, having tidied up the C++ code, this OCaml code is still less than half the length. The original OCaml clearly had several unused functions, several pointless reimplementations of the library functions and many comments regarding Haskell (?!).

      In terms of performance, I get:

      $ ./garden2
      real 0m16.951s
      user 0m16.870s
      sys 0m0.010s

      $ ./Garden
      real 0m10.200s
      user 0m10.160s
      sys 0m0.010s

      So OCaml wins on performance per LOC. :-)

      However, this C++ is also very poorly written IMHO. Specifically, it should use the STL and, particularly, vector<bool> to implement bitmaps. That would be a better contender...

  8. Email response by jcr13 · · Score: 5, Interesting

    > Here's a laundry list of why your O'Caml program is inefficient:
    >
    > 1. You use lists. Lists aren't designed to be fast (computationally)
    > to use. They're designed to be fast (programmatically) to use. You'll
    > be hard pressed to find a production, speed-sensitive Lisp or O'Caml
    > program that uses lists.

    Okay... but here's my point: Every single example that shows how elegant Haskell and OCaml are uses lists. The 4-line Quicksort example for Haskell uses lists. All of the code that demonstrates easy reuse of functions and functions taken as arguments uses lists (like how easy it is to implement quite complicated algorithms using only map and filter, for example).

    So, proponents say "Everyone should use functional languages because they can express complicated problems in elegant ways and result in cleaner, more reusable code."

    But what you're saying in #1 above is that in "production," speed-sensitive code, no one is using lists... this would mean that no one is using map, filter, or any other pieces of reusable primitive code. So, they are instead all using mutable data structures... I.e., they are programming with side-effects and loops (random access instead of recursion, even when every element of an array/list needs to be accessed/processed).

    That was my point exactly. If you write elegant OCaml code using all of the lovely (and I mean lovely, really) tricks that they present when they demonstrate why OCaml is cool, you end up with code that is too slow to use in the real world.

    I would say that my C++ (or most would call it C) implementation is elegant enough... easy to understand... no messy optimization tricks. Sure, I'm not using objects and templates everywhere, but these structures are hardly needed to solve this simple problem.

    > 2. Practically none of your functions are written tail-recursively.

    Good point.
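    For readers unfamiliar with the distinction in point 2, a minimal sketch follows. The functions sum and sum_tr are hypothetical examples, not code from Garden.ml.

```ocaml
(* Not tail-recursive: the addition happens after the recursive call
   returns, so every list element costs a stack frame. *)
let rec sum = function
  | [] -> 0
  | x :: rest -> x + sum rest

(* Tail-recursive: the running total travels in an accumulator and the
   recursive call is the last thing the function does, so the compiler
   can reuse the current frame. *)
let sum_tr xs =
  let rec go acc = function
    | [] -> acc
    | x :: rest -> go (acc + x) rest
  in
  go 0 xs

let () =
  let big = List.init 1_000_000 (fun i -> i) in
  Printf.printf "%d\n" (sum_tr big)
```

    The first form can exhaust the stack on long lists; the second runs in constant stack space. List.fold_left (+) 0 would do the same job as sum_tr.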

    > 2.5. You use a list append (@) inside a loop (generateStates).
    > List.append is O(m), where m is the length of its first argument. If
    > you write an implementation, you'll see why. It probably doesn't make
    > much of a difference here (generateStates is only called once) but it's
    > something to watch out for.

    Of course, as you point out, generateStates has almost no effect on the running time. However, I wonder how you might implement that in an elegant way in OCaml without @. In C, I just looped over all numbers between 0 and 2^stateLength and converted the bit representations for the numbers to cell on/off states.
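    One possible answer, sketched under assumptions: the C approach of counting from 0 to 2^stateLength carries over to OCaml with List.init and no @ at all. Here generate_states and state_of_int are hypothetical names, and a state is assumed to be a bool list with the least significant bit first; the real generateStates may use a different representation.

```ocaml
(* Turn the low [n] bits of [k] into a list of on/off cells,
   least significant bit first. *)
let state_of_int n k = List.init n (fun i -> (k lsr i) land 1 = 1)

(* All 2^n states for a column of [n] cells, built without list append. *)
let generate_states n = List.init (1 lsl n) (state_of_int n)

let () =
  generate_states 3
  |> List.iter (fun state ->
       state
       |> List.map (fun on -> if on then "1" else "0")
       |> String.concat ""
       |> print_endline)
```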

    > 3. For Pete's sake, man, you're using an association list for your
    > memos! Surely you know that lookup in an association list is O(n) in
    > the size of the list.

    I simply Googled for "memoization Ocaml" and found that code:
    http://www.emeraldtiger.net/modules.php?op=modload&name=News&file=article&sid=9

    The author pointed out how "sweet" polymorphism is... one block of code that can be used to memoize any function. Sweet indeed, and it certainly sped up my OCaml code a lot (without memoization, it was so slow as to be intractable for anything larger than about 4x4).

    So... maybe you can re-write higher-order memoization code using more efficient data structures? I would love to see that code, and I'm sure the OCaml community would benefit from having that in their toolbox.

    I agree that the memoization code is probably the problem in the OCaml version. However, this code came directly from the OCaml community and was the *only* example of memoization in OCaml that I could find.
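    As a sketch of what such higher-order memoization could look like on top of the standard Hashtbl module: memoize and memo_rec are hypothetical helper names, not part of the OCaml standard library or of the linked article, and multi-argument functions are assumed to take a tuple as their key.

```ocaml
(* memoize: wrap any one-argument function with a hashtable cache.
   A 3-parameter function can be memoized by passing a triple. *)
let memoize f =
  let table = Hashtbl.create 1024 in
  fun x ->
    match Hashtbl.find_opt table x with
    | Some y -> y
    | None ->
      let y = f x in
      Hashtbl.add table x y;
      y

(* memo_rec: the same idea for recursive functions. The body receives
   "itself" as its first argument, so recursive calls hit the cache too. *)
let memo_rec f =
  let table = Hashtbl.create 1024 in
  let rec g x =
    match Hashtbl.find_opt table x with
    | Some y -> y
    | None ->
      let y = f g x in
      Hashtbl.add table x y;
      y
  in
  g

let fib =
  memo_rec (fun self n -> if n < 2 then n else self (n - 1) + self (n - 2))

let () = Printf.printf "fib 50 = %d\n" (fib 50)
```

    One caveat: the polymorphic hash behind Hashtbl inspects only a bounded part of each key, so very long list keys may collide more than expected; a compact key such as an int or a small tuple avoids that.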

    For Haskell, I used an infinite list of results that was filled in lazily as the results were needed. This also sped up the algorithm dramatically. However, I cannot get a Haskell compiler to compile itself on my platform, so I was testing all code in the Hugs interpreter, which made it too slow to be practical. Isomorphic compiled OCaml code was hundreds of times fast

    1. Re:Email response by Fourier · · Score: 3, Interesting
      So... maybe you can re-write higher-order memoization code using more efficient data structures? I would love to see that code, and I'm sure the OCaml community would benefit from having that in their toolbox.

      You get a significant boost just by dumping the list memoization in favor of a hashtable implementation. I'm not necessarily saying that's the optimal choice, but it's an easy drop-in replacement that is much better suited to the task. Here's a patch:
      --- Garden.ml 2005-03-14 13:22:04.000000000 -0500
      +++ Garden2.ml 2005-03-15 14:38:34.000000000 -0500
      @@ -135,8 +135,8 @@ let costList = map cost allStates;;

      (* Set up an associative list for memoization *)
      -let lookup key table = List.assoc key !table;;
      -let insert key value table = table := (key, value) :: !table;;
      +let lookup key table = Hashtbl.find table key;;
      +let insert key value table = Hashtbl.add table key value;;

      (* memoize any 3-parameter function *)
      @@ -150,7 +150,7 @@ let memoize3 table f x y z =
      result;;

      (* table for memoizing optLayout *)
      -let isCovered_table = ref [];;
      +let isCovered_table = Hashtbl.create 100;;

      (* checks if each cell in center colum is covered by an empty cell *)
      let rec isCovered c1 c2 c3 =
      @@ -266,7 +266,7 @@ and memo_fib n = memoize fib n;;
      *)

      (* table for memoizing optLayout *)
      -let optLayout_table = ref [];;
      +let optLayout_table = Hashtbl.create 100;;

      (*
      Also: learn to use the profiler! It takes about five seconds to see that camlList__assoc is killing you.
  9. Quick Haskell Rebuttal by Anonymous Coward · · Score: 3, Interesting
    This is a 10 minute proof-of-concept that Haskell shouldn't lag as much as claimed. It's hardwired for n*3 grids, doesn't use memoization or arrays, and it solves 7*3 in 15 seconds on my ancient hardware. 15 lines of code, not astoundingly elegant, but no optimization tricks at all. If anyone cares I will write a generalized version to kick C++'s arse later.
    import Data.Word; import Data.Bits; import Data.List

    collength = 7
    full = 2^collength-1

    selfs c = (shiftR c 1) .|. c .|. ((shiftL c 1).&.full)

    invert c = foldl (.) id [if(testBit c i)then(flip setBit (collength-1-i))else id|i<-[0..collength-1]] 0

    empties c = length [()|i<-[0..collength-1],testBit c i]

    valids = [((c1,c2,c3),e1+e2+e3)
             |(c1,s1,i1,e1)<-c's, (c2,s2,i2,e2)<-c's, (c3,s3,i3,e3)<-c's,
              c1==minimum[c1,i1,c3,i3],
              s1.|.c2==full, c1.|.s2.|.c3==full, c2.|.s3==full]
      where c's = zip4 cs (map selfs cs) (map invert cs) (map empties cs)
            cs = [(0::Word32)..full]

    bests = (best,[cs|(cs,score)<-valids,score==best])
      where (_,scores) = unzip valids
            best = minimum scores