Slashdot Mirror


Pros and Cons of Garbage Collection?

ers asks: "Most new programming languages are using garbage collection, rather than programmer-controlled memory management. The advantages are obvious: programmers no longer have to worry about forgetting to delete allocated memory, leading to far fewer memory leaks. The disadvantages are often glossed over by programming language designers - aside from the performance issues, predictable memory management can be used for controlling access to files and similar resources, creating safer thread locking code and even providing better error messages. Some programming languages, which usually use predictable memory management, can also be made to behave like they are garbage collected - for example, Boost provides various C++ smart pointer classes. So, given the choice between garbage collection and manual memory management, which would you choose and why? When using a manual memory management language, when do you consider the performance and syntactic overhead of faked garbage collection to be worthwhile?"

23 of 243 comments

  1. Depends by Apreche · · Score: 4, Insightful

    It depends on what you are trying to make, duh.

    If you are trying to make something where performance is important, like a 3d game, then manage memory yourself. If you are making a simple business application where reliability and security are important, use garbage collection. If your program uses lots of RAM and you need every last drop, either find an expert at RAM management to squeeze out every last bit, or use garbage collection if your programmers are not so awesome.

    And so on and so on...

    --
    The GeekNights podcast is going strong. Listen!
    1. Re:Depends by swillden · · Score: 4, Insightful

      It depends on what you are trying to make, duh.

      Agreed.

      If you are trying to make something where performance is important, like a 3d game, then manage memory yourself.

      It's not that simple.

      In most cases, the total run-time cost of garbage collection is lower than that of malloc/free memory management, at the cost of higher on-average memory usage (which can obviously destroy performance if you end up having to swap). On the other hand, application-tuned manual memory management using pooled allocation is generally faster than GC. Whether or not pooled allocation increases memory usage as much or more than GC depends on many things. Another consideration is that although GC often consumes fewer total CPU cycles than malloc/free, non-incremental collectors tend to use those cycles in big batches, which can produce GC 'pauses'. That's bad for some applications. Incremental collectors can minimize this effect, but only with some cost in CPU cycles.
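
      To make "pooled allocation" concrete, here is a minimal sketch of a fixed-size pool. The names (FixedPool, acquire, release) are invented for illustration and it assumes a C++11 compiler; a real application-tuned pool would also deal with growth, threading and object construction.

      ```cpp
      #include <cstddef>
      #include <vector>

      // Hypothetical fixed-size pool: one up-front allocation, then O(1)
      // acquire/release by pushing and popping an intrusive free list.
      template <typename T>
      class FixedPool {
      public:
          explicit FixedPool(std::size_t capacity) : slots_(capacity) {
              // Thread every slot onto the free list.
              for (std::size_t i = 0; i < capacity; ++i) {
                  slots_[i].next = free_;
                  free_ = &slots_[i];
              }
          }

          // Returns raw storage for one T, or nullptr when the pool is exhausted.
          void* acquire() {
              if (!free_) return nullptr;
              Slot* s = free_;
              free_ = s->next;
              ++in_use_;
              return s;
          }

          // Gives a slot back; p must have come from acquire() on this pool.
          void release(void* p) {
              Slot* s = static_cast<Slot*>(p);
              s->next = free_;
              free_ = s;
              --in_use_;
          }

          std::size_t in_use() const { return in_use_; }

      private:
          union Slot {
              Slot* next;                                   // meaningful while the slot is free
              alignas(T) unsigned char storage[sizeof(T)];  // meaningful while it is in use
          };
          std::vector<Slot> slots_;
          Slot* free_ = nullptr;
          std::size_t in_use_ = 0;
      };
      ```

      A caller would placement-new a T into the storage returned by acquire() and run the destructor by hand before release(). Neither operation searches a heap, which is where the speed advantage over a general-purpose malloc/free comes from.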

      Then there's also the whole issue of the effect of different approaches on the multi-tiered memory caching in modern systems.

      In short: yes it depends on what you're trying to make. No, it's not nearly as simple an analysis as you describe.

      Not only that, in practice other constraints usually dictate the choice anyway. Using GC generally means using something like Java, C#, Python, etc. rather than C or C++, which brings in a whole raft of other considerations, many of them more important than the memory management discussion. Platform, target environment and libraries will often dictate language selection, which will dictate much of memory management approach.

      --
      Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
    2. Re:Depends by Spy+Hunter · · Score: 3, Insightful
      I fail to see how following a chain of references to a memory hog is harder than finding a memory leak which has nothing pointing to it at all. In a garbage-collected application, with a proper debugger and profiler you should not have any trouble figuring out exactly what's taking up every byte of your memory, and once you've done that you can easily figure out who has the references to it. I recommend you take a look at Microsoft's awesome CLR profiler; I'm sure a similar tool exists for Java but it may not be free.

      It is just as easy to keep references to unneeded objects in C++, so C++ can have the same types of so-called "hard to debug" leaks you blame on garbage collection. But on top of that, if you have a true memory leak, C++ doesn't even tell you what's stored in all that leaked memory. You'll just have to trace back to find the last guy anywhere in the code who threw away a pointer without deleting it, and it could easily be very tough to figure out. And C++ doesn't have a magical solution to leaks in third-party libraries either.

      --
      main(c,r){for(r=32;r;) printf(++c>31?c=!r--,"\n":c<r?" ":~c&r?" `":" #");}
  2. What "performance issues"? by itistoday · · Score: 4, Insightful

    Garbage collection does not equal poor performance. In some instances, it actually speeds things up--when done properly. Take, for example, the D Programming language. It's just as fast as C (faster in some cases) yet it has a garbage collector. The reason is that most programmers tend to not realize that the free() operation actually takes up a decent amount of CPU cycles, and when you're freeing a bunch of little things all over the place, the overhead tends to add up. With a well-designed garbage collector, however, memory is freed all in one big chunk in a single go, thereby decreasing that overhead. The myth that garbage collection = poor performance is just that, a myth, and most likely started by people who associate Java's performance issues with garbage collection.

    1. Re:What "performance issues"? by be-fan · · Score: 4, Insightful

      In theory C++ custom allocators let the programmer specify the best behavior for any given situation. In practice, very few people use it except for the simple case of pool allocation (which is an optimization you can make in the more sophisticated GC systems). The problem with the C++ mechanism is that it always exposes 100% of the complexity, even in the 99% of the time that you absolutely don't need it.

      --
      A deep unwavering belief is a sure sign you're missing something...
    2. Re:What "performance issues"? by Spy+Hunter · · Score: 3, Insightful

      You can't ignore the complexity of manual memory management. You must free all your allocations, and you must police dangling pointers. C++ exposes that complexity all of the time, even though you only need it occasionally, if ever. You can use a smart pointer class, but the more sophisticated of those are simply slow unsafe reference-counting garbage collectors...

      --
      main(c,r){for(r=32;r;) printf(++c>31?c=!r--,"\n":c<r?" ":~c&r?" `":" #");}
  3. Pros and cons by studerby · · Score: 4, Insightful

    As someone who works on long-lived projects with a mid-sized team (a dozen or so developers), I prefer a GC-based language. The biggest pro is the great reduction in memory leaks, closely followed by the productivity increase by not having to think about allocation/deallocation (very much). The biggest con is that far too many "young whippersnappers" seem to think memory allocation/deallocation is therefore "free" in a GC-based language and will take absolutely no care at all about when they allocate (e.g. will allocate a largish object inside a very tight loop instead of allocating it outside and reusing it...). And the 2nd biggest con is that a lot of developers can't believe you can have memory leaks in a GC-based language, won't look for them until you rub their nose in them, and don't really know how to find them when they look.

    --

    .sig generation error:468(3)

    1. Re:Pros and cons by metamatic · · Score: 4, Insightful
      And the 2nd biggest con is that a lot of developers can't believe you can have memory leaks in a GC-based language, won't look for them until you rub their nose in them, and don't really know how to find them when they look.

      I've always thought that the use of the term "memory leak" to describe resource management problems in Java is a really poor choice, as it's quite a different problem from a memory leak in (say) C.

      Keeping memory allocated and referenced for longer than you need it isn't really a leak, to my mind. It's just bad programming. To me, a memory leak is when you lose the pointers to a piece of allocated memory, so the code is no longer able to deallocate it.

      In other words, your developers might give a better answer if you ask "Are there objects you keep around longer than necessary?", rather than "Are there memory leaks?"

      Or maybe I'm the only one.

      --
      GCHQ Quantum Insert installed. If only our tongues were made of glass, how much more careful we would be when we speak
    2. Re:Pros and cons by Keeper · · Score: 3, Insightful

      Unless the language specifies that you can't have circular references between objects, I would consider that broken: the application can get into a state where orphaned objects are not garbage collected.

      Reference counted garbage collection models are inherently flawed. Leaks are harder to find and easier to provoke. You might as well not have them if you've got to "delete" the references to the other objects.

      Modern garbage collection algorithms do not have this sort of problem.
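
      The circular-reference problem is easy to reproduce with a reference-counted pointer. This is a small sketch using std::shared_ptr and std::weak_ptr from the C++11 standard library; Node, live_nodes and the two function names are made up for the example.

      ```cpp
      #include <memory>

      static int live_nodes = 0;  // Node objects constructed but not yet destroyed

      struct Node {
          std::shared_ptr<Node> strong_peer;  // owning link: counts as a reference
          std::weak_ptr<Node> weak_peer;      // non-owning link: does not
          Node() { ++live_nodes; }
          ~Node() { --live_nodes; }
      };

      // Builds a two-node cycle out of owning links, drops the outside handles,
      // and returns how many nodes are stranded. The cycle is then broken by
      // hand so that the demo itself does not leak.
      int stranded_by_strong_cycle() {
          int before = live_nodes;
          std::weak_ptr<Node> probe;
          {
              auto a = std::make_shared<Node>();
              auto b = std::make_shared<Node>();
              a->strong_peer = b;
              b->strong_peer = a;  // each count is now 2
              probe = a;           // observes a without owning it
          }                        // handles gone: counts drop to 1, never to 0
          int stranded = live_nodes - before;
          if (auto a = probe.lock()) a->strong_peer.reset();  // manual "delete"
          return stranded;
      }

      // Same shape, but the back link is weak, so both counts can reach zero.
      int stranded_by_weak_back_link() {
          int before = live_nodes;
          {
              auto a = std::make_shared<Node>();
              auto b = std::make_shared<Node>();
              a->strong_peer = b;
              b->weak_peer = a;
          }
          return live_nodes - before;
      }
      ```

      The manual reset() in the first function is exactly the kind of hand-written "delete" of a reference complained about above; a tracing collector finds the unreachable pair without it.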

      What bothers me about garbage collection is that it only solves part of the problem: memory is not the only resource your application holds onto, and the kludges you have to make to deal with them in garbage collected languages are just annoying (hey, you don't have to worry about cleaning up after an allocation ... unless your object has a handle in it; have fun memorizing which objects you have to twiddle with to "release" the resource). If memory didn't have a different release pattern than other resources it wouldn't be a big deal, but ...

  4. It all depends by unr_stuart · · Score: 3, Interesting

    I've always had the philosophy, "use what makes the job easiest." Typically, this involves garbage-collection. However, one of the biggest problems I have with garbage collection is that you can't have your cake and eat it too. Meaning, you can get all the memory you want, but you can only access it at a high level (think Objects in Java). In C/C++ however, you can call malloc/new, create a big pool of memory (or just a single object), and then do whatever the heck you want with it. But again, as the subject says, it all depends on which method helps get the job done, and so far neither has been perfect for everything.

  5. Re:Situational by Jerf · · Score: 4, Insightful

    If security was in question, I might opt for manual memory management

    Wha? The evidence is against you. It's not the GC'ed languages that have buffer overflows, and that's the number one security flaw at the moment (though #2, "improperly escaped strings resulting in spilling across a boundary", i.e., XSS, SQL injection, etc. is coming up on it fast as more people use GC'ed languages).

    If security is an issue, you want GC and automatic buffer management like Java, Python, Perl, what have you, not manual management and the resulting opportunities for misallocation like in C and C++.

    (Yeah, yeah, if you program perfect C++ code it's possible to get it right. But I'm not talking theory, I'm talking about what happens in the real world, and in the real world, there seems to be quite a supply of less-than-perfect C/C++ programmers allocating buffers. You have to be on crack to argue otherwise.)

  6. C++ and others.... by try_anything · · Score: 5, Interesting

    C++'s constructor/destructor paradigm with predictable object destruction has the benefit of enabling the RAII (Resource Acquisition Is Initialization) idiom. RAII and exceptions greatly simplify resource management in the presence of error handling. Still, even as someone who knows C++ better than I know any other language, I have to admit that for many applications a garbage collected language puts the least mental burden on programmers and produces the fewest memory errors. The burden of arranging all the extra try/catch blocks in Java (because it lacks RAII) has to be weighed against the burden of investigating and fixing memory management errors in C++, and for people using new/delete, Java wins, IMHO.
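
    For anyone who has not seen RAII next to exceptions, a rough sketch follows. Guard, do_work and run_and_collect are hypothetical names, and the "resources" are just log entries so that the ordering can be observed.

    ```cpp
    #include <stdexcept>
    #include <string>
    #include <utility>
    #include <vector>

    // Hypothetical resource guard: "acquires" in the constructor and "releases"
    // in the destructor, logging both so the order is visible.
    class Guard {
    public:
        Guard(std::string name, std::vector<std::string>& log)
            : name_(std::move(name)), log_(log) {
            log_.push_back("acquire " + name_);
        }
        ~Guard() { log_.push_back("release " + name_); }
        Guard(const Guard&) = delete;
        Guard& operator=(const Guard&) = delete;

    private:
        std::string name_;
        std::vector<std::string>& log_;
    };

    // Acquires two resources, then fails. There is no cleanup code on the
    // error path; stack unwinding runs the destructors in reverse order.
    void do_work(std::vector<std::string>& log) {
        Guard file("file", log);
        Guard lock("lock", log);
        throw std::runtime_error("something went wrong");
    }

    // Runs do_work and returns every event, including the handler's own entry.
    std::vector<std::string> run_and_collect() {
        std::vector<std::string> log;
        try {
            do_work(log);
        } catch (const std::runtime_error&) {
            log.push_back("handled");
        }
        return log;
    }
    ```

    Both releases are logged before "handled", lock first and file second. The Java version of the same function needs a try/finally per resource to get that guarantee, which is the extra burden being weighed here.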

    C++ programmers should be making very little use of new and delete, though; they should be using smart pointers. I think the article poster misunderstands smart pointers. boost::shared_ptr is a reference counted pointer, but std::auto_ptr and boost::scoped_ptr have nothing to do with garbage collection - they certainly aren't "faked garbage collection" and they certainly aren't unpredictable. They use C++'s object scoping and copying mechanisms to manage memory in a way completely unlike garbage collection. scoped_ptr is the simplest and most predictable memory management tool of all. Taking programmer error into account, it's more predictable than using delete. Even shared_ptr is predictable; when the reference count falls to zero, the object is immediately destroyed, not just marked for destruction.
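
    A quick way to check the predictability claim is to watch when the destructor actually runs. The sketch below uses std::unique_ptr as a stand-in for boost::scoped_ptr (an assumption: it is the closest standard-library type, not the same class) and std::shared_ptr for boost::shared_ptr. Tracked and the function names are invented.

    ```cpp
    #include <memory>

    // Flips a caller-owned flag so the moment of destruction can be observed.
    struct Tracked {
        explicit Tracked(bool& alive) : alive_(alive) { alive_ = true; }
        ~Tracked() { alive_ = false; }
        bool& alive_;
    };

    // Scoped ownership: the object dies exactly at the closing brace.
    bool scoped_dies_at_scope_exit() {
        bool alive = false;
        {
            std::unique_ptr<Tracked> p(new Tracked(alive));
            if (!alive) return false;  // must be alive inside the scope
        }
        return !alive;                 // and gone right after it
    }

    // Shared ownership: the object dies the moment the last owner lets go.
    bool shared_dies_on_last_release() {
        bool alive = false;
        std::shared_ptr<Tracked> first(new Tracked(alive));
        std::shared_ptr<Tracked> second = first;  // two owners
        first.reset();                            // one owner left
        bool alive_with_one_owner = alive;
        second.reset();                           // count hits zero: destroyed now
        return alive_with_one_owner && !alive;
    }
    ```

    Both functions return true: there is no window between "unreachable" and "destroyed", which is the difference from a tracing collector.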

    Sadly, although C++ is a very powerful language and can be used to write code with few errors, the language as used by beginners is as dangerous as C, perhaps even more dangerous. It takes programmers years to become proficient in all the methods and idioms that make C++ a usable language.

    (I would love to see a language that allows programmers to choose scoped allocation, smart pointer heap allocation, or garbage-collected heap allocation, and uses types to avoid dangerous combinations such as garbage-collected objects pointing to scoped objects or an object pointing to an object in an unrelated scope. Every object would have two types - the object type (int, file, circle, etc.) and the memory management type (scoped with scope S1, scoped with scope S2, garbage-collected, etc.))

  7. GC by Dr.+Photo · · Score: 5, Informative

    Pros and cons of garbage collection?

    If you don't CONS, you never need to collect garbage. *rimshot*

    More seriously, GC isn't so much about pros and cons, as it is about tradeoffs between the various GC algorithms: time vs. space, low-latency vs. high-throughput, parallelism, etc.

    If you're designing a new language, it should include garbage collection, or nobody will use it (i.e., your target audience can already program in C). You may wish to have multiple GC implementations available for different purposes, perhaps to be selected at compile-time.

    For a good overview of what's available, see http://www.memorymanagement.org/

    My personal favorite is the good old Cheney semi-space collector (and Ephemeral/Generational Garbage Collectors, which are more advanced versions designed to generally have low latency), as it is very straightforward (both to understand and to implement), compacting (it defragments memory, and can perhaps improve cache locality by grouping related objects), and it has high throughput (work is proportional to the amount of live data, not total data).
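
    To show how little there is to it, here is a toy Cheney-style collector over a heap of cons cells. It is only a sketch: references are vector indices rather than pointers, two growable vectors stand in for the fixed-size semi-spaces, and every name (Cell, SemiSpaceHeap, kNil) is made up for the example.

    ```cpp
    #include <cstddef>
    #include <vector>

    // A reference is an index into the current space; kNil means "no object".
    const int kNil = -1;

    struct Cell {
        int value = 0;
        int car = kNil;      // reference to another cell, or kNil
        int cdr = kNil;      // reference to another cell, or kNil
        int forward = kNil;  // set once the cell has been copied to to-space
    };

    class SemiSpaceHeap {
    public:
        // Bump allocation: take the next free slot in from-space.
        int alloc(int value, int car = kNil, int cdr = kNil) {
            Cell c;
            c.value = value;
            c.car = car;
            c.cdr = cdr;
            from_.push_back(c);
            return static_cast<int>(from_.size()) - 1;
        }

        // Cheney collection: copy the roots, then scan to-space left to right,
        // copying whatever the scanned cells point to. To-space doubles as the
        // work queue, so no recursion or mark stack is needed, and only live
        // cells are ever touched.
        void collect(std::vector<int>& roots) {
            std::vector<Cell> to;
            for (int& r : roots) r = copy(r, to);
            for (std::size_t scan = 0; scan < to.size(); ++scan) {
                int car = copy(to[scan].car, to);  // may grow 'to'
                int cdr = copy(to[scan].cdr, to);
                to[scan].car = car;
                to[scan].cdr = cdr;
            }
            from_.swap(to);  // flip: survivors become the new from-space
        }

        const Cell& at(int ref) const { return from_[ref]; }
        std::size_t size() const { return from_.size(); }

    private:
        // Copies one cell into to-space unless it already has a forwarding
        // address, and returns its new location either way.
        int copy(int ref, std::vector<Cell>& to) {
            if (ref == kNil) return kNil;
            Cell& old = from_[ref];
            if (old.forward == kNil) {
                old.forward = static_cast<int>(to.size());
                Cell moved = old;
                moved.forward = kNil;
                to.push_back(moved);
            }
            return old.forward;
        }

        std::vector<Cell> from_;
    };
    ```

    After collect(), the survivors sit packed at the front of the new space in breadth-first order from the roots, which is where the compaction and the possible locality benefit come from. Garbage is never visited at all.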

    If memory usage is of more concern than fragmentation and throughput, a mark-sweep collector may be more your style.

    There are also "real-time" (and "soft-real-time", i.e. bounded latency [see Henry Baker's Treadmill]) collectors, parallel collectors [including an interesting case for reference counting, usually considered a dog performance-wise, as a viable parallel/remote GC method], "conservative" collectors for C/C++ (see Hans-J Boehm's libgc), collectors for real and hypothetical computers with special hardware and/or OS support for GC features, and some collectors that are just plain weird.

    Note also that garbage collection algorithms are considered hard to measure for performance, especially with regard to wall-time latency, so just because a paper(*) claims that a certain GC has certain performance characteristics, be sure to benchmark if it really matters.

    (*) Did I mention papers? If you're serious about implementing GC, getting comfortable reading CS research papers is a must. The book "Garbage Collection" is your best friend here, as it provides a very good overview/survey of said papers and algorithms, and it discusses a lot of pros and cons between various algorithms, and useful variants or adaptations that have been applied to previously-published work.

    Also check out Henry Baker's papers, because he is a memory management demigod: http://home.pipeline.com/~hbaker1/home.html.

  8. Re:C++ basically has it right by try_anything · · Score: 3, Insightful

    I don't think garbage collection implies treating the programmer like an idiot. The programmer's attention is a finite resource that is often better spent on something other than memory management, especially given that garbage collection performs quite adequately for many programs. A Perl, Java, or Lisp programmer isn't an idiot for not doing his own memory management any more than a person who doesn't make his own shoes is an idiot.

  9. RAII is a bad reason for manual memory management by GileadGreene · · Score: 3, Insightful

    All of the reasons given for manual memory management seem to boil down to a desire to have support for the Resource Acquisition Is Initialization (RAII) idiom, which is hard to pull off in GC languages. But, the alternative idiom Resource Acquisition Is Invocation provides the desired capability in GC languages. Same capability, no chance of memory leaks. So tell me again why manual memory management might be a good idea?
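
    For readers who have not met the idiom, the shape is roughly this: a wrapper function acquires the resource, invokes the caller's code with it, and releases it on every exit path. The sketch is written in C++ only to match the rest of the thread; Connection and with_connection are hypothetical names, and in a GC language the catch-and-rethrow would normally be a finally block.

    ```cpp
    #include <functional>
    #include <string>
    #include <vector>

    // Hypothetical resource whose activity is recorded in a caller-owned log.
    struct Connection {
        std::vector<std::string>& log;
        void send(const std::string& msg) { log.push_back("send " + msg); }
    };

    // The wrapper owns the whole lifetime: open, invoke the caller's code, close.
    // The caller is handed the resource only for the duration of the call.
    void with_connection(std::vector<std::string>& log,
                         const std::function<void(Connection&)>& body) {
        log.push_back("open");
        Connection conn{log};
        try {
            body(conn);
        } catch (...) {
            log.push_back("close");  // release on the error path
            throw;
        }
        log.push_back("close");      // release on the normal path
    }
    ```

    A call looks like with_connection(log, [](Connection& c) { c.send("hello"); }); and the close happens whether the body returns or throws. The cleanup is written once, in the wrapper, instead of at every call site.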

  10. Explicit management has its own costs by Pseudonym · · Score: 4, Insightful

    The answer, as always, is "it depends". I'm firmly inside the "right tool for the job" camp.

    Manual memory management is not free. In some circumstances, it can be quite expensive. There is a group of programmers who are best described as "rabidly anti-GC". These people are almost all completely unaware of the costs that manual memory management can impose on your code.

    A multi-threaded program, for example, can allocate memory from any arena, but it MUST return a block to the arena from whence it came, which can cause all sorts of difficult lock contention problems, making free() much more expensive than malloc(). (Ask anyone who has written high-performance memory-intensive multi-threaded programs.)

    In some languages, like C, the situation is even worse. In structure-hungry programs, you can end up structuring your code around data lifetimes, which precludes you from using the most natural, maintainable and efficient algorithms. Garbage collection frees you from this, as the GCC people have discovered.

    I do recommend reading Paul Wilson's excellent survey paper on the topic. It answers a lot of your questions, though it's by no means the final word.

    --
    sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});
  11. Re:C++ basically has it right by Pseudonym · · Score: 3, Insightful
    The C++ model is basically correct.

    On the contrary, the C++ model is basically correct for some applications.

    A "proper program" is programmed in the appropriate language for the job. Sometimes this is a domain-specific language. Sometimes you need the close-to-the-metal-yet-still-maintainable-for-large-applications qualities that C++ provides. And sometimes you don't.

    Very few people write web applications in C++, and for good reason. Web servers run at the speed of the network card, not the speed of the L1 cache. Pulling out extra cycles is pointless especially if you lose the maintainability that a general purpose language like C++ provides. And yet you wouldn't call many of these "quick scripting hacks".

    --
    sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});
  12. Personally... by g-san · · Score: 3, Funny

    I prefer garbage collection. At most, I take the cans to the edge of the driveway and some guy in a noisy truck with a cool robotic arm just hauls it away. Yeah, there is a landfill somewhere that isn't good for the overall environment but I accept that tradeoff. I also don't throw old car batteries into the trash.

    Sure the hell beats me keeping the trash around, remembering where it is, and putting it in my truck and hauling it to the heaping landfill myself. I'm not here to manage trash, I'm here to get something done.

    Is this post about programming?

  13. False dichotomies by Eric+Smith · · Score: 3, Interesting
    Some of the cited advantages of not using garbage collection are red herrings. For instance, the "controlling access to files and similar resources" by RAII works fine with garbage collection. In most cases, the compiler can determine by static analysis that a particular object is allocated within a scope and no references are propagated upward out of scope, and can remove the reference so the garbage collector will deallocate it (possibly calling a destructor). Depending on the type of GC and its implementation, the compiler may generate code that forces the object to be deallocated immediately.

    For cases where static analysis can't do this automatically, it isn't that hard to use a design methodology that achieves the same result; it's certainly still much easier than doing manual allocation and deallocation and ensuring that the deallocation is done (or not done) correctly in all cases.

    And if you are using a reference-counting GC, or a hybrid GC that includes reference-counting, you don't have to do anything special at all.

    The same applies to the claimed mutex and error message disadvantages, since those are just specific uses of RAII.

  14. Re:C has problems too by be-fan · · Score: 4, Insightful

    It's frightening, the illusions programmers have about manual memory management. They seem to think that malloc() and free() are cheap functions, when in reality they can take hundreds of clock cycles. They think that malloc() is deterministic, when in reality, a badly fragmented freelist can cause most malloc() implementations to traipse through the entire heap, just like a GC.

    The weirdest thing is C++ programmers. They freak out about every single cycle, but modern C++ idioms push the use of smart pointers, which are usually quite slow compared to a good generational GC.

    --
    A deep unwavering belief is a sure sign you're missing something...
  15. Garbage collection efficiency overstated by butlerm · · Score: 3, Interesting

    This is a common claim, but it is an apples to oranges comparison. No one (including the compiler) dynamically allocates objects in C/C++ when they can place them on the stack instead. Garbage collected languages like Java, on the other hand, require practically everything to be managed on the heap.

    In addition, an array of objects on the heap requires only a single memory allocation in C or C++, where Java has to allocate and track each separately. As one luminary once said, "C++ is better because there is less garbage to collect."
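
    The difference in allocation counts can be measured rather than argued. This is a rough sketch that assumes a C++11 compiler: CountingAlloc, g_allocs and the two functions are invented for the example, and the second layout only imitates a Java array of references, it is not Java.

    ```cpp
    #include <cstddef>
    #include <memory>
    #include <new>
    #include <vector>

    static std::size_t g_allocs = 0;  // heap allocations seen by this sketch

    // Minimal allocator that counts each block a container asks for.
    template <typename T>
    struct CountingAlloc {
        using value_type = T;
        CountingAlloc() = default;
        template <typename U>
        CountingAlloc(const CountingAlloc<U>&) {}
        T* allocate(std::size_t n) {
            ++g_allocs;
            return static_cast<T*>(::operator new(n * sizeof(T)));
        }
        void deallocate(T* p, std::size_t) { ::operator delete(p); }
    };
    template <typename T, typename U>
    bool operator==(const CountingAlloc<T>&, const CountingAlloc<U>&) { return true; }
    template <typename T, typename U>
    bool operator!=(const CountingAlloc<T>&, const CountingAlloc<U>&) { return false; }

    // Counts every individually heap-allocated Point as well.
    struct Point {
        int x = 0;
        int y = 0;
        static void* operator new(std::size_t size) {
            ++g_allocs;
            return ::operator new(size);
        }
        static void operator delete(void* p) { ::operator delete(p); }
    };

    // C++ layout: n Points stored by value in one contiguous block.
    std::size_t allocs_for_value_array(std::size_t n) {
        std::size_t before = g_allocs;
        std::vector<Point, CountingAlloc<Point>> points(n);
        return g_allocs - before;
    }

    // Java-like layout: an array of references, each to its own heap object.
    std::size_t allocs_for_reference_array(std::size_t n) {
        std::size_t before = g_allocs;
        std::vector<std::unique_ptr<Point>,
                    CountingAlloc<std::unique_ptr<Point>>> refs(n);
        for (auto& r : refs) r.reset(new Point);
        return g_allocs - before;
    }
    ```

    With the usual standard libraries the first function reports a single allocation regardless of n, while the second reports one for the array plus one per element. That per-element overhead is the "garbage to collect" the quote is about.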

    That might be acceptable, but the worst part is random application pauses of arbitrary duration for garbage collection. Unless that problem can be resolved, garbage collected languages will always be a poor match for latency sensitive applications, even where the net throughput is otherwise adequate.

    1. Re:Garbage collection efficiency overstated by swillden · · Score: 4, Interesting

      No one (including the compiler) dynamically allocates objects in C/C++ when they can place them on the stack instead.

      Are you certain of that? Here:

      void foo()
      {
          //...
          auto_ptr<Foo> f(new Foo);  // f never leaves foo(), yet the Foo lives on the heap
          //...
      }

      What would the compiler do? What *could* it do, if it were smarter? And have you really never seen any code that does this? Or written it?

      Lots of C and C++ programs dynamically allocate many objects that could be stack allocated. In particular, many C++ objects that are placed on the stack immediately allocate storage on the heap. Think std::string. Many programmers do make an attempt to allocate as much on the stack as possible, but I think most don't really consider it. And keep in mind when I say this that I've been writing C and C++ (mostly C++) professionally for nearly 15 years -- I've seen more than a little code.

      Garbage collected languages like Java, on the other hand, require practically everything to be managed on the heap.

      Interestingly, Java does *not* require that at all... it's just the most obvious way to implement it. In fact, I read a while back that the next generation of Java compilers will perform escape analysis, looking for objects whose lifetime is associated with a stack frame. Here's a link. When they find such an object, it will be allocated on the stack. If such an object creates other objects, as long as the analysis can prove that their lifetimes are also frame-associated, they will also be allocated on the stack.

      The same analysis will often allow Java objects and their sub-objects to be allocated as a single block. Since the compiler can see that the constructor of class Foo always allocates objects of Bar and Baz, all of fixed size, it can allocate a single block, just like a C++ compiler would be able to for a class like:

      class Foo
      {
          // ...
          Bar bar;
          Baz baz;
      };

      The same sort of analysis should also allow your other point to be addressed: An array of objects can be allocated as a single block. The compiler can recognize code like:

      Foo[] f = new Foo[n];
      for (int i = 0; i < n; ++i)
          f[i] = new Foo();

      And allocate a single block that is n*(sizeof(Foo)+sizeof(Bar)+sizeof(Baz)) in size, and if 'f' has a stack-associated lifetime, allocate the whole pile on the stack.

      All of the above is still theoretical, of course, but it's coming quickly.

      That might be acceptable, but the worst part is random application pauses of arbitrary duration for garbage collection. Unless that problem can be resolved, garbage collected languages will always be a poor match for latency sensitive applications, even where the net throughput is otherwise adequate.

      As I pointed out in my previous post, whether or not that problem exists depends on the GC implementation. Incremental GCs keep the pauses small, and there are GCs designed for real-time usage that further guarantee maximum latencies. It's worth pointing out also that normal malloc() and free() implementations don't provide any run-time guarantees. Real-time code that uses a heap uses special versions that do provide guaranteed latencies, at the expense of worse average performance.

      --
      Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
  16. Re:Java GC != No leaks by the+eric+conspiracy · · Score: 3, Insightful

    Right now someone I know is trying to track down a Java memory leak.

    Yes, but it is unlikely that somebody you know is trying to track down a Java double free error.