Slashdot Mirror


Pros and Cons of Garbage Collection?

ers asks: "Most new programming languages are using garbage collection, rather than programmer-controlled memory management. The advantages are obvious: programmers no longer have to worry about forgetting to delete allocated memory, leading to far fewer memory leaks. The disadvantages are often glossed over by programming language designers - aside from the performance issues, predictable memory management can be used for controlling access to files and similar resources, creating safer thread locking code and even providing better error messages. Some programming languages, which usually use predictable memory management, can also be made to behave like they are garbage collected - for example, Boost provides various C++ smart pointer classes. So, given the choice between garbage collection or manual memory management, which would you choose and why? When using a manual memory management language, when do you consider the performance and syntactic overhead of faked garbage collection to be worthwhile?"

31 of 243 comments

  1. C++ basically has it right by keesh · · Score: 2, Insightful

    The C++ model is basically correct. It doesn't treat the programmer like an idiot (which admittedly may be a problem if you have idiot programmers), and it gives you the choice of how to handle memory allocation. The lack of a reference-counted pointer in the standard library is a bit of a bitch, but the Boost shared pointer templates will likely make it into C++0x, and it's only a hundred lines of code or so to make your own in the meantime.
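    A minimal sketch of what such a home-made reference-counted pointer might look like. The name `CountedPtr` is hypothetical, and this is not Boost's implementation: it is not thread-safe and has no weak references or custom deleters.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical, deliberately small reference-counted pointer.
// Not thread-safe; if allocating the counter throws, the raw pointer leaks.
template <typename T>
class CountedPtr {
public:
    explicit CountedPtr(T* raw = nullptr)
        : ptr_(raw), count_(raw ? new std::size_t(1) : nullptr) {}

    CountedPtr(const CountedPtr& other)
        : ptr_(other.ptr_), count_(other.count_) {
        if (count_) ++*count_;  // one more owner
    }

    // Copy-and-swap: the by-value parameter takes the new share,
    // and its destructor drops the share we used to hold.
    CountedPtr& operator=(CountedPtr other) {
        swap(other);
        return *this;
    }

    ~CountedPtr() { release(); }

    void swap(CountedPtr& other) {
        std::swap(ptr_, other.ptr_);
        std::swap(count_, other.count_);
    }

    T& operator*() const { assert(ptr_); return *ptr_; }
    T* operator->() const { assert(ptr_); return ptr_; }
    T* get() const { return ptr_; }
    std::size_t use_count() const { return count_ ? *count_ : 0; }

private:
    void release() {
        // The last owner deletes both the object and the counter.
        if (count_ && --*count_ == 0) {
            delete ptr_;
            delete count_;
        }
    }

    T* ptr_;
    std::size_t* count_;
};

void usage_example() {
    CountedPtr<int> a(new int(7));
    {
        CountedPtr<int> b = a;  // two owners
        assert(a.use_count() == 2);
    }                           // b destroyed, one owner left
    assert(a.use_count() == 1 && *a == 7);
}
```

    The object is deleted exactly when the last `CountedPtr` sharing it goes away, which is the deterministic behavior a tracing collector does not offer.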

    Of course, the C++ model is not perfect either. Lack of virtual and const constructors can be a nuisance (the workaround being the pimpl idiom and a shared pointer), and not being able to use shared pointers to functions without nasty syntactic hackery occasionally breaks the "stuff pretending to be a pointer" illusion. Still, the power it gives over the Java model is definitely worth the occasional bit of extra effort.

    Then again, if you're coding some quick scripting hack rather than a proper program, who cares about memory allocation?

    1. Re:C++ basically has it right by try_anything · · Score: 3, Insightful

      I don't think garbage collection implies treating the programmer like an idiot. The programmer's attention is a finite resource that is often better spent on something other than memory management, especially given that garbage collection performs quite adequately for many programs. A Perl, Java, or Lisp programmer isn't an idiot for not doing his own memory management any more than a person who doesn't make his own shoes is an idiot.

    2. Re:C++ basically has it right by Pseudonym · · Score: 3, Insightful
      The C++ model is basically correct.

      On the contrary, the C++ model is basically correct for some applications.

      A "proper program" is programmed in the appropriate language for the job. Sometimes this is a domain-specific language. Sometimes you need the close-to-the-metal-yet-still-maintainable-for-large-applications qualities that C++ provides. And sometimes you don't.

      Very few people write web applications in C++, and for good reason. Web servers run at the speed of the network card, not the speed of the L1 cache. Pulling out extra cycles is pointless especially if you lose the maintainability that a general purpose language like C++ provides. And yet you wouldn't call many of these "quick scripting hacks".

      --
      sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});
    3. Re:C++ basically has it right by Oscaro · · Score: 2, Insightful

      I don't think garbage collection implies treating the programmer like an idiot. The programmer's attention is a finite resource that is often better spent on something other than memory management.

      I have mixed feelings regarding garbage collection. Sure enough, when people are learning how to write programs, it's far better to use a language without garbage collection, so that one has to really understand what's happening. Also, having to manually keep track of your data can lead to cleaner code (I know one can write clean code in Java, too, but given the "lazy" nature of people (or the strict timeframes) this is really not too common).

      On the other hand, your argument is painfully valid. The programmer's attention (and time) is a finite (and scarce) resource.

  2. Depends by Apreche · · Score: 4, Insightful

    It depends on what you are trying to make, duh.

    If you are trying to make something where performance is important, like a 3d game, then manage memory yourself. If you are making a simple business application where reliability and security are important, use garbage collection. If your program uses lots of RAM and you need every last drop either find an expert at RAM management to get every last bit or use garbage collection if your programmers are not so awesome.

    And so on and so on...

    --
    The GeekNights podcast is going strong. Listen!
    1. Re:Depends by ivan256 · · Score: 2, Insightful

      Actually, it seems to me that if you want reliability, maintainability, and perhaps most important, debugability, you want to manage your memory yourself.

      When debugging a program with a leak (Yes, garbage collected programs have leaks too, they're just nastier, because they don't look like bugs because a reference is persisting somewhere.) if memory is program managed, finding the leak is a deterministic process. You're guaranteed success in a well-defined and finite amount of time (the amount of time it takes to reproduce the leak, plus the amount of time it takes to build the application, plus the amount of time it takes to get a basic understanding of the application structure). When you're debugging a leak in a garbage-collected environment, you're limited to the tools provided, which are pretty much universally not good at finding the worst kinds of leaks, because to the tools your leak looks like perfectly valid utilization. You're really doomed if your leak is in a third-party library. You may never track it down, or even find the culprit. (Any Jakarta users out there?)

      Sure, languages without garbage collection may be more prone to leaks, but I'd rather have more leaks that I can fix, than fewer leaks that I can't...

      Not that garbage collection doesn't have its place. If your application is complex enough though, allocating your own memory is the least of your worries.

      If your program uses lots of RAM and you need every last drop either find an expert at RAM management to get every last bit

      I don't care what kind of programmer you are, you should have a good foundation in certain things. Data structures, obviously. Discrete logic. Algorithmic complexity, including the ability to read and understand Big O notation (especially true of Java programmers)... And a complete knowledge of basic memory management. Even if you're going to use garbage collection. Without these things, you're doomed to write poor programs. If you don't understand how a utility library works, you shouldn't be using it.

    2. Re:Depends by swillden · · Score: 4, Insightful

      It depends on what you are trying to make, duh.

      Agreed.

      If you are trying to make something where performance is important, like a 3d game, then manage memory yourself.

      It's not that simple.

      In most cases, the total run-time cost of garbage collection is lower than that of malloc/free memory management, at the cost of higher on-average memory usage (which can obviously destroy performance if you end up having to swap). On the other hand, application-tuned manual memory management using pooled allocation is generally faster than GC. Whether or not pooled allocation increases memory usage as much or more than GC depends on many things. Another consideration is that although GC often consumes fewer total CPU cycles than malloc/free, non-incremental collectors tend to use those cycles in big batches, which can produce GC 'pauses'. That's bad for some applications. Incremental collectors can minimize this effect, but only with some cost in CPU cycles.
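      To make "pooled allocation" concrete, here is a minimal sketch of a fixed-size block pool. The class name `FixedPool` and its interface are hypothetical; a real pool would also deal with growth, thread safety and validating released pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-size pool: all storage is reserved up front, and
// allocate/release are constant-time operations on a free list.
class FixedPool {
public:
    FixedPool(std::size_t block_size, std::size_t block_count)
        : units_per_block_((block_size + kUnit - 1) / kUnit),
          storage_(units_per_block_ * block_count) {
        assert(block_size > 0 && block_count > 0);
        free_list_.reserve(block_count);
        for (std::size_t i = 0; i < block_count; ++i)
            free_list_.push_back(&storage_[i * units_per_block_]);
    }

    // Blocks point into storage_, so copying the pool would be a bug.
    FixedPool(const FixedPool&) = delete;
    FixedPool& operator=(const FixedPool&) = delete;

    // Returns nullptr when the pool is exhausted.
    void* allocate() {
        if (free_list_.empty()) return nullptr;
        void* block = free_list_.back();
        free_list_.pop_back();
        return block;
    }

    // The caller must pass a block obtained from this pool.
    void release(void* block) { free_list_.push_back(block); }

    std::size_t available() const { return free_list_.size(); }

private:
    // Storage is kept in max_align_t units so every block is suitably aligned.
    static constexpr std::size_t kUnit = sizeof(std::max_align_t);
    std::size_t units_per_block_;
    std::vector<std::max_align_t> storage_;
    std::vector<void*> free_list_;
};

void usage_example() {
    FixedPool pool(24, 2);
    void* first = pool.allocate();
    assert(first != nullptr && pool.available() == 1);
    pool.release(first);
    assert(pool.available() == 2);
}
```

      The trade-off mentioned above is visible here: the pool holds all of its memory for its whole lifetime, whether or not the blocks are in use.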

      Then there's also the whole issue of the effect of different approaches on the multi-tiered memory caching in modern systems.

      In short: yes it depends on what you're trying to make. No, it's not nearly as simple an analysis as you describe.

      Not only that, in practice other constraints usually dictate the choice anyway. Using GC generally means using something like Java, C#, Python, etc. rather than C or C++, which brings in a whole raft of other considerations, many of them more important than the memory management discussion. Platform, target environment and libraries will often dictate language selection, which will dictate much of memory management approach.

      --
      Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
    3. Re:Depends by toddbu · · Score: 2, Insightful
      not sure how you can call the former unpredictable...

      Because in many systems that employ GC, they try to free resources on background threads for performance. The problem is that a resource can be held way beyond what the developer expects, and suddenly they get faults happening in totally unrelated sections of code. I've seen it a million times before, and I personally think that it's one of the biggest weaknesses of the CLR. When a function is done with a resource, clean it up right then and there. At least then you'll have some chance of figuring out what's going wrong.

      Oh, and for all you folks that say that nothing should ever crash in cleanup, think again. Microsoft will tell you that publicly, but there are lots of things that you can do to make an app crash in GC, like manually cleaning up some stuff yourself before exiting the function. The GC routine will get very confused trying to clean up a partially cleaned up object.

      --
      If you don't want crime to pay, let the government run it.
    4. Re:Depends by alienw · · Score: 2, Insightful

      Are you joking? A large fraction of bugs in software are due to mismanaged memory. This is one of the main reasons Java apps have much, much better reliability than C++ ones. Without a garbage collector, many types of (perfectly legitimate) structures become very difficult to use. When you create objects in one module and give them to someone else, you create bugs. Then you have to come up with some kind of reference counting system anyway.

      Yes, garbage collected programs have leaks too, they're just nastier, because they don't look like bugs because a reference is persisting somewhere.

      That's not a leak, it's sloppy programming. Are you saying it is better to leave stray pointers around and potentially crash the program or corrupt data?

    5. Re:Depends by Spy+Hunter · · Score: 3, Insightful
      I fail to see how following a chain of references to a memory hog is harder than finding a memory leak which has nothing pointing to it at all. In a garbage-collected application, with a proper debugger and profiler you should not have any trouble figuring out exactly what's taking up every byte of your memory, and once you've done that you can easily figure out who has the references to it. I recommend you take a look at Microsoft's awesome CLR profiler; I'm sure a similar tool exists for Java but it may not be free.

      It is just as easy to keep references to unneeded objects in C++, so C++ can have the same types of so-called "hard to debug" leaks you blame on garbage collection. But on top of that, if you have a true memory leak, C++ doesn't even tell you what's stored in all that leaked memory. You'll just have to trace back to find the last guy anywhere in the code who threw away a pointer without deleting it, and it could easily be very tough to figure out. And C++ doesn't have a magical solution to leaks in third-party libraries either.

      --
      main(c,r){for(r=32;r;) printf(++c>31?c=!r--,"\n":c<r?" ":~c&r?" `":" #");}
    6. Re:Depends by ivan256 · · Score: 2, Insightful

      so C++ can have the same types of so-called "hard to debug" leaks you blame on garbage collection

      I said they become more difficult to debug because of garbage collection. They're certainly not caused by the garbage collection. They're caused (usually) by poor programming.

      Garbage collection is a tool. It makes your job as a programmer easier, but it does not free you from the need to understand things like scope. Just because you don't have to worry about the mechanics of managing your memory, you still need to understand how it works, or you are going to write crappy code. Code that leaks.

      But on top of that, if you have a true memory leak, C++ doesn't even tell you what's stored in all that leaked memory.

      That's because you don't need the programming language to tell you that. There are plenty of ways you can make it easy to tell what's in your leaked memory. And then when you're done debugging, you can turn them off and reduce your memory footprint.
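      One possible reading of "ways you can make it easy to tell what's in your leaked memory" is a debug-only registry that tags each live block. The sketch below is an assumption about what that could look like; `AllocTracker` and its methods are hypothetical names, and in a release build the bookkeeping would typically be compiled out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <map>
#include <string>

// Hypothetical debug aid: every live block is recorded with a tag,
// so anything never released can be reported by name.
class AllocTracker {
public:
    void* allocate(std::size_t size, const std::string& tag) {
        void* block = std::malloc(size);
        if (block) live_[block] = tag;
        return block;
    }

    void release(void* block) {
        live_.erase(block);
        std::free(block);
    }

    // Number of still-unreleased blocks, grouped by tag.
    std::map<std::string, std::size_t> outstanding() const {
        std::map<std::string, std::size_t> by_tag;
        for (const auto& entry : live_) ++by_tag[entry.second];
        return by_tag;
    }

private:
    std::map<void*, std::string> live_;
};

void usage_example() {
    AllocTracker tracker;
    void* texture = tracker.allocate(16, "texture");
    void* mesh = tracker.allocate(32, "mesh");
    tracker.release(texture);
    // Only the "mesh" block is still outstanding at this point.
    assert(tracker.outstanding().count("mesh") == 1);
    assert(tracker.outstanding().count("texture") == 0);
    tracker.release(mesh);
}
```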

    7. Re:Depends by toddbu · · Score: 2, Insightful
      You seem to be under the impression that all GC languages are made by Microsoft.

      Not at all. I was just citing the CLR as one example since it's fairly widely used. You'd also think with all that we've learned about GC on a background thread that Microsoft would have done something different for their new programming environment, but that wasn't the case.

      I never heard of anyone having a GC-related debugging problem (as in real bug, not performance) for programs written in one of those languages.

      Do those languages perform GC in the background, or do they free resources as they're no longer needed? Do they access external resources such as a database, where holding resources can exhaust the available pool, thereby creating artificially scarce resources (which is a bug in my book)?

      Most languages that do GC on a background thread have trouble under heavy load because they end up spending lots of time trying to figure out whether they should or shouldn't be cleaning up memory. Also, because most GC systems aren't aware of the difference between physical and virtual memory, they can induce high levels of page faulting even though they think that there's plenty of memory available. Any system that performs GC in the background is going to suck in a high performance environment.

      --
      If you don't want crime to pay, let the government run it.
  3. What "performance issues"? by itistoday · · Score: 4, Insightful

    Garbage collection does not equal poor performance. In some instances, it actually speeds things up--when done properly. Take, for example, the D Programming language. It's just as fast as C (faster in some cases) yet it has a garbage collector. The reason is that most programmers tend to not realize that the free() operation actually takes up a decent amount of CPU cycles, and when you're freeing a bunch of little things all over the place, the overhead tends to add up. With a well-designed garbage collector, however, memory is freed all in one big chunk in a single go, and thereby decreasing that overhead. The myth that garbage collection = poor performance is just that, a myth, and most likely started by people who associate Java's performance issues with garbage collection.

    1. Re:What "performance issues"? by be-fan · · Score: 4, Insightful

      In theory C++ custom allocators let the programmer specify the best behavior for any given situation. In practice, very few people use it except for the simple case of pool allocation (which is an optimization you can make in the more sophisticated GC systems). The problem with the C++ mechanism is that it always exposes 100% of the complexity, even in the 99% of the time that you absolutely don't need it.

      --
      A deep unwavering belief is a sure sign you're missing something...
    2. Re:What "performance issues"? by treerex · · Score: 2, Insightful

      The reason is that most programmers tend to not realize that the free() operation actually takes up a decent amount of CPU cycles, and when you're freeing a bunch of little things all over the place, the overhead tends to add up.

      This depends entirely on the underlying memory manager. Using pooled allocation or other "zone-based" allocators can obviate the hit of these frees. As with many things, it's a tradeoff between the time spent putting a block back on its free list (naive implementation) to storing appropriate metadata with each allocated block to "deallocate" it in almost constant time. There is nothing magical about GC here.

      With a well-designed garbage collector, however, memory is freed all in one big chunk in a single go, and thereby decreasing that overhead.

      Sure, memory is freed in one chunk, but you forget the time spent finding unreferenced blocks and copying them (I assume, since you imply blocks are coalesced into one big block).

      The myth that garbage collection = poor performance is just that, a myth, and most likely started by people who associate Java's performance issues with garbage collection.

      Because those of us who have used Java since the mid-90s remember when the first JVM's GC sucked like a giant black hole. I remember at OOPSLA '99 Sun had their GC engineers walking around in garbage man's overalls to show that they were serious about improving GC performance in the language.

      GC performance issues have been around a lot longer than Java, by at least three decades.

    3. Re:What "performance issues"? by Spy+Hunter · · Score: 3, Insightful

      You can't ignore the complexity of manual memory management. You must free all your allocations, and you must police dangling pointers. C++ exposes that complexity all of the time, even though you only need it occasionally, if ever. You can use a smart pointer class, but the more sophisticated of those are simply slow unsafe reference-counting garbage collectors...

      --
      main(c,r){for(r=32;r;) printf(++c>31?c=!r--,"\n":c<r?" ":~c&r?" `":" #");}
  4. Pros and cons by studerby · · Score: 4, Insightful

    As someone who works on long-lived projects with a mid-sized team (a dozen or so developers), I prefer a GC-based language. The biggest pro is the great reduction in memory leaks, closely followed by the productivity increase by not having to think about allocation/deallocation (very much). The biggest con is that far too many "young whippersnappers" seem to think memory allocation/deallocation is therefore "free" in a GC-based language and will take absolutely no care at all about when they allocate (e.g. will allocate a largish object inside a very tight loop instead of allocating it outside and reusing it...). And the 2nd biggest con is that a lot of developers can't believe you can have memory leaks in a GC-based language, won't look for them until you rub their nose in them, and don't really know how to find them when they look.
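    The "allocate inside a tight loop" habit can be sketched as follows. The example is in C++ rather than a GC-based language, and the function names are hypothetical, but the pattern (a fresh buffer per iteration versus one reused buffer) is the same in either setting.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Allocates a fresh buffer on every pass through the loop.
long sum_fresh(std::size_t rounds, std::size_t n) {
    long total = 0;
    for (std::size_t r = 0; r < rounds; ++r) {
        std::vector<int> buffer(n, 1);  // new heap allocation each round
        for (int value : buffer) total += value;
    }
    return total;
}

// Allocates once outside the loop and reuses the storage.
long sum_reused(std::size_t rounds, std::size_t n) {
    long total = 0;
    std::vector<int> buffer;
    buffer.reserve(n);                  // single allocation
    for (std::size_t r = 0; r < rounds; ++r) {
        buffer.assign(n, 1);            // refills within existing capacity
        for (int value : buffer) total += value;
    }
    return total;
}

void usage_example() {
    // Both versions compute the same result; only the allocation count differs.
    assert(sum_fresh(3, 4) == sum_reused(3, 4));
}
```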

    --

    .sig generation error:468(3)

    1. Re:Pros and cons by metamatic · · Score: 4, Insightful
      And the 2nd biggest con is that a lot of developers can't believe you can have memory leaks in a GC-based language, won't look for them until you rub their nose in them, and don't really know how to find them when they look.

      I've always thought that the use of the term "memory leak" to describe resource management problems in Java is a really poor choice, as it's quite a different problem from a memory leak in (say) C.

      Keeping memory allocated and referenced for longer than you need it isn't really a leak, to my mind. It's just bad programming. To me, a memory leak is when you lose the pointers to a piece of allocated memory, so the code is no longer able to deallocate it.

      In other words, your developers might give a better answer if you ask "Are there objects you keep around longer than necessary?", rather than "Are there memory leaks?"

      Or maybe I'm the only one.

      --
      GCHQ Quantum Insert installed. If only our tongues were made of glass, how much more careful we would be when we speak
    2. Re:Pros and cons by Keeper · · Score: 3, Insightful

      Unless the language specifies that you can't have circular references between objects, I would consider that broken: the application can get into a state where orphaned objects are not garbage collected.

      Reference counted garbage collection models are inherently flawed. Leaks are harder to find and easier to provoke. You might as well not have them if you've got to "delete" the references to the other objects.

      Modern garbage collection algorithms do not have this sort of problem.
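      The circular-reference problem with reference counting can be reproduced with `std::shared_ptr`. In this sketch (the `Node` type and function name are hypothetical), two nodes that own each other stay alive after every outside reference is gone, while a `std::weak_ptr` back link lets them die. The cycle is broken by hand at the end only so that the demonstration itself does not leak.

```cpp
#include <cassert>
#include <memory>

struct Node {
    explicit Node(int& live) : live_count(live) { ++live_count; }
    ~Node() { --live_count; }

    std::shared_ptr<Node> strong_peer;  // owning link
    std::weak_ptr<Node> weak_peer;      // non-owning link
    int& live_count;                    // counts nodes currently alive
};

// Builds a two-node cycle, drops all outside references, and returns
// how many nodes are still alive afterwards.
int alive_after_scope(bool weak_back_link) {
    int live = 0;
    std::weak_ptr<Node> observer;
    {
        auto a = std::make_shared<Node>(live);
        auto b = std::make_shared<Node>(live);
        a->strong_peer = b;
        if (weak_back_link) {
            b->weak_peer = a;    // b only observes a
        } else {
            b->strong_peer = a;  // a and b now keep each other alive
        }
        observer = a;
    }
    int result = live;
    // Clean up: break the cycle manually if the nodes survived.
    if (auto survivor = observer.lock()) survivor->strong_peer.reset();
    return result;
}

void usage_example() {
    assert(alive_after_scope(false) == 2);  // cycle: both nodes stranded
    assert(alive_after_scope(true) == 0);   // weak back link: both freed
}
```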

      What bothers me about garbage collection is that it only solves part of the problem: memory is not the only resource your application holds onto, and the kludges you have to make to deal with them in garbage collected languages are just annoying (hey, you don't have to worry about cleaning up after an allocation ... unless your object has a handle in it; have fun memorizing which objects you have to twiddle with to "release" the resource). If memory didn't have a different release pattern than other resources it wouldn't be a big deal, but ...

  5. GC v. Direct allocation by Fubar420 · · Score: 2, Insightful

    As a programmer-turned-sysadmin, I vote both. Which has issues all its own...

    When I programmed professionally, I craved the control of memory management. Objects did _exactly_ what they were _explicitly_ told to do.

    Now I'm a ruby junkie, and love the OO, GC, Etc.

    Still, yes, for performance reasons, there are good reasons to do it yourself.

    For programming reasons, there are reasons to go GC.

    All in all, GC tends to be great. I wouldn't work without it. But there are times I'm mystified as to why an object left scope, got destroyed, etc.

    So I would (as a programmer), like a compromise [and yes, ruby/rails provides this in its own way, but...] All my objects should be GC'd by default. But I want the ability to hook the destructor, and only have it react the way I expect.

    If I want a big block of memory to manage myself, I find an appropriate object for the language (char *, ruby C bindings to a mempool object, unsafe C# (or even safe C#, if you're good), or whatever idiom matches your lang)

    Then again, you don't get pointers in ruby, so I s'pose I'm just whining...

    So from my perspective:
        - Scripters want GC
        - RAM-intensive code needs at least somewhat programmer-managed memory management
        - Embedded devices need hard kernel memory management
        - Short run applications generally want GC
        - Long-running, RAM-intensive, frequently paging, or frequently shifted data processes generally should go with kernel malloc.

    Cheers.

    --
    -- (appended to the end of comments you post, 120 chars)
  6. Re:Situational by Jerf · · Score: 4, Insightful

    If security was in question, I might opt for manual memory management

    Wha? The evidence is against you. It's not the GC'ed languages that have buffer overflows, and that's the number one security flaw at the moment (though #2, "improperly escaped strings resulting in spilling across a boundary", i.e., XSS, SQL injection, etc. is coming up on it fast as more people use GC'ed languages).

    If security is an issue, you want GC and automatic buffer management like Java, Python, Perl, what have you, not manual management and the resulting opportunities for misallocation like in C and C++.
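    As an illustration of what checked buffer access buys, here is a small C++ sketch. It uses `std::vector::at`, which throws `std::out_of_range` instead of writing past the end; the helper name `checked_write` is hypothetical, and this is only an analogy for the bounds checking that Java, Python or Perl apply automatically.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Returns true if the write happened, false if the index was out of bounds.
bool checked_write(std::vector<char>& buffer, std::size_t index, char value) {
    try {
        buffer.at(index) = value;  // at() checks the index before writing
        return true;
    } catch (const std::out_of_range&) {
        return false;              // rejected instead of corrupting memory
    }
}

void usage_example() {
    std::vector<char> buffer(4, '\0');
    assert(checked_write(buffer, 3, 'x'));   // last valid slot
    assert(!checked_write(buffer, 4, 'x'));  // one past the end is refused
}
```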

    (Yeah, yeah, if you program perfect C++ code it's possible to get it right. But I'm not talking theory, I'm talking about what happens in the real world, and in the real world, there seems to be quite a supply of less-than-perfect C/C++ programmers allocating buffers. You have to be on crack to argue otherwise.)

  7. Re:Getting it backwards by Anonymous Coward · · Score: 1, Insightful

    Huh? You can't do RAII in GC languages because you've no control over when the destructor is called. See the Wikipedia article in the original question.

    Guaranteed cleanup isn't sufficient for RAII. It's no good knowing that the mutex you've allocated will be cleaned up at some point in the future.
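    A minimal sketch of the RAII behavior being described, using `std::lock_guard` from the C++ standard library. The globals and function names are hypothetical. The mutex is released at the closing brace of each function, including when an exception propagates out.

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>

std::mutex counter_mutex;  // hypothetical shared state for the example
int counter = 0;

int increment() {
    std::lock_guard<std::mutex> lock(counter_mutex);  // acquired here
    return ++counter;
}   // lock's destructor releases the mutex here

void fail_while_locked() {
    std::lock_guard<std::mutex> lock(counter_mutex);
    throw std::runtime_error("simulated failure");
}   // still released during stack unwinding

void usage_example() {
    int before = increment();
    try {
        fail_while_locked();
    } catch (const std::runtime_error&) {
        // The mutex has already been released by the time we get here.
    }
    // If the mutex were still held, this call could not proceed.
    assert(increment() == before + 1);
}
```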

  8. Re:If C++ Memory Management by plover · · Score: 1, Insightful

    Buffer overruns are not always the fault of poor memory allocation. Yes, they can be caused by allocating too little memory (or writing too much data), but they can also be caused by doing pointer math incorrectly, incorrect static casting, or one of a dozen other reasons. A garbage collector doing its job is not even related to the problem.

    --
    John
  9. RAII is a bad reason for manual memory management by GileadGreene · · Score: 3, Insightful

    All of the reasons given for manual memory management seem to boil down to a desire to have support for the Resource Acquisition Is Initialization (RAII) idiom, which is hard to pull off in GC languages. But, the alternative idiom Resource Acquisition Is Invocation provides the desired capability in GC languages. Same capability, no chance of memory leaks. So tell me again why manual memory management might be a good idea?
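    A rough sketch of the invocation-style idiom, where a helper acquires the resource, invokes the caller's code, and releases the resource before returning. It is written in C++ to stay consistent with the rest of the thread; the `Resource` type, `with_resource` helper and `event_log` are hypothetical, and a GC language would normally use try/finally where this sketch uses catch-and-rethrow.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a file, lock or connection.
struct Resource {
    bool open = false;
};

std::vector<std::string> event_log;  // records the order of events

// The helper owns the whole lifetime: acquire, invoke, release.
template <typename Body>
void with_resource(Resource& resource, Body body) {
    resource.open = true;
    event_log.push_back("acquire");
    try {
        body(resource);
    } catch (...) {
        resource.open = false;       // release on the failure path
        event_log.push_back("release");
        throw;
    }
    resource.open = false;           // release on the normal path
    event_log.push_back("release");
}

void usage_example() {
    Resource resource;
    with_resource(resource, [](Resource& r) {
        assert(r.open);              // only usable inside the callback
        event_log.push_back("use");
    });
    assert(!resource.open);
}
```

    The caller never sees an open resource outside the callback, so release timing no longer depends on when an object is destroyed or collected.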

  10. Re:Cocoa and Objective-C by feijai · · Score: 2, Insightful

    As an old Obj-C coder, let me respectfully disagree with this. Reference counting does not handle cycles: that's a huge flaw. It forces Apple to promote a notion of "ownership" of object graphs which only works on a small scale. It does not work well: it merely reduces what you need to keep track of (but doesn't eliminate it) in return for a considerable amount of manual labor.

  11. Explicit management has its own costs by Pseudonym · · Score: 4, Insightful

    The answer, as always, is "it depends". I'm firmly inside the "right tool for the job" camp.

    Manual memory management is not free. In some circumstances, it can be quite expensive. There is a group of programmers who are best described as "rabidly anti-GC". These people are almost all completely unaware of the costs that manual memory management can impose on your code.

    A multi-threaded program, for example, can allocate memory from any arena, but it MUST return a block to the arena from whence it came, which can cause all sorts of difficult lock contention problems, making free() much more expensive than malloc(). (Ask anyone who has written high-performance memory-intensive multi-threaded programs.)

    In some languages, like C, the situation is even worse. In structure-hungry programs, you can end up structuring your code around data lifetimes, which precludes you from using the most natural, maintainable and efficient algorithms. Garbage collection frees you from this, as the GCC people have discovered.

    I do recommend reading Paul Wilson's excellent survey paper on the topic. It answers a lot of your questions, though it's by no means the final word.

    --
    sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});
  12. Re:Mainly GC but sometimes... by emarkp · · Score: 2, Insightful
    What I don't understand is why they couldn't have written Java (and .NET) so we could have our cake and eat it too.
    They have. Visual Studio 2005 adds syntax to Managed C++ (C++/CLI) to allow you to manage your lifetime and memory separately. Herb Sutter has been talking about this for at least a year IIRC. Dinkumware even made the STL work with it.

    See for instance this article. I'm not currently developing on .Net, but I'm hoping that these extensions can be considered at some time for standard C++. I'm no MS apologist, but it really does seem to be the best of both worlds.

  13. Re:C has problems too by be-fan · · Score: 4, Insightful

    It's frightening the illusions programmers have about manual memory management. They seem to think that malloc() and free() are cheap functions, when in reality they can take hundreds of clock cycles. They think that malloc() is deterministic, when in reality, a badly fragmented freelist can cause most malloc() implementations to traipse through the entire heap, just like a GC.

    The weirdest thing is C++ programmers. They freak out about every single cycle, but modern C++ idioms push the use of smart pointers, which are usually quite slow compared to a good generational GC.

    --
    A deep unwavering belief is a sure sign you're missing something...
  14. Java GC != No leaks by samjam · · Score: 2, Insightful

    Right now someone I know is trying to track down a Java memory leak.

    No doubt some reference is left in a persistent collection of some sort (hash, list, array, etc)

    Just as C/C++ programmers must remember to free when done, so Java programmers must remember to undo such "life maintaining" references when they are done.
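    The kind of leak described here can be sketched in C++ with `std::shared_ptr` standing in for a GC reference. The `Session` and `SessionRegistry` names are hypothetical. The object stays reachable through a long-lived map until someone remembers to remove the entry.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>

struct Session {
    std::string user;
};

// Hypothetical long-lived registry; each entry is a "life maintaining" reference.
class SessionRegistry {
public:
    std::shared_ptr<Session> open(const std::string& user) {
        auto session = std::make_shared<Session>(Session{user});
        sessions_[user] = session;  // the registry now keeps it alive
        return session;
    }

    // Forgetting to call this is the leak.
    void close(const std::string& user) { sessions_.erase(user); }

    std::size_t retained() const { return sessions_.size(); }

private:
    std::map<std::string, std::shared_ptr<Session>> sessions_;
};

void usage_example() {
    SessionRegistry registry;
    std::weak_ptr<Session> watch;
    {
        auto session = registry.open("alice");
        watch = session;
    }                                // caller's reference is gone
    assert(!watch.expired());        // still alive via the registry
    registry.close("alice");
    assert(watch.expired());         // freed once the entry is removed
}
```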

    Sam

    1. Re:Java GC != No leaks by the+eric+conspiracy · · Score: 3, Insightful

      Right now someone I know is trying to track down a Java memory leak.

      Yes, but it is unlikely that somebody you know is trying to track down a Java double free error.

  15. Re:RAII is a bad reason for manual memory manageme by GileadGreene · · Score: 2, Insightful

    As some of the folks at C2 pointed out in the originally linked references, RAIInitialization is a misnomer anyway (as is RAIInvocation - but that was created in response to the other RAII). Most of what we're talking about here is (despite the name) finalization. That's the part that conflicts with GC. No one's saying that we should eliminate initialization or constructors. The point is that using the RAIInit idiom to handle resource finalization is not a good argument for manual memory management, because the RAIInvoc can produce the same finalization effects without producing any conflict with garbage collection.