Pros and Cons of Garbage Collection?
ers asks: "Most new programming languages are using garbage collection, rather than programmer-controlled memory management. The advantages are obvious: programmers no longer have to worry about forgetting to delete allocated memory, leading to far fewer memory leaks. The disadvantages are often glossed over by programming language designers - aside from the performance issues, predictable memory management can be used for controlling access to files and similar resources, creating safer thread locking code and even providing better error messages. Some programming languages, which usually predictable memory management, can also be made to behave like they are garbage collected - for example, Boost provides various C++ smart pointer classes. So, given the choice between garbage collection or manual memory management, which would you choose and why? When using a manual memory management language, when do you consider the performance and syntactic overhead of faked garbage collection to be worthwhile?"
The C++ model is basically correct. It doesn't treat the programmer like an idiot (which admittedly may be a problem if you have idiot programmers), and it gives you the choice of how to handle memory allocation. The lack of reference-counted pointer in the standard library is a bit of a bitch, but the Boost shared pointer templates will likely make it into C++0x, and it's only a hundred lines of code or so to make your own in the mean time.
Of course, the C++ model is not perfect either. Lack of virtual and const constructors can be a nuisance (the workaround being the pimpl idiom and a shared pointer), and not being able to use shared pointers to functions without nasty syntactic hackery occasionally breaks the "stuff pretending to be a pointer" illusion. Still, the power it gives over the Java model is definitely worth the occasional bit of extra effort.
Then again, if you're coding some quick scripting hack rather than a proper program, who cares about memory allocation?
It depends on what you are trying to make, duh.
If you are trying to make something where performance is important, like a 3d game, then manage memory yourself. If you are making a simple business application where reliability and security are important, use garbage collection. If your program uses lots of RAM and you need every last drop either find an expert at RAM management to get every last bit or use garbage collection if your programmers are not so awesome.
And so on and so on...
The GeekNights podcast is going strong. Listen!
In general I prefer having a GC because most of the time I don't want to have to worry about memory management... there's no need. However, sometimes it would really be nice to have more direct control. Not being a VM expert myself, it seems like it should be possible (though I can imagine the types of problems that would arise) to allow specifying that you're assuming manual memory control either over certain objects or while inside a particular context.
"Let your heart soar as high as it will. Refuse to be average." - A. W. Tozer
Garbage collection does not equal poor performance. In some instances, it actually speeds things up--when done properly. Take, for example, the D Programming language. It's just as fast as C (faster in some cases) yet it has a garbage collector. The reason is that most programmers tend to not realize that the free() operation actually takes up a decent amount of CPU cycles, and when you're freeing a bunch of little things all over the place, the overhead tends to add up. With a well-designed garbage collector, however, memory is freed all in one big chunk in a single go, and thereby decreasing that overhead. The myth that garbage collection = poor performance is just that, a myth, and most likely started by people who associate Java's performance issues with garbage collection.
Best. Webhost. Ever. Dreamhost.
As someone who works on long-lived projects with a mid-sized team (a dozen or so developers), I prefer a GC-based language. The biggest pro is the great reduction in memory leaks, closely followed by the productivity increase by not having to think about allocation/deallocation (very much). The biggest con is that far too many "young whippersnappers" seem to think memory allocation/deallocation is therefore "free" in a GC-based language and will take absolutely no care at all about when they allocate (e.g. will allocate a largish object inside a very tight loop instead of allocating it outside and reusing it...). And the 2nd biggest con is that a lot of developers can't believe you can have memory leaks in a GC-based language, won't look for them until you rub their nose in them, and don't really know how to find them when they look.
But then, the question is rather ambiguous. Is the writer asking:
I'll leave further interpretation of the writer's words to other posters, as well as more thorough responses.
I've always had the philosophy, "use what makes the job easiest." Typically, this involves garbage-collection. However, one of the biggest problems I have with garbage collection is that you can't have your cake and eat it too. Meaning, you can get all the memory you want, but you can only access it at a high level (think Objects in Java). In C/C++ however, you can call malloc/new, create a big pool of memory (or just a single object), and then do whatever the heck you want with it. But again, as the subject says, it all depends on which method helps get the job done, and so far neither has been perfect for everything.
As a programmer-come-sysadmin, I vote both. Which has its issues all its own...
When I programmed professionally, I craved the control of memory management. Objects did _exactly_ what was _explicitly_ told to do.
Now I'm a ruby junkie, and love the OO, GC, Etc.
Still, yes, for performance reasons, there are good reasons to do it yourself.
For programming reasons, there are reasons to go GC.
all in all, GC tends to be great. wouldnt work without it. But there are times I'm mystified as to why an object left scope, got destroyed, etc.
So I would (as a programmer), like a compromise [and yes, ruby/rails provides this in its own way, but...] All my objects should be GC'd by default. But I want the ability to hook the destructor, and only have it react the way I expect.
If I want a big block of memory to manage myself, I find an appropriate object for the language (char *, ruby C bindings to a mempool object, unsafe C# (or even safe C#, if you're good), or whatever idiom matches your lang)
Then again, you dont get pointers in ruby, so I s'pose I'm just whining...
So from my perspective:
- Scripters want GC
- RAM Intensive code needs at least somewhat programmer-managed MManagement
- Embedded devices need hard kernel memory management
- Short run applications generally want GC
- Long running, RAM intensive, frequent paging, or frequently shifted data process generally should go with kernel malloc.
Cheers.
-- (appended to the end of comments you post, 120 chars)
It's definitely worth checking out before people go spouting off the traditional rants against garbage collection.
Of course, determining which one is best always depends on your application and your available resources, among other things. There are good arguements for both in various situations. I code C++ for embedded devices for a living, which means that I am working with the new/delete/malloc/free model, but for school projects I really like to work with Java, because it lets me focus entirely on implementing an algorithm without having to spend any time thinking about memory allocation or the underlying hardware.
If security was in question, I might opt for manual memory managment
Wha? The evidence is against you. It's not the GC'ed languages that have buffer overflows, and that's the number one security flaw at the moment (though #2, "improperly escaped strings resulting in spilling across a boundary", i.e., XSS, SQL injection, etc. is coming up on it fast as more people use GC'ed languages).
If security is an issue, you want GC and automatic buffer management like Java, Python, Perl, what have you, not manual management and the resulting opportunities for misallocation like in C and C++.
(Yeah, yeah, if you program perfect C++ code it's possible to get it right. But I'm not talking theory, I'm talking about what happens in the real world, and in the real world, there seems to be quite a supply of less-than-perfect C/C++ programmers allocating buffers. You have to be on crack to argue otherwise.)
C++'s constructor/destructor paradigm with predictable object destruction has the benefit of enabling the RAII (Resource Acquisition Is Initialization) idiom. RAII and exceptions greatly simplify resource management in the presence of error handling. Still, even as someone who knows C++ better than I know any other language, I have to admit that for many applications a garbage collected language puts the least mental burden on programmers and produces the fewest memory errors. The burden of arranging all the extra try/catch blocks in Java (because it lacks RAII) has to be weighed against the burden of investigating and fixing memory management errors in C++, and for people using new/delete, Java wins, IMHO.
C++ programmers should be making very little use of new and delete, though; they should be using smart pointers. I think the article poster misunderstands smart pointers. boost::shared_ptr is a reference counted pointer, but std::auto_ptr and boost::scoped_ptr have nothing to do with garbage collection - they certainly aren't "faked garbage collection" and they certainly aren't unpredictable. They use C++'s object scoping and copying mechanisms to manage memory in a way completely unlike garbage collection. scoped_ptr is the simplest and most predictable memory management tool of all. Taking programmer error into account, it's more predictable than using delete. Even shared_ptr is predictable; when the reference count falls to zero, the object is immediately destroyed, not just marked for destruction.
Sadly, although C++ is a very powerful language and can be used to write code with few errors, the language as used by beginners is as dangerous as C, perhaps even more dangerous. It takes programmers years to become proficient in all the methods and idioms that make C++ a usable language.
(I would love to see a language that allows programmers to choose scoped allocation, smart pointer heap allocation, or garbage-collected heap allocation, and uses types to avoid dangerous combinations such as garbage-collected objects pointing to scoped objects or an object pointing to an object in an unrelated scope. Every object would have two types - the object type (int, file, circle, etc.) and the memory management type (scoped with scope S1, scoped with scope S2, garbage-collected, etc.))
Most new programming languages are using garbage collection
You mean like Lisp and Smalltalk? ;-)
The advantages are obvious: programmers no longer have to worry about forgetting to delete allocated memory, leading to far fewer memory leaks.
In other words: the computer is perfectly capable of figuring out what to do, so let it! This is almost always the best thing.
When using a manual memory management language, when do you consider the performance and syntactic overhead of faked garbage collection to be worthwhile?
I haven't used manual memory management in years, except for a couple old C programs that I still maintain (and those are written to be as easy to "visually inspect" as possible.. allocate, use, de-allocate in three separate lines).
There's rarely good reason to be tinkering around with pointers and other "implementation details". I prefer using very high-level languages (like Haskell for instance) that allow me to express problems and solutions as directly as possible, rather than having to deal with implementation details that only resemble the problem domain from a distance.
I suppose there are times when you need to override the memory management, so there should definitely be "hooks and hints" where appropriate. And even the highest-level language shouldn't free you from the burden of understanding how computers work. However, I personally haven't needed to know about low-level details in *long* time, not since the Pentium era began at least..
Pros and cons of garbage collection?
If you don't CONS, you never need to collect garbage. *rimshot*
More seriously, GC isn't so much about pros and cons, as it is about tradeoffs between the various GC algorithms: time vs. space, low-latency vs. high-throughput, parallelism, etc.
If you're designing a new language, it should include garbage collection, or nobody will use it (i.e., your target audience can already program in C). You may wish to have multiple GC implementations available for different purposes, perhaps to be selected at compile-time.
For a good overview of what's available, see http://www.memorymanagement.org/
My personal favorite is the good old Cheney semi-space collector (and Ephemeral/Generational Garbage Collectors, which are more advanced versions designed to generally have low latency), as it is very straightforward (both to understand and to implement), compacting (it defragments memory, and can perhaps improve cache locality by grouping related objects), and it has high throughput (work is proportional to the amount of live data, not total data).
If memory usage is of more concern than fragmentation and throughput, a mark-sweep collector may be more your style.
There are also "real-time" (and "soft-real-time", i.e. bounded latency [see Henry Baker's Treadmill]) collectors, parallel collectors [including an interesting case for reference counting, usually considered a dog performance-wise, as a viable parallel/remote GC method], "conservative" collectors for C/C++ (see Hans-J Boehm's libgc), collectors for real and hypothetical computers with special hardware and/or OS support for GC features, and some collectors that are just plain weird.
Note also that garbage collection algorithms are considered hard to measure for performance, especially with regard to wall-time latency, so just because a paper(*) claims that a certain GC has certain performance characteristics, be sure to benchmark if it really matters.
(*) Did I mention papers? If you're serious about implementing GC, getting comfortable reading CS research papers is a must. The book "Garbage Collection" is your best friend here, as it provides a very good overview/survey of said papers and algorithms, and it discusses a lot of pros and cons between various algorithms, and useful variants or adaptations that have been applied to previously-published work.
Also check out Henry Baker's papers, because he is a memory management demigod: http://home.pipeline.com/~hbaker1/home.html.
People seem to have confused Memory Access with Memory Allocation. Neither GC nor PC (programmer collected) should allow memory accesses on out-of-scope data. GC just delays when that out-of-scope data gets freed for reallocation. Those unfamiliar with GC and used to the unreallocated = still-accessible-data situation of improper PC coding think that in the GC world unreallocated means still-accessible, which is not necessarily nor usually the case.
Apple / NeXT takes a reference counting approach. It is not automatic, but it works well once you understand the rules.
All of the reasons given for manual memory management seem to boil down to a desire to have support for the Resource Acquisition Is Initialization (RAII) idiom, which is hard to pull off in GC languages. But, the alternative idiom Resource Acquisition Is Invocation provides the desired capability in GC languages. Same capability, no chance of memory leaks. So tell me again why manual memory management might be a good idea?
The answer, as always, is "it depends". I'm firmly inside the "right tool for the job" camp.
Manual memory management is not free. In some circumstances, it can be quite expensive. There is a group of programmers who are best described as "rabidly anti-GC". These people are almost all completely unaware of the costs that manual memory management can impose on your code.
A multi-threaded program, for example, can allocate memory from any arena, but it MUST return a block to the arena from whence it came, which can cause all sorts of difficult lock contention problems, making free() much more expensive than malloc(). (Ask anyone who has written high-performance memory-intensive multi-threaded programs.)
In some languages, like C, the situation is even worse. In structure-hungry programs, you can end up structuring your code around data lifetimes, which precludes you from using the most natural, maintainable and efficient algorithms. Garbage collection frees you from this, as the GCC people have discovered.
I do recommend reading Paul Wilson's excellent survey paper on the topic. It answers a lot of your questions, though it's by no means the final word.
sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});
I agree with a comment posted elsewhere that unchecked memory access and manual memory management, while both unsafe and fertile sources of errors, are different beasts.
Manual memory management is a control issue. Unchecked memory access is a matter of asceticism.
Buffer overruns happen because the devotion to performance and minimalism among a certain crowd is religious. Because of this, the C++ standards guys were terrified of encouraging the use of anything slower and safer than what a C programmer would do. In std::vector and boost::array the default (and readable) access operator [] is unchecked, and checked access is relegated to the ugly at() method. This is a political decision that turns on its head the principle that quiet syntax should be used for routine, safe operations and ugly syntax should be used for rare, dangerous operations.
Using template metaprogramming, it's easy to produce your own array classes with bounds-checking chosen at compile time, but because the survival of C++ depended on the "just as fast as C!" slogan, unchecked access was officially encouraged and has unsurprisingly become the norm among C++ developers.
Rationally, even a developer who expects that bounds checking will always be too expensive in production should insist that bounds checking be easily enabled at compile time. Electric Fence helps with that, but template metaprogramming is a more straightforward solution.
Not using GC requires that you to write code to free those resources repeatedly. That goes against the DRY (Don't Repeat Yourself) principle.
I wonder how many of the people who use the "C++ model" bother to unit test that they have freed all their resources.
C memory management is not completely deterministic either, since a fragmented heap will not always take the same amount of time to allocate in. To make it completely deterministic you would have to pre-allocate objects. But if you're going to do that, you could do it in a GC language and turn the GC off.
Having used languages with and without garbage collection, my view is that
/either/ garbage collection or C-style memory management when I am using it. (And of course, you can use either of those with C++ just fine.) There are all kinds of tools and patterns to help you out, although again it's going to be up to you to use them and make a habit of it. There's the "resource aquisition is initialization", smart pointers (built into the standard library, provided by Boost, and roll-your own), new and delete, malloc() and free(), statically allocated memory, and garbage collectors for you to link in. Take your pick! Use the right one for the job.
g e-collectionr y-leaksc ed
garbage collection is often very nice... but I don't really mind the "lack" of garbage collection in C and especially don't miss it in C++.
My opinion is that it takes some effort on the programmer's part to learn to use C safely. I'm not sure why, but this answer seems to suprise some people. Do they seriously expect that in the real world-- of software or of anything else-- that they should be able to pick up any tool they want and understand how to use it well right away? Appearantly so, but it's pretty silly, isn't
it?
C++ and C are like very sharp carving gouges: It's up to you to learn how to use them properly, to hone them peridically, and to build safe habits. You need to do this in any language that you use, but the sharper the tool the more important it is to bear this principle in mind. A lot of people try to pick up these sharp tools, and then blame the gouge when they've cut themselves. It's no suprise, because they did not take the time to learn to be careful.
In my view your best bet for using a programming language safely is to develop
certain habits and adopt idioms that cause you to simply not write code with some types of errors. In the case of resource-freeing errors, when you open a door, you must remember to close it. It's not complicated, but you have got to turn it into a real habit, something you just do-- and it follows that there will also be some patterns that you should train yourself to just never write. Give it a go (and give it some time).
That will go a long way, but I'm afraid that there's some bad news: even that is not enough. I've been writing software in C for 17 years now, and although without a doubt I write fewer errors than I used to, I still do not produce error-free programs. Eventually, I make plenty of mistakes. Complexity does that.
Some very good news: Memory management in C++ is so nice that very frankly I do not miss
Bjarne Stroustrup explains all of this extremely well. Here's a link or two to his FAQ and what it has to say about C++ and garbage collection:
http://www.research.att.com/~bs/bs_faq.html#garba
http://www.research.att.com/~bs/bs_faq2.html#memo
http://www.research.att.com/~bs/bs_faq.html#advan
Think about it for a while. Remember that memory is only one kind of resource that we need to manage. It's up to you to ferret out and apply the tools available to you.
If you want a garbage collector, why not use one? But it's not the only way to achieve the ultimate goal of producing reliable, efficient software.
>> you can write "destructor's" in a garbage collected language
You mean like Java's Object.finalize()?
The same one that causes significant performance problems fundamental to how GCs work, and is not guaranteed to execute in any specific order, or even at all?
I prefer garbage collection. At most, I take the cans to the edge of the driveway and some guy in a noisy truck with a cool robotic arm just hauls it away. Yeah, there is a landfill somewhere that isn't good for the overall environment but I accept that tradeoff. I also don't throw old car batteries into the trash.
Sure the hell beats me keeping the trash around, remembering where it is, and putting it in my truck and hauling it to the heaping landfill myself. I'm not here to manage trash, I'm here to get something done.
Is this post about programming?
For cases where static analysis can't do this automatically, it isn't that hard to use a design methodology that achieves the same result; it's certainly still much easier than doing manual allocation and deallocation and ensuring that the deallocation is done (or not done) correctly in all cases.
And if you are using a reference-counting GC, or a hybrid GC that includes reference-counting, you don't have to do anything special at all.
The same applies to the claimed mutex and error message disadvantages, since those are just specific uses of RAII.
Right now someone I know is trying to track down a Java memory leak.
No doubt some reference is left in a persistent collection of some sort (hash, list, array, etc)
Just As C/C++ programmers must remember to free when done, so Java programmers must remember do undo such "life maintaining" references when they are done.
Sam
blog.sam.liddicott.com
This is a common claim, but it is an apples to oranges comparison. No one (including the compiler) dynamically allocates objects in C/C++ when they can place them on the stack instead. Garbage collected languages like Java, on the other hand, require practically everything to be managed on the heap.
In addition, an array of objects on the heap requires only a single memory allocation in C or C++, where Java has to allocate and track each separately. As one luminary once said, "C++ is better because there is less garbage to collect."
That might be acceptable, but the worst part is random application pauses of arbitrary duration for garbage collection. Unless that problem can be resolved, garbage collected languages will be always be a poor match for latency sensitive applications, even where the net throughput is otherwise adequate.
A paper I've found interesting is on a GC which communicates with the VM systems to avoid putting too much load on the VM system.
It needs a modification of the VM, but IMHO this is better than having to handtune the memory used by the GC. (Note: I'm not an expert in GC)
http://www.cs.umass.edu/~emery/pubs/04-16.pdf
In lisp that would be using the unwind-protect macro.
Notice that this is totally unrelated with memory management.
So yes, it IS possible to have our cake and eat it too.
We are Turing O-Machines. The Oracle is out there.
You're absolutely correct, however...
You don't usually use unwind-protect directly. You use a macro that uses it underneath. Like the with-open-file macro or the with-database macro of CLSQL.
Like this:
(with-open-file (stream "/some/file/name.txt")
(format t "~a~%" (read-line stream)))
You can never forget to close the file in that code.
Having to use unwind-protect directly is just another case of the "human compiler" pattern at work. It's very unusual to have to use database access or file access or something like that just once in your program, so writing a macro is a good investment.
We are Turing O-Machines. The Oracle is out there.
As some of the folks at C2 pointed out in the originally linked references, RAIInitialization is a misnomer anyway (as is RAIInvocation - but that was created in response to the other RAII). Most of what we're talking about here is (despite the name) finalization. That's the part that conflicts with GC. No one's saying that we should eliminate initialization or constructors. The point is that using the RAIInit idiom to handle resource finalization is not a good argument for manual memory management, because the RAIInvoc can produce the same finalization effects without producing any conflict with garbage collection.