A Glance At Garbage Collection In OO Languages
JigSaw writes "Garbage collection (GC) is a technology that frees programmers from the hassle of explicitly managing memory allocation for every object they create. Traditionally, the benefit of this automation has come at the cost of significant overhead. However, more efficient algorithms and techniques, coupled with the increased computational power of computers have made the overhead negligible for all but the most extreme situations. Zachary Pinter wrote an excellent article about all this."
A previous poster noted that most GC algorithms are distinctly unfriendly to virtual memory systems. They usually have similar problems with cache locality, which can result in an enormous slowdown, regardless of the time actually spent in the GC itself. A practical problem is that GC regimes are notoriously non-portable, so that each new language implementation needs to have the (increasingly complex) GC re-done again.
A more fundamental problem is that memory is only one of many resources a typical industrial program must manage. GC takes over memory management, but leaves the other scarce resources -- file descriptors, sockets, mutexes, database connections -- to be managed manually, as in C. (Java has this problem, for instance.) "Finalization" simply cannot provide the necessary guarantees.
Given a resource management regime that can handle all these other important resources, as is commonly practiced in C++, memory becomes just another resource. Management is encapsulated the same way for all. A language that lacks the tools necessary to implement such a regime needs GC, so the presence of GC may actually (as in the case of Java) indicate a fundamental weakness in the language.
(Anybody who thinks languages like Haskell or ML are fundamentally more powerful than C++ must be unaware of the Boost Lambda library, and of FC++, a set of header files that implements Haskell language semantics for C++ programs. They get along fine without GC, as well.)
> Is reference counting really that bad?
Refcounting can't collect everything if you have any circular references. It's
what Perl5 has, and we live with it, but Perl6 is getting real garbage
collection (mark-and-sweep I think, or at any rate something more advanced
than refcounting).
Cut that out, or I will ship you to Norilsk in a box.
is sufficient. If you really want to, you can call the garbage collector manually:
LOAD "SIG",8,1
An obvious fault that seems to go with out notice about garbage collectors, particularly stop-and-copy collectors is that when ever they do the full blow stop and copy, they have to touch all of those memory pages, and fault all of your virtual memory back into ram.
That's why people use things like generational garbage collectors. Newer objects are generally shorter-lived, so newer objects get collected more often.
What that means is that generally speaking a generational collecter will usually only be operating in main memory, and it'll only head into virtual memory once in a while. End result? The problem you describe becomes much rarer, and often doesn't occur at all in normal operation.
It was mentionned earlier that reference counting was pretty good, but had a few drawbacks when it came to cycles and multi-threading.
...
I took a bit of time to go and read Wikipedia's page
In the description they give, they mention that reference counting GC can represent managed objects by directed graphs.
I know there exists algorithm to find cycles in such graphs. So I suppose these could be applied to this problem. Other proposal are to use a tracing GC to detect them. To which it was replied that this would be able to reclaim the memory but not to properly finalize the objects. I don't see why that would be true. I mean, if you have found a member of the cycle to be collected, can't you just finalize that one and let the whole cycle unravel itself ? If there are cycles inside that cycle, just do it again on these etc
As I said, another common objection was the cost of updating the counters in multithreaded environnments. Multiple solutions have been proposed, some more portable than others (using processor/platform specific atomic increments, or deferring the update until it is really necessary and using the standard mutex protection)
All this said, I try to understand a couple of things.
-I am no genius, thus these ideas must not be new, what is the problem which can't be solved with these?
-Reference counting seems to integrate better in the runtime of the program. All the other techniques proposed seem to imply some monolithic operation on the memory summing up all the overheads at on time and doing the cleaning once in a while, with the possibility of becoming a bottleneck in heavily loaded systems. Reference counting OTOH seems to allow the cleanup to continually add a little bit of overhead to the system but nothing which will bring the whole thing to a grinding halt before allowing it to go on. What have I missed?
A couple of relevant references for garbage collection are the following website (which unfortunately hasn't been updated for a while - still, it's useful):
The Memory Management Reference
and of course Jones and Lins book, Garbage Collection: Algorithms for Automatic Dynamic Memory Management
Unfortunately, this doesn't force it to run. From the JDK 1.4.2 Javadocs for System.gc():
Note the "suggests". No guarantees...
The Army reading list
It has different collectors, which you can select according to the needs of your application. Currently there are two, the default collector (generational) and an incremental collector which is slower but less likely to pause.
Also, the default collector is a 3-generation one, not 2, at least as of Java 1.4.1. More details here.
I'm ready to believe you're simplifying this deliberately to illustrate functional programming techniques, but I think the simplification here is confusing.
It's important to keep in mind the difference between code routines and closures. The term "function" as is commonly used doesn't respect this difference. C's "functions" are code routines, while ocaml's are closures, i.e. a pair of a routine and an invocation frame.
What's being "created on the fly" are closures, which are like stack frames (storage for local values of identifiers in an invocation), but which:
- are allocated in the heap,
- have a pointer to a "parent" frame (the bindings in the enclosing environment), and
- have unlimited extent (since the invocation of a closure A might return a closure B whose "parent" is A, requiring that A be kept around indefinitely after the call to it returns).
I think the point the original poster is making can be expressed in another way, but one that's more revealing of what's at stake: stack allocation is a form of automatic memory management.In any modern language, there is some form of automatic storage management behind the scenes for function-local storage. Imagine if in C, you had to manually allocate the stack frame of any function you called, and every function had to deallocate its frame before returning. This would be tedious and repetitive. Automatic management of a stack of limited-extent frames provides the programmer a simple (but restricted) way of doing this.
It would be possible to have a functional language where the storage for closures was managed by hand. Imagine a language like C except that it allowed you, when you called a function, to specify a heap-allocated binding to be used in the invocation, instead of a stack frame.
This would be similar to the hypothetical C variant from above, where the programmer was responsible for creating and destroying stack frames. But much harder, since closures have unlimited extent. In the stack allocation case, it's clear when the allocations and deallocations need to happen (before the a function is called, and before one returns). In the manually allocated closure language, the programmer would have to figure out on his own the extent of every closure, and when and where it's safe to free them. This is not simply tedious like in stack allocation, but rather devilishly complex in general.
So, garbage collection solves it.
Are you adequate?