A Glance At Garbage Collection In OO Languages
JigSaw writes "Garbage collection (GC) is a technology that frees programmers from the hassle of explicitly managing memory allocation for every object they create. Traditionally, the benefit of this automation has come at the cost of significant overhead. However, more efficient algorithms and techniques, coupled with the increased computational power of computers have made the overhead negligible for all but the most extreme situations. Zachary Pinter wrote an excellent article about all this."
> By "correctly," I'm specifically leaving out memory leaks.
What a thing to leave out. Memory leaks are one of the hardest-to-track-down
and most annoying kinds of bugs that we perpetually see in application after
application. Okay, crashes are more annoying and pervasive, sure. And
buffer overruns (which are not a problem in most languages that have GC,
albeit GC is not the reason they're not a problem). But memory leaks are
high on the list.
> And in functional programming, you're creating functions on the fly.
I'm trying to imagine a programming language that doesn't let you create
functions on the fly but is powerful enough for writing real applications.
The only thing I can come up with is that you could write what basically
amounts to an interpreter so that you wouldn't have to write "functions"
in the implementation language but could write them in the interpreted
language instead. But that seems like a really ugly hack, just to avoid
including real memory management in the compiler/interpreter/vm/whatever.
It is possible to get around the need for closures (i.e., anonymous routines
that hold references to otherwise-out-of-scope lexicals), if you have a
sufficiently powerful object system. But again, it seems like a questionable
goal; sometimes closures are really the most convenient way to accomplish
something. (Sometimes they're not, of course... that's why I favour
multiparadigmatic languages.)
> So for all those languages, it's not an "ease of use" thing. It's a
> "there's no way for a programmer to do even do it manually at all" thing.
> GC is the only option.
Strictly *theoretically*, the programmer can do all that stuff in any
Turing-complete language; it's possible to do functional programming in
8086 assembly language, for example, if you're willing to go far out of
your way to do it. But in practice, neither assembly language nor C
really makes that easy or practical, no. But then, there are actually
quite a lot of things that those languages don't make easy or practical.
Cut that out, or I will ship you to Norilsk in a box.
It might be useful if some languages had an optional method of hinting that an object should be garbage collected soon. This would help in languages like Java where you get a huge amount of data stored and then all at once the disk thrashes as it GC everything. For some algorithms, it would be nice to tell Java ahead of time that you're done with the object and you're not going to reference it anymore. The nice thing is though, it wouldn't be a requirement, so you wouldn't have to worry about deleting an object still in use by mistake. I wonder how efficient this would be.
Reference counting can interact nicely with multithreading on modern (post `96) hardware - most modern CPUs have this nice "compare-and-swap" atomic operation, which can be used to manage refcounts without any form of locking. Yes, it is a little less efficient and a little more intricate, but it's doable; In Windows, for example, it's called "InterlockedIncrement()" and "InterlockedDecrement()".
Also, in many environments you DON'T modify the reference count every time you copy a pointer; there's a concept called "borrowed references" which is used in Python, COM, and many other ref count schemes to avoid some useless refcounts.
Python (pre 2.0) used to do only refcount, and did it much better than Java (using GC) in all respects except thread friendliness. Modern python (2.0 and beyond) does both -- but it's extremely rare for the gc to be needed at all.