A Glance At Garbage Collection In OO Languages

Posted by timothy on Tuesday April 27, 2004 @02:59PM from the go-on-strike dept.

JigSaw writes "Garbage collection (GC) is a technology that frees programmers from the hassle of explicitly managing memory allocation for every object they create. Traditionally, the benefit of this automation has come at the cost of significant overhead. However, more efficient algorithms and techniques, coupled with the increased computational power of computers, have made the overhead negligible for all but the most extreme situations. Zachary Pinter wrote an excellent article about all this."

12 of 216 comments

An Obvious Fault by vandel405 · 2004-04-27 15:02 · Score: 2, Interesting

An obvious fault with garbage collectors that seems to go without notice, particularly with stop-and-copy collectors, is that whenever they do the full-blown stop and copy, they have to touch all of those memory pages and fault all of your virtual memory back into RAM.
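
To make the paging point concrete, here is a toy stop-and-copy (Cheney-style) collector in C++. It is only a sketch under loud assumptions: the `Obj` layout, the index-based "references", and the `objects_touched` counter are all invented for illustration and do not describe any real VM. What it shows is that a collection reads every reachable object in the old space and writes every survivor into the new space, so all of those pages have to be resident while it runs.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical toy heap: each object holds up to two references,
// stored as indices into the current space (-1 means "no reference").
struct Obj {
    int left = -1;
    int right = -1;
    int forward = -1;  // index in to-space once the object has been copied
    int payload = 0;
};

struct SemiSpaceHeap {
    std::vector<Obj> from_space;      // where allocation happens
    std::size_t objects_touched = 0;  // illustrative counter only

    int alloc(int payload) {
        Obj o;
        o.payload = payload;
        from_space.push_back(o);
        return static_cast<int>(from_space.size()) - 1;
    }

    // Copy one object into to-space unless it was already copied,
    // and return its index in to-space.
    int evacuate(int idx, std::vector<Obj>& to_space) {
        if (idx < 0) return -1;
        Obj& old = from_space[idx];
        if (old.forward < 0) {
            ++objects_touched;
            old.forward = static_cast<int>(to_space.size());
            Obj copy = old;
            copy.forward = -1;
            to_space.push_back(copy);
        }
        return old.forward;
    }

    // Copy the roots, then scan to-space and copy whatever the
    // already-copied objects still reference.
    void collect(std::vector<int>& roots) {
        std::vector<Obj> to_space;
        for (int& r : roots) r = evacuate(r, to_space);
        for (std::size_t scan = 0; scan < to_space.size(); ++scan) {
            int l = evacuate(to_space[scan].left, to_space);
            int r = evacuate(to_space[scan].right, to_space);
            to_space[scan].left = l;
            to_space[scan].right = r;
        }
        from_space.swap(to_space);  // survivors become the new from-space
    }
};
```

Unreachable objects are never visited in this model; the cost comes from reading and rewriting everything that is still live.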
1. Re:An Obvious Fault by 10101001+10101001 · 2004-04-28 00:01 · Score: 2, Interesting
  
  That's not how big O notation works. It's actually defined as:
  
  If an algorithm is O(n log n), then
  
  f(n) <= c*(n log n) + k
  
  bounds the worst-case running time for large n, where c and k are constants and f(n) is the running time of the function being used. n is always the number of elements being sorted (for sort algorithms). So, the issue is what k and c are in each of the various algorithms. It might be that c and k are huge for heap/merge sort, but with quicksort as O(n^2), it'd take either a hugely massive c and/or k or really small n for quicksort to always be faster than heap/merge sort.
  
  Of course, as was pointed out, quicksort is more trivial to write, generally gives n log n performance, and apparently has a lower c value (IIRC, heapsort has a c of 2). So, people use quicksort even though quicksort isn't guaranteed to always have the best time. Heck, people still use bubblesort (which is fine for really small n's as bubblesort's k is really small). Personally, I'd rather sort algorithms (and gc algorithms) be stuffed into system wide libraries and possibly let an outside function choose which to use; gmp which uses several different methods for doing bignum integer math is a good example of just having a library choose the right algorithm for the job; it'd seem smart to have an equivalent sort algorithm which was based on n and either worst or average case as chosen by the programmer.
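
The idea in the paragraph above, a library routine that picks the algorithm from n and from the guarantee the caller asks for, could look roughly like this in C++. This is only a sketch: the names `adaptive_sort`, `Guarantee`, `Strategy` and the threshold of 16 are made up for illustration, insertion sort stands in for bubblesort as the small-n choice, and `std::sort` stands in for the quicksort-style average-case choice.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

enum class Guarantee { AverageCase, WorstCase };
enum class Strategy { Insertion, Heap, Library };

// Hypothetical threshold; a real library would tune it by measurement.
constexpr std::size_t kSmallInput = 16;

// Quadratic, but with very little constant overhead on tiny inputs.
inline void insertion_sort(std::vector<int>& v) {
    for (std::size_t i = 1; i < v.size(); ++i) {
        int key = v[i];
        std::size_t j = i;
        while (j > 0 && v[j - 1] > key) {
            v[j] = v[j - 1];
            --j;
        }
        v[j] = key;
    }
}

// Heapsort: O(n log n) even in the worst case.
inline void heap_sort(std::vector<int>& v) {
    std::make_heap(v.begin(), v.end());
    std::sort_heap(v.begin(), v.end());
}

// Sorts v in place and reports which strategy was chosen.
inline Strategy adaptive_sort(std::vector<int>& v, Guarantee g) {
    if (v.size() <= kSmallInput) {
        insertion_sort(v);
        return Strategy::Insertion;
    }
    if (g == Guarantee::WorstCase) {
        heap_sort(v);
        return Strategy::Heap;
    }
    std::sort(v.begin(), v.end());
    return Strategy::Library;
}
```

The caller states what matters (typical speed or a hard upper bound) and the library does the picking, much as gmp does for bignum multiplication.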
  
  Big O Notation
  
  --
  Eurohacker European paranoia, gun rights, and h
Re:Under the Rug by sfjoe · 2004-04-27 16:33 · Score: 2, Interesting

...languages dependent on garbage collection have always failed to find much deployment in industrial settings.

Huh? The world's busiest e-commerce websites are largely written in Java. Just what is your definition of "industrial settings"? If you mean that Java isn't used much in a foundry, then I guess you're right.

--
It's simple: I demand prosecution for torture.
Re:Sigh. It's not a "feature" of other languages.. by jonadab · 2004-04-27 16:53 · Score: 3, Interesting

> By "correctly," I'm specifically leaving out memory leaks.

What a thing to leave out. Memory leaks are one of the hardest-to-track-down
and most annoying kinds of bugs that we perpetually see in application after
application. Okay, crashes are more annoying and pervasive, sure. And
buffer overruns (which are not a problem in most languages that have GC,
albeit GC is not the reason they're not a problem). But memory leaks are
high on the list.

> And in functional programming, you're creating functions on the fly.

I'm trying to imagine a programming language that doesn't let you create
functions on the fly but is powerful enough for writing real applications.
The only thing I can come up with is that you could write what basically
amounts to an interpreter so that you wouldn't have to write "functions"
in the implementation language but could write them in the interpreted
language instead. But that seems like a really ugly hack, just to avoid
including real memory management in the compiler/interpreter/vm/whatever.

It is possible to get around the need for closures (i.e., anonymous routines
that hold references to otherwise-out-of-scope lexicals), if you have a
sufficiently powerful object system. But again, it seems like a questionable
goal; sometimes closures are really the most convenient way to accomplish
something. (Sometimes they're not, of course... that's why I favour
multiparadigmatic languages.)
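
As a rough illustration of that trade-off, here is the same counter written twice in C++: once as a closure and once as an object that carries the captured state as a member. The names `make_counter` and `Counter` are invented for this sketch. Note the assumption baked in: the lambda captures `count` by value, so C++ copies the state into the closure rather than relying on a collector to keep the original variable alive.

```cpp
#include <functional>

// Closure version: the lambda keeps its own copy of `count`
// after make_counter() has returned.
inline std::function<int()> make_counter(int start) {
    int count = start;
    return [count]() mutable { return count++; };
}

// Object version: the captured state becomes an explicit member.
class Counter {
    int count_;

public:
    explicit Counter(int start) : count_(start) {}
    int operator()() { return count_++; }
};
```

Both behave the same from the caller's side; the closure is shorter, while the class makes the stored state visible.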

> So for all those languages, it's not an "ease of use" thing. It's a
> "there's no way for a programmer to even do it manually at all" thing.
> GC is the only option.

Strictly *theoretically*, the programmer can do all that stuff in any
Turing-complete language; it's possible to do functional programming in
8086 assembly language, for example, if you're willing to go far out of
your way to do it. But in practice, neither assembly language nor C
really makes that easy or practical, no. But then, there are actually
quite a lot of things that those languages don't make easy or practical.

--
Cut that out, or I will ship you to Norilsk in a box.
a way to give the GC a hint? by Doppler00 · 2004-04-27 17:20 · Score: 3, Interesting

It might be useful if some languages had an optional method of hinting that an object should be garbage collected soon. This would help in languages like Java where you get a huge amount of data stored and then all at once the disk thrashes as it GCs everything. For some algorithms, it would be nice to tell Java ahead of time that you're done with the object and you're not going to reference it anymore. The nice thing is, though, it wouldn't be a requirement, so you wouldn't have to worry about deleting an object still in use by mistake. I wonder how efficient this would be.
Feels like by nate+nice · 2004-04-27 18:35 · Score: 2, Interesting

I feel like I just read a small section in the memory management chapter of an operating systems or programming languages textbook. I'm not sure what to discuss here; no new ideas were expressed or presented. Perhaps the author could have postulated new ideas for memory management or suggested how current ideas could be improved. Interesting read if you're a programmer who never really got into the mechanics of a programming language and what certain runtime systems do to make your program work. Then again, I would probably call you a strict-scripter and when scripting you're generally more concerned with expressions rather than mechanics.

Although, the point the author made about CPUs being cheaper and faster, and how this allows the programmer to care less and less about mechanics and use the extra power to make programming a more expressive rather than mechanical practice, is interesting.

Personally, I see no problem with one day having high level application programmers who know nothing of hex, memory management or physical hardware but rather algorithms, computability and productions, etc. Of course, there will always be a place for the "computer programmer", but also a place for the "analytical abstractionist engineer".

--
"If you are a dreamer, a wisher, a liar, A hope-er, a pray-er, a magic bean buyer ..."
Re:Reference Counting... by Circuit+Breaker · 2004-04-27 20:00 · Score: 3, Interesting

Reference counting can interact nicely with multithreading on modern (post-'96) hardware: most modern CPUs have atomic operations such as "compare-and-swap", which can be used to manage refcounts without any form of locking. Yes, it is a little less efficient and a little more intricate, but it's doable. In Windows, for example, the atomic operations are exposed as "InterlockedIncrement()" and "InterlockedDecrement()".
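
A minimal sketch of a lock-free reference count in portable C++, assuming C++11's `std::atomic` as the counterpart of the Windows calls named above. `RefCounted`, `Buffer`, and the `live` counter are hypothetical names used only for this example.

```cpp
#include <atomic>

// Hypothetical intrusive reference count; names are illustrative only.
class RefCounted {
    std::atomic<int> refs_{1};  // the creator holds the first reference

public:
    virtual ~RefCounted() = default;

    void retain() { refs_.fetch_add(1, std::memory_order_relaxed); }

    // Returns true when this call dropped the last reference
    // and therefore deleted the object.
    bool release() {
        if (refs_.fetch_sub(1, std::memory_order_acq_rel) == 1) {
            delete this;
            return true;
        }
        return false;
    }

    int use_count() const { return refs_.load(std::memory_order_relaxed); }
};

struct Buffer : RefCounted {
    static int live;  // counts live Buffer objects, for demonstration only
    Buffer() { ++live; }
    ~Buffer() override { --live; }
};
int Buffer::live = 0;
```

No mutex appears anywhere: the atomic add and subtract are the only synchronization, and whichever thread sees the count go from 1 to 0 is the one that frees the object.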

Also, in many environments you DON'T modify the reference count every time you copy a pointer; there's a concept called "borrowed references" which is used in Python, COM, and many other ref count schemes to avoid some useless refcounts.
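
The "borrowed reference" idea can be shown by analogy with `std::shared_ptr` in C++ (this is not how Python or COM spell it, just the same principle). The two function names below are made up, and `owner` is passed to the second function only so the sketch can observe the count.

```cpp
#include <memory>
#include <string>

// Owning parameter: copying the shared_ptr bumps the count for the call.
inline long count_while_owning(std::shared_ptr<std::string> s) {
    return s.use_count();
}

// Borrowed parameter: the caller keeps the object alive for the duration
// of the call, so a plain reference is enough and the count is untouched.
inline long count_while_borrowing(const std::string& borrowed,
                                  const std::shared_ptr<std::string>& owner) {
    (void)borrowed;  // the callee would normally just read through this
    return owner.use_count();
}
```

The borrowing version is only safe because the caller is guaranteed to outlive the call, which is exactly the contract a borrowed reference relies on.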

Python (pre 2.0) used to do only refcounting, and did it much better than Java (using GC) in all respects except thread friendliness. Modern Python (2.0 and beyond) does both -- but it's extremely rare for the GC to be needed at all.
Re:Under the Rug by DavidTurner · 2004-04-27 20:37 · Score: 2, Interesting

Please explain to me exactly how you can implement a resource management system (or regime as you call it) in C++ for, let's say, managing socket connections, that has no equivalent in Java. You are aware of this method, right?

Okie dokie:

    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <unistd.h>

    class TCPSocket {
        int handle_;
    public:
        TCPSocket() { handle_ = socket(AF_INET, SOCK_STREAM, 0); }
        ~TCPSocket() { close(handle_); }  // runs whenever the object leaves scope
    };

I think you'll find that there's no equivalent to the above in Java.
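
What makes the destructor approach distinctive is that cleanup happens at scope exit no matter how the scope is left. Here is a self-contained sketch of that behaviour; `fake_open`, `fake_close`, `ScopedHandle`, and `use_and_fail` are hypothetical stand-ins for the real socket calls so the example runs anywhere.

```cpp
#include <stdexcept>

// Hypothetical stand-ins for socket()/close() so the sketch runs anywhere.
static int open_handles = 0;
inline int fake_open() { return ++open_handles; }
inline void fake_close(int) { --open_handles; }

class ScopedHandle {
    int handle_;

public:
    ScopedHandle() : handle_(fake_open()) {}
    ~ScopedHandle() { fake_close(handle_); }

    // Copying would make two objects close the same handle, so forbid it.
    ScopedHandle(const ScopedHandle&) = delete;
    ScopedHandle& operator=(const ScopedHandle&) = delete;
};

// Returns how many handles were open inside the scope. The handle is
// released on the way out even though the scope exits via an exception.
inline int use_and_fail() {
    int seen = 0;
    try {
        ScopedHandle h;
        seen = open_handles;
        throw std::runtime_error("simulated failure");
    } catch (const std::runtime_error&) {
        // h has already been destroyed by stack unwinding at this point
    }
    return seen;
}
```

The user of `ScopedHandle` never writes a cleanup call; forgetting one is not something the code can express.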

This may prove to be good reading, if you are interested in learning more.
Re:The GC pitfall by StrawberryFrog · 2004-04-27 21:33 · Score: 2, Interesting

there are a lot of Java coders who don't think at *all* about memory management, because they think it's all handled for them. Mix that in with an over-excitement about OO, and you get some impressively slow and non-scalable code.

While you are entirely right, this is no different from previous generations of programming languages. You always do better if you have a bit of understanding of the wiring behind the board.

I'm sure that there were objections to high-level languages by assembler coders who objected that "there are a lot of C coders who don't think at *all* about the assembler generated" and "there are a lot of C++ coders who don't think at *all* about the pointers behind those object refs".

--
My Karma: ran over your Dogma
StrawberryFrog
Re:Sigh. It's not a "feature" of other languages.. by jonadab · 2004-04-28 05:55 · Score: 2, Interesting

> Any compiled language by definition can't create functions on the fly

This is flat-out false. There are various compiled languages (compiled as
in compiled to native machine code, yes) that not only allow creating functions
on the fly but actively encourage it. Common Lisp is just one example. Yes,
garbage collection gets compiled in. (This is no weirder than compiling a
memory-management library into a C program, and actually being standardized
is an advantage.)

Besides that, the whole compiled-versus-interpreted-languages argument is
getting fairly blurry these days. It's no longer as simple as C and C++ on
the one extreme, which take hours to compile and then run on systems that
don't even have a compiler, and BASIC on the other extreme where you can stop
the program while it's running, change some variables and maybe some lines of
code, and set it running again (possibly at a different line) in-progress
with the state intact. There are all kinds of in-between cases now, Perl
and Java and Python and so on, which technically are both compiled and
interpreted or neither or somewhere in-between. Java runs on a virtual
machine, okay, and Perl6 will, but what do you do with Perl5 and others like
it, which don't really run on a vm per se but have separate compile-time and
run-time phases yet allow more code to be compiled later at run time (through
eval and things like it), ... and then there's JIT compilation... and then
you have compilers that take languages designed to compile to a virtual
machine and instead compile them to native machine code for a specific
platform...

--
Cut that out, or I will ship you to Norilsk in a box.
Re:Under the Rug by Bloater · 2004-04-28 08:21 · Score: 2, Interesting

Building a nifty little lock management facade *does* make the problem go away. That is precisely the point. The user of that interface cannot even *compile* a program that is not safe with respect to it (Not to say that C++ is any good in other respects, of course).
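
A minimal sketch of such a facade in C++, with invented names (`Guarded`, `Access`). The protected value is private, and the only way to reach it is through an accessor object that holds the lock for as long as it exists. As an assumption worth stating: nothing here stops a determined user from stashing a pointer obtained through the accessor, so the guarantee covers ordinary use, not deliberate abuse.

```cpp
#include <mutex>
#include <utility>

// Hypothetical facade: the value can only be reached through an Access
// object, and an Access object can only exist while the lock is held.
template <typename T>
class Guarded {
    std::mutex mutex_;
    T value_;

public:
    explicit Guarded(T initial) : value_(std::move(initial)) {}

    class Access {
        std::unique_lock<std::mutex> lock_;
        T& ref_;
        friend class Guarded<T>;
        // Private: only Guarded can create an Access, and it locks first.
        Access(std::mutex& m, T& v) : lock_(m), ref_(v) {}

    public:
        Access(Access&&) = default;
        T& operator*() { return ref_; }
        T* operator->() { return &ref_; }
    };  // the lock is released when the Access object is destroyed

    Access lock() { return Access(mutex_, value_); }
};
```

Code that tries to touch `value_` directly does not compile, and there is no unlock call to forget.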

The only real ways to do this in Java are twofold:

1) Use synchronized, and only return the accessor object if the appropriate monitor is held. Then the accessor object throws an exception for each method if you use it without holding that monitor. But there is no reasonable way to implement a cache within that accessor object. There may be a way with several waiting threads and using notify in a complex way that will probably deadlock - but that is hardly the goal of using an object oriented design.

The problem is that leaving the synchronized block should cause all cached data in the accessor to be flushed to the *real* object - this is often necessary to guarantee performance when the main memory bus is significantly slower than the on-die cache. Furthermore, using a reference to the accessor outside of the synchronized block in which it was created should not only be prohibited - it should be inhibited.

2) Continuation passing style. But to get that inline, one must define a subclass inline, which means that any access to local variables requires that they be final. That is not always desirable, and frequently undesirable. This *could* be alleviated in Java with some simple changes (such as allowing reference to non-final variables if it can be guaranteed that there will remain no references to the object after return), but it is still quite ugly conceptually.
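
For comparison, the continuation-passing shape from option 2 can be sketched in C++, where a lambda may capture non-final locals by reference. This is not Java and does not settle the Java question; `Monitor` and `with` are hypothetical names chosen for the example.

```cpp
#include <mutex>
#include <utility>

template <typename T>
class Monitor {
    std::mutex mutex_;
    T value_;

public:
    explicit Monitor(T initial) : value_(std::move(initial)) {}

    // The caller passes "the rest of the work" as a callable. The lock is
    // held only for the duration of that call and released afterwards,
    // including when the callable throws.
    template <typename F>
    auto with(F&& body) -> decltype(body(std::declval<T&>())) {
        std::lock_guard<std::mutex> hold(mutex_);
        return body(value_);
    }
};
```

A call such as `m.with([&](int& v) { v += local; local = 0; });` reads and writes the ordinary local `local` from inside the callback, which is the part that needs `final` in the Java version described above.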

I also don't believe that the alternative of folding the cache into the real object lazily when another accessor is created is workable without an excessive level of synchronization, possibly causing immense slowdowns on NUMA architectures.

So while, in C++, the programmer of the interface manages the resources, the interface they produce causes the user of the interface not to have to worry about it. Without that, the user of the interface must be very cautious of their use, and consider far more exceptions. And probably have to take bug reports from end-users that would *not* happen with the C++ method (as long as they don't use "&", see below).

This post is not intended to mark GC as bad, as there *may* be a solution. An object can be explicitly deleted at a given moment by waiting for notification performed in the finalize method - if the runtime guarantees to GC the object early when waiting for that notification. That is, however, difficult to make both safe and strict in Java, since the wait may be a long one, and it can be hard to statically prove that there are no other references, or to inhibit their creation. C++ also cannot utterly inhibit the creation of references to the accessor object that exists beyond its lifetime, but that is a problem with allowing pointers.

An unfortunate choice in C++ is that the register keyword is nothing more than an optimisation hint, while in C it inhibits the address-of ("&") operator.
An unfortunate choice in Java is that references are just C++ pointers with no arithmetic operations - only the "dereference and resolve member" operator "." ("->" from C++) that can throw an exception, "==", and crucially, assignment.
Re:The extreme situations are the only ones ... by Tom7 · 2004-04-28 08:44 · Score: 2, Interesting

Any application that can tolerate garbage collection is trivial. Thanks anyway -- I'll stick to C and assembly.

Wow, with this attitude I can see why you are worried about keeping your job.
Did you know that GCC uses a garbage collector? They found it too difficult to manually manage memory. Is GCC a trivial application?
In my program we write loads of decidedly non-trivial software all of the time that not only tolerates, but benefits greatly from GC.

Garbage collection is not appropriate for every task, but to assert that all tasks worth doing demand C and assembly is ridiculous.