Slashdot Mirror


A Glance At Garbage Collection In OO Languages

JigSaw writes "Garbage collection (GC) is a technology that frees programmers from the hassle of explicitly managing memory allocation for every object they create. Traditionally, the benefit of this automation has come at the cost of significant overhead. However, more efficient algorithms and techniques, coupled with the increased computational power of computers have made the overhead negligible for all but the most extreme situations. Zachary Pinter wrote an excellent article about all this."

17 of 216 comments (clear)

  1. Under the Rug by Markus+Registrada · · Score: 4, Informative
    As with most such presentations, this article sweeps under the rug most of the reasons why languages dependent on garbage collection have always failed to find much deployment in industrial settings.

    A previous poster noted that most GC algorithms are distinctly unfriendly to virtual memory systems. They usually have similar problems with cache locality, which can result in an enormous slowdown, regardless of the time actually spent in the GC itself. A practical problem is that GC regimes are notoriously non-portable, so that each new language implementation needs to have the (increasingly complex) GC re-done again.

    A more fundamental problem is that memory is only one of many resources a typical industrial program must manage. GC takes over memory management, but leaves the other scarce resources -- file descriptors, sockets, mutexes, database connections -- to be managed manually, as in C. (Java has this problem, for instance.) "Finalization" simply cannot provide the necessary guarantees.

    Given a resource management regime that can handle all these other important resources, as is commonly practiced in C++, memory becomes just another resource. Management is encapsulated the same way for all. A language that lacks the tools necessary to implement such a regime needs GC, so the presence of GC may actually (as in the case of Java) indicate a fundamental weakness in the language.

    (Anybody who thinks languages like Haskell or ML are fundamentally more powerful than C++ must be unaware of the Boost Lambda library, and of FC++, a set of header files that implements Haskell language semantics for C++ programs. They get along fine without GC, as well.)

    1. Re:Under the Rug by WayneConrad · · Score: 4, Informative

      GC takes over memory management, but leaves the other scarce resources -- file descriptors, sockets, mutexes, database connections -- to be managed manually, as in C.

      Ruby has an interesting approach using closures to handle manual resource allocation. One calls the function that allocates a resource, passing it a closure. The function allocates the resource, calls the closure, and then deallocates the resource (even if an exception occurs). Here's how you might write to a file the manual way (I apologize for the lousy formatting; I don't know how to trick /. into indenting):

      file = File.new("foo")
      file.puts "My mistress's eyes are nothing like the sun"
      file.close

      That's the usual way, easy to get wrong: What if an exception occurs? What if I forget to call close? Here's the better way, calling File.open and passing it a closure:

      File.open("foo") do |file|
      file.puts "My mistress's eyes are nothing like the sun"
      end

      File.open might use this common idiom:

      def File.open(filename)
      file = File.open(filename)
      begin
      yield(file)
      ensure
      file.close
      end
      end

      The "yield" calls the closure that was passed in, passing it the file object. The "begin...ensure" is like Java's "try...finally" construct, used here to make sure that the file gets closed whether the closure terminates normally or raises an exception.

      This idiom doesn't solve all manual resource allocation/cleanup problems, but it's a pretty way to solve some of them.

      I don't think Ruby invented this idiom, but I don't know where it came from. Perhaps Lisp: Everything seems to have come from Lisp.

    2. Re:Under the Rug by BCoates · · Score: 4, Informative
      That sounds like the way a C++ destructor is used with the "Resource Acquisition is Initialization" model. You'd open a file by creating an object on the stack, and the destructor would close the file-handle once control returns (or the object is deleted, if on the heap)
      // some_file_object is a hypothetical file i/o object with manual open(), close(), write(), etc. functions

      class File : public some_file_object {
      public:
      File( const std::string & fname ) : m_handle( open(fname) ) {}
      ~File() { close(m_handle); }
      private:
      const handle m_handle;
      };
      It's sort of inside-out relative to the ruby version becuase it doesn't use the closure, but the useage is near-identical:
      {
      File file( "foo" );
      file.write( "There are more things in heaven and earth, Horatio, than are dreamt of in your philosophy." );
      } // close happens here, or at the throw/return/break/continue site, if any
      new/delete just being another open/close pair to be avoided or contained away in a small object when practical, so it reduces the benefit gained from GC use.
    3. Re:Under the Rug by DavidTurner · · Score: 3, Informative

      Well put!

      Another important consideration is that where the programmer has the expectation that his garbage will be cleaned up for him, he will tend to assume that all of his resources will be cleaned up. This is clearly not the case. The seminal example for me was the use of database query result sets in C# - if you don't explicitly close them, they tend hang around, and the next time you try to perform a query on the same connection, you'll as likely as not get an exception. Surprise!

      Also, as some other posters have pointed out, not all garbage collection is automatically bad. It works pretty well in Lisp, Scheme, Haskell, and friends. However in the cases of Java and C# it is certainly detrimental, as it disables the only really effective mechanism for managing resources in those languages - destructors.

      Perhaps the thing to do is introduce an analogue for Lisp unwind-protect mechanisms. I suggested this a while back on the Java community forums, in the form of having sentry objects with the lifetime semantics of C++ automatic objects. Someone made the suggestion that the volatile keyword could be twisted to serve this purpose.

    4. Re:Under the Rug by DavidTurner · · Score: 2, Informative

      Java doesn't have destructors. It has finalizers, but they are called at unpredictable and occasionally awkward times (if at all). This presents a whole new challenge in terms of establishing invariants.

      Consider, for example, the case of a mutex. In C++ I can encapsulate the acquisition and release of the mutex in an object (for an example of this, see boost::mutex::scoped_lock at www.boost.org), and know for certain that I will own the mutex object for the duration of the scope of the locking object. This invariant is maintained even in the presence of exceptions, break or continue statements, early returns, or other surprises.

      The net effect is that the code is (a) robust, (b) maintainable, and (c) easy to read.

      Contrast:

      mutex::scoped_lock lock(name_list_update);
      if (!should_insert) break;
      name_list.insert("Bjarne");

      with:

      name_list_update.lock();
      if (!should_insert) {
      name_list_update.unlock();
      break;
      }
      try { name_list.insert("Bjarne"); }
      finally { name_list.unlock(); }
    5. Re:Under the Rug by hding · · Score: 2, Informative

      In Lisp a pretty common way to handle this sort of thing is with a with-some-resource macro (with-open-file is a commonly used built-in). This sort of macro will typically take information for dealing with the resource (e.g. in the file example the pathname of the file, options for opening it, etc.) and the body of code to be executed with the resource; it will then expand to code that acquires the resource and then uses it and releases it (the latter two typically wrapped in unwind-protect to assure the release should there be an exit via a condition or whatnot).

    6. Re:Under the Rug by arkanes · · Score: 2, Informative
      Sun is being misleading. You can do that, but it's a lousy idea. Read up on when finalization occurs (the relevant docs were posted in part elsewhere in this thread). Basically, it's run when the object is GCed, not when it goes out of scope (which makes it useless for RAII), and it's not guaranted to be run at all. Finalization is not the same as destruction, and can't be used for the same purposes. Simple as that.

      I'll try to clear it up a little more with a better example. Say you're locking a file. In C++, you'd create an object to control the lock.

      class locker {
      locker() {lockfile();}
      ~locker() {unlockfile();}
      };
      function WriteFile() {
      locker l;
      ....
      }
      The file is locked when the function is entered, and is unlocked when the function is left, no mater how it's left - early return, exception, anything. There is no way to leave that function scope with the file locked. You can't do anything like that with Java, because it has no automatic objects and non-deterministic finalization.
    7. Re:Under the Rug by Gaijin42 · · Score: 2, Informative

      the using keyword in c# handles exactly this situation.

      This code :

      using(DisposableObj x = new DisposableObj())
      {
      x.DoSomething();
      }

      is equivilent to

      DisposableObj x = new DisposableObj();
      try
      {
      x.DoSomething();
      }
      finally
      {
      x.Dispose();
      }

      So as long as the object's author put any "close/release" code into the dispose method, it will get handled automatically when you are done with it.
      (Even if an exception occurs!)

      Most c# objects that have dispose() also have a close()

      For example, databases. You can close() them and then reopen them multiple times. If you call dispose() and the object is open, it closes for you.

      (So what, I still have to call dispose manually!)

      And thats the purpose of the using(){} syntax. You dont have to remember anymore.

    8. Re:Under the Rug by Pseudonym · · Score: 2, Informative
      In fact, most programs in those languages never need to manage anything but memory. Academia is like that.

      Ah, yes. Any language which isn't procedural is "academia only".

      There are many counter-examples, some of which I can't really talk about. Probably the most famous is Erlang, a functional language used to implement highly scalable, fault-tolerant telephone exchanges. Declarative symbolic languages like O'Caml are also widely used in bioinformatics, but somebody will probably dismiss that as "academia", too. (Admittedly, memory is usually the most critical resource that bioinformatics software has to manage.)

      Lisp's "with-this-and-that" and C#'s "using" construct also just paper over the hole [...]

      Actually, they solve a fairly large proportion of resource management problems. Many resource acquisitions really are scoped. I work writing highly scalable database servers for a living (in C++, incidentally) and I can report that for our domain at least, this is true most of the time.

      "Resource acquisition is initialisation" is an extremely good discipline for object-oriented-esque imperative languages, such as C++. Still, you have to remember that in C++, object lifetimes are still scoped! Even those which are stored on the heap are still, when you get down to it, managed by scoped objects. Those which aren't are, when you get down to it, managed manually. In other words, "resource acquisition is initialisation" doesn't free you from manual resource management, it just pushes it to a higher level. Another way to think about this is that it effectively ties related resources together, letting you manage them as an abstracted bundle.

      Still, most of the time, outside of "academia", the abstractions still leak. In our database server, for example, you still have to be aware of what locks are held when to avoid deadlock, no matter how much you abstract it. So even if you don't have to write code to manage resources manually, you still have to go through most of the thought processes that you would if you did manage them manually.

      --
      sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});
  2. Re:Reference Counting... by jonadab · · Score: 2, Informative

    > Is reference counting really that bad?

    Refcounting can't collect everything if you have any circular references. It's
    what Perl5 has, and we live with it, but Perl6 is getting real garbage
    collection (mark-and-sweep I think, or at any rate something more advanced
    than refcounting).

    --
    Cut that out, or I will ship you to Norilsk in a box.
  3. Re:a way to give the GC a hint? by Nasarius · · Score: 2, Informative
    Simply doing something like:
    foo = null;
    is sufficient. If you really want to, you can call the garbage collector manually:
    System.gc();
    --
    LOAD "SIG",8,1
  4. Re:An Obvious Fault by Anonymous Coward · · Score: 1, Informative

    An obvious fault that seems to go with out notice about garbage collectors, particularly stop-and-copy collectors is that when ever they do the full blow stop and copy, they have to touch all of those memory pages, and fault all of your virtual memory back into ram.

    That's why people use things like generational garbage collectors. Newer objects are generally shorter-lived, so newer objects get collected more often.

    What that means is that generally speaking a generational collecter will usually only be operating in main memory, and it'll only head into virtual memory once in a while. End result? The problem you describe becomes much rarer, and often doesn't occur at all in normal operation.

  5. Reference counting by Antity-H · · Score: 3, Informative

    It was mentionned earlier that reference counting was pretty good, but had a few drawbacks when it came to cycles and multi-threading.

    I took a bit of time to go and read Wikipedia's page

    In the description they give, they mention that reference counting GC can represent managed objects by directed graphs.
    I know there exists algorithm to find cycles in such graphs. So I suppose these could be applied to this problem. Other proposal are to use a tracing GC to detect them. To which it was replied that this would be able to reclaim the memory but not to properly finalize the objects. I don't see why that would be true. I mean, if you have found a member of the cycle to be collected, can't you just finalize that one and let the whole cycle unravel itself ? If there are cycles inside that cycle, just do it again on these etc ...

    As I said, another common objection was the cost of updating the counters in multithreaded environnments. Multiple solutions have been proposed, some more portable than others (using processor/platform specific atomic increments, or deferring the update until it is really necessary and using the standard mutex protection)

    All this said, I try to understand a couple of things.
    -I am no genius, thus these ideas must not be new, what is the problem which can't be solved with these?
    -Reference counting seems to integrate better in the runtime of the program. All the other techniques proposed seem to imply some monolithic operation on the memory summing up all the overheads at on time and doing the cleaning once in a while, with the possibility of becoming a bottleneck in heavily loaded systems. Reference counting OTOH seems to allow the cleanup to continually add a little bit of overhead to the system but nothing which will bring the whole thing to a grinding halt before allowing it to go on. What have I missed?

  6. Relevant references by hding · · Score: 2, Informative

    A couple of relevant references for garbage collection are the following website (which unfortunately hasn't been updated for a while - still, it's useful):

    The Memory Management Reference

    and of course Jones and Lins book, Garbage Collection: Algorithms for Automatic Dynamic Memory Management

  7. Re:a way to give the GC a hint? by tcopeland · · Score: 2, Informative
    > you can call the garbage collector manually

    Unfortunately, this doesn't force it to run. From the JDK 1.4.2 Javadocs for System.gc():
    Calling the gc method suggests that the Java Virtual Machine expend effort toward recycling unused objects in order to make the memory they currently occupy available for quick reuse.
    Note the "suggests". No guarantees...
  8. Java doesn't have *a* garbage collector by blamanj · · Score: 3, Informative

    It has different collectors, which you can select according to the needs of your application. Currently there are two, the default collector (generational) and an incremental collector which is slower but less likely to pause.

    Also, the default collector is a 3-generation one, not 2, at least as of Java 1.4.1. More details here.

  9. *sigh* by Estanislao+Mart�nez · · Score: 2, Informative
    In this OCaml code, the plusx is created "on the fly" and it is different function, depending on the value of x that is read on runtime.

    I'm ready to believe you're simplifying this deliberately to illustrate functional programming techniques, but I think the simplification here is confusing.

    It's important to keep in mind the difference between code routines and closures. The term "function" as is commonly used doesn't respect this difference. C's "functions" are code routines, while ocaml's are closures, i.e. a pair of a routine and an invocation frame.

    What's being "created on the fly" are closures, which are like stack frames (storage for local values of identifiers in an invocation), but which:

    1. are allocated in the heap,
    2. have a pointer to a "parent" frame (the bindings in the enclosing environment), and
    3. have unlimited extent (since the invocation of a closure A might return a closure B whose "parent" is A, requiring that A be kept around indefinitely after the call to it returns).
    I think the point the original poster is making can be expressed in another way, but one that's more revealing of what's at stake: stack allocation is a form of automatic memory management.

    In any modern language, there is some form of automatic storage management behind the scenes for function-local storage. Imagine if in C, you had to manually allocate the stack frame of any function you called, and every function had to deallocate its frame before returning. This would be tedious and repetitive. Automatic management of a stack of limited-extent frames provides the programmer a simple (but restricted) way of doing this.

    It would be possible to have a functional language where the storage for closures was managed by hand. Imagine a language like C except that it allowed you, when you called a function, to specify a heap-allocated binding to be used in the invocation, instead of a stack frame.

    This would be similar to the hypothetical C variant from above, where the programmer was responsible for creating and destroying stack frames. But much harder, since closures have unlimited extent. In the stack allocation case, it's clear when the allocations and deallocations need to happen (before the a function is called, and before one returns). In the manually allocated closure language, the programmer would have to figure out on his own the extent of every closure, and when and where it's safe to free them. This is not simply tedious like in stack allocation, but rather devilishly complex in general.

    So, garbage collection solves it.