Slashdot Mirror


Review:Garbage Collection

A.M. Kuchling is back again, this time with a review of Richard Jones and Rafael Lins' oddly named Garbage Collection. This book is not actually about Waste Management Inc, but is "A highly detailed survey of garbage collection algorithms, and the research literature surrounding them. Interesting, if at times very dry.". Click below for more information. Garbage Collection author Richard Jones and Rafael Lins pages publisher Wiley & Sons rating 6 reviewer A.M. Kuchling ISBN summary A highly detailed survey of garbage collection algorithms, and the research literature surrounding them. Interesting, if at times very dry.

The first time you encounter garbage collection (GC, for short) is often by using Lisp or Scheme in an introductory computer science course, where it can seem an almost magical technology. How does the interpreter determine that a variable has become garbage, and can safely be deallocated? This book explains how GC works, in great detail, and interest people implementing garbage collectors or who've wondered how it's done. There are 3 approaches to GC, all of which are covered in this book:

  1. Reference counting:
    Every object keeps a count of the references pointing to it. When new references are created, the counter must be increased; when references are destroyed, the counter is decremented, and, if the count is zero, the object has become garbage and can be reclaimed. Reference counting unfortunately exacts the most overhead from constantly incrementing and decrementing the counters, and in its simplest form it can't collect cyclic data structures, because their count will never reach zero.
  2. Mark-sweep:
    Traverse the graph of object references, starting from the roots of the computation; roots might be the program's global variables, the contents of the C stack, static class variables, or a main namespace of some sort. Each object that's traversed is marked as being reachable; when the traversal is complete, any unmarked objects are unreachable because they can't be reached by the program any longer, and can be reclaimed as garbage.
  3. Copying:
    Divide memory into two halves, called "Fromspace" and "Tospace". Memory for new objects is always allocated out of Fromspace. When Fromspace fills up, a traversal similar to the one used by mark-sweep is done, but instead of just marking objects as reachable, they're copied into Tospace. Once the traversal is done, all the live objects are now in Tospace, and Fromspace no longer contains any live data; only garbage objects can be left in Fromspace. Tospace is now declared the new Fromspace, and new objects are allocated until it fills up; the algorithm is then repeated.

This book covers the above 3 algorithms for GC in great detail; there's a chapter for each algorithm, examining its limitations, common implementation techniques, and historical notes on the its development. Later chapters cover more advanced collectors that extend the basic algorithms in various ways. Generational collectors are based on the observation that many objects are temporary and have short lifetimes, while some few objects may last for the entire lifetime of the program. A generational garbage collector concentrates on young objects, because they will occupy the most recyclable space, and spends less time examining old objects that probably won't be reclaimed. Incremental and concurrent collectors don't bring the whole program to a halt while they scavenge through memory doing a collection, making them more suited for interactive or even real-time applications.

Chapter 9 is an interesting examination of the implementation of GC in an unfriendly environment, namely C programs. The Boehm-Demer-Weiser collector implements a mark-sweep garbage collector that can replace C's malloc() function, scanning stack frames and registers for potential pointers, and managing to do it fairly efficiently. One study found the Boehm-Demer-Weiser collector added 20% more execution time overhead than malloc() did -- that is, the time spent in memory allocation was 20% longer. (That doesn't mean it made programs 20% slower, because most programs spend more time computing results than they spend allocating memory). A garbage collector for C is a tricky job, complicated by the lack of type information, data values that may mimic pointers, and compiler optimizations that may render objects invisible to the collector. Barlett's Mostly Copying Garbage Collector, also described in this chapter, is more liberal and requires some assistance from the C programmer in exchange for greater accuracy.

The remaining 3 chapters consider suggestions for adding garbage collection to C++, the interaction of GC with cache memory, and distributed garbage collection. The topics of these chapters are still subjects of active research, so the final chapters are lighter on implementation details and heavy on references to research papers, making them a lot less interesting.

The authors are computer scientists specializing in GC (Richard Jones maintains a garbage collection page), and their coverage is quite complete, which at times makes for exceedingly dry reading as the authors enumerate what seems like every minor implementation variation and every relevant paper. On the other hand, if you're actually faced with implementing a garbage collector, the pointers into the research literature will be very useful. A casual reader (like me) can simply skim, or even skip, pages until the next interesting section arrives.

Excerpt

However, the matter is more subtle than this. The collectors described in this chapter are very simple and inefficient. The behaviour of these collectors cannot be automatically ascribed to their more sophisticated variants. The cost of copying an object is likely to be more expensive than simply testing and setting a mark-bit, particularly if the object is large. Although mark-sweep must sweep the entire heap, in practice its real cost is dominated by the mark phase. Linearly scanning the heap will generally be less expensive than tracing data structures even using the simple technique shown above. ... Furthermore, more sophisticated methods can substantially reduce the cost of the sweep. Although copying collectors have predominated in the past, recent studies suggest that the choice between mark-sweep and copying garbage collection may well depend as much on the behaviour of the client program as on the inherent properties of the garbage collection algorithm.
-- From section 2.4 of the book.

To purchase this book, head over to Fatbrain.

Table of Contents
  1. Introduction
  2. The Classical Algorithms
  3. Reference Counting
  4. Mark-Sweep Garbage Collection
  5. Mark-Compact Garbage Collection
  6. Copying Garbage Collection
  7. Generational Garbage Collection
  8. Incremental and Concurrent Garbage Collection
  9. Garbage Collection for C
  10. Garbage Collection for C++
  11. Cache-Conscious Garbage Collection
  12. Distributed Garbage Collection

105 comments

  1. Linux uses GC by Anonymous Coward · · Score: 0

    Did you know the Linux uses GC also ?
    look for garbage.c the function unix_gc is
    a simple mark and sweep GC.

    Since this seems to be the only book about GC
    it's probably the best ;-)

  2. Grrr... by Anonymous Coward · · Score: 0

    Of course, one could argue that good programmers don't spend their time creating needless oportunities for bugs to bite them when the problem has been solved.

    The argument seems to be that good programmers write the smallest tightest code, no matter how long it takes them to get it right. I would suggest that perhaps a good programmer chooses his/her tools so as to achieve the best programming value.

    For many applications, coding time and maintainability is more important than running speed. There are a few problem domains where this is not the case, but they are few and far between.

    There's a reason why I don't code in assembly language, and it also applies to why I'd rather not deal with memory management issues.

    Johan,

    who'd login if someone hadn't taken his userid already :-(

  3. Garbage collection leads to better programs. by Anonymous Coward · · Score: 0

    Which is better depends on what you're doing. If you have a memory leak in something it might still work, it'll just use too much memory. If you get a segfault, it won't work at all. Probably.

  4. Grrr... by Anonymous Coward · · Score: 0
    I have NEVER seen a place where, if you were careful (as you should be) in designing your code, you couldn't have nice matching malloc()/free() pairs.

    yes, and I have NEVER seen a program that couldn't be written in assembly.

    The point it that for large project it has been found that about 40% of the time was spent managing memory (see report on the Cedar project). So if you're willing to spent 40% more time programming/debugging just to gain 20% speed just go ahead.

    Anyway in OO code it is VERY difficult to determine lifetime of objects. An example: assume you have a list of object in C++, say vector. And now you pass around a vector* coolObjectList, in many objects, some of them copy some CoolObject* from the list. Now, how do you now when to free the CoolObject* ? You cannot unless:

    • You're implementing manually reference counting (Flame: talk about being stupid).
    • You're copying all the object: this will fail when the object are big, or more importantly, when you want the objects to be actually shared (one modification on the object should be reflected on the copies).

    My guess is that you never have actually written Real Programs(tm).

  5. C++ with Boehm-Demer-Weiser by Anonymous Coward · · Score: 0

    You're exactly right about keeping track of the lifetime of
    objects. I've been using C++ for years, and that's been a
    constant headache. So when I started my current project, a
    pair of compilers sharing a common backend, I decided
    to try the Boehm-Demer-Weizer GC library.

    The GC library may add 20% overhead as the book claims,
    but it's amazing how much simpler one's code can be when
    one doesn't need to worry about when to free memory.
    All that messy reference counting and redundant copying
    goes away! My programs seem to run much faster than
    the previous ones I did with traditional memory allocation.

    The only tricky thing is to make sure that all classes are
    derived from the library's "gc" class. If you forget one,
    the results can be really frustrating, as anything only pointed to
    by instances of that class are considered fair game for
    deletion by the GC system.

    Also, while the BDW system is fine for a compiler, whose
    execution has a limited lifetime, it might not be appropriate
    for a more persistent piece of software, like a word
    processor. This is because BDW is "conservative" in
    its recognition of what memory is being used, so it may not
    delete everything that it should.

  6. GC isn't "oddly named" by Anonymous Coward · · Score: 0

    GC is one method of memory management.

    It's not the only method.

  7. Generational collectors best. by Anonymous Coward · · Score: 0

    Generational scavenging and generational ephemeral collectors are the best. Collectors for Perl, Java and company are all immature and suck. If you want to see a real garbage collector, look at a good Common Lisp implementation like the public domain CMU Common Lisp or especially the commercial Allegro Common Lisp. Lisp has been in the GC business for 40 years. Hell, it's been in the language business for 40 years to so it pretty damn advanced.

    If you're sick of boring GUIs that are all the same and what a truly intelligent UI, you have no alternative.

  8. Garbage collection leads to better programs. by Anonymous Coward · · Score: 0

    and perhaps a feature that really should have made it into C++

    that would violate C++'s "no overhead" rule - you only pay for what you buy. If you don't want GC, you don't pay for its performance hit.

    That is why C++ makes things like virtual functions explicit where in java they are part of the whole ball of wax.

  9. GC can be fast. by Anonymous Coward · · Score: 0

    Most modern (= not in Java yet) gc algorithms use combination of generational and copying gc. They can be very fast. (Look at good Smalltalk or Common Lisp implementation)

    If application is run months in a row, gc may be faster than normal allocators used in gcc & co. Somehow malloc/free allocators seem to converge towards bad allocation patters (bad locality, lots pagefaults) in the long run.

    Copying gc is said to increace locality because it 'packs' memory when copying.

    If gc comes with good API. You can stop gc when doing time critical things. When using generational gc you can usually tell (locally) how big generations you want. That will give speed equal to malloc/free without memory leaks.

    1. Re: GC can be fast. by Anonymous Coward · · Score: 0
      Most modern (= not in Java yet) gc algorithms use combination of generational and copying gc. They can be very fast. (Look at good Smalltalk or Common Lisp implementation)

      Not everything is perfect though: the main problem with GC is currently multi-threaded (or shared memory) incremental garbage collection: there isn't a satisfactory GC because read and write barriers are expensive (and VM page granularity is around 4Kbytes-8KBytes, a bit to big to be useful).

  10. But it needs a new euphamism by Anonymous Coward · · Score: 0

    Just for correctness' sake... Microsoft wasn't the first to name directories "folders." Apple did it before MS, for one. I wouldn't be surprised if Apple borrowed that from someone else, also.

  11. GC is sometimes more efficient than free() by Anonymous Coward · · Score: 0

    I have seen a few cases of C code where garbage collection was more efficient than free().

    (1) Sometimes the fastest way to release parts of a complex data structure, to reuse the memory, is to do mark-scan or similar things (such as pointer reversal), and then to free contiguous clusters of objects at a time. This reduces the number of free() calls. See also (4) below [hidden overheads of free()].

    (2) Deciding when to call free() for nodes in some of the more complex data structures, usually involving graphs, can be expensive. In fact it can be equivalent to garbage collecting. Sometimes it is simpler to use a general purpose mechanism.

    (3) malloc()/free() of many varying size objects, for example for strings, causes fragmentation. Much of your memory is wasted in tiny free areas.

    A copying garbage collector (as opposed to the mark-scan kind) removes this sort of fragmentation, but with an occasional copying cost.

    (4) free() sometimes has non-obvious hidden penalties:

    For example, an implementation that maintains a free-list will appear to have a very fast free() operation.

    However if you're freeing in the order that your data structure releases objects, that can leave a very poorly structured free list, resulting in poor memory cache behaviour for subsequent allocations. Caches generally prefer contiguous access sequences, so contiguous allocation is preferred.

    Some algorithms have a tendency to pull fragments off the free list and put it back in FIFO order. When repeated and/or interleaved this leads to a complete mixing up of the free list order.

    On the other hand, if you attempt to maintain an ordered free list, that ventures into the realm of O(log n) complexity, depending on just how ordered you want it to be. The gain may be worth it due to memory cache effects though.

    If you leave the equivalent work to later malloc() calls, well, the overhead is still there.

    On the other hand, using a mark-scan garbage collector regenerates nice contiguous free lists. The overhead of garbage collecting may or may not be offset by faster access to subsequently allocated memory.

    I have even been known to implement some form of GC inside a malloc()/free()-style of allocator, specialised for specific object types, to improve memory access patterns.

    .............

    For reference (no endorsement is to be implied):

    Emacs uses a copying collector for strings (to avoid fragmentation I presume), a mark-scan collector using free lists in blocks for Lisp objects (which don't vary in size), and malloc()/free() for Lisp vectors (which come in various sizes to large for efficient group-in-block allocation). Editing buffers use a special "relocating allocator" which is similar to a copying garbage collector, but cooperates with malloc()/free()/brk() because the overheads of not doing so with large editing buffers are significant.

    Enjoy,
    -- Jamie Lokier

  12. Garbage collection leads to better programs. by Anonymous Coward · · Score: 0

    Undetected memory corruption is the worst
    possible outcome. It can silently produce
    wrong answers that look plausible.


  13. Here's 1 Example from the literature by Anonymous Coward · · Score: 0

    Look at the first complete published algorithm for calculating three dimensional convex hulls (Algorithms in Computational Geometry, Edelsbrunner, 1987). It works by making small, disjoint convex hulls, then repeatedly forms larger from smaller by "wrapping a plane" around adjacent pairs of small convex hulls. The lines and points of the small hulls that are in the interior of the volume enclosed by the original small hulls and the wrapping plane are not part of the convex hull and their space is available for reclaiming. Edelsbrunner's algorithm explains precisely how to do that algorithmically, with an extremely detailed description of "red lines" and "blue lines" and how to traverse them and reclaim (free) them. When I implemented the algorithm in Eiffel in 1990, I simplified the algorithm tremendously by ignoring all that and letting the garbage collecter clean up those edges and nodes. Note that reference counting would be completely inadequate for this task. The garbage includes complete cycles.

    This anonymous coward is Tom Morrisette.

  14. Grrr... by Anonymous Coward · · Score: 0

    I guess the point is, that while GC can be a time-saver (in having to not re-design sections of badly written code), it is also a crutch bad programmers use far too often to cover their own lazyness

    And a car is a crutch I use every day to get to work faster than walking.

    Just because a piece of technology makes a job easier, does not mean we should revert to "the good old days" before that technology -- because while old, those days were rarely good.

    Similar to how people first programmed in ASM, then C, and now by dragging forms and buttons around the screen, garbage collection makes a programmer more productive.

    I do not believe that being able to finish a job and quickly move on to the next one equates with "laziness" -- I think programmers who let the tools do more of the work are actually the ones I'd rather hire.

    Cheers,
    KenB

  15. Gwydion Dylan (Hans Boehm garbage collector) by Anonymous Coward · · Score: 0

    Anyone interested in advanced programming languages in invited to join our
    volunteer effort: Gwydion Dylan, an open-source implementation of the Dylan
    programming for unix-compatible systems.

    Gwydion Dylan's chief componenent is a highly optimizing Dylan-to-C compiler. The
    Hans Boehm conservative garbage collector for C is utilized for memory
    management.

    http://gwydiondylan.org

    (Harlequin Dylan, an commercial implementation of Dylan for Windows uses more
    advanced GC techniques).

    Dylan is an dynamic, object oriented language, which can be efficiently compiled.
    Dylan is much more abstract than a language such as Java, but can compete
    speed-wise with more static languages. The Dylan language features a wide-range
    of productivity-enhancing language features, such as hygenic macros, keyword
    arguments, multi-methods, advanced exception handling capabilities (and a lot of
    other stuff which slips my mind at the moment.)

  16. But it needs a new euphamism by Anonymous Coward · · Score: 0


    I vote for data ethanaisa

  17. Garbage collection leads to better programs. by Anonymous Coward · · Score: 0

    Garbage collection is a tremendous aid to writing good code.

    Actually, last I checked, Perl's memory management was jut simple refcounting, not even smart enough to avoid leaking circular garbage.

    Yes, current implementation of Perl and Python have the same problem. But naively implemented GC should not be used as an argument against GC.

    The open source Gwydion Dylan does a little better. It uses the the Hans Boehm collector.

    A well-implemented GC outperforms manual deallocation. (Programs utilizing GC will have more memory requirements, but memory is cheap these days).

  18. Reference Counting vs. Mark & Sweep by Anonymous Coward · · Score: 0


    Try to run mod_perl, where the interpreter is persistent, and issues with
    reference counting become immediately apparent.


    Yep, that's trying to fit a square peg in a round hole. IHMO, people would be better off with Dylan (which has decent GC) for long-lived programs.

    Java has some interesting arguments about their GC implementation, along with the
    little known fact that up to 40% of CPU time can be used just in GC: they do it
    right but it's not cheap...


    I've never heard that 40% figure. Java GC implementations have historically not been state-of-the-art, though. The fact is that a good GC is faster than manual deallocation.

  19. true no leaks with GC, but it has opposite problem by Anonymous Coward · · Score: 0

    There may be no leaks with GC, but such GC
    languages have the opposite problem of holding
    on to TOO much memory due to tangled and
    mistakenly held object references.

    I don't have 80 Megs to run a hello world
    java application.

    There is no silver bullet.

  20. Generational collectors best. by Anonymous Coward · · Score: 0

    Generational scavenging and generational ephemeral collectors are the best. Collectors for Perl, Java and company are all immature and suck.

    Inflammatory, but true. (Some of the newest Java implementations may have state-of-the-art GC though.)

    If you want to see a real garbage collector, look at a good Common Lisp implementation

    Quite a few implementations of other languages have advanced GC too. Smalltalk, Dylan, etc., though some are closed-source.

  21. Garbage collection leads to better programs. by Anonymous Coward · · Score: 0

    FWIW, there is a variant of C++ called EC++ (embedded C++), which strips out many of the less
    well though out "features" of C++


    If you stripped all the less well thought out features of C++, wouldn't you be left with the empty language?

  22. Memory Management Reference web site worth reading by Anonymous Coward · · Score: 0

    Harlequin should certainly know their stuff... They have great implmentations of Dylan, Lisp, *and* SML.

  23. GC and heap fragmentation by Anonymous Coward · · Score: 0

    GC does not magically fix heap fragmentation. It does give you a nice time to compact your heap as you are going to be walking it anyway, but you don't need GC to do this. Perhaps it's pointers that you really don't like? I just wish that the folks with the next big ADA would humor the old school fossils and include some form of realloc().

  24. GC and Timing by Anonymous Coward · · Score: 0

    The one thing that I never found to be acceptable was that the program/system froze for anywhere from 100s of milliseconds to several seconds when the GC decided to clean up. Has this problem been fixed in modern implementations of GC?

    Yes, in any decent implementation, pauses due to GC are not noticable.

    There are also real-time garbage collectors, thought this area of research is much more immature.

  25. Gwydion Dylan (Hans Boehm garbage collector) by Anonymous Coward · · Score: 0
    Gwydion Dylan's chief componenent is a highly optimizing Dylan-to-C compiler.

    Well, my favorite language is Python, but I just wondered, is there any documentation about Dylan-to-C compiler ? The internals manuals aren't very verbose :-).
    I'm asking this question because the Pythoneers are dreaming of compilation, and Dylan is one of the rare languages that are both static and dynamically typed. Common Lisp is another but its object system is just crappy performance-wise, so it doesn't tell how to mix dynamic types and efficiently compiled OO code.
    Maybe a DPython (Dylan implementation of Python, just like there is a Java implementation of Python) might be even done...

  26. GC is the way by Anonymous Coward · · Score: 0

    GC is a *big* reason why Java is so popular in enterprise environments.

    If that was true, then Smalltalk would have ruled the enterprise environment long ago. I think Java's popularity is more due to the hype machine that Sun was able to generate. GC has certainly been around for a very long time. Granted GC did not perform well in it's early years, but it certainly has performed very well for quite a fews years since the advent of better algorithms. Java implementations so far have had rather primitive GC implementations compared to Smalltalk implementations, so GC would not seem to explain the popularity of Java.

    Unfortunately Java is a rather limiting language overall. It takes about twice as many lines of code to accomplish something in Java than it does to accomplish the same thing in a more abstract language, such as Dylan (or Smalltalk). Somethings are virtually impossible to do in Java without altering the language itself (witness Microsoft's hacking their Java implementation to support COM).

  27. Gwydion Dylan (Hans Boehm garbage collector) by Anonymous Coward · · Score: 0


    Well, my favorite language is Python, but I just wondered, is there any
    documentation about Dylan-to-C compiler ? The internals manuals aren't very
    verbose :-).


    The Gwydion Dylan compiler internals aren't very well documented, but on the positive side, even the hairiest Dylan code tends to be very readable and not very hairy. Just about everything in Gwydion Dylan, including the libraries, are written in Dylan.

    Maybe a DPython (Dylan implementation of Python, just like
    there is a Java implementation of Python) might be even done...


    Sure... but if you learned Dylan well enough to implement a Python compiler in Dylan, then you might find yourself losing interest in Python in favor of Dylan. :-)

  28. GC is sometimes more efficient than free() by Anonymous Coward · · Score: 0
    A copying garbage collector (as opposed to the mark-scan kind) removes this sort of fragmentation, but with an occasional copying cost.

    The issue copying/not copying is not clear at all: first you aren't "occasionally" copying, in some cases you should copy rather often (that is unless it is always the latest generations that die). Copying is also the best way to sweep your data cache :-(.

    Second, fragmentation isn't such an issue. There is soemwhere a thesis that shows that an good allocator can loose around 15% of memory in fragmentation, which is equivalent to the overhead of the allocator internal data structures and might be inferior to the memory overhead of a GC (due to delayed frees and conservative GC).

    However if you're freeing in the order that your data structure releases objects, that can leave a very poorly structured free list, resulting in poor memory cache behaviour for subsequent allocations. Caches generally prefer contiguous access sequences, so contiguous allocation is preferred.

    But also objects tend to be allocated and freed in bunches, so depending on the behavior of your program, there are still chances that structures will be allocated in contiguous blocks of memory (i.e. in the areas of big bunches just freed).

    This is why I've come to think that GC implementation is a black art with numerous uncertain memory-CPU trade-offs, that depends on the programs, the CPUs, the amount of memory ; (that might change from year to year, CPU to CPU, machine to machine, language to language). The only thing which is sure is that programming without GC is just too painful.

  29. true no leaks with GC, but it has opposite problem by Anonymous Coward · · Score: 0

    > Almost as a rule, Java implementation don't have modern GC. You are still exaggerating.
    > The memory requirements of a garbage collected program are generally about 2x that of a manually deallocated one.

    I beg to differ.

    I am commonly seeing Java GUI applications taking up 4X the amount of process memory to their C++ equivalents (under UNIX GTK+ or Qt)

  30. Lack of Real Hackers by Anonymous Coward · · Score: 0

    It is sad that the world does not have very
    many people that can handle powerful tools
    like manual memory management.

    I don't see what is so hard about it. If you
    allocate something, you should free it. If you
    don't allocate something, you had damn well
    better not try to use or free it.

    For the less clueful, try static arrays.
    They are friendly and fast.

  31. Grrr... by Anonymous Coward · · Score: 0

    "And a car is a crutch I use every day to get to work faster than walking. "

    And because of all the cars, it is faster for me to ride a bike to work than to take a car.

    *Appropriate* technology is usually the answer to the problem.

  32. Xerox again by Anonymous Coward · · Score: 0

    The Xerox Star used folders.

  33. Dylan watered-down Lisp. by Anonymous Coward · · Score: 0

    It's true. It's a weaker version of Lisp without the parenthesis that makes doing things that Lisp can do so well extremely painful. Dylan has a powerful macro system, the second most powerful there is perhaps next to that of Common Lisp, but it's extremely yucky to use.

  34. Learn how to use your Common Lisp compiler.... by Anonymous Coward · · Score: 0

    "Common Lisp is another but its object system is just crappy performance-wise, so it doesn't tell how to mix dynamic types and efficiently compiled OO code."

    Learn how to declare types and use your Common Lisp compiler you dufus. Don't spread lies and misinformation. CLOS is the most powerful object system there is. Period. Common Lisp code will in many cases outperform C++ code, and in some cases even GCC floating point code (CMU Common Lisp).

    1. Re: Learn how to use your Common Lisp compiler.... by Anonymous Coward · · Score: 0
      Learn how to declare types and use your Common Lisp compiler you dufus.

      Because I'm not a Common Lisp expert, I didn't found a way to declare types for CLOS objects to the compiler. Please enlighten me.

      CLOS is the most powerful object system there is. Period.

      Exactly! Power always comes at a price. Either performance or complexity of the compiler. Now it may well be that Common Lisp compilers are very elaborate for CLOS code (I don't doubt this for normal function calls). But I fail to even imagine how it is possible for a runtime dynamic metaobject-protocol to be as fast as C++ method call which is just a standard compiled function call that can be often inlined. Of course JIT and optimizations such as those found in Self and Smalltalk are doable, but my understanding was that they aren't that wide spread ; maybe they are considered to be redundant with the CL compiler for non-OO code anyway. In contrast Dylan has clearly defined mechanisms that allow compilation to fast code.

      Common Lisp code will in many cases outperform C++ code, and in some cases even GCC floating point code (CMU Common Lisp).

      I certainly don't dispute that CL code can be faster than gcc code. I know it. But can CLOS code outperform C++ ? Most CL advocates are strangely silent on that point. The usual arguments are:

      • Common Lisp (non-OO) can be faster than C or C++.
      • Common Lisp has a much more powerful OO system than C++ (CLOS).
      somehow implying that CLOS is faster than C++. But this is far from being obvious to me, and I have never found real figures that quantify how fast is CLOS for pure OO code(which is what I write).

      For instance if I'm writing a chess program and have a class ChessBoard with methods getSquare(int x, int y), movePiece(int fromX, int fromY, int toX, int toY), xyToPos(int x, int y) where xyToPos is used internally everywhere to convert from x,y to single index (x+y*8), and getSquare/setSquare is used everywhere internally to get/set the board data, are you sure that you won't get a huge performance hit ?

      Don't spread lies and misinformation.

      Ok, I'll apologize as soon as I have found information (papers, tech reports, WWW, DejaNews, ...) of the speed of CLOS.

    2. Re: Learn how to use your Common Lisp compiler.... by Anonymous Coward · · Score: 0

      Huh? Did you know that most Common Lisps are implemented in CLOS? It makes writing everything MUCH MUCH easier, but bootstrapping a real bitch. For many Common Lisps even by using DEFUN you are using CLOS.

    3. Re: Learn how to use your Common Lisp compiler.... by Anonymous Coward · · Score: 0
      Huh? Did you know that most Common Lisps are implemented in CLOS? It makes writing everything MUCH MUCH easier, but bootstrapping a real bitch. For many Common Lisps even by using DEFUN you are using CLOS

      No I didn't know, and it isn't obvious from the clisp, gcl and cmucl source code. I checked and found out that it is the case in clicc. Maybe is it just that some freely available implementations have an inferior implementation of CLOS.

    4. Re: Learn how to use your Common Lisp compiler.... by Anonymous Coward · · Score: 0
      I wrote: Ok, I'll apologize as soon as I have found information (papers, tech reports, WWW, DejaNews, ...) of the speed of CLOS.

      While double checking, I saw that indeed CLOS was well integrated in commercial Lisps (it is not obvious from "On Lisp", and "The Art of the Metaobject Protocol"), so indeed my last answer was a kind of misinformation. Commercial Lisps are very sophisticated even with respect to CLOS, so I do apologize. The problem with CLOS is not a matter of runtime dynamic metaobject protocol but of dispatch based on class type.

      However my initial statement was that "Common Lisp is another but its object system is just crappy performance-wise, so it doesn't tell how to mix dynamic types and efficiently compiled OO code", has still has some truth, especially in the context of making a Python to Lisp converter (vs a Python to Dylan compiler). Because in that case, I'll have something like:

      (defclass PythonObject [...])
      (defclass PythonInteger (PythonObject [...])
      (defclass PythonList (PythonObject) [...])

      (defmethod add ((PythonInteger i1) (PythonInteger i2)) [...])
      (defmethod getitem ((PythonList list) (PythonInteger index)) [...])

      And the goal is now, that with some type declarations, that (getitem myList (add integer1 integer2)) is transformed in code equivalent to the C myList[a+b], which is far from obvious for CLOS (in case you wonder Python addition/getitem requires a multiple dispatch which is hard-coded in the bytecode switch of the Python interpreter).
      It is not clear at all that, even forgetting about type declaration and optimization, the Lisp code will be acceptable, compared to a Dylan implementation with sealed PythonInteger and PythonList.

  35. Use customizable generational collectors. by Anonymous Coward · · Score: 0

    Yes, GC pauses are why I hate Java so much. Sliding a Swing scrollbar and having it jump around is unacceptable to me. I can't wait until Java(TM) starts getting good Common Lisp style generational collectors. The workload is spread out much more fine-grained and for sufficiently dynamic programs where there is lots of object copying and creation, generational collectors can often outperform malloc()/free().

    The better Common Lisps let you configure and tune the GC extensively for size, collection frequency, and all kinds of stuff so that this is not a problem.

  36. Lack of Real Hackers by Anonymous Coward · · Score: 0

    I remember back in the 70s when I graduated from static arrays in BASIC to dynamic systems like malloc(). At first the power went to my head and I allocated until I ran out of memory and the OS wanted to swap to cassette tape, but I later learned about free() which sometimes delayed running out of memory for a long time.

    I never understood this garbage collector idea, which seems like staying in a fancy hotel where the maid comes around when you're out at the beach and tidies things up and rearranges your shaving kit. If I'm paying I'd rather stay in a cheaper hotel where I clean up after myself and save some money, but maybe the people who use garbage collection work for big companies that can afford the maid service and using the minibar isn't a problem on their expense report.

  37. Dude, you are wrong. by Anonymous Coward · · Score: 0


    Nuff said.

  38. mozilla by Anonymous Coward · · Score: 0


    The mozilla JIT compiler uses a generational
    collector. It's looking to be pretty good
    (but I'm no expert), so, if as you say, the
    java runtime gc from Sun "sucks", I'm sure
    they will get better.

  39. 0: BASIC/Windows, 1: C/C++/Unix, 2:Eiffel.., 3: .. by Anonymous Coward · · Score: 0

    Levels of literacy.

    Obviously you still have a lot to learn. But let's just make one important point:

    USER-LEVEL CODE HAS NO BUSINESS WHATSOEVER WITH
    ADDRESSES AND MEMORY.

    Some MIS departments have moved from COBOL to C++.
    Even if I'm totally disgusted by COBOL, I must say
    it is a language adapted to the task at hand.

  40. GC is the way by Anonymous Coward · · Score: 0
    Java implementations so far have had rather primitive GC implementations compared to Smalltalk implementations, so GC would not seem to explain the popularity of Java.

    This is partly true. The original statement could be reworded to be: "Java being a C++ with a GC is a big reason why Java is so popular". Granted there are some other improvements on C++ (like more reflectivity and being more dynamic), that may account also for its popularity. But being a better C++ is a factor.

    Objective-C could have also be a better C, but at the time it was C++ that played this part, maybe because of the hype, maybe because it has the same philosophy as C ([nearly] nothing happens behind the scenes, and OO is reduced a glorified function table lookup). Nowadays it is clear that it wasn't the right way, so we have Java as a better C++. We just have to wait the day people will discover that Smalltalk is a better Java (or maybe 1 or 2 intermediary languages in between are necessary).

  41. After internet porn.. by Anonymous Coward · · Score: 0

    Make big bucks selling `garbage collection' books on the net!

    C'mon man. Worthless data is garbage after all.

    Have fun. Dress up, throw yourself into a trash bin, and then go to the bookstore. Great Halloween disguise!


  42. Gwydion Dylan (Hans Boehm garbage collector) by Anonymous Coward · · Score: 0
    That's not true! I help maintain a Dylan compiler, and I still like Python!

    It's interesting to see that someone that like Python, still help maintaining a Dylan compiler :-) (I'm a Pythoneer).

  43. Lack of Real Hackers by Anonymous Coward · · Score: 0
    It is sad that the world does not have very many people that can handle powerful tools like manual memory management.

    People that can handle powerful tools have still better things to do than wasting their time trying to match new/delete pairs. Especially since the computer can perfectly do it (isn't it the point of computers: making them do things that otherwise people would have to do ?).

    I don't see what is so hard about it. If you allocate something, you should free it. If you don't allocate something, you had damn well better not try to use or free it.

    This doesn't work in OO. Example: if I have a OO library that parse 3D models (say in 3DS, or OFF, or whatever). If I use them in a game, and I'm using a renderer compatible using the same data-structures as this parser. Now look what happens: I read the 3D model say a ship, I create "instances" of that 3D model, say an army of that ship ; then I hand them periodically to the renderer, and to a collision detection system. When should I free it ? I don't really know: first only when all the ships have disappear (say they have all explosed). This already really sucks because I have to implement reference counting (so I'm already wasting my time). But then maybe the renderer is being "paused" (a player hit the "pause" key), that froze the display, and still use the 3D model, while all the ships of the army have disappeared from the game. And then maybe the parser library has given another 3D model (say a ship with a turret) that is based on this 3D model (i.e. has in its internal data structures a pointer to the first 3D model). And maybe the 3D collision detection system holds a cache of based on objects of previous iteration (in order to speed up collision detection), that is still refering to the first 3D object.
    How would you handle that ? You'll probably have to try to force completly arbitrary restrictions (you shall not cache objects, etc...), that'll get in the way of the designs of programmers, and optimization, add much code, and gratuitously add bugs. And at the end you'll spent 60% more time to finish your program, just because you insist doing manually something that the computer can perfectly do.

  44. Try comp.lang.lisp by Anonymous Coward · · Score: 0

    The people there are much friendlier than those at other language groups.

  45. If you want to help Dylan.. by Anonymous Coward · · Score: 0

    Get the alternative Lisp syntax back. That's not hard to do.

    Yep, it wouldn't be very hard to do. (Dylan had a prefix syntax in its early days before it was decided that Dylan would be much more commercially viable with an infix syntax.) You are welcome to join the Gwydion Dylan effort if you are interested in adding this feature.

  46. But it needs a new euphamism by Anonymous Coward · · Score: 1

    This book should have created a new euphamism for garbage collection. I was going to buy it, but was embarrassed to be buying a book about cleaning up trash. Microsoft called directories folders, dejanews and netscape both made up other words for usenet newsgroups, and following that precedent, we need a new name for garbage collection.

  47. true no leaks with GC, but it has opposite problem by Anonymous Coward · · Score: 1

    I don't have 80 Megs to run a hello world java application.

    Almost as a rule, Java implementation don't have modern GC. You are still exaggerating.

    The memory requirements of a garbage collected program are generally about 2x that of a manually deallocated one.

  48. Garbage collection leads to better programs. by unruh · · Score: 2
    The most hideous bugs can lurk in memory management. You get memory leaks if you are lucky or segfaults if you are not. GC frees you from a whole class of bugs and permits leaner, more legible code.

    GC was a revelation to me when I switched from C/C++ to Perl. Of course Perl's memory management goes beyond GC, but that's another story.

    Rapid Application Development is nearly impossible without GC -- you can't develop rapidly when you are hunting down memory leaks and illegal pointers.

  49. Which is GC "oddly named"? by gavinhall · · Score: 1

    Posted by !ErrorBookmarkNotDefined:

    The GC term is used in literature, code, conferences, and everyday conversation about memory management.
    What's odd about that (besides the obvious geekiness of it all)?

    -----------------------------
    Computers are useless. They can only give answers.

  50. Garbage collection leads to better programs. by spacey · · Score: 1

    Perl only uses a reference counting GC, and last time I looked at one of the postscript formatting modules its circular data structures caused the refcount to never reach 0.

    It's still a lot easier then not having a GC at all, though.

    -Peter

    --
    == Just my opinion(s)
  51. But it needs a new euphamism by spacey · · Score: 1

    Rubbish disposal?

    Sanitary deportation?

    Forceful of unwanted elements?

    -Peter

    --
    == Just my opinion(s)
  52. Grrr... by spacey · · Score: 1

    If you're doing good generalized libraries then memory management is a problem. Just doing malloc and free is difficult if you don't know the lifetime of an object because another programmer determines that. Overriding malloc() calloc() realloc() and free() in these situations with a GC is a very nice thing.

    -Peter



    --
    == Just my opinion(s)
  53. Memory Management Reference web site worth reading by Paul+Crowley · · Score: 1

    The Harlequin Memory Management Reference is a good reference for lots of this stuff, especially the glossary:

    http://www.harlequin.com/mm/reference/
    --

  54. malloc/free can be slower than GC by Paul+Crowley · · Score: 1

    Some programs can be *faster* and *more responsive* as a result of substituting proper MM technology for malloc/free. As well as being cleaner, more reusable and more bug-free.
    --

  55. Grrr... by bluGill · · Score: 2

    Obviously you have never written a real program spanning several million lines of code with subroutines written by someone other then you.

    Combine the above with a data structure malloced in a library not written by you (possibal in OS/2, though it isn't malloc) that needs to be freeed in a latter subroutine. Maybe we should toss threads in too, so you may have a differnt thread looking at your memory after your subroutine is done, and your subroutine has to return a pointer to memory malloced. Suddenly you don't know who is done with that memory you just started with first. Can't call free() too soon lest you prevent access by someone else who needs it, but you have to call it. GC solves this problem, and if it was fast enough would be more universial.

    The creater of C++ has listed allowing himself to be convinced that GC was too slow as his greatest regret.

    Now for simple cases you are right, but the simple cases are rarely got wrong (for long) but when the memory needs to hang around for a long time the case is no longer simple.

  56. very good / one complaint by Kyril · · Score: 1

    One problem with GC in CORBA: when is an object garbage, and how can the server with the object tell? If I stringify an object reference, write it down on a piece of paper, and send it message-in-a-bottle to parts unknown...the object is still referenced.

    Of course, "nobody seems to be using the object recently" is useful for objects which can be shut down and restarted...

  57. GC is the way by Stu+Charlton · · Score: 1

    GC is a *big* reason why Java is so popular in enterprise environments. Forget write once, run anywhere.

    GC and simplified pointers help you to pay more attention to the problem *at hand*, then to have to worry about differentiating between (in C++):

    - stack allocated objects that don't have to be deleted but can't be safely passed around because they go out of scope
    - heap allocated objects that have to be deleted
    - pointers vs references (i.e. dot-notation vs. '->' notation)

    Granted, these are not big problems by themselves. Put together however, they wind up being an unnecessary headache, especially when dealing with application-level programming. And on the bright side of things, Java catches your null-pointers with a catchable-exception - not a segfault.

    When dealing with systems-level stuff, putting up with C++'s self-memory management is a "grin & bear it scenario", until Java's performance increases.

    Though, of course, C++ and C will always be used where bit-twiddling is needed.

    --
    -Stu
  58. SmallTalk by Stu+Charlton · · Score: 2

    Actually, I agree more or less with your assessment, though I'm a little more optimistic about Java's abilities. (This probably comes down to the strong vs. weak typing argument that I won't get into. In summary: I like both, but going foward there are certain domains that need strong typing, and others that should have weak typing. This is too much of a religious issue: witness the flamewars against Tcl :)

    SmallTalk did catch on in many industries, but it never quite penetrated into mainstream use. Does anyone really know why? Perhaps it primarily because it was so advanced for its time (in the business community, I'd say late 80s to early 90s). Perhaps it was because C++ offered the "path of least resistance". I don't really know.

    Java is just an attempt (imho, and from what I understand, some veteren SmallTalker's opinions) at doing SmallTalk over again for the C++ crowd (albeit with some limitations, since the programming community at large seems to be set against weakly-typed languages)

    I love SmallTalk [and am very interested in Dylan or Self], and wish it would resurge with industry support. Until it does, (go Squeak go!) however, Java is an acceptable alternative, as it also provides the "path of least resistence" to some of SmallTalk's better technologies: GC, dynamic typing (though restricted through java interfaces), and a Packages/Classes/Methods IDE (through VisualAge for Java).

    Regarding COM: I felt that Microsoft actually did some cool stuff with their VM to support COM and make it MUCH easier to use than the C++-way of using COM. COM inherently is based upon the idea of C/C++ pointers, so it's not going to naturally map to any language, Java or otherwise (witness the atrocity called "Automation" to allow COM to talk to Visual Basic).

    Furthermore, as for your "double lines of code" assertion, I think you're overstating it a bit. Java doesn't have blocks, and it requires explicit type down-casting, but I don't see this as "doubling" code requirements. Certainly, the code requirement is larger than SmallTalk, but not significantly so.

    --
    -Stu
  59. Dry my foot... by kinga · · Score: 1

    Tony got Richard Jones to sign his copy of the book the last time he was up here visiting. Richard signed it:

    "A Good Collection of Garbage"

    :-)

  60. But it needs a new euphamism by fizbin · · Score: 1

    Data euthanasia?
    Memory cleaning?
    Data cleansing?
    Memory downsizing? (or "rightsizing")
    RIM? ("Reduction in (allocated) memory" - analogous to corporate-speak RIF)

  61. Reference Counting vs. Mark & Sweep by Will+Sargent · · Score: 1

    Perl actually uses a reference counting algorithm for garbage collection, rather than mark and sweep -- this can cause problems with self-referential data structures (i.e. if A points to B and B points to A, there are references, but the variable may not be available.)

    In the short run, this doesn't matter so much with Perl because it's more useful as a tool than as an actual language. Try to run mod_perl, where the interpreter is persistent, and issues with reference counting become immediately apparent.

    Many people don't consider reference counting to be "true" GC at all -- Java has some interesting arguments about their GC implementation, along with the little known fact that up to 40% of CPU time can be used just in GC: they do it right but it's not cheap...

    I believe there's even a koan regarding GC in the Hacker Dictionary(which is appropriately self-referential...)

  62. Grrr... by furball · · Score: 1
    I have NEVER seen a place where, if you were careful (as you should be) in designing your code, you couldn't have nice matching malloc()/free() pairs.

    No one ever tries to be not careful. The cautious statement you made of "if you were careful" points out that there are times that you were possibly not careful. malloc()/free() pairs gets more complicated with distributed code.

  63. GC has a place by slew · · Score: 1

    Although GC is pretty helpful in RAD applications, so many people misuse GC that it has gotten a
    bad name. Usually the same people who misuse GC also misuse pointers.

    The main form of GC misuse in OO-land is to avoid properly designing object destructors. As things
    get more complicated and multiple people start working on things this is often the place of the
    first design failure.

    Selecting the right time to destroy persitent objects is the biggest problem for real-time/
    multi-threaded/retentrant code. Subtle bugs (like the TCP listen-timeout bugs) usually live here.
    Defering these problems to a generic GC is probably the wrong thing to do in most cases.

    Using GC to free nodes on a recursive traversal is great, but using GC to solve buffer deadlock
    is probably a horrible thing to do.

    Remember that most GC algorithms are only faster in that they essentially defer free() calls and
    process them in bulk reducing overhead. Ref-count and mark-sweep type algorithms require that
    more work be done than simply marking unused on a call to free().

    As they say: pick the right tool for the job.

  64. Chip designers, WAKE UP! by matomira · · Score: 1

    So many transistors going down the drain of diminishing returns: more and more branch prediction, out-of-order execution, multiple execution units..

    And just a little bit of silicon helps GC in a big way!

    I hope that the trend resumed by PicoJava will propagate to other embedded and general purpose processors.

    And dynamic type checking is mainstream too now,
    but that is another story..

  65. Too many _Hackers_ (in the bad sense of the word) by matomira · · Score: 1

    There's too many people out there (some with good positions at important companies) who either:

    a) Don't know anything besides C and C++
    b) Are forced to use C/C++ because that's the company orientation

    so that then they later force that trash on developers in the form of libraries, debuggers that crash, broken compilers, etc.

    Every time I upgrade systems/tools, the release notes make me sick.

    No. Open Source is not doing better. You can't ignore C/C++ on Linux either.

    For simple things, malloc/free can be enough, but
    there's a certain point where the complex relationships between components/subsystems/clients crave for GC.

  66. If you want to help Dylan.. by matomira · · Score: 1

    Get the alternative Lisp syntax back. That's not hard to do.

  67. But at least one book.. by matomira · · Score: 1

    Ever tried searching for `garbage collection' or `memory management' in amazon.com? I'm happy enough this book exists. I'm sure others will come (The recommendations list I get consists mostly of `patterns' titles. I'm so bored I started getting into semantics, partial evaluation and old Lisp books)

  68. Try alloca() in C/C++ by Jonathan · · Score: 1

    Not to mention just plain sounding better than malloc or calloc. Malloc and Calloc sound like they should be villians in Dr. Who or something. Alloca has a nice Hawaii-five-o ring to it.

    But seriously, how bad is alloca for portability? Man pages generally claim that it is system dependent...

  69. Grrr... by dvdeug · · Score: 1
    In your example, all you need is . . . maintain your own refrence counts internally, or some such.



    But reference counting is a form of garbage collection. So in this case, you suggest writing your own garbage collection instead of having the language provide it.

  70. GC and Timing by Doctor+Memory · · Score: 1

    Yes, there are incremental garbage collectors which don't try to collect *all* abandoned objects, they instead just try to keep some fraction of the allocation space free. They get called more often, but do less work, so the GC time is spread out over the execution. Of course, when they fail, they have to call a more serious GC routine, which will often cause a "burp" in performance.

    --
    Just junk food for thought...
  71. very good / one complaint by Jan · · Score: 1

    I enjoyed reading this. It is great to have a well-organized one-source presentation of the classics.

    My one complaint: as is often the case for academic texts, is it does not even mention important deployed industry technologies.

    For example, if I recall correctly, in its discussion of distributed GC, there is nary a mention of DCOM or CORBA. (The book was published in 1996, when DCOM and some CORBA ORBs were shipping.)

  72. Garbage collection leads to better programs. by JanneM · · Score: 1

    You get memory leaks if you are lucky or segfaults if you are not.

    Actually the other way around... I've worked as a C++ programmer under Windows (ugh!). A segfault is an obvious indication that something is wrong and will tell you where it occured, while a memory leak can go unnoticed for days or weeks of development and is a pain to track down. Give me segfaults over memory leaks everytime.

    --
    Trust the Computer. The Computer is your friend.
  73. I can growl louder than you can by SimonK · · Score: 2

    Garbage collection for those who find it to lazy or dificult to call free()...

    I am afraid you have just pressed one of my buttons. There is nothing that alerts me to danger more than one programmer acusing another of being 'lazy' because he does not want to use some rediculously complicated tool of which the first programmer is fond.

    I have NEVER seen a place where, if you were careful (as you should be) in designing your code, you couldn't have nice matching malloc()/free() pairs.

    If this were usenet, at this point I would endeavour to call you out on this, and we would end up in a pissing contest over who had the most experience and whose experience was most real

    Since this is not usenet, and I am in a good mood, I will not. I will just say that I have seen many examples of supposed OO code hacked to death or made overly complicated by the need to keep track of memory allocation. The stability, memory usage and locality of a great many C programs (including XFree86) can be improved by linking them with a garbage collector. In many cases this can actually make them faster.

    I am, at this very minute, attempting to track down a bug in a C++ library linked against my Java program, which is caused by a failure of the evil reference counting scheme the library author used to keep track of his memory usage. Now I know, from looking at the code, that he could not have used malloc and free in the traditional way without breaking the clean interface of his code, and using a real garbage collector with C++ is hard, so what choice did he have ? But the need to reimplement garbage collection badly for every project rather than doing it properly once is a total waste of time.

  74. Are you serious? by Cato · · Score: 1

    I don't believe it - now we are going to have politically correct computer jargon??? Please tell me you're kidding...

  75. Garbage collection leads to better programs. by Rob_D_Clark · · Score: 1

    The most hideous bugs can lurk in memory management. You get memory leaks if you are lucky or segfaults if you are not. GC frees you from a whole class of bugs and permits leaner, more legible code.

    I think I'd rather have the segfaults... sometimes memory leaks can be tricky to track down. At any rate, gc is the way to go.

    GC was a revelation to me when I switched from C/C++ to Perl. Of course Perl's memory management goes beyond GC, but that's another story.

    Rapid Application Development is nearly impossible without GC -- you can't develop rapidly when you are hunting down memory leaks and illegal pointers.



    This, IMHO, is one of the really nice things about java, and perhaps a feature that really should have made it into C++ but didn't. It seems to me that properly handling exceptions w/o gc can make life real hairy for the C++ compiler.

    I think GC is an interesting topic, and I am glad to read about it on /.

    --
    --Rob
  76. Garbage collection leads to better programs. by Rob_D_Clark · · Score: 1

    that would violate C++'s "no overhead" rule - you only pay for what you buy. If you don't want GC, you don't pay for its performance hit.

    FWIW, there is a variant of C++ called EC++ (embedded C++), which strips out many of the less well though out "features" of C++, like exception throwing (other than in constructors, where it is required to deal with error conditions, because you can't return a value), templates, etc.

    I have heard that it helps reduce the C++ code bloat a lot...



    --
    --Rob
  77. Lack of Real Hackers by Rob_D_Clark · · Score: 1


    It is sad that the world does not have very many people that can handle powerful tools like manual memory management.

    I don't see what is so hard about it. If you allocate something, you should free it. If you don't allocate something, you had damn well better not try to use or free it.

    For the less clueful, try static arrays. They are friendly and fast.


    If I take that argument to the extreme, then I would arrive at the conclusion that real hackers should program in assembly. (or even machine language!)


    Yes GC makes it harder for idiots to make mistakes, but I don't think that limits its usefulness to real hackers.

    GC is of more use than simply memory management. In an OO language that supports destructors, GC can handle resources other than memory. For example, if you had an class that mixed audio streams together and wrote them to the audio device, your class's constructor could detach from the audio device freeing it for use by other programs. There would not be an easy way to do this without GC besides reference counting... and what hacker wants to waste time re-inventing the wheel!

    BTW, I can't see any good excuse for using static arrays, other than perhaps real-time embedded systems where you can't afford the overhead of GC.

    --
    --Rob
  78. Grrr... by Rob_D_Clark · · Score: 2

    My point is exactly that. If you have bad program design, GC is necessary, but well thought out design makes it unnecessary.

    Well, you could probably do without GC, but the question is whether you can do it better without GC.

    Without GC, there are situations where you either need to (a) implement reference counting (which is a form of GC) in order to know when to free some data, or (b) make duplicate copies of objects so each object can be freed by the code that receives a reference to the object when it is done with it. I am really not convinced that either approach is better than just using GC in the first place. (FWIW, the example I am thinking of is a multithreaded application that communicates between threads via message passing, where you would like to be able to have multicast messages. The objects passed are treated as read-only, so side effects are not an issue. I am sure that given a couple minutes I could come up with a better example.)

    --
    --Rob
  79. Grrr... by Communa · · Score: 1

    If you have bad program design, GC is necessary, but well thought out design makes it unnecessary. In your example, all you need is some sort of semaphore between the threads, or maintain your own refrence counts internally, or some such.


    This is garbage collection! It's just GC done on a per-program level, rather than a systemwide level. I think you just defeated your own argument...


    But reference counting is a poor form of garbage collection; it's slow and can't reclaim cyclical structures. Most modern garbage collectors are based on mark and sweep somewhere along the line. (I'll go so far as to claim that copying is a modification of mark and sweep, with the two operations combined into one copy.)


    it is also a crutch bad programmers use far too often to cover their own lazyness, and for really complex programs, doesn't solve the problem reliably!


    Actually, if a GC algorithm isn't reliable, it's useless. A hell of a lot of care and attention is paid to proving GC algorithms correct. It's permissible for GC to err on the side of caution (we won't reclaim that block because it might still be in use), and a fair few GC methods do this (usually incremental and conservative algorithms) but reclaiming blocks in use is a complete no-no. And whilst I appreciate your points about copying, for one thing it's usually unnecessary in modern systems - tracing works just as well, if not better, and even Baker's copying method can be done without copying - and for another thing, in conservative systems you must trace, as copying would have undesirable side effects on those things which were not pointers...


    In any case, you say `lazy' as if it were an insult. I would have thought that the good programmers were the lazy ones, the ones who aren't going to expend any unnecessary effort or have their programs do the same. Why build storage management more than once? If the OS or the language runtime has a totally reliable (and remember, it *must* be, or it's useless) GC system, why storage manage at all?


    In any case, there's no harm in having a `free' call that you can call to put something on the collectable list straightaway - but when it becomes difficult to free something in an orderly fashion, why not let the infrastructure take the strain? It's what it's there for.

    --

  80. Garbage Collection considered harmful? by loki7 · · Score: 1

    Sure, and I've never seen a place where, if you were careful (as you should be) in designing your code, you couldn't have nice matching label/goto pairs. But all you structured programming weenies insist on using while loops and for loops and switch statements.

    Garbage collection isn't necessary, but it makes the programmer's life much easier and simplifies your code by moving the memory management responsibilities out of the program and into the language runtime library (or VM, or whatever).

    peter

  81. Commercial garbage collection for C and C++ by Nick+Barnes · · Score: 1

    I work for Geodesic Systems. We sell a drop-in memory management product called "Great Circle" which includes garbage collection for C and C++ and has a web interface.

  82. Grrr... by Endymion · · Score: 1

    Garbage collection for those who find it to lazy or dificult to call free()...

    I have NEVER seen a place where, if you were careful (as you should be) in designing your code, you couldn't have nice matching malloc()/free() pairs.

    (Yes, I know in LISP and such GC is a must, but those languages don't even have the same concept of variables as a normal imperitive language such as C)

    -The Code Nazi
    (now you see why some friends of mine
    gave me that nickname...)

    --
    Ce n'est pas une signature automatique.
  83. Grrr... by Endymion · · Score: 1
    Obviously you have never written a real program spanning several million lines of code with subroutines written by someone other then you.

    That is rather assuming... and self contradictory.
    [snip - some example of program design]

    My point is exactly that. If you have bad program design, GC is necessary, but well thought out design makes it unnecessary. In your example, all you need is some sort of semaphore between the threads, or maintain your own refrence counts internally, or some such.

    If the library blindly mallocs such structures, and gives the user a pointer to them, I have a few things to say about the library as well. I guess that could be ok, if the library never touched it again, and there was an equivilent destruct funtion in the library (it becomes another malloc/free pair). If the library returns a pointer, and continues to modify it, you not only have a GC problem, you will be clobbering data!

    I guess the point is, that while GC can be a time-saver (in having to not re-design sections of badly written code), it is also a crutch bad programmers use far too often to cover their own lazyness, and for really complex programs, doesn't solve the problem reliably! (I really hate it when structures and code change behind my back (hence my hate for C++) )

    -The Code Nazi

    --
    Ce n'est pas une signature automatique.
  84. "No Overhead Rule" by jetson123 · · Score: 1
    Your comments suggest that you think that GC somehow imposes overheads that you otherwise don't pay in C++. But that's not true.

    Memory allocation and deallocation in C++ imposes runtime costs qualitatively no different from costs that you incur if you allocate memory in the presence of a GC. Quantitatively, GC is actually often faster.

    The main reason C++ doesn't have GC is because C++ has C pointer semantics and because many C++ programmers habitually make a lot of unportable assumptions about the runtime and memory layout. That makes adding a GC to C++ tricky and would break lots of programs, programs that are not legal C++ programs but that still work.

    The best compromise for GC in C++ is to use a conservative GC (like Boehm's GC). Boehm's GC may break some valid programs, but on the whole it lives fairly well within the framework of C and C++ and does not change C++ semantics too much. But in most languages other than C++, one can do considerably better with GC.

  85. true no leaks with GC, but it has opposite problem by jetson123 · · Score: 1
    Holding on to too much memory is not a practical problem in languages with GC. If you are really concerned about it, you can use many of the standard techniques from C/C++ for dealing with it and you still retain the advantages of GC (including the runtime safety).

    Java's Hello World does not take 80M of memory. In fact, Java runs in pretty small footprints and on the whole is probably smaller than current C/C++ runtimes. The only reason you don't see the C/C++ runtimes is because they are usually loaded at boot time these days and you consider them part of the OS.

  86. comparing apples and oranges by jetson123 · · Score: 1
    Comparing actual memory usage of Java and C/C++ is rather tricky because memory is handled so differently. For example, in C/C++, the GUI libraries are "shared" and not necessarily counted against memory usage of the process, but they still take up space. The Java GUI libraries, however, are stored on the Java runtime's heap.

    Because Java isolates different software components from one another, you can run different applications and services within the same runtime, and that's really the right comparison to make (at least in the long run when more and more applications run in Java).

    That may seem like a distant hypothetical future, but it actually already has a lot of practical significance. For example, when writing servlets in Java, the footprint of each additional servlet is tiny, yet servlets are isolated from one another as well as if they were running in separate processes. In contrast, CGI scripts and even shared library implementations of web services have enormous overhead.

  87. Garbage collection leads to better programs. by Maciej+Stachowiak · · Score: 1

    Actually, last I checked, Perl's memory management was jut simple refcounting, not even smart enough to avoid leaking circular garbage.

  88. Grrr... by BillWhite · · Score: 1

    So, it's all free() and malloc(), but garbage collection lets you defer the decision to do free().

    I once worked on a compiler which did garbage collection this way. When you compile a procedure, the signature is persistent, but the body isn't. That is to say, the data structure used to represent the signature will remain forever, but the data structures used to represent the bodies are not useful after you have finished compiling the body. So, you allocate space for the persistent stuff in permanent storage, and space for the body in some other, ephemeral blocks of storage. When you are finished with the procedure, you can just collect all the ephemeral stuff and throw it out, without thinking about it. This is much faster than using reference counts, since you don't pay a penalty for each pointer assignment. Since the language was Ada, what was persistent in one context was ephemeral in another. C or C++ wouldn't have the same problem. I thought it was a neat idea, and wish that it was *my* idea.

  89. A wonderful book by kaet · · Score: 1

    This is a wonderful book. I've owned it for some months and read it end to end in one sitting. I'm not kidding. This seems to be the first `proper book' on the subject, something I've been looking for for about five years, and I've had to make do with terse survey papers. Buy it. :)

  90. Leaks and segfaults are both bugs that need fixing by cpeterso · · Score: 1

    Which is easier to find? The segfault, of course. When developing/testing your application, a memory is very likely to slip through into the shipped product. Any way to set "tripwires" (like assertions) will help you shake out bugs before shipping.

  91. Try alloca() in C/C++ by cpeterso · · Score: 1

    alloca() is like malloc(), except it allocates memory on top of your stack (not the heap). Allocation is fast, O(1), because alloca() simply pushes the stack pointer forward. You don't need to free() alloca blocks. When your function returns, the stack disappears! But you need to be careful not to use alloca() for any memory blocks that need to be retained after the function returns (like nodes in a global data structure).

  92. Good book! by Eric+Kidd · · Score: 1

    This is a really good book. It explains garbage collection in lots of detail and covers various implementation tradeoffs.

    If you're not willing to read several hundred academic papers on this subject--and sort them all out--this book is definitely the best starting place.

  93. Other languages by Eric+Kidd · · Score: 1

    Wow. Somebody's posted an advertisement for our project. Since high praise always makes me nervous, let me try to deflect some of that enthusiasm in various directions. :-)

    You also want the check out GNU Common Lisp, GNU Guile, Eiffel, ML, Prolog, Haskell and the various R4RS scheme interpreters.

    There's a lot of free (and GPL'd) programming languages out there, and some of them are pretty fast and impressively powerful.

  94. Gwydion Dylan (Hans Boehm garbage collector) by Eric+Kidd · · Score: 1

    Sure... but if you learned Dylan well enough to implement a Python compiler in Dylan, then you might find yourself losing interest in Python in favor of Dylan. :-)

    That's not true! I help maintain a Dylan compiler, and I still like Python!

  95. Gwydion Dylan (Hans Boehm garbage collector) by Eric+Kidd · · Score: 1

    Well, my favorite language is Python, but I just wondered, is there any documentation about Dylan-to-C compiler ? The internals manuals aren't very verbose :-).

    Oops. We're still filling those out as we find our way around the code.

  96. Grrr... by Mr+T · · Score: 1

    You sound like you've never worked on anything other than a one man project. With large, dynamic, multithreaded applications it is very easy to lose track of when a piece of memory needs to be returned. It happens in just about every application but it's usually small enough that you don't notice it. I've got a masters in software engineering and I can tell you that the best programming shops in the world (CMM level 4 type places) end up with code that leaks, being careful is not good enough because nobody is good enough to be that careful on large projects. Good modern garbage collection (difficult to do with C or C++) offers a lot of advantages. Generational Copy Propgation collection is fast (much faster than reference counting), it compacts memory so the OS can reclaim pages that aren't in use (tell me how to do this with out garbage collection, you can free all your mallocs but you can't control which page the memory lands on. If all you have is two bytes on two different pages and you free everything else, you're still consuming a lot of memory...) it speeds up memory allocation which can have a dramatic impact on performance in some problems spaces, and it can enhance cache performance when accessing memory in some circumstances. Not to mention it fixes memory leaks.. The advantages far out weigh the disadvantages (a small performance hit, but you'll have a much worse one if you count refs) and it has nothing to do with design. If design is what you really care about, GC does you a huge favor because it allows you to actually focus on the design and solving the problem at hand. The anti-GC crowd are really old school thinkers who didn't understand the problems in the first place, GC is the future, even realtime systems are using it now.

    --
    This is my signature. There are many signatures like it but this one is mine..
  97. GC and Timing by Xenu · · Score: 1

    It's been a while since I've used a language with built-in GC. The one thing that I never found to be acceptable was that the program/system froze for anywhere from 100s of milliseconds to several seconds when the GC decided to clean up. Has this problem been fixed in modern implementations of GC?

  98. Reference Counting vs. Mark & Sweep by Llamedos · · Score: 1

    I've never heard that 40% figure. Java GC implementations have historically not been state-of-the-art, though. The fact is that a good GC is faster than manual deallocation.

    Hmm.. That sounds odd... I imagine it's dependant on the type of program that's running... One with lots of idle time should be better off with GC, one with less should be better with manual deallocation.

  99. Missing distributed GC treatment by the+ed+menace · · Score: 1

    Good book but should have covered distributed GC. These days one of the "big things" for objects is using them to build distributed systems.

    A good paper on distributed GC came from one of the INRIA project OS's called COOL, but there have been several research papers on the topic.

  100. Missing distributed GC treatment by lbr · · Score: 1

    No mention of distributed GC in the book?

    Then they sure picked a funny title for section 12...