The New Garbage Man
We've all heard of "garbage collection" in reference to cleaning up after yourself after a memory allocation. Two graduate students at the Illinois Institute of Technology are proposing a new method of garbage collection and memory allocation called DMM. What they are trying to do is "map basic software algorithms such as malloc(), free(), and realloc() into hardware." Sounds as if it has some promise, but can it ever be as flexible as some of the memory managers used today?
The whole idea behind RISC is that the hardware provide FAST primitives that the upper levels can use to create more complex structures. Putting malloc free etc into hardware would just make things more complicated and maybe faster in the VERY short run but would surely not be as good as just making a DAMN fast memory subsytem or an awesome processor.
Thoughts on tech, Software Engineering, and stuff
They are going to insert malloc/free into hardware, that doesn't mean they will build garbage collection into the hardware (which is a completely different thing). However, as the project description says, it could improve garbage collection performance, but that is another issue.
DMM (Dynamic Memory Management) is a well known term, not something new as the slashdot article tries to make it sound like.
Building DMM into hardware it not a new idea. However, I'm not really sure if it has been successfully implemented in any "real" CPUs.
People would love to write programs for such systems using a high-level language like Java or C++. The main issue is that real-time systems must be able to deterministically predict how long any operation will take. Problem is, with current implementations of runtimes, the garbage collector could kick in at any time and take some arbitrary amount of time to finish, totally messing things up.
Lets say you've got a software-controlled robot arm, putting together a car or something (cheesy example). A start command is executed, and the arm begins moving. Then, right before the stop command was supposed to be executed, in comes your garbage collector. It does its thing and finishes, and the stop command executes - 50 milliseconds too late, putting the arm out of position. Oops - I hope you weren't planning on selling that car.
But if you can deterministically predict when and how long all of the operations will take, voila, you can just take them into account when scheduling what happens and everything is all hunky-dory. The hardware part of it is a bit of a distraction I think - the deterministic part is what's cool.
Sigh. Have you ever written a garbage collector? Think about it- the people who cared enough about garbage collection to implement it in Lisp, Java, Python, etc. did. I've written one too, and I promise, you MUST have an intimate knowledge of memory to get one working. It's not incompetent programmers who write these things.
So why did those non-incompetent programmers who know all about managing memory on their own bother with writing these tools for the weak-minded? Mainly, it's that good programmers understand the principles of program design. One important design principle is that when you want to know what your program is doing at the top level, you shouldn't have to care what is really going on at the lower levels. That principle is the reason we have operating systems and compilers, for example. All of those things allow the programmer not to have to worry about what goes on under the hood, not because the programmer is stupid, but because the programmer's time is better spent solving the problem than dealing over and over again with low-level details that have nothing to do with the program at hand as opposed to any other program in the world.
Garbage-collected languages do the same thing- you trade a little speed (the idea that garbage collectors must be slower than hand allocation and deallocation actually turns out not to be true, incidentally, but in practice it often is) for the ability not to have to think about a ridiculously machine-level idea (id est, 'in which cell of memory am I going to write down this particular chunk of data?') so your code can just concisely reflect what your program actually does- quicker to write, easier to read, better all the way around. A smidgen slower, but who cares? You can certainly write programs of acceptable speed in garbage-collected languages, so it's no big deal unless you're writing something really speed critical, like tight graphics rendering loops or something, in which case us softheaded garbage-collection-needing incompetent programmers just use inline assembly. =)
-jacob
None of those machines have caught on. The reason is that generally, you seem to do better if you invest the same kinds of resources into just making your processor faster.
Simply implementing existing malloc/free algorithms in hardware is unlikely to help. Those are complex algorithms with complex data structures, and they are probably memory limited. Where hardware could help is by changing the representation of pointers themselves: adding tag bits that are invisible to other instructions (allowing pointers and non-pointers to be distinguished, as well as space for mark bits), and possibly representing pointers as base+offset or some other kind of descriptor that makes it easier for the allocator to move memory around without the C/C++ program knowing about. As a side benefit, this kind of approach could also make C/C++ programs much safer and help run languages like Java faster.
But altogether, after having seen several attempts at this, I'm not hopeful. Unless Intel or Motorola do this in a mainstream, mass market processor, it will be very expensive and not perform as well. If you want those features, you end up getting better performance at a lower price by simply buying a mainstream processor without the features and simulating what you need.
Argh! Moving stuff to hardware, when we have long struggled to get rid of esoteric hardware instructions?!? The whole idea behind RISC is to remove this kind of stupidness.
History is full of examples where this kind of hardware "features" made things faster originally, but has since become bottlenecks instead of improvements. "rep movsb" made sense on the 8086, but on a Pentium Pro it's just another thing which makes the chip more complicated, big, and power-comsuming.
And as for garbage collection, those of you who say that "GC is slow" are just plain wrong. There is a lot of research on garbage collection, and nowadays a good GC may be faster than manual malloc/freeing. It does not lock up your programs either - there are incremental GC's which you won't notice. They are even used in real-time environments today.
A major hindrance for good GC's in commonplace programs is that people usually write in C(++), and you can't have a good GC in a programming language which allows pointers. Just to give an example: I once saw a program which used a doubly-linked list. But instead of storing "next" and "prev" pointers, it only used one value - the XOR of prev and next! It was possible to traverse, since you always knew the address of either the previous or the next entry, so getting the other was just an XOR operation. But please tell me, how the h* is a GC algorithm supposed to know this? Even if you use the loosest approach, "treat every single word in memory as a possible pointer", it will fail - since there are no real pointers at all!
So the result is this: There are lots of good GC's around, but they can't be used in C - so people don't know that they exist. More or less.
GC for procedural code is optional. A good idea, IMHO- but still optional. If worse comes to worse, you can always "bubble up" responsibility for deallocating the memory to the block in which encompasses the entire lifetime of the memory. For example, if you had:
/* use mem_p */
/* does mem_p need to be allocated? */
/* use mem_p */
/* does mem_p need to be freed? */
...
...
/* we don't need to free mem_p here */
...
void * mem_p;
void foo(void) {
mem_p = malloc(somesize);
}
void bar(void) {
}
void bang(void) {
foo();
bar();
}
We can refactor this to:
void foo (void * mem_p) {
}
void bar (void * mem_p) {
}
void bang (void) {
void * mem_p = malloc(somesize);
foo(mem_p);
bar(mem_p);
}
We can do this because the life time of the body of bang() completely encompasses the life time of mem_p- which neither foo() nor bar() does.
Unfortunately, OO programming makes GC signifigantly less optional. And exceptions make GC no longer an option. Consider the following C++ code:
{
someclass * ptr1 = new someclass();
someclass * ptr2 = new someclass();
}
Can you spot the bug? If the program runs out of memory trying to allocate the second class, new throws an exception. But since the exception isn't immediatly caught, the stack is simply poped, and the memory pointed to by ptr1 is leaked.
The solution, as any C++ programmer will tell you, is to use smart classes and smart pointers, which implement GC. I.e., if the language doesn't have GC, you have to add it.
There are other reasons to use GC- by using GC you can more effectively divorce the implementation of a class from it's interface. In a GC system you don't have to export information about the memory management aspects of the implementation. This is especially important if you have two different classes implementing the same interface, but with different memory management requirements.
Which is faster, GC or manual allocation, often depends upon both the program in question, and the GC algorithm implemented. There is some data indicating that copying garbage collection is the fastest in complex situations- what you lose copying the data you win back in better cache and virtual memory utilization.