The New Garbage Man
We've all heard of "garbage collection" in reference to cleaning up after yourself after a memory allocation. Two graduate students at the Illinois Institute of Technology are proposing a new method of garbage collection and memory allocation called DMM. What they are trying to do is "map basic software algorithms such as malloc(), free(), and realloc() into hardware." Sounds as if it has some promise, but can it ever be as flexible as some of the memory managers used today?
If they were to do this, it would probably increase performance. But marginally. At least in linux programs, when I've done profiles, most of the cycles were taken up in console i/o. Printf, etc.
It would also require a modification to the kernel - so that the brk() system call would simply call the hardware variants.
I think that it probably *would* remove flexibility, but at the same time I think that's a good thing. Part of the problem with memory leaks and stuff is the flexibility...
Then again, I might just be talking out of a bodily orifice.
If you can't figure out how to mail me, don't.
For linux tips: http://www.linuxtipsblog.com
Week files that can't survive a crash should be abandoned and deleted. you heard it here first!
What's the point of trying to speed up memory allocations? Why not just make the hardware faster?
I'm not giving everyone the right to go out and write inefficient code. It just seems that this research will not get very far. By the time they have something working, memory will be faster than their hardware solution. Sure, just use their hardware to make things even faster. Only if they get the cost down very low.
Barjne looked at some specialized machines. They just weren't cost justified. I would have to think that is the case here.
I'd rather see research into fabricating memory that ran at core processor speeds. That would speed up every memory access, not just the malloc()s and free()s.
--
then it comes to be that the soothing light at the end of your tunnel is just a freight train coming your way
After working on a project that was rewriting a heap allocator (I just wrote the test program, but I did read up on memory allocators quite a bit), I realized exactly how complicated memory allocators can be and more importantly, how little the community knows about exactly what is the optimal way to do memory allocation so that both speed and fragmentation are mimimized. Given this uncertanty it is probably best to keep the allocator in software so that it can be modified.
The whole idea behind RISC is that the hardware provide FAST primitives that the upper levels can use to create more complex structures. Putting malloc free etc into hardware would just make things more complicated and maybe faster in the VERY short run but would surely not be as good as just making a DAMN fast memory subsytem or an awesome processor.
Thoughts on tech, Software Engineering, and stuff
JWZ's thoughts on garbage collections.
How is this different from what Lisp machines (Symbolics and LMI) did 15-20 years ago?
FrontPage's lack of proper software garbage collection is the perfect environment to test hardware garbage collection in.
Wouldn't hardware dedicated to collecting software garbage tend to delete FrontPage entirely?
People complain about FrontPage and grammar? That's hardly a valid reason. Having met two of these individuals (Chang and Srisa-an), I can say that I do not believe either one to be a native English speaker. Both are, however, very knowledgable about computer architecture and hardware design.
I'm not saying that this is a good/bad idea, or that it will/won't improve performance. I don't know. I do know that the two men I've met are two of the very few people in IIT's CS department who might come up with something usable, regardless of their lack of English and HTML skills.
The English I saw browsing the pages was really not bad, and even if the page was done in FrontPage it's still better than the main IIT site (take a look if you don't believe me). Many successful open-source software projects involve people who do not use flawless English, but that does not affect the quality of the work.
They are going to insert malloc/free into hardware, that doesn't mean they will build garbage collection into the hardware (which is a completely different thing). However, as the project description says, it could improve garbage collection performance, but that is another issue.
DMM (Dynamic Memory Management) is a well known term, not something new as the slashdot article tries to make it sound like.
Building DMM into hardware it not a new idea. However, I'm not really sure if it has been successfully implemented in any "real" CPUs.
What do you think Pascal, Ada, and Java call when they want memory? Malloc. It's what anything calls when it wants memory, ultimately.
But they all do it differently, and even the same languages in different compilers allocate memory differently, this is what I'm talking about. If you impliment C style malloc(), free(), etc in hardware, then only C languages will benifit because other languages do it too differently for it to be useful to them.
-- iCEBaLM
People would love to write programs for such systems using a high-level language like Java or C++. The main issue is that real-time systems must be able to deterministically predict how long any operation will take. Problem is, with current implementations of runtimes, the garbage collector could kick in at any time and take some arbitrary amount of time to finish, totally messing things up.
Lets say you've got a software-controlled robot arm, putting together a car or something (cheesy example). A start command is executed, and the arm begins moving. Then, right before the stop command was supposed to be executed, in comes your garbage collector. It does its thing and finishes, and the stop command executes - 50 milliseconds too late, putting the arm out of position. Oops - I hope you weren't planning on selling that car.
But if you can deterministically predict when and how long all of the operations will take, voila, you can just take them into account when scheduling what happens and everything is all hunky-dory. The hardware part of it is a bit of a distraction I think - the deterministic part is what's cool.
On the flip side, Windows should release the memory of a process after termination. Memory leaks should not persist past the termination of a process.
And your god, BG, should take that advice too, I've yet to see ANY program except for netscape with more memory leaks than Windows.
If you can't figure out how to mail me, don't.
For linux tips: http://www.linuxtipsblog.com
Sigh. Have you ever written a garbage collector? Think about it- the people who cared enough about garbage collection to implement it in Lisp, Java, Python, etc. did. I've written one too, and I promise, you MUST have an intimate knowledge of memory to get one working. It's not incompetent programmers who write these things.
So why did those non-incompetent programmers who know all about managing memory on their own bother with writing these tools for the weak-minded? Mainly, it's that good programmers understand the principles of program design. One important design principle is that when you want to know what your program is doing at the top level, you shouldn't have to care what is really going on at the lower levels. That principle is the reason we have operating systems and compilers, for example. All of those things allow the programmer not to have to worry about what goes on under the hood, not because the programmer is stupid, but because the programmer's time is better spent solving the problem than dealing over and over again with low-level details that have nothing to do with the program at hand as opposed to any other program in the world.
Garbage-collected languages do the same thing- you trade a little speed (the idea that garbage collectors must be slower than hand allocation and deallocation actually turns out not to be true, incidentally, but in practice it often is) for the ability not to have to think about a ridiculously machine-level idea (id est, 'in which cell of memory am I going to write down this particular chunk of data?') so your code can just concisely reflect what your program actually does- quicker to write, easier to read, better all the way around. A smidgen slower, but who cares? You can certainly write programs of acceptable speed in garbage-collected languages, so it's no big deal unless you're writing something really speed critical, like tight graphics rendering loops or something, in which case us softheaded garbage-collection-needing incompetent programmers just use inline assembly. =)
-jacob
None of those machines have caught on. The reason is that generally, you seem to do better if you invest the same kinds of resources into just making your processor faster.
Simply implementing existing malloc/free algorithms in hardware is unlikely to help. Those are complex algorithms with complex data structures, and they are probably memory limited. Where hardware could help is by changing the representation of pointers themselves: adding tag bits that are invisible to other instructions (allowing pointers and non-pointers to be distinguished, as well as space for mark bits), and possibly representing pointers as base+offset or some other kind of descriptor that makes it easier for the allocator to move memory around without the C/C++ program knowing about. As a side benefit, this kind of approach could also make C/C++ programs much safer and help run languages like Java faster.
But altogether, after having seen several attempts at this, I'm not hopeful. Unless Intel or Motorola do this in a mainstream, mass market processor, it will be very expensive and not perform as well. If you want those features, you end up getting better performance at a lower price by simply buying a mainstream processor without the features and simulating what you need.
If they're planning on doing hardware-level memory management, they're probably going to be improving the control algorithms for the TAG and/or CAM memories.
Your basic cache/main-memory hit, miss, refresh, and stale address mapping functions, but with a more flexible and comprehensive interface.
--The more you know, the less you know.
Argh! Moving stuff to hardware, when we have long struggled to get rid of esoteric hardware instructions?!? The whole idea behind RISC is to remove this kind of stupidness.
History is full of examples where this kind of hardware "features" made things faster originally, but has since become bottlenecks instead of improvements. "rep movsb" made sense on the 8086, but on a Pentium Pro it's just another thing which makes the chip more complicated, big, and power-comsuming.
And as for garbage collection, those of you who say that "GC is slow" are just plain wrong. There is a lot of research on garbage collection, and nowadays a good GC may be faster than manual malloc/freeing. It does not lock up your programs either - there are incremental GC's which you won't notice. They are even used in real-time environments today.
A major hindrance for good GC's in commonplace programs is that people usually write in C(++), and you can't have a good GC in a programming language which allows pointers. Just to give an example: I once saw a program which used a doubly-linked list. But instead of storing "next" and "prev" pointers, it only used one value - the XOR of prev and next! It was possible to traverse, since you always knew the address of either the previous or the next entry, so getting the other was just an XOR operation. But please tell me, how the h* is a GC algorithm supposed to know this? Even if you use the loosest approach, "treat every single word in memory as a possible pointer", it will fail - since there are no real pointers at all!
So the result is this: There are lots of good GC's around, but they can't be used in C - so people don't know that they exist. More or less.
He mentioned he was porting to windows. In my experience, the windows kernel is much worse at handling things like that than the unix kernel is. Sometimes you have to reboot to free the memory.
If you can't figure out how to mail me, don't.
For linux tips: http://www.linuxtipsblog.com
Maybe I gave the wrong impresion. I didn't mean to imply that writing a garbage collector was a gigantic time-intensive effort and that garbage collectors were huge beasts- the garbage collector I wrote (for a Scheme-like toy language, but it's not important) wasn't much bigger than that. I just said that you have to understand how memory works to be able to write one (if you'll recall, the original poster said that garbage collection was for people who don't know how memory works). The program fragment you posted certainly takes more memory smarts than it takes to free() your malloc()'s, though it might not take more typing.
-jacob
GC for procedural code is optional. A good idea, IMHO- but still optional. If worse comes to worse, you can always "bubble up" responsibility for deallocating the memory to the block in which encompasses the entire lifetime of the memory. For example, if you had:
/* use mem_p */
/* does mem_p need to be allocated? */
/* use mem_p */
/* does mem_p need to be freed? */
...
...
/* we don't need to free mem_p here */
...
void * mem_p;
void foo(void) {
mem_p = malloc(somesize);
}
void bar(void) {
}
void bang(void) {
foo();
bar();
}
We can refactor this to:
void foo (void * mem_p) {
}
void bar (void * mem_p) {
}
void bang (void) {
void * mem_p = malloc(somesize);
foo(mem_p);
bar(mem_p);
}
We can do this because the life time of the body of bang() completely encompasses the life time of mem_p- which neither foo() nor bar() does.
Unfortunately, OO programming makes GC signifigantly less optional. And exceptions make GC no longer an option. Consider the following C++ code:
{
someclass * ptr1 = new someclass();
someclass * ptr2 = new someclass();
}
Can you spot the bug? If the program runs out of memory trying to allocate the second class, new throws an exception. But since the exception isn't immediatly caught, the stack is simply poped, and the memory pointed to by ptr1 is leaked.
The solution, as any C++ programmer will tell you, is to use smart classes and smart pointers, which implement GC. I.e., if the language doesn't have GC, you have to add it.
There are other reasons to use GC- by using GC you can more effectively divorce the implementation of a class from it's interface. In a GC system you don't have to export information about the memory management aspects of the implementation. This is especially important if you have two different classes implementing the same interface, but with different memory management requirements.
Which is faster, GC or manual allocation, often depends upon both the program in question, and the GC algorithm implemented. There is some data indicating that copying garbage collection is the fastest in complex situations- what you lose copying the data you win back in better cache and virtual memory utilization.
One of the biggest problems with garbage collection is that it can't really be controlled. If you're writing Quake3, you do not want garbage collection. You don't want the program to start cleaning up memory right when you're trying to generate the next frame...
BUT, if they do this and make it so you can, say, force the garbage collection to do its thing at a given time, maybe that could be do-able... still pretty crappy for high-speed apps though that need consistent speed...
Esperandi
-
From the site (DMM):
>The memory intensive nature of object-oriented languages such as C++ and Java
From a reply:
>why not make the hardware faster?
Fellow programmers,
am I the only one who still remembers Assembler and the intensive search for memory preserving methods ?
Am I the only one who tries to make things fast, without thinking about the processor ?
In the 70s we struggled for every byte we could spare (yes, and we created the Y2K problem).
"Star programmers" like me even changed the code during runtime, by moving new instruction over old ones.
Yes, it was very hard to read, but it was top efficient.
Fellow programmers,
are you all too young or what happened to the common sense ?
If I have to solve a problem for my daily life with my machines, I FIRST check if a SHELL SCRIPT can do it.
Not, because I'm too lazy to use C, but because it might be faster.
If you run The Shell, there are inbuild commands and external commands, some having the very same name.
Ie: A "test" runs faster than a "command test".
BUT, YOUR:
case "$1" in
hello) COMMAND
esac
RUNS *** NOT *** FASTER
If "test" is inbuild in your shell (and it is, folks) and you write:
if test "$1" = hello
then
COMMAND
fi
Am I the only one who knows that ?
Java needs a lot of memory.
It's hip.
But what is Java REALLY ?
Nothing than an interpreter language that just happens to have support by MS and Netscape (and due to this, now in our Linux kernels).
In the 80s we used (and I still do) REXX.
It's also an interpreter language, can also be compiled to object code.
It runs on MVS, VM, Unix, DOS, MS Windows.....
At that time, there were just no browsers (besides Mosaic).
It can do everything Java can,
the only reason why you guys and girls use Java is that Big Companies ship their interpreters with their browsers
and it looks like, as if Java runs on it's own, like magic.
The only thing that runs on it's own is Assembler.
Now I don't want to say you should use Assembler,
but I think I need to remind us all,
that hardware becomes faster and faster,
and that because of this, programmers get lazy and code stuff that runs on state-of-the-art hardware,
but would run 4 times as fast, if these programmers would first think about RESOURCES.
A good program will run on a 486 in a reasonable speed and on a Piii like "real time".
I want programmers to THINK about how they code.
It IS possible to write applications that do not need extensive memory and a fast CPU (or two),
IF the programmer would first THINK about how to write something and optimize the code himself, not only with a standard optimizer (if he uses that at all).
Read "The Unix Programming Environment" by
Brian W. Kernighan and Rob Pike.
After that, your programs will run 4 times faster.
Replies greatly appreciated,
fine day y'all, george./
Somewhat off-topic. I like to set pointers to NULL after their objects have been freed or are no longer valid. This helps prevent the nasty situation of code referring to an object that is now a chunk of memory on the free list, or worse, allocated to some other use.
Mea navis aericumbens anguillis abundat
There seems to be a number of misconceptions as to what this is all about or how might it be useful in practice. I will offer some opinions on some of the issues raised in other posts.
The first one, as already mentioned by a number of people, is that hardware implementation of malloc and free has nothing to do with GC. The most difficult part of GC is to determine which part of the memory is garbage (this is not as easy as it may seem) without using lots of memory (after all, garbage collectors are typically only activated when free memory blocks are running low), and for those garbage collectors running as an independent thread, avoid possible race conditions with the foreground processes. Other issues a garbage collector faces include how to reduce the impact on overall system performance, and how to decrease memory fragmentation.
A garbage collector is not something easy to design and implement. Making a good garbage collector especially requires the almost-impossible combination of profound understanding in the memory usage behavior of typical programs, logical reasoning abilities, and coding techniques. In addition to the garbage collector itself, you also need support from the language and compiler side, and you have to integrate all of these into a clean design. That is about as hard as things like this can be.
(Of course, you can also write a conservative pointer finding GC like the libgc used by many functional language compilers --- but that is far from the state of art in this business.)
The proposed hardware support has nothing to do with the points we mentioned above, therefore has nothing to do with GC. Then, is it possible to build some hardware support for garbage collection? Maybe, but I am not an expert in this field. Whatever the solution turns out to be, it will never be as simple as hardware implementation of malloc and free.
Second, this also has nothing to do with real time systems. Many people seems to think that being "real time" means you have to code everything in assembly language and make things run as fast as possible, but that is simply not true. Being (hard) real time means operations must respond and complete within bounded time; as long as that bound is satisfied, there is no need for the task be completed "very fast". The trick of building real time systems is in making sure that nothing will delay the operation for a long time, and that requires careful analysis and implementation.
If you remain to be convinced, think of a typical hard real time application: CD-R burning control programs. They must feed a continuous stream of data (say, 600KB/s for 4X burning) into the burner, or the disk will turn into a coaster. Is it necessary to code it in assembly? Absolutely not, because pumping data at 600KB/s is very little work for current architectures with 600MHz processors and 100MHz memory busses. However, does that mean you do not need to pay special attention to make it real time? Wrong, because although the hardware is several orders of magnitudes more powerful than it needs to be, there are still possibilities that something will make the program stop working for a second or two and screw the party. It is the responsibility of real time system designers to make sure that it does not happen.
If you always chase the list, no problem. If you ever need to point into the list from outside, forget it.
--
Infuriate left and right
You need a pointer to the first entry in the list. Assuming that the last "next"-pointer and the first "prev"-pointer is NULL, it's easy to go from there. Sample:
p = first;
prev = NULL;
while (p) {
(do stuff with *p)
tmp = p->xoredstuff ^ prev;
prev = p;
p = tmp;
}
Traversing in the other direction is left as an exercise for the reader.
Well, more or less. As part of a programming languages class I took in school, I actually had to write a garbage collector for a Scheme-like language. While one could implement references in a Java/Scheme/LISP-like language in the above fashion, that's really slow; it adds another layer of indirection that isn't really necessary. If the run-time system provides information about the types of objects in memory (which is required for Java and Scheme anyway -- instanceof and symbol?), then the GC can reach in and directly change the pointer values within the program's stack or whatever.
From the point of view of the applications programmer, though, the two implementations are equivalent.
As far as I'm concerned, the primary difference between pointers and references is that with a reference to an object of type Foo, you are guaranteed that the object referred to is a valid instance of type Foo. This is not the case with pointers. In C/C++ terms, int i; float *pF; pF = (float *)(&i); The object at *pF is not a valid instance of a float.
(Yes, you can in fact break this guarantee in C++ using a series of really crufty typecasts. C++'s references are broken, in a number of respects.)
Of course, as MassacrE points out, references are almost always implemented internally in terms of pointers, because that's what the underlying architecture tends to support.
It's a misperception that garbage collection is about debugging, though- the point is that you can write programs that aren't convoluted through the need to allocate and deallocate memory cells for remembering new things. The C/C++ idiom of things like
void get_string(char * dummy_string)
{
}
makes sense only because you've seen it a million times, and because you're thinking about the way that memory works. The much more intuitive Java style- that is,
String getString()
{
return new String(
}
reflects your intent much more naturally, but is only really possible because of garbage collection. (I know you can allocate and then return in C/C++, but I also know that it can turn out to be a huge mess to figure out exactly when to deallocate, and I no of nobody who writes anything that isn't trivial by returning a newly-allocated pointer).
The point is simply that if you want to be able to design and implement programs correctly and rapidly, a facility that lets you specify only "what my program does" without bothering you with details. One aspect of speed is how fast your program will run, and that's certainly important. Another aspect of speed is how quickly you can write a correct program, and high-level abstractions make application development far quicker. And for most of your code, how rapidly it executes isn't really a big deal- the idea around Rice University, where I go to school, is that you should write your programs in as high-level a language as possible, figure out where it's necessary to make the code faster, and then rewrite those pieces in C or assembly or whatever (that idea is based on the premise that in typical programs, a very small subset of the code takes up the vast majority of the execution time). You get speed where you need it and flexibility (ease of maintenance, reusability, etc) everywhere else- the best of both worlds, and also you wrote your program in half the time it took the other guy to write the C++ version.
-jacob
Guilty as charged- I am "in academic circles," and I haven't ever had to work in an industry setting on really big projects where speed was very important. But I maintain that what I said was still accurate, while conceding your point: it may be true that with current tools, programmers do sometimes need to think about machine-level details (like whether their code is likely to cause page faults) in their high level designs, but it's also still true that they shouldn't have to. I have mentioned elsewhere in this thread the idea (popular among the faculty here at Rice) that you ought to first write your program in a very high-level language (at Rice, it's MzScheme), figure out where it's important for your code to go fast, and then rewrite those pieces in a low-level language where you can control machine-level details. The claim is that you get a speedy enough program, because you've optimized the hell out of the critical sections, and the stuff that it doesn't matter whether you optimize or not, which is most of your code, was written much more rapidly than would be possible with a lower-level language.
-jacob
I should've been more clear- when I talk about the programming-time advantage of higher-level languages vs. lower-level languages, I'm not thinking of Java, I'm thinking of Scheme. Java is syntactically enough like C++ that it seems to me the only thing you're saving yourself from having to do is explicitly deallocate memory (and dealing with C++'s brain-dead unit system- making programmers do the things they have to do with #includes and .h/.cc files ought to be against the law), so it doesn't surprise me that in real-life situations the benefits of Java would be outweighed by the speed of C++. There are other high-level programming languages, though, like Scheme, that have saved me tremendous amounts of time over C++ or especially C in the average case.
-jacob
This article is by now so old that no one will read this comment, but what the hell. Karma whore, etc.
Some of the articles that have been posted seem to miss the point. Several people suggested that this design goes against the principles of RISC. I am puzzled. The RISC philosophy is about maximizing efficiency by reducing CPU complexity. But this is memory management research, i.e. it proposes a new MMU design, not a new CPU. It is like suggesting that 3D accelerator boards are contrary to the principle of RISC design because they involve complex hardware. There's no contradiction in having a simplified CPU and complex off-chip hardware to back it up.
Others have suggested that there's no point to this work because a hardware implementation of malloc() and free() would run only marginally faster than their software counterparts. I suggest reading some of the publications on their Web site, particularly their Introduction to DMMX. They aren't merely trying to implement malloc() and free() in hardware, and the solution they describe would allocate and sweep the heap in constant time. If the scenario described in this paper is feasible, it could be pretty interesting stuff.