Slashdot Mirror


C Coding Tip - Self-Manage Memory Alllocation

An anonymous reader inputs: "The C programming language defines two standard memory management functions: malloc() and free(). C programmers frequently use those functions to allocate buffers at run time to pass data between functions. In many situations, however, you cannot predetermine the actual sizes required for the buffers, which may cause several fundamental problems for constructing complex C programs. This article advocates a self-managing, abstract data buffer. It outlines a pseudo-C implementation of the abstract buffer and details the advantages of adopting this mechanism."

142 comments

  1. Um by be-fan · · Score: 3, Interesting

    *cough* garbage collection *cough*

    --
    A deep unwavering belief is a sure sign you're missing something...
    1. Re:Um by Anonymous Coward · · Score: 0, Insightful

      garbage collection isnt the endall amazing solution that you may think it is.

      garbage collection stops system response while it's cleaning up. At least it does this with Java.

      If you were running a system that required realtime response, like, lets say, an air traffic control tower, garbage collection could delay certain actions from performing in time and cause a disaster.

      c was designed for systems programming, and not having realtime assurance is A Bad Thing.

    2. Re:Um by Screaming+Lunatic · · Score: 4, Interesting
      Which means you don't have predictable destruction. Which means you don't have destructors. Which means you can't use idioms like resource-allocation-is-initialization.

      So in the presence of exceptions, you won't leak memory on the heap. But you will leak mutexes, file handles, etc. You need another idiom to handle those cases.

      In the .NET world, C# introduced synchronization blocks to handle the leaking mutex problem. But it is a pain in Managed C++ and VB.NET.

      Garbage collection is not the be all and end all.

      If I ruled the world, I would create a multi-paradigm (object-oriented, generic, functional, and modular support) strongly-typed low-level language that let you program at a high-level. A second high-level langauge that was loosely-typed, garbage collected, and could be interpreted or natively compiled. Then I would define a standard to interface the two languages.

      In other words, take C++ and add the concept of components/packages. Take Python and add the features (such as generics) that are missing from C++. And then define an interface between components written in both langauges.

      Currently Boost.Python and SWIG exist. But I wish that they would just work automagically, everytime I typed make at the command line or build in VC++.

    3. Re:Um by scrytch · · Score: 5, Insightful

      > garbage collection stops system response while it's cleaning up

      And malloc is of course free, right? ("well no wally, they're opposites" ... yeah yeah you get my drift)

      Good gc's operate incrementally. Good gc's let you turn gc on and off at will and disable it altogether for designated arenas. Good gc's can run in a separate thread on another CPU, whereas malloc/free cannot.

      The reason java's gc goes wiggy is not because the gc is bad (it's just not very tunable except on solaris), it's because it allocates new objects all over the place (and is happily helped at it by the standard libraries). If you go hog wild with resource consumption, yes you're going to pay for it later.

      For the 99.99% of programs that do NOT need hard realtime, you're better off with gc. Cripes, it's like saying homes shouldn't have thermostats because a home thermostat isn't suitable for a reactor sensor.

      --
      I've finally had it: until slashdot gets article moderation, I am not coming back.
    4. Re:Um by be-fan · · Score: 5, Insightful

      If I ruled the world, I would create a multi-paradigm (object-oriented, generic, functional, and modular support) strongly-typed low-level language that let you program at a high-level. A second high-level langauge that was loosely-typed, garbage collected, and could be interpreted or natively compiled. Then I would define a standard to interface the two languages.
      ----------
      You just described Scheme/CL/Dylan.

      --
      A deep unwavering belief is a sure sign you're missing something...
    5. Re:Um by Anonymous Coward · · Score: 0

      no - its like saying your house dosn't need a complete fission power station when all you need is a furnace and a thermostat.

    6. Re:Um by Anonymous Coward · · Score: 0

      No, it's more like saying a program without hard time constraints can do just fine with a good generational GC.

    7. Re:Um by Tom7 · · Score: 4, Informative


      Of course, there are also hard real-time garbage collectors (ie Cheng's), though I don't think you'll find them in general-purpose production compilers. However, you will find good garbage collectors in a number of real production compilers (say, in mlton). It's definitely worth benchmarking.

    8. Re:Um by E_elven · · Score: 2, Funny

      > Which means you don't have predictable destruction. Which means you don't have destructors.

      My GC does something like this:

      1) Allocate memory, reference count 1 ...
      N-1) When ref count reaches 0, we call destroy() on the object ...
      N) Free the memory

      --
      Marxist evolution is just N generations away!
    9. Re:Um by cookd · · Score: 5, Insightful
      It's even better than that. In many cases, GC can be almost as fast as malloc/free. Looking at the way things are going, I am pretty sure that in the near future GC-based code will be faster, not slower, than equivalent malloc/free code:

      • A compacting GC heap can make allocation into a REALLY fast operation (unless it triggers a collection), and there is no time spent deallocating. On normal heaps, malloc/new can take quite a lot of work (thread synchronization, best-fit free list searching, etc.). If you amortize the time the GC spends in collection over the allocations, the average allocation isn't that much slower than the corresponding malloc/free. Best of all, the gap is shrinking. Soon, GC may be FASTER overall than malloc/free in many real-world situations. It obviously depends on memory usage patterns and collection strategies, but it is starting to happen.
      • If you're using GC, your program doesn't have to do all of the bookkeeping anymore. The amount of time spent in resource tracking in big programs is fairly significant -- where the same object is shared by many parts of the program, figuring out when the item can be released is nontrivial. Under a GC, this is handled by the collector. In some cases, the GC can handle the bookkeeping more efficiently than would have been feasible with manual bookkeeping.
      • As CPUs get faster and storage (memory and disk) gets (relatively) slower, it becomes more and more attractive to spend extra CPU cycles to try to make better use of cache and memory. An L1 cache miss costs around 4-10 cycles; an L2 miss can cost 100-400 cycles; a page fault costs millions. The CPU time spent in garbage collection can become insignificant when compared with storage access time. Do you take the cache line size into consideration when you call malloc? I sure don't. But compacting GC implementations are starting to take things like that into consideration when they collect, and they rearrange the memory of the process to maximize cache hits and minimize memory waste. They can help your program make more efficient use of the cache, and perhaps reduce the working set. Just a 5% reduction in page faults would more than offset all but the most CPU-intensive garbage collectors.
      • GCs usually collect on a separate thread. That means that with a properly designed collector, while your program is blocked on IO or waiting for user input, the GC might be cleaning up the heap on a low priority thread. With luck, your main thread might NEVER actually be interrupted for a collection -- all of the collection can be done while your work threads were waiting for something else. Your memory management tax was paid in the background, while the machine was otherwise idle. On the other hand, with malloc and free, the memory management tax is paid up front by the currently executing thread. While this isn't always the case (i.e. CPU-bound processes), most apps are NOT CPU-bound, so this is a likely scenario in the future (if not the present).


      While I don't think GC is quite to the point where it is free or beneficial to the performance of the average application, it is a lot less harmful than most people think. Given that it simplifies the code and eliminates a lot of bugs (usually more than it introduces), it is definitely worthwhile in almost all new application code (kernel-mode code isn't quite there yet, but it's coming), with only a small performance penalty. And I suspect that it won't be too long before it starts to be more of a speed booster, not a perf hit.

      I think this is just another step in the process of handing another menial task over to the CPU. We moved from binary to assembly, assembly to low-level languages, low-level languages to higher-level languages, etc. At each step, the new method had a performance penalty at first, then as the new method matured, it turned out to actually be faster than the old method it replaced, while dramatically increasing programmer productivity (i.e. modern optimizers can usually do a better job than an assembly language programmer; often C++ code is faster than the equivalent C code since the compiler has more information to work with and the programmer can make use of more effective techniques like templates).
      --
      Time flies like an arrow. Fruit flies like a banana.
    10. Re:Um by Anonymous Coward · · Score: 0

      > Which means you don't have predictable destruction. Which means you don't have destructors. Which means you can't use idioms like resource-allocation-is-initialization.

      Well, you can sort of use resource-allocation-is-initialization if you don't mind having a different syntax for the allocation in those situations.

      e.g. in C# you can use the "using" operator (as it were), to do stuff like this...

      using (DbCon db = new DbCon('something')) {
      db.query();
      }

      ...and db's Dispose() method will get called at the block close, and not later.

      I know it's not quite the same, but it's a fairly decent compromise.

    11. Re:Um by egomaniac · · Score: 2, Interesting

      Which means you don't have predictable destruction. Which means you don't have destructors. Which means you can't use idioms like resource-allocation-is-initialization.

      So in the presence of exceptions, you won't leak memory on the heap. But you will leak mutexes, file handles, etc. You need another idiom to handle those cases.


      Java is probably the most widely-used garbage-collected language in existence. I think I speak for all Java programmers when I say "WTF are you talking about?"

      It is true that you can't guarantee timely execution of finalizers. Because of that, Java is no different than any other language in that you should close files when you're done with them. However, if you forget to close a file here and there, it's generally no big deal because they will be closed during finalization. It only becomes a problem when you're leaking tons of file handles very quickly.

      Now, this article is about C, so let's compare the two.

      Java: failing to close a file == usually no problem whatsoever

      C: failing to close a file == permanently leaked handle

      I'd take the garbage-collected case any day of the week, personally.

      As far as the other case you mention, mutexes, goes, Java has two means of providing mutual exclusion. The "synchronized" keyword, which is almost always the way things are done, works properly in the face of exceptions, with no extra work by the user. The wait and notify methods, which are the more advanced case, need to be balanced, so you can end up with deadlocks if you fail to do a notify for some reason (such as an exception).

      Well, that's what the "finally" clause is for. Yes, you need to be careful to do things the right way, but that's true of any language.

      I freely admit that Java is not the be-all and end-all of languages. There may be ways of handling these situations that guarantee these problems can never occur. But you make it sound as if garbage collection is a step backwards from malloc/free, and it most certainly is not. Everywhere you can get into trouble with a garbage collected language, you can get into ten times as much trouble with a manual-free language.

      --
      ZFS: because love is never having to say fsck
    12. Re:Um by BlueBiker · · Score: 1

      Very interesting post, wish I had me some mod points. The point about CPU speeds eclipsing storage access times is especially relevant.

    13. Re:Um by Yokaze · · Score: 1

      And something like this:
      1) Allocate object A(1) (reference count 1), B(1) and C(1)
      3) A(1)->B(2) (Reference B by A)... A(2)->B(2)->C(2)->A(2)
      4) Dispose root references to A(1),B(1),C(1)
      5) Have fun with your memory leak.

      --
      "Between strong and weak, between rich and poor [...], it is freedom which oppresses and the law which sets free"
    14. Re:Um by Anonymous Coward · · Score: 2, Informative

      Often C++ code is faster than the equivalent C code since the compiler has more information to work with and the programmer can make use of more effective techniques like templates.

      Forget C++! High-level modern multi-paradigm languages like Common Lisp and OCaml, which can do most things C++ can do and a lot of things it can't, often produce code as fast as the average C implementation of any given algorithm, despite their relying heavily on GC. And while it's generally possible to tweak the C version to be faster, bear in mind that the C already takes longer to write and debug, even without allowing for the time-consuming profile/tweak/rebuild cycle that real optimisation requires.

      Personally I've already relegated C to the position that assembly used to hold - I drop into C to write very tight loops, to perform low-level bit twiddling, and to interface with other languages. I wouldn't really consider using it to write an entire application, and the time I save by not having to manage resources myself is one of the major factors that rule it out.

    15. Re:Um by Mikkeles · · Score: 2, Insightful

      If I ruled the world, I would create a multi-paradigm (object-oriented, generic, functional, and modular support) strongly-typed low-level language that let you program at a high-level. A second high-level langauge that was loosely-typed, garbage collected, and could be interpreted or natively compiled. Then I would define a standard to interface the two languages.

      In other words, take C++ and add the concept of components/packages. Take Python and add the features (such as generics) that are missing from C++. And then define an interface between components written in both langauges.

      I would suggest Ada + scheme/common lisp. I also think plain C + assembly would be very sensible for when one is dealing directly with bits, bytes, and addresses as it simplifies some of the overriding which would be needed in Ada.

      I don't think of C++ as strongly typed. typedefs don't enforce, e.g., different meanings of ints (so number of apples vs. number of oranges can't really be distinguished without creating multiple classes). Even enums aren't distinguishable from ints!
      Also, Ada already has the packaging you mentioned. It also does support interoperability with C, as do most scheme/lisps.
      --
      Great minds think alike; fools seldom differ.
    16. Re:Um by Curien · · Score: 4, Interesting

      Java is probably the most widely-used garbage-collected language in existence. I think I speak for all Java programmers when I say "WTF are you talking about?"

      He's talking about the unpredictability of resource release using GC. If unpredictability isn't a problem, fine. If you need to synchronize your resources carefully (which is what a mutex is /all about/), then you've got a problem.

      Now, this article is about C, so let's compare the two.

      The post you're responding to wasn't about C: it was about a weakness of GC compared to, say, RAII (which is the idiom C++, among others, uses). But just for fun, let's go on to see how little you know about C.

      Java: failing to close a file == usually no problem whatsoever

      Unless you need to open the file again before the garbage collector decides to reclaim the handle.

      C: failing to close a file == permanently leaked handle

      Bzzt... wrong. All file handles are released upon program termination. I like how you used '==' to try impress people with your programming skillz, though.

      As far as the other case you mention, mutexes, goes, Java has two means of providing mutual exclusion. The "synchronized" keyword

      Wait, wait... you're saying... hold on a second, now... that Java uses a different idiom to handle mutexes? That's exactly what the parent post said it would have to do... because GC isn't as useful as RAII when it comes to general resource allocation (not just memory).

      But you make it sound as if garbage collection is a step backwards from malloc/free

      He made no such comparison. He compared it (unfavorably) to RAII.

      --
      It's always a long day... 86400 doesn't fit into a short.
    17. Re:Um by Anonymous Coward · · Score: 0

      > C: failing to close a file == permanently leaked handle

      Bzzt... wrong. All file handles are released upon program termination.


      Bzzt... wrong. The behaviour of a C application upon termination with file handles open is undefined. Your OS might clean up for you, but that doesn't mean you can count on your customer's OS doing it.

    18. Re:Um by Curien · · Score: 1

      It's quite well-defined. When the end of main is reached, it's equivalent to calling the exit function (unless the return type of main is incompatible with int, in which case the value returned to the operating system is implementation-defined). Among other things, exit() has the following affect:

      "Next, all open streams with unwritten buffered data are flushed, all open streams are
      closed, and all files created by the tmpfile function are removed."

      Maybe you should learn the language a little better.

      --
      It's always a long day... 86400 doesn't fit into a short.
    19. Re:Um by Anonymous Coward · · Score: 0

      No it's really more like saying a giant bird jumping off a train can find itself falling into a cliff.

    20. Re:Um by Anonymous Coward · · Score: 0

      Your sig. pisses me off to no end.

    21. Re:Um by Nevyn · · Score: 2, Interesting
      If you amortize the time the GC spends in collection over the allocations, the average allocation isn't that much slower than the corresponding malloc/free. Best of all, the gap is shrinking. Soon, GC may be FASTER overall than malloc/free in many real-world situations. It obviously depends on memory usage patterns and collection strategies, but it is starting to happen.

      I've heard that for years, it's yet to be true for the general case ... and most people doing manual allocation don't call malloc/free for every single grow/shrink operation. The caching slab allocator was put into solaris a long time ago now.

      If you're using GC, your program doesn't have to do all of the bookkeeping anymore.

      The keyword being all, yes you can sometimes just completely forget about designing how you use memory and everything "works". However sometimes those bits of memory have other resources associated with them.

      An L1 cache miss costs around 4-10 cycles; an L2 miss can cost 100-400 cycles; a page fault costs millions. The CPU time spent in garbage collection can become insignificant when compared with storage access time.

      You seem to be under the impression that running the GC isn't going to blow your cache ... why is that?

      But compacting GC implementations are starting to take things like that into consideration when they collect, and they rearrange the memory of the process to maximize cache hits and minimize memory waste.

      And the GC knows when to move memory, and when it's just wasting CPU/cache ... how? It's fairly simple to have double pointers (both win32 and MacOS9 use HANDLE types a lot IIRC). The big problem is working out when you can/should move them.

      GCs usually collect on a separate thread. That means that with a properly designed collector, while your program is blocked on IO or waiting for user input, the GC might be cleaning up the heap on a low priority thread. With luck, your main thread might NEVER actually be interrupted for a collection

      If you actually want threads ... which many of us don't. It also seems to think that cross CPU cache invalidates are free (accessing the same data from more than one CPU -- this can have a massive cycle cost), and that the GC does some kind of magic locking which will never affect the application.

      It's also amusing that, by definition, it means that as your application does more work (and hence needs more CPU) it is guaranteed to not get it ... because you will then suddenly start fighting with the GC thread.

      While I don't think GC is quite to the point where it is free or beneficial to the performance of the average application, it is a lot less harmful than most people think.

      It's my experience that the people who think GC is expensive generally know what they are talking about, and are in the minority. Most people either don't need to care (GC is good enough for them -- fair enough, I've written things where I used perl and didn't need to know how it allocated), or just don't care.

      There are other ways of managing memory than just giving up and saying well the GC will save me. Vstr, which is a decent implementation of what the IBM article was trying to say removes almost all the allocation/deallocation headache when dealing with byte data ... but it's still controllable and predictable.

      --
      ustr: Managed string API with ave. 44% overhead over strdup(), for 0-20B
    22. Re:Um by Anonymous Coward · · Score: 0

      Garbage collection is like insurance -- it's a great idea until you actually need it...

    23. Re:Um by cookd · · Score: 1

      I was using C++ as an example of a language that was "higher level" than C. Lisp and OCaml also fit that description (as well as the "higher level than C++" description). The point was that high-level languages can, in some cases, actually BEAT C in PERFORMANCE, let alone programmer productivity. Look at the "Great Computer Language Shootout" page for more info. OCaml is right up there with C and C++ in performance.

      So the point is that people shouldn't be afraid of "high level" features like GC. While some high level languages have lousy perf, others are just as fast or faster than the "low level, bare metal, super-efficient" languages.

      --
      Time flies like an arrow. Fruit flies like a banana.
    24. Re:Um by E_elven · · Score: 1

      That's a good sign the design isn't very good. Mine is :)

      --
      Marxist evolution is just N generations away!
    25. Re:Um by moocat2 · · Score: 2, Interesting

      Yes, the article is about C, but the poster was talking about predictable destruction, so he was not referring to C when dissing GC.

      My guess is that he or she was referring to C++ using the RAII (resource acquisition is initialization) paradigm. Every resource is wrapped in a object and all objects are put on the stack. Voila, no resource leaks.

      Also, I think your example is far from complete. While failing to close a file handle may be "no problem whatsoever" for the application itself, but can be a problem in the larger context. Some platforms will not allow a file to be deleted when it is open. If a programmer forgets to close a file handle, the user may not be able to delete the file until some time in the future. Far from an ideal situation.

      And consider a different example. Think about trying to create a different synch primitive such as a reader-writer lock. Under a GC system, the programmer is still required to release the lock. If he doesn't, the system could essentially hang.

      As for the finally clause, we all know that guaranteeing that there is no path that allows leaks is just plain damn hard to do. If finally was such a great answer, why even both with GC, just free data in the finally clause.

    26. Re:Um by Anonymous Coward · · Score: 0

      High-level modern multi-paradigm languages like Common Lisp and OCaml

      OCaml is an ugly language that is a dreary shitbird to try to program in. I've never learned Lisp.

    27. Re:Um by cookd · · Score: 2, Insightful

      I've heard that for years, it's yet to be true for the general case ... and most people doing manual allocation don't call malloc/free for every single grow/shrink operation. The caching slab allocator was put into solaris a long time ago now.

      Caching slab, lookaside lists, whatever, you're still calling some kind of allocator with some kind of cost. There are definitely applications where memory allocation/deallocation follows a pattern that allows that kind of optimization to work well. But here is the crux of my argument: 1. the general purpose allocators definitely do have a non-zero cost; 2. special purpose allocators can be significantly faster, but they aren't general purpose, so they don't work in all situations, and they often require significant extra work to make them work properly; 3. special-purpose allocators usually work by speculatively pre-allocating chunks from the general purpose allocator, and this wastes memory; 4. even when you are using special-purpose allocators as much as possible, you still have to spend a lot of time in memory management -- the special-purpose allocator isn't free, and every once in a while it still has to call out to the general purpose allocator, which is expensive. So I stand by the original premise: after all is said and done, the time spent in memory management for a GC-based program isn't much (if any) greater than the time spent in memory management for an equivalient non-GC-based program. When the total cost of the allocations/deallocations is compared with the cost of a similar set of GC-alloc + collections, the difference for most big applications isn't really significant.

      The keyword being all, yes you can sometimes just completely forget about designing how you use memory and everything "works". However sometimes those bits of memory have other resources associated with them.

      Yep. And now you don't have to bother with those either, since the GC can collect them during finalization of their associated bits of memory. The point here is that while you still might have to write a destructor of some sort, it is usually much simpler (no memory to worry about) and you don't have to figure out when that resource is no longer in use -- the GC will figure it out for you. If the resource must be freed deterministically, GC doesn't prevent you from doing that. But in the cases where it doesn't, GC still works great. In other words: don't refuse the solution just because it only makes 90% of your problems go away -- you're still better off, aren't you?

      You seem to be under the impression that running the GC isn't going to blow your cache ... why is that?

      If the garbage collector itself causes additional page faults, then yes, it isn't an advantage. But the argument was that when compared with page faults (something we've all been willing to accept, something that hasn't caused an unacceptable performance hit), GC is small potatoes. In addition, modern GC implementations take the amount of memory available on the machine into account, and they actively work to keep memory usage under the level that would start causing page faults. The garbage collector can spring into action when a request is made that might result in a page fault, and the page fault only actually happens if there really aren't any objects to release. The GC can dynamically determine which objects are unreachable, while the non-GC program must rely on a static analysis that often means objects live longer than is truly necessary (the GC can free something anytime after the last reference is gone, while a non-GC program has to keep the object until it comes to a point in the control flow that is unreachable until the object is no longer needed). While the dynamic analysis has a runtime cost (the static analysis has no runtime cost), the additional information available at runtime can enable much more efficient memory management, which in turn can reduce page faults, which can off

      --
      Time flies like an arrow. Fruit flies like a banana.
    28. Re:Um by cookd · · Score: 1

      Agent Smith and I are happy to know you care.

      --
      Time flies like an arrow. Fruit flies like a banana.
    29. Re:Um by The+Munger · · Score: 1

      If I ruled the world, I would create a multi-paradigm (object-oriented, generic, functional, and modular support) strongly-typed low-level language that let you program at a high-level.

      And if I ruled the world, me and Salma Hayek would... I'm sorry, what was the question?

      --
      Refuse to make a statement in your sig!
    30. Re:Um by key45 · · Score: 1

      Personally I've already relegated C to the position that assembly used to hold ... I wouldn't really consider using it to write an entire application

      You lucky dog. I've currently got the fun job of trying to write a hard realtime application for a DSP chip which does not have a C++ (or high level language except C) compiler. For me, this technique looks pretty interesting.

    31. Re:Um by smallpaul · · Score: 1

      Which means you don't have predictable destruction. Which means you don't have destructors. Which means you can't use idioms like resource-allocation-is-initialization.

      In standard Python you have both garbage collection and predictable destruction.

      In other words, take C++ and add the concept of components/packages. Take Python and add the features (such as generics) that are missing from C++. And then define an interface between components written in both langauges.

      If you are using Python for your application logic it seems overkill to me to also have the full power of C++ for your inner loops. C is fine and PyRex defines a very nice binding mechanism.

    32. Re:Um by Tukla · · Score: 1

      Hmm? Why does insurance become a bad idea once you need it? (By "need it", I assume you mean "need to file a claim".)

    33. Re:Um by Whip-hero · · Score: 2, Informative

      So in the presence of exceptions, you won't leak memory on the heap. But you will leak mutexes, file handles, etc. You need another idiom to handle those cases.

      In the .NET world, C# introduced synchronization blocks to handle the leaking mutex problem. But it is a pain in Managed C++ and VB.NET.


      File/mutex/whatever leakage in the presence of exceptions? Two words:

      unwind-protect

      --
      --WH--
    34. Re:Um by Ryan+Amos · · Score: 1

      Scheme! I had to program in that for an intro CS class. Basically the point was to teach the class in a language nobody knew to filter out all the smartasses who had read an "intro to C++" book. Fun little language, though it's applications are a bit limited...

    35. Re:Um by Anonymous Coward · · Score: 0

      Just a nitpick... your quote "Next, all open streams with unwritten buffered data are flushed, all open streams are closed, and all files created by the tmpfile function are removed." never mentions closing file handles opened by a means other than stdlib tmpfile() function.

      So if you open() your own files, you're SOL.

    36. Re:Um by Curien · · Score: 1

      No, you paid attention to the wrong part. The relavent part is "all open streams are closed". Remember that a FILE* is a stream.

      --
      It's always a long day... 86400 doesn't fit into a short.
    37. Re:Um by Anonymous Coward · · Score: 1, Insightful

      If you think that was the point, then you completely missed out. I guess that is to be expected from somebody that went to UT.

    38. Re:Um by cheesybagel · · Score: 1

      LISP is a nice language, except for the numerous ()'s. I know you can supposedly get used to them, but I never got used to it.

    39. Re:Um by Nevyn · · Score: 1

      Let me prefix this by saying that I didn't want to get into a GC flamewar, as I said I think GC can be used in some applications where you don't care about the negative side affects. I only replied to you as you seemed to be giving one of the better "Let's burn everything and rebuld using only GC" arguments. Possibly some of the difference is due to you working for MSFT and me being at RHAT, but I doubt it ... and if you don't hold it against me I'll do the same :). So anyway...

      Caching slab, lookaside lists, whatever, you're still calling some kind of allocator with some kind of cost. There are definitely applications where memory allocation/deallocation follows a pattern that allows that kind of optimization to work well.

      All applications follow a pattern that allows custom allocators to work better than general ones, like GC or just calling malloc(). This is because the custom allocator is a very specific version of the former, and so they have more innate knowledge of when to allocate, free, etc. Custom allocators also tend to have programmer control built in, so you can do things like unshare data if the reference count is one ... which is very hard to do easily in a general purpose GC.

      You seem to be under the impression that running the GC isn't going to blow your cache ... why is that?

      If the garbage collector itself causes additional page faults, then yes, it isn't an advantage. But the argument was that when compared with page faults (something we've all been willing to accept, something that hasn't caused an unacceptable performance hit), GC is small potatoes. In addition, modern GC implementations take the amount of memory available on the machine into account, and they actively work to keep memory usage under the level that would start causing page faults.

      I said cache, page faults are a similar but different case. If your GC is on another thread then it can be doing cross CPU invalidation which will stall your application. If it's running in the same thread, when it runs it will evict some of your application text/data from cache ... also possibly introducing latencies.

      Collectors do "waste" a LOT of CPU cycles moving memory around, but it only happens once in a while. Your CPU can move memory at speeds measured in hundreds of megabytes per second, so the compaction only takes a few milliseconds, and it only happens every few seconds at worst.

      That 100s of MB a second assumes one linear piece of memory, this is generally not the case ... but even assuming it can just eat CPU to move it and not affect the cache and you have the CPU to spare ... then you have to have an algorithum to work out where it goes, without blowing the cache, taking a fault or taking too long to introduce latency to the app.

      I suppose that your programs never wait for anything? Never block on IO? You always have something worthwhile to calculate while you're waiting for that IO operation to complete? Somehow I doubt that. The vast majority of applications spend most of their time waiting for events of some sort -- waiting for a socket connection, a keystroke, file IO, etc. Most CPUs are idle 99% or more.

      It obviously depends on the application. But yes I've written/advised a few applications where CPU speed was a factor. Sure my text editor could probably be in Java instead of C and I wouldn't care ... and I probably wouldn't care if my spreadsheet was too, but someone who runs a large calculation might.

      If you actually want threads ... which many of us don't.

      Welcome to the 21st century, buddy. Threads are here, they're useful, and if used properly, they increase performance,

      --
      ustr: Managed string API with ave. 44% overhead over strdup(), for 0-20B
    40. Re:Um by cookd · · Score: 1
      Been interesting posting with you. Thanks for the discussion. Obviously, there are two sides to this coin, and definitely there are some worthwhile arguments for and against.

      To make sure that I wasn't just blowing steam, I did a little bit of looking around, reading some research papers, etc. Here is what I came up with:
      • If your program requires the allocation and deallocation of a lot of items of specific, predictable sizes, you take cache size and available memory into consideration, and really know what you are doing, you can speed up your program by writing special purpose allocators. However, the papers that I read indicated that in 4 out of 5 cases where special-purpose allocators were used instead of malloc and free, it did little or nothing to improve (and sometimes worsened) the speed of the program, but increased the complexity of memory management. Most modern general-purpose allocators are adaptive and set up lookaside tables for frequently allocated block sizes, and they do almost as good of a job as custom allocators in most cases.
      • Using a conservative GC, such as the Boehem GC for C++ (where there is no language support for GC, so the GC actually has to scan the stack and assume that every bit pattern that matches the heap's address range is a valid reference) is slower for programs that allocate mostly large objects, but actually faster than malloc/free if the program allocates a lot of small objects.
      • Manual memory management gets slower (more cycles spent making sure you've properly kept track of all of your allocated objects) and more difficult to get right as the application grows in size and complexity. GC doesn't. The more complicated the memory management needs, the more likely that GC will be significantly faster than manual management, even without taking the productivity savings into account.
      • If threads are involved, lock contention for the heap becomes a significant factor in program execution time. While GC is not free from this, it doesn't degrade as quickly with increased # of processors.
      • Refcounting is actually expensive, more so than GC. In multi-threaded apps, you have to use an interlocked instruction to update the refcount, or perhaps even make a function call. Benchmarks between refcounted apps and GC apps show GC taking a significant lead. Of course, there are optimizations that can help with this (weighted refcounting), but they aren't used a whole lot.
      • The function calls to malloc and free tend to blow the TLB and cache almost as effectively as collections. The function calls to specialized allocators often do so as well. The difference is that GC is only called once in a while (relatively), so the TLB and cache are only blown a few times a second instead of every few hundred allocations.
      • The locality of reference provided by copying collectors has been shown to speed up execution by a significant factor, often saving more cycles than were spent compacting the heap.

      Anyway, that is what I came up with. I don't have all of the links, but a lot of them were within a few clicks of the so-called The Memory Management Reference website. I'm even more convinced than I was before.

      However, I agree with you that everything has its place. There are places where GC still isn't appropriate. The GC tends to have a larger initial cost than malloc/free (in terms of both code and initial data requirements), so for very simple systems (embedded computers or bootstrappers for more complex systems) the GC just won't work -- there isn't enough room for a program complicated enough to allow the GC to be worth the initial cost. And while "real-time" GC does exist, it isn't yet good/fast enough to be used in the kernel where interrupt handling has to be done within very tight time constraints -- even though a standard GC implementation would likely have better performance (according to what I've read) than the

      --
      Time flies like an arrow. Fruit flies like a banana.
    41. Re:Um by Anonymous Coward · · Score: 0

      Garbage Collection is not necessarily incompatible with RAII.

      If the only way in which objects are destroyed is through an unpredictable garbage collector then yes, RAII isn't possible.

      If you have reference-counting based garbage collection then RAII is possible (though reference-counting brings its own issues).

      If you have garbage collection for heap objects, but automatic collection for stack objects then RAII is possible (and indeed little different to RAII in plain C++).

      C#'s garbage collection rules out RAII per se, but the same techniques can be transferred into its use of "using" statements like in this RAII article's note on .NET

    42. Re:Um by Zan+Zu+from+Eridu · · Score: 1
      I don't think of C++ as strongly typed. typedefs don't enforce, e.g., different meanings of ints (so number of apples vs. number of oranges can't really be distinguished without creating multiple classes).

      Would you call C++ strongly typed if the keyword would be "typealias" instead of "typedef"?

      Even enums aren't distinguishable from ints!

      Bzzt, wrong! Try to assign an int to an enum without a cast... You'll get an error.

    43. Re:Um by Nevyn · · Score: 1
      To make sure that I wasn't just blowing steam, I did a little bit of looking around, reading some research papers ... I don't have all of the links, but a lot of them were within a few clicks of the so-called The Memory Management Reference website

      Well it's hard to argue with no links :). But your point that large objects are easier on a malloc()/free() seems like common sense. It's much more often that you'd write a custom allocator for small objects.

      Refcounting is actually expensive, more so than GC. In multi-threaded apps, you have to use an interlocked instruction to update the refcount, or perhaps even make a function call.

      Again with the threads?:). Yes, I've heard this argument before, however ... 1) You don't need to do locked refcounting. If you do you are almost certainly sharing too much information between the threads. Locking should not be done "automatically" inside objects. This will almost certainly lead to the threads serializing against each other. 2) GC needs something similar so you can have quick destruction. This is often especially noticable when an app. is written assuming GC, at this point the application will assume that doing "x = new foo();" is just as cheap as "foo x();", then objects of very different lifetimes will be intermixed and the GC needs to know to free the "stack like" objects quickly. For instance python has both a GC and a ref counting system, and this seems to make most people happier.

      The function calls to malloc and free tend to blow the TLB and cache almost as effectively as collections. The function calls to specialized allocators often do so as well. The difference is that GC is only called once in a while (relatively), so the TLB and cache are only blown a few times a second instead of every few hundred allocations.

      Yes, malloc/free are not simple functions. However I'd disagree that custom allocators can blow the cache. Often the "allocation" happens with a test and two pointer assignments (for instance Vstr does this -- actually one pointer assingment per object, and another for the entire group), and deallocation with just two pointer assignments. I fail to see how a GC could compete with this. In threaded programs this is even more pronounced as you don't have to do any locking.

      Then you look at things like filesystems, which have had GC like properties for a long time ... and the ones that are the best at managing space are always the ones that do the most work at allocation time. The filesystems that have been design to "make allocation fast" tend to royally screw up management of the space under a bunch of conditions. And I don't see why GC would be any better at this.

      The one paper on stats. that was linked directly from the Boehm C collector was Memory Allocation Costs in Large C and C++ Programs (1994), which is almost 10 years old :(. But I guess Boehm hasn't changed much (although I'm not sure I'd say the same for malloc/free).

      This paper doesn't suggest to me that GC is very close to malloc/free, with it being worse in both space and time (perl taking 125% of the time and 275% of the memory). Xfig was the only app. sfaster CPU wise (by 0.3 of a second) and was almost half a MB bigger.

      Personal observation also suggests this, as when gcc recently move from malloc/free to using Boehm it got both slower and bigger.

      However I would probably be happy to take the hit on some of those packages tested ... mainly due to the fact that they aren't performance sensitive to me.

      --
      ustr: Managed string API with ave. 44% overhead over strdup(), for 0-20B
    44. Re:Um by Ryan+Amos · · Score: 1

      Heh, probably true, as a few classes down the line I decided I didn't want to be a programmer anyway :) I'm a sysadmin dammit :(

    45. Re:Um by Mikkeles · · Score: 1
      "Would you call C++ strongly typed if the keyword would be "typealias" instead of "typedef"?"

      I'm afraid I only know of 'typealias' in Fortran (where it is synonymous with 'typedef') and in the Meta Object Facility (MOF) of CORBA. I have found reference to a python class of that name, but know nothing of it.
      Would you be able to provide me a reference? Ta.


      Even enums aren't distinguishable from ints!

      Bzzt, wrong! Try to assign an int to an enum without a cast... You'll get an error.

      You are correct in that direct assignment is not possible. The cast, however, provides no type checking. For example:

      enum foo {a, b, c}
      :
      .
      int I = 400
      foo F1 = foo(I)
      foo F2 = (foo)(I+1)

      succeeds! If 'printf("%d %d\n", F1, F2)' is called, it prints '400 401'. This is in no way strict type checking. (IMHO :^) I grant that 'distinguishable' was an incorrect choice of word.
      --
      Great minds think alike; fools seldom differ.
    46. Re:Um by Zan+Zu+from+Eridu · · Score: 1
      I have found reference to a python class of that name, but know nothing of it. Would you be able to provide me a reference? Ta.

      I was trying to make it clear that typedef doesn't define or declare a new type, but an alias to an existing type. IOW the keyword "typedef" is misleading, but it doesn't make C++ weakly typed.

      The cast, however, provides no type checking.

      Yes, that's a design feature of the language. The compiler errs out if you implicitly try to cast incompatible types, but when you explicitly force the cast the compiler swallows it (because it assumes you know what you're doing). After all you've got to write extra code (the cast) to make it behave unsafe.

  2. Weeeee Lets implement Slab allocators by MerlynEmrys67 · · Score: 1

    Hmmm... Didn't this go into Unix kernels almost a decade ago - Thank you Sun

    --
    I have mod points and I am not afraid to use them
    1. Re:Weeeee Lets implement Slab allocators by Anonymous Coward · · Score: 0

      Thank you Sun

      Correction: Thank you SCO! :-D

    2. Re:Weeeee Lets implement Slab allocators by Anonymous Coward · · Score: 0

      The funny thing is that while it was used in the kernel, kernel devs didn't find it interesting enough to hint the libc devs about providing an equivalent in User Land. Still, having such tool to build a program instead of the simplistic "malloc/free" scheme could have been quite beneficial in the unix application developpment world.

      Too bad Unix (or Sun if I refer to your post) didn't have enough insight to provide a standardised version of that tool to users.

  3. Gee, isn't that handy by c · · Score: 4, Interesting

    Anyone who's done C coding for more than, oh, a day would have already figured that out. It's not a coincidence that every programming language that doesn't have "smart" arrays built into the language ends up with some sort of buffer class (Java's ByteStream class, C++'s stream IO buffers, etc).

    The fundamental problem is that this sort of thing needs to be done at the C library level. And if it's not done in a flexible fashion, you end up with a library call that rarely gets used. Anyone used hsearch() lately?

    If only clib streams (FILE* and friends) were extensible, this article would never have had to be written.

    c.

    --
    Log in or piss off.
    1. Re:Gee, isn't that handy by Nevyn · · Score: 1
      The fundamental problem is that this sort of thing needs to be done at the C library level. And if it's not done in a flexible fashion, you end up with a library call that rarely gets used. Anyone used hsearch() lately?

      Take a look at, Vstr I think it's pretty flexible ... it certainly has much better researched documentation than the content for this IBM "research" article.

      --
      ustr: Managed string API with ave. 44% overhead over strdup(), for 0-20B
    2. Re:Gee, isn't that handy by c · · Score: 1

      Comprehensive, that's for sure, but the examples look like it's also reinventing FILE* in the io_* API.

      glibc's fmemopen() moots most of the IBM article, I think, but since I don't code exclusively in a glibc environment... Grrr... If only POSIX specced out FILE* a bit tighter...

      c.

      --
      Log in or piss off.
    3. Re:Gee, isn't that handy by Nevyn · · Score: 1
      Comprehensive, that's for sure, but the examples look like it's also reinventing FILE* in the io_* API.

      The io_* API is part of the examples, and not in the library itself. But it's a pretty small wrapper over what's in the library ... and, yes, the library itself was designed so it could do non-blocking IO (which FILE* can't). So in that regard, yeh, I don't tend to use FILE* anymore.

      glibc's fmemopen() moots most of the IBM article, I think, but since I don't code exclusively in a glibc environment... Grrr... If only POSIX specced out FILE* a bit tighter...

      Yeh, fmemopen() would have helped a lot. Esp. if you could have implemented asprintf() via. it (Ie. expanding memory regions). But it's completely unportable (and I wouldn't trust even the glibc implementation ... as hardly anyone uses it). But even then vstr_* is very nice in that it crosses both a IO stream like API with a string API. So you can get data from an fd etc. and immediately split it, or search for something without having to copy it into a string API. Then again, I'm biased :).

      --
      ustr: Managed string API with ave. 44% overhead over strdup(), for 0-20B
  4. Please by YellowElectricRat · · Score: 4, Funny

    This article, I believe, has already been published in the well known programmers' journal "No shit Sherlock - monthly"

  5. realloc by norwoodites · · Score: 1

    It is called realloc, that is the real way that people should use to self-manage memory allocation, and something that detects leaks is also needed.

    1. Re:realloc by Josh+Booth · · Score: 1

      How about use another language, like not C? Or use a gc library. I have heard of many, but it really should be part of the language so people will actually use it.

    2. Re:realloc by Permission+Denied · · Score: 4, Interesting
      Regarding realloc:

      When you call realloc, you're very likely to cause the data to be copied from the old buffer to the new buffer. This is very high overhead. The article discusses how to do similar things, but without this unecessary copying (eg, low overhead). It's actually not that interesting of an article as what it describes is hardly new and I believe any competent programmer could come up with that solution when faced with the particular circumstances that inspired it.

      Realloc works by seeing if there is free memory after the end of the allocated block, and changing the block's size if so. Realloc can do this because it knows about the internals of the malloc/free implementation. If there is allocated memory right after the block in question, a new block must be allocated, as you cannot "move" the later block in a language like C where any memory location can be a pointer. You could try this kind of stuff in other languages (or in some bastardized C where you do not have direct access to memory, but go through more indirection, the next logical abstraction after the article), but when you start automatically finding/checking/updating memory pointers, you get into GC.

      You may be able to overcome some overhead on realloc if you move the problem down into the kernel. The kernel could play page table games so there is little or no actual copying involved, just updating of page tables. This would be fairly easy to implement, but I don't think anyone's done it because (a) flushing the relevant TLB entries could hurt performance more than the copying, and (b) the system call overhead might be more overhead than the copying. Realloc is generally only used for small buffers (due to programmers knowing about the copying overhead) and this trick would only have gains for large buffers spanning multiple pages. For small buffers, the library-level realloc could avoid the system call and do the copying itself, avoiding system call overhead and TLB entry flushes.

      This scheme I describe could make for an interesting paper (especially determining for what size of buffer and what type of program it has gains), but I doubt it would make much difference in real system performance as programmers avoid realloc for large buffers, and there are very few cases where one needs direct linear access to a large range of memory rather than being better-served by organizing that memory into some data structure.

    3. Re:realloc by norwoodites · · Score: 1

      But linked lists are worse on performance than realloc.
      One way to use realloc better is to have a current length and max size and also rellocate more memory than need at the needed time. This cause better locality than a linked list. And you can always use realloc to make the list to the current length. Linked lists cause many problems when it comes to prefetching and fast memory access so they are slow when you are accessing things down a list. You can do a tree using one list also, by having an inverse tree. Graphs can uses multiple lists instead of linked ones.

      Also linked lists are very prone to leaking and very hard to figure out which one to free first.

      Note defining who frees what and when is a different issue than what memory manager you are you using. Defining an API is good but unless you define when things can be released/retained you are f**d so the who issue really is more a social one.

    4. Re:realloc by Nevyn · · Score: 2, Insightful
      But linked lists are worse on performance than realloc.

      Care to back that up with some facts.

      Also linked lists are very prone to leaking and very hard to figure out which one to free first.

      So you a) have the library code do that ... and b) have lots of tests. Of course you want to do both of those for something using a single block of memory too, and if you want it to be efficient it usualy does something clever to avoid copies ... and so is probably more likely to screw up and use/deallocate the wrong thing. As for leaking memory, that's pretty much built into the design with a single block of memory model, unless you want to keep calling realloc() to shrink the buffer (then you just blow the fragmentation of malloc to hell -- which if you're lucky will just make things slower).

      The "idea" of the article wasn't bad, it's just a very bad description of the proposed solution ... and a couple of years too late.

      --
      ustr: Managed string API with ave. 44% overhead over strdup(), for 0-20B
    5. Re:realloc by Anonymous Coward · · Score: 0

      Yes, I come back with some numbers on a modern CPU.
      On the PPC 970 there is a hardware perfetch which only works for consecutive reads (most other modern processors are the same). Linked lists cause cache misses which non-linked as the processor is already prefetch them.

      There is a different than wasting space and leaking memory. Leaking means that you have no longer reference to it and there is no way for you to free it. Most of the time when you call realloc with a smaller size, the memory does not move at all but you should not be calling realloc too much as my point of allocating more memory than you need to begin with, this is. Also linked lists you have to keep track of which linked has been freed and when, while a block, you only have to keep track of that one block.

    6. Re:realloc by Anonymous Coward · · Score: 0

      No GC should not be part of the language, it is a runtime issue and should not be seen as really a language issue at all. Also there are ways to get a GC for C, see boeh (sp).
      Another way is to do what GCC does and use a mark and sweep aproach for global variables. This works for most of the time but some times (like the C++ front-end someone stores something in a register or on the stack and it does not so you have to push/pop contexts and you cannot free what you really want to free so you end up with keeping a large amount of wasted space until you can collect).

    7. Re:realloc by Anonymous Coward · · Score: 0

      Did yo read the article ? What about platform that don't share the same memory management API ?

      It's about portability.

    8. Re:realloc by koh · · Score: 1

      Also note that many modern incarnations of linked lists use an allocator that requires sequential memory and make their own chunks from that.

      AFAIK even glib has these.

      --
      Karma cannot be described by words alone.
    9. Re:realloc by Ninja+Programmer · · Score: 1
      ...But linked lists are worse on performance than realloc.
      Care to back that up with some facts.
      Linked lists require links -- this puts pressure on the L1 cache. Linked list access is also necessarily serial, meaning that you lose all parallel execution potential (from high level software all the way down to the CPU core). Sub-blocked operations require full iteration through the intersection. And finally, of course, there is the small matter of random access.
    10. Re:realloc by Ninja+Programmer · · Score: 1
      When you call realloc, you're very likely to cause the data to be copied from the old buffer to the new buffer. This is very high overhead. [...]
      The way you deal with this is by not doing it very often. See: The Better String Library.
    11. Re:realloc by Nevyn · · Score: 2, Informative
      Linked lists require links -- this puts pressure on the L1 cache.

      I said facts, not theories. Yes, I know all about cache and his friend random access. Maybe you'll take a noticable cache hit when moving between nodes ... and yes, certainly doing memcpy() over a single large block X will be faster than doing it X/20 times for 20 byte nodes (20 was the figure given in the IBM "research" article). But that doesn't take into account the time taken to call realloc() each time to expand the large block ... or the amount of space wasted if you have a large amount of data followed by mostly small amounts of data ... or the sharing that is possible in a node based system (thus almost eliminating the cost of copying).

      And I did provide the simple benchmarks (which showed that with sane node sizes the vector approach is better than realloc() + memcpy()), and code for somewhat more complication cases that try to model simple network IO applications (which show the single block of memory is only competitive when you use more memory and add complication to the algorithum to reduce copying ... or use lots more memory).

      But feel free to prove me wrong, I'd certainly study anything you came up with. And yes, I'm taking the csv code into account ... on normal input the node based code was within 50% IIRC, and on "difficult" code it was over 100% faster (again IIRC) ... and your code was a lot more hand optomized.

      --
      ustr: Managed string API with ave. 44% overhead over strdup(), for 0-20B
    12. Re:realloc by sjames · · Score: 1

      Realloc can work well if you always malloc the largest buffer you possibly need, then realloc smaller when the real size is known. For that to work well, the allocations either need to be serial (that is malloc, load, realloc), or you also have a need for smaller blocks that will fit in to the space made by reallocing down.

      In some systems such as where you are the only process on the CPU, you can do compression (copy down) when waiting for external events (like an interrupt). At that point, the cycles and memory bandwidth would otherwise go to waste.

      The right tool for the right job.

  6. Memory Allocation by rmohr02 · · Score: 4, Funny

    Just like slashdot allocated extra space for the third "l" in "alllocation".

    1. Re:Memory Allocation by Anonymous Coward · · Score: 0

      Just like slashdot allocated extra space for the third "l" in "alllocation".

      No, the story submitter is Welsh, boyo.

  7. Hmmm by JMZero · · Score: 4, Interesting

    Many small programs are no longer memory, or even performance, constrained. As such, a reasonable strategy for a lot of desktop software is to allocate a huge buffer at startup, and do repetitive flushes and complete reloads of data (always using the same pre-allocated buffers).

    This is simple to do, and avoids a lot of errors. It's also not much of a headline.

    --
    Let's not stir that bag of worms...
    1. Re:Hmmm by schmaltz · · Score: 2, Insightful

      reasonable strategy: allocate a huge buffer ... do repetitive flushes and complete reloads of data ...

      Um, this is reasonable how? If you're coding on a 3GHz dream machine w/4GB 400mhz RAM, there will be somebody out there who, quite reasonably, will want to run your code on a 64MB Pentium 133... Testing only on unencumbered machines makes for delusional developers. ;)

      I'm split: either yer trollin' /., or you gotta tell us what OSS projects you've contributed to, with the above philosophy!

      --
      Big Daddy, Johnny, Burp, Aunt Zelda, Scott, Slurp, Big Momma ... where's Siggy?
    2. Re:Hmmm by Nevyn · · Score: 2, Insightful
      This is simple to do, and avoids a lot of errors. It's also not much of a headline.

      While I agree the IBM "research" article is terrible, the idea behind it isn't.

      Actually having donetests and benchmarks. I can safely say:

      • It's not the simplest solution.
      • It's certainly not anywhere near fast.
      --
      ustr: Managed string API with ave. 44% overhead over strdup(), for 0-20B
    3. Re:Hmmm by cybergrue · · Score: 1

      Many small programs are no longer memory, or even performance, constrained. As such, a reasonable strategy for a lot of desktop software is to allocate a huge buffer at startup
      It has been my experience that this was done when systems were constrained, ie. test to see if there is enough system memory at start-up instead of running out of memory in the middle of execution. This was apparently so prevelent that some vendors (SGI for one) changed malloc to use a "first touched" memory model. In this model, memory is not allocated when you allocate it, but when it is first used. Hogging lots of memory in a multi-process environment that isn't being used means that none of the other prcesses are not using it either. Memory management on modern O/S's is much more efficient when using a just in time approach to memory useage then allocating large blocks because of the expenses related to paging back and forth from and to virtual memory.

    4. Re:Hmmm by sjames · · Score: 1

      Note that in Linux and many other OSes, malloc and friends don't actually cause physical memory to be committed to the process, they just create an unmapped virtual area. RAM is only committed when writing to a page area faults a page in. Reading from such a virtual area will fault THE zero page in (one page in the whole system containing zeros), writing to a zero mapped page will fault in a different page.

      So, you can allocate a 1GB buffer and read it all to verify that it contains only zeros, at a cost of just a few bytes of ram (where the kernel tracks the virtual memory area).

      Of course, in other situations, that behaviour is undesirable since you can't know for sure you actually got the requested memory resource until you try to use it. Where absolute predictability is required, a malloc will actually commit memory immediatly.

  8. uhhhh wtf by Anonymous Coward · · Score: 0

    I use a self-managing memory buffer .. it's called Ruby. Or Python. Or Perl. Or Java. Or C#. Or a freakin' garbage collector.

    Seriously, IT'S A SOLVED PROBLEM!

    PS: is anybody a little unnerved that C is now considered a "legacy" language? C happens to still be my favorite language. Though I guess I do admit that I hardly ever use it ...............

  9. Wow! by arkanes · · Score: 4, Funny
    Man, I never thought of that. An abstract memory buffer. What a concept! I don't need to define the lengths of everything at compile time then!

    Now, I'll need a nice short catchy name for it... oh! I know! I'll call it a heap!

  10. Useful by dtfinch · · Score: 2, Interesting

    But this is like teaching calculus students remedial math. The "Level: Intermediate" at the top of the article should have given that away.

  11. The solution is one of the "bad ones"... by mkeeley · · Score: 1

    One of the interacting parties defines the underlying memory allocation mechanism for data exchange. The other party always uses the published interface to allocate or free buffers to avoid possible inconsistency. This model requires both parties to stick to a programming convention that may not be relevant to the software's basic functionality and, in general, can make the code less reusable.

    And the proposed solution requires both parties to stick to the common adbtract buffer interface.
    Hmmm!

    1. Re:The solution is one of the "bad ones"... by Anonymous Coward · · Score: 0

      Anyway, at some point you have to define something which the whole code has to stick to. But instead of making it a "convention", isn't it better to make it an interface ?

      I think the key point of the article exerpt is "can make the code less reusable".

  12. A memory leak IN THE SPECIFICATION?!? by MarkusQ · · Score: 4, Funny

    From the article:
    • The pLostBlock points to the linked list's last memory block.

    pLostBlock?!? This almost sounds as if it's designed to leak!

    -- MarkusQ

    P.S. Seriously, I think this is a fine idea, if not particularly earth shaking. But the typo was too ironic not to point out.

  13. This is new? by ivern76 · · Score: 0, Redundant

    I'm sorry, but this is just about as complicated as an elementary data structures assignment. Wow, an arbitrary size memory buffer with an underlying linked list. Hot shit!

  14. Vstr by Nevyn · · Score: 5, Informative

    The article basically proposes a very bad implementation of Vstr, most of the advise was extremly simplified at best but more likely just uninformed: an "efficient" abstract buffer that mixes shorts and pointers -- words almost fail me, how to solve the problem of "what do you do with the data when it's all in the buffer" -- "let's just copy it back out again (hey whats a couple of extra copies between friends). Representing in memory object sizes with "long int" *sigh*.

    If you are interested in the article, go read this explanation of why you want it for security and this explanation of why you want it for speed .

    Vstr is LGPL, has actual benchmark data behind the block sizes it picks, has an extensive test suite ... and has documentation for the many functions that come with the library (including a fully compliant printf like function). Of course, I don't have a PhD ... but after reading this, you might well count that as a plus too

    --
    ustr: Managed string API with ave. 44% overhead over strdup(), for 0-20B
    1. Re:Vstr by 0x0d0a · · Score: 1

      Representing in memory object sizes with "long int" *sigh*.

      FWIW, this is not as trivial as one might think (from your project, I suspect you're aware of the issues involved).

      I've been working on a program that ptrace()s another program and has to keep track of ranges of memory. Just because I like doing things properly, I figured that using a void * to store memory range offsets would be sufficient. Problem is, C doesn't define some operations that I needed to do (like modulus) on pointers. So I get the possibility of casting to...an int of some sort. Grr. It took me quite some time to run across intptr_t, which is an integer type guaranteed to be able to hold a pointer.

    2. Re:Vstr by Nevyn · · Score: 1
      Representing in memory object sizes with "long int" *sigh*.
      FWIW, this is not as trivial as one might think [snaip ...] I figured that using a void * to store memory range offsets would be sufficient. Problem is, C doesn't define some operations that I needed to do (like modulus) on pointers. So I get the possibility of casting to...an int of some sort. Grr. It took me quite some time to run across intptr_t, which is an integer type guaranteed to be able to hold a pointer.

      Well I meant the sizes, not the actual pointers ... his skeleton API doesn't seem to need to get the values of pointers (which, as you say, you'd want intptr_t or unsigned long/uintmax_t etc.). But I was talking of the fact that he has "totalLength" defined in the struct, as a "long int" instead of say "size_t" or at the least "unsigned long".

      --
      ustr: Managed string API with ave. 44% overhead over strdup(), for 0-20B
  15. Re:2 words for this article.. by netsharc · · Score: 0, Offtopic

    You're just jealous he's smart and hard working enough to have made it to IBM.

    --
    What time is it/will be over there? Check with my iPhone app!
  16. Re:This isn't Slashdot worthy by 0x0d0a · · Score: 1

    Not only is it not new or interesting -- it's effectively already present in pretty much all general purpose systems.

    On my Linux box, if I malloc() a megabyte buffer, but only ever write to the first page of that buffer, the VM system will only ever hand me a page or so to use.

    Probably a bit oversimplified, since overcommits might cause pressure to write out buffer data, but still...WTF is this guy thinking?

  17. Re:2 words for this article.. by Anonymous Coward · · Score: 0

    Hard working? Yes. Smart? Hardly.

  18. Not News by SkewlD00d · · Score: 3, Informative

    Embedded systems do this, they have a pool of Buffers and a BufferManager that allows you to do effectively your own memory mangement (and in some cases, static memory management). malloc() and free() are usually really slow, so if you can save 99% of those calls by reusing memory blocks, you can really speed up your programs.

    --
    The biggest trick the devil pulled was letting lawyers become politicians so they can write the laws.
    1. Re:Not News by The+Vulture · · Score: 1

      Heck, in some cases, malloc and free don't work very well.

      I do a lot of work in vxWorks. The version that I routinely use in supporting a legacy product (5.4) doesn't do any collection on the free'd spaces. Thus, you can very easily kill the system by doing something like this:
      while (1)
      {
      ptr = malloc(rand());
      free(ptr);
      }

      Eventually you'll wind up where memory is so fragmented that it can't find an appropriately sized chunk for a given size, even though there's plenty of memory available.

      It's easier to just use malloc to grab a bunch of memory for yourself, and then manage it with your own code that divides it up.

      -- Joe

    2. Re:Not News by SkewlD00d · · Score: 1

      i worked w/ a metroworks system that could not use operator new( ) nor malloc(size_t) after system init. There was a line of code that would execute, and all memory had to be statically allocated from the master memory pool. We had a Buffer class and different size Buffer's you could alloc from our own BufferManager class, mainly for IP packet and RS-232 packet buffer passing. Strings were statically compiled. AFAIK, the system was fairly stable, and they're weren't generally any memory leaks *grin*.

      --
      The biggest trick the devil pulled was letting lawyers become politicians so they can write the laws.
  19. Re:2 words for this article.. by Anonymous Coward · · Score: 0

    No Shit I followed this link thinking there would be some revolutionary article...what a letdown.
    Just like all the other IBM pieces that appear here: blatantly obvious, or wrong, or FUD.

  20. Re:This isn't Slashdot worthy by Anonymous Coward · · Score: 0
    WTF is this guy thinking?

    portability.
  21. Indeed by JMZero · · Score: 1

    I'd suggest this approach only in small footprint sort of apps, or apps where performance footprint or convention means that not much else will likely be running (eg. fullscreen games). In many apps, the memory requirements are consistent enough that total demand is going to be the same either way.

    Mostly I just hate people to be doing lots of work in C to save 30k of system memory - and ending up with a buggy program full of memory leaks. Many apps have data sets this small (30k) and yet are spending lots of time/effort managing memory.

    --
    Let's not stir that bag of worms...
  22. Hmmmm by JMZero · · Score: 1

    You have to understand that most software people write isn't like what you're thinking. If you are writing software for a large audience and long term use, you obviously have to be more careful with your strategies. For many apps, though, you don't require this sort of robustness - and you probably aren't going to spend enough effort to do everything well. As such, if you're micro-managing memory then you are likely also creating memory leaks and bugs.

    Also, I'm not suggesting you allocate a 30 meg buffer at startup, I'm suggesting you allocate a 30k buffer at startup. Many programs are micro-managing this size of dataset, and it's a waste of time.

    or you gotta tell us what OSS projects you've contributed to

    I've written a lot of free software. For example, try Jumpman Zero for the Palm (link in sig). If you looked at the code for that, you'd see that I allocate all the space for level data when the program opens and leave it allocated the whole way through. It's a few K spent, but the program never has memory crashes (assuming the Palm has enough memory to start the thing).

    --
    Let's not stir that bag of worms...
  23. Just to be clear... by JMZero · · Score: 1

    I like the strategy IBM describes. My strategy is obviously suitable in different places. My intended point was that I didn't figure either would be novel ideas (headlines) to most programmers.

    Your benchmarks, on the other hand, are a good headline. Going into a project, you usually have a fair idea of your options for memory management and how long they'll take to implement. However, you don't always have a good grasp of the performance implications - your breakdown is handy.

    --
    Let's not stir that bag of worms...
  24. SCO owns C++ (and C too?) by G3ckoG33k · · Score: 0, Offtopic

    "And C++ programming languages, we own those, have licensed them out multiple times, obviously. We have a lot of royalties coming to us from C++"

    Darl McBride, SCO

    WTF?! It's true, he said that. Read more here and here

  25. GC in one thread is the fatal bottleneck by Anonymous Coward · · Score: 0

    All threads will block whenever they need to allocate when there is garbage to collect.

    1. Re:GC in one thread is the fatal bottleneck by cookd · · Score: 1

      Yep. That was covered in the part where I was talking about how garbage collectors have really improved in the past few years. That is why most GCs nowadays are concurrent, adaptive, and incremental. Otherwise they interfere with the application too much.

      --
      Time flies like an arrow. Fruit flies like a banana.
  26. Re:What a crap story... by Anonymous Coward · · Score: 2, Funny

    How about this: Solve all of your memory management problems by switching to visual basic! All memory management is done automagically. No need to even think about it! Just hook up your data bound controls and write your logic code. No more memory worries :)

  27. Re:What a crap story... by Roydd+McWilson · · Score: 0

    Unfortunately, some people are too busy laughing at languages like VB to see where it is actually useful.

    --
    THE NERD IS THE COMPUTER.
  28. BSD mbufs by Anonymous Coward · · Score: 0

    What he is describing sounds like the mbufs used by the BSD networking code. Fixed-size blocks arranged in linked lists, with the option to have a partial first and/or last block.

    Or am I completely wrong?

    1. Re:BSD mbufs by Anonymous Coward · · Score: 0

      Yes, I thought the same thing when I read the article.

      And mbuf's have the advantage of many hours in operation, hence more mature and better tested.

  29. So use C++ already by Animats · · Score: 3, Insightful
    Watching C programmers create home-brew objects is pathetic. If you need to encapsulate data structures, use C++.

    Yes, C++ has a host of problems and Strostrup and the C++ committee refuse to fix them. But the STL is a huge improvement on malloc/free. (They still can't get auto_ptr right, though.)

  30. I/O buffers for unknown incoming data by Anonymous Coward · · Score: 0

    These concatenated buffers don't solve my problem of needing a buffer to store packets being retrieved from read(). Before an unknown chunk of data can be read(), I must create a buffer of MAXSIZE. As I can't tell the size of the data until I read() it, I have no choice but to allocate large buffers.

    Also, when I want to write() a buffer, it must be in contiguous memory therefore contatenated buffers can't be used.

    Finally, these buffers remind me of the BSD mbuf structures.

  31. I must be missing something by boutell · · Score: 1

    I see some of the point in this, but what's wrong with:

    unsigned char *buffer = 0;
    b(&buffer); /* note: address of */
    free(buffer);

    --
    Check out the Apostrophe open-source CMS: http://www.apostrophenow.com/