C Coding Tip - Self-Manage Memory Alllocation
An anonymous reader inputs: "The C programming language defines two standard memory management functions: malloc() and free(). C programmers frequently use those functions to allocate buffers at run time to pass data between functions. In many situations, however, you cannot predetermine the actual sizes required for the buffers, which may cause several fundamental problems for constructing complex C programs. This article advocates a self-managing, abstract data buffer. It outlines a pseudo-C implementation of the abstract buffer and details the advantages of adopting this mechanism."
*cough* garbage collection *cough*
A deep unwavering belief is a sure sign you're missing something...
Anyone who's done C coding for more than, oh, a day would have already figured that out. It's not a coincidence that every programming language that doesn't have "smart" arrays built into the language ends up with some sort of buffer class (Java's ByteStream class, C++'s stream IO buffers, etc).
The fundamental problem is that this sort of thing needs to be done at the C library level. And if it's not done in a flexible fashion, you end up with a library call that rarely gets used. Anyone used hsearch() lately?
If only clib streams (FILE* and friends) were extensible, this article would never have had to be written.
c.
Log in or piss off.
This article, I believe, has already been published in the well known programmers' journal "No shit Sherlock - monthly"
Just like slashdot allocated extra space for the third "l" in "alllocation".
Many small programs are no longer memory, or even performance, constrained. As such, a reasonable strategy for a lot of desktop software is to allocate a huge buffer at startup, and do repetitive flushes and complete reloads of data (always using the same pre-allocated buffers).
This is simple to do, and avoids a lot of errors. It's also not much of a headline.
Let's not stir that bag of worms...
Now, I'll need a nice short catchy name for it... oh! I know! I'll call it a heap!
But this is like teaching calculus students remedial math. The "Level: Intermediate" at the top of the article should have given that away.
When you call realloc, you're very likely to cause the data to be copied from the old buffer to the new buffer. This is very high overhead. The article discusses how to do similar things, but without this unecessary copying (eg, low overhead). It's actually not that interesting of an article as what it describes is hardly new and I believe any competent programmer could come up with that solution when faced with the particular circumstances that inspired it.
Realloc works by seeing if there is free memory after the end of the allocated block, and changing the block's size if so. Realloc can do this because it knows about the internals of the malloc/free implementation. If there is allocated memory right after the block in question, a new block must be allocated, as you cannot "move" the later block in a language like C where any memory location can be a pointer. You could try this kind of stuff in other languages (or in some bastardized C where you do not have direct access to memory, but go through more indirection, the next logical abstraction after the article), but when you start automatically finding/checking/updating memory pointers, you get into GC.
You may be able to overcome some overhead on realloc if you move the problem down into the kernel. The kernel could play page table games so there is little or no actual copying involved, just updating of page tables. This would be fairly easy to implement, but I don't think anyone's done it because (a) flushing the relevant TLB entries could hurt performance more than the copying, and (b) the system call overhead might be more overhead than the copying. Realloc is generally only used for small buffers (due to programmers knowing about the copying overhead) and this trick would only have gains for large buffers spanning multiple pages. For small buffers, the library-level realloc could avoid the system call and do the copying itself, avoiding system call overhead and TLB entry flushes.
This scheme I describe could make for an interesting paper (especially determining for what size of buffer and what type of program it has gains), but I doubt it would make much difference in real system performance as programmers avoid realloc for large buffers, and there are very few cases where one needs direct linear access to a large range of memory rather than being better-served by organizing that memory into some data structure.
From the article:
pLostBlock?!? This almost sounds as if it's designed to leak!
-- MarkusQ
P.S. Seriously, I think this is a fine idea, if not particularly earth shaking. But the typo was too ironic not to point out.
The article basically proposes a very bad implementation of Vstr, most of the advise was extremly simplified at best but more likely just uninformed: an "efficient" abstract buffer that mixes shorts and pointers -- words almost fail me, how to solve the problem of "what do you do with the data when it's all in the buffer" -- "let's just copy it back out again (hey whats a couple of extra copies between friends). Representing in memory object sizes with "long int" *sigh*.
If you are interested in the article, go read this explanation of why you want it for security and this explanation of why you want it for speed .
Vstr is LGPL, has actual benchmark data behind the block sizes it picks, has an extensive test suite ... and has documentation for the many functions that come with the library (including a fully compliant printf like function). Of course, I don't have a PhD ... but after reading this, you might well count that as a plus too
ustr: Managed string API with ave. 44% overhead over strdup(), for 0-20B
Care to back that up with some facts.
So you a) have the library code do that ... and b) have lots of tests. Of course you want to do both of those for something using a single block of memory too, and if you want it to be efficient it usualy does something clever to avoid copies ... and so is probably more likely to screw up and use/deallocate the wrong thing. As for leaking memory, that's pretty much built into the design with a single block of memory model, unless you want to keep calling realloc() to shrink the buffer (then you just blow the fragmentation of malloc to hell -- which if you're lucky will just make things slower).
The "idea" of the article wasn't bad, it's just a very bad description of the proposed solution ... and a couple of years too late.
ustr: Managed string API with ave. 44% overhead over strdup(), for 0-20B
Embedded systems do this, they have a pool of Buffers and a BufferManager that allows you to do effectively your own memory mangement (and in some cases, static memory management). malloc() and free() are usually really slow, so if you can save 99% of those calls by reusing memory blocks, you can really speed up your programs.
The biggest trick the devil pulled was letting lawyers become politicians so they can write the laws.
How about this: Solve all of your memory management problems by switching to visual basic! All memory management is done automagically. No need to even think about it! Just hook up your data bound controls and write your logic code. No more memory worries :)
I said facts, not theories. Yes, I know all about cache and his friend random access. Maybe you'll take a noticable cache hit when moving between nodes ... and yes, certainly doing memcpy() over a single large block X will be faster than doing it X/20 times for 20 byte nodes (20 was the figure given in the IBM "research" article). But that doesn't take into account the time taken to call realloc() each time to expand the large block ... or the amount of space wasted if you have a large amount of data followed by mostly small amounts of data ... or the sharing that is possible in a node based system (thus almost eliminating the cost of copying).
And I did provide the simple benchmarks (which showed that with sane node sizes the vector approach is better than realloc() + memcpy()), and code for somewhat more complication cases that try to model simple network IO applications (which show the single block of memory is only competitive when you use more memory and add complication to the algorithum to reduce copying ... or use lots more memory).
But feel free to prove me wrong, I'd certainly study anything you came up with. And yes, I'm taking the csv code into account ... on normal input the node based code was within 50% IIRC, and on "difficult" code it was over 100% faster (again IIRC) ... and your code was a lot more hand optomized.
ustr: Managed string API with ave. 44% overhead over strdup(), for 0-20B
Yes, C++ has a host of problems and Strostrup and the C++ committee refuse to fix them. But the STL is a huge improvement on malloc/free. (They still can't get auto_ptr right, though.)