C Coding Tip - Self-Manage Memory Alllocation
An anonymous reader inputs: "The C programming language defines two standard memory management functions: malloc() and free(). C programmers frequently use those functions to allocate buffers at run time to pass data between functions. In many situations, however, you cannot predetermine the actual sizes required for the buffers, which may cause several fundamental problems for constructing complex C programs. This article advocates a self-managing, abstract data buffer. It outlines a pseudo-C implementation of the abstract buffer and details the advantages of adopting this mechanism."
*cough* garbage collection *cough*
A deep unwavering belief is a sure sign you're missing something...
Hmmm... Didn't this go into Unix kernels almost a decade ago - Thank you Sun
I have mod points and I am not afraid to use them
Anyone who's done C coding for more than, oh, a day would have already figured that out. It's not a coincidence that every programming language that doesn't have "smart" arrays built into the language ends up with some sort of buffer class (Java's ByteStream class, C++'s stream IO buffers, etc).
The fundamental problem is that this sort of thing needs to be done at the C library level. And if it's not done in a flexible fashion, you end up with a library call that rarely gets used. Anyone used hsearch() lately?
If only clib streams (FILE* and friends) were extensible, this article would never have had to be written.
c.
Log in or piss off.
This article, I believe, has already been published in the well known programmers' journal "No shit Sherlock - monthly"
It is called realloc. That is the real way people should self-manage memory allocation, though something that detects leaks is also needed.
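For anyone who wants the realloc route spelled out, here is a minimal sketch of a growable byte buffer. The names (`struct gbuf`, `gbuf_append`, `gbuf_free`) and the starting capacity of 64 bytes are made up for illustration and are not from the article. The one detail worth copying is that the old pointer is kept until realloc() succeeds, so a failed grow does not leak the existing block.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer; all names are illustrative. */
struct gbuf {
    unsigned char *data; /* heap storage, NULL while empty */
    size_t len;          /* bytes in use */
    size_t cap;          /* bytes allocated */
};

/* Append n bytes, doubling the capacity when needed.
 * Returns 0 on success, -1 on overflow or allocation failure
 * (the buffer is left unchanged in that case). */
static int gbuf_append(struct gbuf *b, const void *src, size_t n)
{
    assert(b != NULL);
    if (n > SIZE_MAX - b->len)
        return -1; /* len + n would overflow size_t */

    size_t need = b->len + n;
    if (need > b->cap) {
        size_t cap = b->cap ? b->cap : 64; /* assumed starting size */
        while (cap < need) {
            if (cap > SIZE_MAX / 2) {
                cap = need; /* cannot double any further */
                break;
            }
            cap *= 2;
        }
        /* Keep b->data valid until realloc succeeds. */
        unsigned char *p = realloc(b->data, cap);
        if (p == NULL)
            return -1;
        b->data = p;
        b->cap = cap;
    }
    if (n > 0)
        memcpy(b->data + b->len, src, n);
    b->len = need;
    return 0;
}

/* Release the storage and return the buffer to its empty state. */
static void gbuf_free(struct gbuf *b)
{
    assert(b != NULL);
    free(b->data);
    b->data = NULL;
    b->len = 0;
    b->cap = 0;
}
```

Start from a zero-initialised `struct gbuf`, call `gbuf_append()` as data arrives, and call `gbuf_free()` once at the end. Leak detection is a separate job for an external tool.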
Just like slashdot allocated extra space for the third "l" in "alllocation".
Many small programs are no longer memory, or even performance, constrained. As such, a reasonable strategy for a lot of desktop software is to allocate a huge buffer at startup, and do repetitive flushes and complete reloads of data (always using the same pre-allocated buffers).
This is simple to do, and avoids a lot of errors. It's also not much of a headline.
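A rough sketch of that "allocate once, flush, reload" idea, assuming C11 (for `_Alignas` and `max_align_t`). The 30 KB size and the names `arena_alloc` and `arena_reset` are placeholders chosen for illustration, not anything from the article.

```c
#include <assert.h>
#include <stddef.h>

#define ARENA_SIZE (30 * 1024) /* assumed size; pick what the program needs */

/* One block reserved for the life of the program. */
static _Alignas(max_align_t) unsigned char arena[ARENA_SIZE];
static size_t arena_used;

/* Hand out the next n bytes of the block, or NULL if it is full. */
static void *arena_alloc(size_t n)
{
    const size_t align = _Alignof(max_align_t);

    assert(arena_used <= ARENA_SIZE);
    if (n > ARENA_SIZE)
        return NULL;
    /* Round up so every returned pointer stays suitably aligned. */
    size_t rounded = (n + align - 1) / align * align;
    if (rounded > ARENA_SIZE - arena_used)
        return NULL;

    void *p = arena + arena_used;
    arena_used += rounded;
    return p;
}

/* "Flush": forget everything at once; the same memory is reused next time. */
static void arena_reset(void)
{
    arena_used = 0;
}
```

There is no per-object free, so there is nothing to leak. The cost is that everything allocated from the block dies together at `arena_reset()`.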
Let's not stir that bag of worms...
I use a self-managing memory buffer .. it's called Ruby. Or Python. Or Perl. Or Java. Or C#. Or a freakin' garbage collector.
...............
Seriously, IT'S A SOLVED PROBLEM!
PS: is anybody a little unnerved that C is now considered a "legacy" language? C happens to still be my favorite language. Though I guess I do admit that I hardly ever use it.
Now, I'll need a nice short catchy name for it... oh! I know! I'll call it a heap!
But this is like teaching calculus students remedial math. The "Level: Intermediate" at the top of the article should have given that away.
One of the interacting parties defines the underlying memory allocation mechanism for data exchange. The other party always uses the published interface to allocate or free buffers to avoid possible inconsistency. This model requires both parties to stick to a programming convention that may not be relevant to the software's basic functionality and, in general, can make the code less reusable.
And the proposed solution requires both parties to stick to the common abstract buffer interface.
Hmmm!
From the article:
pLostBlock?!? This almost sounds as if it's designed to leak!
-- MarkusQ
P.S. Seriously, I think this is a fine idea, if not particularly earth shaking. But the typo was too ironic not to point out.
I'm sorry, but this is just about as complicated as an elementary data structures assignment. Wow, an arbitrary size memory buffer with an underlying linked list. Hot shit!
The article basically proposes a very bad implementation of Vstr. Most of the advice was extremely simplified at best, but more likely just uninformed: an "efficient" abstract buffer that mixes shorts and pointers (words almost fail me); an answer to "what do you do with the data when it's all in the buffer?" that amounts to "let's just copy it back out again" (hey, what's a couple of extra copies between friends?); and in-memory object sizes represented with "long int". *sigh*
If you are interested in the article, go read this explanation of why you want it for security and this explanation of why you want it for speed.
Vstr is LGPL, has actual benchmark data behind the block sizes it picks, has an extensive test suite ... and has documentation for the many functions that come with the library (including a fully compliant printf like function). Of course, I don't have a PhD ... but after reading this, you might well count that as a plus too
ustr: Managed string API with ave. 44% overhead over strdup(), for 0-20B
You're just jealous he's smart and hard working enough to have made it to IBM.
What time is it/will be over there? Check with my iPhone app!
Not only is it not new or interesting -- it's effectively already present in pretty much all general purpose systems.
On my Linux box, if I malloc() a megabyte buffer, but only ever write to the first page of that buffer, the VM system will only ever hand me a page or so to use.
Probably a bit oversimplified, since overcommits might cause pressure to write out buffer data, but still...WTF is this guy thinking?
May we never see th
Hard working? Yes. Smart? Hardly.
Embedded systems do this: they have a pool of Buffers and a BufferManager that effectively lets you do your own memory management (and in some cases, static memory management). malloc() and free() are usually really slow, so if you can save 99% of those calls by reusing memory blocks, you can really speed up your programs.
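A minimal sketch of such a pool, with fixed-size buffers kept on a free list in static storage. The names (`pool_init`, `pool_get`, `pool_put`) and the sizes are invented for illustration, and real embedded pools usually add locking or interrupt masking, which is left out here.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_BUFS     8   /* assumed number of buffers */
#define POOL_BUF_SIZE 256 /* assumed size of each buffer */

/* A buffer holds either payload bytes (in use) or a link (on the free list). */
union pool_buf {
    union pool_buf *next;
    unsigned char bytes[POOL_BUF_SIZE];
};

static union pool_buf pool_storage[POOL_BUFS];
static union pool_buf *pool_free_list;

/* Thread every buffer onto the free list; call once at startup. */
static void pool_init(void)
{
    pool_free_list = NULL;
    for (size_t i = 0; i < POOL_BUFS; i++) {
        pool_storage[i].next = pool_free_list;
        pool_free_list = &pool_storage[i];
    }
}

/* Take a buffer in O(1), or return NULL when the pool is exhausted. */
static void *pool_get(void)
{
    union pool_buf *b = pool_free_list;
    if (b != NULL)
        pool_free_list = b->next;
    return b;
}

/* Return a buffer obtained from pool_get() in O(1). */
static void pool_put(void *p)
{
    assert(p != NULL);
    union pool_buf *b = p;
    b->next = pool_free_list;
    pool_free_list = b;
}
```

Getting and returning a buffer is a couple of pointer moves and never touches malloc() or free(). Whether that is measurably faster than the system allocator depends on the platform.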
The biggest trick the devil pulled was letting lawyers become politicians so they can write the laws.
No Shit I followed this link thinking there would be some revolutionary article...what a letdown.
Just like all the other IBM pieces that appear here: blatantly obvious, or wrong, or FUD.
portability.
I'd suggest this approach only in small footprint sort of apps, or apps where performance footprint or convention means that not much else will likely be running (eg. fullscreen games). In many apps, the memory requirements are consistent enough that total demand is going to be the same either way.
Mostly I just hate to see people doing lots of work in C to save 30k of system memory and ending up with a buggy program full of memory leaks. Many apps have data sets this small (30k) and yet spend lots of time and effort managing memory.
Let's not stir that bag of worms...
You have to understand that most software people write isn't like what you're thinking. If you are writing software for a large audience and long term use, you obviously have to be more careful with your strategies. For many apps, though, you don't require this sort of robustness - and you probably aren't going to spend enough effort to do everything well. As such, if you're micro-managing memory then you are likely also creating memory leaks and bugs.
Also, I'm not suggesting you allocate a 30 meg buffer at startup, I'm suggesting you allocate a 30k buffer at startup. Many programs are micro-managing this size of dataset, and it's a waste of time.
or you gotta tell us what OSS projects you've contributed to
I've written a lot of free software. For example, try Jumpman Zero for the Palm (link in sig). If you looked at the code for that, you'd see that I allocate all the space for level data when the program opens and leave it allocated the whole way through. It's a few K spent, but the program never has memory crashes (assuming the Palm has enough memory to start the thing).
Let's not stir that bag of worms...
I like the strategy IBM describes. My strategy is obviously suitable in different places. My intended point was that I didn't figure either would be novel ideas (headlines) to most programmers.
Your benchmarks, on the other hand, are a good headline. Going into a project, you usually have a fair idea of your options for memory management and how long they'll take to implement. However, you don't always have a good grasp of the performance implications - your breakdown is handy.
Let's not stir that bag of worms...
"And C++ programming languages, we own those, have licensed them out multiple times, obviously. We have a lot of royalties coming to us from C++"
Darl McBride, SCO
WTF?! It's true, he said that. Read more here and here
All threads will block whenever they need to allocate when there is garbage to collect.
How about this: Solve all of your memory management problems by switching to visual basic! All memory management is done automagically. No need to even think about it! Just hook up your data bound controls and write your logic code. No more memory worries :)
Unfortunately, some people are too busy laughing at languages like VB to see where it is actually useful.
THE NERD IS THE COMPUTER.
What he is describing sounds like the mbufs used by the BSD networking code. Fixed-size blocks arranged in linked lists, with the option to have a partial first and/or last block.
Or am I completely wrong?
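To make the comparison concrete, here is a rough sketch of a chain of fixed-size blocks in that general style. It is not the BSD mbuf code and not the article's code. Every name (`struct blk`, `struct chain`, `chain_append`, `chain_copy_out`, `chain_free`) and the 128-byte block size are invented for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define BLK_SIZE 128 /* assumed block size */

struct blk {
    struct blk *next;
    size_t off; /* start of valid data; a consume routine would advance it */
    size_t len; /* number of valid bytes after off */
    unsigned char data[BLK_SIZE];
};

struct chain {
    struct blk *head, *tail;
    size_t total; /* valid bytes across all blocks */
};

/* Append n bytes, filling the last block before linking a new one.
 * Returns 0 on success, -1 if a block cannot be allocated (bytes
 * appended before the failure stay in the chain). */
static int chain_append(struct chain *c, const void *src, size_t n)
{
    const unsigned char *s = src;

    assert(c != NULL);
    while (n > 0) {
        struct blk *t = c->tail;
        size_t room = t ? BLK_SIZE - (t->off + t->len) : 0;
        if (room == 0) {
            struct blk *nb = calloc(1, sizeof *nb);
            if (nb == NULL)
                return -1;
            if (t != NULL)
                t->next = nb;
            else
                c->head = nb;
            c->tail = t = nb;
            room = BLK_SIZE;
        }
        size_t take = n < room ? n : room;
        memcpy(t->data + t->off + t->len, s, take);
        t->len += take;
        c->total += take;
        s += take;
        n -= take;
    }
    return 0;
}

/* Copy up to n bytes out into one contiguous destination. */
static size_t chain_copy_out(const struct chain *c, void *dst, size_t n)
{
    unsigned char *d = dst;
    size_t done = 0;

    for (const struct blk *b = c->head; b != NULL && done < n; b = b->next) {
        size_t take = b->len < n - done ? b->len : n - done;
        memcpy(d + done, b->data + b->off, take);
        done += take;
    }
    return done;
}

/* Free every block and reset the chain to empty. */
static void chain_free(struct chain *c)
{
    struct blk *b = c->head;
    while (b != NULL) {
        struct blk *next = b->next;
        free(b);
        b = next;
    }
    c->head = c->tail = NULL;
    c->total = 0;
}
```

Appending 300 bytes to an empty chain yields three blocks holding 128, 128 and 44 bytes, so the last block is partial. `chain_copy_out()` is the "copy it back out again" step, which is the part some posters object to.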
Yes, C++ has a host of problems, and Stroustrup and the C++ committee refuse to fix them. But the STL is a huge improvement on malloc/free. (They still can't get auto_ptr right, though.)
These concatenated buffers don't solve my problem of needing a buffer to store packets being retrieved from read(). Before an unknown chunk of data can be read(), I must create a buffer of MAXSIZE. As I can't tell the size of the data until I read() it, I have no choice but to allocate large buffers.
Also, when I want to write() a buffer, it must be in contiguous memory, so concatenated buffers can't be used.
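On the write side, POSIX systems also provide writev() in <sys/uio.h>, which gathers several separate buffers into a single write. A small sketch, where `write_pair` is a made-up helper name and the two-buffer shape is only an example:

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Write two non-contiguous buffers with one system call.
 * Returns what writev() returns: bytes written, or -1 on error. */
static ssize_t write_pair(int fd, const void *head, size_t head_len,
                          const void *body, size_t body_len)
{
    struct iovec iov[2];

    assert(head != NULL && body != NULL);
    /* iov_base is not const-qualified, but writev() only reads from it. */
    iov[0].iov_base = (void *)head;
    iov[0].iov_len = head_len;
    iov[1].iov_base = (void *)body;
    iov[1].iov_len = body_len;
    return writev(fd, iov, 2);
}
```

As with write(), the call may write fewer bytes than requested, so production code would loop on short writes. readv() is the matching call for scattering one read across several buffers, though it does not remove the need to reserve enough space up front.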
Finally, these buffers remind me of the BSD mbuf structures.
I see some of the point in this, but what's wrong with:
/* note: address of */
unsigned char *buffer = 0;
b(&buffer);
free(buffer);
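For completeness, here is one way the other side of that snippet might look. The body of `b()` is purely hypothetical, since the post does not show it: the callee decides the size, allocates with malloc(), and hands ownership back through the pointer-to-pointer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callee: allocates a buffer of whatever size it needs and
 * stores it in *out. Returns 0 on success, -1 on allocation failure.
 * The caller owns the result and must release it with free(). */
static int b(unsigned char **out)
{
    static const char msg[] = "payload of a size only b() knows";

    assert(out != NULL);
    unsigned char *p = malloc(sizeof msg);
    if (p == NULL) {
        *out = NULL;
        return -1;
    }
    memcpy(p, msg, sizeof msg); /* includes the terminating NUL */
    *out = p;
    return 0;
}
```

This works as long as both sides agree that the buffer comes from malloc() and goes back through free(), which is the convention the article's quoted passage is complaining about. For binary data, `b()` would also have to report the length somehow.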
Check out the Apostrophe open-source CMS: http://www.apostrophenow.com/