Secure, Efficient and Easy C programming
cras writes "Feeling a bit of masochist today.. First in the morning I wrote Secure, Efficient and Easy C Programming Mini-HOWTO. And since I already spent a few hours with it, I figured I might just as well see what Slashdot people would think about it."
WARNING: this is not a fp
Move along.
I think we already know how to do that. Don't use WinApi.
----
Go canucks, habs, and sens!
Pick any two.
first post! :D
:) Nice try though.
how much do these slashvertisements cost?
"First in the morning I wrote Secure, Efficient and Easy C Programming Mini-HOWTO..."
Damn. What are your plans for the rest of the day?
Perl is Secure, Efficient and Easy.
"First in the morning I wrote"
So did you wake up early this morning, or are you still up from the night before, like me?
-- "Government is the great fiction through which everybody endeavors to live at the expense of everybody else."
you talk about "memory pools" without saying the word "heap"!
NOTE: This is a temporary location for this document, selected with slashdot people specifically in mind.
Copyright (C) 2002 Timo Sirainen <tss@iki.fi>
Index
- Introduction
- Memory Allocations
  - The Old Ways
  - Data Stack
  - Memory Pools
  - Memory Pool API
- String Handling
  - The Old Ways
  - String API
  - String API with Memory Pool API Support
- Buffer Handling
- Real World Usage
Introduction
Everyone knows about buffer overflows nowadays. Everyone knows some ways to prevent them. But most people still do it in a way that requires them to be unnecessarily careful while coding - even a single carelessly written piece of code can be a security disaster.
None of the ideas I've written about here are new, but most of them are very rarely used in C programs. I think that's mostly because many people don't know about them or just haven't realized how easily they could be used. Sure, there are also people who will never change their way of coding. And there's also the problem that using non-libc functions makes the program bigger, which is pretty annoying with otherwise very small programs.
Besides making your code more resistant to buffer overflows, I think many of these features will actually make writing C code a lot easier and more fun.
I'm not a writer and I'm not too good at English, so sorry about all the spelling and grammar errors :) All of this stuff was written on Sunday morning, tired after being awake the whole night and not being able to do anything useful..
Memory Allocations
The Old Ways:
- Use static buffers - fast, but lack the ability to grow when needed.
- malloc() - slightly slower than static buffers, possible memory
fragmentation, and most importantly it requires the memory to be freed,
which can sometimes be very annoying. It's easy to forget to free
the memory and cause memory leaks. free()ing already freed memory may
also be an exploitable security flaw.
- Garbage collector - this would be the best way to manage memory.
For example OCaml's garbage collector is quite smart in treating
long-living allocations differently from temporary allocations. However
with C it's not really possible - you can't go moving the allocated
memory elsewhere unless you write your program in a special way. So there
are a few simple non-portable garbage collector implementations for C, but
they're not much different from malloc()ing other than that you
don't need to free() memory. They're not foolproof either.
Data Stack
What I haven't yet seen used anywhere outside my own software and some programming language internals (e.g. calling Perl code from C) is using a data stack for temporary memory allocations. It has the most important advantage of garbage collectors: you allocate memory without worrying about freeing it. It also has a few gotchas, but I'd say its advantages are well worth it.
The way it works is simply letting the programmer define the stack frames. All memory allocated within a frame is freed at once when the frame ends. This works best with programs running in some event loop so you don't have to worry about the stack frames too much. Here's an example program:
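A minimal sketch of how such a data stack could work. The names t_push(), t_pop() and t_strdup() appear in this discussion; t_malloc(), the fixed-size arena and the frame limit are assumptions made to keep the sketch short, and a real implementation would presumably grow on demand.

```c
#include <stddef.h>
#include <string.h>

/* One allocation unit, sized and aligned for any basic type. */
union stack_unit { long long ll; long double ld; void *p; };

#define STACK_UNITS 512   /* assumed fixed arena size */
#define MAX_FRAMES  32    /* assumed nesting limit */

static union stack_unit data_stack[STACK_UNITS];
static size_t units_used;               /* units currently allocated */
static size_t frame_start[MAX_FRAMES];  /* units_used saved at each t_push() */
static size_t frame_count;

/* Begin a frame. Returns 0 on success, -1 if frames are nested too deeply. */
int t_push(void)
{
    if (frame_count == MAX_FRAMES)
        return -1;
    frame_start[frame_count++] = units_used;
    return 0;
}

/* End a frame: everything allocated since the matching t_push() is freed. */
void t_pop(void)
{
    if (frame_count > 0)
        units_used = frame_start[--frame_count];
}

/* Allocate from the data stack (hypothetical helper). Returns NULL when
   the arena of this sketch is exhausted. */
void *t_malloc(size_t size)
{
    /* enough whole units to hold size bytes */
    size_t units = size / sizeof(union stack_unit) + 1;

    if (units > STACK_UNITS - units_used)
        return NULL;
    void *mem = &data_stack[units_used];
    units_used += units;
    return mem;
}

/* Duplicate a string into the current frame; const discourages keeping it. */
const char *t_strdup(const char *str)
{
    size_t len = strlen(str) + 1;
    char *copy = t_malloc(len);

    if (copy != NULL)
        memcpy(copy, str, len);
    return copy;
}
```

An event loop would typically call t_push() before dispatching a handler and t_pop() after it returns, so the handler never frees anything itself.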
Advantages over control stack:
Advantages over malloc():
Disadvantages:
The second disadvantage is the most problematic one. It's actually two problems: first, data stack memory shouldn't be mixed with permanent data, and second, it shouldn't be temporarily stored in a location where it could be accessed outside the stack frame.
The first one is easier to handle. The ideal solution would be to have some tag in the C language that would give a warning if it were lost, for example temporary char *str = t_strdup("str"); char *str2 = str; where the str2 assignment would give a compiler warning about the missing temporary tag. But since this isn't possible, we can do almost as well using the const keyword, which has exactly that behaviour but restricts us from modifying the memory. Luckily that's not usually needed.
The second one is much more difficult. I haven't found any good way to handle it other than a) not even trying it, or b) filling the freed memory when freeing the stack frame so the corrupted value is easily noticed at runtime.
And why ever even do that? Because it's simpler in some situations. For example I have this message header parser, which calls a callback function for each header line. Suppose I only want to find the message's Content-Type, so I save it with context->content_type = t_strdup(content_type); Then later I just read it without needing to worry about freeing it. Now, the only problem with this is that it relies on the parser not creating a stack frame around the callback, which would invalidate our saved value. While writing the callback it's easy to check whether that's possible, but what if the parser was later modified without remembering the special needs of this one callback..
Memory Pools
Alloc-only pools can be quite useful for storing larger amounts of return values from some function, especially when the data stack can't be used (the second problem case above). They still provide relatively easy memory management since you only need to free the pool once.
Alloc-free pools could be useful with some applications to prevent memory fragmentation and memory leaks by grouping related data together into single malloc() block. They could also make the program run faster due to (possibly) better CPU cache utilization. Besides performance reasons, they could also be useful for statistics to find out where exactly the allocated memory goes. They'd however need their own internal malloc() implementation so they're not very simple.
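To make the alloc-only case concrete, here is a minimal sketch. All names in it (struct alloconly_pool, pool_init(), pool_alloc(), pool_free_all()) are hypothetical and not the author's pool API; the point is only that individual allocations are never freed and the whole pool is released in one call.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One allocation unit, sized and aligned for any basic type. */
union pool_align { long long ll; long double ld; void *p; };

struct pool_block {
    struct pool_block *next;
    size_t used;              /* units handed out from data[] */
    size_t count;             /* units available in data[] */
    union pool_align data[];  /* C99 flexible array member */
};

struct alloconly_pool {
    struct pool_block *blocks;  /* newest block first */
    size_t block_units;         /* default block size in units */
};

void pool_init(struct alloconly_pool *pool, size_t block_bytes)
{
    pool->blocks = NULL;
    pool->block_units = block_bytes / sizeof(union pool_align) + 1;
}

/* Allocate size bytes from the pool; returns NULL if malloc() fails. */
void *pool_alloc(struct alloconly_pool *pool, size_t size)
{
    /* enough whole units to hold size bytes */
    size_t units = size / sizeof(union pool_align) + 1;
    struct pool_block *b = pool->blocks;

    if (b == NULL || units > b->count - b->used) {
        /* current block is full: chain a new one, large enough for this request */
        size_t count = units > pool->block_units ? units : pool->block_units;

        if (count > (SIZE_MAX - sizeof(*b)) / sizeof(union pool_align))
            return NULL;
        b = malloc(sizeof(*b) + count * sizeof(union pool_align));
        if (b == NULL)
            return NULL;
        b->next = pool->blocks;
        b->used = 0;
        b->count = count;
        pool->blocks = b;
    }
    void *mem = &b->data[b->used];
    b->used += units;
    return mem;
}

/* Release everything allocated from the pool at once. */
void pool_free_all(struct alloconly_pool *pool)
{
    while (pool->blocks != NULL) {
        struct pool_block *next = pool->blocks->next;

        free(pool->blocks);
        pool->blocks = next;
    }
}
```

This sketch wastes the tail of a block whenever a request doesn't fit, which is the usual trade-off for keeping the allocator this simple.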
Memory Pool API
When you have a function that could return a large return value, it's not very efficient to first store it in the data stack and then copy it somewhere else. The traditional way would be to just pass (void *result, size_t size) as parameters. This again has the problem that we might not know at all what buffer size is large enough.
So, one way to deal with this is to give the function a memory pool object which can be used by the function to allocate the correct amount of data. Besides the dynamically created alloc-only and alloc-free pools there could be global pools as well: data_stack_pool and system_malloc_pool. Here's an example:
extern Pool data_stack_pool, system_malloc_pool;
String Handling
This is where most of the buffer overflows have happened. Nowadays people are more aware of the problem, but many still handle strings in an unnecessarily hard way.
The Old Ways:
- strncpy(), strncat(), snprintf() - only snprintf() of these is easy
to use safely but it's still somewhat unportable (Windows). strncpy()
doesn't necessarily NUL-terminate and many people misunderstand how
strncat() works (i.e. in a very stupid and difficult to use way).
- strlcpy(), strlcat() - much better replacements for the above by OpenBSD.
Very unportable, but you can easily create your own ones. But these
can still be used unsafely if the buffer size parameter is wrong or
if the programmer goes playing around with the buffer indirectly, by
e.g. appending single characters and missing size checks (yes, I've seen
this in software that contained "secure" in its name).
- Dynamically allocating the amount of wanted memory and then using
strcpy(), strcat(), sprintf() and direct accessing. This requires you
to be very careful with the string size calculations. I don't understand
why so many people think that's not a problem, they have this "If you
can't calculate the sizes correctly, you're stupid and you shouldn't be
coding at all" attitude. Why bother wasting time with that at all when
you could be doing more important things?
- Dynamically growing buffers, used by for example GLIB, vsftpd, qmail,
djbdns and Postfix. This is definitely the right way; string
manipulation is done through API which discourages - or even disallows -
direct buffer manipulation.
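As a rough illustration of the "create your own" remark about strlcpy(), here is a sketch of a strlcpy()-style copy. The name my_strlcpy() is made up for this example; the behaviour follows the OpenBSD convention of always NUL-terminating (when the destination size is non-zero) and returning the length of the source, so truncation shows up as a return value that is greater than or equal to the destination size.

```c
#include <stddef.h>
#include <string.h>

/* Copy src into dst (of dst_size bytes), always NUL-terminating when
   dst_size > 0. Returns strlen(src); a result >= dst_size means the
   copy was truncated. */
size_t my_strlcpy(char *dst, const char *src, size_t dst_size)
{
    size_t src_len = strlen(src);

    if (dst_size > 0) {
        size_t n = src_len < dst_size - 1 ? src_len : dst_size - 1;

        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return src_len;
}
```

A caller would write something like if (my_strlcpy(buf, input, sizeof(buf)) >= sizeof(buf)) and treat that branch as the truncation case. As noted above, this only helps if buf really is an array and the size passed in is correct.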
String API
There are several slight variations in how to implement dynamically growing buffers. Most work by allocating a new string object, using it and then freeing it. qmail and djbdns use statically created buffers which are reused and never freed - that's pretty dangerous unless you can be sure the buffer isn't in use anymore. It's actually pretty much the same as having a static char bigbuf[8192] which is used by several functions.
There are two ways to manage a buffer - the first being the explicit way done by e.g. djb:
struct stralloc str = {0};
stralloc_copys(&str, "string");
stralloc_cats(&str, str_variable);
stralloc_catulong0(&str, int_variable);
GLIB supports doing this much more easily:
Or if you wanted a modifiable buffer:
GString *str = g_string_new(NULL);
g_string_append(str, "string");
g_string_append(str, str_variable);
g_string_sprintfa(str, "%d", int_variable);
Some people don't seem to like the printf-style or believe it's unsafe. GCC however gives very good warnings about parameters with incorrect type so I don't think there's any need to worry about it.
String API with Memory Pool API Support
Nothing fancy here really. Just by being able to specify the memory pool to use, we can easily allocate strings from the data stack. I've made several wrapper functions so that instead of p_strdup_printf(data_stack_pool,...) I can just use t_strdup_printf(). Here's a list of a few useful functions:
const char *t_strdup(const char *str);
char *t_strdup_noconst(const char *str);
const char *t_strdup_empty(const char *str);
t_strdup_noconst is there mostly to avoid casting const away in those few situations where it's needed. Because it returns char * it could be more easily mixed with permanent data, so its usage should be kept minimal.
t_strconcat() is one function that I also copied from GLIB. It's a bit dangerous though, the terminating NULL is too easy to forget. I've been thinking about removing it entirely, but it's much more efficient than t_strdup_printf() so I haven't yet had the heart :)
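To show why the terminating NULL matters, here is a sketch of a concatenation function in the same style. strconcat_alloc() is a hypothetical, malloc()-based stand-in, not the author's t_strconcat(): it walks the argument list twice, once to compute the total length and once to copy, and it has no way to know where the list ends other than the NULL sentinel.

```c
#include <stdarg.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Concatenate all arguments into one malloc()ed string. The argument list
   MUST end with (const char *)NULL; forgetting it is undefined behaviour.
   Returns NULL on allocation failure or length overflow. */
char *strconcat_alloc(const char *first, ...)
{
    va_list args;
    const char *s;
    size_t total = 0;

    /* first pass: compute the total length */
    va_start(args, first);
    for (s = first; s != NULL; s = va_arg(args, const char *)) {
        size_t len = strlen(s);

        if (len >= SIZE_MAX - total) {
            va_end(args);
            return NULL;
        }
        total += len;
    }
    va_end(args);

    char *result = malloc(total + 1);
    if (result == NULL)
        return NULL;

    /* second pass: copy each piece into place */
    char *p = result;
    va_start(args, first);
    for (s = first; s != NULL; s = va_arg(args, const char *)) {
        size_t len = strlen(s);

        memcpy(p, s, len);
        p += len;
    }
    va_end(args);
    *p = '\0';
    return result;
}
```

A call looks like strconcat_alloc("foo", "-", "bar", (const char *)NULL), and the caller owns the result and must free() it, which is exactly the burden the data stack version avoids.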
Buffer Handling
Many people concentrate only on string related buffer overflows. Yes, they've been the most common ones, but they're quite well known now and there's more to buffer overflows than just them. The most obvious one is just another type of buffer handling: all the other data that couldn't be called strings. Often people just hack away separate memory allocations and/or size checks for them. The problems are exactly the same as with strings.
If we create a buffer API and write to buffers only through it, we would prevent almost all buffer overflows, even integer related ones, simply because the buffer API implementation is the only part of code that directly writes to memory, and that of course should be well audited not to overflow in any situation.
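A small sketch of that idea, with hypothetical names (struct byte_buffer, bb_init(), bb_append(), bb_free()) that are not taken from the author's code. The append function is the only place that writes to the memory, and its size check is written so that it cannot wrap around even for absurd lengths.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct byte_buffer {
    unsigned char *data;
    size_t used;      /* bytes written so far */
    size_t alloc;     /* bytes allocated */
    size_t max_size;  /* hard upper limit for used */
};

void bb_init(struct byte_buffer *buf, size_t max_size)
{
    buf->data = NULL;
    buf->used = 0;
    buf->alloc = 0;
    buf->max_size = max_size;
}

/* Append len bytes. Returns 0 on success, -1 if the limit would be
   exceeded or memory runs out; the buffer is unchanged on failure. */
int bb_append(struct byte_buffer *buf, const void *src, size_t len)
{
    /* used <= max_size always holds, so this subtraction cannot wrap */
    if (len > buf->max_size - buf->used)
        return -1;

    size_t needed = buf->used + len;
    if (needed > buf->alloc) {
        size_t new_alloc = buf->alloc != 0 ? buf->alloc : 64;

        while (new_alloc < needed) {
            if (new_alloc > buf->max_size / 2) {
                new_alloc = buf->max_size;  /* needed <= max_size */
                break;
            }
            new_alloc *= 2;
        }
        unsigned char *p = realloc(buf->data, new_alloc);
        if (p == NULL)
            return -1;
        buf->data = p;
        buf->alloc = new_alloc;
    }
    if (len > 0)
        memcpy(buf->data + buf->used, src, len);
    buf->used = needed;
    return 0;
}

void bb_free(struct byte_buffer *buf)
{
    free(buf->data);
    bb_init(buf, buf->max_size);
}
```

Callers never touch data directly for writing, so an integer overflow in their own length arithmetic ends up as a rejected append rather than a write past the end of the buffer.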
Buffer API would be somewhat similar to string API, ideally the string API should be created using the buffer API, or they could even be the same. My buffer API currently looks like this:
Then there are several write, append and truncate functions and a few functions to temporarily limit the accessible buffer range. An example:
int fill_buffer(Buffer *buf, struct data *data) { struct bufdata *bufdata;
There is of course also the possibility of buffer overflows while reading data. That's a bit more difficult to prevent except by careful coding, but usually it's not such a big deal anyway. It might crash the program, or at worst could expose some private information to an attacker. If the latter is a real problem, you should use multiple processes and IPC to hide the private data.
Real World Usage
I've written a fully featured IMAP server using pretty much these techniques. The code still needs some cleanups and there's probably a couple of bugs left, but mostly I think it's not too bad :) The library functions in it are MIT licensed, so go ahead and rip them to your own programs. I could try to create some minimal library out of them if there's enough interest.
vsftpd uses its own string API for pretty much everything. In only a few of the files is it even possible to create a buffer overflow with the coding style used.
It does look like a good start, add a few more chapters and you will be halfway there...
Secure, Efficient and Easy C programs YOU!
"From the oxymorons dept."
Why does Secure, Efficient, and Easy have to be oxymoronic? That they can't exist together? Seems this guy has done just those things...
Seriously, sometimes the snide little remarks from the editors are worse than the trolls and flamebaits.
1) Use python with C bindings
Why not fork?
It's kind of funny how this guy voluntarily slashdotted himself by submitting an article with a link to his own site, crashing it instantly. :)
I found strlcat and strlcpy easily ported - simply toss them in the same .c file and dump it into the makefile!
;) Eh, I suppose we all have a certain way of doing things that we don't wish to part with. (*points at the unsafe buffer people*)
On a more serious note, why in Bob's name don't these two functions exist, standard, in Linux? IMO, they should be added, and gcc should give deprecation warnings about the use of non-safe buffer handling functions - sprintf, strcat, strcpy, etc. No offense to purists, but screw the standard. I'll sacrifice some portability of software and such for security.
Oh, and on a side note, you may take my malloc() when you pry it from my cold dead fingers.
in that folks who use C can avoid common pitfalls. But so much of this seems like it has been tackled by C++. Only C++ did it cleaner. C++ is complex though. So this only leaves (horrors) a higher level language that removes all of these implementation details that lead to insecure programs.
Do it in a higher-level language first. Make sure your algorithms are clean and efficient. If and only if you see a performance or resource problem do you rework portions(!!!) in C. As a bonus, the higher level language acts as a code template for faster C development.
Once you are at that point, this Mini-HOWTO will definitely be a great resource to use.
- I don't need to go outside, my CRT tan'll do me just fine.
The way it works is simply letting the programmer define the stack frames. All memory allocated within the frame are freed at once when the frame ends. This works best with programs running in some event loop so you don't have to worry about the stack frames too much. Here's an example program:
That sounds a little like the NSAutoReleasePool in Cocoa/OpenStep. Objects use reference counting, when the count reaches 0, they deallocate themselves. When an object is created, it can get added to the most recent pool. When the pool is deleted, it decrements the reference count of all the objects within it, causing deallocation unless it needs to be kept around longer.
Do you even lift?
These aren't the 'roids you're looking for.
when it just won't flush. Just don't get me started about overflow.
--this post has caused an invalid page fault--
FUCK YOU
My name is Bond, Troll Bond.
Slashdot [slashdot.org]
Dunno, like any other good Slashdotter I reply before reading anything.
I'm going to start putting that at the end of everything I write so that people can't criticize anything I do. As a matter of fact... I think I'll only write on Sunday mornings after not sleeping the night before. It seems like it's always Sunday morning anyways.
Sex - Find It
cras, why did you choose to write the IMAP server in C ? You mentioned OCaml in passing in the article; would that have worked ?
I gave it a quick read, and it's a good start. Most important thing is to keep working on it. There are already so many problems with people that never take the time to learn how to code securely, you don't want people stopping by, reading only what you have, and thinking they can now pump out error-proof code. And that is exactly what will happen if you leave just that part up.
Not attacking you, I really do think it's great work so far. Make sure you do the world justice, however, and continue to work on the project.
Sig.
But I can't follow this HOWTO very well. Does he have a global variable stored in the file with t_push and t_pop so that t_sprintf can use that variable? But if he has a global variable, then all he's really doing is allocating the maximum amount of memory his program will ever need at the beginning, and managing his memory.
Perhaps working until 4 in the morning on C code has drained my ability to understand.
I expect you to die.
You must be to ask slashdot's opinion of your toils!
--Keeping the flame wars alive, one post at a time
Some guy spent a couple of hours writing a first draft of a Howto. Thanks Slashdot, I'm sure glad you didn't let this one slip through the cracks! Besides, who cares about these kludgy ways of handling memory. If you don't wan't to worry about memory allocation use C# or java or something. Otherwise, stop eating quiche and write solid code.
"There's a madness to my method." -mthed
it starts off with denouncing GC as oldfashioned, and then proceeds to tout stack-based allocation, which has been available for ages as the alloca() function (which also has portability problems.)
imho, you should use the Boehm Garbage collector, unless you have code that must be guaranteed to be free of space leaks.
Han-Wen Nienhuys -- LilyPond
"Secure, Efficient and Easy C programming in 24hrs"
Secure, Efficient and Easy C Programming Mini-HOWTO
.
Was anybody else thinking the next line was going to say . .
Pick any two.
Did you really read the strncpy and strncat manpages?
To both zero-terminate and check for truncation is arcane, that's why the OpenBSD ppl made strlcat and strlcpy in the first place.
There are already other secure programming faqs, though AFAIR, they suck too. If I were you, I'd put a HUGE disclaimer to take this page as work-in-progress.
(before flaming, write down the correct code to check for truncation for both funcs)
I read the FAQ but can not program any better than before. is something wrong with me or is I just dumn ?
No "Imagine a beawolf cluster of these C minifaq" obligatory 'jokes' yet?
Im dissapointed...
Okay, let's preface. This guy has a good idea in the memory allocation department.
Problem 1:
It's not easy, nor fast to write. Errors are severe if present and undetected. Code required to be reliable might not be a good place to test this allocation method.
Problem 2:
I'm not entirely sure these concepts are very portable outside of GCC. May not be a big deal to most, but uh, multiplatform code is required in some environments.
Problem 3:
Any speed increase without massive resource wasting is pure dumb luck during heavy usage, unless used in an application that takes little user input or has limits on the amount of input.
Just my $0.02.
It's that way everywhere else, too.
I still have yet to write a single useful C program that I couldn't have done in Perl.
--
the strongest word is still the word "free"
Some of my personal favorites include:
- Exceptions in C. You can get quite natural-looking exception handling in C, with some convoluted macros. I'm sure most hardcore C coders have come up with their own implementations. Many security bugs happen in parts of the code that handle errors, precisely because errors are rare, and those parts of the code don't get tested well. Using a unified, exception-driven approach to error handling can cut down the risks. IF you do it right.
- The alloca() function. This allocates memory directly off the stack, which is freed when the function returns. Very useful for cases where you want a stack buffer but aren't sure how big it needs to be. Like any other stack buffer, you need to take care not to overflow it. There are portability concerns with this function, but it can still be useful.
- Variable-sized block-chained allocators, which pull chunks of memory out of preallocated segments. The segments are chained together in a linked list. Very effective when you need to make a lot of variable-sized allocations, and do it fast, dammit. It also makes freeing the allocated memory blazingly fast, although it's a "free all or none" approach.
- "Hardened" allocators, which allocate blocks in multiples of the page size, and set memory protections in such a way that buffer overruns cause crashes. This is the easiest way to prevent ANY kind of buffer overrun vulnerability, but wastes memory. See Electric Fence.
Look people.. It takes a keen eye and major discipline to write secure C code. It is not impossible. You have to get in the habit of subconsciously checking yourself at EVERY turn. "Am I accessing a stack variable? Am I doing it CORRECTLY?" DISCIPLINE, DISCIPLINE, DISCIPLINE. I fully expect to see the usual barrage of comments to the tune of: "C is outdated, insecure, brittle, yadda yadda..." No. Some PROGRAMMERS are "outdated, insecure, and brittle."
The C language doesn't write bugs. Programmers write bugs. If the programmer can't handle C, then take it away from him. But don't try to take it away from ME.
See HeapAlloc and friends in Win32 for proper implementation.
At any rate, there are better ways to make sure one never has memory problems:
1) Always set a freed pointer to 0. Most architectures have predictable behavior when dereferencing 0 (an exception is thrown).
2) Limit all malloc/free pairs to the same function. If a function just has to allocate and return some buffer, give it a meaningful name to that effect and add a corresponding free version. Then, you can follow the above rule.
3) assert()s are your friend. Use them religiously. They can always be shut off.
4) Use memory tracking software (purify) before ship.
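Rules 1 to 3 can be sketched roughly like this. The macro FREE_AND_NULL and the struct message with its message_create() / message_destroy() pair are hypothetical names chosen for illustration only.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Rule 1: free a pointer and reset it, so later use fails predictably. */
#define FREE_AND_NULL(ptr) do { free(ptr); (ptr) = NULL; } while (0)

struct message {
    char *subject;
};

/* Rule 2: a function that allocates says so in its name... */
struct message *message_create(const char *subject)
{
    assert(subject != NULL);  /* rule 3: check assumptions */

    struct message *msg = malloc(sizeof(*msg));
    if (msg == NULL)
        return NULL;

    size_t len = strlen(subject) + 1;
    msg->subject = malloc(len);
    if (msg->subject == NULL) {
        free(msg);
        return NULL;
    }
    memcpy(msg->subject, subject, len);
    return msg;
}

/* ...and has a matching free function. It takes the address of the
   caller's pointer so that pointer can be reset as well. */
void message_destroy(struct message **msg)
{
    assert(msg != NULL);

    if (*msg == NULL)
        return;  /* already destroyed: nothing to do */
    FREE_AND_NULL((*msg)->subject);
    FREE_AND_NULL(*msg);
}
```

Because message_destroy() resets the caller's pointer, calling it twice on the same variable is harmless instead of being a double free.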
Yes, it's easier to shoot yourself in the foot with C, but you'll gain a huge performance increase. It's all about using the right tool for the right job.
int func(int a);
func((b += 3, b));
If someone wrote a compiler to compile this shit
/* Shit.c, If this compiles, then thanks for writting a shitty compiler */
#include <stdio.h>
#define shitty code
int(main)
{printf("shitty code")}
why not just use visual basic?
say what you want about it, you don't have to use stupid hacks to avoid buffer overflows.
This looks like a good idea for a lib (or more) that covers those issues. It might already exist, but it's not very well known (to beginners).
The other problem is that security issues usually aren't mentioned in general programming tutorials (and books).
If beginners were pointed to techniques like this (with explanations why), lots of typical mistakes would not happen.
--
Stefan
Looking for Developers, new project members, testers or help? Want to provide your abilities ?
DevCounter ( http://devcounter.berlios.de/ )
An open, free & independent developer pool.
He thinks standards should be thrown out.
First off, C++ objects can force the use of all data access through assert()-filled methods, then in optimized mode can be inlined and thus reduced to their C equivalents.
Second, destructors in C++ guarantee clean up of objects, regardless of how you leave scope (natural, return, exception, etc).
Finally, you couple destructors and reference counting auto-pointers, and you have yourself a very nice allocation API that's as easy as Java, but without the performance or unnatural destruction logistics.
pools take heaps in you.
You heard me.
Bite me!
13 year old white supremacists are shitty web designers.
... a beowulf cluster of these "why didn't anyone post an imagine-a-beowulf-cluster-of-these post" posts!
In my last project, I used glib from the ground up. I wrote several thousand lines of code before testing it. I made some very aggressive use of glib and gobject. After the code compiled and did not give any runtime warnings anymore it did not contain a single memory leak (verified using valgrind).
glib contains a lot of useful things: lists, trees, hash tables, memory pools, string handling functions and a lot more, everything thread safe.
gobject contains tools on top of glib like "classes" and "objects". It's not the same as in C++ or Java, but also very useful. Runtime classes or data types, generic object properties, reference counting, signal callbacks, runtime type checking, etc...
The code is now full of g_... and it took longer than usual because I had to read the documentation, but I think these libraries are great, and provide a solution for nearly everything that has to do with abstract data types and dynamic memory allocation.
And it's very lightweight, fast and efficient.
I remember seeing you in a raging homosexual porn movie! You were taking a seven inches cock down your throat!
What if the result would be bigger than the output buffer? strncat() does the "right" thing, and doesn't overflow the buffer. But your string just got truncated! That's probably bad. So, suppose you check for this problem, by examining the string lengths beforehand. You verify that the result will fit, and not be truncated.
But now that you've gone to that trouble, and you know that the result will fit, why bother with the strncat()? Since you already know there is no overflow, you can go ahead and use the (faster) function strcat().
Now, in order to avoid these problems, you might write your own string concatenation function, that first computes the total size needed for the result, allocates it, and then copies the strings into this new buffer. Now, the issue of buffer ownership comes up, and you introduce a new class of possible bugs: memory leaks.
The fact is, in any non-garbage-collected language like C, string handling is a pain in the ass. The problem runs deep, and can't be solved by any quick hack like strlcat().
You watch homosexual porn movies?
Yes, that was me, but it wasn't a homosexual movie. I'm a girl. And yes, I can take a seven inch cock down my throat. I love giving head. If you can find somebody with a seven inch cock, bring him over and I'll demonstrate for you.
Since every male who posts to slashdot is a stunted teenager with an itty bitty penis who feels funny in his bathing suit area whenever he sneaks a look at the brassiere section of the Montgomery Ward's catalog, I'm not really worried about your showing up at my door any time soon.
The gist of this article seems to be just about preventing buffer overruns. Granted, buffer overflows are the source of a great number of security issues, but with the right arsenal of helper functions (see the StrSafe API, these functions are all guaranteed to always null-terminate, from their name you can determine if they take byte counts or character counts, etc.) you can easily prevent 99% of all buffer overruns. The remaining ones are all the weird edge conditions (I've seen buffer overruns that only came about when there was a race condition between two threads, for example). At this point in time, I complain loudly about any code I review that uses strcpy, sprintf, gets, or any other function that writes to a memory buffer but doesn't take a bound. If you write code that is susceptible to these easily preventable bugs, you deserve all the wrath you will incur.
What about all the other aspects of writing secure code? They don't even get mentioned. What about canonicalization errors, trusting input that comes from external sources, or failing to a non-secure state? Sticking to just memory, what about double frees, which was the cause of a security hole in a common library about a year ago?
Even within buffer overflows, what if you pass the wrong buffer size to your safe memory handling function? This may seem like an easy enough bug, but when people are quickly writing code, they may use the correct function but pass sizeof(str) when str is a char *, and is four bytes on most systems, or the wcslen of a unicode string, which returns characters, not bytes, to functions that require a byte length.
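A short sketch of the two size mistakes mentioned above. The helper names wrong_size() and wide_bytes() are made up for this example; the pointer size is whatever the platform uses, not necessarily four bytes.

```c
#include <stddef.h>
#include <wchar.h>

/* The bug: str is a pointer here, so sizeof gives the size of the
   pointer itself, no matter how large the caller's buffer is. */
size_t wrong_size(const char *str)
{
    return sizeof(str);
}

/* wcslen() counts characters, not bytes. A function that wants a byte
   length needs the count multiplied by sizeof(wchar_t); this version
   also includes the terminating L'\0'. */
size_t wide_bytes(const wchar_t *ws)
{
    return (wcslen(ws) + 1) * sizeof(wchar_t);
}
```

With char array[64], sizeof(array) is 64 at the point where the array is declared, but wrong_size(array) returns sizeof(char *). The size has to be passed along explicitly once the array has decayed to a pointer.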
This may be going a bit off an article which the poster admits was written briefly, but what people have to realize is there is no magic bullet to system security. Using some set of functions instead of another will not prevent all security issues, it will only prevent certain security issues. There will always be some new bug or class of attack you will have to guard against. Pretty soon, you're going to find someone is deleting tables from your database because you have a format string like: "select * from foo where name=%s". You may have guaranteed that the user cannot pass an extremely large string to you that will overflow your buffer, but you've just made easy for them to inject arbitrary SQL into your query. And when you start fixing those problems, you're going to see that attackers are getting around your URL filtering by passing you URL's in alternate formats. There will always be new types of attacks, and new types of coding bugs you have to watch out for.
So what do I suggest?
First, keep up to date on attacks and guard your code against them.
Your applications should be secure by default. Don't allow insecure defaults and don't run with features enabled that don't have to be. Every new feature in your program is an area that can be attacked. There was one site I heard of that had a DB server running with a blank admin password. Users would interact with the DB through a web page. However, attacks just bypassed the web page and connected to the database server directly. If the database didn't open up an internet port by default and didn't allow blank passwords by default, the attackers would have had a harder time.
Presume your security measures will fail, because eventually, they will. Don't presume that the attacker can't get through certain boundaries you've set up. Just because the attackers breached your firewall, that doesn't mean they should be able to access your database.
Run everything with the lowest privilege that is required. The above SQL format string can have statements injected into it to delete records or drop tables. What if the app connected to the database with a read-only account, or with a read-only connection? The statement would just fail.
And always remember, you are never finished, because there is no such thing as a secure system. You can unplug your server, drop it into a concrete bunker and have it guarded by men with guns, and it is still not 100% secure.
I still have yet to write a single useful C program that I couldn't have done in Perl.
Can you write a video driver with acceptable performance in Perl? Can you write programs that do things other than text manipulation, such as (say) a 3D engine and make them faster in Perl than in C? Remember that in the real world, time is money because a shorter execution time means lower system requirements and thus a larger market for mass-market desktop applications.
Will I retire or break 10K?
was creating a new c compiler that was built from top to bottom for security. How about creating a new language built just for security? C is very low level and devs have difficulty creating software thats secure.
Well, not quite. An NSAutoReleasePool does not allocate a large region of memory and suballocate objects out of that. What an NSAutoReleasePool does is make it possible to avoid explicitly sending the release message for temporary objects.
For example, from Foo() I allocate an NSObject with [[NSObject alloc] init] and pass that as an argument to Bar(), which takes ownership of it. However, I must then ensure that I release the object, because Bar() is following good coding practices and retains it, so with alloc+retain its reference count is now 2. So instead what I do is Bar([[[NSObject alloc] init] autorelease]), which allocates an NSObject (with ref count one), initializes it, marks it for autorelease, and passes it to Bar(), which retains it (ref count 2) and keeps a pointer to it (presumably it is a method of a class). Coming out of Bar() the ref count is now 2, and perhaps Foo() proceeds to do some other things. Presumably at some point higher up the call stack (or perhaps at the beginning of Foo()) an NSAutoReleasePool was allocated. At the corresponding exit point (either at the end of Foo() or the end of whatever higher-up function) [whateverpool release] will be called. When the pool is released, it will call release on any objects it has been asked to take ownership of. At this point one of two things is true. Either the class that Bar() belongs to has already released the object and thus its reference count went back down to one, and now is going to zero (so bye-bye), or the class that Bar() belongs to has not released the object and doing this release merely brings the refcount back to one, such that when the other owner releases the object, its refcount will be zero and it will be freed.
Sorry if that was confusing, but in reality it's really not. It also really helps out when you are coding functions that allocate ObjectA, then allocate ObjectB, then ObjectC, and then find out something is wrong and need to "roll back" to the beginning. If you allocate an NSAutoReleasePool at the beginning and autorelease everything you alloc, then if you error out you can free the release pool and everything gets released. If you don't error out, you can simply retain what you need and then free the autorelease pool.
Anyway.. what this guy is REALLY talking about is NSZone. NSZone allocates a chunk of memory which other objects will be allocated from. The caveat is that while the memory will be freed, the objects will not be properly destroyed. Now this guy was talking about holding C strings and the like, so this is not a problem. However, had he been holding some C++ or Objective-C objects this would be a problem, as none of the destructors/deallocators would ever be called.
I think what it all boils down to is that programmers need to read more code than they write and that we should really be getting Masters of Fine Arts in Programming. I completely agreed with what Dr. Gabriel said. Programming is about as much like building a bridge as writing poetry is. That is to say.. not much.
Going along with that thought, I think it should be pointed out that /EVERYONE/ here who programs in any language (but specifically C programmers, and ESPECIALLY C++ coders) needs to learn Cocoa and Objective-C. I imagine some of the C++ whiny bitches are going to continue to whine about how much easier and better C++ is, but for those of us who actually prefer to wrangle pointers, Objective-C is where it's at. It's like C with JUST enough object orientation, but not overdone in some committee like C++. Also, one should note that I do like C++ quite a bit, but sometimes there's too many provided ways to do things. With Objective-C, the provided ways are almost all good. In addition, like C or C++ you are not limited to doing it that way, it's just that Objective-C only makes it easy to do good things.
Think for example of wxWindows vs. Microsoft MFC. wxWindows is surprisingly similar to Cocoa (although wxWindows does not do ref counting, so making sure that one and only one class ever owns an object can be problematic at times). MFC, on the other hand, is rather a bear to work with as Microsoft has written it such that an MFC programmer /can/ do things multiple ways, none of which work very well. Obviously this is a generalization, but I think the average MFC programmer will understand where I'm coming from here. That is, again, except for the whiny C++ and MFC bitches who can't figure pointers out. Go home!
How to program in an amazing unthread safe manner!
I think the HOWTO should have a reference to obstacks, rather than claiming data stacks are a new invention. (Hint: data stacks have been used many, many times in many, many projects. GNU obstacks are the only one for which I can find a URL at the moment.)
Patrick Doyle
I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....
We would have been better off with assembly for the stuff that needs speed and higher languages for anything else. C is an abomination. K&R were smoked up when they came up with C. I guess that's why I can't stand Unix or Unix like operating systems.
Assembly,
Cobol,
Java,
VB,
C#,
Eif
Lisp
Get rid of C, it's insecure and hung on too long.
Use C++, Auto pointers and STL.
Seriously though, why are most developers sticking with C at all? I know there are some portability issues with complicated templates and namespace mangling in C++, but I've never run into any problems from them porting code from Linux->Unix->Win32->Mac. What is the allure to sticking with C over C++?
I suppose most people think of C++ and they immediately think that everything has to be purist OOP development, but if you can forget that, C++ is basically exactly what its title says: C plus extra flexibility.
Besides, classes are effectively the same thing as having a pointer to a struct and passing it into procedures to do something with. A really common style in C programs. Except the passing around of the structure, and the new and delete's on them are all hidden in de/constructors and the 'this' pointer.
Aaron
AaronCameron.net
The main problems with it versus broader garbage collection schemes are circular references and overhead.
If two (or more) objects have a reference to one another, the count can never reach zero even if nothing in the main logic points to those objects anymore.
Also, every time an object gains or loses a reference, a check for a count of zero is made. In fuller garbage collection setups, periodic checks are made to all of the objects in a low-priority thread. In some cases, memory usage can be higher, but performance is also higher sometimes and it can handle circular references.
Both are better than repeated use of malloc/free and new/delete though.
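To make the circular reference problem concrete, here is a minimal sketch in C. The node type and the node_new/node_ref/node_unref names are made up for illustration and are not taken from any library mentioned in this thread. Two nodes point at each other, the caller drops both of its references, and neither count reaches zero until the cycle is broken by hand.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted node; all names are illustrative. */
struct node {
    int refcount;
    struct node *peer;          /* counted reference to another node, or NULL */
};

static int live_nodes = 0;      /* nodes allocated and not yet freed */

static struct node *node_new(void)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        abort();
    n->refcount = 1;            /* the caller holds the first reference */
    n->peer = NULL;
    live_nodes++;
    return n;
}

static void node_ref(struct node *n)
{
    n->refcount++;
}

static void node_unref(struct node *n)
{
    assert(n->refcount > 0);
    if (--n->refcount == 0) {   /* the zero check made on every release */
        if (n->peer != NULL)
            node_unref(n->peer);
        free(n);
        live_nodes--;
    }
}

/* Build a two-node cycle, drop the caller's references, and report how
 * many nodes were still allocated at that point. */
static int cycle_demo(void)
{
    struct node *a = node_new();
    struct node *b = node_new();
    a->peer = b; node_ref(b);
    b->peer = a; node_ref(a);

    node_unref(a);              /* 2 -> 1: still referenced by b->peer */
    node_unref(b);              /* 2 -> 1: still referenced by a->peer */
    int stuck = live_nodes;     /* both nodes are unreachable but alive */

    /* Break the cycle by hand so the demo itself does not leak. */
    struct node *tmp = a->peer;
    a->peer = NULL;
    node_unref(tmp);            /* frees b, which in turn releases a */
    return stuck;
}
```

Calling cycle_demo() reports two stuck nodes; a tracing collector would have reclaimed them without the manual step.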
--
C also muddies this concept because there are no objects in C.
- I don't need to go outside, my CRT tan'll do me just fine.
From the FreeBSD manpage:
ALLOCA(3) FreeBSD Library Functions Manual ALLOCA(3)
NAME
alloca - memory allocator
LIBRARY
Standard C Library (libc, -lc)
SYNOPSIS
#include <stdlib.h>
void *
alloca(size_t size);
DESCRIPTION
The alloca() function allocates size bytes of space in the stack frame of
the caller. This temporary space is automatically freed on return.
RETURN VALUES
The alloca() function returns a pointer to the beginning of the allocated
space. If the allocation failed, a NULL pointer is returned.
SEE ALSO
brk(2), calloc(3), getpagesize(3), malloc(3), realloc(3)
BUGS
The alloca() function is machine dependent; its use is discouraged.
FreeBSD 5.0 June 4, 1993 FreeBSD 5.0
You should more clearly mark what gain can be expected from which measure. Allocating on the stack (with alloca() or something similar) gains you speed and some convenience, but no security (buffer overflows are more readily exploited to inject harmful code if the buffer is allocated on the stack).
You failed to describe what's wrong with strncat(), strncpy() etc. IMHO people who can't comprehend the man pages for those functions probably should avoid C altogether, and definitely must be kept from writing security-relevant software (as should sleep-deprived coders who try to do it on a Sunday morning ;-} ).
That said, I can only appreciate your attempt to raise this issue (once more, maybe for a new generation of C coders).
spellcheck.slashdot.org!
Montgomery Ward has been out of business for about 5 years now. Where have you been?
Seems a bit on the "strings" side, so I assume the text is not complete.
What I wanted to read about was how to create modular programs with C, as in function pointer arrays and how to generally modularize the application. My attempts at building larger apps have resulted in instability, and I do not want to get into C++. Maybe some details on how to allocate mem less frequently in larger chunks would also be useful..
"Give orange me give eat orange me eat orange give me eat orange give me you." -Nim Chimpsky
Regarding scanf(3), many people don't realize this is Bad:
scanf("%s %s", cmd, arg);
This is Good:
scanf("%79s %79s", cmd, arg);
This prevents a buffer overrun if a word contains 80 or more consecutive non-white characters.
Ditto for sscanf(3) and fscanf(3). Never forget the N+1 when declaring the arrays (eg. char s[80] vs %79s) to leave room for the terminating NUL.
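A small sketch of the width-limited form with sscanf(3). The parse_command name and the WORD_MAX constant are made up for the example; the point is only that the array size is one more than the width in the format string.

```c
#include <stdio.h>
#include <string.h>

#define WORD_MAX 79     /* must match the %79s widths in the format below */

/* Split "cmd arg" from line into two fixed-size buffers.
 * Returns the number of words converted (1 or 2), or EOF if line has none. */
static int parse_command(const char *line,
                         char cmd[WORD_MAX + 1], char arg[WORD_MAX + 1])
{
    cmd[0] = '\0';
    arg[0] = '\0';
    /* Each %79s stores at most 79 characters plus the terminating NUL. */
    return sscanf(line, "%79s %79s", cmd, arg);
}
```

Note that an over-long word is not rejected, only split: the first 79 characters land in cmd and the next 79 in arg, so a caller that cares should check the lengths afterwards.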
Here's a good command to run on all your .c files to find such problems:
And in a document like this, *definitely* point out the whole gets(3) problem; the granddaddy of them all. Never use gets(3), period. Use fgets(3) instead.
The gets(3) interface is inherently insecure; a problem waiting to happen by its mere existence. Any code that uses it is broken.
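For comparison, a sketch of a gets(3) replacement built on fgets(3). The read_line helper is hypothetical; what it does with over-long lines (truncate and discard the rest) is one possible policy, not the only sensible one.

```c
#include <stdio.h>
#include <string.h>

/* Read one line into buf (at most size-1 characters), strip the newline,
 * and discard whatever did not fit. Returns 1 on success, 0 on EOF/error. */
static int read_line(FILE *fp, char *buf, size_t size)
{
    if (size == 0 || fgets(buf, (int)size, fp) == NULL)
        return 0;

    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';
    } else {
        /* The line was longer than the buffer (or ended at EOF):
         * skip the remainder so the next call starts on a fresh line. */
        int c;
        while ((c = getc(fp)) != EOF && c != '\n')
            ;
    }
    return 1;
}
```

Unlike gets(3), the caller states the buffer size, so the worst case is a truncated line rather than memory being overwritten.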
There are probably some others (someone mentioned strcpy) I'll try to post more if I think of them.
Java!!
I tried for 5 years to come up with a clever sig...only to realize that I am not clever.
Java??
I had a quick peruse at the web site. I must admit the vector class in the C++ STL is well worth learning. It's not as quick as the usual error checking you get with arrays, but it is very secure. And once you know that you can move onto lists and maps.
;o)
But hey, it's not C. Ohhhh for a program that is so power hungry I have to write it in pure C.
strncpy(), strncat(), snprintf() - only snprintf() of these is easy to use safely but it's still somewhat unportable (Windows).
What? You're saying windows doesn't have those functions?! What a load.
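Whatever the platform situation, the difference between the two calls is easy to show. This sketch assumes a C99-conforming snprintf(); the copy_string names are made up. strncpy() writes no terminator when the source has size or more characters, so it has to be added by hand, while snprintf() always terminates and reports truncation through its return value.

```c
#include <stdio.h>
#include <string.h>

/* Copy src into dst (capacity size), always NUL-terminating.
 * Returns 1 if the whole string fit, 0 if it was truncated. */
static int copy_string(char *dst, size_t size, const char *src)
{
    if (size == 0)
        return 0;
    /* C99 snprintf returns the length the full result would have had. */
    int n = snprintf(dst, size, "%s", src);
    return n >= 0 && (size_t)n < size;
}

/* The same copy with strncpy(): truncation goes unreported and the
 * terminator must be written explicitly. */
static void copy_string_strncpy(char *dst, size_t size, const char *src)
{
    if (size == 0)
        return;
    strncpy(dst, src, size - 1);
    dst[size - 1] = '\0';
}
```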
After that you might want to consider publishing something. This kind of 'howto' is of no use to anyone but the writer, as a reminder. No excuses about "this was done in only X hours" or "it's sunday morning and I'm bored". If you need to include those, think think THINK whether you really have anything to say to anyone.
I've seen too many documents and pieces of software that are done with the sole idea of making yourself visible. Sometimes it is fruitful to release unpolished software or documents, but usually it isn't. As you and so many others seem to think, it's always easier to rewrite everything over and over again rather than do it right the first time. I don't think so.
You go on and on about secure software and all you have to say on a topic as broad as this is a couple of lines of stuff every half-decent programmer should know and any decent programmer knows how to avoid. And then you just mention another document that discusses these things in depth. Why write anything then? Just to get your superior memory management (that's been done for decades now) some publicity?
Also your prejudice against all other languages (like C++) is quite hilarious. Of course no-one would use an IMAP server that was done in C++. The first thing people think is the language it was programmed in. Sure. If you'd said only "this is the only language I know" or "it was not written in language X because...", it would be a bit different.
Sorry if this seems too judgemental or personal, but we all see too much of this kind of stuff floating everywhere and not enough of real documents about important topics.
So... amidst the accusations of small penises and furtive masturbation, this is what you choose to argue about?
Whether the final product can be interpreted or not isn't really relevant, nor is whether you need the vm code to make native calls...
It's prototyping. That means it's not the real thing.
a Secure, Efficient, and Easy to use Windows OS.
Rich.
libguestfs - tools for accessing and modifying virtual machine disk images
Some of the ideas aren't bad (and those have been done before), but mostly it's just another simple dynamic string library in C.
As for efficiency...
...this pretty much speaks for itself. Why is strconcat() so efficient compared to just doing strcat() multiple times? Because you've got a model for representing the data that has ZERO metadata, and a model for storing the data that requires you to reallocate bits of memory all the time.
Assuming you can just discount all this overhead by using memory pools is a simplistic outlook (for instance, even if you waste gobs of memory so you don't have to call malloc that much, you'll still need to do copies all the time).
There are more than a few much better string libraries out there for C. Probably the best for an IMAP server is Vstr, as that was designed to work well in an I/O context (for instance it doesn't need strconcat()-like calls in the API because doing repeat adds is just as fast).
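The "zero metadata" point can be sketched like this. A plain C string does not record its own length, so every strcat() has to walk the destination from the start to find the end. A concatenation helper that keeps a pointer to the write position avoids that rescan. concat_all is a hypothetical name, not the HOWTO's strconcat() or any library's function.

```c
#include <stdlib.h>
#include <string.h>

/* Join count strings into one malloc()ed result. Caller frees it.
 * Sketch only: the length sum is not checked for overflow. */
static char *concat_all(const char *const parts[], size_t count)
{
    size_t total = 0;
    for (size_t i = 0; i < count; i++)
        total += strlen(parts[i]);

    char *result = malloc(total + 1);
    if (result == NULL)
        return NULL;

    char *end = result;                 /* always the next write position */
    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(parts[i]);
        memcpy(end, parts[i], len);     /* no rescan of what is already there */
        end += len;
    }
    *end = '\0';
    return result;
}
```

A string type that stores its length alongside the data gets the same effect for repeated appends, which is the property being claimed for Vstr above.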
ustr: Managed string API with ave. 44% overhead over strdup(), for 0-20B
Postfix (IBM Secure Mailer, written by Wietse Venema) is a very good example of how secure, modular, and elegant programming can be accomplished with C.
It's probably some of the best written C code I've ever seen.
This is not a "nice try". It is a FAILURE. There are no shades of gray with first posts, only FIRST and FAILURE, and he has FAILED.
Nothing new here, APR has been doing all this for quite a while now.
Actually it's a rewrite of the theodice problem (spelling anyone?) or The Problem of Evil, or at least that's my interpretation of it.
about 15 years ago IIRC, see http://herd.plethora.net/~seebs/c/10com.html Still very good advice.
Never forget that C is just a machine independent assembler, you need to have a good understanding of how machines really work to be able to write good C programs.
Also: plan the code and code the plan. C is a language that bites you if you are sloppy.
Why is this bollocks?
To summarise the articles: a bunch of small libraries providing object-based memory allocation and string handling.
Kudos to the poster for enabling himself to write code in a way that's good for him. But that doesn't mean it's good for anyone else.
For example, I'm not going to go and learn 20 more function names and have more library dependencies and I wouldn't recommend anyone else does either.
Finally, suppose one wants a better string library or memory library. There are already plenty of good, with-much-work-done-on-them, open source libraries out there. Tried and tested. Not to mention the C++ STL.
Pick one that means many other people will also be able to read your code and be familiar with the libraries you use! There's nothing I'd hate more than working on a project written by someone using these libraries. Not only do I have to analyze the code, I have to analyze these libraries, and also manage to keep them and their quirks in my head while I am reading the program. Yuck.
Hacking away using C
Of course Bjarne Stroustrup would say this, but he has some nice examples backing his statements up, too. See his FAQ and his paper on "Learning Standard C++ as a New Language".
Stroustrup explains some nice details on especially this issue of memory constructs. He makes a convincing argument for why C++ is easier for C-style programming... Especially for those of you (one I saw below) who "don't want to get into C++", realize that you can edge into it pretty easily, and accomplish your tasks more easily and quickly -- give it a try!
Guess what? I got a fever! And the only prescription.. is more cowbell!
- O'Caml is a marvel of strongly typed object orientation, but you'd hardly know it from using it -- there are almost no C-style type declarations; as a ML child, O'Caml uses type inferencing to prove powerful assertions about program validity and improve programmer convenience. It's compiled! And if you watch the ICFP's, you might note that it consistently beats C compilers for speed of execution. '92, if I recall.
- I never really bought OO, so S/ML is fine by me. Still compiled, since 1984.
- And they both descend from ML, started in 1973.
- Lisp was compiled in 59 or 62 (McCarthy or 1.5, choose your valid date). But then, I suppose it'd have to be compiled, since the notion of interpreted code hadn't been conceived of yet!
- Erlang is the last, best, word in concurrent programming. If you want to write a high throughput, reliable threaded application, you shouldn't even think of the word 'C'. This broke out of its lab in '87, first compiler in '91.
- Scheme is often thought of as a testbed for interpreted language concepts, but even it can be compiled, and with concepts such as continuations that can actually make a C programmer's head explode! Since 1982, commercial grade compilers have been available.
- Even Haskell is compiled, but as monadic programming is less than 10 years old, no one knows how to always write really fast code in it yet. Leave your number, we'll call you in 2034, right before you gear up to deal with your year 2038 rollover crisis.
Welcome to the late 1970's! We look forward to your eventual arrival in the 80's and early 90's. Please enjoy your stay!
P.S. As modern coding is more about the manipulation of very complex structures than about how to, say, walk a linked list, a higher level language with native support for more complex constructs has the potential for creating much faster applications than something on the level of C. The reason is that the h/l compiler can reason about, and thus optimize over, larger components than the C compiler.
You can write C code to extend Python. It's a very common technique. The advantage of C, and it is a big advantage, is speed... but most programs don't need that speed advantage everywhere, only in a few intense and heavily-used operations. (Optimize the innermost loop...)
The advantages of Python for almost every other operation are really too numerous to list.
Your point about "right tool for the right job" is well taken. _Good_ Python programmers learn the C extension API, and use it when appropriate. Guido van Rossum, the creator of Python, even states in one of his papers "If you feel the need for speed, go for built-in functions - you can't beat a loop written in C."
It's rare that you're presented with a knob whose only two positions are Make History and Flee Your Glorious Destiny.
Why do you say the Boehm GC can't do much with C? Have you actually tried it? Operation with C is one of its major strengths, and reasons for existence. It automatically collects allocated data that's no longer referenced. For memory management, what more do you need? Sizing buffer allocations is a separate issue, which can be dealt with separately.
The one problem Boehm GC can have is if data on the stack or in the heap happens to look like it contains pointers to allocated data, but doesn't actually, which can lead to space leakage. In practice, in most applications, this isn't a problem. If it is, in many cases, there are ways it can be dealt with.
The author doesn't address efficiency at all.
The author misses a crucial point: overflows and underflows are caused by false assumptions about the INTEGRITY OF THE CONTENTS OF THE INPUT to the program, either from files, sockets, arguments, etc.
For example, if a web server sends a false HTTP "Content-length" header, then my browser had better not trust this information. It has nothing to do with miscalculating array sizes. It's not really a bug in the browser either, since the browser code ADHERES TO THE HTTP SPECIFICATION and assumes that the server does, too.
The problem is in the assumption about the integrity of the data passed to the program. Get it?
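In code, not trusting the peer comes down to checking the announced length against what was actually received and against the space available before using it. The sketch below is not taken from any browser or HTTP library; copy_body and its parameters are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Copy a message body whose length was announced by the peer.
 *   claimed_len  - length the peer says the body has (untrusted)
 *   received_len - number of bytes that actually arrived
 * Returns the number of bytes copied, or -1 if the claim cannot be honoured. */
static long copy_body(char *dst, size_t dst_size,
                      const char *received, size_t received_len,
                      size_t claimed_len)
{
    if (claimed_len > received_len)     /* peer announced more than it sent */
        return -1;
    if (claimed_len > dst_size)         /* would not fit in our buffer */
        return -1;
    memcpy(dst, received, claimed_len);
    return (long)claimed_len;
}
```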
Why use C when you can use Java/C# and other evolved languages?
I mean, is C more "eleet"? If so, why don't you write ASM then?? Stop wasting your time with all that old shit, go forward!
Secure, Easy, Efficient C? Don't they call that...
Close but sadly lacking the last adjective
A girl named Stefan? Watch out fellow ACs, this is one of them evil NAMBLA freaks lurking in a 'chat room' like we keep hearing about...
As a matter of interest, how do you combine cleanup and exceptions in C? I mean if you throw an exception, you want this memory deallocated and this file closed and this lock released and so on.
I can think of a few ways to do it, but none of them are anywhere near "natural-looking".
Do you have any references?
(No, this is not a pro-C++ troll. I'm genuinely curious.)
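One idiom often used for the cleanup half of this question is a single exit label: every resource starts out in a known "not acquired" state, every failure jumps to the label, and the label releases whatever was acquired. It is not exception handling, and whether it looks natural is a matter of taste. count_bytes below is a made-up example function.

```c
#include <stdio.h>
#include <stdlib.h>

/* Count the bytes in a file. Returns -1 on any failure.
 * All resources are released on every path through the cleanup label. */
static long count_bytes(const char *path)
{
    long result = -1;
    long total = 0;
    size_t n;
    FILE *fp = NULL;            /* NULL means "not acquired yet" */
    char *buf = NULL;

    fp = fopen(path, "rb");
    if (fp == NULL)
        goto cleanup;

    buf = malloc(4096);
    if (buf == NULL)
        goto cleanup;

    while ((n = fread(buf, 1, 4096, fp)) > 0)
        total += (long)n;
    if (ferror(fp))
        goto cleanup;

    result = total;             /* success: fall through to cleanup */

cleanup:
    free(buf);                  /* free(NULL) is a no-op */
    if (fp != NULL)
        fclose(fp);
    return result;
}
```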
sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});
The point of K&R was to jam programs onto paper books. Using K&R is to a computer what QWERTY is. It's utterly stupid to use K&R for actual programs on a computer -- BSD style is clearly the way to go. :-)
May we never see th
If you use glib (that blessed library that makes C programming pleasant instead of miserable, not to be confused with glibc), you get a couple of extremely handy functions:
g_strconcat() (mallocs a new string that's just large enough, concatenates all passed in strings to it)
g_strdup_printf() (mallocs a new string that holds the result of a sprintf()).
Most people's exposure to glib is usually through gtk+, which uses it heavily.
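For anyone who does not want the glib dependency, the idea behind g_strdup_printf() can be approximated with C99 vsnprintf(): format once to measure, allocate exactly that much, then format again. dup_printf is a hypothetical helper, not glib's implementation.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Format into a freshly malloc()ed string of exactly the right size.
 * Returns NULL on formatting or allocation failure. Caller frees it. */
static char *dup_printf(const char *fmt, ...)
{
    va_list ap, ap2;
    va_start(ap, fmt);
    va_copy(ap2, ap);           /* the argument list is consumed twice */

    int len = vsnprintf(NULL, 0, fmt, ap);      /* first pass: measure only */
    va_end(ap);

    char *s = NULL;
    if (len >= 0) {
        s = malloc((size_t)len + 1);
        if (s != NULL)
            vsnprintf(s, (size_t)len + 1, fmt, ap2);    /* second pass: write */
    }
    va_end(ap2);
    return s;
}
```

Something like dup_printf("%s-%d", name, id) then never needs a guessed buffer size.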
Wee bit of a functional bias, eh?
You're right that C limits large-scale optimization too much, though...
Many C++ programs and class libraries suffer from Smalltalk-envy. They want to be completely general frameworks, solving any possible future problem you might ever encounter. You find madness, bloat, and slipped deadlines if you go down that path.
But if you stick to the standard C library (which is part of C++) and approach C++ like you would C, you can write programs that work with C-like efficiency yet are considerably easier to make safe and secure.
For example, instead of the bloated and hard to debug STL, just define a 15 line template array class that does array bounds checking and deallocates its pointer when it's done.
OK, so that subject's a little inflammatory. But my god, I don't see why anybody is still writing new code in C in this day and age. C++ has been a fast, stable, standardized language for what - 10 years now? All the problems with buffer overflows that require hokey, kludgey workarounds in C are cleanly solved with any well-written string library (like, say, the one in the STL). Memory pools can be nicely wrapped with a class that pushes in the ctor and pops in the dtor, so you don't have to remember to call them in the right order everywhere (just declare an object at the top on the block).
The arguments I've seen against C++ seem to fall into the following categories:
* It adds bloat and it's slow
No, not since optimizing compilers were perfected in the 90s. You can add a lot of overhead to your app by abusing the STL, but for non-trivial applications, you'll never notice it. GCC (at least for the pre-3.0 series) has a really unoptimized template implementation, where "Hello, world" using cout would make a multi-megabyte executable (and be forever compiling it), but more modern compilers, like VC++ and Intel's compiler, do a lot better. Either way, for a real-world app, any size increase will be unnoticeable. As for speed, with an optimizing compiler and judicious use of inlining, a C++ program will run just as fast as one written in C.
These complaints may have been true in the days of the Cfront preprocessor, but not today. I don't know about you, but I no longer write code for a 386 with 4 meg of memory.
* I don't like/need/want to learn OOP
You don't need OOP to use C++, but it helps. A class is just a struct where everything's private by default. If you know C, it takes about a day to learn the basics of constructors and destructors, references, and exceptions. Templates and STL will take a bit longer. One great thing about C++ is that you can just use small bits here and there if you don't want a full-blown OO program.
* It's not as good as Perl/Python/Ocaml/Eiffel/Java/whatever
That's not the point. It's not supposed to be. It's supposed to be as good or better than C. If you want a standalone-executable without linking in a complete interpreter and you don't need a lot of string parsing or regexps, you were using C anyway.
* It won't let me write libraries that work with other languages
Just declare all of your external APIs using extern "C" and make sure they only use C types in their signatures. Done.
The main reason not to use C++ in new development seems to be "I don't want to learn it" or "I don't know anything about it". If you use either one, I don't ever want to work with you.
What if life is just a side effect of some other process and God has no idea we exist?
The howto would be much more helpful to people reading if you had implementations illustrating the concepts instead of just showing code using the APIs
...and while doing so explain how to hide data, make friendly header files, properly expose data, in C.
Treatment, not tyranny. End the drug war and free our American POWs.
See my user info for links.
I don't mean to sound pissy about this, but I've been a professional programmer since 1979, and I've seen more sins committed with C than any other programming tool. I've written a few hundred thousand lines of C myself, and avoided the infamous memory and string problems, as have a lot of coders. But as long as people insist on using tools like C that require that level of diligence to create robust code, software will continue to suck.
1. There are more ways to exploit C code than looking for buffer overflows. race conditions are a more prevalent and portably exploitable vulnerability of a large body of C code (eg. config file integrity). Following the author's guidelines barely improves the security of any program. If you want to make your code secure, spend a month reading the 1000s of articles online (or at the ACM or IEEE or CERT/CC) about how software is compromised. This is so much crap.
2. This is so not news for so many reasons. Slashdot is becoming so out of touch (yes, isn't it cool knowing some freak shoved a Pentium giga-twat up his ass and replaced his eyeballs with WiFi LCD projectors?) I sometimes get nauseated reading these stupid "news" items.
Slashdot: News for the Herd. Stuff that goes "Baaaaaa"
is questionable. This is yet another example of something with more geek value than actual utility. Just because you like to use old, obsolete tools doesn't mean they're the most productive. It's the year 2002. This has been said before. Get Java, even C#. Of course the C fanatics will spew about the "performance penalties" of Java. Get over it. Have you seen the latest benchmarks? JVMs have made a lot of progress. If you don't value your time, then by all means, stick with C. Heck, if you're really crazy about performance, use assembly! Otherwise, don't waste your time and use a modern language.
One serious problem I see with all modern C implementations and C in general is the lack of a robust next generation standard C library. The big reason languages like Perl and PHP became popular in the first place was their ease of implementation of common routine tasks. Things like regular expression pattern matching (a good extended regex split(), a good extended regex replace, things that don't exist anywhere in C), good hashing, etc. Things that Perl excels at but C just lacks. No one cared enough to update the core standard reference. In my opinion it's long overdue for an update. I love C because it's compiled and you don't have to pass around the source code for everything you write, super fast and very portable. I just wish it had a good standard C library. The standard C reference library was written like 30 years ago and has not been updated since. It's just a damn shame that everyone out there is forced to re-invent the wheel with C every time they want to do the simplest of tasks, because C doesn't have any good standard C library.
While compiler support is not complete, compilers (including GCC and ICC) provide a powerful subset of ISO C already.
Basically, now you can do things like:
size_t size = compute_size();
char buffer[size];
use_buffer(buffer, sizeof buffer);
You can even do this in a loop, having the stack space reused (reallocated from the stack) in each iteration.
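A sketch of the loop case, assuming a compiler with variable length array support. touch_buffers and the 4096-byte cap are made up for the example. The cap matters because a VLA has no way to report that the stack could not hold it.

```c
#include <stddef.h>
#include <string.h>

/* Create a scratch buffer of each requested size and return the total
 * number of bytes touched. The array is created anew on every iteration
 * and released when that iteration's block ends. */
static size_t touch_buffers(const size_t sizes[], size_t count)
{
    size_t total = 0;
    for (size_t i = 0; i < count; i++) {
        size_t size = sizes[i];
        if (size == 0 || size > 4096)   /* a VLA cannot fail gracefully: bound it */
            continue;
        char buffer[size];
        memset(buffer, 0, sizeof buffer);   /* sizeof is evaluated at run time */
        total += sizeof buffer;
    }
    return total;
}
```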
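ISO C even allows passing these beasts as arguments to functions. The [*] form is only valid in a prototype, and sizeof on an array parameter yields the size of a pointer, so the length has to be passed along explicitly:
void some_function(size_t n, char buffer[*]);
void some_function(size_t n, char buffer[n])
{ use_buffer(buffer, n); }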
(Unfortunately, GCC does not support this yet.)
I recommend taking a look at ISO C!
regards, CJ
Have you ever tried to convert C source code to all-caps? It is more secure this way and suitable for business programming.
qmail and djbdns uses statically created buffers which are reused and never freed - that's pretty dangerous unless you can be sure the buffer isn't in use anymore.
How is this a problem in single-threaded code?
It's actually pretty much the same as having static char bigbuf[8192] which is used by several functions.
Except, of course, that strallocs automatically grow, which is the whole point.
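The reuse-and-grow pattern being argued about can be sketched as follows. This is a hypothetical growbuf type loosely modelled on the idea, not djb's actual stralloc API. The danger mentioned in the quoted sentence shows up in the comments: any pointer into the old contents is stale after a reset or a growth step.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer that is reused rather than freed. */
struct growbuf {
    char  *data;
    size_t len;     /* bytes in use */
    size_t cap;     /* bytes allocated */
};

/* Append n bytes, growing the allocation when needed. Returns 1 on success.
 * Growth may move the data, so pointers into b->data must not be kept. */
static int growbuf_append(struct growbuf *b, const char *src, size_t n)
{
    if (n == 0)
        return 1;
    if (n > (size_t)-1 - b->len)
        return 0;                       /* length would overflow size_t */

    size_t need = b->len + n;
    if (need > b->cap) {
        size_t cap = b->cap ? b->cap : 16;
        while (cap < need) {
            if (cap > (size_t)-1 / 2) { cap = need; break; }
            cap *= 2;
        }
        char *p = realloc(b->data, cap);
        if (p == NULL)
            return 0;
        b->data = p;
        b->cap = cap;
    }
    memcpy(b->data + b->len, src, n);
    b->len = need;
    return 1;
}

/* Keep the allocation, forget the contents. Anything still using the old
 * contents now reads data that the next append will overwrite. */
static void growbuf_reset(struct growbuf *b)
{
    b->len = 0;
}
```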
It's nice to see I'm not the only person who has read glibc-doc. I was very interested in obstacks when I read about them last month but haven't used them yet, and this HOWTO is a nice read also.
:P
I think glibc-doc needs more humour so more people read it.
Pixels keep you awake!
Secure, Efficient and Easy C Programming
* Not possible. Stop using C for the love of god.
---
But seriously, am I the only one that thinks that circumventing C's lack of sophisticated memory management by using a hack involving surreptitiously using stack frame memory is BAD BAD BAD? Certainly not secure or easy?
How about the next howto:
"Shooting yourself in the foot, securely, efficiently and easily!"
It's 10 PM. Do you know if you're un-American?
The best exceptions-in-C implementation I've ever used is buried in SIOD by George Carrette. Despite using setjmp/longjmp, it's portable as all hell.
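For readers who have not seen the technique, here is a bare-bones sketch of try/catch built on setjmp/longjmp. It is not SIOD's code, all names are made up, and it supports only one handler level. Note the volatile qualifier: a local that is modified between setjmp() and longjmp() has an indeterminate value afterwards unless it is declared volatile.

```c
#include <setjmp.h>
#include <stdlib.h>

static jmp_buf handler;         /* where a "throw" lands; one level only */
static int thrown_code;         /* error code carried across the jump */

static void throw_error(int code)
{
    thrown_code = code;
    longjmp(handler, 1);
}

static void risky(int fail)
{
    if (fail)
        throw_error(42);
}

/* Returns 0 on success, or the thrown error code. */
static int run_guarded(int fail)
{
    char *volatile buf = NULL;  /* volatile: changed between setjmp and longjmp */
    int result = 0;

    if (setjmp(handler) == 0) { /* "try" */
        buf = malloc(64);
        risky(fail);
    } else {                    /* "catch" */
        result = thrown_code;
    }

    free(buf);                  /* cleanup runs on both paths */
    return result;
}
```

The weak spot is exactly the cleanup question raised elsewhere in this thread: longjmp() skips every frame between the throw and the handler, so anything those frames acquired has to be tracked some other way.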
To a Lisp hacker, XML is S-expressions in drag.
kamy@mbnet.fi
anttika@mail.suomi.net
Everybody says this, but who has the data to back it up? What 10% are you talking about, the system code for fgets(), or your rendering algorithm?
Or is this another one of those fallacies that authors copy from each other's books (to paraphrase Wolfgang Langewiesche)?
And yet again we can witness this wonderful rule at work...
it's called Net::TCP
*duck*
Did you mount a military-grade, variable-focus MASER on an unlicensed artificial intelligence?