Slashdot Mirror


Secure, Efficient and Easy C programming

cras writes "Feeling like a bit of a masochist today. First thing in the morning I wrote the Secure, Efficient and Easy C Programming Mini-HOWTO. And since I had already spent a few hours on it, I figured I might just as well see what Slashdot people would think about it."

25 of 347 comments

  1. Re:a little short?? by cras · · Score: 5, Informative
    It does look like a good start, add a few more chapters and you will be halfway there...

    Sorry, but I think this is about all I have to say. Secure Programming HOWTO should take care of the rest.

  2. stack allocation?? by hanwen · · Score: 5, Informative
    (yawn)

    it starts off by denouncing GC as old-fashioned, and then proceeds to tout stack-based allocation, which has been available for ages as the alloca() function (which also has portability problems.)

    imho, you should use the Boehm Garbage collector, unless you have code that must be guaranteed to be free of space leaks.

    --

    Han-Wen Nienhuys -- LilyPond

  3. strncat/strncpy are *NOT* intuitive by Alejo · · Score: 4, Informative

    Did you really read the strncpy and strncat manpages?
    Both zero-terminating and checking for truncation with them is arcane; that's why the OpenBSD people made strlcat and strlcpy in the first place.
    There are already other secure programming faqs, though AFAIR, they suck too. If I were you, I'd put a HUGE disclaimer to take this page as work-in-progress.
    (before flaming, write down the correct code to check for truncation for both funcs)
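
    As a minimal sketch of what that challenge asks for, the two helpers below wrap strncpy() and strncat() so that the destination is always terminated and truncation is reported. The names copy_checked() and append_checked() are hypothetical, and the sketch assumes dst_size is the full size of the destination array, terminator included.

    ```c
    #include <stdbool.h>
    #include <stddef.h>
    #include <string.h>

    /* Copy src into dst, whose total capacity is dst_size bytes.
     * dst is always terminated when dst_size > 0.
     * Returns false if src did not fit and was truncated. */
    static bool copy_checked(char *dst, size_t dst_size, const char *src)
    {
        if (dst_size == 0)
            return false;
        strncpy(dst, src, dst_size - 1);  /* may leave dst unterminated... */
        dst[dst_size - 1] = '\0';         /* ...so terminate it by hand */
        return strlen(src) < dst_size;
    }

    /* Append src to the string already in dst (capacity dst_size bytes).
     * dst must already hold a terminated string shorter than dst_size.
     * Returns false if src did not fit completely. */
    static bool append_checked(char *dst, size_t dst_size, const char *src)
    {
        size_t used = strlen(dst);
        if (used >= dst_size)
            return false;
        size_t room = dst_size - used - 1;  /* bytes left before the terminator */
        strncat(dst, src, room);            /* strncat always terminates */
        return strlen(src) <= room;
    }
    ```

    Note the asymmetry the comment complains about: strncpy() takes the size of the buffer and may not terminate, while strncat() takes the remaining room and always does.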

  4. C++ can do this, and it'll look cleaner by ademko · · Score: 4, Informative

    First off, C++ objects can force all data access to go through assert()-filled methods, which in optimized builds can be inlined and thus reduced to their C equivalents.

    Second, destructors in C++ guarantee cleanup of objects, regardless of how you leave scope (natural, return, exception, etc).

    Finally, you couple destructors and reference-counting auto-pointers, and you have yourself a very nice allocation API that's as easy as Java, but without the performance penalty or the unnatural destruction logistics.

  5. Re:Maybe being in the business world by cras · · Score: 2, Informative

    Think about stack and how that works. t_push() and t_pop() basically create and destroy a stack frame, just like your control stack does at the beginning and end of function. So sure it needs to use some global memory for it (not in global variable though), just like control stack does. t_sprintf() simply returns a pointer inside the stack frame.
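
    The frame idea can be sketched roughly as below. This is not the HOWTO's implementation: ds_push(), ds_pop() and ds_alloc() are hypothetical names, and the fixed arena size and nesting limit are arbitrary assumptions made to keep the sketch short.

    ```c
    #include <stddef.h>

    #define DS_SIZE   4096  /* arbitrary arena size for the sketch */
    #define DS_FRAMES 32    /* arbitrary maximum nesting depth */

    static _Alignas(max_align_t) unsigned char ds_mem[DS_SIZE];
    static size_t ds_used;               /* bytes currently handed out */
    static size_t ds_marks[DS_FRAMES];   /* ds_used at each open frame */
    static int ds_depth;

    /* Open a frame by remembering how much of the arena is in use.
     * Returns 0 on success, -1 if frames are nested too deeply. */
    static int ds_push(void)
    {
        if (ds_depth == DS_FRAMES)
            return -1;
        ds_marks[ds_depth++] = ds_used;
        return 0;
    }

    /* Close the newest frame: everything allocated since the matching
     * ds_push() is released at once. */
    static void ds_pop(void)
    {
        if (ds_depth > 0)
            ds_used = ds_marks[--ds_depth];
    }

    /* Allocate n bytes from the current frame, or NULL if the arena is full. */
    static void *ds_alloc(size_t n)
    {
        size_t align = _Alignof(max_align_t);
        if (n > DS_SIZE)
            return NULL;
        size_t rounded = (n + align - 1) / align * align;
        if (rounded > DS_SIZE - ds_used)
            return NULL;
        void *p = ds_mem + ds_used;
        ds_used += rounded;
        return p;
    }
    ```

    A function can return a pointer obtained from ds_alloc() to its caller, and the memory stays valid until the caller pops the frame it opened.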

  6. Re:These are common tricks by pclminion · · Score: 3, Informative
    This is what my "data stack" is trying to fix and do it fully portably. And alloca() still can't be used to allocate return values from it, which I think is the most useful feature gained by using data stack. I don't know about you, but I use a lot of functions that need to return dynamically allocated memory.

    I'm not seeing the difference between your data stack and a memory pool, except that you can divide it into frames which you can collectively pop off, and free entire contexts at once. But by making the data stack independent of the call stack, you introduce the possibility of the two getting out of sync. A context frame should probably always map to a single function invocation. Or to put it another way, a data frame pushed in a particular function call should always be popped by that same function call. And that kind of defeats the purpose of being able to return stack-allocated data UP the call stack.

    In contrast, alloca() is a simple manipulation of the hardware stack pointer, which will be automatically undone by the hardware itself at the end of the call frame (on any sane architecture, that is). There's no possibility for abuse.

    Any strlcat(), strlcpy(), etc. don't solve the underlying problem in all string operations, which is making sure you always have enough room. They prevent overflows, but they can still truncate data without you realizing it's happened. Unless you check first. See my other comment.
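
    One way to "check first" is to compute the required size before copying anything, so nothing can be silently truncated. A minimal sketch, assuming heap allocation is acceptable; join_alloc() is a hypothetical helper name:

    ```c
    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    /* Return a newly allocated string holding a followed by b.
     * Returns NULL on allocation failure or if the combined size would
     * overflow size_t. The caller frees the result. */
    static char *join_alloc(const char *a, const char *b)
    {
        size_t la = strlen(a);
        size_t lb = strlen(b);

        if (lb >= SIZE_MAX - la)        /* la + lb + 1 would overflow */
            return NULL;

        char *out = malloc(la + lb + 1);
        if (out == NULL)
            return NULL;

        memcpy(out, a, la);
        memcpy(out + la, b, lb + 1);    /* includes b's terminator */
        return out;
    }
    ```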

  7. Obstacks by p3d0 · · Score: 3, Informative
    The "data stacks" described in the article sound like a slightly less-evolved version of GNU obstacks. The main difference seems to be that the article uses a single global data stack.

    I think the HOWTO should have a reference to obstacks, rather than claiming data stacks are a new invention. (Hint: data stacks have been used many, many times in many, many projects. GNU obstacks are the only one for which I can find a URL at the moment.)

    --
    Patrick Doyle
    I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....
  8. Re:Overly simplistic by cras · · Score: 2, Informative
    Granted, buffer overflows are the source of a great number of security issues, but with the right arsenal of helper functions (see the StrSafe API..

    That sounds like yet another solution for safe string handling. Like I said, I think they're too highly advertised as being the only way to overflow buffers.

    The remaining are all the weird edge conditions (I've seen buffer overruns that only came about when there was race condition between two threads, for example.)

    Threads? Yeah .. I wouldn't really even bother thinking about security with threads.

    What about all the other aspects of writing secure code? They don't even get mentioned.

    There's now a link to Secure Programming HOWTO which talks about most of the other things just fine. Maybe I could write about a few other things that aren't too well discussed in that HOWTO, like integer overflows (although its next version will contain several of my examples about them).

    Presume your security measures will fail, because eventually, they will.

    Not necessarily. Or are you talking about complete systems here instead of individual applications? If your application doesn't have external dependencies other than libc and you write it fully up to ANSI-C specs (that's a bit difficult actually) and in general you're careful enough, it's theoretically possible your program is secure now and forever. libc, kernel, user, etc. bugs are different things then, although you could try to prevent some of them as well (don't pass dangerous parameters, don't use dangerous functions).

  9. reference counting by ttfkam · · Score: 4, Informative

    The main problems with it versus broader garbage collection schemes are circular references and overhead.

    If two (or more) objects have a reference to one another, the count can never reach zero even if nothing in the main logic points to those objects anymore.

    Also, every time an object gains or loses a reference, a check for a count of zero is made. In fuller garbage collection setups, periodic checks are made to all of the objects in a low-priority thread. In some cases, memory usage can be higher, but performance is also higher sometimes and it can handle circular references.

    Both are better than repeated use of malloc/free and new/delete though.
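
    The circular-reference problem can be shown with a small sketch in C, using a struct in place of an object. The names node_new(), node_ref() and node_unref() are hypothetical, and live_nodes exists only so the leak can be observed.

    ```c
    #include <stdlib.h>

    struct node {
        int refs;            /* number of owners */
        struct node *peer;   /* owning reference to another node, or NULL */
    };

    static int live_nodes;   /* nodes allocated and not yet freed */

    static struct node *node_new(void)
    {
        struct node *n = calloc(1, sizeof *n);
        if (n != NULL) {
            n->refs = 1;
            live_nodes++;
        }
        return n;
    }

    /* Take an additional reference. */
    static struct node *node_ref(struct node *n)
    {
        if (n != NULL)
            n->refs++;
        return n;
    }

    /* Drop a reference. The zero check happens on every release; when it
     * hits zero the node is freed and its reference to peer is dropped too. */
    static void node_unref(struct node *n)
    {
        while (n != NULL && --n->refs == 0) {
            struct node *next = n->peer;
            free(n);
            live_nodes--;
            n = next;
        }
    }
    ```

    If a and b each hold a reference to the other, dropping the program's own references to both leaves each count at 1, so neither is ever freed unless something breaks the cycle by hand.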

    --

    C also muddies this concept because there are no objects in C.

    --

    - I don't need to go outside, my CRT tan'll do me just fine.
  10. What's the gain? by theSilentOne · · Score: 2, Informative

    You should mark more clearly what gain can be expected from which measure. Allocating on the stack (with alloca() or something similar) gains you speed and some convenience, but no security (buffer overflows are more readily exploited to inject harmful code if the buffer is allocated on the stack).

    You failed to describe what's wrong with strncat(), strncpy() etc. IMHO people who can't comprehend the man pages for those functions probably should avoid C altogether, and definitely must be kept from writing security-relevant software (as should sleep-deprived coders who try to do it on a Sunday morning ;-} ).

    That said, I can only appreciate your attempt to raise this issue (once more, maybe for a new generation of C coders).

  11. scanf and friends by usrerco · · Score: 5, Informative
    No such document should be without a mention of scanf(3) misuse, and of any use of gets(3) at all.

    Regarding scanf(3), many people don't realize this is Bad:

    • char cmd[80], arg[80];
      scanf("%s %s", cmd, arg); // BAD

    This is Good:

    • char cmd[80], arg[80];
      scanf("%79s %79s", cmd, arg); // GOOD

    This prevents a buffer overrun if a word contains 80 or more consecutive non-white characters.

    Ditto for sscanf(3) and fscanf(3). Never forget the N+1 when declaring the arrays (e.g. char s[80] vs %79s) to leave room for the terminating NUL.

    Here's a good command to run on all your .c files to find such problems:

    • egrep 'scanf\(.*%s' *.c

    ..any lines that match are a potential problem.

    And in a document like this, *definitely* point out the whole gets(3) problem; the granddaddy of them all. Never use gets(3), period. Use fgets(3) instead.

    The gets(3) interface is inherently insecure; a problem waiting to happen by its mere existence. Any code that uses it is broken.
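
    A minimal sketch of an fgets(3)-based replacement follows. read_line() is a hypothetical helper; the sketch assumes text input without embedded NUL bytes and a buffer size that fits in an int.

    ```c
    #include <stdio.h>
    #include <string.h>

    /* Read one line into buf (capacity size bytes) without the newline.
     * Returns  1 if a complete line was read,
     *          0 if the line was too long (buf holds the first size-1
     *            characters and the rest of the line is discarded),
     *         -1 on end of file or error. */
    static int read_line(FILE *in, char *buf, size_t size)
    {
        if (size < 2 || fgets(buf, (int)size, in) == NULL)
            return -1;

        size_t len = strlen(buf);
        if (len > 0 && buf[len - 1] == '\n') {
            buf[len - 1] = '\0';
            return 1;
        }

        /* No newline stored: either the input ended here, or the line
         * was longer than the buffer. Discard up to the next newline. */
        int truncated = 0;
        int ch;
        while ((ch = fgetc(in)) != EOF && ch != '\n')
            truncated = 1;
        return truncated ? 0 : 1;
    }
    ```

    Unlike gets(3), the caller states the buffer size, and an over-long line is reported instead of overrunning the array.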

    There are probably some others (someone mentioned strcpy) I'll try to post more if I think of them.

  12. Just another C string library by Nevyn · · Score: 3, Informative

    Some of the ideas aren't bad (and those have been done before), but mostly it's just another simple dynamic string library in C.

    As for efficiency...

    t_strconcat() is one function that I also copied from GLIB. It's a bit dangerous though, the terminating NULL is too easy to forget. I've been thinking about removing it entirely, but it's much more efficient than t_strdup_printf() so I haven't yet had the heart :)

    ...this pretty much speaks for itself. Why is strconcat() so efficient compared to just doing strcat() multiple times? Because you've got a model for representing the data that has ZERO metadata, and a model for storing the data that requires you to reallocate bits of memory all the time.

    Assuming you can just discount all this overhead by using memory pools is a simplistic outlook (for instance, even if you waste gobs of memory so you don't have to call malloc that much, you'll still need to do copies all the time).

    There are more than a few much better string libraries out there for C. Probably the best for an IMAP server is Vstr, as that was designed to work well in an I/O context (for instance, it doesn't need strconcat()-like calls in the API because doing repeated adds is just as fast).
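
    To illustrate the metadata point: a bare char * forces every strcat() to rescan the destination to find its end, while a string that records its own length and capacity can append in place. The sketch below is a generic illustration, not Vstr's API; struct sbuf and sbuf_add() are hypothetical names.

    ```c
    #include <stdlib.h>
    #include <string.h>

    struct sbuf {
        char *data;   /* terminated contents, or NULL when empty */
        size_t len;   /* bytes in use, not counting the terminator */
        size_t cap;   /* bytes allocated */
    };

    /* Append n bytes from src. Returns 0 on success, or -1 on allocation
     * failure or size overflow, leaving the buffer unchanged. */
    static int sbuf_add(struct sbuf *s, const char *src, size_t n)
    {
        if (n >= (size_t)-1 - s->len)
            return -1;

        size_t need = s->len + n + 1;
        if (need > s->cap) {
            size_t cap = s->cap ? s->cap : 16;
            while (cap < need) {
                if (cap > (size_t)-1 / 2) {  /* doubling would overflow */
                    cap = need;
                    break;
                }
                cap *= 2;                    /* grow geometrically */
            }
            char *p = realloc(s->data, cap);
            if (p == NULL)
                return -1;
            s->data = p;
            s->cap = cap;
        }

        memcpy(s->data + s->len, src, n);    /* no rescan: end is known */
        s->len += n;
        s->data[s->len] = '\0';
        return 0;
    }
    ```

    Starting from a zeroed struct sbuf, repeated sbuf_add() calls cost time proportional to the bytes added, which is why a variadic concat call buys little once the length is stored.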

    --
    ustr: Managed string API with ave. 44% overhead over strdup(), for 0-20B
  13. Re:future plans? by CaseyB · · Score: 3, Informative

    The One True Brace Style is K&R, by definition.

  14. Re:Unportable? by Anonymous Coward · · Score: 1, Informative

    Er, it's strncpy() and strncat().

    Those have been around for ages and ARE standard on most systems... No porting needed... The "l" version is just the same 'ole thing.

  15. Bjarne says "Use C++" :) by wray · · Score: 3, Informative

    Of course Bjarne Stroustrup would say this, but he has some nice examples backing his statements up, too. See his FAQ and his paper on "Learning Standard C++ as a New Language".

    Stroustrup explains some nice details on especially this issue of memory constructs. He makes a convincing argument for why C++ is easier for C-style programming... Especially for those of you (one I saw below) who "Don't want to get into C++", realize that you can edge into it pretty easily, and accomplish your tasks more easily and quickly -- give it a try!

    --
    Guess what? I got a fever! And the only prescription.. is more cowbell!
  16. I send you this post to have your advice by denshi · · Score: 5, Informative
    Prototyping in a higher-level language (c# is easy, java everyone knows) is a superb idea, provided you
    - can release the final product as interpreted, with slow execution speed
    - can afford the time to port all to C, in which case DO, this is an excellent way to make a watertight C program
    Sir! I wish to introduce to you to the strange new-fangled notion of compiled high-level languages! Yes, languages with higher-than-C-level abstractions have been sneakily producing native machine code for some time now. Some of the most popular are listed below:

    • O'Caml is a marvel of strongly typed object orientation, but you'd hardly know it from using it -- there are almost no C-style type declarations; as a ML child, O'Caml uses type inferencing to prove powerful assertions about program validity and improve programmer convenience. It's compiled! And if you watch the ICFP's, you might note that it consistently beats C compilers for speed of execution. '92, if I recall.
    • I never really bought OO, so S/ML is fine by me. Still compiled, since 1984.
    • And they both descend from ML, started in 1973.
    • Lisp was compiled in 59 or 62 (McCarthy or 1.5, choose your valid date). But then, I suppose it'd have to be compiled, since the notion of interpreted code hadn't been conceived of yet!
    • Erlang is the last, best, word in concurrent programming. If you want to write a high throughput, reliable threaded application, you shouldn't even think of the word 'C'. This broke out of its lab in '87, first compiler in '91.
    • Scheme is often thought of as a testbed for interpreted language concepts, but even it can be compiled, and with concepts such as continuations that can actually make a C programmer's head explode! Since 1982, commercial grade compilers have been available.
    • Even haskell is compiled, but as monadic programming is less than 10 years old, no one knows how to always write really fast code in it yet. Leave your number, we'll call you in 2034, right before you gear up to deal with your year 2038 rollover crisis.
    Welcome to the late 1970's! We look forward to your eventual arrival in the 80's and early 90's. Please enjoy your stay!

    ps. As modern coding is more about the manipulation of very complex structures than about how to, say, walk a linked list, a higher-level language with native support for more complex constructs has the potential for creating much faster applications than something on the level of C. The reason is that the h/l compiler can reason about, and thus optimize over, larger components than the C compiler.

  17. Not to belabor the point by xant · · Score: 3, Informative

    You can write C code to extend Python. It's a very common technique. The advantage of C, and it is a big advantage, is speed... but most programs don't need that speed advantage everywhere, only in a few intense and heavily-used operations. (Optimize the innermost loop...)

    The advantages of Python for almost every other operation are really too numerous to list.

    Your point about "right tool for the right job" is well taken. _Good_ Python programmers learn the C extension API, and use it when appropriate. Guido van Rossum, the creator of Python, even states in one of his papers "If you feel the need for speed, go for built-in functions - you can't beat a loop written in C."

    --
    It's rare that you're presented with a knob whose only two positions are Make History and Flee Your Glorious Destiny.
  18. Re:aggressive use of glib by chtephan · · Score: 3, Informative
    > But portable??? I'm serious, does it currently run on BSD, OS X, Windows, and Linux? And if so, then how much bloat does it add?

    Sure. The libraries are basically standard C. The optional thread library has implementations for pthreads and win32 CriticalSections. And there are even some additional compatibility wrappers to make some things more portable (e.g. listing directories).

    They are just utility libraries or foundation libraries to build more functionality on top of it. Especially the gobject library is great as a foundation to do modular programs.

  19. Re:Exceptions in C? by pclminion · · Score: 3, Informative

    You use a hybrid stack. You push two kinds of objects onto this stack: environment frames and cleanup frames. When you use the TRY macro to open a new exception frame, you push an environment frame onto the stack. This frame includes a jmp_buf struct which is filled in by a call to setjmp(). If an exception is thrown, control will transfer back via longjmp(), using this jmp_buf.

    Then, as you allocate memory, open files, lock mutexes, or whatever, you register cleanup functions with another macro. This pushes a cleanup frame onto the stack indicating a function to call along with a single argument.

    Then, if an exception is thrown, you just pop cleanup frames off the stack and execute them, until you hit an environment frame. At that point you call longjmp() to transfer control to the user-code exception handler, inside the CATCH macro block. That user code can choose to do cleanup or error recovery, or it can THROW again and continue propagating the exception up the stack.

    Figuring out how to implement the TRY, CATCH, ENDCATCH, and THROW macros is fun, so I won't give away exactly how to do it. It's simple, and involves a creative use of a do-while statement.
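
    Leaving the macros as the exercise they are meant to be, the hybrid stack itself can be sketched with plain functions. Everything here is an assumption made for illustration: the names push_frame(), push_cleanup(), pop_cleanup() and throw_error(), the fixed stack depth, and the integer error code.

    ```c
    #include <setjmp.h>
    #include <stdlib.h>

    enum frame_kind { FRAME_ENV, FRAME_CLEANUP };

    struct frame {
        enum frame_kind kind;
        jmp_buf env;            /* FRAME_ENV: where to resume */
        void (*fn)(void *);     /* FRAME_CLEANUP: function to call */
        void *arg;              /* FRAME_CLEANUP: its single argument */
    };

    #define MAX_FRAMES 64       /* arbitrary limit for the sketch */
    static struct frame frames[MAX_FRAMES];
    static int top;
    static int thrown_code;

    static struct frame *push_frame(enum frame_kind kind)
    {
        if (top == MAX_FRAMES)
            abort();            /* sketch: no graceful overflow handling */
        frames[top].kind = kind;
        return &frames[top++];
    }

    static void push_cleanup(void (*fn)(void *), void *arg)
    {
        struct frame *f = push_frame(FRAME_CLEANUP);
        f->fn = fn;
        f->arg = arg;
    }

    /* Normal path: pop the newest cleanup frame and run it. */
    static void pop_cleanup(void)
    {
        struct frame *f = &frames[--top];
        f->fn(f->arg);
    }

    /* Run cleanups until an environment frame is found, then jump to it. */
    static void throw_error(int code)
    {
        thrown_code = code;
        while (top > 0) {
            struct frame *f = &frames[--top];
            if (f->kind == FRAME_ENV)
                longjmp(f->env, 1);
            f->fn(f->arg);
        }
        abort();                /* nothing caught the error */
    }

    static void risky(int fail)
    {
        char *tmp = malloc(32);
        if (tmp == NULL)
            throw_error(1);
        push_cleanup(free, tmp);
        if (fail)
            throw_error(42);    /* frees tmp, then jumps to the handler */
        pop_cleanup();          /* normal exit frees tmp as well */
    }

    /* Returns 0 if risky() completed, otherwise the thrown code. */
    static int demo(int fail)
    {
        struct frame *f = push_frame(FRAME_ENV);
        if (setjmp(f->env) == 0) {
            risky(fail);
            top--;              /* leaving the "try" block normally */
            return 0;
        }
        return thrown_code;     /* the "catch" side */
    }
    ```

    setjmp() is called directly in demo() rather than inside a helper, because the function that calls setjmp() must still be active when longjmp() fires.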

  20. Re:Unportable? by Anonymous Coward · · Score: 1, Informative

    No no no...
    strncpy and strncat do not guarantee to produce a null-terminated string. Instead of overflowing at the strn* call, they return an unterminated string, likely to cause trouble when it next gets used.

    The "l" versions guarantee that they return a null-terminated string.

  21. Re:+1 Insightful, PERL can do more! by Anonymous Coward · · Score: 2, Informative


    When I was a production coder for a decade writing internal apps, I could always find things that I could do in PERL that were almost impossible in C.

    My personal favorite was a typical "data pump" app that did some frontend work for a larger (15 GB+) data stream in order to massage it inline as it went from tape to an Oracle load (we didn't want to stage it on disk first).

    The PERL apps could *ALWAYS* soak a CPU on our Sun box while transforming the data.

    Try as I might, I could never get a C replacement program to do the same. At best, the C code posted percent utilizations in the low double digits.

    Pity, I like C, but if we keep using it, we will never be able to force management to buy another twin-CPU module. PERL rocks!
  22. Re:Unportable? by thogard · · Score: 3, Informative

    char buf0[4]="Ok?";
    char buf1[4];
    char buf2[4]="Hi!";

    strncpy(buf1,"Bonk",4);

    printf(buf1);

    You will usually get either "BonkOk?" or "BonkHi!"

    With strlcpy you will get "Bon"

    strlcpy will put the null in the last position; strncpy won't if the last spot has a value in it.

    That can result in bad code later with:
    char other_buf[4];
    strcpy(other_buf,buf1); // strcpy is safe because buf is only 4 :-)

    Or even:
    strncpy(other_buf,buf1,strlen(buf1)); //buf1 is only 4 char

    strncpy isn't much safer than strcpy, and gets and scanf are right out for the same reasons.

  23. Re:Unportable? by e-Motion · · Score: 2, Informative

    void my_strncpy(char *s, const char *t, int n){ while((*s++ = *t++) && --n) ; *s = '\0'; }

    If written

    void my_strncpy(char *s, const char *t, int n){ while(n-- && (*s++ = *t++)) ; *s = '\0'; }

    Then it will work when n==0, and continue to have the same behavior for all n > 0 as well. I like strlcpy()'s interface better, though, because the "number" parameter is the number of elements in the destination array, whereas my_strncpy takes one less than the number of elements in that array.
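
    For comparison, a minimal sketch of the strlcpy()-style interface, where the size argument is the number of elements in the destination array. It is named my_strlcpy() here to avoid clashing with a system-provided strlcpy(), and it follows the documented OpenBSD behaviour of returning the length of the source string.

    ```c
    #include <stddef.h>
    #include <string.h>

    /* Copy src into dst, which has room for size bytes in total.
     * dst is always terminated when size > 0.
     * Returns strlen(src); a result >= size means truncation occurred. */
    static size_t my_strlcpy(char *dst, const char *src, size_t size)
    {
        size_t len = strlen(src);

        if (size > 0) {
            size_t n = len < size - 1 ? len : size - 1;
            memcpy(dst, src, n);
            dst[n] = '\0';
        }
        return len;
    }
    ```

    With char buf1[4], the call my_strlcpy(buf1, "Bonk", sizeof buf1) stores "Bon" and returns 4, so comparing the result against sizeof buf1 detects the truncation.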

  24. Re:Definitely useful by Hezaurus · · Score: 2, Informative

    Kragg wrote: - can release the final product as interpreted, with slow execution speed

    Hmmm... go to http://www.bagley.org/~doug/shootout/bench/heapsort/ and you'll see that the gcc version is reported as about 8 times faster. Now get the sources and compile them yourself. Run them three to five times to let the system stabilize, then write the timings down. This is what you get:

    gcc with O6 (heavy optimizations) is almost 20% faster than the equivalent Java code! Yet the author of those pages reports that the difference is more like 800%?! BTW: I tried that with three different VMs. The best was IBM's 1.4.0, the second JRockit's 1.4 beta, and the last Sun's 1.4.1 (each about 5% slower than the previous). The Java code was run with '-server'; the platform (Linux/Windows) didn't have any significant effect on the results.

    You are free to draw your own conclusions.

    --
    No matter how fast light travels it finds the darkness has always got there first, and is waiting for it. (T. Pratchett)
  25. Re:K&R designed for paper, not for monitors by psamuels · · Score: 2, Informative
    Yeesh - I never even knew that bracketing styles had their own fucking terminology before.

    So what do you type when M-x c-set-style asks you to pick a style? (:

    Your choices, btw, are: "bsd", "cc-mode", "ellemtel", "gnu", "java", "k&r", "linux", "python", "stroustrup", "user", and "whitesmith". Many of these are only subtly different from each other - frex, "linux" is basically "k&r" with an 8-space basic offset.

    That's why these styles need actual names - you can just blame it all on Emacs. (:

    --
    "How can you claim that you are anti-crack, while still writing a window manager?" — Metacity README