Slashdot Mirror


Secure, Efficient and Easy C programming

cras writes "Feeling a bit of masochist today.. First in the morning I wrote Secure, Efficient and Easy C Programming Mini-HOWTO. And since I already spent a few hours with it, I figured I might just as well see what Slashdot people would think about it."

31 of 347 comments

  1. Secure, Efficient and Easy by Anonymous Coward · · Score: 5, Funny

    Pick any two.

  2. future plans? by napoleonin · · Score: 5, Funny

    "First in the morning I wrote Secure, Efficient and Easy C Programming Mini-HOWTO..."

    Damn. What are your plans for the rest of the day?

    1. Re:future plans? by Reality+Master+101 · · Score: 5, Funny

      Maybe make it a HOWTO rather than a Mini-HOWTO? Hell, I could write a mini-HOWTO right here...

      SECURE

      1) Don't use strcpy.

      2) Don't assume data coming in from the world is within valid limits

      EFFICIENT

      1) Avoid moving/copying large amounts of data whenever possible. Work in place.

      EASY

      1) Don't redefine the language using macros (e.g., #define BEGIN {, #define END })

      2) Comment your source

      3) Use The One True Brace Style. All others are heretical crap.

      Damn, now what do I do with the rest of my day?

      --
      Sometimes it's best to just let stupid people be stupid.
    2. Re:future plans? by 0x0d0a · · Score: 4, Funny

      "First in the morning I wrote Secure, Efficient and Easy C Programming Mini-HOWTO..."

      Damn. What are your plans for the rest of the day?


      "If you've done six impossible things this morning, why not round it off with breakfast at Milliways, the Restaurant at the End of the Universe?"

      -- Douglas Adams

  3. Hmm.. Question by Loki_1929 · · Score: 4, Funny

    "First in the morning I wrote"

    So did you wake up early this morning, or are you still up from the night before, like me?

    --
    -- "Government is the great fiction through which everybody endeavors to live at the expense of everybody else."
  4. a little short?? by AmigaAvenger · · Score: 4, Interesting
    I think this might be a little bit too soon to have it posted on Slashdot; you only have a few pages covering memory and string handling, and nothing groundbreaking at that.

    It does look like a good start, add a few more chapters and you will be halfway there...

    1. Re:a little short?? by cras · · Score: 5, Informative
      It does look like a good start, add a few more chapters and you will be halfway there...

      Sorry, but I think this is about all I have to say. Secure Programming HOWTO should take care of the rest.

    2. Re:a little short?? by crimsun · · Score: 4, Interesting

      I agree that it's a good start. I would also add in links to vsftpd's design & implementation documentation; see here, here, and here. For what it's worth, Maradns's string library is worth examining as well.

    3. Re:a little short?? by dvdeug · · Score: 5, Funny

      Damn true, using C for other thing than low-level stuff really is a bad habit.

      Oh, God, another Visual Basic user who writes code with a mouse. Spare me.

      Yes, because it's better to spend weeks and months carefully constructing a GUI by hand than to put it together in a couple days with a mouse. Especially if it's going to be used by three or four people; by God, it's more than worth it to the company for me to spend two or three months on the project (@ $60,000 a year) so those people can get their results back in a couple seconds rather than a couple minutes.

      It's also better to spend weeks and months writing an efficient text processing program in C and worrying about buffer overflows and memory leaks, rather than writing it in a couple days in Perl or Snobol. Who cares that the results will inevitably be piped to less and studied for a few minutes; the fact that we shaved off 40% of 2 seconds (and added an obscure error case) is more than worth it!

      Actually: Oh, God, another C programmer who will make me suffer through anonymous core dumps because his programming language is so much more macho, and so much more efficient (I really wish he understood how to use Big-O notation and switch algorithms, but he spent so much time programming this one and debugging it that he can't afford to switch. Too bad he doesn't use a language with efficient control structures predebugged and optimized.)

  5. Mirror of HOW-TO in case it gets slashdotted by CableModemSniper · · Score: 5, Funny

    1) Use python with C bindings

    --
    Why not fork?
  6. Unportable? by Anonymous Coward · · Score: 5, Interesting

    I found strlcat and strlcpy easily ported - simply toss them in the same .c file and dump it into the makefile!

    On a more serious note, why in Bob's name don't these two functions exist, standard, in Linux? IMO, they should be added, and gcc should give deprecation warnings about the use of non-safe buffer handling functions - sprintf, strcat, strcpy, etc. No offense to purists, but screw the standard. I'll sacrifice some portability of software and such for security.

    Oh, and on a side note, you may take my malloc() when you pry it from my cold dead fingers. ;) Eh, I suppose we all have a certain way of doing things that we don't wish to part with. (*points at the unsafe buffer people*)
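
    For readers who have not met the two functions discussed above, here is a minimal sketch of their commonly documented behavior: always NUL-terminate when the buffer has room, and return the length the result would have had, so that a return value >= the buffer size signals truncation. The sketch_ names are hypothetical (chosen so they cannot clash with a system strlcpy/strlcat), and this is not the OpenBSD source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for strlcpy: copy src into dst (capacity dstsize). */
size_t sketch_strlcpy(char *dst, const char *src, size_t dstsize)
{
    size_t srclen;

    assert(dst != NULL && src != NULL);
    srclen = strlen(src);

    if (dstsize > 0) {
        /* Copy at most dstsize - 1 bytes, then always terminate. */
        size_t n = (srclen < dstsize - 1) ? srclen : dstsize - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    /* Length we tried to create; a value >= dstsize means truncation. */
    return srclen;
}

/* Hypothetical stand-in for strlcat: append src to dst (capacity dstsize). */
size_t sketch_strlcat(char *dst, const char *src, size_t dstsize)
{
    size_t dstlen = 0;
    size_t srclen, room, n;

    assert(dst != NULL && src != NULL);

    /* Measure dst, but never look past the end of the buffer. */
    while (dstlen < dstsize && dst[dstlen] != '\0')
        dstlen++;
    srclen = strlen(src);

    if (dstlen == dstsize)          /* dst was not terminated: append nothing */
        return dstsize + srclen;

    room = dstsize - dstlen - 1;    /* bytes available before the NUL */
    n = (srclen < room) ? srclen : room;
    memcpy(dst + dstlen, src, n);
    dst[dstlen + n] = '\0';
    return dstlen + srclen;
}
```

    A caller would then write something like if (sketch_strlcpy(buf, input, sizeof buf) >= sizeof buf) and treat that branch as "input was truncated".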

  7. Definitely useful by ttfkam · · Score: 5, Interesting

    in that folks who use C can avoid common pitfalls. But so much of this seems like it has been tackled by C++. Only C++ did it cleaner. C++ is complex though. So this only leaves (horrors) a higher level language that removes all of these implementation details that lead to insecure programs.

    Do it in a higher-level language first. Make sure your algorithms are clean and efficient. If and only if you see a performance or resource problem do you rework portions(!!!) in C. As a bonus, the higher level language acts as a code template for faster C development.

    Once you are at that point, this Mini-HOWTO will definitely be a great resource to use.

    --

    - I don't need to go outside, my CRT tan'll do me just fine.
    1. Re:Definitely useful by Kragg · · Score: 5, Insightful

      Wow. This is one of the few genuinely insightful comments I've come across.

      Prototyping in a higher-level language (c# is easy, java everyone knows) is a superb idea, provided you
      - can release the final product as interpreted, with slow execution speed
      - can afford the time to port all to C, in which case DO, this is an excellent way to make a watertight C program
      - are happy to learn how to make managed code/vm code call to native and vice-versa (this is far from a trivial problem)

      There are apps that fit into all 3 categories, and if your end-result should be a watertight C program, it may even be faster to prototype.

      Fight the conventional wisdom! make good code by doing it right, not by being a genius who can hold 4000 variables in his mind over a month-long project (because you aren't one anyway).

      --
      If you can't see this, click here to enable sigs.
    2. Re:Definitely useful by hobuddy · · Score: 5, Interesting

      ttfkam wrote:
      Do it in a higher-level language first. Make sure your algorithms are clean and efficient. If and only if you see a performance or resource problem do you rework portions(!!!) in C. As a bonus, the higher level language acts as a code template for faster C development.

      Amen.

      Kragg wrote (in his reply to ttfkam):
      Prototyping in a higher-level language (c# is easy, java everyone knows) is a superb idea, provided you
      - can release the final product as interpreted, with slow execution speed


      Most programs spend 90% of their CPU time executing 10% of their code. If that 10% is optimized in a low-level language such as C, a large-scale interpreted program can boast performance that's virtually indistinguishable from an equivalent program written entirely in a low-level language. However, there's likely to be a huge difference in programmer productivity.

      As a reference, see this Dr. Dobbs article, which states:
      """ ... 90 percent of the software's running time occurs in only 10 percent of the code. This is the whole basis for virtual memory: Potentially, a program can run at full speed with only 10 percent of itself--or whatever the working set is--loaded into memory at any given time. Unlike that nasty segment stuff, the programmer does not specify any of this in advance. The operating system "discovers" a program's working set on-the-fly, through page faults.
      """

      - can afford the time to port all to C, in which case DO, this is an excellent way to make a watertight C program

      Why port 90% of the application's code to a low-level and less productive programming language, when that 90% will inevitably evolve and require maintenance as the program is utilized in unforeseen ways? I've never written a large program that didn't end up having features added incrementally over a long period of time after the initial release.

      - are happy to learn how to make managed code/vm code call to native and vice-versa (this is far from a trivial problem)

      If it's "far from a trivial problem", you're using the wrong tool.

      Take Python, for example: it's simple to interface between Python and C using Python's C API. Recently, a tool named Pyrex has appeared that makes it almost trivial. Pyrex is amazing.

      Kragg suggested prototyping in C# or Java, but Python surpasses both of those as a prototyping tool. Python is higher-level than C# or Java (and thus better suited to prototyping and/or malleable fusion with C) because it features:
      - dynamic typing ("dynamic", not "weak" like Perl)
      - no obsession with a particular programming paradigm; use procedural, functional, or OO as appropriate
      - high-level data structures built into the language
      - more convenient dynamic code loading
      - interactive development at a "Python prompt" (the value of this cannot be overestimated)
      - no separate compilation step in the edit-test-debug cycle
      - more concise syntax
      - excellent interface capabilities to C (or C++ via Boost.Python, or Java via Jython)

      I suggest that the fusion of a truly high-level (higher than Java-level) language with C is far more broadly applicable than Kragg claims.

      --
      Erlang.org: wow
  8. data stacks by larry+bagina · · Score: 5, Interesting
    What I haven't yet seen used anywhere outside my own software and some programming languages' internals (eg. calling Perl code from C), is using a data stack for temporary memory allocations. It has the most important advantage of garbage collectors: allocate memory without worrying about freeing it. It also has a few gotchas, but I'd say its advantages are well worth it.

    The way it works is simply letting the programmer define the stack frames. All memory allocated within the frame is freed at once when the frame ends. This works best with programs running in some event loop so you don't have to worry about the stack frames too much. Here's an example program:

    That sounds a little like the NSAutoReleasePool in Cocoa/OpenStep. Objects use reference counting, when the count reaches 0, they deallocate themselves. When an object is created, it can get added to the most recent pool. When the pool is deleted, it decrements the reference count of all the objects within it, causing deallocation unless it needs to be kept around longer.
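
    The passage quoted above stops just before its example program, which is not reproduced in this thread. As a rough illustration of the general idea only (this is not the HOWTO's code; every name and the fixed 4096-byte size are assumptions made for the sketch), a data stack with programmer-defined frames could look like this:

```c
#include <assert.h>
#include <stddef.h>

#define STACK_BYTES 4096   /* arbitrary capacity chosen for the sketch */
#define FRAME_ALIGN 16     /* assumed to be enough for the basic C types */

/* The union forces the byte array to start on a well-aligned address. */
static union {
    unsigned char bytes[STACK_BYTES];
    long double   align_ld;
    void         *align_ptr;
} stack_mem;

static size_t stack_used = 0;

/* Begin a frame: remember how much of the stack is in use right now. */
size_t frame_push(void)
{
    return stack_used;
}

/* End a frame: everything allocated since the matching push is released. */
void frame_pop(size_t mark)
{
    assert(mark <= stack_used);
    stack_used = mark;
}

/* Allocate from the current frame; there is no per-allocation free. */
void *frame_alloc(size_t size)
{
    void *p;

    if (size > STACK_BYTES)
        return NULL;
    /* Round up so that the next allocation stays aligned. */
    size = (size + FRAME_ALIGN - 1) / FRAME_ALIGN * FRAME_ALIGN;
    if (size > STACK_BYTES - stack_used)
        return NULL;

    p = stack_mem.bytes + stack_used;
    stack_used += size;
    return p;
}
```

    In an event loop, the handler for each event would call frame_push() on entry and frame_pop() with the returned mark on exit, so temporary strings built while handling the event disappear together. Pointers obtained inside a frame must not be kept after the pop, which is one of the gotchas the quoted text alludes to.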

    --
    Do you even lift?

    These aren't the 'roids you're looking for.

  9. It's a Sunday morning; So don't criticize. by dagg · · Score: 4, Funny
    I'm not a writer and I'm not too good at english, so sorry about all the spelling and grammar errors :) All of this stuff was written at sunday morning, tired after being awake the whole night and not being able to do anything useful...

    I'm going to start putting that at the end of everything I write so that people can't criticize anything I do. As a matter of fact... I think I'll only write on Sunday mornings after not sleeping the night before. It seems like it's always Sunday morning anyways.

    --
    Your sex on a Sunday Morning
    --
    Sex - Find It
  10. Is this news? by mthed · · Score: 5, Insightful

    Some guy spent a couple of hours writing a first draft of a Howto. Thanks Slashdot, I'm sure glad you didn't let this one slip through the cracks! Besides, who cares about these kludgy ways of handling memory. If you don't want to worry about memory allocation use C# or Java or something. Otherwise, stop eating quiche and write solid code.

    --
    "There's a madness to my method." -mthed
  11. stack allocation?? by hanwen · · Score: 5, Informative
    (yawn)

    it starts off with denouncing GC as old-fashioned, and then proceeds to tout stack-based allocation, which has been available for ages as the alloca() function (which also has portability problems.)

    imho, you should use the Boehm Garbage collector, unless you have code that must be guaranteed to be free of space leaks.

    --

    Han-Wen Nienhuys -- LilyPond

  12. You Forgot: by asv108 · · Score: 5, Funny
    You forgot to add the obligatory "in XX days" or "XX hours" to your title. So a better title for this story would be:

    "Secure, Efficient and Easy C programming in 24hrs"

  13. strncat/strncpy are *NOT* intuitive by Alejo · · Score: 4, Informative

    Did you really read the strncpy and strncat manpages?
    To both zero-terminate and check for truncation is arcane, that's why the OpenBSD ppl made strlcat and strlcpy in the first place.
    There are already other secure programming faqs, though AFAIR, they suck too. If I were you, I'd put a HUGE disclaimer to take this page as work-in-progress.
    (before flaming, write down the correct code to check for truncation for both funcs)
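
    For reference, one possible answer to the challenge above. This is a sketch under assumptions: the destination is a buffer of known capacity, the source is a valid NUL-terminated string, and the helper names copy_checked and append_checked are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy src into dst (capacity dstsize). Returns 1 if it fit, 0 if truncated. */
int copy_checked(char *dst, size_t dstsize, const char *src)
{
    assert(dst != NULL && src != NULL && dstsize > 0);

    /* strncpy writes no NUL when src has dstsize - 1 or more characters,
     * so the terminator has to be stored by hand. */
    strncpy(dst, src, dstsize - 1);
    dst[dstsize - 1] = '\0';

    /* src fit only if it is shorter than the buffer. */
    return strlen(src) < dstsize;
}

/* Append src to dst (capacity dstsize). Returns 1 if it fit, 0 if truncated. */
int append_checked(char *dst, size_t dstsize, const char *src)
{
    size_t used;

    assert(dst != NULL && src != NULL && dstsize > 0);
    used = strlen(dst);
    assert(used < dstsize);

    /* The third argument of strncat is the number of characters to append,
     * NOT the size of dst; strncat adds the terminating NUL on top of it. */
    strncat(dst, src, dstsize - used - 1);

    return strlen(src) < dstsize - used;
}
```

    Compare the amount of bookkeeping here with a single comparison of strlcpy's return value against the buffer size.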

  14. Devils' Advocate by windex · · Score: 4, Interesting

    Okay, let's preface. This guy has a good idea in the memory allocation department.

    Problem 1:

    It's not easy, nor fast to write. Errors are severe if present and undetected. Code required to be reliable might not be a good place to test this allocation method.

    Problem 2:

    I'm not entirely sure these concepts are very portable outside of GCC. May not be a big deal to most, but uh, multiplatform code is required in some environments.

    Problem 3:

    Any speed increase without massive resource wasting is pure dumb luck during heavy usage, unless used in an application that takes little user input or has limits on the amount of input.

    Just my $0.02.

  15. These are common tricks by pclminion · · Score: 5, Insightful
    It's a good start at a HOWTO, but needs some serious fleshing out. These are common tricks that most serious, experienced C programmers have in their bag.

    Some of my personal favorites include:

    • Exceptions in C. You can get quite natural-looking exception handling in C, with some convoluted macros. I'm sure most hardcore C coders have come up with their own implementations. Many security bugs happen in parts of the code that handle errors, precisely because errors are rare, and those parts of the code don't get tested well. Using a unified, exception-driven approach to error handling can cut down the risks. IF you do it right.
    • The alloca() function. This allocates memory directly off the stack, which is freed when the function returns. Very useful for cases where you want a stack buffer but aren't sure how big it needs to be. Like any other stack buffer, you need to take care not to overflow it. There are portability concerns with this function, but it can still be useful.
    • Variable-sized block-chained allocators, which pull chunks of memory out of preallocated segments. The segments are chained together in a linked list. Very effective when you need to make a lot of variable-sized allocations, and do it fast, dammit. It also makes freeing the allocated memory blazingly fast, although it's a "free all or none" approach.
    • "Hardened" allocators, which allocate blocks in multiples of the page size, and set memory protections in such a way that buffer overruns cause crashes. This is the easiest way to prevent ANY kind of buffer overrun vulnerability, but wastes memory. See Electric Fence.
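
    To make the third item in the list above concrete, here is a minimal sketch of a block-chained allocator. The names, the 1024-byte segment size, and the 16-byte alignment are assumptions picked for the example, not anything taken from a particular library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SEGMENT_BYTES 1024   /* default payload per segment (arbitrary) */
#define ARENA_ALIGN   16     /* assumed to be enough for the basic C types */

struct segment {
    struct segment *next;    /* previously filled segments */
    size_t used;             /* payload bytes handed out so far */
    size_t capacity;         /* payload bytes available in total */
};

struct arena {
    struct segment *head;    /* segment currently being filled */
};

static size_t round_up(size_t n)
{
    return (n + ARENA_ALIGN - 1) / ARENA_ALIGN * ARENA_ALIGN;
}

void *arena_alloc(struct arena *a, size_t size)
{
    const size_t header = round_up(sizeof(struct segment));
    struct segment *seg;
    void *p;

    assert(a != NULL);
    if (size > (size_t)-1 / 2)      /* keeps the arithmetic below from wrapping */
        return NULL;
    size = round_up(size);

    seg = a->head;
    if (seg == NULL || size > seg->capacity - seg->used) {
        /* Current segment is full: chain a new one in front of it. */
        size_t cap = (size > SEGMENT_BYTES) ? size : SEGMENT_BYTES;
        seg = malloc(header + cap);
        if (seg == NULL)
            return NULL;
        seg->capacity = cap;
        seg->used = 0;
        seg->next = a->head;
        a->head = seg;
    }

    p = (unsigned char *)seg + header + seg->used;
    seg->used += size;
    return p;
}

/* "Free all or none": walk the chain once and release every segment. */
void arena_free_all(struct arena *a)
{
    struct segment *seg, *next;

    assert(a != NULL);
    for (seg = a->head; seg != NULL; seg = next) {
        next = seg->next;
        free(seg);
    }
    a->head = NULL;
}
```

    Allocation is a pointer bump in the common case, and freeing costs one free() per segment rather than one per object, which is where the speed comes from.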
    Look people.. It takes a keen eye and major discipline to write secure C code. It is not impossible. You have to get in the habit of subconsciously checking yourself at EVERY turn. "Am I accessing a stack variable? Am I doing it CORRECTLY?"

    DISCIPLINE, DISCIPLINE, DISCIPLINE. I fully expect to see the usual barrage of comments to the tune of: "C is outdated, insecure, brittle, yadda yadda..." No. Some PROGRAMMERS are "outdated, insecure, and brittle."

    The C language doesn't write bugs. Programmers write bugs. If the programmer can't handle C, then take it away from him. But don't try to take it away from ME.

  16. Bad implementation of a heap... by lkaos · · Score: 4, Insightful

    See HeapAlloc and friends in Win32 for proper implementation.

    At any rate, there are better ways to make sure one never has memory problems:

    1) always set a freed pointer to 0. On most architectures, dereferencing 0 has predictable behavior (it raises an exception).

    2) Limit all malloc/free pairs to the same function. If a function just has to allocate and return some buffer, give it a meaningful name to that effect and add a corresponding free version. Then, you can follow the above rule.

    3) assert()s are your friend. Use them religiously. They can always be shut off.

    4) Use memory tracking software (e.g. Purify) before you ship.
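
    A small sketch of how rules 1 to 3 might look together. The macro and function names are hypothetical, invented for this illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Rule 1: free and reset in one step, so a stale pointer cannot be reused. */
#define FREE_AND_NULL(p) do { free(p); (p) = NULL; } while (0)

/* Rule 2: a buffer that must outlive the function that allocates it gets a
 * clearly named create/destroy pair instead of a bare malloc/free. */
char *name_buffer_create(const char *initial)
{
    char *buf;
    size_t len;

    assert(initial != NULL);          /* Rule 3: state assumptions loudly */
    len = strlen(initial);

    buf = malloc(len + 1);
    if (buf != NULL)
        memcpy(buf, initial, len + 1);   /* includes the terminating NUL */
    return buf;                          /* NULL if allocation failed */
}

/* Takes the address of the caller's pointer so it can be reset to NULL. */
void name_buffer_destroy(char **bufp)
{
    assert(bufp != NULL);
    FREE_AND_NULL(*bufp);
}
```

    Because free(NULL) is a no-op, calling name_buffer_destroy() twice on the same variable is harmless, whereas a double free() of a dangling pointer is undefined behavior.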

    Yes, it's easier to shoot yourself in the foot with C, but you'll gain a huge performance increase. It's all about using the right tool for the right job.

    --
    int func(int a);
    func((b += 3, b));
  17. C++ can do this, and it'll look cleaner by ademko · · Score: 4, Informative

    First off, C++ objects can force all data access to go through assert()-filled methods, which in optimized mode can be inlined and thus reduced to their C equivalents.

    Second, destructors in C++ guarantee clean up of objects, regardless of how you leave scope (natural, return, exception, etc).

    Finally, couple destructors with reference-counting auto-pointers, and you have yourself a very nice allocation API that's as easy as Java, but without the performance cost or unnatural destruction logistics.

  18. aggressive use of glib by chtephan · · Score: 4, Insightful

    In my last project, I used glib from the ground up. I wrote several thousand lines of code before testing it. I made some very aggressive use of glib and gobject. After the code compiled and did not give any runtime warnings anymore it did not contain a single memory leak (verified using valgrind).

    glib contains a lot of useful things: lists, trees, hash tables, memory pools, string handling functions and a lot more, everything thread safe.

    gobject contains tools on top of glib like "classes" and "objects". It's not the same as in C++ or Java, but also very useful: runtime classes or data types, generic object properties, reference counting, signal callbacks, runtime type checking, etc.

    The code is now full of g_... and it took longer than usual because I had to read the documentation, but I think these libraries are great and provide a solution for nearly everything that has to do with abstract data types and dynamic memory allocation.

    And it's very lightweight, fast and efficient.

  19. Re:+1 Insightful by Twirlip+of+the+Mists · · Score: 5, Funny

    Perl is for idiots who think regexps can solve all problems.

    s/idiots/wise souls/
    s/think/know/

    Problem solved.

    --

    I write in my journal
  20. What C can do that Perl can't by yerricde · · Score: 5, Insightful

    I still have yet to write a single useful C program that I couldn't have done in Perl.

    Can you write a video driver with acceptable performance in Perl? Can you write programs that do things other than text manipulation, such as (say) a 3D engine and make them faster in Perl than in C? Remember that in the real world, time is money because a shorter execution time means lower system requirements and thus a larger market for mass-market desktop applications.

    --
    Will I retire or break 10K?
  21. Re:data stacks (NSAutoReleasePool vs. NSZone) by Jimithing+DMB · · Score: 5, Insightful

    Well, not quite. An NSAutoReleasePool does not allocate a large region of memory and suballocate objects out of that. What an NSAutoReleasePool does is make it possible to avoid explicitly sending the release message for temporary objects.

    For example, from Foo() I allocate an NSObject with [[NSObject alloc] init] and pass that as an argument to Bar() which takes ownership of it. However, I must then ensure that I release the object because Bar() is following good coding practices and retains it, so with alloc+retain its reference count is now 2. So instead what I do is Bar([[[NSObject alloc] init] autorelease]) which allocates an NSObject (with ref count one), initializes it, marks it for autorelease, and passes it to Bar() which retains it (ref count 2) and keeps a pointer to it (presumably it is a method of a class). Coming out of Bar() the ref count is now 2, and perhaps Foo() proceeds to do some other things. Presumably at some point higher up the call stack (or perhaps at the beginning of Foo()) an NSAutoReleasePool was allocated. At the corresponding exit point (either at the end of Foo() or the end of whatever higher up function) [whateverpool release] will be called. When the pool is released, it will call release on any objects it has been asked to take ownership of. At this point one of two things is true. Either the class that Bar() belongs to has already released the object and thus its reference count went back down to one, and now is going to zero (so bye-bye), or the class that Bar() belongs to has not released the object and doing this release merely brings the refcount back to one such that when the other owner releases the object, its refcount will be zero and it will be freed.

    Sorry if that was confusing, but in reality it's really not. It also really helps out when you are coding functions that allocate ObjectA, then allocate ObjectB, then ObjectC, and then find out something is wrong and need to "roll back" to the beginning. If you allocate an NSAutoReleasePool at the beginning, and autorelease everything you alloc, then if you error out you can free the release pool and everything gets released. If you don't, you can simply retain what you need and then free the autorelease pool.

    Anyway.. what this guy is REALLY talking about is NSZone. NSZone allocates a chunk of memory which other objects will be allocated from. The caveat being that while the memory will be freed, the objects will not be properly destroyed. Now this guy was talking about holding C strings and the like, so this is not a problem. However, had he been holding some C++ or Objective-C objects this would be a problem as none of the destructors/deallocators would ever be called.

    I think what it all boils down to is that programmers need to read more code than they write and that we should really be getting Masters of Fine Arts in Programming. I completely agreed with what Dr. Gabriel said. Programming is about as much like building a bridge as writing poetry is. That is to say.. not much.

    Going along with that thought, I think it should be pointed out that /EVERYONE/ here who programs in any language (but specifically C programmers, and ESPECIALLY C++ coders) needs to learn Cocoa and Objective-C. I imagine some of the C++ whiny bitches are going to continue to whine about how much easier and better C++ is, but for those of us who actually prefer to wrangle pointers, Objective-C is where it's at. It's like C with JUST enough object orientation, but not overdone in some committee like C++. Also, one should note that I do like C++ quite a bit, but sometimes there's too many provided ways to do things. With Objective-C, the provided ways are almost all good. In addition, like C or C++ you are not limited to doing it that way, it's just that Objective-C only makes it easy to do good things.

    Think for example of wxWindows vs. Microsoft MFC. wxWindows is surprisingly similar to Cocoa (although wxWindows does not do ref counting so making sure that one and only one class ever owns an object can be problematic at times). MFC, on the other hand, is rather a bear to work with as Microsoft has written it such that an MFC programmer /can/ do things multiple ways, none of which work very well. Obviously this is a generalization, but I think the average MFC programmer will understand where I'm coming from here. That is, again, except for the whiny C++ and MFC bitches who can't figure pointers out. Go home!

  22. reference counting by ttfkam · · Score: 4, Informative

    The main problems with it versus broader garbage collection schemes are circular references and overhead.

    If two (or more) objects have a reference to one another, the count can never reach zero even if nothing in the main logic points to those objects anymore.

    Also, every time an object gains or loses a reference, a check for a count of zero is made. In fuller garbage collection setups, periodic checks are made to all of the objects in a low-priority thread. In some cases, memory usage can be higher, but performance is also higher sometimes and it can handle circular references.

    Both are better than repeated use of malloc/free and new/delete though.
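
    The circular reference problem described above can be shown with a few lines of hand-rolled reference counting in C. This is a sketch with invented names; the live_nodes counter exists only so the leak can be observed.

```c
#include <assert.h>
#include <stdlib.h>

static int live_nodes = 0;   /* instrumentation for the sketch only */

struct node {
    int refcount;
    struct node *peer;       /* owning reference to another node, or NULL */
};

struct node *node_new(void)
{
    struct node *n = malloc(sizeof *n);
    if (n != NULL) {
        n->refcount = 1;     /* the caller owns the first reference */
        n->peer = NULL;
        live_nodes++;
    }
    return n;
}

struct node *node_retain(struct node *n)
{
    assert(n != NULL);
    n->refcount++;
    return n;
}

void node_release(struct node *n)
{
    if (n == NULL)
        return;
    assert(n->refcount > 0);
    if (--n->refcount == 0) {   /* the zero check made on every release */
        node_release(n->peer);  /* drop the reference this node was holding */
        free(n);
        live_nodes--;
    }
}

/* Make n hold an owning reference to peer (or to nothing if peer is NULL). */
void node_set_peer(struct node *n, struct node *peer)
{
    struct node *old;

    assert(n != NULL);
    old = n->peer;
    n->peer = (peer != NULL) ? node_retain(peer) : NULL;
    node_release(old);
}
```

    If a points to b and b points back to a, then releasing both of the caller's references leaves each node with a count of 1, held only by the other, so neither is ever freed. The programmer has to break the cycle by hand (for example with node_set_peer(a, NULL) before letting go), which is exactly the work a tracing collector does automatically.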

    --

    C also muddies this concept because there are no objects in C.

    --

    - I don't need to go outside, my CRT tan'll do me just fine.
  23. scanf and friends by usrerco · · Score: 5, Informative
    No such document should be without mention of scanf(3) misuse, and gets(3) use at all.

    Regarding scanf(3), many people don't realize this is Bad:

    • char cmd[80], arg[80];
      scanf("%s %s", cmd, arg); // BAD

    This is Good:

    • char cmd[80], arg[80];
      scanf("%79s %79s", cmd, arg); // GOOD

    This prevents a buffer overrun if a word contains 80 or more consecutive non-white characters.

    Ditto for sscanf(3) and fscanf(3). Never forget the N+1 when declaring the arrays (eg. char s[80] vs %79s) to leave room for the terminating NUL.

    Here's a good command to run on all your .c files to find such problems:

    • egrep 'scanf\(.*%s' *.c

    ..any lines that match are a potential problem.

    And in a document like this, *definitely* point out the whole gets(3) problem; the granddaddy of them all. Never use gets(3), period. Use fgets(3) instead.

    The gets(3) interface is inherently insecure; a problem waiting to happen by its mere existence. Any code that uses it is broken.
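
    As an illustration of the fgets(3) replacement, here is a sketch of a line reader that also reports over-long lines instead of silently splitting them. The function name read_line and its return convention are assumptions made for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Read one line from in into buf (capacity bufsize, assumed to fit in int).
 * Returns  1 if a whole line was stored (without its newline),
 *          0 if the line was too long (stored prefix kept, rest discarded),
 *         -1 on end of file or read error. */
int read_line(FILE *in, char *buf, size_t bufsize)
{
    size_t len;
    int c;

    assert(in != NULL && buf != NULL && bufsize >= 2);

    /* Unlike gets(), fgets() is told how large the buffer is. */
    if (fgets(buf, (int)bufsize, in) == NULL)
        return -1;

    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';          /* strip the newline */
        return 1;
    }
    if (len < bufsize - 1)
        return 1;                     /* last line of input, no newline */

    /* Buffer is full and no newline was seen: peek at what follows. */
    c = getc(in);
    if (c == '\n' || c == EOF)
        return 1;                     /* the line fit exactly */

    /* Line really was too long: skip the remainder of it. */
    while ((c = getc(in)) != EOF && c != '\n')
        ;
    return 0;
}
```

    Discarding the tail of an over-long line is a policy choice made for this sketch; a real program might instead grow the buffer or reject the input outright.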

    There are probably some others (someone mentioned strcpy) I'll try to post more if I think of them.

  24. I send you this post to have your advice by denshi · · Score: 5, Informative
    Prototyping in a higher-level language (c# is easy, java everyone knows) is a superb idea, provided you
    - can release the final product as interpreted, with slow execution speed
    - can afford the time to port all to C, in which case DO, this is an excellent way to make a watertight C program
    Sir! I wish to introduce to you to the strange new-fangled notion of compiled high-level languages! Yes, languages with higher-than-C-level abstractions have been sneakily producing native machine code for some time now. Some of the most popular are listed below:

    • O'Caml is a marvel of strongly typed object orientation, but you'd hardly know it from using it -- there are almost no C-style type declarations; as a ML child, O'Caml uses type inferencing to prove powerful assertions about program validity and improve programmer convenience. It's compiled! And if you watch the ICFP's, you might note that it consistently beats C compilers for speed of execution. '92, if I recall.
    • I never really bought OO, so S/ML is fine by me. Still compiled, since 1984.
    • And they both descend from ML, started in 1973.
    • Lisp was compiled in 59 or 62 (McCarthy or 1.5, choose your valid date). But then, I suppose it'd have to be compiled, since the notion of interpreted code hadn't been conceived of yet!
    • Erlang is the last, best, word in concurrent programming. If you want to write a high throughput, reliable threaded application, you shouldn't even think of the word 'C'. This broke out of its lab in '87, first compiler in '91.
    • Scheme is often thought of as a testbed for interpreted language concepts, but even it can be compiled, and with concepts such as continuations that can actually make a C programmer's head explode! Since 1982, commercial grade compilers have been available.
    • Even Haskell is compiled, but as monadic programming is less than 10 years old, no one knows how to always write really fast code in it yet. Leave your number, we'll call you in 2034, right before you gear up to deal with your year 2038 rollover crisis.
    Welcome to the late 1970's! We look forward to your eventual arrival in the 80's and early 90's. Please enjoy your stay!

    ps. As modern coding is more about the manipulation of very complex structures than about how to, say, walk a linked list, a higher level language with native support for more complex constructs has the potential for creating much faster applications than something on the level of C. The reason is that the h/l compiler can reason about, and thus optimize over, larger components than the C compiler.