I'm not seeing the difference between your data stack and a memory pool,
You could think of it that way if you like. I actually called them "temporary memory pools" before I learned the concept already had a name.
A context frame should probably always map to a single function invocation. Or to put it another way, a data frame pushed in a particular function call should always be popped by that same function call. And that kind of defeats the purpose of being able to return stack-allocated data UP the call stack.
Yes, but there's no need to create a new frame for each function call. You may not need more than one frame in the entire program if you know you're not allocating too much memory from it. That's what makes it better than alloca(). For example, you can do:
    t_push();
    ret = alloc_some_data_from_stack();
    /* do stuff with ret */
    t_pop();
All very simple. Sure, breakage is still possible, but it's not very common, and you know when you're doing it wrong. Forgetting a t_pop() call is noticed at the bottom-level t_pop(), which then kills the program. Nothing overflows, but the program may have allocated excessive memory in the meantime.
In contrast, alloca() is a simple manipulation of the hardware stack pointer, which will be automatically undone by the hardware itself at the end of the call frame (on any sane architecture, that is). There's no possibility for abuse.
alloca() simply doesn't do what I want. I want to return dynamically allocated memory from functions without worrying about freeing it. A data stack and GC are the only options for that.
Any strlcat(), strlcpy(), etc. don't solve the underlying problem in all string operations, which is making sure you always have enough room.
I'm not proposing strlcpy() either. I only mentioned strlcpy()/strlcat() as being much better than strncpy()/strncat(), which they definitely are. I've never used them, though.
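For reference, the contract that makes strlcpy() nicer than strncpy() fits in a few lines: copy at most size - 1 bytes, always NUL-terminate when size > 0, and return strlen(src) so truncation shows up as a return value >= size. The sketch below uses the hypothetical name copy_bounded() so it can't clash with the real strlcpy() shipped by the BSD libcs and newer glibc versions; it illustrates the semantics and is not the OpenBSD source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper with the same contract as BSD strlcpy():
 * copies at most size - 1 bytes, always NUL-terminates when size > 0,
 * and returns strlen(src) so the caller can detect truncation. */
size_t copy_bounded(char *dst, const char *src, size_t size)
{
    size_t src_len;

    assert(src != NULL && (dst != NULL || size == 0));
    src_len = strlen(src);
    if (size > 0) {
        /* Copy what fits, leaving one byte for the terminator. */
        size_t n = src_len < size - 1 ? src_len : size - 1;

        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return src_len;   /* >= size means the result was truncated */
}
```

The caller's only job is one comparison: `if (copy_bounded(buf, input, sizeof(buf)) >= sizeof(buf))` means the input didn't fit, and buf still holds a valid, terminated string.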
BGC reclaims non-live objects in memory. It does so very efficiently: its efficiency is comparable with malloc().
Exactly. Which is not good enough.
Reclaiming dead objects is what GC is for. If you claim BGC can't do much with C, then you have a different definition of GC than I have. Please be more specific: where is BGC lacking, e.g. as opposed to stack-based memory allocation?
It leaves memory scattered all around. Yes, malloc()+free() does exactly the same; stack-based allocation doesn't. Sure, that may not be a big deal, and malloc() implementations have improved these days, but malloc() still doesn't match a data stack in speed, and it produces some fragmentation where a data stack produces none. Those points may not matter to you, of course, but they still count in the data stack's favor.
OCaml's GC can move data around, packing long-lived data together to reduce fragmentation. That's how a good GC implementation would work. But that can't happen with C, since moving data around would mean finding and updating every pointer to it, which has too many side effects to handle.
You should mark more clearly what gain can be expected from which measure. Allocating on the stack (with alloca() or something similar) gains you speed and some convenience, but no security (buffer overflows are more readily exploited to inject harmful code if the buffer is allocated on the stack).
Well, I'm mostly talking about allocating memory from the heap and from the data stack, which is itself allocated from the heap. In any case, code should be written so that buffer overflows can't happen anywhere, which makes it irrelevant where the memory is actually stored. That's what I'm aiming for.
You failed to describe what's wrong with strncat(), strncpy(), etc. IMHO people who can't comprehend the man pages for those functions probably should avoid C altogether, but definitely must be kept from writing security-relevant software (as should sleep-deprived coders who try to do it on a Sunday morning ;-}
Oh. I felt their disadvantages were self-evident. With strncpy() you must always remember to add the terminating NUL yourself, because strncpy() doesn't write one when the source fills the buffer. strncat() was very likely never meant to prevent buffer overflows, and it's difficult to use properly for that. Or would you call this easy to use: strncat(buf, "foo", sizeof(buf)-strlen(buf)-1); Hmm. I guess I should add that to the HOWTO :)
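A small sketch of what using these two functions "properly" looks like, assuming the destination size is known at the call site. The helper names are hypothetical; the point is that the caller, not the library function, has to get the terminator and the remaining-space arithmetic right every single time.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* strncpy() leaves dst unterminated when strlen(src) >= the limit,
 * so the terminator has to be written by hand. */
void copy_with_strncpy(char *dst, const char *src, size_t size)
{
    assert(size > 0);
    strncpy(dst, src, size - 1);
    dst[size - 1] = '\0';
}

/* strncat()'s third argument is the maximum number of bytes to append,
 * NOT the size of the destination buffer. The caller must compute the
 * room left, minus one byte for the NUL that strncat() always writes. */
void append_with_strncat(char *dst, const char *src, size_t size)
{
    size_t used = strlen(dst);

    assert(used < size);
    strncat(dst, src, size - used - 1);
}
```

Passing sizeof(buf) directly as strncat()'s limit, which is the natural mistake, would let it write past the end of buf whenever buf already contains data.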
How to program in an amazingly thread-unsafe manner!
But then again, threads are useless for most applications, especially the ones I've written so far. Besides, it's easy to make it thread-safe with per-thread data stacks and by adding locks to other stuff.
Granted, buffer overflows are the source of a great number of security issues, but with the right arsenal of helper functions (see the StrSafe API..
That sounds like yet another solution for safe string handling. Like I said, I think string functions get too much attention, as if they were the only way to overflow buffers.
The remaining are all the weird edge conditions (I've seen buffer overruns that only came about when there was a race condition between two threads, for example.)
Threads? Yeah... I wouldn't even bother thinking about security with threads.
What about all the other aspects of writing secure code? They don't even get mentioned.
There's now a link to the Secure Programming HOWTO, which covers most of the other things just fine. Maybe I could write about a few things that aren't well covered in that HOWTO, like integer overflows (although its next version will contain several of my examples about them).
Presume your security measures will fail, because eventually, they will.
Not necessarily. Or are you talking about complete systems here instead of individual applications? If your application has no external dependencies other than libc, you write it fully to the ANSI C spec (that's actually a bit difficult), and you're careful enough in general, it's theoretically possible for your program to be secure now and forever. Bugs in libc, the kernel, the user, etc. are a different matter, although you could try to prevent some of those as well (don't pass dangerous parameters, don't use dangerous functions).
Think about the stack and how it works. t_push() and t_pop() basically create and destroy a stack frame, just like the control stack does at the beginning and end of a function. So sure, it needs some global memory (though not a global variable), just like the control stack does. t_sprintf() simply returns a pointer inside the current stack frame.
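To make the frame idea concrete, here is a minimal single-threaded sketch of a data stack. It is an assumption-laden illustration, not the downloadable implementation mentioned in this thread: capacity and frame depth are fixed (a real one would grow by adding blocks), t_malloc() and t_strdup() are hypothetical helper names, and t_push()/t_pop()/t_sprintf() only borrow their names from the text. The backing memory comes from the heap on first use, and an unbalanced t_pop() aborts, in the spirit of the "noticed at the bottom-level t_pop()" behavior described earlier.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Assumed limits for this sketch: no growth, abort when exceeded. */
#define DS_SIZE   (64 * 1024)
#define DS_FRAMES 64

/* Alignment good for common scalar types (assumption: no over-aligned types). */
union ds_align { long double ld; void *p; long long ll; double d; };

static char  *ds_mem;                /* backing memory, from the heap */
static size_t ds_top;                /* offset of the next free byte */
static size_t ds_frames[DS_FRAMES];  /* ds_top saved by each t_push() */
static size_t ds_depth;              /* number of open frames */

void t_push(void)
{
    if (ds_mem == NULL && (ds_mem = malloc(DS_SIZE)) == NULL)
        abort();
    if (ds_depth == DS_FRAMES)
        abort();
    ds_frames[ds_depth++] = ds_top;
}

void t_pop(void)
{
    /* Popping more frames than were pushed is a bug: kill the program. */
    if (ds_depth == 0)
        abort();
    ds_top = ds_frames[--ds_depth];   /* releases everything since t_push() */
}

void *t_malloc(size_t size)
{
    const size_t align = sizeof(union ds_align);
    size_t start = (ds_top + align - 1) / align * align;

    if (ds_depth == 0)                        /* must be inside a frame */
        abort();
    if (start > DS_SIZE || size > DS_SIZE - start)
        abort();                              /* out of data stack space */
    ds_top = start + size;
    return ds_mem + start;
}

char *t_strdup(const char *s)
{
    size_t n = strlen(s) + 1;

    return memcpy(t_malloc(n), s, n);
}

char *t_sprintf(const char *fmt, ...)
{
    va_list args;
    int len;
    char *buf;

    assert(fmt != NULL);
    va_start(args, fmt);
    len = vsnprintf(NULL, 0, fmt, args);      /* measure first */
    va_end(args);
    if (len < 0)
        abort();

    buf = t_malloc((size_t)len + 1);
    va_start(args, fmt);
    vsnprintf(buf, (size_t)len + 1, fmt, args);
    va_end(args);
    return buf;
}

/* Returns memory owned by the caller's frame: no free() needed. */
const char *format_user(const char *name, int uid)
{
    return t_sprintf("%s (uid %d)", name, uid);
}
```

A function like format_user() can hand its string back up the call stack without a free(): the memory stays valid until the caller's matching t_pop(), and an inner t_push()/t_pop() pair only releases what was allocated after it. The trade-off versus alloca() is that the frame boundaries are chosen by the programmer rather than by function calls.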
The alloca() function. This allocates memory directly off the stack, which is freed when the function returns. Very useful for cases where you want a stack buffer but aren't sure how big it needs to be. Like any other stack buffer, you need to take care not to overflow it. There are portability concerns with this function, but it can still be useful.
This is what my "data stack" is trying to fix, and in a fully portable way. And alloca() still can't be used to allocate return values, which I think is the most useful feature a data stack provides. I don't know about you, but I use a lot of functions that need to return dynamically allocated memory.
Look people.. It takes a keen eye and major discipline to write secure C code. It is not impossible. You have to get in the habit of subconsciously checking yourself at EVERY turn. "Am I accessing a stack variable? Am I doing it CORRECTLY?"
Yeah. That's how most people do that (and most fail). I prefer to use functions that I know are safe to use without excessive thinking.
It's not easy, nor fast to write. Errors are severe if present and undetected. Code required to be reliable might not be a good place to test this allocation method.
It probably isn't a good idea to replace existing code with any of this, but new code is much easier to write using these ideas. And there are already free, tested, MIT-licensed implementations you can download.
I'm not entirely sure these concepts are very portable outside of GCC. May not be a big deal to most, but uh, multiplatform code is required in some environments.
All concepts are fully ANSI-C compatible.
Any speed increase without massive resource wasting is pure dumb luck during heavy usage, unless used in an application that takes little user input or has limits on the amount of input.
I don't really get this. Increasing speed requires massively wasting resources? Isn't that exactly the opposite of what should happen? With large user input you have to store the data somewhere in any case; it doesn't matter where or how. My "performance" point was mostly aimed at malloc()+free().
I actually like C. And a lot of people want to use only C code. An IMAP server written in any other language would quite likely have gotten only a couple of users. But that wasn't really the reason: C is currently pretty much the only language I'm really good at. My OCaml studies are only beginning.
it starts off with denouncing GC as oldfashioned, and then proceeds to tout stack-based allocation, which has been available for ages as the alloca() function (which also has portability problems.)
It does? I wonder what you've been reading, then. To me it looks very much like I'm saying that GC would be the best way to manage memory if it weren't so crappy to use with C. Boehm GC can't do much with C. OCaml's GC, for example, does a lot better, and that's simply not possible in C unless you write your whole program in a very special way.
Just look at the code. It's all "check that we have enough space", "copy that much data there", etc. It's all too easy to make mistakes with that. I found two missing checks, though neither was exploitable. I also found an almost-exploitable overflow in its IPC code: it used malloc(len) instead of malloc(len*4). The only reason it wasn't exploitable is that the length was always either 0 or 1.
vsftpd handles buffers the right way. Quoting its security/implementation doc:
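The malloc(len) vs. malloc(len*4) bug is the classic argument for computing allocation sizes in one checked place. A minimal sketch, using a hypothetical helper name (the IPC code in question isn't shown here): the element size is passed explicitly, and the multiplication is refused if it would wrap around size_t.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate count elements of elem_size bytes each,
 * returning NULL instead of wrapping around when count * elem_size
 * does not fit in size_t. */
void *malloc_array(size_t count, size_t elem_size)
{
    assert(elem_size > 0);
    if (count > SIZE_MAX / elem_size)
        return NULL;                 /* would overflow: refuse */
    return malloc(count * elem_size);
}
```

Calling it as `malloc_array(len, sizeof(*ptr))` also ties the element size to the pointer's type, so a forgotten `*4` can't slip through silently.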
The problem is that people insist on replicating buffer size handling code and buffer size security checks many times (or, of course, they omit size checks altogether). It is little surprise, then, that sometimes errors creep in to the checks.
The correct solution is to hide the buffer handling code behind an API. All buffer allocating, copying, size calculations, extending, etc. are done by a single piece of generic code. The size security checks need to be written once. You can concentrate on getting this one instance of code correct.
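A minimal sketch of that idea in C. The struct buf type and buf_* functions are hypothetical and are not vsftpd's own string API; the point is that the length arithmetic, the overflow check, and the growth policy appear exactly once, in buf_append().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* A growable byte buffer. All size checks live in buf_append(), so
 * callers never compute lengths, offsets, or remaining space. */
struct buf {
    char  *data;   /* NUL-terminated once anything has been appended */
    size_t len;    /* bytes used, excluding the terminating NUL */
    size_t cap;    /* bytes allocated */
};

/* Append n bytes from src. Returns 0 on success, -1 if the size would
 * overflow or memory runs out (the buffer is left unchanged then). */
int buf_append(struct buf *b, const char *src, size_t n)
{
    size_t need;

    assert(b != NULL && (src != NULL || n == 0));
    if (n > SIZE_MAX - b->len - 1)
        return -1;
    need = b->len + n + 1;                /* + 1 for the NUL */
    if (need > b->cap) {
        size_t new_cap = b->cap > 0 ? b->cap : 16;
        char *p;

        /* Double until it fits; fall back to the exact size near SIZE_MAX. */
        while (new_cap < need)
            new_cap = new_cap > SIZE_MAX / 2 ? need : new_cap * 2;
        p = realloc(b->data, new_cap);
        if (p == NULL)
            return -1;
        b->data = p;
        b->cap = new_cap;
    }
    if (n > 0)
        memcpy(b->data + b->len, src, n);
    b->len += n;
    b->data[b->len] = '\0';
    return 0;
}

int buf_append_str(struct buf *b, const char *s)
{
    return buf_append(b, s, strlen(s));
}

void buf_free(struct buf *b)
{
    free(b->data);
    b->data = NULL;
    b->len = b->cap = 0;
}
```

Callers start from `struct buf b = {0};` and only ever call buf_append()/buf_append_str(); no caller-side "do we have enough room" check exists that could be forgotten.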
I've tried 2.5.48 and 2.5.49 and gave up on both, mostly because Galeon got horribly slow while I was compiling software, especially when scrolling: MUCH worse than with 2.4 kernels. Giving the compile job a lower priority made it better, but I'm too lazy to type the extra nice command before make...
See Non-blocking I/O is good [acme.com] for more background on what multiplexing is and why it is good.
Yes, it's fine as long as there aren't any long-running tasks. CGI binaries, PHP, etc., for example, may block the process for a long time. Of course, you could use multiplexing for sending static pages and processes/threads for dynamic pages. That'd be the best way to do it, I think.
The page you linked contains pretty old information, BTW, especially about portability. POSIX threads and fd passing over UNIX sockets are quite portable nowadays.
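For readers unfamiliar with multiplexing, its core is a poll() loop over non-blocking descriptors. This is a POSIX (not ANSI C) sketch with hypothetical helper names and an assumed limit of 64 descriptors; a real server would keep per-connection state and handle POLLOUT as well. Note that it does nothing about the long-running CGI/PHP case: one slow handler still stalls every other connection in the process.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

#define MAX_WATCH 64   /* assumed limit for this sketch */

/* Put fd in non-blocking mode so read()/write() fail with EAGAIN
 * instead of stalling the whole single-process server. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Wait until one of the fds is readable (or hung up); returns its index,
 * -1 on timeout, or -2 on error. */
int wait_readable(const int *fds, int n, int timeout_ms)
{
    struct pollfd pfds[MAX_WATCH];
    int i;

    assert(n > 0 && n <= MAX_WATCH);
    for (i = 0; i < n; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }
    if (poll(pfds, (nfds_t)n, timeout_ms) < 0)
        return -2;
    for (i = 0; i < n; i++) {
        if (pfds[i].revents & (POLLIN | POLLHUP))
            return i;
    }
    return -1;
}
```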
It actually only held a single copy of any email sent to a distribution list, and everyone's inbox would just have pointers to it.
This is easily done today with filesystem hard links in Maildir and Maildir-like systems, which store each message as an individual file. I know Cyrus supports this; I don't know about others, but I'd like to get Postfix to do it.
I've never had any problems installing base Debian plus the few packages I want, but configuring everything to work well can take a lot of time: X, TrueType fonts, getting the modules I want to load automatically, plus all the little details you notice only after using it for a while.
I don't know if configuration is any better with other distros, and it's not that difficult with Debian either once you know/remember how to do everything. Configuring the fonts was quite a pain until I learned about the msttcorefonts package.
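A minimal POSIX sketch of single-copy delivery with hard links, assuming the message has already been written to msg_path and that every destination is on the same filesystem (link() fails with EXDEV otherwise). The function names are hypothetical, and the tmp/ and new/ subdirectory handling of real Maildir delivery is left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Deliver one already-written message into several mailboxes by hard-linking it.
 * Every destination name refers to the same inode, so the message body is
 * stored on disk only once. Returns 0 on success, -1 on the first failure. */
int deliver_hardlinked(const char *msg_path, const char *const *dest_paths, int n)
{
    int i;

    assert(msg_path != NULL && (dest_paths != NULL || n == 0));
    for (i = 0; i < n; i++) {
        if (link(msg_path, dest_paths[i]) < 0) {
            perror(dest_paths[i]);       /* e.g. EXDEV across filesystems */
            return -1;
        }
    }
    return 0;
}

/* How many directory entries point at this file (-1 on error). */
long link_count(const char *path)
{
    struct stat st;

    if (stat(path, &st) < 0)
        return -1;
    return (long)st.st_nlink;
}
```

Deleting the message from one mailbox just removes that mailbox's directory entry; the data is freed only when the last link goes away.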
I've thought a few times about a network like this. It could even try to be very stealthy: communicating with other nodes by hiding data in normal traffic, mutating automatically, also infecting files that get carried outside the Internet, etc.
Once almost all computers in the world have been infected by the worm, the guy in charge of them could just decide to make the worms format hard disks, see if they can delete other files from the network, and finally try to physically destroy the computer. I find that a pretty interesting scenario :)
Skimming the docs, it seems this works by connecting to the actual servers through "proxies". The proxies provide the anonymity: they don't know what data is being transferred, and the servers don't know the client's IP, only the proxy's.
I guess this is fine as long as anonymity is all you want, but I don't see it getting mass attention. It's just yet another IRC network. I don't know about you, but I'm sick of having separate IRC networks; it'd be so much easier to just connect to "IRC" and be able to talk to everyone. Allowing everyone to run servers that can all talk to each other would effectively do this, just as SMTP does for email. There are a few projects aiming to do this, but none of them is anywhere close to a working implementation AFAIK.
Some links: irc+, irc++. Jabber also does pretty much the same, but it seems much more focused on instant messaging than on covering all of IRC's functionality.
Well, it's probably too late for anyone to actually read this, but let's try anyway.
For the last few months I've been writing a secure and fast IMAP server named Dovecot. Suggestions and other feedback are appreciated :)
Other projects seem to be creating all-in-one products, which are probably easy to install and maybe to maintain, but much less powerful than products that focus on just one thing. I have no plans to create yet another useless SMTP daemon; Postfix and Qmail already do very well. Of course, others can bundle these into packages that are easy to install and administer.
What if every package maintainer signed their packages with their own signature, and that was added to the Packages file? Then some .deb would contain everyone's signatures, and dpkg would verify them. That sounds pretty good to me, at least.
No, I didn't. alloca() belongs in the same section as the "control stack". OK, let me paste it here again with alloca() added; thanks for the update.
Advantages over control stack:
Quoting the howto:
Thanks for the obstacks reference, though; I hadn't heard of it before.
Sorry, your comment makes absolutely no sense. You're saying exactly the same thing I did.
Well, I didn't link to the original site, but judging by this so-called slashdot effect, I might as well have. Come on, I don't see any load at all!
[cras@foo] ~$ ps ax|grep apache|wc -l
60
[cras@foo] ~$ uptime
20:32:54 up 127 days, 10:58, 56 users, load average: 0.23, 0.41, 0.37
Those loads were pretty much the same before the slashdotting.
Sorry, but I think this is about all I have to say. The Secure Programming HOWTO should take care of the rest.
Read the article ;) 27h uptime or so now. I woke up around 6pm though.