Domain: and.org
Stories and comments across the archive that link to and.org.
Comments · 76
-
Re:Might be a good idea
Remember the fact that he refused to incorporate the safe string handling functions [...] like strlcat()/strlcpy()
-1, Fail at C. strl*() are not safe; they are safer than strn*(), sure ... but that isn't the same thing. Hell, it's not even like they are efficient, which is the usual C hacks fallback for using 1980s interfaces today.
If you want a small, fast, easy to use and safe C string API go use one, or any number of others that are designed for other criteria. With this newfangled shared library technology, we can now put shared functions in more than one file called libc.
Also when he refused to add them, their definition was different on both platforms that implemented them ... not that it makes Drepper a nice person, but he wasn't all wrong about strl*().
-
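On the strl*() point in the comment above: strlcpy()-style functions stop the overflow but truncate silently, so the caller still has to compare the return value against the buffer size. A minimal sketch, assuming a hypothetical copy_trunc() stand-in with strlcpy()-like semantics (strlcpy() itself is not available in every libc); neither helper name comes from the original comment.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in with strlcpy()-like semantics: copies at most
 * size-1 bytes, always NUL-terminates when size > 0, and returns
 * strlen(src), i.e. the length it *wanted* to copy. */
static size_t copy_trunc(char *dst, const char *src, size_t size)
{
    size_t len;

    assert(dst != NULL && src != NULL);

    len = strlen(src);
    if (size > 0) {
        size_t n = (len < size - 1) ? len : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return len;
}

/* Returns 1 if src fit completely, 0 if it was truncated. The check is
 * the part that is easy to forget, which is the "safer, not safe" point. */
static int copy_checked(char *dst, const char *src, size_t size)
{
    return copy_trunc(dst, src, size) < size;
}
```

With an 8 byte buffer, copying "/home/alice/secret" leaves "/home/a" in the buffer: no overflow, but a different path than the one requested unless the caller checks the result.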
Re:Why are we still dealing with this?
As someone who's written 2 pieces of OSS with 100% code coverage in unit tests, and probably the most secure C HTTP server (comes with over 75% coverage), I have to say: "It's not quite that simple". Testing does not negate design, and designing for security is non-trivial and takes a certain mind-set ... and while a lot of people say they want security, almost none are actually prepared to buy it (with either money, lack of features, whatever).
Hell, one of the biggest advancements in security in recent years is SELinux, and I see almost nothing but complaints about how it "is less usable, so we just turn it off". Summary: We are still dealing with security problems because that's what the majority of the market wants, welcome to democracy and the free market.
strncpy(), not strcpy()!
Actually usage of strncpy() almost certainly guarantees you have bugs, IMNSHO. You need a real managed string API. Assuming the programmer can keep track of three distinct pieces of information like "size, length and pointer" is just a losing bet. All of the applications (including mine) that have had security guarantees with money have internally used a real managed string API.
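The "size, length and pointer" remark above can be illustrated with a toy managed string. This is only a sketch under assumptions: struct mstr, mstr_add() and mstr_free() are hypothetical names invented here, not the API of any real library mentioned in these comments.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal managed string: pointer, length and allocated
 * size travel together, so callers never track them by hand. */
struct mstr {
    char  *ptr;   /* heap buffer, NUL-terminated once allocated */
    size_t len;   /* bytes in use, excluding the terminator */
    size_t size;  /* bytes allocated */
};

/* Appends n bytes. Returns 0 on success, -1 on size overflow or
 * allocation failure; the string is left unchanged on failure. */
static int mstr_add(struct mstr *s, const void *data, size_t n)
{
    assert(s != NULL && (data != NULL || n == 0));

    if (n >= SIZE_MAX - s->len)          /* len + n + 1 would wrap */
        return -1;
    if (s->len + n + 1 > s->size) {
        size_t want = s->len + n + 1;
        size_t grow = (s->size > SIZE_MAX / 2) ? want : s->size * 2;
        char *p;

        if (grow < want)
            grow = want;
        p = realloc(s->ptr, grow);
        if (p == NULL)
            return -1;
        s->ptr = p;
        s->size = grow;
    }
    if (n > 0)
        memcpy(s->ptr + s->len, data, n);
    s->len += n;
    s->ptr[s->len] = '\0';
    return 0;
}

/* Releases the buffer and resets the string to the empty state. */
static void mstr_free(struct mstr *s)
{
    free(s->ptr);
    s->ptr = NULL;
    s->len = s->size = 0;
}
```

Start from a zeroed struct (struct mstr s = {0};) and every append either succeeds in full or fails visibly; there is no buffer size for the caller to get wrong.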
-
Re:$16,000
In my opinion the problem isn't really that it doesn't pay for someone to do the work to find the exploit that's there, it's that it's not enough to be painful if there is one there.
For instance if I put a "security exploit bounty" on my code of $1 (probably less than I pay for donuts weekly) ... how secure does that say the code is? Now if I put the same bounty on it of $2,000 (yes I'm not amazingly rich, so that's a very painful amount), this is a very different equation.
It's the difference between saying "I'm very confident that X is true" and saying "Meh, who knows ... I'll give you a buck if it isn't".
-
Re:well...
saying that software is 100% bug free, or not exploitable is a complete fallacy.
all software has bugs in it, there is no such thing as a completely secure application.
Yes, and no. You can't make "bug free" software, because one person's feature (or lack of) is another's bug. However, I believe, you can make secure (read: no remote exploits) software. That's a much smaller scope you have to defend against, and it's mostly testable. Also multiple people have done it, or claim to have done it ... including myself.
-
Re:It Seemed to Work for Bletchley Park
Writing a simple web server is trivial. It doesn't even need to be multi-threaded, though it wouldn't be difficult to make it serve multiple connections at once with select.
Speaking as someone who has written a web server (and guarantees that it's secure) ... First, there is no such thing as a "simple web server". Even if you limit yourself to HTTP/1.0 and don't parse any headers, it's not a 3-4 hour problem. And you really need to parse at least Host: to even think about calling it a real webserver, and not just worthless junk.
Also, none of the good web servers are multi-threaded (RHCA doesn't really qualify as multi-threaded, as there are no processes in kernel land). So "doesn't even need" is very misleading IMNSHO.
Someone who can't write the code to parse out a GET request is pretty lame.
I assume you have this idea of what HTTP is that bears no resemblance to reality. Parsing the GET line well is non-trivial, parsing the GET line only doesn't qualify as a web server IMO.
Yes, I've seen something that firefox can talk to which is 1 line of C (roughly 12 if you put a return after each function call). While it's "cute" that's not a real webserver.
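To give a feel for why even the request line is not a throwaway job, here is a deliberately strict and incomplete sketch. Everything in it is an assumption made for illustration: the names struct req_line and parse_request_line() are hypothetical, it accepts only origin-form targets starting with "/" and single-digit HTTP versions, and it does no percent-decoding, length limiting, or method validation, all of which a real server also needs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result type: slices point into the caller's buffer. */
struct req_line {
    const char *method; size_t method_len;
    const char *target; size_t target_len;
    int major, minor;
};

static int is_digit(char c) { return c >= '0' && c <= '9'; }

/* Parses "METHOD SP /target SP HTTP/d.d" from buf[0..len), with the
 * trailing CRLF already stripped. The input is length-delimited, never
 * assumed to be NUL-terminated. Returns 0 on success, -1 on reject. */
static int parse_request_line(const char *buf, size_t len,
                              struct req_line *out)
{
    const char *sp1, *sp2, *ver;
    size_t i;

    assert(buf != NULL && out != NULL);

    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)buf[i];
        if (c < 0x20 || c == 0x7f)   /* NUL, CR, LF, tab, other controls */
            return -1;
    }
    sp1 = memchr(buf, ' ', len);
    if (sp1 == NULL || sp1 == buf)   /* no space, or empty method */
        return -1;
    sp2 = memchr(sp1 + 1, ' ', len - (size_t)(sp1 + 1 - buf));
    if (sp2 == NULL || sp1[1] != '/') /* no version, or not origin-form */
        return -1;
    ver = sp2 + 1;
    if (len - (size_t)(ver - buf) != 8 || memcmp(ver, "HTTP/", 5) != 0)
        return -1;
    if (!is_digit(ver[5]) || ver[6] != '.' || !is_digit(ver[7]))
        return -1;

    out->method = buf;
    out->method_len = (size_t)(sp1 - buf);
    out->target = sp1 + 1;
    out->target_len = (size_t)(sp2 - (sp1 + 1));
    out->major = ver[5] - '0';
    out->minor = ver[7] - '0';
    return 0;
}
```

Even this much only gets you a method, a raw target and a version; the Host: header, path normalisation and everything else still lie ahead.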
-
Re:Except for the fact
Actually, it all depends on the workload. Some would say that Processes are a Unix hack, because they didn't think about threads.
Actually, not so much. Saying you "didn't think about threads" is like arguing that you went with protected multi-tasking OS and "didn't think about DOS". Adding memory protection and compartmentalisation is the only difference between a thread and a "process". In most cases, you just don't care anyway ... all you want is to not block, and threads are the worst fix for that problem.
-
Re:Lack of threading is a benefit.
Well, call people a liar. If that's what you want, fine, go hate people who don't have trouble being effective with mutexes and semaphores. If that makes you feel better, go ahead.
Take it personally if you want, I don't care. You are saying that water flows uphill when all evidence is to the contrary ... feel free to not like me, or reality.
PP of the thread suggested that using threads is bad, he/she said never to use threads and to always separate it in processes. That is just a completely ill-advised assertion ...
The deadlocks and races that you refer to didn't exist until after they moved from a global kernel lock to very fine grained locking. ...
Point is: It's as complicated as you make it, and for very decent synchronization and communication performance at a level much higher than what you can get between processes, you don't have to make it even close to as complicated as in-kernel synchronization. Using mutexes and semaphores in threads is actually very simple.
For threads, the locking/synchronization requirements are such that using some very simple synchronization techniques, it is very simple to get great performance that beats any IPC method that you can do between processes (except shared memory).
You start with: "deadlocks and races didn't exist until they used threading" ... Duh, but fine, I'll give you that. But then you magically get to "just a little bit of threading is amazingly fast, and totally reliable" ... do you have any data for that?
You can keep saying water goes uphill ... but until I see some data both myself and reality will keep pointing out the obvious:
- 1. FACT: Oracle and postgres both use a process model with explicit sharing, and are considered extremely fast (much faster, more reliable and have more features than MySQL which uses the threading model, in fact).
- 2. FACT: thttpd / lighttpd / and-httpd are extremely fast web servers (beating netscape, IIS, etc. which are threaded).
- 3. FACT: apache-httpd uses a single process per connection and even that is "fast enough" most of the time, but again it has beaten those "extremely fast" threaded designs in at least some benchmarks.
- 4. FACT: While some Coverity errors are false-positives, it is still finding a huge amount of real errors in projects it is running on ... and one of its biggest assets is that it can "understand" locking rules and show flaws in them (and companies are buying it just for this ... because no human can get it right all the time).
- 5. FACT: kernel developers have a huge advantage for threading, in that they can create their own primitive interfaces (you're stuck with pthread, unless you are totally insane) and have a much higher collective threading knowledge than pretty much any other project ... and still they are failing, again and again.
- 6. FACT: out of the "user space" applications that I use which are threaded (firefox, pan), I repeatedly see threading related bugs ... these just don't need "amazing" performance, but it'd be nice if they didn't crash/deadlock.
- 7. FACT: as soon as you "just add a little bit" of threading to an application, you can't just consider the code inside the mutex; you have to consider one thread inside and as many other threads as possible in as many other places as possible in your code.
On top of that, I've spoken to enough younger programmers who have been told "threading is easy" and believed it but produce crap, and of course it "seems to work" so they happily continue with their fantasy that threading is easy and tell even more people of their new toy.
-
Re:I preferred the old odd/even split
The main difference was that if 2.4.x was good for you there was a very good chance that 2.4.(++x) would be good for you as well. Now, however, nothing is off-limits; so that is less true.
You are, of course, forgetting about the (at least one) huge VM change. And even though it was somewhat "true", it also meant that the huge pile of people who needed/wanted one of the many year+ old features like the O(1) scheduler, NPTL, epoll and AIO were just screwed (although it was somewhat fun speaking to the newbies who wanted to run NPTL on their Debian stable 2.4.x and refused to be told the truth).
2.6.x is much like 2.4.x, you either need a team of highly skilled Linux kernel developers following vanilla ... or you hope/assume/outsource it to Red Hat/SuSE/Ubuntu/etc.[1] If you aren't doing one of the above you are just playing russian roulette with your boxes.
[1] Anyone with arguments of the form "*BSD is better as they have a real stable" is referred to fig. 1 ... Red Hat etc. are providing the stable branches, and having choice isn't a bad thing.
-
Re:Unsafe Languages?
He does have a point, though. It *is* possible to use the standard C library string functions in a safe manner
It's possible for non-trivial applications? Can you provide an example then? I can provide a few examples that basically never use the std. str* functions, and are considered safe by their authors.
Saying C itself is unsafe, can be argued against (although what they generally mean is things like the apache-httpd malloc() attack are possible, while they aren't in Java -- although, again, that was inside a crappy str* string helper so would have gone away if apache-httpd used a half decent string API).
-
Re:The eyes are looking at the edges
You are talking about things that _you_ want, and I can even mostly agree. However it has been proven repeatedly that only a very small minority are willing to put security/quality before other design trade offs ... things might well get better in the next 10 years due to python/C#/etc. not having as significant downsides. But even so we are going to be stuck with C based programs for a long time, and there are still very few people who want to pay to do even the minimal fixes.
Again, I'm not saying I wouldn't like to do that trade off, indeed I wrote my own secure Web server (with a monetary guarantee you won't be owned) because I didn't like apache-httpd ... but there just isn't the general buying power to bring secure software out of the niche.
-
Re:Non-object oriented test tools?
-
Re:configuring apache #1 complaint, still unaddres
Saying that web server performance is better than Apache-httpd is like saying fish can swim better than dogs, it's true ... but pretty meaningless. Apache-httpd developers have publicly stated that they don't consider performance a design goal, and their email server is actually Apache-httpd in disguise.
As for lighttpd being "secure" it had a problem this year where you could bypass checks by using an encoded NUL byte. As a web server author I can only say "Like, Duh!" ... and after taking a 10 minute grep[1] of 1.4.7 I can see a buffer overflow already, I'm not sure how easy it is for a normal user to do (and it's only a couple of bytes) but I don't care too much either. The obvious DOS is there for the Range header, and the symlink "protection" is smoke and mirrors (stat check followed by a non-checked open()).
It also doesn't seem to support Accept/Accept-Language negotiation, which I thought was pretty weird (feel free to disagree).
Obviously I'm somewhat biased, given that I've written my own, but then I do have a "security guarantee" with it.
[1] For anyone who does care do a strncpy grep in http_auth.c (I looked at version 1.4.7)
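The "stat check followed by a non-checked open()" complaint above is the classic check-then-use race: the path can be swapped for a symlink between the two calls. One common way around it, sketched here with plain POSIX calls, is to open first with O_NOFOLLOW and then inspect the descriptor you actually got with fstat(). The function name open_regular_nofollow() is hypothetical, and this is not a claim about how any particular server does it. Note that O_NOFOLLOW only covers the final path component; symlinks in parent directories need separate handling.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for O_NOFOLLOW under strict -std= modes */
#endif

#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Opens path read-only, refusing a symlink in the final component, then
 * checks the file type on the descriptor that was actually opened, so
 * there is no window between the check and the use.
 * Returns an open fd on success, -1 otherwise. */
static int open_regular_nofollow(const char *path)
{
    struct stat st;
    int fd;

    assert(path != NULL);

    /* O_NONBLOCK keeps open() from hanging on a FIFO. */
    fd = open(path, O_RDONLY | O_NOFOLLOW | O_NONBLOCK);
    if (fd == -1)
        return -1;
    if (fstat(fd, &st) == -1 || !S_ISREG(st.st_mode)) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The ordering is the whole point: whatever fstat() reports describes the object behind the descriptor, not whatever the name happens to point at a moment later.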
-
Re:Programming Standards
Don't blindly try for ~100% coverage (you'll never get it anyway)
Neer, neeer ... *cough*, sorry about that. Of course, I also have a monetary reward for remote holes in my webserver ... so I'm probably not that normal :)).
-
Re:Languages with buffer overflows need to be avoi
A good start to our current security problem would be to stop writing internet based software in languages that allow buffer overflows to occur (e.g. C, C++). [...] Writing computer programs in these types of languages and patching the errors as they are found is simply not a scalable solution. It essentially means that if you write a program to be used on a network, you have to maintain and patch it forever because you'll never catch all the buffer overflows it contains
This is a huge overreaction, although it tends to be happening anyway ... for other reasons (mainly because it's cheaper). I'll repeat what I've said before: writing with a real string type in C is not hard ... and provides all the same benefits. The problem is convincing people that it can be fast enough (and even now, speed outweighs security for most customers).
Of course I'm also backing up my rhetoric with a $500 security guarantee on my webserver, which is written using only a real string type ... in C. Also AFAIK the only software in the OSS community that has offered such guarantees has all been written in C (and all with real string types).
90% of security exploits are caused by buffer overflows. I've seen a figure like this in research papers, but it should be obvious to anyone from reading patch descriptions and current security alerts
This may have been true but is now false, and has been for years. Even if you take into account integer overflows etc. that can be mitigated by real string types, it maxes out around 60%. I do have some stats I did myself, from 2003 ... and I've seen other more recent ones which say the same thing (I don't own them though, and I'm not sure they are public).
-
Worse isn't better, it's just 90% don't want it
This all seems to be a rehash of the "worse is better" meme ... that those damn software programmers/companies aren't doing what we want. The only problem is, it's all crack. Almost no customers, even now, are willing to pay more for "quality".
Yes, I think all other things being equal, people will go towards quality/security ... but it just isn't high on anyone's list. Cheap, features, usable ... and maybe quality comes in fourth, maybe.
And, yes, there are exceptions ... NASA JPL obviously spend huge amounts of money to get quality at the expense of everything else, and I say this having written my own webserver (which comes with a security guarantee against remote attacks) because apache-httpd had too many bugs ... but I'm not expecting people to migrate in droves from apache-httpd, it's got more features. The 90%+ market share have spoken, consistently, and they just don't care about the same things Bruce and I do.
I have a lot of respect for Bruce, but the companies really are just producing what most people want ... so stop blaming them.
-
Re:The MAIN GCC developer...
And while he happens to be right in this case, I don't think very highly of him.
[...]
Drepper however has repeatedly refused to include them (strlcpy/strlcat) because they work and they make it too easy to not code buffer overflows (no this is not a joke).
While Ulrich has his faults, the above is completely false. The reason they weren't accepted into glibc was IIRC:
1) They are non-std. and did not have a usable standard-like definition apart from the implementation and had no tests (Solaris implemented them slightly differently, for example, and Input Validation in C and C++ from O'Reilly also screwed it up -- and that was written by people selling a secure coding in C book).
2) It doesn't solve the problem better than asprintf() which had been around for years (although also non-standard), as you still have problems with truncation (and both APIs have the problem of requiring the programmer to correctly pass around the meta data about the string -- i.e. its size/length).
3) Given the above, and the fact the implementation is "free" then anyone wanting to use them can just include the source in their apps. and rely on autoconf (and they'll also be guaranteed to have the "correct" implementation).
-
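On point 2 of the comment above: asprintf() sidesteps truncation by allocating a buffer that fits, but it is a GNU/BSD extension. The same idea can be sketched with only C99 vsnprintf(), which reports the length it needs when given a zero-sized buffer. The helper name fmt_alloc() is hypothetical and this is an illustration of the approach, not glibc's implementation. As the comment notes, the caller still has to carry the returned length (and ownership of the buffer) around correctly.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical asprintf()-like helper built on C99 vsnprintf().
 * Formats into a freshly malloc()ed buffer sized to fit, so nothing is
 * truncated. Returns the string length and stores the buffer in *out,
 * or returns -1 and sets *out to NULL. The caller must free(*out). */
static int fmt_alloc(char **out, const char *fmt, ...)
{
    va_list ap, ap2;
    char *buf;
    int need;

    assert(out != NULL && fmt != NULL);
    *out = NULL;

    va_start(ap, fmt);
    va_copy(ap2, ap);                      /* second pass needs its own copy */
    need = vsnprintf(NULL, 0, fmt, ap);    /* first pass: measure only */
    va_end(ap);
    if (need < 0) {
        va_end(ap2);
        return -1;
    }

    buf = malloc((size_t)need + 1);
    if (buf == NULL) {
        va_end(ap2);
        return -1;
    }
    need = vsnprintf(buf, (size_t)need + 1, fmt, ap2);  /* second pass: write */
    va_end(ap2);
    if (need < 0) {
        free(buf);
        return -1;
    }
    *out = buf;
    return need;
}
```

By contrast, snprintf() into a fixed buffer returns a value greater than or equal to the buffer size when the output did not fit, and that return value has to be checked on every call.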
Re:No supprise
A good number of OSS zealots (of which a good number are found here) have the need to believe that OSS is always better, in every case, and part of that is not admitting faults. You admit faults, you admit the possibility something else could be done better.
OSS != Apache-httpd, yes apache-httpd is OSS ... but there is more than one OSS httpd server. Indeed, I disliked apache-httpd so much I wrote my own, which is OSS. Part of the benefit of OSS is that a single organisation cannot say "thou shalt always use the one true httpd server".
Secondly, yes, apache-httpd has its faults ... but everything I heard about IIS, Java Sun ONE server etc. is that they have many more of them. And, if you want to fix/audit apache-httpd you can ... right now, free of charge. Want to do the same with IIS, yeh, good luck with that.
-
Re:Thanks Microsoft!
Browsers generally contain parsers for a large number of file types, and parsers are notorious for security issues
You mean "parsers written using common C string handling techniques are notorious for security issues". There are other string handling libraries such as Vstr that aren't as vulnerable to buffer overflow, but many programmers who work with C or C++ don't know about them.
-
Re:Varying levels of seriousness...
Blanket statements like this (and like "Goto is evil") do nothing to help improve the quality of software as we know it. strcat() is not evil. Using strcat on uncontrolled/unmonitored input on buffers whose memory allocation we are unsure of IS.
Blanket statements like "wheels should be round" do nothing to help improve the advancement of cars? Or maybe not so much.
Sure, often blanket statements stop people from doing good as well as bad things ... but even that isn't such a bad thing. In the case of strcat() or say strncpy() it is easy to prove that something else is always better, even if it's just a simple wrapper around memcpy() or memmove().
But it's also fair to say that NUL-terminated "C strings" are a terrible idea for humans. Too much information needs to be kept inside the programmer's head, and a single mistake has too high a price.
Of course, being the huge Apache Runtime fan that I am, I would write something like this myself in most "real" cases: [snip poor usage of apr_pstrncat()]
Of course I, on the other hand, wrote my own web server which uses a string library and doesn't directly manage buffers, mainly because I was updating apache every few months from the latest remote exploit.
And while testing it saw a client die because it was using something like what you posted for each header that was returned by the server ... return a lot of headers and exponential memory growth is a nice DOS remote exploit.
-
Why didn't they use anything like Vstr?
Even C and C++ have mechanisms for safe string handling. C++'s std::vector and std::string types can be configured with buffer checking, and judicious use of a decent string handling library can solve the problem for C. Thus, I see the problem as programmer ignorance of the available libraries rather than any inherent defect in the languages themselves.
-
Re:Good idea, but a time-sucker
#1: No, the bugs aren't in the UI itself, the bugs originate through the UI. Users can do things in such a way that you simply can't predict.
Then the unit tests could/should have been better. Sorry, it really is that simple.
#2: But you know, if you have time, there are LOTS of things you can do that might help, or might not. I stopped writing unit tests thinking that they one day *might* catch a bug.
True, and if you can get a large amount of free debugging (like a prominent OSS project) then any personal testing (like writing unit tests) might not be worth it for you, because getting X hundred people to just use it is going to exercise the normal paths of code pretty well. Of course that isn't going to find subtle bugs or security flaws (and unit testing isn't guaranteed to either, but IMO it's much more likely).
I don't hack away at code, so I don't think unit tests will save me from something I don't do. My interfaces are developed exactly at the level of quality called for; no more and no less. So in the sense that unit tests can help me "think"...I never found myself lacking in that area. Perhaps it's something that helps less experienced programmers ramp up...I don't know. I've been programming for 20+ years and I'm alright in that regard.
Of course you don't intentionally write bad code, and neither do I (and I'll admit I assumed I made far fewer mistakes before I tested it all). But I presume you wouldn't compile C without any warnings turned on, or not use prototypes in header files. Why? Because the computer is much better at telling you that... Duh! fprintf("%s\n", foo); isn't what my brain wanted my hands to type. This is how I think of unit tests, they aren't there to catch design mistakes because I'm good at that ... they are there to "prove" the implementation of the design "works".
I'm also pretty sick of seeing/using other people's code and it having obvious bugs in it ... "testing is not having to say you're sorry". For instance, when testing Vstr I found 4 or 5 bugs in glibc with some of the more esoteric uses of the *printf() functions when using the double formatters. When I found them, half of me was happy ... the other half depressed.
-
Re:Good idea, but a time-sucker
I did unit testing (once upon a time), and even developed my own test suite for C++, but I find that it catches VERY few bugs and I end up spending time writing unit tests AS WELL AS hunting down bugs the same old ways I always have.
Sorry to let you know, but you didn't write good unit tests and probably did waste your time. I've found very close to 100% of the bugs in Vstr, a network IO string library, using unit tests. That includes a couple that would have been damn hard to track down otherwise.
However, in the year-plus between 1.0.0, which had a unit test for every function and every function option, and the last release, which had over 99% code coverage, I found a couple of weird corner case issues (not just bugs, but optimizations that could never be reached for some reason). And going from 98% coverage to 99% coverage took a significant time investment, and required significant thinking about how the test should be written.
As with much software development, it's easy to write simple tests that don't show much and aren't very useful. It's much harder to write tests that find bugs (and you have to approach writing the tests with a very different mindset to how you approach writing the code you are testing). This is not even close to being "Like picking lint from your belly-button."
-
TDD vs. unit tests
I'm far from convinced that TDD is actually a good approach, although it's pretty obvious that without testing the code is often trivially buggy, and unit testing is the cheapest way to perform testing. For instance this kind of thing is all too easy to do with TDD.
For unit tests you want to write your code, and then work out the best set of unit tests to get complete code coverage. For an OSS example of that you can look at the Vstr string library and the code coverage for that project.
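As a rough sketch of the "write the code, then pick tests for coverage" order described above: the function parse_uint() and its test are hypothetical, written only to show one test input per exit path. Nothing here is taken from the Vstr test suite.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical function under test: parse an unsigned decimal number.
 * Returns 0 on success, -1 on empty input, bad character, or overflow. */
static int parse_uint(const char *s, size_t len, unsigned int *out)
{
    unsigned int val = 0;
    size_t i;

    if (len == 0)
        return -1;                                 /* path 1: empty input */

    for (i = 0; i < len; ++i) {
        unsigned int digit;

        if (s[i] < '0' || s[i] > '9')
            return -1;                             /* path 2: non-digit */
        digit = (unsigned int)(s[i] - '0');

        if (val > (UINT_MAX - digit) / 10)
            return -1;                             /* path 3: would overflow */
        val = val * 10 + digit;
    }

    *out = val;
    return 0;                                      /* path 4: success */
}

/* One input per path, chosen after reading the code above. */
static void test_parse_uint(void)
{
    static const char big[] = "99999999999999999999999"; /* too big even for 64 bits */
    unsigned int v = 0;

    assert(parse_uint("", 0, &v) == -1);
    assert(parse_uint("12a", 3, &v) == -1);
    assert(parse_uint(big, sizeof big - 1, &v) == -1);
    assert(parse_uint("42", 2, &v) == 0 && v == 42);
}
```

With GCC, building with `gcc --coverage` and running gcov on the source file afterwards reports which lines never executed; the overflow path is the sort of line that tends to show up as unreached until a test is written specifically for it.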
-
Re:Um
If you amortize the time the GC spends in collection over the allocations, the average allocation isn't that much slower than the corresponding malloc/free. Best of all, the gap is shrinking. Soon, GC may be FASTER overall than malloc/free in many real-world situations. It obviously depends on memory usage patterns and collection strategies, but it is starting to happen.
I've heard that for years, it's yet to be true for the general case ... and most people doing manual allocation don't call malloc/free for every single grow/shrink operation. The caching slab allocator was put into Solaris a long time ago now.
If you're using GC, your program doesn't have to do all of the bookkeeping anymore.
The keyword being all: yes, you can sometimes just completely forget about designing how you use memory and everything "works". However, sometimes those bits of memory have other resources associated with them.
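The earlier point about manual allocation not hitting malloc/free on every grow/shrink can be sketched with a tiny free-list cache. This is only an illustration of the general idea: it is far simpler than a real slab allocator, and the names (struct node, node_cache, cache_get, cache_put, cache_drain) and the 64-byte node size are all made up.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical fixed-size node; a real library would pick the size from
 * benchmarks rather than guessing. */
struct node {
    struct node *next;
    char data[64];
};

/* Cache of released nodes, reused before falling back to malloc(). */
struct node_cache {
    struct node *free_list;
    size_t cached;     /* nodes currently sitting in the cache */
    size_t max_cached; /* cap so the cache cannot grow without bound */
};

static struct node *cache_get(struct node_cache *c)
{
    struct node *n = c->free_list;

    if (n) {                       /* fast path: no malloc() call */
        c->free_list = n->next;
        c->cached--;
    } else {
        n = malloc(sizeof *n);     /* slow path: may return NULL */
    }
    if (n)
        n->next = NULL;
    return n;
}

static void cache_put(struct node_cache *c, struct node *n)
{
    assert(n != NULL);

    if (c->cached < c->max_cached) {   /* keep it for later reuse */
        n->next = c->free_list;
        c->free_list = n;
        c->cached++;
    } else {
        free(n);                       /* cache full: give it back */
    }
}

/* Release everything still held by the cache. */
static void cache_drain(struct node_cache *c)
{
    while (c->free_list) {
        struct node *n = c->free_list;

        c->free_list = n->next;
        free(n);
    }
    c->cached = 0;
}
```

A buffer that grows and shrinks by whole nodes then only touches malloc() when the cache is empty, and the max_cached cap keeps the memory use predictable.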
An L1 cache miss costs around 4-10 cycles; an L2 miss can cost 100-400 cycles; a page fault costs millions. The CPU time spent in garbage collection can become insignificant when compared with storage access time.
You seem to be under the impression that running the GC isn't going to blow your cache ... why is that?
But compacting GC implementations are starting to take things like that into consideration when they collect, and they rearrange the memory of the process to maximize cache hits and minimize memory waste.
And the GC knows when to move memory, and when it's just wasting CPU/cache ... how? It's fairly simple to have double pointers (both win32 and MacOS9 use HANDLE types a lot IIRC). The big problem is working out when you can/should move them.
GCs usually collect on a separate thread. That means that with a properly designed collector, while your program is blocked on IO or waiting for user input, the GC might be cleaning up the heap on a low priority thread. With luck, your main thread might NEVER actually be interrupted for a collection
If you actually want threads ... which many of us don't. It also seems to think that cross CPU cache invalidates are free (accessing the same data from more than one CPU -- this can have a massive cycle cost), and that the GC does some kind of magic locking which will never affect the application.
It's also amusing that, by definition, it means that as your application does more work (and hence needs more CPU) it is guaranteed to not get it ... because you will then suddenly start fighting with the GC thread.
While I don't think GC is quite to the point where it is free or beneficial to the performance of the average application, it is a lot less harmful than most people think.
It's my experience that the people who think GC is expensive generally know what they are talking about, and are in the minority. Most people either don't need to care (GC is good enough for them -- fair enough, I've written things where I used perl and didn't need to know how it allocated), or just don't care.
There are other ways of managing memory than just giving up and saying "well, the GC will save me". Vstr, which is a decent implementation of what the IBM article was trying to say, removes almost all the allocation/deallocation headache when dealing with byte data ... but it's still controllable and predictable. -
Re:Gee, isn't that handy
Comprehensive, that's for sure, but the examples look like it's also reinventing FILE* in the io_* API.
The io_* API is part of the examples, and not in the library itself. But it's a pretty small wrapper over what's in the library ... and, yes, the library itself was designed so it could do non-blocking IO (which FILE* can't). So in that regard, yeh, I don't tend to use FILE* anymore.
glibc's fmemopen() moots most of the IBM article, I think, but since I don't code exclusively in a glibc environment... Grrr... If only POSIX specced out FILE* a bit tighter...
Yeh, fmemopen() would have helped a lot. Esp. if you could have implemented asprintf() via it (i.e. expanding memory regions). But it's completely unportable (and I wouldn't trust even the glibc implementation ... as hardly anyone uses it). But even then vstr_* is very nice in that it crosses an IO stream like API with a string API. So you can get data from an fd etc. and immediately split it, or search for something without having to copy it into a string API. Then again, I'm biased :). -
Re:Hmmm
This is simple to do, and avoids a lot of errors. It's also not much of a headline.
While I agree the IBM "research" article is terrible, the idea behind it isn't.
Actually, having done tests and benchmarks, I can safely say:
- It's not the simplest solution.
- It's certainly not anywhere near fast.
-
Re:realloc
But linked lists are worse on performance than realloc.
Care to back that up with some facts?
Also linked lists are very prone to leaking and very hard to figure out which one to free first.
So you a) have the library code do that ... and b) have lots of tests. Of course you want to do both of those for something using a single block of memory too, and if you want it to be efficient it usually does something clever to avoid copies ... and so is probably more likely to screw up and use/deallocate the wrong thing. As for leaking memory, that's pretty much built into the design with a single block of memory model, unless you want to keep calling realloc() to shrink the buffer (then you just blow the fragmentation of malloc to hell -- which if you're lucky will just make things slower).
The "idea" of the article wasn't bad, it's just a very bad description of the proposed solution ... and a couple of years too late. -
Re:Gee, isn't that handy
The fundamental problem is that this sort of thing needs to be done at the C library level. And if it's not done in a flexible fashion, you end up with a library call that rarely gets used. Anyone used hsearch() lately?
Take a look at Vstr, I think it's pretty flexible ... it certainly has much better researched documentation than the content of this IBM "research" article. -
Vstr
The article basically proposes a very bad implementation of Vstr. Most of the advice was extremely simplified at best, but more likely just uninformed: an "efficient" abstract buffer that mixes shorts and pointers -- words almost fail me; how to solve the problem of "what do you do with the data when it's all in the buffer" -- "let's just copy it back out again" (hey, what's a couple of extra copies between friends); representing in-memory object sizes with "long int" *sigh*.
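On the "copy it back out again" complaint: one common alternative on POSIX systems is to hand the separate blocks straight to the kernel with writev(), so the data never has to be flattened into one contiguous buffer first. The sketch below assumes a made-up chunk layout and made-up helper names (struct chunk, chunk_fill, chain_write); it is not Vstr's API or its internal representation, and note that it uses size_t rather than long int for lengths.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical chunk of a non-contiguous buffer. */
struct chunk {
    struct chunk *next;
    size_t len;          /* bytes used in data[] */
    char data[32];
};

#define MAX_IOV 16       /* arbitrary cap for this sketch; POSIX guarantees
                          * IOV_MAX is at least 16 */

/* Fill one chunk and link it in front of `next`. */
static void chunk_fill(struct chunk *c, const char *s, size_t len,
                       struct chunk *next)
{
    assert(len <= sizeof c->data);
    memcpy(c->data, s, len);
    c->len = len;
    c->next = next;
}

/* Write as much of the chain as fits in MAX_IOV entries with one writev()
 * call. Returns what writev() returns: bytes written, or -1 on error. */
static ssize_t chain_write(int fd, const struct chunk *head)
{
    struct iovec iov[MAX_IOV];
    int cnt = 0;

    for (; head && cnt < MAX_IOV; head = head->next) {
        if (head->len == 0)
            continue;                            /* skip empty chunks */
        iov[cnt].iov_base = (void *)head->data;  /* iovec is not const-aware */
        iov[cnt].iov_len  = head->len;
        ++cnt;
    }
    if (cnt == 0)
        return 0;

    return writev(fd, iov, cnt);
}
```

A real implementation would also have to cope with short writes and with chains longer than MAX_IOV; both are left out here to keep the sketch small.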
If you are interested in the article, go read this explanation of why you want it for security and this explanation of why you want it for speed.
Vstr is LGPL, has actual benchmark data behind the block sizes it picks, has an extensive test suite ... and has documentation for the many functions that come with the library (including a fully compliant printf like function). Of course, I don't have a PhD ... but after reading this, you might well count that as a plus too. -
Re:Wow
Uh, ours does. In fact, we test every piece of code that goes to a customer on a dozen different hardware pieces, we have a unit of each model of printer that we've okayed for use (some 30 or 40 units) and for big releases we deal with several large beta customers before release.
And our company only employs 20 people. Every minute spent testing is a minute we could be making a new product...but supporting the old stuff is what makes us so popular with the customers we have, and it's why they pay support costs every year and buy our new stuff when it comes out.
In fact, now that I think about it, every company I've worked with since I started my professional career had a very serious and very adept quality team on our side. Most of the time they were structured in such a way that QA was working actively AGAINST the release of any software...playing a sort of programmatic Spy vs. Spy with the developers. The result is stronger software faster, which contributes to the bottom line.
Oh, my. Where do you work? At far too many of the companies I've worked at, QA has been something that people assume the programmer does (mostly for internal code) ... or they have a QA team, but it is "controlled" by the engineering managers. One very large company I worked for had a separate QA dept.; however, if they found any bugs that they thought should hold up a release (and they had 3 days to do it in) then they had to ask the development dept. to stop the code from shipping. No matter what the bug, they were powerless to do anything. Of course the same dept. had no unit tests and used "RCS" for version control.
I *LIKE* open source, but the existing mechanisms for testing are really terrible, even if the bug repair response can be great. And since there's no accountability, there's little enforcement for responsibility
This isn't true, my goddamn name is in the authors file and the web site etc. Not some nameless corporation. I am completely accountable, which is why Vstr has over 98% code coverage and the test suite is almost half the size of the code. This is very unlike when I work on code inside a company, where often "can we deploy it" is the only question asked and time just isn't allocated to do it to the same level of quality. It would probably be more fair to say that OSS is often more concentrated in its accountability.
-
Re:My response to howard@princeton.edu
Congratulations, that was one of the most brilliant pieces of flamebait I've ever seen or read. It had everything
It wasn't bad, I'm not sure whether it was a troll or stupidity. However, assuming it was a troll, as with all good trolls there were grains of truth in it...
I don't know of a single open-source / free software project that doesn't use version control.
Really? Well, I don't use anything more than diff and tarballs of releases. Linus didn't have a public one for a long time, and I know other "well known" contributors who also just use the diff and tarball method. Also I'd be willing to bet that most OSS projects don't use version control; CVS just happens to be how the code is stored in sf.net. Also you could argue that CVS is a pretty bad source control mechanism, esp. given the use case for OSS programmers. Arch looks like it could be good soon. However the commercial market has BitKeeper, Perforce and ClearCase ... which are all usable and used right now.
Coding Standards?
There is a difference between large and small projects. For instance most smaller projects that only have one main contributor shouldn't have a coding standard, because it should all look the same and common best practice would dictate that patches be in the same format. And if they aren't then it's a relatively small problem for the main contributor to reformat them.
However the larger the project is, the more an official coding std. is needed as it's much harder for a single person to keep an eye on everything. Indeed some of the larger code infrastructure in Linux like the kernel, gcc and glibc are very anal about Coding Stds.
Quality Control? ... "all bugs are shallow, given enough eyeballs?"
Again, in a smaller project, the Quality Control will basically be a function of what the main contributor does (as there are very few extra eyeballs). Some projects have very good "traditional" QA measures.
It's also worth pointing out that even though the larger projects will tend to get a lot more bug reports/fixes, for the normal code paths, it doesn't necessarily hold that all code paths will have been tested. For instance see wu-ftpd or sendmail, here you really do need good code and some real QA (see vsftpd and qmail/exim). However the commercial offerings tend to fare just as badly, or even worse.
Support? ... Modifying code?
The point is not that A company can provide support, or alter the code. It's that anybody with sufficient knowledge can.
-
Re:Very interesting comment about GNU libc
So, did his Hello World support multibyte character sets, or, in fact, any sort of internationalization?
Without being nice about it, dietlibc is a piece of shit. If you just want a syscall list and the obvious functions (memcpy() etc.) use klibc. If you need more then dietlibc is almost certainly broken IMO, every time I've looked at something "big" in it the implementation was worthless. In fact the printf() like function is unique only in how terrible the implementation is, and that's probably the most widely used function in libc.
As for multibyte, that's no problem because Felix is using a broken string library, which is one of the few things that tries to forgo the use of printf(), making i18n almost impossible.
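The comment doesn't spell out why dropping printf() hurts i18n. One likely reason (an assumption on the editor's part) is that translated messages often need their arguments in a different order, which POSIX printf() supports through numbered conversions such as %1$s. That syntax is a POSIX extension, not ISO C. The message strings and format_msg() below are invented for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical message catalogue: the same two arguments, but each
 * "translation" is free to place them in a different order. */
static const char *const fmt_en = "%1$s deleted %2$d files";
static const char *const fmt_reordered = "%2$d files were deleted by %1$s";

/* Format a message with whichever catalogue entry was selected.
 * Returns what snprintf() returns. */
static int format_msg(char *buf, size_t bufsz, const char *fmt,
                      const char *user, int count)
{
    assert(buf != NULL && bufsz > 0);
    return snprintf(buf, bufsz, fmt, user, count);
}
```

The calling code passes (user, count) in one fixed order and never changes; only the format string pulled from the catalogue differs per language. A string API built purely on concatenation has to hard-code the word order in C instead.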