glibc 2.2.x has a number of really nice little quirks that you can use to help debug memory problems. Among my favourites are:
MALLOC_CHECK_
If you set the environment variable MALLOC_CHECK_ before running a program, glibc uses a slow but thorough variant of malloc to do some checking for buffer overruns, double-frees, etc. Setting MALLOC_CHECK_ to 0 makes it ignore problems, 1 causes it to print a diagnostic to stderr, and 2 causes it to call abort() immediately. All of this is in the glibc malloc(3) man page.
MALLOC_TRACE and mtrace()
If you "#include <mcheck.h>" in your source, you can call mtrace(3) at some point in your code. This function looks for the environment variable MALLOC_TRACE, which names a file that it then logs all malloc(3)s, free(3)s, realloc(3)s and calloc(3)s to. When your program is finished, you can run the mtrace(1) perl script (also supplied with glibc) over this log, and it prints out a list of all unfreed memory, all frees of memory that was never allocated, all double-freed memory and probably a bit more besides. It's really handy.
I tend to put the "#include <mcheck.h>" and "mtrace()" calls inside "#ifdef HAVE_MTRACE" guards, and then add "-DHAVE_MTRACE" to my CFLAGS when compiling debug builds.
The documentation for this can be found at http://www.gnu.org/manual/glibc-2.2.3/html_chapter/libc_3.html#SEC37
malloc() and free() are weak symbols.
glibc's copies of malloc(3) and free(3) are `weak' symbols in the library. What this means is that you can write your own functions called malloc() and free() in your program, and those will be called all the time, instead of the proper ones. You can still call the originals under their internal names (__libc_malloc() and __libc_free() in glibc, if I remember right) and do little extra checks and things yourself, such as filling memory with bogus data before returning it, to make sure you're not forgetting to zero some bytes here and there.
gdb is also really great and has loads of stuff that I've not found in other debuggers. Check out the manual sections on `ignore' (to ignore a breakpoint x times to catch the (x + 1)th malloc), and `commands' (to automatically print out variable values and continue, for example) w.r.t. breakpoints.
Wow, I've not really played close to the limits on my machines in the past, so I didn't know about that. But that's really dumb.
Surely if a process tries to malloc(3) more memory than its limit, all that should happen is that malloc(3) will return NULL.
And according to the malloc(3) and brk(2) man pages on my system, that's all that does happen. I can't find any stuff about signals being sent. Which signal is it, and is it catchable? (A look through signal(7) doesn't bring up anything obvious sounding either)
The incentive is that, as the software evolves alongside the work we're doing, it becomes increasingly hard to keep our own diffs and keep re-merging them into the main base.
So, we have two options. These are 1) Don't update the base software, and lose out on the continuing community development that gave us this great software in the first place. 2) Try to get as many of our changes as possible into the main base, and allow other people (as well as the guys we're paying full-time) to study them, debug them, refactor them, build on them, etc..., making our software even better still.
(2) keeps our diffs as small as possible (there still may be some - there may be some changes that the maintainers don't think are suitable for their product, but that we still need.) and makes our job easier. And if, as someone else has pointed out, our hackers do get a better job elsewhere and need to leave, if we've insisted that as much source as possible get contributed back to the main product, it's more likely to be documented, more people are likely to have looked over it, and our chances are better of being able to hire someone new who's capable of hitting the ground running. (This still helps if we just want to hire an extra hand to do even more work).
There's a few reasons why contributing back is good for business, and I've been thinking of as many as possible to convince people Higher Up that it's worth doing the Right Thing, without just saying "But it's the Right Thing to do", 'cos abstract ideals don't fit in well with business.
I'm a programmer, but I look at software as a way of getting something else done, something that makes money.
The company I work for sells real things to people (toasters, etc...). That's the business we're in, that's how we make money. We compete in the marketplace on the range of goods we offer, the price we offer them at, and the after sales service we provide for when these real things wear out and break down. We use software to help us achieve that goal as efficiently as possible.
To us, it doesn't really matter if the software we use (web servers, word processors, email programs, databases) is the same as the software used by our competitors - in fact it's quite likely they're using a lot of the same software from the same supplier. Our only goal is to get our software to do what we want as cheaply as possible.
So if we can hire 2 shit-hot hackers to work on this open source database system to control our stock, and that turns out to be cheaper or even comparable to however many licenses of the closed-source product we need, great. Because not only do we have the database we need, but we've got our own guys supporting it in-house who know it inside and out, who we can just *ask* for support.
It doesn't matter if our competitors have their own hackers working on the same product, because the more our guys _and_ their guys improve this software, this means to an end, the better we can all compete in the marketplace on what we do - on selling toasters, and not on what software and support contracts we happen to have.
Hey, it'll be alright providing you don't give it conflicting goals that it can't complete, without telling its chief programmer what conflicting goals you've given it due to `security' reasons, so he has no hope in hell of figuring out _why_ his baby is acting so damn strange.
The live show has costumes, dancing, scenery, pyrotechnics, lights, all kinds of bullshit that the serious musician and music aficionado doesn't really need.
Dude, who do you go see play? Most bands I know go and stand up there with their instruments (OK, they generally have lights so we can see them) and play for an hour or two, exchanging some banter with the crowd between songs, and generally provide a damn good piece of entertainment.
But what, they've got their instruments (which they have already), a bunch of amps and speakers, a roadie or two to set it up, a downpayment on the venue, travel to get there and a night's accommodation on the minus side, and my (and a thousand or so people like me) few bucks from the door, a small percentage of the take from the bar(s) (which can still add up), and some small profits from t-shirts and CDs they can flog there and then on the plus side.
The advertising they get pretty damn cheap, as the venues tend to organise the posters and such for who's playing when and stick them all over town, because it's in their interest to attract people and make money from the bar.
I'd have thought that colonies, once they are of the size to sustain civilian populations (as opposed to being just researchers and scientists) would want to form their own government and laws, as opposed to being ruled by a bunch of `foreign' (alien?) bureaucrats.
Yeah, they might base their laws (and constitution?) on that of the US, cos it seems a pretty good starting place, but to be ruled by a far off land, and have to pay federal income taxes to a place tens of millions of miles away? Come on, you Americans must be in a uniquely qualified position to know that colonies don't like to do that!
Ugh - why on earth would you want to go back to writing? It's so bloody *slow*. I hate writing. It takes forever, and having to differentiate my written symbols for ( { < and [ enough so that any intelligence (human, artificial or other) can decipher them on their own (without context) would really start to annoy me.
Typing's quicker and more precise.
I suppose at least for an English speaker though. I guess if you speak a language with characters that aren't necessarily part of 7-bit ASCII, things can get a little more complicated...
I put that down to him actually being old (850+; in Empire he mentions to Luke that "when 900 years old you reach, look this good you will not, hmmm?") and quite fragile.
But, when he needs to, he can use the Force to make the cane redundant, and allow him to bounce around just as fast as he can imagine, and overcome the limitations of his old and fragile body.
Most of the time he doesn't bother though, 'cos it requires a load of concentration and it's easier to just hobble around with a stick. Being wise, old and a generally serene kind of guy, he doesn't feel the need to rush everywhere at top speed, and is content with that.
I'm sure that UBE would be easily identifiable by a google type of database as practically no mail will exist that goes _back_ to the source.
Filters based on that (to either look for UCE, or to discard it) would probably be trivial based on ratios of sent/received messages to/from a particular envelope.
He's wrong about one thing. Email does have links. It has links indicating who it came from and who it went to. Even without the content, that sort of information, about who is talking to whom, and in what patterns, can be really informative to those who know what they're looking for.
But a `literal' interpretation of the change you suggest (`That should be "second-year."' and not `That should be "second-year".') would leave our original sentence as `I'm a second-year. high school teacher...', which has an extraneous period!
No, a 'long' in C++ is _at_least_ 32 bits. It may be more. Especially on 64-bit systems.
C++ (as standardised in C++98) doesn't have a 'long long' type. You're thinking of C99. Or a compiler-specific extension. (And, IMO, 'long long' is really fscking ugly. 'longlong' (one keyword) would have been better if it was _really_ needed, which it wasn't. Stupid decision by the C9X committee if you ask me.... :-)
1) Factoring primes is easy. I can factor any prime you care to name in my head in no seconds flat.
2) The algorithm is a test of how quick the code is to run, not how quickly you can produce primes. If they made a better (say 2 times as quick) algorithm, but wanted to eliminate startup and shutdown costs as much as possible, they'd make the loop longer (say twice as long) to get the same amount of runtime.
Stop lumping everyone who reads /. in the same boat. One guy, a Free Software advocate, shows elitist behaviour (and hence probably doesn't read /. :-) and you surmise that _we_ are elitists? Just because some of us also support Free Software, or because some of us hate Micro$oft?
Strangely, I find that running code on a number of OSs is a good way to find bugs.
Reading the contents of newly allocated memory before initialising it, for example (I did a cut and paste and got a couple of lines the wrong way round once, a long time ago), could give you what you expect on one platform (all initialised - coincidentally - to zeros) and if that was the only platform you tested on, well, there's a latent bug in there.
Test the code on another platform, and it all falls apart within the first couple of seconds of initialisation. You soon track that one down.
Uh, dude, paragraph and heading tags _are_ content markup and _not_ presentational markup. They delineate the structure of the document; not how it looks.
Unicode isn't a character encoding. It's a character set. There's a difference. And Unicode, the character set, can be encoded in a number of different formats, including, as you mention, UCS2.
However, UCS2 can't hold all the Unicode characters, as there are code points above 65535 (U+FFFF) that are currently in use.
One encoding that does work is UTF-8, which is sort of compatible with ASCII - all ASCII programs should read and re-write UTF-8 encoded text properly, even if they don't display it correctly.
UTF-16 is a variable-length format built from 16-bit units: code points up to U+FFFF take one unit, and anything above that takes two (a surrogate pair).
UCS4 is a 'native' encoding of Unicode, with a 1-to-1 code point to character code mapping. But not many programs use it.
Don't use Word. Ever. That's almost as bad as using pkzip encryption. Word encryption is worse than useless as it gives you a false sense of security.
Spelling/grammar checking?
Um - what word processors don't have spell checkers? Grammar? Well, I suppose it's nice, but if you can't string a sentence together that scans properly, go back to school and get an education. And that's only a reason to use Word as an editor. That's not a reason to send the final version as a Word attachment. Sure, write in Word. But why not still send as plain text? Most of the stuff I get as Word attachments is just that - plain text. Just wrapped up in a huge Word document.
Correction/collaboration? OK - you might have me there. I've no idea of how Word's version works or if any other package has it or not, as I've never had a need for it. *shrug*
Doxygen is great for producing API references with source code cross-references, if that's all the documentation you need. I've no problems with it there. It rules.
But for user-level documentation, or even developer-level general overviews of source organisation, resource ownership policies, etc., I'd have to say it's not the ideal tool for that. But then, that's not really what it was designed for.
I'd take a closer look at Docbook and the fairly large set of utilities that exist for converting it to HTML, TeX, man, texinfo, etc. Check http://www.oasis-open.org/docbook/
When doxygen's xml/docbook output format is sorted, even this could be moved that way too...
Uh - dude, the RIAA and MPAA may be different trade organisations, but they're both made up of the same bunch of guys. Sony. News Corporation. AOL-Time Warner. etc...
And it's the members that have the power in a trade organisation. Yeah, Hilary Rosen and Jack Valenti are the heads, but they're just front people both paid by the same masters.
glibc 2.2.x has a number of really nice little quirks that you can use to help debug memory problems. Among my favourites are:
MALLOC_CHECK_
If you set the environment variable MALLOC_CHECK_ before running a program, glibc uses a slow but thorough variant of malloc to do some checking for buffer overruns, double-frees, etc. Setting MALLOC_CHECK_ to 0 makes it ignore problems, 1 causes it to print a diagnostic to stderr, and 2 causes it to call abort() immediately. All of this is in the glibc malloc(3) man page.
MALLOC_TRACE and mtrace()
If you "#include <mcheck.h>" in your source, you can call mtrace(3) at some point in your code. This function looks for the environment variable MALLOC_TRACE, which names a file that it then logs all malloc(3)s, free(3)s, realloc(3)s and calloc(3)s to. When your program is finished, you can run the mtrace(1) perl script (also supplied with glibc) over this log, and it prints out a list of all unfreed memory, all frees of memory that was never allocated, all double-freed memory and probably a bit more besides. It's really handy.
I tend to put the "#include <mcheck.h>" and "mtrace()" calls inside "#ifdef HAVE_MTRACE" guards, and then add "-DHAVE_MTRACE" to my CFLAGS when compiling debug builds.
The documentation for this can be found at http://www.gnu.org/manual/glibc-2.2.3/html_chapter/libc_3.html#SEC37
malloc() and free() are weak symbols.
glibc's copies of malloc(3) and free(3) are `weak' symbols in the library. What this means is that you can write your own functions called malloc() and free() in your program, and those will be called all the time, instead of the proper ones. You can still call the originals under their internal names (__libc_malloc() and __libc_free() in glibc, if I remember right) and do little extra checks and things yourself, such as filling memory with bogus data before returning it, to make sure you're not forgetting to zero some bytes here and there.
gdb is also really great and has loads of stuff that I've not found in other debuggers. Check out the manual sections on `ignore' (to ignore a breakpoint x times to catch the (x + 1)th malloc), and `commands' (to automatically print out variable values and continue, for example) w.r.t. breakpoints.
http://www.gnu.org/manual/gdb-5.1.1/html_chapter/gdb_6.html#SEC34
http://www.gnu.org/manual/gdb-5.1.1/html_chapter/gdb_6.html#SEC35
Wow, I've not really played close to the limits on my machines in the past, so I didn't know about that. But that's really dumb.
Surely if a process tries to malloc(3) more memory than its limit, all that should happen is that malloc(3) will return NULL.
And according to the malloc(3) and brk(2) man pages on my system, that's all that does happen. I can't find any stuff about signals being sent. Which signal is it, and is it catchable? (A look through signal(7) doesn't bring up anything obvious sounding either)
Confused,
K.
The incentive is that, as the software evolves alongside the work we're doing, it becomes increasingly hard to keep our own diffs and keep re-merging them into the main base.
So, we have two options. These are 1) Don't update the base software, and lose out on the continuing community development that gave us this great software in the first place. 2) Try to get as many of our changes as possible into the main base, and allow other people (as well as the guys we're paying full-time) to study them, debug them, refactor them, build on them, etc..., making our software even better still.
(2) keeps our diffs as small as possible (there still may be some - there may be some changes that the maintainers don't think are suitable for their product, but that we still need.) and makes our job easier. And if, as someone else has pointed out, our hackers do get a better job elsewhere and need to leave, if we've insisted that as much source as possible get contributed back to the main product, it's more likely to be documented, more people are likely to have looked over it, and our chances are better of being able to hire someone new who's capable of hitting the ground running. (This still helps if we just want to hire an extra hand to do even more work).
There's a few reasons why contributing back is good for business, and I've been thinking of as many as possible to convince people Higher Up that it's worth doing the Right Thing, without just saying "But it's the Right Thing to do", 'cos abstract ideals don't fit in well with business.
I'm a programmer, but I look at software as a way of getting something else done, something that makes money.
The company I work for sells real things to people (toasters, etc...). That's the business we're in, that's how we make money. We compete in the marketplace on the range of goods we offer, the price we offer them at, and the after sales service we provide for when these real things wear out and break down. We use software to help us achieve that goal as efficiently as possible.
To us, it doesn't really matter if the software we use (web servers, word processors, email programs, databases) is the same as the software used by our competitors - in fact it's quite likely they're using a lot of the same software from the same supplier. Our only goal is to get our software to do what we want as cheaply as possible.
So if we can hire 2 shit-hot hackers to work on this open source database system to control our stock, and that turns out to be cheaper or even comparable to however many licenses of the closed-source product we need, great. Because not only do we have the database we need, but we've got our own guys supporting it in-house who know it inside and out, who we can just *ask* for support.
It doesn't matter if our competitors have their own hackers working on the same product, because the more our guys _and_ their guys improve this software, this means to an end, the better we can all compete in the marketplace on what we do - on selling toasters, and not on what software and support contracts we happen to have.
K.
Hey, it'll be alright providing you don't give it conflicting goals that it can't complete, without telling its chief programmer what conflicting goals you've given it due to `security' reasons, so he has no hope in hell of figuring out _why_ his baby is acting so damn strange.
The live show has costumes, dancing, scenery, pyrotechnics, lights, all kinds of bullshit that the serious musician and music aficionado doesn't really need.
Dude, who do you go see play? Most bands I know go and stand up there with their instruments (OK, they generally have lights so we can see them) and play for an hour or two, exchanging some banter with the crowd between songs, and generally provide a damn good piece of entertainment.
But what, they've got their instruments (which they have already), a bunch of amps and speakers, a roadie or two to set it up, a downpayment on the venue, travel to get there and a night's accommodation on the minus side, and my (and a thousand or so people like me) few bucks from the door, a small percentage of the take from the bar(s) (which can still add up), and some small profits from t-shirts and CDs they can flog there and then on the plus side.
The advertising they get pretty damn cheap, as the venues tend to organise the posters and such for who's playing when and stick them all over town, because it's in their interest to attract people and make money from the bar.
What the hell else is there?
I'd have thought that colonies, once they are of the size to sustain civilian populations (as opposed to being just researchers and scientists) would want to form their own government and laws, as opposed to being ruled by a bunch of `foreign' (alien?) bureaucrats.
Yeah, they might base their laws (and constitution?) on that of the US, cos it seems a pretty good starting place, but to be ruled by a far off land, and have to pay federal income taxes to a place tens of millions of miles away? Come on, you Americans must be in a uniquely qualified position to know that colonies don't like to do that!
K.
Ugh - why on earth would you want to go back to writing? It's so bloody *slow*. I hate writing. It takes forever, and having to differentiate my written symbols for ( { < and [ enough so that any intelligence (human, artificial or other) can decipher them on their own (without context) would really start to annoy me.
Typing's quicker and more precise.
I suppose at least for an English speaker though. I guess if you speak a language with characters that aren't necessarily part of 7-bit ASCII, things can get a little more complicated...
K.
I put that down to him actually being old (850+; in Empire he mentions to Luke that "when 900 years old you reach, look this good you will not, hmmm?") and quite fragile.
But, when he needs to, he can use the Force to make the cane redundant, and allow him to bounce around just as fast as he can imagine, and overcome the limitations of his old and fragile body.
Most of the time he doesn't bother though, 'cos it requires a load of concentration and it's easier to just hobble around with a stick. Being wise, old and a generally serene kind of guy, he doesn't feel the need to rush everywhere at top speed, and is content with that.
K.
The thing is, Java is an OS. And a platform. And a Language. And a library.
Which can get a little confusing at times.
I'm sure that UBE would be easily identifiable by a google type of database as practically no mail will exist that goes _back_ to the source.
Filters based on that (to either look for UCE, or to discard it) would probably be trivial based on ratios of sent/received messages to/from a particular envelope.
He's wrong about one thing. Email does have links. It has links indicating who it came from and who it went to. Even without the content, that sort of information, about who is talking to whom, and in what patterns, can be really informative to those who know what they're looking for.
If you include the content, it's a goldmine.
URLs embedded in email would make it better again
Aside from that though, great article.
But a `literal' interpretation of the change you suggest (`That should be "second-year."' and not `That should be "second-year".') would leave our original sentence as `I'm a second-year. high school teacher ...', which has an extraneous period!
:-)
K.
+1 Funny.
I hate to think of the comments you'll get from some. Nice one tho'.
Just had to add my best wishes, and try to get this the most-posted-to story ever. 1500 posts and rising...
Good luck to the pair of you.
K.
No, a 'long' in C++ is _at_least_ 32 bits. It may be more. Especially on 64-bit systems.
:-)
C++ (as standardised in C++98) doesn't have a 'long long' type. You're thinking of C99. Or a compiler-specific extension. (And, IMO, 'long long' is really fscking ugly. 'longlong' (one keyword) would have been better if it was _really_ needed, which it wasn't. Stupid decision by the C9X committee if you ask me....)
Duh!
1) Factoring primes is easy. I can factor any prime you care to name in my head in no seconds flat.
2) The algorithm is a test of how quick the code is to run, not how quickly you can produce primes. If they made a better (say 2 times as quick) algorithm, but wanted to eliminate startup and shutdown costs as much as possible, they'd make the loop longer (say twice as long) to get the same amount of runtime.
_we_??
That guy is an elitist.
Stop lumping everyone who reads /. in the same boat. One guy, a Free Software advocate, shows elitist behaviour (and hence probably doesn't read /. :-) and you surmise that _we_ are elitists? Just because some of us also support Free Software, or because some of us hate Micro$oft?
Get over yourself.
K.
Strangely, I find that running code on a number of OSs is a good way to find bugs.
Reading the contents of newly allocated memory before initialising it, for example (I did a cut and paste and got a couple of lines the wrong way round once, a long time ago), could give you what you expect on one platform (all initialised - coincidentally - to zeros) and if that was the only platform you tested on, well, there's a latent bug in there.
Test the code on another platform, and it all falls apart within the first couple of seconds of initialisation. You soon track that one down.
K.
Uh, dude, paragraph and heading tags _are_ content markup and _not_ presentational markup. They delineate the structure of the document; not how it looks.
Bad examples there...
Um, no it's not.
Unicode isn't a character encoding. It's a character set. There's a difference. And Unicode, the character set, can be encoded in a number of different formats, including, as you mention, UCS2.
However, UCS2 can't hold all the Unicode characters, as there are code points above 65535 (U+FFFF) that are currently in use.
One encoding that does work is UTF-8, which is sort of compatible with ASCII - all ASCII programs should read and re-write UTF-8 encoded text properly, even if they don't display it correctly.
UTF-16 is a variable-length format built from 16-bit units: code points up to U+FFFF take one unit, and anything above that takes two (a surrogate pair).
UCS4 is a 'native' encoding of Unicode, with a 1-to-1 code point to character code mapping. But not many programs use it.
Formatting/tables/graphics/highlighting?
Use PDF. Or HTML.
Password protection?
Don't use Word. Ever. That's almost as bad as using pkzip encryption. Word encryption is worse than useless as it gives you a false sense of security.
Spelling/grammar checking?
Um - what word processors don't have spell checkers? Grammar? Well, I suppose it's nice, but if you can't string a sentence together that scans properly, go back to school and get an education. And that's only a reason to use Word as an editor. That's not a reason to send the final version as a Word attachment. Sure, write in Word. But why not still send as plain text? Most of the stuff I get as Word attachments is just that - plain text. Just wrapped up in a huge Word document.
Correction/collaboration? OK - you might have me there. I've no idea of how Word's version works or if any other package has it or not, as I've never had a need for it. *shrug*
Doxygen is great for producing API references with source code cross-references, if that's all the documentation you need. I've no problems with it there. It rules.
But for user-level documentation, or even developer-level general overviews of source organisation, resource ownership policies, etc., I'd have to say it's not the ideal tool for that. But then, that's not really what it was designed for.
I'd take a closer look at Docbook and the fairly large set of utilities that exist for converting it to HTML, TeX, man, texinfo, etc. Check http://www.oasis-open.org/docbook/
When doxygen's xml/docbook output format is sorted, even this could be moved that way too...
K.
Uh - dude, the RIAA and MPAA may be different trade organisations, but they're both made up of the same bunch of guys. Sony. News Corporation. AOL-Time Warner. etc...
And it's the members that have the power in a trade organisation. Yeah, Hilary Rosen and Jack Valenti are the heads, but they're just front people both paid by the same masters.
"It's important to have a job that makes a difference boys. That's why I manually masturbate caged animals for artificial insemination." - Clerks.