Re:Depends on function (on "Clean Code") · Score: 3, Insightful
Performance-related code and highly efficient code is the opposite of clean code. Clean code is often high level in nature, while efficient and robust code is low level and not pretty.
Clean code is the enemy of robust code?
I've never heard anyone state that before.
Comments are for readability, not the code. Always go for efficiency.
10GB per year is a lot more than I produce - my PhD was only 1.5GB in total, including temporary files (build cruft and so on), with only 210MB needed for the subversion repository (176MB after bzip2)
210MB is a lot. That's as large as my CVS repository, which I have added to daily for ten years or so,
and which contains lots of external data too (a copy of The Great Gatsby in troff format is in there somewhere).
On the other hand, I don't use Word, which manages to make single-page documents that are more or less plain text take up a few MBs. If you're in a company where everyone sends Word document attachments as emails instead of plain text (I've seen it done[1]) then you could probably generate 10-20MB of data per day from around 5KB of actual content, and backing this up might be cheaper than educating your users.
Yes; all those Windows formats are a plague. Look into them and find immense, desolated areas of zero bytes.
Another source of mail bloat in Windows+Exchange+Outlook places is BMP images. If you cut&paste a screenshot
or something in a mail, it tends to become megabytes of uncompressed 24-bit image data.
All this would be no problem at all if mail was compressed when sent. I cannot understand why Outlook
(and other mailers) don't do that. IIRC, it's even supported by the MIME RFCs.
You'd typically save 95% of the space, and the CPU time needed for
compression and decompression is negligible these days.
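To put a rough number on the BMP case, here is a minimal sketch using Python's zlib. The "screenshot" is synthetic (a 640x480, 24-bit buffer that is mostly one colour, which is my assumption, not real mail data), so it compresses far better than a photo would; real screenshots usually land somewhere in between.

```python
import zlib

# Synthetic stand-in for a pasted screenshot: 640x480 pixels, 24 bits each,
# white background with one dark horizontal band. Purely illustrative data.
WIDTH, HEIGHT = 640, 480
background_row = bytes([255, 255, 255]) * WIDTH
band_row = bytes([40, 40, 40]) * WIDTH
raw = b"".join(band_row if 200 <= y < 240 else background_row
               for y in range(HEIGHT))

compressed = zlib.compress(raw, level=6)
saving = 1 - len(compressed) / len(raw)

print(f"raw: {len(raw)} bytes, compressed: {len(compressed)} bytes, "
      f"saved: {saving:.1%}")
```

The round trip is lossless, so the receiving end gets the same bytes back with zlib.decompress().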
What do you use to MBOX your outlook data, if I may ask?
I cannot say what he does, but if the Sexchange server is open for IMAP,
you can telnet to it and pass an IMAP command to dump everything in RFC 822 format. It ends up very close
to mbox format; it might even have a From_ line.
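If you would rather not type IMAP by hand over telnet, the same idea can be sketched with Python's imaplib and mailbox modules. This is a swapped-in approach, not what the original poster described; the function names, the host, the credentials and the use of IMAP over SSL are all assumptions for illustration.

```python
import imaplib
import mailbox


def fetch_rfc822(host, user, password, folder="INBOX"):
    """Yield every message in `folder` as raw RFC 822 bytes.

    Assumes the server accepts IMAP over SSL with a plain login.
    """
    with imaplib.IMAP4_SSL(host) as conn:
        conn.login(user, password)
        conn.select(folder, readonly=True)
        _, data = conn.search(None, "ALL")
        for num in data[0].split():
            _, parts = conn.fetch(num, "(RFC822)")
            yield parts[0][1]


def write_mbox(path, raw_messages):
    """Append raw messages to an mbox file and return how many were written."""
    box = mailbox.mbox(path)
    count = 0
    try:
        for raw in raw_messages:
            box.add(raw)  # mailbox supplies the "From " separator line
            count += 1
        box.flush()
    finally:
        box.close()
    return count
```

Usage would be something like write_mbox("backup.mbox", fetch_rfc822("mail.example.org", "me", "secret")), with your own host and credentials in place of those made-up ones.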
http://cprogramming.com/ - best site for beginners in my opinion.
I'd stay away from it. It muddles the distinction between two very different languages.
Also, this worthless "tip" got rated 3.28 out of 5 by over 800 readers:
Use ++ and --.
Try to use only unary operators for incrementing and decrementing variables; they produce fewer instructions and run faster.
www.cplusplus.com
Beats the hell out of man pages for the POSIX C libraries.
Where do you find POSIX documentation on that site?
All I see is C standard library documentation, and C++ standard library
documentation which feels, at a glance, as if something is missing (I cannot pinpoint exactly what).
Also, which man pages are you reading? The ones which come with Debian Linux
are excellent. Definitely not easy to "beat the hell out of".
(Remember: man pages are reference material. Any tutorial, overview or introduction
can beat the hell out of man pages, if you try to use them as a tutorial, overview or introduction.)
While K&R makes an excellent reference book, you must be very careful not to write programs like they are written in that book. The newer versions have been updated to be compliant with the C standards
Not quite. There is no C99 edition of the book.
but the programming style is firmly stuck in the 70's. This leads to code which is frankly unmaintainable.
For example, pointers have their purpose, but they should be avoided except where they truly make sense. For example, pointers in a linked list make sense. Using pointers to access array elements is just plain stupid. The code becomes hard to read (which array does p point to again? how far into p are we?) and is generally harder to debug from a crash dump.
Funny; I find loops which are based on pointers readable, and indexed loops unreadable.
Work for a while with C++ and its standard library's iterator concept, and things like
for(p=begin; p!=end; ++p) feel like the natural way to do such things.
It's easy to reason about, it's not more prone to off-by-one or fencepost errors, and I fail to see how
it's harder to debug a core dump of code done this way. And it's closer in spirit to the foreach
loops which we know and love from Bourne shell, Perl and Python.
The original excuse, that *p was faster than p[index], is simply false with any modern compiler and machine.
True. And also, with today's 64-bit addressing, storing a pointer instead of a small index can be a significant waste of memory.
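The memory point is simple arithmetic. A rough sketch, using Python's array module only to get fixed-width elements: "H" (16-bit unsigned) stands in for a small index and "Q" (64-bit unsigned) stands in for a pointer on a 64-bit machine. The element count is an arbitrary assumption.

```python
from array import array

N = 1_000_000  # arbitrary number of stored references

as_indices = array("H", [0]) * N   # 16-bit indices into some table
as_pointers = array("Q", [0]) * N  # 64-bit words, standing in for pointers

index_bytes = as_indices.itemsize * len(as_indices)
pointer_bytes = as_pointers.itemsize * len(as_pointers)

print(f"indices:  {index_bytes} bytes")
print(f"pointers: {pointer_bytes} bytes")
```

Whether that factor matters depends on how many of these you store; for a handful of loop variables it obviously doesn't.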
With respect to emacs code browsing: Check out etags. find-tag and find-tag-other-window are a godsend. (My makefiles are set up to easily recreate the TAGS file.)
So that was what the submitter meant when he talked about "code browsing"?
Or maybe not, but that doesn't matter -- when I combine (a) TAGS support, (b) pydoc in a terminal window
on the side, (c) unit tests in another terminal window and (d) paper and pencil, I don't feel I miss out on anything; I am not limited by the computer's ability to show
me the stuff I am working with.
That is partly because Python code tends to be brief and elegant.
Also note that vi also supports TAGS/tags files.
I hope all serious code editors do either that, or have a very good reason to use
some other file format for symbol -> file:line mapping.
The ol' 'designers haven't figured out how to design efficient websites' argument is rolled out now and then, and I think it's time it was put to rest.
So you argue "they do know, but they don't care"? --
We're not on dialup anymore - there is no need to waste time on getting the most minimalistic page (in terms of file size). Yes, it's nice to have, but no longer essential. Even mobile phones, which are the closest to dial-up, are heading down the fast 3G route.
I disagree. If you can take the time to get HTTP compression and caching
to work correctly, people who use your site regularly will benefit every time.
The difference can be drastic.
And if you compress text stuff, you can forget about all other, more painful,
strategies for making the documents smaller.
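A minimal sketch of the compression half (caching is left out), assuming a hypothetical helper negotiate_body that a server would call before sending a response. The Accept-Encoding handling is deliberately simplified: it only looks for a gzip token and ignores q-values.

```python
import gzip


def negotiate_body(body, accept_encoding):
    """Return (payload, headers) for a response body given Accept-Encoding."""
    headers = {"Vary": "Accept-Encoding"}
    # Simplification: a real server would honour q-values, not just tokens.
    offered = [tok.split(";")[0].strip() for tok in accept_encoding.split(",")]
    if "gzip" in offered:
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    headers["Content-Length"] = str(len(body))
    return body, headers


# Illustrative page: a table with many similar rows, like most generated HTML.
row = '<tr><td class="name">item</td><td class="qty">1</td></tr>\n'
html = ("<html><body><table>\n" + row * 500 +
        "</table></body></html>\n").encode("utf-8")

payload, headers = negotiate_body(html, "gzip, deflate")
print(len(html), "bytes before,", len(payload), "bytes after")
```

In practice you would turn this on in the web server's configuration rather than write it yourself; the sketch only shows what happens on the wire.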
It's the same as to why we don't need to program in machine code anymore to make the most 'efficient' program possible... it's okay to use higher level coding tools.
Yes, but disk, RAM and CPU cycles have gotten like a thousand times cheaper each
since people stopped whining about how much better assembly language is compared to C.
Bandwidth and latency have improved, but not that much.
I reckon most of these 'the internet is too slow' articles are just a media beatup. I'm yet to meet anyone who has 'slow' internet.
Really?
I don't have a problem with my link, but many web sites are frustratingly slow.
I mean, take Slashdot's comment system as a mild example. There's usually a 1--2 second delay when you press the
"Submit" button. I don't think that makes people happy.
Compiler warnings shall be at an absolute minimum, utilization of compiler safeguards shall be used whenever possible. Enforcing "Option Explicit" in VB for example.
#!/usr/bin/perl -w
use strict;
(The only big biotech company I have any info on uses Perl heavily.)
This doesn't matter AT ALL since IPv4 systems cannot talk to IPv6 systems, and v.v. They. Are. Completely. Alien. Networks. It just makes it easier to transport IPv4 across IPv6. Without a proxy/translator/etc. IPv6 and IPv4 hosts cannot talk to each other.
This is why IPv6 will take decades to be openly adopted -- if ever. (It's already been a decade, btw.)
What you say must be deliberately misleading (unless you were just trying to point out that
there are few IPv6 hosts to talk with out there.)
Sure, an IPv6 stack cannot talk to an IPv4 stack, but in reality
all IPv6 hosts are also IPv4 hosts, with two stacks.
Most serious software includes IPv6 support and will use either one, or both.
That, and IPv4 is just more convenient because you can actually remember the addresses without writing them down. I can say "Hey, ping 10.10.1.12" and people will do it. Try that with an IPv6 address.
Not a good reason for most people, I think.
I have a real, public IP address here -- have had it for like ten years,
and all I know is it starts with 83. Likewise with my 192.168.0.0/16 network.
I enter the stuff in /etc/hosts, and then I forget about it.
Re:Boost epitomizes everything that is wrong with (on "Boost 1.36 Released") · Score: 1
Which embedded platform are you on that has a C++ compiler? Most of them barely support the C standard.
Don't most of them simply use gcc these days? (And some stripped-down Linux distro as an OS.)
There's something called "Embedded C++" for compiler writers who need an excuse not
to write a working C++ compiler, but I haven't heard of anyone who has used it.
I consider that really sad. For the most part, since about 2004 or 2005 compilers have been perfectly fine for Boost. If most shops are still wondering about this, most shops are using dreadfully old development tools.
Not every environment looks like yours.
Where I live, 2005 is very recent, and compilers which were bleeding edge then
are still not the only ones we use. God knows how old Sun's compiler is, or what kind
of commitment they have to the C++ language...
I'm happy if I can work with a code base which uses the C++ standard library -- and that's
more than ten years old.
I'd like to use Boost some time (particularly regexes and the PRNGs) but
introducing that extra dependency isn't always worth the effort, and it's not always fair
to your coworkers.
This had me confused for a while, since I haven't seen the Tom Cruise movie.
I can recall no such thing from the 1949 short story by Theodore Sturgeon,
or from the 1959 PKD short story.
I don't think you can easily draw or modify a picture using shell programming (and I'm talking about interactive, creative image manipulation here, not about things like "increase the brightness by 50 percent in these 5,000 images").
That example is pretty rare and useless -- how often do you have 5000 images with the exact same brightness problem? For me there is usually a GUI-based step when I do such things
(e.g., find the JPEG images which are incorrectly rotated 90 degrees, and feed them
to jpegtran for lossless correction).
A better non-GUI image manipulation example: pictures that are diagrams, plots, graphical statistics and things like that.
Here, non-GUI interfaces are often superior (pic, grap, gnuplot, graphviz...).
You create a text file which describes the image.
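As a small illustration of "a text file which describes the image", here is a sketch that writes Graphviz DOT text from a list of edges. The function name to_dot and the example edges are made up for the example.

```python
def to_dot(edges, name="deps"):
    """Return Graphviz DOT source for a directed graph given (src, dst) pairs."""
    lines = [f"digraph {name} {{"]
    for src, dst in edges:
        lines.append(f'    "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines) + "\n"


# Hypothetical build dependencies, just to have something to draw.
edges = [("main.o", "prog"), ("util.o", "prog"), ("util.h", "main.o")]
print(to_dot(edges), end="")
```

Save the output as deps.dot and run "dot -Tpng deps.dot -o deps.png" to get the picture; changing the diagram is then a matter of editing text.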
But sure, some kinds of images are better created in a WYSIWYG GUI.
Usenet is searchable now? When I used it, years ago, you had to subscribe to a group, then download thousands of posts, before you could search them. With all the billions of posts on Usenet, if you wanted to search for something it would take months. With forums you just put something into Google and it comes up instantly.
The web is searchable now? When I used it, years ago, you had to download thousands of personal home pages, before you could search them. With all the billions of home pages on the web, if you wanted to search for something it would take months. With Usenet you just put something into Google Groups and it comes up instantly.
Seriously: DejaNews started indexing Usenet in 1994 or so.
After they folded there was, I think, a time period with no indexing before Google Groups was launched.
But so what? Searching the whole bloody Usenet isn't the point.
Social interaction with others, on a specific topic, over a long period of time, is the point of Usenet.
Most of it consists of angry flame threads that go on for years and years.
Last I heard, that was almost exclusively warez and p0rn.
Don't know what the bitrate is for the rest of it, but it must be a tiny fraction of 2TB.
Re:Hmm...Giganews and other services are still the (on "R.I.P Usenet: 1980-2008") · Score: 2, Insightful
Maybe a modern version of NNTP could be built on HTTP/XML?
Sure, but why?
That would break 20--30 years' worth of high-quality Usenet software for (AFAICS) no good reason.
What obviously needs work is the message format itself; character sets and so on.
A new RFC has been in the works at USEFOR for years... dunno if it's done yet, or if anyone will care.
Use a continuous integration server - A CI server will wake up on every checkin and run all the tests. That way you discover problems early, when they're easiest to fix.
I never understood these. What's wrong with typing 'make test' frequently?
But yeah, "don't check in broken stuff into mainline" is good advice.
Work out common standards - Every time you notice a difference in style, have a nice chat about it. Eventually you shouldn't be able to tell who wrote what.
I agree that there shouldn't be some blocks of code indented by 3 and others by 4,
drastically different function naming, and stuff like that.
But I don't think one should have removal of individuality in general as a goal.
Different people will solve a problem in different ways, and as long as the resulting code is
clear and readable, why would there be a problem?
(Of course, I'd expect people to learn from each other, so they write more similar code over time.)
Maybe it's more important to have a common understanding of the goal and the architecture of the code, in some wide sense.
Things like:
Should this code be portable?
Who should sanity-check interfaces -- the caller or the callee?
Should we spend time making the code general, or be satisfied when it works in this application?
Is performance important? And so on.
I'm probably going to be shouted down but in my 30 years of coding, I *rarely* reused code. [---]
I can't even look back on 10 years of coding and say "Oh things would have been so much better if we had shared code". I don't think that is the case. And fwiw, this is teams of 5-20 programmers on significant projects.
A reminder: reuse != building libraries.
Copying a piece of code is also reuse,
even if you have to hack it to make it do what you want.
I find the more code I write, the more I cut & paste.
If it has a well-defined interface, if I use and trust it elsewhere,
especially if it has unit tests --
then I have no qualms about copying it (and cutting away the
parts I have no immediate need for).
But building those general libraries which I assume the purists equate with code reuse
-- that's hard.
Especially if management thinks you can assign a group to write such libraries,
but never write code which uses them.
In a sane world, I would be able to simply walk over to my bank (or a notarius publicus or whatever), and have them sign my GnuPG key. That would be a start, at least.
Suppose you lose your private key to a malicious party. How are you going to organize its proper revocation?
Why, by revoking it of course, and uploading it to the key servers.
Possibly by using a paper printout of my pre-generated revocation certificate.
It's my understanding that the technical aspects of this are solved;
are you telling me it isn't true?
Who is responsible for making sure this information gets propagated to all interested parties? Keyservers? The notarius? You yourself (contacting all your friends, your bank(s), the government etc to tell them to disregard that key)?
Me of course, via the keyservers.
I suppose it would be nice if I could ask the government to do it for me,
in case I died or lost my mind, but that's optional.
The problem with PKI is that it requires a complex infrastructure in order to really be resilient against all possible ways in which things can go wrong.
You might know more about this than I do, but you don't indicate what.
As far as I know, PKI is this infrastructure, and it exists today.
What is really complex is understanding all this as a user...
but what are the alternatives?
As far as I can tell, it's pretending From: headers in mails are a signature,
pretending that DNS cannot be spoofed...
That's really simple, but not useful.
On that same note, seems to me like a bank would be a reasonable certificate authority as well, they have to do a somewhat reasonable job of verifying your identity.
Banks do that in Sweden -- the system is called "E-legitimation".
Of course, it's proprietary, Windows-specific, needs *spit* *shudder* ActiveX... and so on.
At least with my bank's implementation.
And (this is the most stupid part) as far as I can tell, only a few big organizations can verify your signatures.
It's for identifying yourself to the Man, not to your friends and family.
In a sane world, I would be able to simply walk over to my bank
(or a notarius publicus or whatever), and have them sign my GnuPG key.
That would be a start, at least.
Boot time is spent in 1 of 4 main areas: 1) BIOS, 2) bootloader, 3) kernel, 4) user space init.
[...]In the x86 space, with legacy hardware, I think the thing that will give you the most problem is BIOS.
Right. I just measured this on my PC with Debian Etch:
BIOS, probing for idiotic things forever: 37s
grub boot loader, including a 5s press-space timeout: 9s
optimized kernel plus starting plenty of servers and going to runlevel 2
(text-mode login prompt): 14s
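Adding those up, as a quick sketch (the numbers are the ones measured above; the labels are mine):

```python
# Seconds per boot phase, from the measurements above.
phases = {
    "BIOS": 37,
    "grub (incl. 5s timeout)": 9,
    "kernel + init to runlevel 2": 14,
}

total = sum(phases.values())  # 60
shares = {name: secs / total for name, secs in phases.items()}

# Even if the kernel and init phase took no time at all, this much remains:
floor_without_userspace = total - phases["kernel + init to runlevel 2"]

for name, secs in phases.items():
    print(f"{name:30s} {secs:3d}s  {shares[name]:.0%}")
print(f"{'total':30s} {total:3d}s")
```

So the BIOS alone accounts for well over half of the wait on this machine.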
It's not hard to get those 14s down to something insignificant.
Who would mind a 5s delay here, after waiting 30s for the BIOS?
I don't think one has to hack the whole init sequence into pieces: begin by not starting
a lot of servers, check the contents of /etc/rc?.d, and measure the results.
Even when I don't need it? You don't make sense.
Might be true for 3G, but surely ADSL latency is much, much lower than for 56k dialup (or whatever standard is used nowadays).
Why not just feed it into inet_pton(3) (which surely is available, under some name, in any serious language, including Python) and check for errors?
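In Python that looks roughly like the sketch below. The helper name ip_family is made up; socket.inet_pton is the real call, and it raises OSError when the string is not a valid address for the given family.

```python
import socket


def ip_family(text):
    """Return "IPv4", "IPv6", or None if `text` is not a valid address."""
    for family, label in ((socket.AF_INET, "IPv4"), (socket.AF_INET6, "IPv6")):
        try:
            socket.inet_pton(family, text)
        except (OSError, ValueError):
            # OSError: not a valid address; ValueError: e.g. embedded NUL.
            continue
        return label
    return None


for candidate in ("10.10.1.12", "2001:db8::1", "300.1.1.1", "not an address"):
    print(candidate, "->", ip_family(candidate))
```

No regular expressions needed, and you get the same answer the rest of the network stack would give.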
That's true, though.