Don't Overlook Efficient C/C++ Cmd Line Processing
An anonymous reader writes "Command-line processing is historically one of the most ignored areas in software development. Just about any relatively complicated software has dozens of available command-line options. The GNU tool gperf is a "perfect" hash function generator that, for a given set of user-provided strings, generates C/C++ code for a hash table, a hash function, and a lookup function. This article provides a reference for a good discussion on how to use gperf for effective command-line processing in your C/C++ code."
Oh, gee, well, nobody except:
1) Every linux kernel developer
2) Every *BSD kernel developer
3) John Carmack, for the core of every ID engine up to and possibly beyond Doom3
4) You, whenever you compile C++ code, as it is compiled to C before machine code (unless you are using an exotic compiler such as the Compaq AXP C++ compiler for TRU64).
http://www.tiobe.com/tpci.htm/
You, whenever you compile C++ code, as it is compiled to C before machine code (unless you are using an exotic compiler such as the Compaq AXP C++ compiler for TRU64).
Excuse me???? That was not even true anymore when I started using C++, back in 1992. There are features in the C++ standard that are so extremely difficult to implement correctly in standard-compliant C that it's a complete waste of effort trying to pass via C while compiling. Exception handling comes to mind as the prime example. A failed attempt to support exceptions was the reason why Cfront 4.0 was abandoned. Note that 3.0 was released as early as 1991. The last Cfront-based compiler I had the horror of using was HP's CC. It was superseded by the new native aCC by 1994 at the latest.
By the way, I used to write C/C++ compilation/optimisation stuff for a living, so I guess I know something about the topic.... :-)
Linux user since early January 1992.
Hardly the case. Most of the win32 shit I've used accepts command lines. It's much simpler, and a more powerful debugging tool, than forcing a config file change for every attempt.
Are you seriously trying to argue that gperf is more portable than getopt?
"I develop for a battery-powered computer with 384 KiB of RAM. In such an environment, what you appear to sarcastically call a "mere couple hundred kilobytes" is a bigger deal than it is on a personal computer manufactured in 2007."
I fail to see how this is a strong argument in this discussion. How many of these embedded tools you write actually _do_ command line processing? If they do, why don't you invest in more (both memory- and time-) efficient ways to do IPC than the command line?
The following two directories should bring it up to the latest version I know of.
This is not efficient, mind you. Command line parsing doesn't generally need to be efficient, even by my miserly standards, honed when a PDP-11 was something you hoped to upgrade to... some day...
ftp://ftp.uu.net/usenet/comp.sources.misc/volume2
ftp://ftp.uu.net/usenet/comp.sources.misc/volume3
http://www.cmcrossroads.com/bradapp/ftp/src/libs/
Try supergetopt instead. Much easier to use and also open source.
http://www.ibiblio.org/pub/Linux/devel/sugerget-1.1.tgz
With this code, you simply specify command-line strings and variables in a printf() style format.
E.g. supergetopt( argc, argv,
"string1", "%d %d", function1,
"string2", "%s", function2 )
will call function1( int a, int b ) when string1 is on the command line,
and will call function2( char *s ) when string2 is used on the command line.
A whole lot easier than gperf, IMHO.
The problem is that people set their tab breaks at all sorts of places (eg: every 4 characters), and then use tabs to space things in the middle of lines, or they'll mix tabs and spaces at the beginnings of lines. When somebody with different settings opens the same file, the indentation looks really screwed. That happens even after you've gotten everybody to agree on a common number of columns for indentation.
I only know of two solutions:
I didn't have the energy to do the first, so I use the second solution.
If you're developing on your own it's not an issue, but I don't like to have one coding style here and another there - it's not just confusing, but it takes a while to change my editor settings every time I open code for somebody else. I use spaces and that's that. At least my editors are clever enough to know that Makefiles still need tabs!
-- Steve
I challenge: cite as an example any fixed set of strings (such as would be applicable for perfect hashing) for which a realistic perfect hashing scheme of any sort outperforms a statically-sized conventional chaining table using a trivial 33/37-style string hash. I don't think you can. Gperf languishes in obscurity for a reason.
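For anyone who wants to take up the challenge, here is a minimal sketch of the "conventional" side: a statically-sized chaining table over a fixed set of strings, using a trivial multiply-by-33 hash. The option names, the ids, the bucket count, and the names hash33, opt_table_init and opt_lookup are all made up for illustration; none of this comes from gperf or from any particular program.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NBUCKETS 16  /* fixed table size, picked arbitrarily for this sketch */

/* One entry per known option; `next` chains entries that share a bucket. */
struct opt {
    const char *name;
    int id;
    struct opt *next;
};

/* Hypothetical option set, for illustration only. */
static struct opt opts[] = {
    { "help",    1, NULL },
    { "version", 2, NULL },
    { "verbose", 3, NULL },
    { "output",  4, NULL },
};

static struct opt *buckets[NBUCKETS];

/* Trivial multiply-by-33 string hash. */
static unsigned long hash33(const char *s)
{
    unsigned long h = 5381;
    while (*s != '\0')
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Link every known option into its bucket; call once at startup. */
static void opt_table_init(void)
{
    size_t i;
    for (i = 0; i < NBUCKETS; i++)
        buckets[i] = NULL;
    for (i = 0; i < sizeof opts / sizeof opts[0]; i++) {
        unsigned long b;
        assert(opts[i].name != NULL);
        b = hash33(opts[i].name) % NBUCKETS;
        opts[i].next = buckets[b];   /* push onto the front of the chain */
        buckets[b] = &opts[i];
    }
}

/* Return the option id, or 0 if the name is not a known option. */
static int opt_lookup(const char *name)
{
    const struct opt *o;
    for (o = buckets[hash33(name) % NBUCKETS]; o != NULL; o = o->next)
        if (strcmp(o->name, name) == 0)
            return o->id;
    return 0;
}
```

Call opt_table_init() once, then opt_lookup(argv[i]) for each argument. A perfect hash would have to beat one hash computation plus, typically, a single strcmp() on a short chain.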
> Generally speaking hashes are very cpu and cache-inefficient beasts
Um... why do you think hashes are inefficient? In a lot of languages (Perl, Python, Javascript, etc) the standard collection is the hash. In Javascript, even a simple array is a hash! Why do you think it is inefficient?
My thinking is that it is both CPU and cache efficient: it is CPU efficient because it usually just needs one round of computation to get you to the correct result (as compared to a tree, where you need one round per tree level). It is cache efficient because you are usually not led to somewhere irrelevant to your search (in contrast, any intermediate node you visit when searching for an item in a binary tree will pollute your cache). Yes, in a hash you have the hash table entries themselves, which will pollute the cache, but that's not as much, exactly because of what you talk about: (spatial) locality of reference. In a hash all entries are in nearby memory, so it is likely that many searches in the same hash table will end up using very few cache lines. In contrast, in a search tree or a list, different nodes are allocated at different times and are much more likely to use completely different cache lines. At least this should be true until the time you overload it, but then you have extensible hashes.
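To make the locality point concrete, here is a rough sketch of one hash layout that keeps every entry in a single contiguous array: open addressing with linear probing. This is only one possible design, not a claim about how Perl, Python or Javascript implement their hashes. The names table_put and table_get, the slot count, and the "-1 means absent" convention are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SLOTS 8  /* power of two; must exceed the number of keys stored */

struct slot {
    const char *key;   /* NULL marks an empty slot */
    int value;         /* assumed non-negative, since -1 means "absent" */
};

/* Every slot lives in one array, so successive probes touch adjacent memory. */
static struct slot table[SLOTS];

/* Trivial multiply-by-33 string hash. */
static unsigned long hash33(const char *s)
{
    unsigned long h = 5381;
    while (*s != '\0')
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Insert or update a key. Returns 0 on success, -1 if the table is full. */
static int table_put(const char *key, int value)
{
    unsigned long i = hash33(key) & (SLOTS - 1);
    size_t n;
    for (n = 0; n < SLOTS; n++) {
        if (table[i].key == NULL || strcmp(table[i].key, key) == 0) {
            table[i].key = key;
            table[i].value = value;
            return 0;
        }
        i = (i + 1) & (SLOTS - 1);   /* linear probe: try the next slot */
    }
    return -1;
}

/* Look up a key. Returns its value, or -1 if absent. If probes is not
   NULL, it receives the number of occupied slots that were compared. */
static int table_get(const char *key, size_t *probes)
{
    unsigned long i = hash33(key) & (SLOTS - 1);
    size_t n = 0;
    int found = -1;
    while (n < SLOTS && table[i].key != NULL) {
        n++;
        if (strcmp(table[i].key, key) == 0) {
            found = table[i].value;
            break;
        }
        i = (i + 1) & (SLOTS - 1);
    }
    if (probes != NULL)
        *probes = n;
    return found;
}
```

With a lightly loaded table the probe count usually stays at one or two, and any extra probes land in neighbouring slots rather than in separately allocated nodes, which is the contrast with a pointer-based tree drawn above.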