Slashdot Mirror


Don't Overlook Efficient C/C++ Cmd Line Processing

An anonymous reader writes "Command-line processing is historically one of the most ignored areas in software development. Just about any relatively complicated software has dozens of available command-line options. The GNU tool gperf is a "perfect" hash function generator: for a given set of user-provided strings, it generates C/C++ code for a hash table, a hash function, and a lookup function. This article provides a good discussion of how to use gperf for effective command-line processing in your C/C++ code."
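To make the idea concrete, here is a minimal hand-built sketch of the kind of lookup a perfect hash allows. It is not gperf output. The option names, the table size of 10, and the names `opt_hash` and `opt_lookup` are all assumptions picked for this example; gperf derives its own hash function and table from whatever keyword list it is given.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical option set: help, output, quiet, version, verbose.
 * For exactly these five strings, (length + first char + last char) % 10
 * gives five distinct slots, so a lookup needs at most one strcmp(). */
enum { OPT_TABLE_SIZE = 10 };

static const char *const opt_table[OPT_TABLE_SIZE] = {
    [0] = "help",
    [3] = "output",
    [4] = "quiet",
    [5] = "version",
    [6] = "verbose",
};

/* Hash chosen by hand for the fixed set above; len must be non-zero. */
static unsigned opt_hash(const char *s, size_t len)
{
    return (unsigned)(len + (unsigned char)s[0] + (unsigned char)s[len - 1])
           % OPT_TABLE_SIZE;
}

/* Returns the matching table entry, or NULL if s is not a known option. */
const char *opt_lookup(const char *s)
{
    size_t len = strlen(s);
    if (len == 0)
        return NULL;

    const char *candidate = opt_table[opt_hash(s, len)];
    /* Unknown strings can land on an occupied slot, so confirm the match. */
    if (candidate != NULL && strcmp(candidate, s) == 0)
        return candidate;
    return NULL;
}
```

A caller would strip the leading dashes from an argument and pass the rest to `opt_lookup`; a NULL result means the option is not recognised.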

11 of 219 comments

  1. Re:C++ I get by hxnwix · · Score: 1, Interesting

    Oh, gee, well, nobody except:

    1) Every Linux kernel developer
    2) Every *BSD kernel developer
    3) John Carmack, for the core of every id engine up to and possibly beyond Doom 3
    4) You, whenever you compile C++ code, as it is compiled to C before machine code (unless you are using an exotic compiler such as the Compaq AXP C++ compiler for TRU64).

  2. Re:C++ I get by Anonymous Coward · · Score: 1, Interesting
  3. Re:C++ I get by mce · · Score: 4, Interesting

    You, whenever you compile C++ code, as it is compiled to C before machine code (unless you are using an exotic compiler such as the Compaq AXP C++ compiler for TRU64).

    Excuse me???? That was not even true anymore when I started using C++, back in 1992. There are features in the C++ standard that are so extremely difficult to implement correctly in standard-compliant C that it's a complete waste of effort trying to pass via C while compiling. Exception handling comes to mind as the prime example. A failed attempt to support exceptions was the reason why Cfront 4.0 was abandoned. Note that 3.0 was released as early as 1991. The last Cfront-based compiler I had the horror of using was HP's CC. It was superseded by the new native aCC by 1994 at the latest.

    By the way, I used to write C/C++ compilation/optimisation stuff for a living, so I guess I know something about the topic.... :-)

  4. Re:Correction... by Anonymous Coward · · Score: 1, Interesting

    Hardly the case. Most of the win32 shit I've used accepts command lines. It's much simpler, and a more powerful debugging tool, than forcing a config file change for every attempt.

  5. Re:It is if the linker complains about not finding by tqbf · · Score: 2, Interesting

    Are you seriously trying to argue that gperf is more portable than getopt?

  6. Re:All the world is not a PC by Anonymous Coward · · Score: 1, Interesting

    "I develop for a battery-powered computer with 384 KiB of RAM. In such an environment, what you appear to sarcastically call a "mere couple hundred kilobytes" is a bigger deal than it is on a personal computer manufactured in 2007."

    I fail to see how this is a strong argument in this discussion. How many of these embedded tools you write actually _do_ command line processing? If they do, why don't you invest in more (both memory- and time-) efficient ways to do IPC than the command line?

  7. Another approach - parseargs by argent · · Score: 2, Interesting

    Something Eric Allman wrote many moons ago. I found it and modified it to support "native" command line syntax on MS-DOS, VMS, and AmigaDOS, and added some support for improved self-documentation... and then Brad Appleton saw it and rapidly enhanced it to support a plethora of shells and interfaces until it took up 10 posts in comp.sources.misc.

    The following two directories should bring it up to the latest version I know of.

    This is not efficient, mind you. Command line parsing doesn't generally need to be efficient, even by my miserly standards, honed when a PDP-11 was something you hoped to upgrade to... some day...

    ftp://ftp.uu.net/usenet/comp.sources.misc/volume29/parseargs/
    ftp://ftp.uu.net/usenet/comp.sources.misc/volume30/parseargs/

    PARSEARGS
     
                            extracted from Eric Allman's
     
                                NIFTY UTILITY LIBRARY
     
                              Created by Eric P. Allman
                                <eric@Berkeley.EDU>
     
                            Modified by Peter da Silva
                                <peter@Ferranti.COM>
     
                      Modified and Rewritten by Brad Appleton
                              <brad@SSD.CSD.Harris.COM>

    Brad's latest work in this area seems to be here:

    http://www.cmcrossroads.com/bradapp/ftp/src/libs/C++/CmdLine.html

    http://www.cmcrossroads.com/bradapp/ftp/src/libs/C++/Options.html

  8. This tool is much easier by stupendou · · Score: 3, Interesting

    Try supergetopt instead. Much easier to use and also open source.
    http://www.ibiblio.org/pub/Linux/devel/sugerget-1.1.tgz

    With this code, you simply specify command-line strings and variables in a printf()
    style format.

    E.g. supergetopt( argc, argv,
                      "string1", "%d %d", function1,
                      "string2", "%s", function2 )

    will call function1( int a, int b ) when string1 is on the command line,
    and will call function2( char *s ) when string2 is used on the command line.

    A whole lot easier than gperf, IMHO.
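For readers who want to see how a printf()-style option table can drive callbacks, here is a rough C sketch of the general technique. It is not supergetopt's code or API: the names `opt_spec`, `opt_value`, `parse_options`, the example handlers, and the `-size`/`-name` options are all hypothetical, and only `%d` and `%s` are handled.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* One parsed argument: an integer for %d, a string for %s. */
typedef union { long i; const char *s; } opt_value;
typedef void (*opt_handler)(const opt_value *vals, int nvals);

typedef struct {
    const char *name;     /* option string to match, e.g. "-size" */
    const char *fmt;      /* printf-like list of arguments, e.g. "%d %d" */
    opt_handler handler;  /* called once the arguments are converted */
} opt_spec;

enum { MAX_OPT_ARGS = 4 };

/* Walks argv, converts each option's arguments according to its format,
 * and calls its handler. Returns the number of options dispatched, or -1
 * on an unknown option, a missing argument, or a malformed number. */
int parse_options(int argc, char **argv, const opt_spec *specs, size_t nspecs)
{
    int dispatched = 0;
    int i = 1;

    while (i < argc) {
        const opt_spec *spec = NULL;
        for (size_t k = 0; k < nspecs; k++) {
            if (strcmp(argv[i], specs[k].name) == 0) {
                spec = &specs[k];
                break;
            }
        }
        if (spec == NULL)
            return -1;
        i++;

        opt_value vals[MAX_OPT_ARGS];
        int nvals = 0;
        for (const char *f = spec->fmt; *f != '\0'; f++) {
            if (*f != '%')
                continue;               /* skip spaces between conversions */
            f++;
            if (i >= argc || nvals >= MAX_OPT_ARGS)
                return -1;
            if (*f == 'd') {
                char *end;
                vals[nvals].i = strtol(argv[i], &end, 10);
                if (end == argv[i] || *end != '\0')
                    return -1;
            } else if (*f == 's') {
                vals[nvals].s = argv[i];
            } else {
                return -1;              /* unsupported or truncated format */
            }
            nvals++;
            i++;
        }
        spec->handler(vals, nvals);
        dispatched++;
    }
    return dispatched;
}

/* Example handlers (hypothetical): they just record their arguments. */
static long last_width, last_height;
static const char *last_name;

static void on_size(const opt_value *v, int n) { (void)n; last_width = v[0].i; last_height = v[1].i; }
static void on_name(const opt_value *v, int n) { (void)n; last_name = v[0].s; }

static const opt_spec example_specs[] = {
    { "-size", "%d %d", on_size },
    { "-name", "%s",    on_name },
};
```

Calling `parse_options(argc, argv, example_specs, 2)` from `main` would then invoke `on_size` for `-size 640 480` and `on_name` for `-name demo`. Unlike the interface described above, this sketch passes converted values in an array rather than as separate function parameters, which keeps it within portable C.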

  9. Re:Speed in options parsing? by sbryant · · Score: 2, Interesting

    What is the problem with tabs?

    The problem is that people set their tab breaks at all sorts of places (eg: every 4 characters), and then use tabs to space things in the middle of lines, or they'll mix tabs and spaces at the beginnings of lines. When somebody with different settings opens the same file, the indentation looks really screwed. That happens even after you've gotten everybody to agree on a common number of columns for indentation.

    I only know of two solutions:

    1. Make all software, everywhere, ever, use tab stops every eight characters and never anything else.
    2. Use spaces.

    I didn't have the energy to do the first, so I use the second solution.

    If you're developing on your own it's not an issue, but I don't like to have one coding style here and another there - it's not just confusing, but it takes a while to change my editor settings every time I open code for somebody else. I use spaces and that's that. At least my editors are clever enough to know that Makefiles still need tabs!

    -- Steve

  10. Re:Wrong in so many ways by tqbf · · Score: 2, Interesting

    I challenge: cite as an example any fixed set of strings (such as would be applicable for perfect hashing) for which a realistic perfect hashing scheme of any sort outperforms a statically-sized conventional chaining table using a trivial 33/37-style string hash. I don't think you can. Gperf languishes in obscurity for a reason.
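For reference, the "conventional" alternative named in that challenge might look roughly like the C sketch below: a fixed number of buckets, separate chaining, and a multiply-by-33 string hash. The bucket count, the zero seed, the keyword list, and the names `hash33`, `kw_init`, and `kw_find` are assumptions made for illustration, and the sketch makes no performance claim either way.

```c
#include <stddef.h>
#include <string.h>

enum { NBUCKETS = 16 };

typedef struct kw_node {
    const char *key;
    int id;
    struct kw_node *next;   /* next entry in the same bucket */
} kw_node;

static kw_node *buckets[NBUCKETS];

/* Hypothetical fixed keyword set, stored statically so nothing is allocated. */
static kw_node kw_nodes[] = {
    { "help",    0, NULL },
    { "version", 1, NULL },
    { "verbose", 2, NULL },
    { "output",  3, NULL },
    { "quiet",   4, NULL },
};

/* Trivial times-33 string hash (seed of 0 is an arbitrary choice here). */
static unsigned long hash33(const char *s)
{
    unsigned long h = 0;
    while (*s != '\0')
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Clears the table and links every keyword into its bucket's chain. */
void kw_init(void)
{
    memset(buckets, 0, sizeof buckets);
    for (size_t i = 0; i < sizeof kw_nodes / sizeof kw_nodes[0]; i++) {
        unsigned long b = hash33(kw_nodes[i].key) % NBUCKETS;
        kw_nodes[i].next = buckets[b];
        buckets[b] = &kw_nodes[i];
    }
}

/* Walks one chain; returns the node for key, or NULL if it is absent. */
const kw_node *kw_find(const char *key)
{
    const kw_node *n = buckets[hash33(key) % NBUCKETS];
    for (; n != NULL; n = n->next) {
        if (strcmp(n->key, key) == 0)
            return n;
    }
    return NULL;
}
```

After one call to `kw_init()`, `kw_find("quiet")` returns the node for that keyword and `kw_find("nope")` returns NULL. Settling the challenge itself would take measurements on a real keyword set.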

  11. Re:Silly by IkeTo · · Score: 2, Interesting

    > Generally speaking hashes are very cpu and cache-inefficient beasts

    Um... why do you think hashes are inefficient? In a lot of languages (Perl, Python, Javascript, etc) the standard collection is the hash. In Javascript, even a simple array is a hash! Why do you think it is inefficient?

    My thinking is that it is both CPU and cache efficient: it is CPU efficient because it usually just needs one round of computation to get you to the correct result (as compared to a tree, where you need one round per tree level). It is cache efficient because you are usually not led to somewhere irrelevant to your search (in contrast, any intermediate node visited when searching for an item in a binary tree will pollute your cache). Yes, in a hash you have the hash table entries themselves which will pollute the cache, but that's not as much, exactly because of what you talk about: (spatial) locality of reference. In a hash all entries are in nearby memory, so it is likely that many searches in the same hash table will end up using very few cache lines. In contrast, in a search tree or a list, different nodes are allocated at different times and are much more likely to use completely different cache lines. At least this should be true until the time you overload it, but then you have extensible hashes.
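The locality argument is easiest to picture with a table whose entries all live in one contiguous array. Below is a minimal C sketch of such a table using linear probing. The slot count, key length limit, hash, and the names `flat_table`, `flat_put`, and `flat_get` are assumptions for illustration; the sketch shows the memory layout and does not measure cache behaviour.

```c
#include <stddef.h>
#include <string.h>

enum { SLOTS = 16, KEY_MAX = 15 };   /* SLOTS must be a power of two */

/* Keys are stored inline, so a probe touches only the table's own memory. */
typedef struct {
    char key[KEY_MAX + 1];
    int  value;
    int  used;
} slot;

typedef struct {
    slot   slots[SLOTS];   /* one contiguous block, no per-entry allocation */
    size_t count;
} flat_table;

static size_t slot_index(const char *key)
{
    unsigned long h = 0;
    while (*key != '\0')
        h = h * 33 + (unsigned char)*key++;
    return (size_t)(h & (SLOTS - 1));
}

/* Inserts or updates key. Returns 0 on success, -1 if the key is too long
 * or the table is full. */
int flat_put(flat_table *t, const char *key, int value)
{
    if (strlen(key) > KEY_MAX)
        return -1;
    size_t i = slot_index(key);
    for (size_t probes = 0; probes < SLOTS; probes++, i = (i + 1) & (SLOTS - 1)) {
        slot *s = &t->slots[i];
        if (!s->used) {
            strcpy(s->key, key);
            s->value = value;
            s->used = 1;
            t->count++;
            return 0;
        }
        if (strcmp(s->key, key) == 0) {
            s->value = value;
            return 0;
        }
    }
    return -1;
}

/* Returns a pointer to the stored value, or NULL if key is absent.
 * Probing moves to the adjacent slot, which is the next element in memory. */
const int *flat_get(const flat_table *t, const char *key)
{
    size_t i = slot_index(key);
    for (size_t probes = 0; probes < SLOTS; probes++, i = (i + 1) & (SLOTS - 1)) {
        const slot *s = &t->slots[i];
        if (!s->used)
            return NULL;
        if (strcmp(s->key, key) == 0)
            return &s->value;
    }
    return NULL;
}
```

A zero-initialised `flat_table` is an empty table. A tree or linked list holding the same entries would typically allocate each node separately, which is the contrast the comment draws.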