Slashdot Mirror


Searchable C/C++ DB surpasses 275 million lines

Sembiance writes "I've been working on a C/C++ source code search database for the past year. It has recently surpassed 275 million lines of searchable open source C/C++ code. The search engine is C/C++ syntax-aware, so you can search for specific elements such as functions, macros, classes, comments, etc. The site is built upon many open source products, including MySQL and Lucene for the database, CodeWorker to parse the code, PHP and Apache for the website, and GeSHi for syntax highlighting. I'm currently looking for suggestions on what sort of 'interesting statistics' I could create from 275+ million lines of open source C/C++ code."

8 of 328 comments

  1. Some statistics to get you started by Anonymous Coward · · Score: 5, Funny
    I'm currently looking for suggestions on what sort of 'interesting statistics' I could create from 275+ million lines of open source C/C++ code.


    The following "interesting statistics" come to mind:

    • Percentage of functions named "deepThroat" (0%)
    • Ratio of comments mentioning a "girlfriend" (11) or "wife" (29) to those mentioning "Natalie Portman" (41)
    • How many variables named "penis" are of type "long" versus type "short" (unknowable!)


    You gotta get the variables searchable. Most critical for that last statistic. Also, I'm too lazy to learn Lucene Query Parser Syntax, so the statistics for "Natalie Portman" may include references to "portman."
  2. useful statistic by kunzy · · Score: 5, Funny

    the time from the frontpage article on /. to the death of your server?

    1. Re:useful statistic by Sembiance · · Score: 5, Funny

      Well, it's been about 2 minutes on slashdot... my site is already dead. So uhm... 2 minutes?

  3. Similarity checking by roguerez · · Score: 5, Funny

    Find similarities with stuff like SCO.

  4. ratio by FreeBSDbigot · · Score: 5, Funny

    ... of "foo" to "bar."

    --
    Orange whip? Orange whip? Three orange whips.
  5. Suggestion by lbmouse · · Score: 5, Funny

    "I'm currently looking for suggestions..."

    How about a new server?

  6. Re:My vote is for... by mebollocks · · Score: 5, Funny

    I dunno, maybe you could find the algorithm on the net somewhere? ...if only there was some kinda searchable code database of some sort...

  7. Re:histogram of C reserved words by plabtfall · · Score: 5, Funny

    Yeah, me too:

        2431 int
        1802 goto
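
For anyone who wants to try a histogram like this on their own source tree, here is a minimal sketch. It is not how the site does it (the submission says CodeWorker parses the code there); it is a rough regex approach that assumes stripping comments and string literals is good enough and ignores the preprocessor entirely. The names `C_KEYWORDS` and `keyword_histogram` are made up for illustration.

```python
import re
from collections import Counter

# The 32 reserved words of C89; extend the set for C99 or C++ as needed.
C_KEYWORDS = {
    "auto", "break", "case", "char", "const", "continue", "default", "do",
    "double", "else", "enum", "extern", "float", "for", "goto", "if",
    "int", "long", "register", "return", "short", "signed", "sizeof",
    "static", "struct", "switch", "typedef", "union", "unsigned", "void",
    "volatile", "while",
}

# Block comments, line comments, string literals, and character literals.
# This is an approximation: it does not understand macros or trigraphs.
_STRIP = re.compile(
    r'/\*.*?\*/|//[^\n]*|"(?:\\.|[^"\\])*"|\'(?:\\.|[^\'\\])*\'',
    re.DOTALL,
)
_WORD = re.compile(r"[A-Za-z_]\w*")


def keyword_histogram(source):
    """Count reserved words in C source text, skipping comments and literals."""
    code = _STRIP.sub(" ", source)
    return Counter(w for w in _WORD.findall(code) if w in C_KEYWORDS)


if __name__ == "__main__":
    sample = "int main(void) { int x = 0; /* goto */ if (x) goto done; done: return x; }"
    # Print in the same "count word" layout as the histogram above.
    for word, count in keyword_histogram(sample).most_common():
        print(f"{count:8d} {word}")
```

To run it over real files, read each one and add the resulting counters together; `Counter` objects support `+` for that.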