Slashdot Mirror


Searchable C/C++ DB surpasses 275 million lines

Sembiance writes "I've been working on a C/C++ source code search database for the past year. It has recently surpassed 275 million lines of searchable open source C/C++ code. The search engine is C/C++ syntax aware so you can search for specific elements such as functions, macros, classes, comments, etc. The site is built upon many open source products including: MySQL and Lucene for the database, CodeWorker to parse the code, PHP and Apache for the website and GeSHi for syntax highlighting. I'm currently looking for suggestions on what sort of 'interesting statistics' I could create from 275+ million lines of open source C/C++ code."

26 of 328 comments

  1. Some statistics to get you started by Anonymous Coward · · Score: 5, Funny
    I'm currently looking for suggestions on what sort of 'interesting statistics' I could create from 275+ million lines of open source C/C++ code.


    The following "interesting statistics" come to mind:

    • Percentage of functions named "deepThroat" (0%)
    • Ratio of comments mentioning a "girlfriend" (11) or "wife" (29) to comments mentioning "Natalie Portman" (41)
    • How many variables named "penis" are of type "long" versus type "short" (unknowable!)


    You gotta get the variables searchable. Most critical for that last statistic. Also, I'm too lazy to learn Lucene Query Parser Syntax, so the statistics for "Natalie Portman" may include references to "portman."
  2. useful statistic by kunzy · · Score: 5, Funny

    the time from the frontpage article on /. to the death of your server?

    1. Re:useful statistic by Sembiance · · Score: 5, Funny

      Well, it's been about 2 minutes on slashdot... my site is already dead. So uhm... 2 minutes?

    2. Re:useful statistic by Baricom · · Score: 4, Funny

      So uhm... 2 minutes?

      Sounds like you should have written it in C++ instead of a laggard language like PHP ;).

  3. Similarity checking by roguerez · · Score: 5, Funny

    Find similarities with stuff like SCO.

  4. SCO by cmburns69 · · Score: 2, Funny

    With all that code indexed, maybe we'll finally be able to figure out what the heck SCO's talking about.

    But then again, probably not...

    --
    Online Starcraft RPG? At
    Dietary fiber is like asynchronous IO-- Non-blocking!
  5. ratio by FreeBSDbigot · · Score: 5, Funny

    ... of "foo" to "bar."

    --
    Orange whip? Orange whip? Three orange whips.
    1. Re:ratio by ahem · · Score: 4, Funny

      From Google:

      Search -- foo -> Results 1 - 10 of about 26,600,000 for foo. (0.06 seconds)
      Search -- bar -> Results 1 - 10 of about 385,000,000 for bar [definition]. (0.16 seconds)
      Search -- foo bar -> Results 1 - 10 of about 7,900,000 for foo bar. (0.12 seconds)

      'bar' wins. This intuitively makes sense, as who would want to go to the 'foo' for a drink, or eat an 'energy foo'? Could you imagine a lawyer being 'dis-fooed'?

      --
      Not A Sig
  6. 275+ million lines by four2five · · Score: 1, Funny

    How about the % of them that would work on a lady in a bar? line 53256 "Hey pretty lady, are you an astronaut because your ass looks out of this world" ....oh....not those kinds of lines....*sigh* and I thought I was so close

    --
    -or so you'd think
    1. Re:275+ million lines by gstoddart · · Score: 2, Funny
      How about the % of them that would work on a lady in a bar? line 53256 "Hey pretty lady, are you an astronaut because your ass looks out of this world" ....oh....not those kinds of lines....*sigh* and I thought I was so close

      No, no, no.

      You do not use lines 1..N on the same lady until it works. It's not like breaking encryption -- you don't get to try all the possible keys.

      I have friends who have done this, and they swear it's a percentage game. Choose one line you like, and try it on women 1..N until it does work, or you get tired of getting told to sod off. Apparently, with the right combination of variables, any line can be verified to work under some circumstances.

      Truthfully, I don't know how anyone can set out with the knowledge they're going to get told to drop dead 70-100 times/night, but I guess if you can live with that kind of failure rate on an ongoing basis, you'll eventually get the success rate you wanted.

      Now go forth young geek, and attempt to multiply. ;-)
      --
      Lost at C:>. Found at C.
  7. Suggestion by lbmouse · · Score: 5, Funny

    "I'm currently looking for suggestions..."

    How about a new server?

  8. Re:And then... by guaigean · · Score: 2, Funny

    My apologies then. As a regular Slashdotter it is forbidden for me to RTFA.

    --
    Microsoft Sucks, F/OSS Rocks. I get mod points now right?
  9. Re:My vote is for... by Anonymous Coward · · Score: 2, Funny

    Probably about as many lines consist of: {

  10. interesting stat by bsdluvr · · Score: 3, Funny

    1) randomly select 2000 lines of code
    2) compile
    3) execute
    4) ???????
    5) PROFIT!

  11. Woman by chris_mahan · · Score: 2, Funny

    I'd like to know whether the word "woman" appears anywhere, and if so, in what projects.

    Eh.

    --

    "Piter, too, is dead."

  12. Re:Hit Refresh by sglane81 · · Score: 2, Funny

    Actually, if you click refresh on a page from a link, it will resend the referrer as well. Most browsers do this. One more thing, you spelled HTTP_REFERRER correctly, which is wrong :) It's spelled HTTP_REFERER, only has one R. Reverse grammar nazi FTW?
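To illustrate the spelling point above, here is a minimal Python sketch, assuming a CGI/WSGI-style environment dictionary. The helper name `get_referer` and the example URL are hypothetical, chosen only for illustration. The HTTP header is spelled "Referer", so the value arrives under the key `HTTP_REFERER`, and a lookup using the correctly spelled "REFERRER" finds nothing.

```python
def get_referer(environ):
    """Return the referring URL from a CGI/WSGI-style environ dict, or None."""
    # The HTTP header is spelled "Referer" (a historical misspelling),
    # so CGI and WSGI expose it under the key HTTP_REFERER.
    return environ.get("HTTP_REFERER")


# Hypothetical request environment, for illustration only.
environ = {"HTTP_REFERER": "http://example.com/story"}

# Found, because the key uses the header's one-R spelling.
print(get_referer(environ))

# Not found: a dict keyed with the "correct" spelling has no HTTP_REFERER entry.
print(get_referer({"HTTP_REFERRER": "http://example.com/story"}))
```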

    --
    This is the Internet. You can say "fuck" here. - AC
  13. stats we'd like to see... by digitaldc · · Score: 4, Funny

    -# of non-numerical constants
    -# of ( ),{ },\ /,#,; characters in code
    -time spent debugging/compiling
    -total hours spent in production
    -gallons of coffee consumed
    -hours of daylight seen
    -# of relationships destroyed

    --
    He who knows best knows how little he knows. - Thomas Jefferson
  14. Need to watch those stats by Quiet_Desperation · · Score: 2, Funny

    For example, "Lines of code" / "Lines of commenting" will always produce "Inf"

  15. Re:Size doesn't matter by kmartshopper · · Score: 3, Funny
    It's the quality of the search results that counts.
    Yeah, keep telling yourself that...
  16. or "// FIXME" by StandardDeviant · · Score: 4, Funny

    (subject says it all ;))

  17. Re:Statistics: by maxwell demon · · Score: 2, Funny
    Total Number of Functions: 7,782,468
    Total Number of Functions Called: 69,500,700

    So the code calls 61,718,232 functions which don't even exist?

    But maybe they just meant "Total Number of Function Calls" :-)
    --
    The Tao of math: The numbers you can count are not the real numbers.
  18. Re:My vote is for... by mebollocks · · Score: 5, Funny

    I dunno, maybe you could find the algorithm on the net somewhere? ...if only there was some kinda searchable code database of some sort...

  19. Re:histogram of C reserved words by plabtfall · · Score: 5, Funny

    Yeah, me too:

        2431 int
        1802 goto

  20. Re:Please check for this: comma in brackets in C++ by hikerhat · · Score: 2, Funny
    Well, the obscurity of the comma operator is used by C++ recruiters who think they are really "clever", and in "clever" C/C++ puzzles on Usenet. If you took it away, how would you hire C++ programmers and how would you have fun on Usenet?

    Also, C++ programmers are getting really old, and they don't handle change very well.

  21. Re:And then... by Anonymous Coward · · Score: 2, Funny

    I'm just a single coder

    -1, Redundant

    This is Slashdot, of course we're all single.

  22. Re:useful statistic: parent: -1 troll by Baricom · · Score: 3, Funny

    That "woosh" sound you hear is the wink emoticon zooming over your head, joke in tow.

    I know PHP is a great web language and that it probably isn't the cause of the slowdown. Heck, even Yahoo! uses it these days.

    I was attempting (unsuccessfully, it seems) to make fun of the purists who insist that robust web applications must run on something compiled in order to reach acceptable performance under high load.