Searchable C/C++ DB surpasses 275 million lines
Sembiance writes "I've been working on a C/C++ source code search database for the past year. It has recently surpassed 275 million lines of searchable open source C/C++ code. The search engine is C/C++ syntax aware so you can search for specific elements such as functions, macros, classes, comments, etc. The site is built upon many open source products including: MySQL and Lucene for the database, CodeWorker to parse the code, PHP and Apache for the website and GeSHi for syntax highlighting. I'm currently looking for suggestions on what sort of 'interesting statistics' I could create from 275+ million lines of open source C/C++ code."
The following "interesting statistics" come to mind:
You gotta get the variables searchable. Most critical for that last statistic. Also, I'm too lazy to learn Lucene Query Parser Syntax, so the statistics for "Natalie Portman" may include references to "portman."
The time from the front-page article on /. to the death of your server?
Find similarities with stuff like SCO.
With all that code indexed, maybe we'll finally be able to figure out what the heck SCO's talking about.
But then again, probably not...
Online Starcraft RPG? At
Dietary fiber is like asynchronous IO-- Non-blocking!
... of "foo" to "bar."
Orange whip? Orange whip? Three orange whips.
How about the % of them that would work on a lady in a bar? Line 53256: "Hey pretty lady, are you an astronaut? Because your ass looks out of this world." ...oh... not those kinds of lines... *sigh* And I thought I was so close.
-or so you'd think
"I'm currently looking for suggestions..."
How about a new server?
My apologies then. As a regular Slashdotter it is forbidden for me to RTFA.
Microsoft Sucks, F/OSS Rocks. I get mod points now right?
Probably about as many lines consist of: {
1) randomly select 2000 lines of code
2) compile
3) execute
4) ???????
5) PROFIT!
I'd like to know whether the word "woman" appears anywhere, and if so, in what projects.
Eh.
"Piter, too, is dead."
Actually, if you click refresh on a page you reached from a link, the browser resends the referrer as well. Most browsers do this. One more thing: you spelled HTTP_REFERRER correctly, which is wrong :) It's spelled HTTP_REFERER, with a single R in the middle. Reverse grammar nazi FTW?
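For anyone who wants to see the spelling trap in action, here is a minimal sketch in Python rather than PHP. It assumes a CGI/WSGI-style `environ` dict, where request headers show up as `HTTP_`-prefixed keys; the `get_referrer` helper and the sample values are made up for illustration.

```python
def get_referrer(environ):
    """Return the Referer header from a CGI/WSGI-style environ dict, or None."""
    # The key follows the HTTP header name "Referer", misspelling included.
    return environ.get("HTTP_REFERER")


# Hypothetical request data, for illustration only.
good = {"HTTP_REFERER": "http://slashdot.org/"}
typo = {"HTTP_REFERRER": "http://slashdot.org/"}  # the "correct" spelling

print(get_referrer(good))
print(get_referrer(typo))
```

The correctly spelled key never matches, so the second lookup just comes back empty without raising any error, which is why this mistake is easy to miss.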
This is the Internet. You can say "fuck" here. - AC
-# of non-numerical constants /,#,; characters in code
-# of ( ),{ },\
-time spent debugging/compiling
-total hours spent in production
-gallons of coffee consumed
-hours of daylight seen
-# of relationships destroyed
He who knows best knows how little he knows. - Thomas Jefferson
For example, "Lines of code" / "Lines of commenting" will always produce "Inf"
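A rough sketch of how that ratio could be computed without blowing up, assuming a very simplified idea of what counts as a comment line. The `count_lines` and `code_to_comment_ratio` names are hypothetical, and this is nothing like what a real parser such as CodeWorker would do.

```python
import math


def count_lines(source):
    """Return (code_lines, comment_lines) for C/C++ source text.

    Simplification: a line counts as a comment only if it starts with //
    or /*, or lies inside a /* ... */ block. Trailing comments on code
    lines and comment markers inside string literals are ignored.
    """
    code = comments = 0
    in_block = False
    for raw in source.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines count as neither
        if in_block:
            comments += 1
            if "*/" in line:
                in_block = False
        elif line.startswith("//"):
            comments += 1
        elif line.startswith("/*"):
            comments += 1
            if "*/" not in line[2:]:
                in_block = True
        else:
            code += 1
    return code, comments


def code_to_comment_ratio(source):
    """Lines of code per line of commenting; inf when there are no comments."""
    code, comments = count_lines(source)
    if comments == 0:
        return math.inf if code else math.nan
    return code / comments


uncommented = "int main() {\n    return 0;\n}\n"
print(code_to_comment_ratio(uncommented))
```

Returning `math.inf` explicitly keeps the "Inf" result the joke predicts while avoiding a `ZeroDivisionError` on files with no comments at all.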
(subject says it all ;))
News for Geeks in Austin, TX
So the code calls 61,718,232 functions which don't even exist?
But maybe they just meant "Total Number of Function Calls"
The Tao of math: The numbers you can count are not the real numbers.
I dunno, maybe you could find the algorithm on the net somewhere? ...if only there was some kinda searchable code database of some sort...
Yeah, me too:
2431 int
1802 goto
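A tally like that could be produced with something along these lines. It is only a sketch: the `KEYWORDS` set is an arbitrary subset picked for illustration, `keyword_counts` is a made-up name, and keywords inside comments and string literals get counted too.

```python
import re
from collections import Counter

# Assumed subset of C/C++ keywords, chosen for illustration only.
KEYWORDS = {"int", "goto", "if", "else", "for", "while", "return"}


def keyword_counts(source):
    """Count whole-word keyword occurrences in C/C++ source text."""
    # Match whole identifiers so "printf" is not counted as "int".
    words = re.findall(r"[A-Za-z_]\w*", source)
    return Counter(w for w in words if w in KEYWORDS)


sample = 'int main() { int x = 0; goto end; end: printf("%d", x); return x; }'
for word, n in keyword_counts(sample).most_common():
    print(n, word)
```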
Also, C++ programmers are getting really old, and they don't handle change very well.
I'm just a single coder
-1, Redundant
This is Slashdot, of course we're all single.
That "woosh" sound you hear is the wink emoticon zooming over your head, joke in tow.
I know PHP is a great web language and that it probably isn't the cause of the slowdown. Heck, even Yahoo! uses it these days.
I was attempting (unsuccessfully, it seems) to make fun of the purists who insist that robust web applications must run on something compiled in order to reach acceptable performance under high load.