Searchable C/C++ DB surpasses 275 million lines

Some statistics to get you started by Anonymous Coward · 2005-12-05 05:28 · Score: 5, Funny

I'm currently looking for suggestions on what sort of 'interesting statistics' I could create from 275+ million lines of open source C/C++ code.

The following "interesting statistics" come to mind:

Percentage of functions named "deepThroat" (0%)
Number of comments mentioning a "girlfriend" (11) or "wife" (29) to "Natalie Portman" (41)
How many variables named "penis" are of type "long" versus type "short" (unknowable!)

You gotta get the variables searchable. Most critical for that last statistic. Also, I'm too lazy to learn Lucene Query Parser Syntax, so the statistics for "Natalie Portman" may include references to "portman."

Re:Some statistics to get you started by WWWWolf · 2005-12-06 06:02 · Score: 1

How many variables named "penis" are of type "long" versus type "short" (unknowable!)

Better to have that as "short" or even "void*", rather than "char"...
Re:Some statistics to get you started by Anonymous Coward · 2005-12-09 05:49 · Score: 0

My penis is a String.

useful statistic by kunzy · 2005-12-05 05:30 · Score: 5, Funny

the time from the frontpage article on /. to the death of your server?

Re:useful statistic by Sembiance · 2005-12-05 05:33 · Score: 5, Funny

Well, it's been about 2 minutes on slashdot... my site is already dead. So uhm... 2 minutes?
Re:useful statistic by Anonymous Coward · 2005-12-05 05:42 · Score: 0

You're probably saying Fuck!

Using a search of "Fuck"

Well, so are these guys:

void FuckUp 11 GSMP-0.0.6/Plugins/src/Factory.cc
void MindFuckUrquan 22 uqm-0.4.0/src/sc2code/comm/talkpet/talkpet.c
void OS_FreeFuckOver 6 bnserv-1.0.3/services/operserv.c
void * OS_Thread_FuckOver 46 bnserv-1.0.3/services/operserv.c
char * SolveFuckName 25 setedit/install/install.cc

- Moomin
Re:useful statistic by Guysmiley777 · 2005-12-05 05:51 · Score: 1

Mmmm, it smells like burning.

--
Coding with assembly is like playing with Legos. Coding an application in assembly is like building a car with Legos.
Re:useful statistic by Baricom · 2005-12-05 06:11 · Score: 4, Funny

So uhm... 2 minutes?

Sounds like you should have written it in C++ instead of a laggard language like PHP ;).
Re:useful statistic by Rac3r5 · 2005-12-05 07:01 · Score: 1

Hi Sembiance,

Is your code able to interact with a compiler?
I ask because sometimes, when you are looking at some code and would like to implement it, it would be useful to run it through your parser and see how many warnings/errors it comes up with. Or maybe it's overkill, but just a thought :)
Re:useful statistic by Wikipedia · 2005-12-05 09:24 · Score: 0

The backups work OK:
The search, uhhhh, I dunno
http://www.google.com/search?q=cache:http://csourcesearch.net/
http://csourcesearch.net.nyud.net:8090/

--
P2P Anonymous Distributed Web Search: http://www.yacy.net/

My vote is for... by Anonymous Coward · 2005-12-05 05:31 · Score: 5, Insightful

How many lines consist of:
}

Re:My vote is for... by Forseti · 2005-12-05 05:36 · Score: 0

You forgot the semicolon, that ain't gonna compile! ;-)

--
Delay is preferable to error. (Thomas Jefferson)
Re:My vote is for... by Erioll · 2005-12-05 05:41 · Score: 1

Most closing-statements require no semicolon. While things like class definitions, structs, etc, DO, "typical" programming blocks do NOT, like if, while, and switch blocks. Even functions don't terminate their blocks with a semicolon.

So I'd suspect lines with purely "}" and whitespace would be quite a few.
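
For what it's worth, counting those lines is easy to sketch. This is a minimal example, not how the search engine actually does it; the function names are made up for illustration, and it assumes a "brace line" means exactly one "}" plus optional whitespace (so "};" does not count).

```cpp
#include <cctype>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: true when the line holds exactly one '}' and
// nothing else except whitespace.
bool is_lone_closing_brace(const std::string& line) {
    int braces = 0;
    for (unsigned char ch : line) {
        if (ch == '}') {
            ++braces;
        } else if (!std::isspace(ch)) {
            return false;  // any other visible character disqualifies the line
        }
    }
    return braces == 1;
}

// Counts such lines in any input stream (a file, a string, stdin).
long count_lone_closing_braces(std::istream& in) {
    long count = 0;
    std::string line;
    while (std::getline(in, line)) {
        if (is_lone_closing_brace(line)) ++count;
    }
    return count;
}
```

Feed it a std::ifstream per source file and sum the results to get the total for a package.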
Re:My vote is for... by Anonymous Coward · 2005-12-05 05:43 · Score: 2, Funny

Probably about as many lines consist of: {
Re:My vote is for... by epiphani · 2005-12-05 05:43 · Score: 4, Interesting

Same type of thing, but indenting styles. K&R vs. BSD, etc. I'm curious how that breaks up.

(Partial to BSD style myself..)

--
.
Re:My vote is for... by Forseti · 2005-12-05 05:47 · Score: 1

Yeah, it was a joke, hence the ";-)" ...

--
Delay is preferable to error. (Thomas Jefferson)
Re:My vote is for... by sparkes · 2005-12-05 05:50 · Score: 1

That would be interesting but a bugger to search for

--
blog and junk
Re:My vote is for... by buanzo · 2005-12-05 06:16 · Score: 1

How many times the word "fuck" appears in comments. :P We've already done that on the linux kernel! :)

--
Buanzo Consulting - 15 Years of GNU/Linux experience, for you.
Re:My vote is for... by mebollocks · 2005-12-05 06:33 · Score: 5, Funny

I dunno, maybe you could find the algorithm on the net somewhere? ...if only there was some kinda searchable code database of some sort...
Re:My vote is for... by Anonymous Coward · 2005-12-05 06:37 · Score: 0

Not. { often occurs at the end of a line, such as a conditional test.
Re:My vote is for... by Triple+Click · 2005-12-05 06:46 · Score: 2, Insightful

Depends whether you do this:

if (cond) {
}

or this:

if (cond)
{

}
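
A rough way to tally the two styles, as a sketch only: look at every line whose last non-blank character is "{" and check whether anything comes before it. The struct and function names are invented for the example, and it is deliberately naive (it does not skip comments or string literals, and "} else {" counts as same-line).

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical tally of where opening braces sit.
struct BraceStyleTally {
    long same_line = 0;  // "if (cond) {" : brace ends a line that has other code
    long own_line = 0;   // "{" alone     : brace sits on a line by itself
};

// Strips leading and trailing spaces, tabs and carriage returns.
std::string trim(const std::string& s) {
    const char* ws = " \t\r";
    const std::size_t first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";
    const std::size_t last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

BraceStyleTally tally_brace_style(std::istream& in) {
    BraceStyleTally tally;
    std::string line;
    while (std::getline(in, line)) {
        const std::string t = trim(line);
        if (t.empty() || t.back() != '{') continue;  // no trailing brace here
        if (t.size() == 1) {
            ++tally.own_line;
        } else {
            ++tally.same_line;
        }
    }
    return tally;
}
```

Comparing same_line to own_line per project would give a crude first answer to the style question.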
Re:My vote is for... by Anonymous Coward · 2005-12-05 07:36 · Score: 2, Informative

K&R!!
ONLY K&R!!!!

Seriously, I am a K&R maniac, which caused me to get quite irritated at one of my professors, who once wrote "confusing braces" on a programming assignment I handed in. (It was a little confusing, but because I was being clever and efficient, not because of my braces preferences.)
I think the proportion of code written in K&R vs. The Incorrect Styles would be very interesting to see.
Re:My vote is for... by joshdick · 2005-12-05 07:43 · Score: 1

The GNU program indent could provide useful help, as that program converts C files from one style to another.
Re:My vote is for... by baadger · 2005-12-05 07:57 · Score: 5, Interesting
There's an idea right there, how about some stats showing popularity of various coding conventions?
- Variables: under_score vs. camelCase
- Tabs vs. spaces
- "if (cond) {" vs. "if (cond)\n{"
- How many coders bother enclosing single conditionally executed statements with {}
- How many coders bother producing comments directly before or after function definitions, describing function implementation?
- Lines of comments to lines of code ratios
- Number of functions to lines of code ratios for various projects?
- Number of projects making use of global variables?
- C, to C++, to C# (if your engine covers it) project ratio
etc
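
The first item could be sketched like this, assuming the identifiers have already been extracted by some parser (that part is not shown). The enum and function name are hypothetical, and the rule is a simple heuristic: an inner underscore means under_score, a lowercase-to-uppercase step means camelCase, and anything with both or neither lands in "Other".

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical classification buckets for a single identifier.
enum class NameStyle { Underscore, Camel, Other };

NameStyle classify_identifier(const std::string& name) {
    bool has_underscore = false;   // '_' with characters on both sides
    bool has_inner_upper = false;  // uppercase letter right after a lowercase one
    for (std::size_t i = 1; i < name.size(); ++i) {
        const unsigned char prev = static_cast<unsigned char>(name[i - 1]);
        const unsigned char cur = static_cast<unsigned char>(name[i]);
        if (cur == '_' && i + 1 < name.size()) has_underscore = true;
        if (std::islower(prev) && std::isupper(cur)) has_inner_upper = true;
    }
    if (has_underscore && !has_inner_upper) return NameStyle::Underscore;
    if (has_inner_upper && !has_underscore) return NameStyle::Camel;
    return NameStyle::Other;  // single words, ALL_CAPS macros aside, mixed styles
}
```

Note that ALL_CAPS names come out as Underscore under this rule, so macros would need filtering first.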
Re:My vote is for... by abigor · 2005-12-05 08:33 · Score: 1

Just wait until you get an actual job - if you insist on using K&R anywhere I've ever worked, you'll get canned for not following the coding standard, assuming you actually read it. That goes double for Microsoft shops (of which I've only ever worked in one, back in 1999). I've never seen K&R in use outside of school.
Re:My vote is for... by Anonymous Coward · 2005-12-05 08:47 · Score: 0

.I.. oIo pJp ,i,, Giving you the finger! pIpp
Re:My vote is for... by John+Courtland · 2005-12-05 08:52 · Score: 1

I used BSD style for pretty much ever, weaning myself on Dr. Dobb's Journal and Charles Petzold in my youth and not UNIX code... At my current job where we use K&R it took me a while to get used to it, and I will still CR+LF+{ instead of {+CR+LF to this day. Annoying.

--
Slashdot is proof that Sturgeon's Law applies to mankind.
Re:My vote is for... by Anonymous Coward · 2005-12-05 08:55 · Score: 0
Well, personally (not C++ but general programming):
- Variables: under_score vs. camelCase
  Depends on the language's own naming conventions, usually I stick to them but I tend to use camelCase because it looks prettier ^-^
- Tabs vs. spaces
  If I'm programming in an IDE which doesn't indent lines too far I'll probably use tabs because it's easier to fiddle with, but I prefer to use 2 spaces otherwise.
- "if (cond) {" vs. "if (cond)\n{"
  "if (cond) {"
- How many coders bother enclosing single conditionally executed statements with {}
  Nope
- How many coders bother producing comments directly before or after function definitions, describing function implementation?
  Depends. Normally I won't bother since I can understand my own coding without really getting confused or hunting through things. If I feel something needs documenting for debugging, future modification or if it's a complex function I probably will. Normally on the line straight after the function declaration, with a gap of one line before I write the actual code. When coding I also like to put noticeably sized comments with fancy surrounding ASCII art to split each section of the code up: global variable declarations, functions, commented-out code and/or testing section, etc.
- Lines of comments to lines of code ratios
  Barely any really.
I was gonna finish this post, but then I got high...
Re:My vote is for... by Mr+Z · 2005-12-05 10:58 · Score: 1

Depends on the language's own naming conventions, usually I stick to them but I tend to use camelCase because it looks prettier

Funny, I think naming_with_underscores is prettier personally. I feel like my head's being bounced around whenever I see a variable namedLikeThis.
If I'm programming in an IDE which doesn't indent lines too far I'll probably use tabs because it's easier to fiddle with, but I prefer to use 2 spaces otherwise.

I personally prefer 4 and expand tabs to spaces. I've had to work with too many tab-damaged programs to leave tabs in my source.

"if (cond) {" vs. "if (cond)\n{"

I have to go with "if (cond)\n{" with the brace lined up to the if. It works better with VI's indent operators < and >. Handy tip: Increase the indent level of a block of code by placing the cursor on *either* brace and pressing >%. Decrease with <%. Handy, huh?

Now, what I'm curious about is how many projects other than maybe Emacs and GCC use the FSF's crazy indentation scheme. (The dots are there to preserve the relative spacing.)

. if (cond)
. . {
. . . . code...
. . . . code...
. . }
. else
. . {
. . . . code...
. . . . code...
. . }

All those indentation levels make the code look like a giant squiggle. :-)
--Joe

--
Program Intellivision!
Re:My vote is for... by Heembo · 2005-12-05 11:07 · Score: 1

I use a mixture of both - sometimes I want
function
{
}
for clarity, sometimes I want
function {
}
for brevity. Why must I choose one?

--
Horns are really just a broken halo.
Re:My vote is for... by phrotoma · 2005-12-05 11:21 · Score: 1

Does the database track bugfixes to the available code? It would be interesting to correlate data on syntactical style against the "quality" of code. (Not to be taken literally, bugfixes per module of code could provide a crude baseline.) Perhaps certain schools of syntactical thought lend themselves to simpler code? Perhaps better coders tend towards a similar style? Surely an invitation for a flame war over semicolons but interesting nonetheless.

--
STANDARDS: The principles we use to reject other people's code.
Re:My vote is for... by Anonymous Coward · 2005-12-05 11:45 · Score: 0

Apparently, the brace was put on the same line as the signature to save space in the book.
Re:My vote is for... by bondsbw · 2005-12-05 13:42 · Score: 0

Number of lines like so:

printf("Fatal Error 573: You should never see this message!");

--
All my liberal friends think I'm a conservative, all my conservative friends think I'm a liberal.
Re:My vote is for... by Anonymous Coward · 2005-12-05 14:32 · Score: 0

Have you no backbone? Damn fence sitters!
Re:My vote is for... by Anonymous Coward · 2005-12-05 20:17 · Score: 0

I would like to see how many use "i" as a variable in for-loops

Similarity checking by roguerez · 2005-12-05 05:31 · Score: 5, Funny

Find similarities with stuff like SCO.

Interesting stats by sparkes · 2005-12-05 05:32 · Score: 4, Interesting

How many lines contain expletives?

--
blog and junk

Re:Interesting stats by Anonymous Coward · 2005-12-05 05:47 · Score: 0

+1
Re:Interesting stats by moosesocks · 2005-12-05 07:05 · Score: 4, Informative

How many lines contain expletives?

for your reading pleasure.... the linux kernel fuck count

--
-- If you try to fail and succeed, which have you done? - Uli's moose
Re:Interesting stats by Anonymous Coward · 2005-12-05 12:17 · Score: 0

I suppose one could say that Linux has progressed from being predominantly shit to being mostly crap.

SCO by cmburns69 · 2005-12-05 05:32 · Score: 2, Funny

With all that code indexed, maybe we'll finally be able to figure out what the heck SCO's talking about.

But then again, probably not...

--
Online Starcraft RPG? At
Dietary fiber is like asynchronous IO-- Non-blocking!

Wtf? by GeckoX · 2005-12-05 05:32 · Score: 1, Interesting

What, you've created this wonderful piece of software and _now_ want to figure out what to do with it?

Am I missing something here?

--
No Comment.

Re:Wtf? by Anonymous Coward · 2005-12-05 05:37 · Score: 0

And he spent a year on it, too. :)
Re:Wtf? by Shimmer · 2005-12-05 05:49 · Score: 0, Troll

That was pretty much my reaction too. Talk about a pointless project.

--
The most rabid believers in American Exceptionalism are the exact same people whose policies are destroying it.
Re:Wtf? by Digital+Vomit · 2005-12-05 06:02 · Score: 2, Insightful

What better reason to create such a program than "why not"?
A person who is a true programmer in his soul doesn't ask himself "why". Oftentimes the sheer joy of creating something from nothing is enough.

--
Modern copyright is theft of culture from everyone and it retards the progress of the useful arts and sciences.
Re:Wtf? by hotdiggitydawg · 2005-12-05 06:08 · Score: 1

I think you just answered your own question - datamining for The Daily WTF...
Re:Wtf? by Geoffreyerffoeg · 2005-12-05 10:32 · Score: 1

Or as the saying goes: scientists ask "why?"; engineers ask "why not?".

One word by OverlordQ · 2005-12-05 05:32 · Score: 1

. . . well program, sloccount. Of course, do some research and tweak the parameters to get a reasonably accurate result though.

--
Your hair look like poop, Bob! - Wanker.

Statistics: by duckpoopy · 2005-12-05 05:32 · Score: 5, Interesting

1. Lines per function
2. Comment / command ratio
3. Number of curse word variable names

--
word.

Re:Statistics: by gronofer · 2005-12-05 05:38 · Score: 2, Insightful

4. The number of times the wheel has been reinvented.
Re:Statistics: by Anonymous Coward · 2005-12-05 05:40 · Score: 3, Informative

From the stats page if you cannot get to it...

Overall Stats
Number of Packages: 10,931
Total Number of Files: 1,151,819
Total Lines of Code (No comments, no blank lines): 283,119,081
Total of All Lines: 420,355,464
Total Number of Functions: 7,782,468
Total Number of Functions Called: 69,500,700
Total Number of Macros: 9,947,564
Total Number of Classes: 209,361
Total Number of Comments: 38,125,107
Total Number of Structures: 554,178
Total Number of Unions: 19,687
Total Number of Includes: 5,904,187
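
A few ratios fall out of those totals directly. The snippet below is only a sketch: the struct and function names are made up, the numbers are copied from the list above, and "lines per function" here is an upper bound because it divides all code lines (including declarations and other code outside function bodies) by the function count.

```cpp
// Hypothetical container for the four totals used below.
struct OverallStats {
    double code_lines;  // "Total Lines of Code (No comments, no blank lines)"
    double all_lines;   // "Total of All Lines"
    double functions;   // "Total Number of Functions"
    double calls;       // "Total Number of Functions Called"
};

// Figures as posted on the stats page.
const OverallStats posted_stats = {283119081.0, 420355464.0, 7782468.0, 69500700.0};

// Fraction of all lines that are neither blank nor comment.
double code_line_share(const OverallStats& s) { return s.code_lines / s.all_lines; }

// Average code lines per function (an upper bound, see the note above).
double lines_per_function(const OverallStats& s) { return s.code_lines / s.functions; }

// Average number of call sites per defined function.
double calls_per_function(const OverallStats& s) { return s.calls / s.functions; }
```

With the posted figures this works out to roughly two thirds of all lines being code, about 36 code lines per function, and about 9 calls per function.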
Re:Statistics: by manJerk · 2005-12-05 05:45 · Score: 1

would be interesting to see how many memory leaks are in all those lines of code...

--
-Boycot shampoo! demand real poo!
Re:Statistics: by Anonymous Coward · 2005-12-05 06:07 · Score: 0

Holy crap! 38,125,107 comments?
What kind of programmer are you?
Oh wait, open-source. You don't HAVE to upkeep this as part of your job!

:D
Re:Statistics: by lcs · 2005-12-05 06:25 · Score: 1

Total Number of Functions: 7,782,468
Total Number of Macros: 9,947,564
Ouch.
Re:Statistics: by AdamWeeden · 2005-12-05 06:29 · Score: 1

Well if we compile them in Windows, all of them. ;)

--
I was quoted out of context in my autobiography...
Re:Statistics: by maxwell+demon · 2005-12-05 06:30 · Score: 2, Funny

Total Number of Functions: 7,782,468
Total Number of Functions Called: 69,500,700

So the code calls 61,718,232 functions which don't even exist?

But maybe they just meant "Total Number of Function Calls" :-)

--
The Tao of math: The numbers you can count are not the real numbers.
Re:Statistics: by dkleinsc · 2005-12-05 06:32 · Score: 1

Some other real suggestions of useful statistics:
1. Maximum brace nesting level for each function (might be difficult, but a good metric for determining the complexity of a function)
2. Percentages of control structures that are while, for, switch, if, etc.
3. Number of embedded constants that aren't 0 or 1
4. Count of references to each function/constant within a single project
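
Item 1 is less difficult than it sounds if you accept a naive version. A minimal sketch, assuming you already have the text of one function and don't care about braces hiding in string literals, character literals or comments (the function name is hypothetical):

```cpp
#include <algorithm>
#include <string>

// Returns the deepest level of '{' nesting seen in the given source text.
// Naive on purpose: braces inside strings, char literals and comments are
// counted like any other brace.
int max_brace_depth(const std::string& source) {
    int depth = 0;
    int deepest = 0;
    for (char ch : source) {
        if (ch == '{') {
            ++depth;
            deepest = std::max(deepest, depth);
        } else if (ch == '}' && depth > 0) {
            --depth;  // ignore stray closers instead of going negative
        }
    }
    return deepest;
}
```

The function body itself counts as level 1, so a function with a single un-nested if block reports 2.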

--
I am officially gone from /. Long live http://www.soylentnews.com/
Re:Statistics: by bob_jordan · 2005-12-05 06:50 · Score: 1

4. How many lines belong to SCO.
5. ?
6. Profit

Bob.

(where 5 is a pretty good chance of getting counter-sued out of existence by IBM when the answer is some { and a few less }.)
Re:Statistics: by ThE_DoOmSmItH · 2005-12-05 06:53 · Score: 0

you forgot everyone's favourite command... "GOTO" ... that would be an interesting statistic to see, if people actually still use goto :)

TDS

--
-TubaMan / ThE_DoOmSmItH
Re:Statistics: by NewWorldDan · 2005-12-05 06:55 · Score: 1

I'd like to see the code broken down (and searchable) by license type. BSD vs. GPL vs. Public Domain vs. Other?
Re:Statistics: by sglane81 · 2005-12-05 07:08 · Score: 1

Total Lines of Code (No comments, no blank lines): 283,119,081
Total of All Lines: 420,355,464

Seems to me almost half the code in there was written in Whitespace.

--
This is the Internet. You can say "fuck" here. - AC
Re:Statistics: by Sembiance · 2005-12-05 08:07 · Score: 2, Informative

You can see the license type broken down here:

http://csourcesearch.net/license/

You can also click on any of those licenses and then on that page choose to only search for code found in that license.
Re:Statistics: by Mr+Z · 2005-12-05 11:38 · Score: 1

They're probably counting all the #defines, not just the parameterized ones. So crap like "#define DEBUG" probably counts towards the total.

--
Program Intellivision!
Re:Statistics: by thornist · 2005-12-05 14:02 · Score: 1

What about calls to external libraries. I realise there's a lot of code in there, so a lot of the libraries will be covered too, but there'll surely be calls out into the operating system and so forth.
Re:Statistics: by maxwell+demon · 2005-12-05 22:43 · Score: 1

Well, if it really indexes all important C/C++ OSS projects then it should also index glibc (covers all standard C functions), gcc including libstdc++ (covers all standard C++ functions), Linux (covers Posix calls and of course Linux specific calls), X, gtk, gnome, KDE (together cover most GUI stuff) and Wine (covers a lot of the Windows APIs).

--
The Tao of math: The numbers you can count are not the real numbers.

ratio by FreeBSDbigot · 2005-12-05 05:33 · Score: 5, Funny

... of "foo" to "bar."

--
Orange whip? Orange whip? Three orange whips.

Re:ratio by c_forq · 2005-12-05 06:33 · Score: 1

It should be "fu" (FUBAR: Fucked Up Beyond All Recognition), but it is funny nonetheless (if I had mod points I would of modded you up funny instead of posting this).

--
Computers allow humans to make mistakes at the fastest speeds known, with the possible exception of tequila and handguns
Re:ratio by snol · 2005-12-05 07:04 · Score: 1

And yet people always use "foo" as the throwaway identifier. Maybe most coders have an aversion to two-letter names for some reason.
Re:ratio by Sexy+Commando · 2005-12-05 07:06 · Score: 1

Uh... See this.
Re:ratio by MaynardJanKeymeulen · 2005-12-05 07:15 · Score: 1

You must be new here:
when referencing to the movie which made it famous, you could say fubar,
but in programming, and that's what this article is about, one uses foo bar

--
"The day Microsoft makes a product that doesn't suck is the day they make a vacuum cleaner."
Re:ratio by TheGatekeeper · 2005-12-05 07:16 · Score: 1

Ah, no, no it shouldn't.

--
'The staff in the hand of a wizard may be more than a prop for age,' -Hamá, the doorward
Re:ratio by ahem · 2005-12-05 07:24 · Score: 4, Funny

From google:

Search -- foo -> Results 1 - 10 of about 26,600,000 for foo. (0.06 seconds)
Search -- bar -> Results 1 - 10 of about 385,000,000 for bar [definition]. (0.16 seconds)
Search -- foo bar -> Results 1 - 10 of about 7,900,000 for foo bar. (0.12 seconds)

'bar' wins. This intuitively makes sense, as who would want to go to the 'foo' for a drink, or eat an 'energy foo'? Could you imagine a lawyer being 'dis-fooed'?

--
Not A Sig
Re:ratio by sh0dan · 2005-12-05 07:49 · Score: 1

It should be "fu" [...] (if I had mod points I would of modded you up funny instead of posting this).
If I had mod points I would have... oh nevermind... joke's on you. ;)
Re:ratio by c_forq · 2005-12-05 07:58 · Score: 1

I wasn't referring to the movie, but the actual military jargon. See SNAFU, TARFU, and FUBAR in a slang dictionary.

--
Computers allow humans to make mistakes at the fastest speeds known, with the possible exception of tequila and handguns
Re:ratio by kernelfoobar · 2005-12-05 08:27 · Score: 1

Quit making fun of my NAME!

--
Here we go again!
Re:ratio by karnal · 2005-12-05 10:19 · Score: 1

Linus hates you.

--
Karnal
Re:ratio by circusboy · 2005-12-05 13:01 · Score: 1

I recall this making it famous first, but then again, as acronyms go, it's been around for some time...

--
-- it's ridiculous how many people misspell ridiculous... (damn, damn, damn...)
Re:ratio by Anonymous Coward · 2005-12-05 16:27 · Score: 0

Could you imagine a lawyer being 'dis-fooed'?

No, but I can imagine a lawyer being 'fish-fooed'.
Re:ratio by pclminion · 2005-12-06 05:59 · Score: 1

Assuming Google indexes about 10 billion pages, we can compute the conditional probability of Bar given Foo, i.e. P(Bar|Foo):
P(Bar|Foo) = P(Bar,Foo)/P(Foo) = C(Bar,Foo)/C(Foo) = 7.9e6/2.66e7 = 0.297.
Relatively high, actually.
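
Spelled out step by step, with N as the total number of indexed pages. This assumes the "foo bar" hit count stands in for pages containing both words, which Google's estimates don't guarantee. Note that N cancels, so the 10 billion figure never actually enters the result. The reverse direction is included for comparison, using the "bar" count from the parent post.

```latex
\begin{align*}
P(\mathrm{Bar}\mid\mathrm{Foo})
  &= \frac{P(\mathrm{Bar},\mathrm{Foo})}{P(\mathrm{Foo})}
   = \frac{C(\mathrm{Bar},\mathrm{Foo})/N}{C(\mathrm{Foo})/N}
   = \frac{C(\mathrm{Bar},\mathrm{Foo})}{C(\mathrm{Foo})}
   = \frac{7.9\times10^{6}}{2.66\times10^{7}} \approx 0.297 \\
P(\mathrm{Foo}\mid\mathrm{Bar})
  &= \frac{C(\mathrm{Bar},\mathrm{Foo})}{C(\mathrm{Bar})}
   = \frac{7.9\times10^{6}}{3.85\times10^{8}} \approx 0.021
\end{align*}
```

So a page with "foo" on it has "bar" too about 30% of the time, while a page with "bar" has "foo" only about 2% of the time.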

275+ million lines by four2five · 2005-12-05 05:33 · Score: 1, Funny

How about the % of them that would work on a lady in a bar? line 53256 "Hey pretty lady, are you an astronaut because your ass looks out of this world" ....oh....not those kinds of lines....*sigh* and I thought I was so close

--
-or so you'd think

Re:275+ million lines by hritcu · 2005-12-05 05:50 · Score: 1

line 53256 "Hey pretty lady, are you an astronaut because your ass looks out of this world"

Knowing that there are not so many women writing (or *sigh* reading) open source I think it is very unlikely that adding such a line to your source code will get you anywhere. You could try though, and of course tell us what happened :)

--
If you don't fail at least 90 percent of the time, you're not aiming high enough. (Alan Kay)
Re:275+ million lines by gstoddart · 2005-12-05 07:11 · Score: 2, Funny

How about the % of them that would work on a lady in a bar? line 53256 "Hey pretty lady, are you an astronaut because your ass looks out of this world" ....oh....not those kinds of lines....*sigh* and I thought I was so close

No, no, no.

You do not use lines 1..N on the same lady until it works. It's not like breaking encryption -- you don't get to try all the possible keys.

I have friends who have done this, and they swear it's a percentage game. Choose one line you like, and try it on women 1..N until it does work, or you get tired of getting told to sod off. Apparently, with the right combination of variables, any line can be verified to work under some circumstances.

Truthfully, I don't know how anyone can set out with the knowledge they're going to get told to drop dead 70-100 times/night, but I guess if you can live with that kind of failure rate on an ongoing basis, you'll eventually get the success rate you wanted.

Now go forth young geek, and attempt to multiply. ;-)

--
Lost at C:>. Found at C.
Re:275+ million lines by four2five · 2005-12-05 08:01 · Score: 1

"Truthfully, I don't know how anyone can set out with the knowledge they're going to get told to drop dead 70-100 times/night, but I guess if you can live with that kind of failure rate on an ongoing basis, you'll eventually get the success rate you wanted." This is simply the brute force method of reproduction ;) Great post, I'd mod yours up if I had any points right now.

--
-or so you'd think
Re:275+ million lines by kn0tw0rk · 2005-12-05 10:17 · Score: 1

It's not the fact you use a 'line' multiple times, but the fact you try talking to 1..N ladies until you succeed.

For those with the rejection proof ego that can go and do this, I say good luck to you.

--
See my art -> http://herbevore.deviantart.com

Suggestion by lbmouse · 2005-12-05 05:33 · Score: 5, Funny

"I'm currently looking for suggestions..."

How about a new server?

Slashdot Block by Yerase · 2005-12-05 05:34 · Score: 3, Interesting

I love the GeShi page, how it blocks everything from Slashdot. Setup a site to advertise a product, then restrict people from using it....

URLs on this server linked by slashdot.org will be refused. Permission is given to slashdot to mirror content as necessary for the purpose of providing its users access to the information on the site. Slashdot should not attempt to bypass the referer block. Use of the google cache page for the site is acceptable as long as the page(s) concerned have no more than 1 image.

Re:Slashdot Block by lowrydr310 · 2005-12-05 05:41 · Score: 2, Insightful

This policy is employed for the sole purpose of avoiding a huge bandwidth bill that I would have to pay out of my own pocket. Anyone who would like this restriction to go away is more than welcome to send me bucketloads of cash.
If you don't want to pay a big bandwidth bill then don't run a webserver.
Re:Slashdot Block by Anonymous Coward · 2005-12-05 05:42 · Score: 0

Ok, everybody get ready to open the GeShi page in a separate window...
Paste the address now.
Get ready to press enter..... 5... 4... 3... 2... 1... and press.

Anyone still seeing the site is to press 'Refresh' as required.

Their fault for trying to tell me what site I'm not allowed to visit if I want to visit theirs.
Re:Slashdot Block by wampus · 2005-12-05 05:45 · Score: 2, Interesting

That's why I use Cacheout. It's a Firefox extension that adds a context menu item to coralize any link. Bypass the restriction AND not kill the site, all at the same time.
Re:Slashdot Block by Anonymous Coward · 2005-12-05 05:45 · Score: 0

Wow...sounds like someone actually RTFA.

You must be new here.
Re:Slashdot Block by Anonymous Coward · 2005-12-05 05:49 · Score: 0

Yeah, no kidding. His first step should have been to not post to /. if he was so concerned about outlaying money.
Re:Slashdot Block by b4k3d+b34nz · 2005-12-05 05:53 · Score: 2, Insightful

Why would anybody WANT to pay a big bandwidth bill? It's called being smart so that he doesn't get the shaft when he has to pay his utilities this month.

--
Grammar Lesson: you're is a contraction of "you are"; your means you possess something; yore means days gone by.
Re:Slashdot Block by teslar · 2005-12-05 05:57 · Score: 1

Wow...sounds like someone actually RTFA.
You must be new here.
You on the other hand must be a veteran. The link the parent is talking about is not TFA ;)
Re:Slashdot Block by Anonymous+Brave+Guy · 2005-12-05 05:59 · Score: 1

If you don't want to pay a big bandwidth bill then don't run a webserver.

If you want access to a web server, don't run a system that's known to give the provider big bandwidth bills.

At the end of the day, they don't owe you anything, and anything they offer you is a courtesy, not an obligation. If you don't like that, please feel free to go create and finance your own WWW.

--
If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
Re:Slashdot Block by gstoddart · 2005-12-05 06:20 · Score: 2, Insightful

This policy is employed for the sole purpose of avoiding a huge bandwidth bill that I would have to pay out of my own pocket. Anyone who would like this restriction to go away is more than welcome to send me bucketloads of cash.

If you don't want to pay a big bandwidth bill then don't run a webserver.

That's a little harsh don't you think?

It's one thing to run a site and have reasonable expectations of having "enough" bandwidth for your projected traffic, and it's another thing to pay for a slashdotting on an ongoing basis.

This person has decided they don't really want to be linked from Slashdot.

It's hardly an all-or-nothing thing ... for my personal web-site, the several gigs of traffic I'm allowed per month are more than adequate. But I'm sure as hell not going to pay extra to have enough on the off-beat chance that everyone in the world suddenly wants to see my site.

--
Lost at C:>. Found at C.
Re:Slashdot Block by Kjella · 2005-12-05 06:25 · Score: 2, Informative

If you don't want to pay a big bandwidth bill then don't run a webserver.

For every problem, there is a solution that is simple, elegant and wrong. In every other market, the more demand there is, the higher the price/revenue/profit. Web servers are pretty much the only place where you lose more money the more popular you are (e-commerce sites and such not included). If so many people want the content, they can find a way to share it. Even then they're getting a bloody good deal, if you ask me. What exactly are you complaining about, that they aren't generous *enough*? Blocking slashdottings is a small price to pay compared to turning it into a [ad] pile [ad] of [ad] advertisements or subscription site. That is what you do if you "don't want a big bandwidth bill".

--
Live today, because you never know what tomorrow brings
Re:Slashdot Block by radu124 · 2005-12-05 07:11 · Score: 1

This is like having a party and wanting people to come over, but not too many of them because you don't want to give away too much beer.

Some would rather have a big party even if they are left broke and some won't.

If I really had something to show to the world, and hopefully someday I will, I'll invite Slashdot to come by.
Re:Slashdot Block by gstoddart · 2005-12-05 07:16 · Score: 1

This is like having a party and wanting people to come over, but not too many of them because you don't want to give away too much beer.

No, it's like having a party, but being very firm that it's not some "intergalactic kegger" for everyone to come by, because you bought a case of beer, not enough for a frat party.

Providing beer to a small handful of friends is one thing, providing it for everyone who crashes the party is another.

--
Lost at C:>. Found at C.
Re:Slashdot Block by Karora · 2005-12-05 07:17 · Score: 1

Setup a site to advertise a product, then restrict people from using it....

Similar to the main linked site, GeShi is the product of a very smart young coder (who I happen to employ) and who does not have the funds to pay for bandwidth. He doesn't advertise on the site, and GeShi is GPL and something he did for fun in his spare time.

I'll see if we can get it onto some better bandwidth / equipment for him so he can cope with Slashdot in the future, but it's not going to happen during this event.

--

...heellpppp! I've been captured by little green penguins!
Re:Slashdot Block by PhiRatE · 2005-12-05 08:23 · Score: 1

Actually the issue isn't so much bandwidth. I put the block in place years ago when hundreds of gigabytes would have been a problem; it's not a problem anymore. The issue is one of processing power. Most sites hosted on Exorsus are PHP based and utilise a database to some degree or another. Apache is more than capable of delivering the number of hits required but on a general basis, it is more than likely that the system would overload and interrupt other users' sites, email etc.

In the end, on a free system like Exorsus, it comes down to most utility. It does not serve a good purpose to interrupt all the rest of the system users' functions in order to cater to an occasional spike, especially since there are plenty of mechanisms for helping out servers, such as coralize, which will both avoid the slashdot restriction on the site and avoid placing unnecessary load on my system.

--
You can't win a fight.
Re:Slashdot Block by kula.shinoda · 2005-12-05 08:27 · Score: 1

I get hosted there by my cousin, who graciously grants me space and bandwidth for free. He blocks slashdot because he doesn't charge any money for his hosting.

I'm very grateful for his hosting, and if you were running a small OSS project for free (note, I make _no_ money from it, even though I have been offered), you'd be grateful for free hosting too. Making slashdotters hide their referrers is a small price to pay.

--
Real men don't write sigs
Re:Slashdot Block by Anonymous Coward · 2005-12-05 20:28 · Score: 0

I see that his project uses GPL.

Why can't he just put his project on SourceForge and get all the bandwidth and CPU power he needs?
Re:Slashdot Block by lowrydr310 · 2005-12-06 00:05 · Score: 1

Blocking slashdottings is a small price to pay compared to turning it into a [ad] pile [ad] of [ad] advertisements or subscription site. That is what you do if you "don't want a big bandwidth bill".
That's a very good point that I overlooked. I didn't see the original site because of the slashdot block and I was too lazy to manually type it in. If there are no ads (and he's paying out of pocket) then good for him and everyone who regularly visits his site.

Of course.. by Anonymous Coward · 2005-12-05 05:35 · Score: 0

how many lines contain the word 'fcuk'

d'oh!

Choice of db? by Anonymous Coward · 2005-12-05 05:35 · Score: 4, Interesting

So, this is not a flame, but I'm curious about your choice of dbs.
I've used mysql for some small projects, but generally it doesn't handle
millions of rows (although the upper limit on rows can be patched with
some additional behaviors). So, for big dbs, I use postgresql.

How did you decide to use mysql? (Was it that the project started,
and grew, or did you know it would handle large numbers of rows
from the start)?

Just curious. This is probably going to be viewed as a flame by many
(particularly those who don't really use dbs very much, but use them
enough to have strong opinions).

Re:Choice of db? by Anonymous Coward · 2005-12-05 05:44 · Score: 1, Interesting

Tell that to the 23 million row table I'm currently playing with - no tweaking or patching needed. What version of MySQL have you tried "large" databases with (23 million rows isn't large)
Re:Choice of db? by Sembiance · 2005-12-05 08:18 · Score: 4, Informative

I've used MySQL in the past for some projects at work, where the number of rows were several hundred million and ran with no problems so I knew it was capable of large row numbers.

I initially used their FULLTEXT indexing as well, but it dies a horrible death with a large number of rows or search terms. (The developers that live in #mysql on Freenode confirmed this)

So I had to hand off searching to Lucene, which worried me a great deal (being java) but as folks tell me 'Java is not slow'.
They are right, Java is very fast at handling the searching and I've been very impressed.
Most searches in the Java database only take one or two seconds.
The MySQL query/join for additional info takes another 4 or 5 seconds.

Most searches take about 8 seconds to come up, even under no load.

I simply don't have enough RAM to keep the necessary MySQL indexes in RAM and use index only queries.
Re:Choice of db? by Anonymous Coward · 2005-12-05 11:08 · Score: 0

So... why is the server dead?

Web server?
Database?
Search engine?
None of the above?
Re:Choice of db? by Sembiance · 2005-12-05 12:47 · Score: 1

Well the server didn't actually die.
I had a limit of 125 connections for Apache, and that was reached and stayed maxed for several hours.

The site did respond to requests, searches, etc. it was just very slow due to high load on the box:
18:38:29 up 97 days, 9:21, 1 user, load average: 27.49, 21.31, 28.03

It was in the 40+ range earlier :)

And then... by guaigean · 2005-12-05 05:36 · Score: 1, Troll

Stay tuned for our reaching 280 million lines, followed by 285, 290, 295, and 300. Expect a new Slashdot post soon, as we need to advertise!

--
Microsoft Sucks, F/OSS Rocks. I get mod points now right?

Re:And then... by Sembiance · 2005-12-05 05:39 · Score: 5, Interesting

Advertise? No, I'm just a single coder doing this for fun and hope that some people will find it useful.
Re:And then... by guaigean · 2005-12-05 05:42 · Score: 2, Funny

My apologies then. As a regular Slashdotter it is forbidden for me to RTFA.

--
Microsoft Sucks, F/OSS Rocks. I get mod points now right?
Re:And then... by Anonymous Coward · 2005-12-05 06:23 · Score: 0

Thanks, dude!!!

I can throw away many stupid/bloated books and set the homepage in my browser to about:blank again (instead of google for searching code snippets).

You made my life easier!
Re:And then... by Anonymous Coward · 2005-12-05 07:44 · Score: 2, Funny

I'm just a single coder

-1, Redundant

This is Slashdot, of course we're all single.
Re:And then... by chris_eineke · 2005-12-05 08:44 · Score: 1

My hat's off to you. Great work.

--
"All you have to do is be fragile and grateful. So stay the underdog." Chuck Palahniuk, Choke

Just a couple... by nexxuz · 2005-12-05 05:36 · Score: 0

Most frequently searched items, number of searches per min. (or after /. per sec.)

--
I love random hex numbers! Just like this one, 09f911029d74e35bd84156c5635688c0.

Just Like The Linux Kernel Problem by jack_csk · 2005-12-05 05:38 · Score: 0, Redundant

Find out how many profane words are there in the source code comments.

Statistics TM (c) by chunews · 2005-12-05 05:38 · Score: 5, Interesting

It would be interesting to see the number of different copyright notices contained within all that source code, and then to present the notices in groups, like GPL GPL2, etc..

Also, I would really like to find "patient 0" for sourcecode. For example, is there a common library or utility function (perhaps Hex2Ascii?) that *everybody* uses? Well, who wrote it first?

And in a similar vein, who are the "top 5-10-100" authors of open source code by use, reuse, KLOC, etc.. Not of too much use unless I were awarding the Nobel prize for programming, or perhaps creating a list of individuals for the RIAA to sue, after they're done with their other useless lawsuits. :)

Interesting Statistics by iso-cop · 2005-12-05 05:39 · Score: 5, Interesting

In the software engineering world, people will be interested in all sorts of code metrics such as cyclomatic complexity, operator/operand counts, lines of code per module, and such as well as object oriented metrics for the C++ code (depth of inheritance, for example). If you can marry these sorts of metrics with defect data (bugs) for each of the modules then you have a useful data repository for predicting defects in source code. Keeping around different versions of modules changed is also valuable here. If you can gather information on how long it took to produce the module and how long it took to correct defects in the module you are getting even better. If you make it easy to reuse the C and C++ modules...even better.

Re:Interesting Statistics by julesh · 2005-12-05 07:02 · Score: 1

In the software engineering world, people will be interested in all sorts of code metrics such as cyclomatic complexity, operator/operand counts, lines of code per module, and such as well as object oriented metrics for the C++ code (depth of inheritance, for example).

Nah. What we really want to know is, which open source project has the most obscenities in its comments?
Re:Interesting Statistics by Andreas(R) · 2005-12-05 10:57 · Score: 1

In the software engineering world, people will be interested in all sorts of code metrics such as cyclomatic complexity, operator/operand counts, lines of code per module, and such as well as object oriented metrics for the C++ code (depth of inheritance, for example).

As part of an empirical study, I've used CCCC to measure the evolution of software (how metrics for SW architecture evolve over time). This tool supports McCabe's cyclomatic complexity, LOC, the Henry-Kafura information flow metric, module coupling etc. The purpose can be to detect software decay, architectural mismatch etc. However, to study software evolution, one needs access to all the versions of the software, which this site does not provide (yet).
Lehman did the first studies on software evolution, proposing several laws for the field.
Re:Interesting Statistics by Anonymous Coward · 2005-12-05 13:27 · Score: 0

"we like profanity in open source software"

Size doesn't matter by Peteresch · 2005-12-05 05:39 · Score: 1

It's the quality of the search results that counts.

Re:Size doesn't matter by kmartshopper · 2005-12-05 06:17 · Score: 3, Funny

It's the quality of the search results that counts.
Yeah, keep telling yourself that...

Amazon style statistics by tod_miller · 2005-12-05 05:39 · Score: 4, Interesting

I was very impressed with Amazon, who for each book say which phrases and words were particularly unique to that book. (reminds me of that google game where you try and get any two words with only 1 hit).

So show code with coloured background to the lines, from green to red, green being 'normal every day boiler plate' code, red would mean this code must be more specialised, or written by some half-wit l33t h4x0r at least.

I forgot what they called it, but they had 3/4 visible stats based on the semantics of the stuff, probably more under the 'hood (omg lol).

word. Oh some adhesion stats would rock!

please type the word in this image: adhesion
random letters - if you are visually impaired, please email us at pater@slashdot.org

--
#hostfile 0.0.0.0 primidi.com 0.0.0.0 www.primidi.com 0.0.0.0 radio.weblogs.com

Re:Amazon style statistics by 3770 · 2005-12-05 07:22 · Score: 1

That was harder than I thought:

perpetuum fellatio, 922 hits
perpetuum chives, 84 hits
perpetuum snail, 190

922 people were able to fit perpetuum and fellatio on the same page. 84 people wrote about perpetuum and chives and 190 about perpetuum and snail.

perpetuum napalm, 2190
perpetuum booger, 487
perpetuum spleen, 518
perpetuum ninja, 931

Man...

--
The Internet is full. Go Away!!!
Re:Amazon style statistics by RobbieGee · 2005-12-05 10:02 · Score: 1

"Results 1 - 10 of about 3,460,000 for revolutionizing technology."

Hm, this is going to be harder than I thought...

--
If you get this, we're 10 of a kind.
Re:Amazon style statistics by Anonymous Coward · 2005-12-06 10:14 · Score: 0

senile zooarcheologist. Well, at least until Google hits this page...

Re:Are you proud of 275 million lines of code? by sparkes · 2005-12-05 05:40 · Score: 1

Write 'I must read at least the post before I comment' 275 million times and when you are finished you can use slashdot again.

--
blog and junk

Pro Words by SpinJaunt · 2005-12-05 05:41 · Score: 1

Obviously what we geeks really want to know is how many "F" & "S" profanity words there are, amongst other useful and descriptive comments.

--
/. is good for you.

Several basic stats by Anonymous Coward · 2005-12-05 05:43 · Score: 0

Number of non-comment, non-blank lines
Number/percentage of each C/C++ control structure (if, switch, variable assignment, etc.)
Average size of functions in lines and min/max.

The basics and more by PetriBORG · 2005-12-05 05:44 · Score: 2, Insightful

Start with the basics, and then move on..

Whitespace to code ratio
Counts for each of the dirty 7
Line counts that just contained () or {} or []
A list of projects the code is from
And then more interestingly, I'd like to run some sort of program on it to find similarities in code, to see how much one code base overlaps with another. It would be interesting to see if OSS actually does share code between projects or if it's all NIH (not invented here).

--
Pete/Petri "damn, my chainsaw is clogged with 1's and 0's again." --clyde

Re:The basics and more by Sembiance · 2005-12-05 08:25 · Score: 1

You can see the list of projects this code is from here:
http://csourcesearch.net/package/

Hit Refresh by everphilski · 2005-12-05 05:44 · Score: 4, Informative

Just hit refresh and the webserver won't get the HTTP_REFERRER (granted you'll have to manually delete the text file he serves you)

-everphilski-

Re:Hit Refresh by sglane81 · 2005-12-05 06:03 · Score: 2, Funny

Actually, if you click refresh on a page from a link, it will resend the referrer as well. Most browsers do this. One more thing, you spelled HTTP_REFERRER correctly, which is wrong :) It's spelled HTTP_REFERER, only has one R. Reverse grammar nazi FTW?

--
This is the Internet. You can say "fuck" here. - AC
Re:Hit Refresh by FuzzyBad-Mofo · 2005-12-05 06:43 · Score: 1

You can also disable the Referer header in Mozilla-based browsers by a setting in about:config. I'll leave the merits of doing so to the reader.
Re:Hit Refresh by Anonymous Coward · 2005-12-05 09:12 · Score: 0

HTTP_REFERER, only has one R
Well three if you really want to be pedantic.
Re:Hit Refresh by RobertLTux · 2005-12-05 13:06 · Score: 1

or for extra Pedantic Points the second group of the character R is a single R (as are the other two groups) or the only character in the sequence that is doubled is the T (the second and third characters)

--
Any person using FTFY or editing my postings agrees to a US$50.00 charge

interesting stat by bsdluvr · 2005-12-05 05:45 · Score: 3, Funny

1) randomly select 2000 lines of code
2) compile
3) execute
4) ???????
5) PROFIT!

Re:interesting stat by gnud · 2005-12-05 06:12 · Score: 1

I think Step 4 should be "Rewrite into a ground-breaking application, that will for decades be known as the epitome of usefulness and good engineering. Sell to highest bidder."

What? Millions of code? by bogaboga · 2005-12-05 05:45 · Score: 0

As a programmer myself, though not that serious, the greatest number of lines of code I have written is 13,671 in a VB application processing costs for chemical analysis in a lab. This makes me wonder...How can one write over 200 million lines of code? How does one debug the beast? Believe me, even 1 million lines of code is a lot of code. How long does this thing take to compile? There are so many questions that just leave me to respect these programmers.

Re:What? Millions of code? by Anonymous Coward · 2005-12-05 05:48 · Score: 0

It's not all written by one guy.

In case you weren't being serious, "heh". +0.1 funny.
Re:What? Millions of code? by tgd · 2005-12-05 05:49 · Score: 4, Informative

It's a searchable database OF code from other products, containing 275 million lines you can search across.

It's not a searchable database written in 275 million lines of code.
Re:What? Millions of code? by Anonymous Coward · 2005-12-05 05:50 · Score: 0

This is a database of many multi-purpose code sections. This is not a single application.
Re:What? Millions of code? by masklinn · 2005-12-05 05:51 · Score: 1

Whoa, not only you didn't RTFA (well, that's slashdot so it's ok) but you didn't even read the headline?

--
"The way we can tell it's C# instead of Haskell is because it's nine lines instead of two." -- wadler
Re:What? Millions of code? by Anonymous Coward · 2005-12-05 05:52 · Score: 0

The main problem you have is that VB takes a lot less lines to code the same.
For example:
On VB you do: 'msgbox "I don't read the posts"'
On C++ you do:
main
(
int
argc
,
char
*
*
argv
)
{
etc..
etc..

I hope you get the picture.
Re:What? Millions of code? by Shimmer · 2005-12-05 06:01 · Score: 1

This project didn't write the 275 million lines of code, they collected code written by others.

--
The most rabid believers in American Exceptionalism are the exact same people whose policies are destroying it.
Re:What? Millions of code? by Quiet_Desperation · 2005-12-05 06:07 · Score: 1

That's why I moved on to higher level stuff like VB, or RealBasic now that VB has been sucked into the .Net singularity. I don't write 3D games or supercomputer simulations of galactic collisions. Most of what I write is toolware or interfaces to my own hardware designs- very GUI oriented stuff that needs to go from idea to working application in, like, one day. But I still get the "serious coders" asking "why aren't you doing that in C?" Or the message board trolls with "Dur! You couldn't write a FPS in RB! Dur!" Yeah, no shit, Sherlock.
At this point, I just laugh at them and put dirt in their hair.
Re:What? Millions of code? by Anonymous Coward · 2005-12-05 06:07 · Score: 0

I think you're confused.

This is not a single software product. It is a collection of many OSS projects whose contents have been indexed by a search engine. Hence the huge line count.
Re:What? Millions of code? by Anonymous Coward · 2005-12-05 06:22 · Score: 0

What kind of questions are you asking on message boards that gets people to make fun of you? By asking, you demonstrate that you don't in fact know it all. Maybe you should heed their advice. Stay away from toy languages like C, C++, Java, and Python. Use FoxPro.
Re:What? Millions of code? by Quiet_Desperation · 2005-12-05 07:55 · Score: 1

Wow. You really were looking for someone to dump on today, weren't you?
I meant trolls who go to RB message boards and troll about RB. I'm not asking any questions at all.
Toy languages? Huh? Every language has its place and use. That's all I say.
Re:What? Millions of code? by SamSim · 2005-12-05 11:49 · Score: 1

So it's a searchable database written in 275 million lines of database!

--
qntm.org

Three or Four? by sycodon · 2005-12-05 05:46 · Score: 1

So that's what...3 or 4 programs worth?

--
When Fascism comes to America, it will call itself Anti-Fascism, and tell you to give up your guns.

Woman by chris_mahan · 2005-12-05 05:46 · Score: 2, Funny

I'd like to know whether the word "woman" appears anywhere, and if so, in what projects.

Eh.

--

"Piter, too, is dead."

Unfortunately by aztektum · 2005-12-05 05:46 · Score: 1

All the code was just /.'ed into oblivion. Time to start from the beginning all over again. :(

--
:: aztek ::
No sig for you!!

Measurements I have made by derek_farn · 2005-12-05 05:47 · Score: 4, Insightful

Source code usage measurements contain many surprises (i.e., developers don't always write what people think they do). Some statistics I have collected, on a smaller code base, are available here. The source of the tools used to extract much of the data (at least for those tables and figures I produced) is available here (C only at the moment).

Being able to search so much source is also very useful. I was involved in a discussion a while back about the frequency of use of Bessel functions in programs (I claimed rare). The handful of uses returned from your database helped back up my argument (dare I say prove it).

Keep up the good work!

Re:Measurements I have made by Stealth+Potato · 2005-12-05 09:04 · Score: 1

You numbered the sections starting at negative five?

You, sir, are a geek among geeks. I salute you.
Re:Measurements I have made by sir99 · 2005-12-05 16:10 · Score: 1

Is there a reason why the caption of Figure 0.19 is mirrored, upside down, and has its period in the right margin? That's some right fancy formatting you've got in that document! (only half-joking)

--
The ocean parts and the meteors come down
Laid out in amber, baby.
Re:Measurements I have made by derek_farn · 2005-12-06 01:23 · Score: 1

Period in the right margin! Thanks for reporting the typo (it should be in the expected place).
The text (available via this page) asks readers to estimate the relative brightness of various squares. Not wanting people to get the answer by simply reading the caption (many people can read upside down writing relatively easily) I mirrored it as well as making it upside down (who said modern books were dumbing down for their readers ;-).
Re:Measurements I have made by sir99 · 2005-12-06 10:11 · Score: 1

At first I thought it was a dead pixel, can't have that! But I see your point, literally and figuratively ;). I had to use a mirror to read it myself. Even if one draws a thick line of constant color between the two squares, it still looks like the line is a smooth gradient between different intensities.

--
The ocean parts and the meteors come down
Laid out in amber, baby.

Sounds kind of like the PMD scoreboard... by tcopeland · 2005-12-05 05:48 · Score: 4, Interesting

...that is, a static analysis of a bunch of Java SourceForge projects. It does unused code and duplicate code detection... sometimes it finds some interesting things.

PMD home page is here, book site is here.

--
The Army reading list

Re:Sounds kind of like the PMD scoreboard... by gcooke · 2005-12-05 09:53 · Score: 1

There's a company called Headway Software, makes a tool called reView (...I think it has a different name now) that generates UML-like diagrams of source code annotated with a buttload of metrics. I'd love to see the reView tool run against this C/C++ database and the resulting trees stored for perusal.

Metrics are going to be a key adjunct to this database. All the metrics tools like reView, PMD, Parasoft's tools, etc. produce would be useful.

Even more useful would be pattern metrics -- CheckStyle and PMD can recognize some common design patterns and one of them (PMD?) has the ability to be extended with custom pattern recognizers. With a database this big, it should be feasible to produce proof substantiating the claim to bestness of just about any practice.
Re:Sounds kind of like the PMD scoreboard... by tcopeland · 2005-12-05 10:00 · Score: 1

Yup, both Checkstyle and PMD can be extended by writing custom rules. I think with Checkstyle you can use regexp's quite cleanly, while with PMD you can use XPath. There's some discussion of PMD XPath examples in this CodeSnipers interview.

--
The Army reading list

Statistics? by Anonymous Coward · 2005-12-05 05:51 · Score: 0

"I'm currently looking for suggestions on what sort of 'interesting statistics' I could create from 275+ million lines of open source C/C++ code."

The obvious statistic would be: how many of these are copyrighted by CSO?

cout "why bother" by micromuncher · 2005-12-05 05:51 · Score: 1

I'm curious, when people are looking for code, what do they do as a first resort? Maybe this should be a poll. Me, I'm a bit funny...
1) look in my library (books)
2) do a deja search
3) ask smarter people than me
4) do a web search (usually on specific sites)

--
/\/\icro/\/\uncher

Find all buffer overflows please by G4from128k · 2005-12-05 05:52 · Score: 1

I can only hope that this database has good metadata on which code fragments contain/don't contain various common species of exploits (buffer overflow, stack overflow, mal-formed input vulnerabilities, etc.). It would be nice to know which code fragments have all the needed input/size checking needed to be safe for exposure to the outside world and which are "for internal use only."

--
Two wrongs don't make a right, but three lefts do.

the obvious answer by BushCheney08 · 2005-12-05 05:52 · Score: 0, Flamebait

The most obvious statistic is "how many of these lines were stolen from SCO?"

--
Be a real patriot: Question authority. Think for yourself. Formulate your own conclusions.

Re:the obvious answer by iapetus · 2005-12-05 06:12 · Score: 1

275 million, but I'm not telling you which ones.

--
++ Say to Elrond "Hello.".
Elrond says "No.". Elrond gives you some lunch.

Interesting statistic... by programic · 2005-12-05 05:52 · Score: 1

How about the number of lines marked up with "TODO"?

--
-- yawn. --

Koders dot com by Anonymous Coward · 2005-12-05 05:53 · Score: 0

Searching 225,816,744 lines of code...

interesting stats .. by torpor · 2005-12-05 05:54 · Score: 1

.. how many times the same code appears with different function names (i.e. how plagued by NIH are you)?
.. how many times the same function_name() appears with different code?
.. how much of the code fails to compile?

--
; -- the corruption of government starts with its secrets. a truly free people keep no secrets. --

How about... by Hrvat · 2005-12-05 05:56 · Score: 1

What is the most common controlling variable name in a for loop?

--
TANSTAAFL

Most important search term by Boom11 · 2005-12-05 05:57 · Score: 0

The most important search term would be "functionality", ie. show me functions which do this or that.
Without the ability to find the needle, the big hay-stack you have collected will only give you huge bandwidth bills, and leave us with very little that cannot be found elsewhere.

Not working well -- TRY AGAIN LATER by putko · 2005-12-05 05:58 · Score: 1

It is hosed.

I tried searching. Here's what I got:

XML Parsing Error: junk after document element
Location: http://csourcesearch.net/performSearch.php?type=FunctionTypeReturned&search=(&ignoredRandomNumber=1133805159922.7798
Line Number 2, Column 1:
Warning: mysql_connect() [function.mysql-connect]: Can't connect to MySQL server on '127.0.0.1' (4) in /home/csourcesearch.net/include/php/GraphXML.php on line 309
^

--
http://www.thebricktestament.com/the_law/when_to_stone_your_children/dt21_18a.html

Please check for this: comma in brackets in C++ by Animats · 2005-12-05 05:58 · Score: 5, Interesting

C++, for historical reasons dating back to C, has weird semantics for commas in brackets. The operator precedence for commas is different inside of "()" and "[]".

So tab(i,j) is a function call with two arguments. But tab[i,j] is an invocation of the "comma operator", then a function call with one argument. The default "comma operator" ignores the first argument and returns the second. It once had some uses in C macros.

I've argued with the C++ committee about this. If "operator[]" had the same syntax as "operator()", we could have support for multidimensional arrays in C++. But there's a concern that somewhere, someone might have code that depends on the current semantics of the comma operator inside square brackets.

This new archive offers the opportunity to eliminate that possibility. So, do this search: Find, in non-comment standard C++ code, any occurrences of a comma operator within square brackets. Eliminate any where there are parentheses within the square brackets enclosing the comma. Can you find any? In any production code? In any open-source project? Anywhere?

Re:Please check for this: comma in brackets in C++ by Vorondil28 · 2005-12-05 06:17 · Score: 3, Insightful

I've argued with the C++ committee about this. If "operator[]" had the same syntax as "operator()", we could have support for multidimensional arrays in C++.

I'm no C++ expert, but isn't int array[row][col] a multidimensional array?

--
This sig rocks the casbah.
Re:Please check for this: comma in brackets in C++ by Animats · 2005-12-05 06:23 · Score: 1

I'm no C++ expert, but isn't int array[row][col] a multidimensional array?
No, it's an array of pointers to an array of elements, which is not quite the same thing.
Arrays with multiple subscripts have many uses. Sparse array implementations, for example. People implement this now with code that looks like
tab(i,j) = 1;
This is valid C++, and with the right overloads, it compiles and runs, but it looks weird.
Re:Please check for this: comma in brackets in C++ by chris+macura · 2005-12-05 06:29 · Score: 4, Informative

Yes, they are. But from an OOP standpoint, it's impossible to create a data structure that "knows" you're using the [] operator twice. So if you overload the [] operator in an array structure, to get multi-dimensional arrays you have to nest single-dimension arrays, which is almost always inefficient because the rows (or columns, depending on whether you're row major or column major) are lying around the RAM (depending on where they were allocated), rather than in a continuous chunk like with C. In other words, you can't do something like this in C++:

    class SmartArray {
    public:
        SmartArray(int height, int width);
        int operator[](const int &x, const int &y) const;
        // ...
    };
    ...
    SmartArray a(5, 5);
    a[12, 13];
Re:Please check for this: comma in brackets in C++ by milgr · 2005-12-05 06:38 · Score: 2, Informative

The grandparent got it correct. C does support multidimensional arrays. I suspect that C++ does too.
To validate, I pulled out my copy of K&R 2nd edition (Actually a copy I once rescued from a trash bin, and my copy is only "Based on Draft-Proposed ANSI C"). In section 5.9 Pointers vs. Multi-dimensional Arrays it points out,
Newcomers to C are sometimes confused about the difference between a two-dimensional array and an array of pointers, such as name in the example above. Given the definitions
int a[10][20]; int *b[10];
then a[3][4] and b[3][4] are both syntactically legal references to a single int. But a is a true two-dimensional array: 200 int-sized locations have been set aside, and the conventional rectangular subscript calculation 20xrow+col is used to find the element a[row,col]. For b, however the definition only allocates 10 pointers and does not initialize them; initialization must be done explicitly, either statically or with code.

--
Where law ends, tyranny begins -- William Pitt
Re:Please check for this: comma in brackets in C++ by Anonymous Coward · 2005-12-05 06:42 · Score: 0

It once had some uses in C macros.

What are you talking about? The main use of the , operator was and still is in for-loops. No wonder they ignored you.
Re:Please check for this: comma in brackets in C++ by Anonymous Coward · 2005-12-05 06:44 · Score: 0

But there's a concern that somewhere, someone might have code that depends on the current semantics of the comma operator inside square brackets.

I'd say go ahead and break it. People shouldn't do that. It's confusing and a good example of operator overloading at its worst.
Re:Please check for this: comma in brackets in C++ by Jesus+2.0 · 2005-12-05 06:44 · Score: 1

The default "comma operator" ignores the first argument and returns the second.

I don't believe that that's true.

I believe that it executes the first, executes the second, and returns the result of having executed the second.
Re:Please check for this: comma in brackets in C++ by unsinged+int · 2005-12-05 06:50 · Score: 1

You're correct, but that's not what the original post is saying. The only way to provide a sparse-matrix class in C++ is with member functions. You can't do it by overloading [] to accept two arguments, e.g. array[2,4]. You have to use a member function, making it look like array.get(2,4), or perhaps overloading () for array(2,4). There's no way to write a matrix class that uses square brackets for indexing more than one dimension.
Re:Please check for this: comma in brackets in C++ by tehshen · 2005-12-05 06:58 · Score: 1

The default "comma operator" ignores the first argument and returns the second.

Not quite, it executes all arguments, and returns the final argument. i++, j++; increments both i and j, and returns j+1.

--
Guy asked me for a quarter for a cup of coffee. So I bit him.
Re:Please check for this: comma in brackets in C++ by Animats · 2005-12-05 07:05 · Score: 1

Those are built-in C arrays. That mechanism is a C special case, and applies only to built-in arrays. It doesn't generalize well to C++ objects. Try to use <vector> to handle multidimensional arrays and you'll see what I mean. You can declare
vector<vector<float> >
but you'll get an array of arrays, not a 2D array.
Re:Please check for this: comma in brackets in C++ by Articuno · 2005-12-05 07:07 · Score: 1

Not quite, j++ changes j to j+1 but returns the old value, not the new one (just being -Wall --pedantic :-)

--
So Long and Thanks for All the Fish!
Re:Please check for this: comma in brackets in C++ by torokun · 2005-12-05 07:09 · Score: 1

why can't you just overload the comma operator, build up the arguments into a single datatype, and pass it to the indexing function? This kind of stuff is done a lot for fast matrix multiplication and similar things.
Re:Please check for this: comma in brackets in C++ by The+boojum · 2005-12-05 07:19 · Score: 3, Interesting

I was just going to point this out. I even hacked up a simple example to show it:
struct location {
    int dimension, coordinates[ 20 ];
    location( int first_coordinate ) : dimension( 1 ) { coordinates[ 0 ] = first_coordinate; }
    location &operator,( int const right ) { coordinates[ dimension++ ] = right; return *this; }
};
struct array {
    int matrix[ 100 ][ 100 ];
    int &operator[]( location const &right ) { return matrix[ right.coordinates[ 1 ] ][ right.coordinates[ 0 ] ]; }
};
int main( int argc, char **argv ) {
    array blah;
    blah[ 5, 5 ] = 10;
}
Proof of concept and it doesn't really do anything, but it compiles just fine. I don't see a problem here. A real implementation would probably do some clever stuff so that the optimizer can optimize away the intermediate data structure.
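One caveat: an overloaded operator, is only considered when at least one operand has class (or enum) type. In blah[ 5, 5 ] both operands are plain ints, so the built-in comma runs and only a single 5 reaches the location constructor. A hedged variant of the same idea, with the first coordinate wrapped explicitly, might look like the sketch below. The names `Index` and `Matrix` are invented for illustration, and the extra parentheses inside the brackets are there because newer C++ standards treat a bare comma inside [] differently.

```cpp
#include <cassert>

// Hypothetical index type: collects up to two coordinates via operator,.
struct Index {
    int count;
    int coord[2];
    explicit Index(int first) : count(1) { coord[0] = first; coord[1] = 0; }
    // Index on the left, int on the right: now the overload is eligible.
    Index& operator,(int next) {
        assert(count < 2);
        coord[count++] = next;
        return *this;
    }
};

// Hypothetical fixed-size matrix indexed by a fully built Index.
struct Matrix {
    int cell[4][4];
    Matrix() : cell() {}  // value-initialize every cell to zero
    int& operator[](const Index& idx) {
        assert(idx.count == 2);
        return cell[idx.coord[0]][idx.coord[1]];
    }
};
```

Usage would be `m[(Index(1), 2)] = 7;`, which stores 7 at row 1, column 2. It works, but it is arguably no prettier than m(1, 2).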
Re:Please check for this: comma in brackets in C++ by spongman · 2005-12-05 07:21 · Score: 1

No, it's an array of pointers to an array of elements
This is the default definition of operator [], but you could easily overload it to do something else.
for example, you could easily write a set of templates such that array[row][col]=3 would result in array.set(row,col,3) and still maintain type-safety and efficiency.
Re:Please check for this: comma in brackets in C++ by NoOneInParticular · 2005-12-05 07:26 · Score: 1

Bollocks:
class SmartArray {
    class SmartArrayProxy {
        SmartArray& data;
        int row;
        double operator[](int col); /* now we have row and col and can do our smart stuff */
    };
    SmartArrayProxy operator[](int idx);
};
Re:Please check for this: comma in brackets in C++ by NoOneInParticular · 2005-12-05 07:31 · Score: 1

Not true, check my previous post. With that, you can simply use a[i][j] as usual, and have your sparse operations as if it was a single function.
Re:Please check for this: comma in brackets in C++ by hikerhat · 2005-12-05 07:40 · Score: 2, Funny

Well, the obscureness of the comma operator is used by C++ recruiters who think they are really "clever", and in "clever" C/C++ puzzles on usenet. If you took it away, how would you hire C++ programmers and how would you have fun on usenet?
Also, C++ programmers are getting really old, and they don't handle change very well.
Re:Please check for this: comma in brackets in C++ by Old+Wolf · 2005-12-05 08:09 · Score: 3, Insightful

You can do exactly that -- just write a(12,13) instead of a[12,13].
This is a great counterexample to the GP. Changing the meaning
of the comma within square brackets would gain NOTHING and would
mean every existing compiler is now wrong.

The existing C array type is bad enough as it is, why make it
even more unwieldy by introducing a new variant? C++ is already
on the right track: discourage C arrays, and encourage container
classes that have things like bounds checking and automatic
memory allocation.
Re:Please check for this: comma in brackets in C++ by demo · 2005-12-05 08:57 · Score: 1

Well, the point is to use the same indexing mechanism with a multidimensional (and/or sparse) array.

When dealing with vectors and other containers, "operator[]" provides a clear syntax for access.

Of course, it is mostly syntactic sugar, but it is _nice_ syntactic sugar.

--
---
Re:Please check for this: comma in brackets in C++ by tehshen · 2005-12-05 09:50 · Score: 1

I meant that. You know I meant that.

--
Guy asked me for a quarter for a cup of coffee. So I bit him.
Re:Please check for this: comma in brackets in C++ by chris+macura · 2005-12-05 10:07 · Score: 2, Insightful

That's the whole point of the complaint. Inconsistency between [] and ().
Re:Please check for this: comma in brackets in C++ by Geoffreyerffoeg · 2005-12-05 10:27 · Score: 1

const int &x, const int &y

This is wasteful. You're not avoiding copying, you're copying a pointer. On most systems an int and a pointer are the same size. On all systems the size difference is so small that you should start saving space by making it a short. Besides, directly passing by value gives you a local copy to mess with (for stuff like while (*a++)).

Anyway, back on topic. You've always been able to do multidimensional array syntax in C++ classes:

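class SmartArray {
public:
    int getElement(int row, int col);

    class SmartArrayHelper {
    private:
        int row;
        SmartArray* array;
        // Private constructor: only SmartArray (a friend) may create helpers.
        SmartArrayHelper(int r, SmartArray* a) : row(r), array(a) {}
    public:
        int operator[](int col) {
            return array->getElement(row, col);
        }
        friend class SmartArray;
    };

    SmartArrayHelper operator[](int row) {
        // The helper has private members, so it cannot be brace-initialized
        // as an aggregate; go through the constructor instead.
        return SmartArrayHelper(row, this);
    }
private: ...
};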

This follows C multidimensional array semantics: if a is the array, a[x][y] is the element at (x, y), and a[x] is a valid object (whose type you can ignore, e.g., by templates) such that applying [y] to it yields the right element. a[x,y] still returns a[y], as it always did ("this is a feature, not a bug").

If you're picky about the commas, try this:

#define [ +brackets(
#define ] )
struct brackets {
int row, col;
brackets(int r, int c) {row=r; col=c;}
};

template <class T>
T operator+(T array[][], brackets b) {
return array[b.row][b.col];
}

I haven't tested the #defines, but they should work. (This turns a[x, y] into a+brackets(x, y).) Overload operator+ for all classes you like, and overload the brackets constructor if you want other than two dimensions (and keep track of the dimensions, and throw stuff if the dimensions are wrong).
Re:Please check for this: comma in brackets in C++ by Articuno · 2005-12-05 10:35 · Score: 1

Yeah, I knew (i think... :-)
But I couldn't resist to repeat the "pattern" of your post ^_^;;;

--
So Long and Thanks for All the Fish!
Re:Please check for this: comma in brackets in C++ by Anonymous Coward · 2005-12-05 11:46 · Score: 0

You could do something like this:

double **a= new double * [n];
a[0]=new double[m*n];
for(int i=1;i<n;i++)
a[i]=a[i-1]+m;

But you still have to write
a[i][j];
instead of
a[i,j];

And you have to free the memory at the end or in destructor:
delete [] a[0];
delete [] a;
Re:Please check for this: comma in brackets in C++ by Anonymous Coward · 2005-12-05 14:49 · Score: 0

I doubt you've looked very hard or argued with the C++ committee (unless you count trolling comp.std.c++) because here is an obvious counterexample:

http://www.boost.org/doc/html/lambda/le_in_details.html#lambda.lambda_expressions_for_control_structures
Re:Please check for this: comma in brackets in C++ by Anonymous Coward · 2005-12-05 14:57 · Score: 0

If "operator[]" had the same syntax as "operator()", we could have support for multidimensional arrays in C++.

While, as others have pointed out, there is a perfectly functional workaround for this (and of course the native multidimensional arrays), this would be very cool. While the workaround works, it's really ugly and annoying.

Find, in non-comment standard C++ code, any occurrences of a comma operator within square brackets. Eliminate any where there are parentheses within the square brackets enclosing the comma.

Not quite enough:
#define F(X) X,3
#define G(X,Y) (X[Y])
G(a,F(b));
Of course, F() is a very evil macro.
Re:Please check for this: comma in brackets in C++ by ozzee · 2005-12-05 16:43 · Score: 1

You don't even need a proxy for this. class SmartArray { double * operator[](int idx); };
Re:Please check for this: comma in brackets in C++ by Anonymous Coward · 2005-12-05 17:27 · Score: 0

In C# that's referred to as a jagged array. I like the term.
Re:Please check for this: comma in brackets in C++ by NoOneInParticular · 2005-12-06 02:35 · Score: 1

Yours will work fine when you are dealing with a simple matrix stored in contiguous memory. The proxy is necessary if you want to cater for triangular matrices, diagonal matrices, and/or sparse matrices, and only want to store the (possibly) non-zero numbers.
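As a rough sketch of why the proxy matters for sparse storage: the class below keeps only non-zero cells in a std::map, and a[i][j] still works for both reading and writing. Every name here (`SparseMatrix`, `Row`, `Cell`, `stored`) is hypothetical, and bounds checking and const access are left out for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>

// Hypothetical sparse matrix: only non-zero cells are kept in the map.
class SparseMatrix {
    typedef std::pair<int, int> Key;
    std::map<Key, double> cells_;

public:
    // Proxy for one (row, col) cell: reads and writes go through get/set.
    class Cell {
        SparseMatrix& m_;
        int row_, col_;
    public:
        Cell(SparseMatrix& m, int r, int c) : m_(m), row_(r), col_(c) {}
        operator double() const { return m_.get(row_, col_); }
        Cell& operator=(double v) { m_.set(row_, col_, v); return *this; }
    };

    // Proxy for one row: remembers the row until the column arrives.
    class Row {
        SparseMatrix& m_;
        int row_;
    public:
        Row(SparseMatrix& m, int r) : m_(m), row_(r) {}
        Cell operator[](int col) { return Cell(m_, row_, col); }
    };

    Row operator[](int row) { return Row(*this, row); }

    // Missing cells read as zero without being inserted.
    double get(int r, int c) const {
        std::map<Key, double>::const_iterator it = cells_.find(Key(r, c));
        return it == cells_.end() ? 0.0 : it->second;
    }

    // Writing zero erases the cell so storage stays sparse.
    void set(int r, int c, double v) {
        if (v == 0.0) cells_.erase(Key(r, c));
        else cells_[Key(r, c)] = v;
    }

    std::size_t stored() const { return cells_.size(); }
};
```

A plain `double*` return cannot do this, because there is no contiguous row of doubles to point into. The proxy defers the lookup until both indices are known.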
Re:Please check for this: comma in brackets in C++ by Old+Wolf · 2005-12-06 10:40 · Score: 1

That's the whole point of the complaint. Inconsistency between [] and ().

Well, a comma is the comma operator everywhere, except when it's a separator in a list (specifically: argument-list, formal-parameter-list, initializer-list).

Perhaps it would have been better originally to use a different symbol for the comma operator from the list separator. But it's too late now, as both usages are very widespread.

Currently there's no such thing as an "array index list", because C++ does not have true multi-dimensional arrays. If we were to add the index-list syntax I think it would just add to the confusion (most casual programmers already have enough trouble with the interaction between arrays and pointers as it is, without adding complication).

Note: Although the OP was suggesting the array index list only in the context of UDT operator[] functions. But it follows from this that the syntax would also be usable with actual arrays, otherwise we would need context-sensitivity to parse x[a,b] and IMHO that would be a big mistake.
Re:Please check for this: comma in brackets in C++ by Old+Wolf · 2005-12-06 10:54 · Score: 1

Sorry, something I forgot to mention in my initial post:

But from an OOP standpoint, it's impossible to create a datastructure that "knows" you're using the [] operator twice. So if you overload the [] operator in an array structure, to get multi-dimensional arrays, you have to nest single dimensions arrays, which is almost always inefficient

This is crap. Here's a quick example (without bounds-checking, exception-safety, or const-correctness, for the sake of clearly illustrating my point):

#include <iostream>

struct SmartArray
{
SmartArray(int rows, int cols)
: rows(rows), cols(cols), mem( new int[rows * cols] ) {}
~SmartArray() { delete [] mem; }

int &operator()(int y, int x) { return mem[y * cols + x]; }
int *operator[](int y) { return &mem[y * cols]; }

private:
int rows, cols, *mem;
};

int main()
{
SmartArray sa(2, 4);
sa(1, 2) = 12;
std::cout << "sa[1,2] is " << sa[1][2] << std::endl;
}

You can even generalise this to any number of dimensions by
using a template with the number of dimensions as a parameter,
and a helper class instead of "int *" as the return type of
operator[] .

Yet another source code search engine? by Anonymous Coward · 2005-12-05 05:58 · Score: 1, Insightful

Source code search engines have been extremely helpful for me. I prefer www.koders.com, but there are quite a few other decent ones out there. What does this engine have to offer that the others don't? It seems like this one doesn't index code repositories but only indexes files local to the server. Neither does it allow you to click on words in the code and search for them. I also sorely miss bookmark-friendly URLs and free text queries. On the positive side, I note that your search engine is totally free from ads! Very nice! Although I wouldn't mind having to look at a few ads (which I might even click on) because running a search engine is expensive and a good source code search engine is a very useful service. I sincerely hope that we will see some upgrades of the site.

Re:Yet another source code search engine? by Sembiance · 2005-12-05 08:36 · Score: 2, Interesting

I just did it for fun, and hopefully some people might get some use out of it.

This engine understands the code at a C/C++ syntax level, unlike koders.com, so you can better search for what you're after (comments, functions, macros, classes, etc).

Also this engine DOES allow you to click on words in the code, but only includes and function or macro calls.

There are several things that are not that great about my site, it's a little slow, doesn't support free text searching nor variable searching, and you can't copy search URL's for pasting (uses XMLHttp and form POST's).

But it's just me doing this thing, and I have limited time and most importantly limited money/hardware.

My wish is for google to do their own but index a LOT more code and have it be fast and friendly :)

They certainly have the resources to do it and would be a great tool for coders to use. Maybe this will help fill a gap in the mean time :)

best_idea_ever by l33t-gu3lph1t3 · 2005-12-05 05:58 · Score: 3, Insightful

charge for a premium service that allows Computer Science and Software Engineering profs to perform a somewhat intelligent search of the code to see just how much of their students' code is lifted off the 'net ;)

--
------- "From bored to fanboy in 3.8 asian girls" ----------

Re:best_idea_ever by Philodoxx · 2005-12-07 10:12 · Score: 1

Why would you need to compare it against a database? There are already free programs that can do a pretty good job of comparing one submission against all the others. If a cheater in a large class copied code from the net, there's a good chance that somebody in the class copied it from the same resource.

Plus schools are cash strapped enough as it is, they don't need something else draining their coffers.

--
Oh, a lesson in history from Mr. I'm my own grandpa.

Statistical artificial program by yttrium · 2005-12-05 06:00 · Score: 1

Use statistics to construct what an "average" program looks like, and see what it does. :)

Function by ninthwave · 2005-12-05 06:00 · Score: 1

Compare functions looking for library routines that need to be created.
Look for common code structures that are not in libraries to create more libraries.

More libraries.

--
I was thinking of the immortal words of Socrates, who said: "I drank what?" - Chris Knight (Val Kilmer)- Real Genius

Search for this bug by ibpooks · 2005-12-05 06:00 · Score: 1

if( something = something ) ...

Re:Search for this bug by maxwell+demon · 2005-12-05 06:41 · Score: 0

Well, given that assigning something to itself is a no-op (unless it's a volatile or a user-defined type with an overloaded assignment operator with non-standard semantics, and unless something is an uninitialized variable causing undefined behaviour on read), this is just a verbose way to write "if (something) ...", while "if (something == something)" would (under equivalent conditions) just be a complicated way to write "if (true) ..."
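For the more usual form of the bug, where the two sides differ, a minimal sketch of what the compiler actually does with "=" inside a condition (function names are made up for illustration, plain ints assumed):

```cpp
#include <cassert>

// if (x = y): assigns y to x, then tests the assigned value.
// The doubled parentheses only tell the compiler this is intentional.
inline bool assign_then_test(int& x, int y) {
    if ((x = y)) return true;
    return false;
}

// if (x == y): compares and leaves both operands untouched.
inline bool compare(int x, int y) {
    return x == y;
}
```

So `assign_then_test(x, 0)` is false and overwrites x with 0, whatever x held before, which is why a search for this pattern would be a decent bug finder.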

--
The Tao of math: The numbers you can count are not the real numbers.

Does it compile? by Anonymous Coward · 2005-12-05 06:00 · Score: 0

I wonder if the entire code-base compiles... and if so, what comes out? Windows Vista, or some Linux/BSD merge?

See also: Codase.com by kriegsman · 2005-12-05 06:00 · Score: 2, Informative

See also Codase.com, another "Source Code Search Engine", which lets you search by method names, class names, variable names, free text, etc..

-Mark

interesting statistics by Ylleks · 2005-12-05 06:00 · Score: 0

"I'm currently looking for suggestions on what sort of 'interesting statistics' I could create from 275+ million lines of open source C/C++ code."

That one's easy. Just tell us how many bugs are hidden in the code, and give us a code/bug ratio.

Koders.com by knipknap · 2005-12-05 06:01 · Score: 2, Informative

Don't know, koders.com supports a lot more languages and also lets you narrow your search to specific licenses. The few extra lines of code just don't seem to do it, especially because such measures highly depend on the chosen method.

grep++ by Doc+Ruby · 2005-12-05 06:03 · Score: 1

I'm surprised that Perl's CPAN archive doesn't have structured searching at smaller granularity than module name or freeform metadata. Maybe once the archives let us find code by content, we'll get version control databases that store each line in a record, each block as references in a separate table, maybe even referential integrity of variables as foreign keys. I'd love my editor to pull code from DB storage, padding whitespace only in the presentation layer per my preferences.

I'd really love to see datamining techniques for factoring, optimizing and profiling code. Not to mention the enforcement efficiencies for source license "due diligence" comparisons beyond grep. It's bizarre that programs are still so united with a hierarchical directory filesystem that scopes are enforced per-file, while class scopes have only lexical (not purely structural or referential) implementation. Relational math is rigorous enough that its direct combination with a compiler ought to produce even more revolutions than it would with an editor.

--
make install -not war

Markov Chains. by GuruBuckaroo · 2005-12-05 06:03 · Score: 1

Run the whole shebang through a Markov Chain analyzer, then have it generate some new code. Hell, ought to work as well as anything else put out these days...

--
Poor means hoping the toothache goes away.

Can it tell... by pulse2600 · 2005-12-05 06:05 · Score: 1

...How many libraries of Congress would all this code occupy?

Interesting statistics by thisisauniqueid · 2005-12-05 06:05 · Score: 1

You could calculate the percentage overlap between the 275 million lines of code and SCO's source code. For additional interest, you could plot that percentage as a function of time. You should see it go up right before every major new SCO filing.

Re:Are you proud of 275 million lines of code? by Anonymous Coward · 2005-12-05 06:06 · Score: 0

He said 275 million lines of searchable code... not the length of his search program. Maybe you should read the post instead...

How about a potential buffer overflow index? by raddan · 2005-12-05 06:07 · Score: 4, Informative

You can start by seeing how often people use gets(), strcpy(), strcat(), etc... Look for all the fun little common mistakes that people make.
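For contrast, a sketch of the kind of bounded copy that an overflow index would hope to find in place of strcpy(). The helper name `bounded_copy` is hypothetical, not a standard function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Copies src into dst, which holds cap bytes (cap > 0), and always
// NUL-terminates. Returns true if the whole string fit, false if it
// had to be truncated. Unlike strcpy(), it never writes past dst + cap.
inline bool bounded_copy(char* dst, std::size_t cap, const char* src) {
    assert(dst != 0 && src != 0 && cap > 0);
    std::size_t len = std::strlen(src);
    std::size_t n = len < cap ? len : cap - 1;
    std::memcpy(dst, src, n);
    dst[n] = '\0';
    return len < cap;
}
```

The return value makes truncation visible to the caller, which is the part that unchecked strcpy() and strcat() calls leave out.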

Re:How about a potential buffer overflow index? by Anonymous Coward · 2005-12-05 07:27 · Score: 0

I use strcat all over the place. I know the length of the strings. No really, I do, it's a closed system.

stats we'd like to see... by digitaldc · 2005-12-05 06:08 · Score: 4, Funny

-# of non-numerical constants
-# of ( ),{ },\ /,#,; characters in code
-time spent debugging/compiling
-total hours spent in production
-gallons of coffee consumed
-hours of daylight seen
-# of relationships destroyed

--
He who knows best knows how little he knows. - Thomas Jefferson

Re:stats we'd like to see... by dascandy · 2005-12-05 09:08 · Score: 1

Non-numerical numerical constants:

0xCAFE
0xBABE
0xDEAD
0xC0DE
0xBEEF
0xFADE
0xDEAF
0xBEAD

Code Styles by ionrock · 2005-12-05 06:09 · Score: 5, Interesting

I would love to see if different code styles could be analyzed to see how many people use what sort of syntax style. There is camelCase and under_scores but it seems possible to find more complicated trends that might allow reviews to statistically determine what practices really help to make code better.

A reality check by Monte · 2005-12-05 06:09 · Score: 1

As an old Big Iron grognard asked me many years ago...

"What prints your paycheck?"

Recycling code by ZachPruckowski · 2005-12-05 06:09 · Score: 1

How much of this open-source code DB is reusable? Are most of the lines things that have limited applications, or are most of them more general? I mean, if you have 275 million lines, but 175 million lines are code designed to solve one specific problem and can't be easily cross-applied, then it isn't as useful as the statement implies.

That said, congrats on the milestone, and looking forward to hearing of more!

Need to watch those stats by Quiet_Desperation · 2005-12-05 06:09 · Score: 2, Funny

For example, "Lines of code" / "Lines of commenting" will always produce "Inf"

Evolution data server and courier imap by llamalicious · 2005-12-05 06:10 · Score: 1

I think the developers of eds are mad at courier

Search: function names containing shit
type: void
name: courier_imap_is_a_piece_of_shit
line: 17
file: evolution-data-server-1.2.3/camel/providers/imap4/camel-imap4-summary.c

Re:Evolution data server and courier imap by kalislashdot · 2005-12-05 06:43 · Score: 1

I love Courier, what else is there? UW? The maildir format is pretty awesome.

Re:cout "why bother" by Vorondil28 · 2005-12-05 06:13 · Score: 1

You forgot: "5) Ask CowboyNeal".

:)

--
This sig rocks the casbah.

histogram of C reserved words by jab · 2005-12-05 06:14 · Score: 5, Interesting

I'd love to see how one of my programs (stats below) compares to the, uh, national average.

1222 if
638 return
482 static
413 for
399 int
217 const
201 else
194 void
128 char
115 case
112 break
55 default
43 sizeof
37 do
35 switch
27 enum
24 struct
23 while
15 float
14 typedef
10 auto
7 unsigned
6 extern
1 long
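A histogram like this can be approximated with a very naive word counter. The sketch below is only an illustration (the function name and the short keyword list are assumptions), and it does not skip comments or string literals, so keywords mentioned there are counted too.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <set>
#include <string>

// Counts how often a handful of C keywords appear in `source`.
// Naive on purpose: it splits on anything that is not a letter, digit
// or underscore, and does not understand comments or string literals.
inline std::map<std::string, int> keyword_histogram(const std::string& source) {
    static const char* const kw[] = {
        "if", "else", "for", "while", "do", "switch", "case", "default",
        "return", "break", "int", "char", "void", "goto"
    };
    const std::set<std::string> keywords(kw, kw + sizeof kw / sizeof kw[0]);

    std::map<std::string, int> counts;
    std::string word;
    // Run one step past the end so the final word gets flushed.
    for (std::size_t i = 0; i <= source.size(); ++i) {
        const char c = i < source.size() ? source[i] : ' ';
        if (std::isalnum(static_cast<unsigned char>(c)) || c == '_') {
            word += c;
        } else {
            if (!word.empty() && keywords.count(word) != 0) ++counts[word];
            word.clear();
        }
    }
    return counts;
}
```

Because comments are not stripped, a line such as "// default" bumps the count for default, which is one way a program could end up with more default than switch.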

Re:histogram of C reserved words by maxwell+demon · 2005-12-05 06:47 · Score: 1

35 switch, but 55 default? Do you have switches with more than one default case, or did I miss another use of that keyword?

--
The Tao of math: The numbers you can count are not the real numbers.
Re:histogram of C reserved words by AnalystX · 2005-12-05 07:04 · Score: 1

The poster can answer this for certain, but I wonder if his histogram doesn't include reserved words found in comments. It seems to me that the word "default" would be found a lot in comments to describe the default value being assigned to a variable.
Re:histogram of C reserved words by plabtfall · 2005-12-05 07:25 · Score: 5, Funny

Yeah, me too: 2431 int 1802 goto
Re:histogram of C reserved words by Anonymous Coward · 2005-12-05 07:48 · Score: 0

It's odd that you have more 'do's than 'while's no? I would think that you would have at least as many 'while's as 'do's, and usually then some.

was it /.ed by duke12aw · 2005-12-05 06:17 · Score: 1

was the server /.ed or does he just need a new one?

--
As an american High School student, I'd like to officially apologize for my generation.

Re:was it /.ed by Sembiance · 2005-12-05 08:42 · Score: 1

Before the slashdot article I was getting about 4 or 5 visitors an hour.

Since it's been slashdotted my server is maxxed serving 125 requests at a time, according to server-status about 10 requests/second.

I imagine the traffic will die down in a few days, and settle into something more sane, hopefully something my poor little celeron server can handle :)

I have to ask by Anonymous Coward · 2005-12-05 06:18 · Score: 0

Why be an a**hole? The guy is wanting to offer a nice product, but can not afford the bandwidth hit of a /.. Now, you try to bypass his request (but you were wrong). I mean, why not instead, do a coral link or a google cache link like he asks? After all, he is providing useful code.

or "// FIXME" by StandardDeviant · 2005-12-05 06:20 · Score: 4, Funny

(subject says it all ;))

--

News for Geeks in Austin, TX

Statistics to search for by Anonymous Coward · 2005-12-05 06:20 · Score: 0

How about searching for functions that don't check their parameters for overflow?

Finally! by Locke2005 · 2005-12-05 06:21 · Score: 1

Now SCO will finally be able to find all the code that was stolen from them!

--
I've abandoned my search for truth; now I'm just looking for some useful delusions.

Comments by Daedala · 2005-12-05 06:24 · Score: 1

I'm dying to know... What percentage of the code is commentary?

And are there any haiku?

--
What I say does not represent the views of my employers, my friends, my cats, or myself.

Or similarities between different projects by Jamie+Lokier · 2005-12-05 06:24 · Score: 1

Including those with incompatible licenses.

Related: having found similar code sections, follow trends in them over time. Find where two programs copied the same code, but one has failed to implement what might be a bug fix or improvement in another, by looking at changes to the code over time.

"Nobody will see this" by Short+Circuit · 2005-12-05 06:26 · Score: 1

I'd like to see a comment search for "Nobody will see this".

Unfortunately, the site's running so slow, I guess nobody will.

--
tasks(723) drafts(105) languages(484) examples(29106)

The stat I want to see: by Samedi1971 · 2005-12-05 06:27 · Score: 1

How many GOTOs are in there?

I actually came across one recently. It was a real surprise considering the rest of the code was decently written. And it was pretty simple to remove.

Re:The stat I want to see: by pclminion · 2005-12-06 06:15 · Score: 1

I actually came across one recently. It was a real surprise considering the rest of the code was decently written.
Perhaps this implies that the use of goto in that instance was justified. If somebody is skilled and produced quality code, maybe your FIRST choice should not be to second-guess them. Let me guess, you had to introduce a redundant test or extra variable to remove the goto? Is the resulting code REALLY any clearer than it previously was?

Is there a good bubble sort in there somewhere? by jbx · 2005-12-05 06:29 · Score: 1

... because every time I profile my code, it seems I end up spending a lot of time in my bubble sorts. In all that code, surely someone took the time to write a really fast bubble sort, right?

--
(sig) The last bug isn't fixed until the last user is dead. (/sig)

the known answer by __aaitqo8496 · 2005-12-05 06:29 · Score: 1

select count(*) from sourcecode where comments > 0
0 row(s) returned

plagiarism at its finest

mod -1 lame

Caching Mechanism by hgfischer · 2005-12-05 06:30 · Score: 1

Implement a caching mechanism for the more used search results. This cache would be invalidated when you add more code.

Show a few lines in the search results after and before the searched text.

Is the search case-sensitive or not? Maybe just adding an option like this on the search results can be helpful (e.g. include names on the Windose platform are case-insensitive).

Implement statistics for:
- most/less used function name
- most used word in comments
- most used dirty word

Count the lines of "dense" code. I mean do not include empty lines and lines that only have comments or just an opening/closing brace.

don't complain about getting ./ed... by Anonymous Coward · 2005-12-05 06:31 · Score: 1, Insightful

... like this guy's site is right now, when YOU submitted the story about your site! If you're not prepared for slashdot traffic, don't submit the story.

Don't mess around, learn from NLP folks by Xofer+D · 2005-12-05 06:37 · Score: 4, Insightful

This is a good opportunity to build complex statistics about the C++ grammar actually used in context. Learn from the NLP people! Parse the whole thing, and start finding common subtrees in the grammar used. Look at common lexical entries between subtrees, so we can make a tool that can help recognize errors by comparing against commonly used C++ grammar fragments. Or do function completion based on what kind of function you look like you're writing. See if you can do alignment with similar languages and do statistical source translation. If you keep information about comments used (and maybe apply some real NLP), you might even have a shot at automatically classifying functions based on their form, and documenting them with simple comments.

If that's too hard, try finding all n-grams instead, at least under some length. That's a lot more useful than just individual tokens or strings.
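A minimal n-gram counter might look like the sketch below. It assumes the code has already been split into whitespace-separated tokens, which a real tool would do with a proper C++ lexer, and the function name is made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Counts every run of n consecutive tokens in `text`.
// Tokens are whitespace-separated; each n-gram is keyed by its tokens
// joined with single spaces.
inline std::map<std::string, int> count_ngrams(const std::string& text,
                                               std::size_t n) {
    std::istringstream in(text);
    std::vector<std::string> tokens;
    std::string tok;
    while (in >> tok) tokens.push_back(tok);

    std::map<std::string, int> counts;
    if (n == 0 || tokens.size() < n) return counts;

    for (std::size_t i = 0; i + n <= tokens.size(); ++i) {
        std::string gram = tokens[i];
        for (std::size_t k = 1; k < n; ++k) gram += " " + tokens[i + k];
        ++counts[gram];
    }
    return counts;
}
```

On the token stream of a loop header, "i = 0 ; i < n ; i ++", the bigram "; i" shows up twice and every other bigram once.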

With a lot of data, you can do very cool things. Don't mess around with string frequency counting. C++ is simple compared to English, do something interesting.

--
The Signal/Noise ratio can be improved in two ways. Remaining silent is the OTHER way.

Re:Don't mess around, learn from NLP folks by Anonymous Coward · 2005-12-05 15:23 · Score: 0

Yeah, and then sometime next year, the following will happen:
#include <stdio.h>
int main (int argc,
Suddenly a dialog pops up. It's the paperclip.

@: It looks like you're writing a "Hello, world" program in the C language. Would you like me to finish the program and compile it for you?
Me: Sure!
@: Tada. Hello, world!
Me: Wow. That rocks. Can you do the halting problem too?
@: Are you trying to make me go away? o_O
Me: Ooops. You caught me...
Re:Don't mess around, learn from NLP folks by justins · 2005-12-06 02:06 · Score: 1

If that's too hard, try finding all n-grams [wikipedia.org] instead

Tom Cruise can help!

--
Now before I get modded down, I be to remind whoever might read this that what I am saying is FACT. - bogaboga

This rocks by Borg_5x8 · 2005-12-05 06:39 · Score: 1

Less time wasted on google searching down the examples I need to check.. someone is my new hero.

Find the most common :) by brys · 2005-12-05 06:40 · Score: 1

Find most common functions so they can be moved to the kernel ;))))

That's easy: search for known security holes by CodeShark · 2005-12-05 06:40 · Score: 1

that permit things like buffer overflows, etc.

Though I don't develop much in C++ currently, and haven't had the time to do anything Linux wise in years, I would love to have an identified location for security-bug free algorithms, etc. that I could use if I need to do more C++ work in the future.

--
...Open Source isn't the only answer -- but it's almost always a better value than the alternatives...

GPL by ramrom · 2005-12-05 06:43 · Score: 1

The Code copied from here should be GPL ed. Damn the viral nature

There is boost? by Cyberax · 2005-12-05 06:45 · Score: 1

This index doesn't even contain Boost (http://www.boost.org/) and Loki libraries!

It can't be called 'comprehensive' after that...

Re:There is boost? by vawjr · 2005-12-05 08:51 · Score: 1

more likely it's an indication of how few people actually know about boost (a recent topic of conversation on the boost EMail echos).

How about by Anonymous Coward · 2005-12-05 06:46 · Score: 0

number of lines that contain both "should" and "probably"

penis by ToddFFW · 2005-12-05 06:47 · Score: 0

only one reference to penis in 275 million lines of code??? whew thank god they didn't index my code.

Percentage... by Anonymous Coward · 2005-12-05 06:48 · Score: 0

... of lines claimed by SCO.

...barely the lobby. by C10H14N2 · 2005-12-05 06:49 · Score: 1

Consider, a page is 45 lines, an average book is 350 pages @ about 2" thick (ergo, about 15-16k books), a stack is roughly 12' wide by 6 shelves, double sided (864 books) and a row is about six stacks long (72' / 5,184 books). So, in a compactus, about 432sq/ft, to the 2,100,000sq/ft of the Madison building alone. The total linear capacity is 540 miles. Using the above assumptions, that's about 205 million books, so if printed, this repository would take up roughly 1/13,000th of the space. Imagine if your house is 2500sq/ft, the equivalent displacement would be a five-inch square. Would you even notice?

Gawd, I'm bored.

Cyclomatic complexity... by xquark · 2005-12-05 06:51 · Score: 1

would be a nice feature to have, both average and per project/module basis.

--
Arash Partow's Philosophy: Be a person who knows what they don't know, and not a person who doesn't know.

TODOs by mrshoe · 2005-12-05 06:52 · Score: 2, Interesting

Counting the number of "TODO"s and "XXX"s in "production" open source code could be interesting.

--
There are two types of people in this world: those that categorize other people and those that don't.

Statistical mistakes? by LukeWink · 2005-12-05 06:53 · Score: 1

Maybe you could see how many times certain C faux pas exist. Things like the use of gets(), fflush(stdin), void main(), etc.

Oooo great ideas by Sembiance · 2005-12-05 06:54 · Score: 1

Thanks for all the comments, there are some great statistic ideas here.

It will take a while to generate all the stats, potentially months.
Once they are finished I'll post the results somewhere.

In the mean time I hope some people find the site useful.
I didn't do it to make money or anything, I just want to help out other coders :)

Isn't Lucene written in Java? by samuel4242 · 2005-12-05 06:54 · Score: 1

Can't the C/C++ folks write a decent indexing package? Or does Java really rule?

Oh You Open Source Guys Are In For It Now! by Petersko · 2005-12-05 06:58 · Score: 1

I'm gonna search for comments containing combinations of the words "Stolen" "From" and "SCO", and blackmail you all to keep the results quiet.

Works with Firefox by everphilski · 2005-12-05 07:01 · Score: 1

guess Firefox isn't standard compliant then (there goes my karma from the grandparent post :) )

-everphilski-

Predictive input statistics by Dracolytch · 2005-12-05 07:02 · Score: 1

I would like to see some character frequency/pattern information so that we might be able to create baselines for predictive input tools while programming. This would be a good step forward for things such as programming by voice, or other alternative input tools such as dasher.

~D

--
This sig has been enciphered with a one-time pad. It could say almost anything.

How about messiest code base? by oncebitten · 2005-12-05 07:12 · Score: 1

just search for:
#if 0
XXX
FIXME

A tool for SCO by joshaidan · 2005-12-05 07:15 · Score: 1

A useful tool could be a list of all open source programs that contain the string: for(i = 0; i *; i++). Then we will know which programs violate SCO patents. Brian.

Amazing by Donkey5555 · 2005-12-05 07:17 · Score: 1

Wow, that's amazing! You must have upwards of 3 or 4 functions in that thing.

Generate yet more code by DCFC · 2005-12-05 07:23 · Score: 1

What you want is Markov chains. For any given statement you can easily construct the set of statements that follow it, together with their counts. We will observe that, given for, there are a lot of branches to the tree, with (i =0; being very common and // Woo woo being quite rare. We can easily generate more code by picking statements at random, but in proportion to their observed frequency.

But we can do better. Given pairs of statements, we can generate a table where, given the pair for( i=0; we know the probability of i != coming next. Because we're dealing with pairs from a large set, the branching will be quite low, i.e. it will be much more constrained, and thus a lot more like the original code. This is analogous to being able to say in English that, given "Space, the", we mostly get "final" as the most probable next symbol. Now we have "the final", and the next word may be "frontier", but given that "Space" is not being thought about any more, we might have "cut".

Text generated like this is quite like English and, if you include punctuation, can be as grammatically correct as many people's postings. It can be quite funny. I have used it to merge Terry Pratchett and Microsoft marketing blurb, and certainly the MS stuff improves...

The same applies to C++. It will take a little experimentation to find the best length for the chain, but assuming that the input code is syntactically correct, the output will often compile. Quite what it will do is another question, but this sort of chain is rather like how people learn. If you listen to babies, they start off making random noises, then they make sounds that have very roughly the same frequency of occurrence as in their parents' language, then they burble in things that sound like English but clearly are not. Their "singing" at 18 months is usually quite free of real words, but they've heard "twinkle twinkle little star" so often that the chain of "twinnle twinnle lipple sarr" is carved into their neural net.

Given that the spec here is for "useful", I propose this as an AI test.
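
For anyone who wants to try this, here is a minimal first-order sketch (one token of context rather than a pair). It assumes the code has already been split into tokens; Chain, build_chain and generate are names invented for this example, not part of any existing tool.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <random>
#include <string>
#include <vector>

// For each token, how often each other token was observed to follow it.
using Chain = std::map<std::string, std::map<std::string, int>>;

Chain build_chain(const std::vector<std::string>& tokens) {
    Chain chain;
    for (std::size_t i = 0; i + 1 < tokens.size(); ++i) {
        ++chain[tokens[i]][tokens[i + 1]];
    }
    return chain;
}

// Walk the chain from `start`, choosing each successor at random in proportion
// to its observed count. Stops early at a token that was never followed by anything.
std::vector<std::string> generate(const Chain& chain, const std::string& start,
                                  std::size_t max_tokens, unsigned seed) {
    std::mt19937 rng(seed);
    std::vector<std::string> out = {start};
    while (out.size() < max_tokens) {
        const auto it = chain.find(out.back());
        if (it == chain.end()) break;
        const std::map<std::string, int>& followers = it->second;
        assert(!followers.empty());
        std::vector<std::string> names;
        std::vector<int> weights;
        for (const auto& entry : followers) {
            names.push_back(entry.first);
            weights.push_back(entry.second);
        }
        // Picks index i with probability weights[i] / sum(weights).
        std::discrete_distribution<std::size_t> pick(weights.begin(), weights.end());
        out.push_back(names[pick(rng)]);
    }
    return out;
}
```

Keying the outer map on a pair of tokens instead of a single token gives the more constrained second-order chain described above; the chain length is the knob to experiment with.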

--
Dominic Connor,Quant Headhunter

Google desktop as backend by kalenj · 2005-12-05 07:40 · Score: 1

Might be sweet to hook your source code into a google desktop index and then deliver the google desktop search results through your web interface. Leverage the power of their search speed and reliability. --Kalen

--
directory of tech articles

That GeSHi guy is a bit of a dick. by Anonymous Coward · 2005-12-05 07:44 · Score: 0

I can understand wanting to avoid a slashdotting, but he's a dick about it.

Read this.
http://exorsus.net/slashdot.txt

URLs on this server linked by slashdot.org will be refused. Permission is
given to slashdot to mirror content as necessary for the purpose of
providing its users access to the information on the site. Slashdot should
not attempt to bypass the referer block. Use of the google cache page for
the site is acceptable as long as the page(s) concerned have no more than 1
image.

This policy is employed for the sole purpose of avoiding a huge bandwidth
bill that I would have to pay out of my own pocket. Anyone who would like
this restriction to go away is more than welcome to send me bucketloads of
cash.

Fuck you buddy. Protect your own interests as you need to, but don't be a cocksmith about it.

Re:That GeSHi guy is a bit of a dick. by Anonymous Coward · 2005-12-05 08:45 · Score: 0

The message was not set up by the "GeSHi guy". It was set up by the server administrator.

Too bad the site is down by Anonymous Coward · 2005-12-05 07:46 · Score: 0

That Apache web server is just wonderful!

HACK, TODO, BUG & FIXME by Pete+Brubaker · 2005-12-05 07:52 · Score: 2, Interesting

I recently did a search on some of our codebase here at work to see how many times the above keywords remained in shipping code. I was a little surprised to see how many cases there were in our code. I think sometimes, maybe even most of the time, we as programmers overuse these words.

Pete

--
What's a sig? Pete Brubaker

how about... by Connie_Lingus · 2005-12-05 08:08 · Score: 1

...how many comments contain the letters JSB??

I can still remember my 1000 level C course where an example of "poor commenting" was presented where the only comment was /* RIP jsb */

--
never bring a twinkie to a food fight.

10 auto by Khashishi · 2005-12-05 08:18 · Score: 1

jab must be one of the 133t coders who knows what this qualifier does

AFAIK, it means it isn't static, so it should be cleaned off the stack when the scope ends.

Which Dirty Seven would that be? by Stavr0 · 2005-12-05 08:23 · Score: 1

fscanf()
gets()
scanf()
sprintf()
strcat()
strcpy()
vsprintf()

... oh... the other dirty seven.

Stats by Twon · 2005-12-05 08:29 · Score: 1

Number of gotos per project would probably be amusing.

Proposed workaround doesn't work by Animats · 2005-12-05 08:31 · Score: 3, Informative

Yes, that compiles and runs, but it doesn't do what you think it does. Put in some debug print to see what's actually happening, which is this:

"5,5" is evaluated using the built-in definition of ",", returning "5". The no-conversion built-in operator comma has higher priority than the conversion sequence involving a conversion to "location", then the use of the overloaded comma operator. So the built-in comma operator is used. See the discussion in the C++ ARM, section 13.2, "Argument matching": which says "consider an exact match better than any conversion".
"5" is converted to type "location" by the constructor for "location", resulting in a "location" object with "dimension=1" and "coordinates[0]=5".
This "location" object is passed to "operator[]", which then accesses "coordinates[1]", an uninitialized value, which it then uses as a subscript, returning a reference to an arbitrary memory location. So, instead of returning "&blah.matrix[5][5]", it returns "&blah.matrix[???][5]". The example program seems to run in VC++ only because that part of memory happens to be 0 at startup, so this returns "&blah.matrix[0][5]". In other circumstances, it might cause a crash.
"10" is stored into the wrong location of "blah", or outside it, due to the bad reference generated above. This is where the buffer overflow occurs.

You can force the conversion with

blah[ location(5), 5] = 10;

but that's not useful except to see what's happening.

You can't overload the built-in operators for built-in types. So overloading, outside of an object, "operator,(int, int)" won't work either.

Hence the need for a straightforward solution.
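
To see the first step in isolation, here is a minimal sketch. It is not the code from the parent post: Location, dimensions_of and the two wrapper functions are invented for illustration, and coordinates[1] is zero-initialized here so the sketch itself stays free of undefined behavior. The point is only which comma operator gets picked.

```cpp
#include <cassert>

// Hypothetical stand-in for the "location" class under discussion.
struct Location {
    int dimension;       // number of coordinates filled in so far
    int coordinates[2];
    Location(int x) : dimension(1), coordinates{x, 0} {}  // implicit int -> Location
};

// Overloaded comma: appends one more coordinate to an existing Location.
Location operator,(Location lhs, int rhs) {
    assert(lhs.dimension < 2);
    lhs.coordinates[lhs.dimension++] = rhs;
    return lhs;
}

int dimensions_of(Location loc) { return loc.dimension; }

// Both operands are plain ints, so the built-in comma runs: the result is the
// int 7, which is only afterwards converted to a one-dimensional Location.
int implicit_case() { return dimensions_of((5, 7)); }

// The left operand already has class type, so the overloaded comma is used
// and the result is a two-dimensional Location.
int forced_case() { return dimensions_of((Location(5), 7)); }
```

Here implicit_case() reports one dimension and forced_case() reports two, which matches the behavior described above. Most compilers will also warn that the left operand of the built-in comma has no effect, which is a decent hint that something is off.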

Re:Proposed workaround doesn't work by torokun · 2005-12-06 01:46 · Score: 1

This is where C++ gets fun. ;)

Interesting comment. I think, though, that in real life you're not calling with static built-in types like '5'. You're calling with results of other operations or variables, so it doesn't really have to be a problem...

You just disallow the types that don't work. Maybe you have to construct some objects that seem like a lot of overhead sometimes, but you can end up with a natural syntax...

Project similarities by dagar · 2005-12-05 08:42 · Score: 1

It would be interesting to see what projects share same/similar chunks of code. This could be used to move similar code into libraries where things are redundant.

Dead Dead Deadsky by Uosdwis · 2005-12-05 08:56 · Score: 1

What are the conditional compiles? Are they being used or abused?

#if GCC_1_8 or #if DEBUG vs #if 0
#define FOREVER for(;;) vs #define INC(x) x++

How much #define hell is there in codebases (ever see a VxWorks BSP?)
- hardware-specific defines in C versus base classes and inheritance in C++
- #defines that can be retouched at run time or used as constants without using const
- how many #ifdefs and #undefs have a #define buried again in other included code
- symbol replacement

Or how about casting! how much was done implicitly or explicitly? And could it have been avoided.

Re:useful statistic: parent: -1 troll by Baricom · 2005-12-05 09:12 · Score: 3, Funny

That "woosh" sound you hear is the wink emoticon zooming over your head, joke in tow.

I know PHP is a great web language and that it probably isn't the cause of the slowdown. Heck, even Yahoo! uses it these days.

I was attempting (unsuccessfully, it seems) to make fun of the purists who insist that robust web applications must run on something compiled in order to reach acceptable performance under high load.

Dangerous function calls? by generic · 2005-12-05 09:17 · Score: 1

like strcpy, vprintf, strcat, scanf would be interesting. It would be a basic buffer overflow fuzzer.

--
Microsoft aggravates my tourettes syndrome.

and auto? by YesIAmAScript · 2005-12-05 09:30 · Score: 1

I know it's a legitimate keyword. There's also absolutely no reason to use it. It's the default in the only place where it is even valid.

It'd be like typing "unsigned long int" instead of "unsigned long".

I call BS on this. No reasonable code uses "auto" nearly as many times as typedef.

--
http://lkml.org/lkml/2005/8/20/95

Switches with fall through versus ever case sepera by Anonymous Coward · 2005-12-05 09:39 · Score: 0

I, and I'm sure I'm not the only one, tend to use switch statements specifically for the fall through, and a series of if-else-ifs when I don't need fall-through. I'd like to know how many times a switch statement contains fall through, versus how many times each case has a separate break. The ratio of case statements to break statements would be interesting as well.

AJAX? by RavenChild · 2005-12-05 09:51 · Score: 0

Perhaps the Slashdotting of this site might be a good reason for the administrator to create an AJAX interface. That way the server would not have to process so many pages and focus on queries.

Re:AJAX? by Sembiance · 2005-12-05 11:29 · Score: 1

The site is AJAX :)

It also uses XSLT to transform the received XML into the search results.

275 million lines of C+ -- bah! by gru3hunt3r · 2005-12-05 09:53 · Score: 1

275 million lines of C .. so what is that roughly translated to?
probably about 27,500,000,000 lines of quick basic?
or roughly 10 lines of Perl.

How about - by Anonymous Coward · 2005-12-05 09:56 · Score: 0

The number of comments that include the string(s) "magic", "magically", "Then a miracle occurs...", "[You|We] keep using that word; I do not think it means what you think it means." or "This is not the code you are looking for, move along." - ?

libtool by Wikipedia · 2005-12-05 10:04 · Score: 0

You might be interested in this:
http://www.advogato.org/article/85.html
which links to the open-source metrics:
http://orbiten.org/ofss/01.html
which is dead but is still on the archive:
http://orbiten.org/ofss/01.html (the link doesn't work!)

Here is the first table:

Table 1: Top 10 authors ranked by contribution of code

Author                                        % of total
free software foundation, inc                 11.231
sun microsystems, inc                          1.848
the regents of the university of california    1.359
gordon matzigkeit                              1.216
paul houle                                     1.042
thomas g. lane                                 0.782
the massachusetts institute of technology      0.762
ulrich drepper                                 0.559
lyle johnson                                   0.528
peter miller                                   0.525

--
P2P Anonymous Distributed Web Search: http://www.yacy.net/

Re:histogram of C reserved words - well, B .... by ignavus · 2005-12-05 10:36 · Score: 2, Informative

auto is a throwback to B days (the language immediately before C). B had no data types (no int, float, double, etc) but did have storage types: auto, static, and extrn.

auto was necessary in B for local variables, as a plain variable name by itself was a valid expression statement (as it is in C), not a declaration (IIRC).

1. foo() { auto bar; ... }
2. foo() { static bar; ... }
3. foo() { extrn bar; ... }
4. foo() { bar; ... }

All mean something different in B: the first three instances of bar are declarations, the fourth is an expression statement (and if I remember my B correctly, it is invalid as the first statement of foo(), because bar hasn't been declared one of auto, static, or extrn yet in this function).

In C, auto is completely redundant. Except, perhaps, in comments.

Ah, B. The days when programmers were programmers and data was data, and you could perform any operation you liked on any variable. Want to divide a pointer to a string by 3? Go ahead. Self-disciplined programmers don't need training wheels. Just a choice between auto, static and extrn.

--
I am anarch of all I survey.

Group functions by usage by jerometremblay · 2005-12-05 11:10 · Score: 1

What I would like is something like

"people who used functionX also used functionY".

It would bring to light what libraries are often used together.

Users of Library 1 might learn of the existence of Library 2.
If you notice a lot of clustering between different libraries, it might be a good idea to make a new library that combines the needed functionalities.
etc.

Stats using Bayesian filters like spam filters would be incredibly cool. Especially if the source code is somewhat parsed for what it means and not only used textually.
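
The "people who used functionX also used functionY" part could start out as a simple co-occurrence count. A minimal sketch, assuming each project has already been reduced to the set of function names it calls (Pair, co_usage and most_co_used are names invented for this example):

```cpp
#include <cassert>
#include <iterator>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

using Pair = std::pair<std::string, std::string>;

// projects[i] is the set of function names called somewhere in project i.
// Returns, for every pair of functions, the number of projects that use both.
// Each pair is stored with the alphabetically smaller name first.
std::map<Pair, int> co_usage(const std::vector<std::set<std::string>>& projects) {
    std::map<Pair, int> counts;
    for (const std::set<std::string>& calls : projects) {
        for (auto a = calls.begin(); a != calls.end(); ++a) {
            for (auto b = std::next(a); b != calls.end(); ++b) {
                assert(*a < *b);  // std::set iterates in sorted order
                ++counts[Pair(*a, *b)];
            }
        }
    }
    return counts;
}

// The function most often seen alongside `name`, or "" if it never co-occurs.
std::string most_co_used(const std::map<Pair, int>& counts, const std::string& name) {
    std::string best;
    int best_count = 0;
    for (const auto& entry : counts) {
        const Pair& p = entry.first;
        if (p.first != name && p.second != name) continue;
        if (entry.second > best_count) {
            best_count = entry.second;
            best = (p.first == name) ? p.second : p.first;
        }
    }
    return best;
}
```

The pair table grows quadratically with the number of distinct functions per project, so on 275 million lines it would probably need pruning (for example, dropping functions that appear in nearly every project).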

Statistics for change in licenses.. by carlmenezes · 2005-12-05 11:16 · Score: 1

how many times have the licenses changed?

also, maybe stuff like code overlap between projects?

--
Find a job you like and you will never work a day in your life.

GOTOs by srchestnut · 2005-12-05 11:24 · Score: 1

1) Number of GOTO statements
2) Number of comments that match (nearly) exactly the code they explain ex: string name; //name
3) Phone Numbers

correction - thanks by jab · 2005-12-05 11:33 · Score: 1

Thanks for all the sanity checks! The very simple lex program I whipped up to extract reserved words was way too simple. First, a variable like "automobile" was causing a false positive for "auto". Second, I had only run on .c files, and not .h files, which explains the lack of typedefs. Finally, while I remembered to strip the comments, I'd forgotten to take care of quotations, thus getting false positives from things like

char *foo = "By default run on auto pilot";

Fixing all this gives much saner results.

950 if
626 return
482 static
331 for
272 const
269 void
213 else
132 char
113 case
112 break
89 typedef
82 extern
44 int
43 sizeof
41 enum
39 struct
35 switch
31 default
23 while
11 unsigned
5 float
3 signed
2 short
1 long
1 double
1 do
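
For reference, the three fixes (whole-word matching, skipping comments, skipping quoted text) can be sketched as a small hand-written scanner. This is not the lex program itself, just an illustration under simplifying assumptions: count_keywords is a made-up name, and the scanner ignores preprocessor tricks, line continuations and raw string literals.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <set>
#include <string>

// Count whole-word keyword occurrences, ignoring comments and quoted text.
std::map<std::string, int> count_keywords(const std::string& src,
                                          const std::set<std::string>& keywords) {
    std::map<std::string, int> counts;
    const std::size_t n = src.size();
    std::size_t i = 0;
    while (i < n) {
        const char c = src[i];
        if (c == '/' && i + 1 < n && src[i + 1] == '/') {
            // Line comment: skip to end of line.
            while (i < n && src[i] != '\n') ++i;
        } else if (c == '/' && i + 1 < n && src[i + 1] == '*') {
            // Block comment: skip past the closing */ (or to end of input).
            const std::size_t end = src.find("*/", i + 2);
            i = (end == std::string::npos) ? n : end + 2;
        } else if (c == '"' || c == '\'') {
            // String or character literal: skip to the matching quote,
            // stepping over backslash escapes.
            ++i;
            while (i < n && src[i] != c) {
                if (src[i] == '\\') ++i;
                ++i;
            }
            ++i;  // step over the closing quote
        } else if (std::isalpha(static_cast<unsigned char>(c)) || c == '_') {
            // Read the whole identifier before comparing, so "automobile"
            // is never mistaken for "auto".
            const std::size_t start = i;
            while (i < n && (std::isalnum(static_cast<unsigned char>(src[i])) ||
                             src[i] == '_')) {
                ++i;
            }
            const std::string word = src.substr(start, i - start);
            if (keywords.count(word) != 0) ++counts[word];
        } else {
            ++i;
        }
    }
    return counts;
}
```

Running it over both .c and .h files covers the second fix, which is a matter of which files get fed in rather than of the scanner.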

Wow... How's this for a comment by Anonymous Coward · 2005-12-05 11:40 · Score: 0

// we like profanity in open source software, please read the following words carefully:
// fuck, pussy, dick, sperm, motherfucker
//
// hope our source code will now be censored by all governments that suck.

AAGHH! by Civil_Disobedient · 2005-12-05 13:37 · Score: 1

# "if (cond) {" vs. "if (cond)\n{"

I'm all for coding readability, but placing a function's open bracket on a new line is so fucking irritating and unnecessary. /pet peeve

Re:AAGHH! by GlassHeart · 2005-12-05 16:21 · Score: 1

Funny how geeks are adamant that nobody should tell them how to dress (or, in extreme cases, how to smell), yet are very irritated by an equally insignificant stylistic choice. Where the "{" is placed, in almost every usual case, is irrelevant to the readability of the code.
I would suggest reserving your indignation for something that actually makes code less readable. Having pet peeves of this sort threatens to win the battle but lose the war.
Re:AAGHH! by skink1100 · 2005-12-05 16:30 · Score: 1

Wrong, sir, wrong!

Putting the opening { on a newline makes it much easier to visually match code blocks, especially when nested a few layers deep. What's wrong with a little extra effort for readability? True, function is more important than form, but I get really sick of trying to make sense of people's crappy code.

S
Re:AAGHH! by Drawkcab · 2005-12-06 07:57 · Score: 1

What does putting the open bracket on a new line do for matching code blocks that indenting doesn't? Personally I find the code more readable if people don't waste lines on open brackets, because that just makes it that much more likely that you'll have to scroll down to see the end bracket.
Re:AAGHH! by cyclomedia · 2005-12-06 23:55 · Score: 1

i'm the opposite, i just can't get my head around the practice of missing out the \n... I find it easier to match code blocks not just by indentation but also by matching the pairs of { and }, so if one is missing i find it hard to read.

--
If you don't risk failure you don't risk success.

Codase is much better with 250M of C/C++/JAVA code by Anonymous Coward · 2005-12-05 15:35 · Score: 1, Informative

http://www.codase.com/ a new search engine, seems to have better user interface and performance. It also has a smart query search system to deal with complex queries,
quoted from their website:
"For the first time, to find relevant code, developers can simply type into a search box about the same code as they do in their daily development work. The Codase smart query system processes the input and then builds an internal query to feed into the search engine. Through this free style format, complex combinations of multiple search terms can be easily entered. For example, to find any main method that contains variable t and function calls of thread.start() and println, this query can be used: main() { var t; thread.start(); println; }",

http://www.codase.com/search/smart?join=main()+%7Bvar+t%3B+thread.start()%3B+println%3B+%7D&scope=join%2Fjoin&lang=*&project=

Mining Software Repositories by MrClean · 2005-12-05 16:18 · Score: 1

Great Work!!!

FYI - there is a whole community of researchers that are interested in studying such large software repositories

http://msr.uwaterloo.ca/

(International Workshop on Mining Software Repositories)

Maybe you can write something and submit it over there or at least advertise your data set to that community.

Automatic program generation by mal0rd · 2005-12-05 18:40 · Score: 1

This would become really useful if it could save people from ever having to rewrite anything that has been programmed before. Say I want to factor a number. So I write a couple test cases:

8 -> [2,2,2]
9 -> [3,3]
12 -> [2,2,3]

and then I can use your database to find all the functions that will successfully pass my test. If it got good it could even combine functions until it got something working. This would be the holy grail of IDEs - write unit tests, click search, polish up - done. The tricky part here is how to efficiently search all those functions.

The first step should be to allow searching by argument types, side effects and global access, and return values. Then add the test cases.
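
A toy version of the last step, as a sketch only: it assumes the candidates have already been narrowed down by signature and can be called directly. prime_factors and divisors are stand-ins for functions that would come out of the index, and Candidate, TestCase and passing are names made up for this example.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

using Factors = std::vector<int>;
using Candidate = std::pair<std::string, std::function<Factors(int)>>;
using TestCase = std::pair<int, Factors>;  // input, expected output

// Stand-in candidate: prime factorization in ascending order.
Factors prime_factors(int n) {
    assert(n >= 1);
    Factors out;
    for (int d = 2; d * d <= n; ++d) {
        while (n % d == 0) {
            out.push_back(d);
            n /= d;
        }
    }
    if (n > 1) out.push_back(n);
    return out;
}

// Stand-in candidate with the same signature but different behavior:
// every divisor of n, including 1 and n.
Factors divisors(int n) {
    Factors out;
    for (int d = 1; d <= n; ++d) {
        if (n % d == 0) out.push_back(d);
    }
    return out;
}

// Names of the candidates that return the expected output for every test case.
std::vector<std::string> passing(const std::vector<Candidate>& candidates,
                                 const std::vector<TestCase>& tests) {
    std::vector<std::string> names;
    for (const Candidate& candidate : candidates) {
        bool ok = true;
        for (const TestCase& test : tests) {
            if (candidate.second(test.first) != test.second) {
                ok = false;
                break;
            }
        }
        if (ok) names.push_back(candidate.first);
    }
    return names;
}
```

With test cases such as 8 -> [2,2,2] and 9 -> [3,3], only prime_factors survives the filter. Doing this against real indexed code would additionally need sandboxing and a way to build each candidate, which is where the hard part lies.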

Hopefully someday.

-- Devin Bayer

Re:useful statistic: parent: -1 troll by Anonymous Coward · 2005-12-05 20:06 · Score: 0

Yeah. Unless he uses PHP as the database engine, it has nothing to do with it.

Probably Lucene's fault.

compression by Anonymous Coward · 2005-12-05 20:45 · Score: 0

What 500,000 line library would be the most beneficial to the corpus? Beneficial means the ability to reduce the complexity/size of the resulting corpus. You can think of this as a compression problem.

for example, if:

int a[MAX];
sum = 0;
for (i=0; i<MAX; ++i)
    sum += a[i];

occurred in enough of the source, we could reduce this to:

int a[MAX];
sum = r0 (a, MAX);

thus simplifying the code. As a follow on project, write a cron job to submit patches to all the authors of the code in question to use said library, and upload the library to sourceforge.
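
As a sketch of what such a library entry might look like: "r0" is just the placeholder name from the example above, and here it is assumed to mean "sum of the first n ints". In C++ the standard library already covers this particular pattern with std::accumulate, so the wrapper is only a thin veneer.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>

// Hypothetical library routine standing in for "r0": sum of the first n ints.
int r0(const int* a, std::size_t n) {
    assert(a != nullptr || n == 0);
    return std::accumulate(a, a + n, 0);
}
```

The interesting statistic would then be how many hand-rolled loops in the corpus collapse to a call like this, and how many lines that saves overall.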

Re:useful statistic: parent: -1 troll by lachlan76 · 2005-12-05 21:07 · Score: 1

While I admit that PHP isn't the source of the slowdown, I'd hardly consider it the ideal web language. Too many problems with scoping, function naming, etc., etc.

Or perhaps it's just my tastes. I personally prefer not to have to worry about using a variable which is in another code block.

Does SCO know about this ? by Anonymous Coward · 2005-12-06 05:09 · Score: 0

How many lines did IBM illegally contribute to Linux ;-)

Nice but what about licensing ? by Anonymous Coward · 2005-12-06 09:16 · Score: 0

I don't want to be an asshole but what about showing the license of the displayed code?
IMHO the current display makes it look like these are public domain snippets.

Take a Look at CCCC by SwashbucklingCowboy · 2005-12-10 11:31 · Score: 1

The C and C++ Code Counter, http://cccc.sourceforge.net/, has some interesting statistics that could be generated on a per project basis. Perhaps you could incorporate the stats from that project.

Slashdot Mirror

328 comments