Using Redundancies to Find Errors
gsbarnes writes "Two Stanford researchers (Dawson Engler and Yichen Xie) have written a paper (pdf) showing that seemingly harmless redundant code is frequently a sign of not so harmless errors. Examples of redundant code: assigning a variable to itself, or dead code (code that is never reached). Some of their examples are obvious errors, some of them subtle. All are taken from a version of the Linux kernel (presumably they have already reported the bugs they found). Two interesting lessons: Apparently harmless mistakes often indicate serious troubles, so run lint and pay attention to its output. Also, in addition to its obvious practical uses, Linux provides a huge open codebase useful for researchers investigating questions about software engineering."
No flame intended, I just have no clue as to the point of this. What exactly do these flaws cause? Maybe I'm just not l33t enough, but I could sure use an explanation from someone more learned than me.
PDF usually crashes my computer (crappy adobe software). So here's a convenient text link!
http://216.239.37.100/search?q=cache:yuZKW8CjTqIC:www.stanford.edu/~engler/p401-xie.ps+&hl=en&ie=UTF-8
More details
Appeared in FSE 2002. Finds funny bugs by looking for redundant operations (dead code, unused assignments, etc.). From empirical measurements, code with such redundant errors is 50-100% more likely to have hard errors. Also describes how to check for redundancies to find holes in specifications.
Link to PostScript file for easy viewing/printing
File
Reply or e-mail; don't vaguely moderate. Ex-O'Reilly/MIT employee, now a full-time Google employee.
Modded down within 10 seconds of posting.
:(
Slashdot is a harsh mistress.
~D:
This really old Slashdot logo still in use over on Team Slashdot's page on distributed.net.
"...dead code (code that is never reached)"
Perhaps it's just shy!
html version is here.
There aint no pancake so thin it doesn't have two sides.
Dont even bother using the google cache of the pdf.
Its completely unreadable.
If you dont believe me, look for yourself:
here
Given enough lints, all bugs are shallow!
MSDOS: 20+ years without remote hole in the default install
A file (module/unit/whatever) is still a fairly coarse granularity to make decisions upon. I'd be more interested in finding bugs within specific functions rather than just files...
/* affect != effect */ void affect(int *thing,int effect) { *thing += effect; }
To me 'redundant' implies duplication of something already there. (a=1; a=1;)
a=a; and dead code aren't so much redundant as they are superfluous. It's still a sign of possible errors, for sure.
The redundancy checker would have a field day in code written by Porky the Pig. ...th..th..tha..that's all folks...
pardon me, but DUH???
Some drink at the fountain of knowledge. Others just gargle.
Unfortunately, this paper doesn't really offer any practical advice. It is probably a little useful to very good or great programmers. However, for new or moderately good programmers, it probably won't be very useful. It is certainly interesting in the academic sense, but I always want to see more practical advice. (I suppose that good practical advice flows down from good theoretical advice.)
What are some of the best ways to learn to avoid problems? I know that experience is useful. Trial and error is good, mentoring is good, education is good. What else can you think of? What books are useful?
Also, I wonder about usability problems. In other words, this article mainly hits on the problems of "hidden" code, not the interface. I'd like to see more about how programmers stuff interfaces with more and more useless crap, and how to avoid it. (Part of the answer is usability testing and gathering useful requirements, of course.) What do you think about this? How can we attack errors of omission and commission in interfaces?
Writing repetitive code only once offers the same benefits as using Cascading Style Sheets for your webpages. If there is a serious error, you only have to track it down in the one place where it exists versus every single place you re-wrote the code. Also, it makes adding features much simpler as well. I'm an old school procedural programmer that is making the rocky transition to OOP programming. THIS is where it starts coming together...
C. Griffin
"Can I keep his head for a souvenir?" --Max from Sam 'N Max Freelance Police
They also found that:
Russian errors cause code
Incorrect code causes errors
Missing code causes errors
Untested code causes errors
Redundant codec causes redundancies
Driver code causes headaches
C code causes buffer overflows
Java code causes exceptions
Perl code causes illiteracy
Solaris code causes rashes
Novell code causes panic attacks
Slashdot code causes multiple reposts
Slashdot articles cause poor-quality posts
Microsoft code causes exploits
Apple code causes user cults
Uncommented code causes code rage
RIAA code causes computers to stop functioning
(Poor idea causes long, desperate post)
This sig intentionally left bla... dammit!
Who's got the whiteout?
...bad programming causes problems? Golly! What will those crazy scientists come up with next?
In Soviet Rush, today's Tom Sawyer gets high on you.
while(1)
findError();
while(1)
findError();
while(1)
findError();
while(1)
findError();
Your comment violated the "postercomment" compression filter. Try less whitespace and/or less repetition. Comment aborted.
It really is. It's a redundant holdover from ye old BSD versions. Granted, there are one or two times i've used it when -Wall -pedantic -Werror -Wfor-fuck's-sake-find-my-bug-already doesn't work, but a lot of the time it comes up with a LOT of complaints that are really unnecessary. Am i really going to have to step through tens of thousands of lines of code casting the return of every void function to (void)? Come on.
I got a sig so you would remember me.
Additional support was provided by DARPA under contract MDA904-98-C-A933
Must be for that new lean, mean killing machine they've been asking for.
I seriously hope no one paid them for this.
Are they telling me that if I write useless code intentionally I'm increasing my chance of errors as I increase my code?
Or, maybe they're saying that if I write useless code by mistake, I'm being careless which invites more errors?
Brilliant insight... I wish I had thought of all that before I turned my clear, concise 10 line application into a mangled mess of 100000 lines (hey look... I just rewrote the Windoze kernel!)
Alito: A vote for Alito is a punch in the eye to put that bitch back in her place!
Redundant code means the coder wasn't thinking. Hence more bugs.
The cake is a pie
Errors find YOU! find YOU!
Three letters: NIH.
Now, if you'll excuse me, I've got to get back to my text editor project.
Isn't this the job of that smart dude down the hall who runs Lunix computers and reads some Slash Period website or something?
Well, at least that's how I finish all my projects.
".. so run lint .."
I understand that there are non-C compilers out there that actually detect code errors without running a separate utility. Amazing!
We have MS Windows contained inside of the kernel?
A good editor could easily cut that article in half without loss of any information.
x += 0;
x += 0;
x += 0;
x += 0;
x += 0;
x += 0;
It actually caused a bug 'cuz they accidentally left the '+' off one of the lines. What an idiot.
While lint(1) "proper" doesn't exist for Linux (because of copyright issues), you all could try lclint.
It's available at larch.lcs.mit.edu:/pub/Larch/lclint.
"If all you're doing is compiling programs that have lint(1) targets in the Makefile, you can probably just comment out (or remove) the lint targets and actions from the Makefile. If the program has already been ported to Linux, they won't produce anything of interest to you unless you are a developer. I hope this helps."
Ummm... surely if any story should be duped in the near future, it's this one. Please submit story suggestions accordingly.
deus does not exist but if he does
Maybe PDF has too much redundant code.
Some code can be labeled as dead code but that
doesn't necessarily make the code non-useful
if the parameters and software requirements change.
I've placed code (don't flame me too hard) within
my own coded creations (with necessary documentation)
for future considerations that my clientele may
consider at a later date.
It's good to see an outside group looking over
various coding issues in Linux. It gives me
the feeling that most of the community provides
some scrutiny over its development which by
and large increases support of Linux.
Now that Microsoft has opened up its code (within limitations),
I would like to see (as I am sure some of you are)
someone check it over.
Just curious.
I saw Dawson's talk at FSE (Foundations of Software Engineering). He uses static flow analysis to find problems in the code (like an advanced form of pclint). The most interesting part of his tool is in the ranking of the problem reports. He has developed a couple of heuristics that sort the problems by order of importance and they supposedly do a very good job. Static analysis tools find most of their problems in rarely run code, such as error handlers. Such bugs are troublesome and sometimes cause non-deterministic failures, which are extremely hard to find with standard testing and debugging. (This is especially true when the program under consideration is a kernel.) Dawson also verifies configurations of the kernel that no one would compile, because he tries to build in as many drivers as possible at the same time. The more code, the better the consistency checks do at finding problems.
By making assumptions about the program and checking the consistency of the program, his tool finds lots of problems. For instance, assume there is a function named foo that takes a pointer argument. His tool will notice how many of the callers of foo treat the parameter as freed versus how many treat the parameter as unfreed. The bigger the ratio, the more likely the 'bad' callers are to represent a bug. It doesn't really matter which view is correct. If the programmer is treating the parameter inconsistently, it is very likely a bug.
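A rough sketch of that counting step in C. This is not the Stanford tool: the struct, its fields, and both function names are made up for illustration, and a real checker would derive the counts from flow analysis rather than take them as input.

```c
/* Hypothetical tally for one function: how many call sites treat its
 * pointer argument as freed afterwards, and how many as still live. */
struct belief_count {
    const char *func;        /* name of the callee, e.g. "foo" */
    unsigned treats_freed;   /* call sites that never touch the pointer again */
    unsigned treats_live;    /* call sites that keep using the pointer */
};

/* Fraction of call sites that agree with the majority view.
 * Returns 0.0 when there are no call sites at all. */
double majority_share(const struct belief_count *b)
{
    unsigned total = b->treats_freed + b->treats_live;
    unsigned major = b->treats_freed > b->treats_live
                         ? b->treats_freed : b->treats_live;
    return total ? (double)major / total : 0.0;
}

/* Number of call sites in the minority: these are the suspects. */
unsigned minority_sites(const struct belief_count *b)
{
    return b->treats_freed < b->treats_live
               ? b->treats_freed : b->treats_live;
}
```

With 48 callers that treat the pointer as freed and 2 that keep using it, the share is 0.96 and the 2 minority sites are the ones to look at first. Which side is "right" never enters into it.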
He also mentioned that counter to his expectations, the most useful part of his tool was to find 'local' bugs. By local, I mean bugs that are local to a single procedure. They are both easier for the tool to find, more likely to actually be bugs, and much easier for the programmer to verify if they are in fact bugs.
He analyzed a couple of the 2.2.x and 2.4.x versions of the kernel and found hundreds of bugs. Some of them were fixed promptly. Others were fixed slowly. Some were fixed by removing the code (almost always a device driver) from the kernel. Others he couldn't find anyone that cared about the bug enough to fix it. He was surprised at the amount of abandonware in the Linux kernel.
It is extremely frustrating that Dawson won't release his tool to other researchers (or even better to the open source community at large). Without letting other people run his tool (or even better modify it), his research ultimately does little good other than finding bugs in linux device drivers. *heavy sigh* Oh well, eventually someone WILL reimplement this stuff and release it to the world.
On a snide note, if he were a company he would no doubt have been bought by Microsoft already. Intrinsa was doing some interesting stuff with static analysis and now, after they were bought a couple of years ago, their tool is only available inside of Microsoft. *sigh*
As any good kernel folk know, concurrent programming can contain certain blocks of code that might seem redundant, even meaningless to the average procedural programmer. Hope they're not getting confused.
I suppose they've hit my pet peeve. I've seen many simple problems turned into hideous monstrosities with many bugs by people trying to handle bugs that can't ever happen and imaginary special cases because they were never taught how to abstract a function. Perhaps it can't be taught. In 20+ years of programming, it's been a very rare time when I've picked up code and not been able to cut out large chunks without replacing them.
the article is in a pdf. now that's redundancy.
are they just too lazy to "export to html" and put it up as a webpage?
and no, i don't want to load the adobe viewer. 30 megs of ram for a viewer program? there's probably 80% redundant code loaded into memory in that program alone
Why read the article when I can just make up a snap judgement?
http://www.pdos.lcs.mit.edu/~engler/jr-calif.html
Interestingly enough, 163% zoom doesn't cause the problem, nor does 165%. After a bit of experimenting i couldn't find any other isolated case that had the same results. There's a sudden transition to illegibility at 131%, but everything below that is also illegible. 164% is just odd, strange that that happened to be picked as the default when i opened it.
This Space Intentionally Left Blank
Is that why there are so many double postings of articles on Slashdot? Trying to use redundancies to find errors?
Hopefully, we can expect much more of such valuable breakthroughs from the academic community in the future, complete with papers full of badly formatted C code!
In the great CONS chain of life, you can either be the CAR or be in the CDR.
Go Dawson Engler! I almost took an advanced OS course from him this quarter. But I must say, if you know where to look (his MIT website *hint* *hint*) you can find pictures of him all oiled up for a bodybuilding competition. The man is ripped.
...but seriously tho, I've always found that it's best if you go out of your way to make sure that code is duplicated as little as possible. Sometimes it takes some major refactoring to move a method when you discover that it's needed someplace else, but it's almost always worth it in the time saved testing, debugging and keeping the methods in sync.
... about all those seemingly harmless redundant articles, then?
vi could do it!
I know exactly what you mean. I've spent the last few hours helping customers and selling Verizon phones, only hoping that another post would show up to cure my boredom. Slashdot should queue the posts with a schedule so we'll know exactly when to check. But they never will because of the FPs. heh
Beep. Boop. Beep. You have questions. I have answers and your home address.
For the record, it's been moved...
Larch FTP Site
January 28, 1999
Many files formerly on this site were moved elsewhere after a disk
crash in March, 1998.
The LCLint distribution can be found at
ftp://ftp.sds.lcs.mit.edu/pub/lclint
or http://www.sds.lcs.mit.edu/lclint
"Great men are not always wise: neither do the aged understand judgement." Job 32:9
Servers are dotted! Maybe they should have used redundant servers. Not all redundancy is bad.
Using Linux for academic research is hardly a new idea. In my group alone one of the profs has been publishing papers and giving talks about research using Linux since 2000.
An example of such is http://plg.uwaterloo.ca/~migod/papers/evolution.pdf - about the evolution of Linux
Second, use standard idioms. For some, that may mean learning the standard idioms. These should become second nature. Programmers should express their creativity in the logic, structure, and simplicity of the code, not the non standard grammar. Standard forms allow more accurate coding and easier maintenance.
"She's a scientist and a lesbian. She's not going to let it slide." Orphan Black
a floor cleaner... and a desert topping!
...What's been happening to Slashdot's servers for the past 3 hours? Did Kuro5hin take them down?
The ______ Agenda
The Stanford Checker is great. I was blown away when I read their papers last year. Their checker is not released yet, so I wrote a similar checker (smatch.sf.net) based on their publications.
The poster mentions Lint, but I did not have any success using Lint on the kernel sources. The source is too unusual.
Also, Lint does not look for application-specific bugs. For example, in the Linux kernel you are not supposed to call copy_to_user() with spinlocks held. It took me about 10 minutes to modify one of my existing scripts to check for that in the Linux kernel last Monday. It found 16 errors. (It should have found more but I was being lazy.)
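A toy version of that kind of check, sketched in C. It is not Smatch and not the script mentioned here: it only walks a flat list of call names (a made-up input format) and ignores branches, distinct locks, and everything else a real checker has to deal with. The function name count_copy_under_lock is hypothetical.

```c
#include <string.h>

/* Walk a straight-line list of call names and count calls to
 * copy_to_user() made while at least one spinlock is held.
 * Toy model: no branches, no aliasing, all locks treated alike. */
int count_copy_under_lock(const char *const calls[], int n)
{
    int depth = 0;    /* number of spinlocks currently held */
    int errors = 0;   /* copy_to_user() calls seen with depth > 0 */

    for (int i = 0; i < n; i++) {
        if (strcmp(calls[i], "spin_lock") == 0)
            depth++;
        else if (strcmp(calls[i], "spin_unlock") == 0 && depth > 0)
            depth--;
        else if (strcmp(calls[i], "copy_to_user") == 0 && depth > 0)
            errors++;
    }
    return errors;
}
```

Fed the sequence spin_lock, copy_to_user, spin_unlock, copy_to_user, it reports one error: only the first copy happens with the lock held.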
A lot of the time, you can't tell what will be a common error until you try looking for it. One funny thing was that there were many places where people had code after a return statement. On the other hand, I didn't find even one place where '=' and '==' were backwards.
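To make the "code after a return statement" case concrete, here is a minimal made-up example in C. The struct and both functions are hypothetical and not taken from the kernel; the point is only that the unreachable line silently drops work the author clearly meant to do.

```c
struct buffer {
    int len;     /* bytes currently stored */
    int dirty;   /* set when the contents have changed */
};

/* Buggy: the return comes one line too early, so the statement
 * after it is unreachable and the dirty flag is never set. */
int append_buggy(struct buffer *b, int n)
{
    b->len += n;
    return b->len;
    b->dirty = 1;   /* dead code: never executed */
}

/* Intended version: do the bookkeeping first, then return. */
int append_fixed(struct buffer *b, int n)
{
    b->len += n;
    b->dirty = 1;
    return b->len;
}
```

Both versions return the same length, which is why this kind of bug tends to survive casual testing.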
It's fascinating playing around with this stuff. I have been learning a lot about the kernel through playing around with Smatch.
They have regularly provided helpful posts to linux-kernel listing a huge number of bugs; see their most recent message listing potential buffer overruns.
This would be an extremely valuable tool for any software project, probably even more useful than e.g. valgrind.
DNA code also has high redundancy, which allows error-correcting transcription and other hacks (see Parity Code And DNA or DNA's Error Detecting Code).
In both cases factors yielding robust DNA code are found to indicate bad digital computer code.
flip
(background: Ars Technica's Computational DNA primer)
That's nice, but have you actually caught Osama yet? Dead -or- alive? How long's it been?
Perhaps you'd have better luck if more of the Americans in Afghanistan were actually looking for Osama and not building an oil pipeline.. or was that the real objective here all along?
455fe10422ca29c4933f95052b792ab2
So many people have made silly comments about this being obvious, useless or whatever. This is probably because they did not actually READ the paper.
The paper is not about obvious code redundancy bugs, it is about subtle errors which are not as simple as just duplicate code. It is about code that *appears* to be executed but actually is not.
Go take a look at the examples and see how long it takes you to notice the different errors...now imagine having a thousand pages of code to peruse..would you catch them? Many of them, probably not.
The conclusion of the paper is basically that errors cluster around errors; finding trivial, suboptimal syntactic constructions tends to point to real bugs.
Where there's smoke, there's fire.
- Every program contains bugs.
- Every program contains redundancies and so can be made smaller without changing behavior.
Therefore the empty program is redundant but still buggy.
CQual
It's been used to find security holes.
They made fools out of themselves with this one:
/* make this _really_ smp-safe */
if (!cam || !cam->ops)
    return -ENODEV;
if (down_interruptible(&cam->busy_lock))
    return -EINTR;
if (!cam || !cam->ops)
    return -ENODEV;
Their comment: 'We believe this could be indication of a novice programmer...blabla...shows poor grasp of the code'.
BZZZZZZZZZT
Nice try kids, but unlike you, this piece of code was probably written by an experienced guy that has actually written code for parallel systems before. Since it's tricky, you would be excused if not for the 'novice programmer' comment above and the fact that the code itself says it's there for SMP safety.
Here's a hint: UNTIL you acquire the lock on 'cam', any other process can change the value, including at the point BETWEEN the first check and the acquisition of the lock.
--
GCP
shouldn't that be
if(strcpy(buf, "hello")==0){
    return (OUT_OF_MEMORY);
}
Couldn't you also search for seemingly copy-and-pasted code blocks as a sign of errors (or code that at least needs refactoring)?
thank God the internet isn't a human right.
as they are posting dupes to ensure quality of stories
Errr...
Is this dead code going to get removed?
No.
Why not?
Because, one, it's only an opinion that it's dead code. There could be some obscure case that no one imagined that could use it. Two, if some programmer removed it and it turned out that it was needed or the programmer screwed up the removal, the programmer would be blamed and take a lot of grief for it. If it ain't broke, don't fix it.
Now, it could be that the dead code doesn't work properly for the obscure case. But how could you tell? Do you want to write a test case for code that no one can figure out how it gets invoked?
If you look at a CVS repository and identify those files that have high revision numbers, there's a good chance they are full of errors and need to be rewritten.
One visualization is to color code according to its age - old code blue, and new code red - then look at the results. You will often see that the red code clusters, and there are huge regions of blue that have been stable for years. You will also see relatively small clusters of differing shades of red, as people need to keep banging on the same problematic code.
I don't understand what he's trying to say.
for (std::vector<mytype>::size_type i = 0; i < myvar.size(); ++i) { ... }
And hoping that mytype is indeed the type of myvar.
Or creating a whole new class just to use for_each
When in the olden days it was
for (int i = 0; ...
Or could we cheat a little and try
for (size_t i = 0; ...
One of my "favourite" bugs is a simple accessor:
public Integer theVal;
public void setTheVal(Integer theval) {
    this.theVal = theVal;
}
Which, of course, is legal Java but does nothing when the accessor is called, and it can be difficult to spot that the parameter name has the wrong case.
Eclipse warns you if you have just made this mistake. Man, I love that IDE!
And I think there is a c++ plug in for it.
Stupid people have obviously never worked with big, complicated software before. Take something like gcc or gdb. You have a bunch of functions that you implement to support a new target and give them to gdb so it can call them. These have a defined API so you have to implement them with the same arguments. If you don't use one of the arguments, you get a compiler warning. Hence a=a. The compiler will optimize all that redundancy out anyway.
In Soviet Russia, hot grits put YOU down THEIR pants.
No wonder I use FreeBSD. They only have one pointy hat to pass around, not one for every developer.
Try this: pmd
Like most aphorisms, you can argue this, but my point is this - every line of code in a program is a potential bug. Every line of code requires a bit more grey matter to process, making your code just that much more difficult to understand, debug, and maintain.
So I ruthlessly remove dead code. Often, I'll see big blocks like this:
#ifdef old_way_that_doesnt_work_well
blah;
blah;
blah;
#endif
And I will summarily remove them. "But they were there for archival purposes - to show what was going on" some will say. Bullshit! If you want to say what didn't work, describe it in a comment. As for preserving the code itself - that is what CVS is for!
By stripping the code down to the minimum number of lines, it compiles faster, it checks out of and in to CVS faster, and it is easier to understand and maintain.
I will often see the following in C++ code:
void foo_bar(int unused1, int unused2)
{
    unused1 = unused1;
    unused2 = unused2;
}
And I will recode it thus:
void foo_bar(int , int )
{
}
That silences the "unused variable" warning, and makes it DAMN clear in the prototype that the function will never use those parameters. (True, you cannot do this in C.)
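For C, a commonly used idiom is to cast the parameter to void, which compilers such as GCC and Clang treat as a use without generating any self-assignment. A minimal sketch, with a made-up callback name and signature:

```c
/* Hypothetical callback whose signature is fixed by some API.
 * This implementation only needs event_id; flags is ignored. */
int on_event(int event_id, int flags)
{
    (void)flags;   /* marks the parameter as deliberately unused */
    return event_id;
}
```

Unlike flags = flags, the cast is not an assignment at all, so a redundancy checker looking for self-assignments has nothing to flag.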
Code should be a lean, mean state machine - no excess fat. (NOTE - this does NOT mean removing error checking, #assert's, good debugging code, or exception handlers).
www.eFax.com are spammers
You could use GhostScript.
And if the authors had used pdfTeX, their PDF would actually be readable on-screen.
So that explains it: Duplicate articles are the editor's way of debugging Slashdot. Cool!
-- @rjamestaylor on Ello
The best way to avoid problems is peer reviews and well written unit test plans. Point blank.
...
If you have good reviewers (someone who knows what the code is supposed to do and could write it themselves), code reviews will pick out at least 50% of the bugs. With a well written unit test (test all possible paths and all possible functionality) you will find all your functionality bugs (hopefully).
These are two things that you can implement easily and effectively. They also provide an early framework for some software process.
...using this thing here:
http://pmd.sourceforge.net/cpd.html
CPD uses a variant on Greedy String Tiling to find duplicated code in Java programs. There's also a JavaSpaces version since finding duplicated code is fairly parallelizable....
Yours,
Tom
The Army reading list
Thanks for the link!
This isn't a revelation for me. For some time it's been clear to me that sloppy code is a sign of sloppy thinking generally. I see code that is formatted oddly, indented irregularly, and I know I'm in for trouble. Throw in unused variables, or redundancies like the article talks about, and for sure that program is going to have nasty bugs.
Using Redundancies to Find Errors
So this is what's up with redundant posts on /. !!!
They're looking for errors!
After analysing the data from this experiment, the error is clearly apparent; Timothy.
Thank you, thank you, you've been a great audience. I'm here all week!
- I am made of meat.
So it's ok to return from the first check without the lock, and ok to return from the second check with the lock?
Methinks it is YOU who needs a hint, preferably delivered with a clue by four.
Infuriate left and right
Now, if you'll excuse me, I've got to get back to my text editor project.
There are valid reasons for rewriting code because it's "not invented here". One is for homework, where you are expected to write all the code in the program by yourself with no outside help. Another is that no code exists under an appropriate free software license. Another is that the popular text editors do not support features of your constructed language such as text-direction or multi-color glyphs.
Will I retire or break 10K?
"The minor errors were operations that seemed to
follow a nonsensical but consistent coding pattern.."
Reality is what we taste, smell, see, hear and touch yet we cannot comprehend it...only approximate it.
These researchers obviously have a good hold on compiler technology, since they implemented their checkers with xgcc. They also seem to understand logic quite well, since their code uses and extends on gcc's control-flow analysis algorithms. And they do, actually, understand what's going on here.
As for your particular example, the check really is redundant, but it was almost definitely intentional. It's true that another processor could change the cam variable between the first check and the lock -- but taking the first check out would have no impact on the functionality or correctness of the code. It's just a performance enhancement so that the routine can exit early in the error case, without the overhead of locking the lock. Removing the bit of redundant code would just add a little overhead to the error case.
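The pattern being argued about can be sketched in plain C as follows. Everything here is a stand-in: struct cam, toy_down() and toy_up() are hypothetical replacements for the driver's structure and for down_interruptible()/up(), and the sketch assumes cam is a function parameter (a local copy of the pointer), which the quoted fragment does not show. Unlike the quoted fragment, it also releases the lock before the second error return.

```c
#include <errno.h>
#include <stddef.h>

struct cam_ops { int (*open)(void); };

struct cam {
    int busy;                    /* stand-in for the real busy_lock */
    const struct cam_ops *ops;   /* may be cleared elsewhere, e.g. on unplug */
};

/* Stand-ins for down_interruptible()/up(). A real semaphore would
 * sleep; this one just fails (nonzero) if the lock is already taken. */
static int toy_down(struct cam *c)
{
    if (c->busy)
        return 1;
    c->busy = 1;
    return 0;
}

static void toy_up(struct cam *c)
{
    c->busy = 0;
}

int cam_open(struct cam *cam)
{
    /* Early exit without touching the lock. The !cam half guards the
     * dereferences that follow; the !cam->ops half is only a shortcut. */
    if (!cam || !cam->ops)
        return -ENODEV;

    if (toy_down(cam))
        return -EINTR;

    /* The check that counts: ops is re-read with the lock held.
     * cam is a local copy, so it cannot have become NULL here. */
    if (!cam->ops) {
        toy_up(cam);   /* do not return with the lock still held */
        return -ENODEV;
    }

    toy_up(cam);
    return 0;
}
```

In this form only the repeated !cam test is redundant in the strict sense; the repeated !cam->ops test is the one the locking actually makes meaningful.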
In short, their checker found a true redundancy. They may have not realized its purpose since they don't have specific experience with this kind of parallel programming, but it's a redundancy. If you had actually read the paper instead of merely glancing over it, you would have seen that their checker respects the volatile nature of variables declared as such -- the checker is fully aware that a second thread can change the value between one operation and the other -- and it still figures out that the check is redundant.
Here's a hint: don't go around claiming people are fools unless you've got some evidence. These guys had hundreds and hundreds of bugs to go through, and expecting them to perfectly analyze every last one of them is unfair.
Oh, and -10 points for using "BZZZZZZT".
I suspect a lot of those redundant code instances were caused by people recycling code via cut-and-paste - the quick and dirty, programming-by-pattern-matching approach we all take when we just want to get something done without taking the time to deeply understand someone else's code.
The current word for making this stuff go away is "refactoring" - noticing redundant patterns in code and abstracting them out. The problem is that refactoring is hard in the most popular languages (C, C++, Java) because of their requirements for picky, static type declarations on all the built-in types. Type declarations are necessary for low-level bit mashing and the last order of magnitude improvement in performance, but most of the time they make it hard to refactor code by pushing implementation details too far down into it.
Common Lisp got this one right a long time ago. Objects in CL are strongly typed, and variables are untyped by default, but you can add declarations which the compiler can then use to speed up code. The CL developer's mantra is "make it work, profile it, then make it fast." That way, you only do the declaration work on the measured hot spots in the code.
(BTW, to all you Smalltalk/Dylan/other-dynamically-typed language fans out there - peace, brothers and sisters. Your stuff is good too; my beef is with strong static typing that cripples abstraction.)
To a Lisp hacker, XML is S-expressions in drag.
Will you get off your fucking high horse? What the hell is it with all this "BZZZZZZZZZT" and "Nice try kids" bullshit? Grow a fucking brain you asshole.
I'll spell it out for you: If what you said was true, and cam can be set to NULL after the check but before the lock is called, then cam->busy_lock *WOULD CRASH*.
Therefore, the second check, after the lock, *is* redundant and *is* overly cautious, and you *are* an arrogant asshole who should think a little bit more before bashing legitimate research in a very childish way, hoping for that karma bonus from an unearned +5 moderation. Fuck you.
And if you would have even read more closely, you would have noticed that the "novice programmer" comment you got so hot-headed about *WASN'T EVEN SPECIFICALLY DIRECTED AT THAT PIECE OF CODE*. Here it is again: "This includes cases where the programmer checks the same condition multiple times within very short program distances. We believe this could be an indication of a novice programmer..." The actual code example is in the *next paragraph* as an *example* of checking the same condition multiple times. They *NEVER* said that that programmer was a novice. You, however, *are* an asshole as I said above.
The end.
Would you question other stories a journalist writes, when you find one that has typeos and gramatical mistakes in it?
[Insert obligatory jab at slashdot here]
I'd suggest you don't use Slashdot as your only news source, or you will suffer permanent brain damage.
Hmm, are you sure? If so, Java must have different scoping rules than C++ (I don't know any Java), because the following C++ program prints "5 6", since the formal parameter name makes the class variable inaccessible. It's still rather confusing and should be changed, of course, but the meaning is clear if you understand C++'s scope rules.
The ocean parts and the meteors come down
Laid out in amber, baby.
these lint-like tools have always been an irritant. stuff like flagging a non void-casted strcpy or printf call which has been mentioned below, etc. and eventually a stellar C programmer can create code that can be compiled but not actually understood by the compiler itself.
it's been my experience that the best code is written by a fully engaged programmer who actually reads the code as they create it. this sounds blindingly obvious but i know how easy it is to go on autopilot when coding what seems like a trivial (or even complex) piece of code and that's probably the source of 95% of bugs (outside of architectural defects of course).
my own case in point. in the last 3 years i've bootstrapped out of nothingness a series of libraries totaling around 7 megs of source that runs in 6 operating systems. a very complex data transformation product (commercial) has been built using those libraries, the first of many products i hope (sorry i'm not partial to open source). in that time maybe 3 - THREE - bugs have been found in probably 3/4 million lines of code. This is supposed to be alpha code but it's more stable (and fast) than most v5 code bases. The secret? read the damn source code. put your finger on the monitor and read every damn line of it and don't glaze over when you do. Then do it again. i think i've done that 4 times for each module, and they seemingly always come out sterilized. if you truly want bug free code the *original* programmer has to pay the price in time. and simply testing just doesn't cut it.
this is all IMHO of course and your mileage may vary.
Take another look at the example bug, which is a problem in Java and C++ - you overlooked the buggy variable name change in the assignment ("theVal" instead of "theval"), which means the method argument is ignored. You can introduce the bug in your code thusly:
void setA (int A) { this->a = a; }
though of course it's more insidious when the case shift is hidden in an intercap. Hmm. A single character variable is better. Let's not go there.
This is information that's of use to a team lead trying to decide what modules should receive some attention. Ideally before QA, or worse the customer, finds bugs related to that module.
It's a bit of a truism that sloppy code is buggy code. This paper is showing some solid evidence of that. It's not surprising. If there are simple mistakes in the code that could have been fixed with just a small amount of attention, that indicates that the probability of deeper errors is also greater. It's a Bayesian filter for code, instead of spam.
One of my favorite metrics for finding buggy modules is rather simple. Number of commits on the module. In most of the projects I've worked on, I've found high correlation with number of commits and number of outstanding bugs related to a module. It indicates that the whole module needs to be reworked, rather than the small, incremental, patches that have been applied to it so far. Or that the requirements have drifted from what the code was originally spec'd against, and the code is no longer a good fit.
I've had some programmers argue that it's OK to leave dead code in a system, or unused variables, since the compiler will take care of them. The problem is that the compiler is not the most important audience for the code. The code needs to speak to other programmers. And if it's harder to understand than it needs to be, then the code doesn't do its job.
My bad. This code could never be optimized the way I suggested.
If cam==0, cam->ops is meaningless. In user space it would result in a SIGSEGV. I don't know what would happen in the kernel (oops, panic?)
The way C works the second test is done only when the first gives 'true'. That allows writing the two tests in one 'if' statement.
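A minimal C sketch of that short-circuit behaviour, for illustration only. The struct layouts and the helper name cam_has_ops are made up here and are not taken from the kernel source; only the names cam and ops come from the comment above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver structures; the real layouts differ. */
struct cam_ops { int id; };
struct cam { struct cam_ops *ops; };

/* && evaluates its left operand first and skips the right one when the
 * left is false, so cam->ops is never read through a null pointer. */
int cam_has_ops(const struct cam *cam)
{
    return cam != NULL && cam->ops != NULL;
}

/* Small self-check of the three interesting cases. */
void cam_has_ops_demo(void)
{
    struct cam_ops ops = { 1 };
    struct cam c = { NULL };

    assert(cam_has_ops(NULL) == 0);   /* first test fails, second is skipped */
    assert(cam_has_ops(&c) == 0);     /* first test passes, second fails */
    c.ops = &ops;
    assert(cam_has_ops(&c) == 1);     /* both tests pass */
}
```

Swapping the operands (cam->ops first, cam second) would dereference the pointer before checking it, which is the case the SIGSEGV remark is about.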
Personally, I tend to use either no name, or a descriptive name in function declarations, depending on how obvious the purposes of the arguments are. Then I use single-letter names in function definitions.
It's much harder to get a clash then, since the names in the declaration don't matter at all, and the names used in the code are hard-to-confuse single letters. But then, I've mostly been writing math code lately, where each function doesn't have many arguments, and single letters make sense anyway.
The ocean parts and the meteors come down
Laid out in amber, baby.
This paper shows why coding should not be considered an easy job!!! and why experienced coders should be paid much more...
It also shows why C is a difficult programming language, no matter what others say.
It would also be quite interesting to apply this paper's checks to MS Windows and then compare the results to Linux. Then the power of open source would be revealed.
Maybe the authors of the paper could apply it to MFC...
On about page 3 of the article the author is discussing idempotent errors (like assigning a variable to itself). The author points out macros as being a particularly rich source of "false positives" because the expansion can be reduced to an idempotent statement.
As an example he refers to the following line:
"x = ntofs(x)"
which on some machines will reduce down to x=x. The author states that such things were ignored.
It would seem to me that this is still an error, and that the way to resolve it would be to redesign the macro to be something like:
ntofs(x,x)
or
ntofs(x)
that would expand to nothing on machines where x is already in network order. At the very least, this would improve performance on those machines (provided the compiler doesn't make a similar check and remove such lines on its own). Regardless, I would think that would be a better way to resolve the "bug" than to just ignore it.
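A rough sketch of what such an in-place macro could look like in C. NET_TO_HOST16_INPLACE and HOST_IS_BIG_ENDIAN are invented names for illustration, the 16-bit width is an assumption, and none of this reproduces the macro from the paper or the kernel's own byte-order helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical build-time switch; real code would derive this from
 * platform headers. Defaults to little-endian for the sketch. */
#ifndef HOST_IS_BIG_ENDIAN
#define HOST_IS_BIG_ENDIAN 0
#endif

#if HOST_IS_BIG_ENDIAN
/* Host order already matches network order: expand to a no-op
 * instead of the self-assignment x = x. */
#define NET_TO_HOST16_INPLACE(x) ((void)0)
#else
/* Swap the two bytes of a 16-bit value in place.
 * Note: x is evaluated more than once, so it must have no side effects. */
#define NET_TO_HOST16_INPLACE(x) \
    ((x) = (uint16_t)((((x) & 0x00ffu) << 8) | (((x) >> 8) & 0x00ffu)))
#endif

/* Small self-check: the conversion is its own inverse on either kind of host. */
void net_to_host16_demo(void)
{
    uint16_t v = 0x1234;

    NET_TO_HOST16_INPLACE(v);
#if HOST_IS_BIG_ENDIAN
    assert(v == 0x1234);   /* no-op on a big-endian host */
#else
    assert(v == 0x3412);   /* bytes swapped on a little-endian host */
#endif
    NET_TO_HOST16_INPLACE(v);
    assert(v == 0x1234);   /* applying it twice restores the value */
}
```

With this shape the call site never contains x = x on any machine, so a redundancy checker would have nothing to flag and nothing to suppress.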
Just my USD $0.02
As a bonus the OR form does look more like the assembly output as well as being more readable, imo.
Kudos to you! You must be seriously bored though. Perhaps a hobby (or a job?) is in order. :)
Lasers Controlled Games!
I took a class he was teaching last spring. The girls in the class were all drooling over him.
Java actually makes some dead code a compilation error (The Java Language Specification 14.20). The authors claim redundant assignments signal the most bugs. Interestingly, I think some of them can be classified as dead code as well. Fig. 3 obviously contains dead code. For Fig. 2, if the example were a more straightforward linked-list traversal instead of a deletion, it would become a dead code case. E.g.
for (entry = xxx; entry != NULL; entry = entry->next) {
/* next = entry->next; the redundant assignment is not needed for simple traversal */
...
return 0;
}
The entry = entry->next would be dead code. No redundant assignment in this case.
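To make the traversal case concrete, here is a small C sketch. It is not the code from the paper's Fig. 2; struct entry, contains_buggy and contains_fixed are names invented for this example.

```c
#include <assert.h>
#include <stddef.h>

struct entry { int key; struct entry *next; };

/* Buggy: the unconditional return ends the loop during the first
 * iteration, so the increment entry = entry->next is never executed. */
int contains_buggy(struct entry *head, int key)
{
    struct entry *entry;

    for (entry = head; entry != NULL; entry = entry->next) {
        if (entry->key == key)
            return 1;
        return 0;   /* misplaced: belongs after the loop */
    }
    return 0;
}

/* Fixed: the failure return sits after the loop, so every node is visited. */
int contains_fixed(struct entry *head, int key)
{
    struct entry *entry;

    for (entry = head; entry != NULL; entry = entry->next) {
        if (entry->key == key)
            return 1;
    }
    return 0;
}

/* Small self-check on a two-node list. */
void traversal_demo(void)
{
    struct entry second = { 2, NULL };
    struct entry first = { 1, &second };

    assert(contains_buggy(&first, 1) == 1);
    assert(contains_buggy(&first, 2) == 0);   /* second node is never reached */
    assert(contains_fixed(&first, 2) == 1);
}
```

The buggy version still compiles cleanly in C, which is the point: the dead increment is the only visible hint that the loop body is wrong.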
Wai Yip Tung
Am I missing something?
Did these guys just find out that people who write redundant and unreachable code also write generally crappy code, which is likely to contain logic bugs?
Isn't this just plain obvious? Isn't it like saying that people who don't brush their hair, probably don't brush their teeth either, and don't wash their hands after using the bathroom?
Where do I sign up for some of this research grant money?
The REAL jabber has the user id: 13196
What you do today will cost you a day of your life
Java detects some dead code and treats it as a compile error (Java Language Specification 14.20 - Unreachable Statements).
In the paper, some cases classified as redundant assignments are also dead code cases. Fig. 3 is a dead code case. Fig. 2 would be a dead code case instead if it were a more straightforward linked-list traversal. E.g.
the expression entry = entry->next would become dead code.
Wai Yip Tung
When you find more than one or two of these redundancies (or other errors) per screen of code, the way to fix these and other problems is to throw the bit-rotten code away and rewrite it from scratch (or from specifications).
The grammar and logic redundancy checking system is designed to check for errors caused by overly ambitious or inexperienced paper writers. Our relatively uncontroversial hypothesis is that confused or incompetent writers in English tend to make mistakes. Worse yet, they repeat themselves. We experimentally test this hypothesis by taking a large sample of words written by two programmers and subjecting them to redundancy analysis. In our tests, each page of the academic paper was found to be 45% to 100% more likely to suffer from mistakes in logic and grammar than papers written by graduates in the Humanities. This difference holds across different types of programmers.
With the exception of a few stylized cases, programmers are generally attempting to perform useful work. /* BUG - This
statement always evaluates to true. (Or does it?) */ (p.1) /* BUG -- missing ELSE case */ (p.1)
If they perform an action, it was because they believed it served some purpose.
Both statements say the same thing.
This difference holds across different types of redundancies. (p.2)
Of course differences have the distinct quality of being different. This is the sign of an overly cautious academic writer, possibly attempting to make his prose seem non-volatile.
Redundancy correlates with confused programmers who will probably make a series of mistakes. (p.7).
It strongly suggests that redundancies often signal confused programmers. (p.7)
Redundancies seemed to flag confused or poor programmers. (p.9)
Should we conclude that redundancies, confusion, and programmers are highly correlated?
Assuming programmers do not do redundant permission checks... (p.9).
Wasn't it the thesis of this paper that programmers make redundancy errors, and those errors are significant?
In To appear in IEEE Symposium ... (p.10)
Redundancy: in to appear in ?
2) switch conditions with impossible case's (p.5)
The plural of case is cases. True in all cases.
It is hard to believe that this code was ever tested. (p.4)
Or the paper proof-read. Try eliminating that. It's superfluous.
Xie and Engler cite themselves three times in their endnotes.
One final thought: did Xie and Engler run their redundancy checking code on their own redundancy checking code?
Was browsing through the source code of init today and stumbled across this:
...
<code src="/usr/src/linux/init/main.c">
unsigned long loops_per_jiffy = (1<<12);
#define LPS_PREC 8
void __init calibrate_delay(void)
{
loops_per_jiffy = (1<<12);
</code>
Any reason for the duplication?
Jw