The 25-Year-Old BSD Bug
sproketboy writes with news that a developer named Marc Balmer has recently fixed a bug in a bit of BSD code which is roughly 25 years old. In addition to the OSnews summary, you can read Balmer's comments and a technical description of the bug.
"This code will not work as expected when seeking to the second entry of a block where the first has been deleted: seekdir() calls readdir() which happily skips the first entry (it has inode set to zero), and advance to the second entry. When the user now calls readdir() to read the directory entry to which he just seekdir()ed, he does not get the second entry but the third. Much to my surprise I not only found this problem in all other BSDs or BSD derived systems like Mac OS X, but also in very old BSD versions. I first checked 4.4BSD Lite 2, and Otto confirmed it is also in 4.2BSD. The bug has been around for roughly 25 years or more."
Isn't the first entry usually '.'? How often would that be deleted?
...of the superiority of Microsoft.
but they had more important things to do. At least until Balmer started throwing chairs.
After all, someone closed a 25-year-old bug... how many hidden bugs will remain that way in OS/2 Warp? Windows 95? Other proprietary systems?
And sometimes, people who collaborate on a given open-source project are from different parts of the world and not subjected to corporate rules. So when you get creative people with different mindsets and no reason to pull back their punches when they see something wrong, you have quality.
Will we be hearing a lot of "How did that get there?" as versions of BSD get patched up?
That is, wouldn't somebody in the past quarter century have exploited this bug to hide a malicious executable file or two?
Not sayin', just wonderin'...
Height: 38U, Weight: 0 Newtons, Eyes: #0000FF, OS: Gray Matter 1.0 (Alpha)
Isn't it? The bug WAS found, wasn't it?
How many "eyes" were watching BSD systems use Samba for a DOS filesystem? Seems to me, someone saw behavior and exactly because it was open source, looked into it, found the coding error and filed a bug report. It will be fixed, because everyone now knows about this, and that too is a side effect of open source, even if it's related to the politics.
From the sounds of it, this was a bug that was not triggered very often. When it was finally triggered, investigated and fixed the person who found it released the info publicly, thanks to the beauty of Open Source, and everyone affected, commercial entities and FOSS users using the code alike, benefited. If this were a proprietary system that were licensed out to various companies stricken by NDAs etc. it's quite likely that if one company discovered the bug the others would never learn about it.
This is the power of Open Source!
With all those eyes looking at the code, stuff like this gets ID'd and fixed LICKITY SPLIT!
(runs and hides)
If you want news from today, you have to come back tomorrow.
This bug has been around for a long time, but is only visible if you have large directories and delete files from them in between calls to readdir and seekdir. This is quite uncommon behaviour, and was incredibly uncommon 25 years ago when filesystems were much smaller and directories almost never contained enough files to require more than one or two disk blocks to store the directory.
When the Samba people found it, they decided to just code a work-around and not bother to report it to any of the BSD teams. If they had done, it would probably have been fixed in 22 years.
Now that it has been fixed in OpenBSD, the change can easily be taken and incorporated into FreeBSD, NetBSD, DragonFlyBSD and Darwin.
I am TheRaven on Soylent News
Except that the bug had been triggered many times before, seeing as how Samba had code in place to work around it.
Sometimes you need some heavier unit tests just to pick up little bugs like these. Just because everyone assumes code is correct and a few mission critical systems have been working fine on it, doesn't mean some corner case won't blow the world away tomorrow.
It is official. Netcraft now confirms: *BSD is dying Get over it and move on with your lives.....
This bug has been around for a long time, but is only visible if you have large directories and delete files from them in between calls to readdir and seekdir.
But that's exactly my point, isn't it? The bug was only "visible" through its behavior, not its manifestation in code. The shallow bugs argument basically says that if enough people stare at the code, they will find the bugs. Clearly that did not happen here.
Whether the bug fix can propagate rapidly has nothing to do with what I'm talking about. I'm not trying to disparage the concept of open source, I'm arguing that the shallow-bugs argument should be rejected.
Bug tracking software missed this because it's bug #1. lol.
The most telling thing in TFA for me was that the bug had been identified by the Samba team and a workaround implemented for Samba.
Surely both the samba communities and the *BSD communities are active enough that this could have been passed on for further investigation by the *BSD crowd? (Sure, samba probably would still need a workaround, particularly given the long uptimes and widespread deployment of *BSDs)
I know nothing of the devs at Samba and *BSD, but seems a bit strange. Perhaps they did try..
Meanwhile, congrats to Marc on fixing a bug. One of the most touted benefits of open source (whatever your license) code.
--Q
If no one has cared enough to fix it for 25 years, I'm guessing this should be rated as "inconsequential."
Must have been a really slow news day at OSNews.
Invenio via vel creo
One would think that after all these years, Windows source would have leaked, Microsoft being what they are. Everyone knows that your most secret secrets are the first things to leak... And yes, I've seen the "secret MS code" joke.
This is like saying global warming either does exist because today was the hottest on record, or does not exist because today was the coldest on record. Why are these analogous? Because in both situations, you're only considering one data point, which does not even begin to indicate a trend.
If you define BSD as a collection of bugs, this story proves that BSD is dying.
--
make install -not war
seekdir and readdir are unreliable when you're modifying the directory being read. The API requirement is that readdir return each directory entry that existed at the time the directory was opened exactly once. In the traditional UNIX implementation, the directory is simply a sequential stream of bytes; this is pretty easy -- you can simply lseek to the position that telldir returned. However, other systems don't use something that simple -- Mac OS X, for example, uses a B-Tree for the catalog file. Worse, they use a single catalog file for the entire filesystem's catalog, meaning that any modification (adding, removing, or renaming files anywhere) causes the layout to change.
And telldir only returns a long, which -- in most implementations -- is smaller than off_t, so a simplistic implementation can have some problems. Of course, having a directory that's larger than 4GBytes could result in some other problems 8-).
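The width problem mentioned above can be sketched in a few lines. This is only an illustration under stated assumptions: `LONG_BITS = 32` assumes a platform with a 32-bit long, the helper names `as_long` and `offsets_collide` are hypothetical, and masking to an unsigned value is a simplification of what a real narrowing conversion to a signed long would do.

```python
LONG_BITS = 32  # assumption: long is 32 bits wide while off_t is 64 bits


def as_long(offset):
    """Keep only the low LONG_BITS bits of a byte offset (simplified, unsigned)."""
    return offset & ((1 << LONG_BITS) - 1)


def offsets_collide(a, b):
    """True if two distinct byte offsets look identical once narrowed."""
    return a != b and as_long(a) == as_long(b)
```

Under these assumptions, a simplistic telldir that returned the raw byte offset could not tell offset 512 apart from offset 512 + 2**32, which is the "larger than 4GBytes" case.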
You see what you're looking for, most of the time. This sounds like a subtle bug that you're not going to find until you go looking for it; it's hard to invoke under normal usage patterns. Nobody stared at that code looking for this problem until now. But if it was closed source, the guy who fixed it wouldn't have been able to look at it and find the problem.
A quick googling of "many eyes make all bugs shallow" brings me the more complete statement that adage is simplified from: "Given a large enough beta-tester and co-developer base, almost every problem will be characterized quickly and the fix will be obvious to someone." (Linus via ESR). Clearly this 25-year-old bug is one of the exceptions that calls for the 'almost'.
egypt urnash minimal art.
Erm. That's not what "many eyes make bugs shallow" means.
Well. Just reading the source is part of it, but not all.
Fact is, if I run into odd behaviour when testing/using - if the source is available I can read it, I can breakpoint.
I cannot do that with a binary.
So yes. Things did occur as they were supposed to. Someone found something odd, they were able to look at code in question, and fix it.
The shallowness is the fact that there is a direct connection between the thousands of testers/users and the code in question.
Instant turnaround. No "user reports behaviour in detailed fashion, including testcase, to some corporate e-mail address, and maybe it eventually gets to a developer three layers down who may be able to figure it out and fix it if he has the time."
-- perl -e'print pack"H*","6e656d6f406d38792e6f7267"'
After all it had survived for 25 years. Are we sure that it really is dead and did not just scurry away when the light was shined on it?
I wonder how it still remained in MacOSX :>
Patents Drive Free Software as Hurricanes Drive Construction Industry
Can Marc look at that file dialog in GNOME? That's been around for a while as well ...
After this long would this not be considered a feature? :)
Considering how old this bug is, and how much work-around code probably exists as a result, I wonder how many new bugs this bug fix will create.
I don't think it should be rejected, but taken as part of the set of advantages OSS has. After all, it is one argument of many, which sometimes applies and sometimes doesn't. There are bugs in the code itself and others in the way the code is supposed to work. The first you can catch by looking at the code; the second is more difficult if you don't know what the code intends to do, or don't have lots of experience with the language and paradigm. The argument that many eyes make bugs in the code shallow isn't a rule that could apply to bugs in the program itself.
What really intrigues me is that this had been discovered, years earlier, by the Samba folk.
Did the Samba folk not tell the BSD folk of the issue?
It is not coincidental that I happened to find this page on stumbleupon hours before it appeared on the slashdot front page. Is there such a grave deficiency of technology news that slashdot has to rely on social bookmarking sites to generate news-worthy material??
"Many-eyes-shallow-bugs" == absolute statement "Many-eyes-yet-bug-wasn't-found" == clear contradiction of absolute statement.
3 thoughts on this:
1. I think this bug would be classified "archeological".
2. The question now is what happens to the Samba work-around patches. Now that the bug is fixed, do the patches cause a side-effect (i.e. "a new bug")?
3. This gives rise to a new meme of nerd insults. "You call yourself a programmer? Why I've fixed bugs older than you!" Of course, only one man is entitled to use that line.
I see a lot of people making excuses for this bug not being fixed sooner. If this happened to Windows the comments would be overflowing with "MS SUCKS" and "Switch to Linux/BSD" comments. Yet, when it happens to BSD the comments are more along the lines of "this is no big deal..." and "so what... it still got fixed didn't it?"
Hi Kids! Today's word is: Hypocrisy!
I don't disagree that open source is less prone to have bugs that don't get fixed for long periods of time. Nor do I disagree that open source bugs are easier to find fixes for. However... this seems to indicate that the difference between open source and closed source in this regard is not as big as open source proponents have been trying to claim. All the excuses and explanations in the world won't change that this is EXACTLY what open source proponents have been saying wouldn't happen with open source.
Were you to claim it was an overstatement, I'd agree. That doesn't seem to be what you're claiming.
"Many eyes make all bugs shallow" is clearly wrong, in detail. There do exist bugs that can't be traced that way, e.g. (The one I'm thinking of is the infamous C compiler that was jiggered to insert a binary mod into itself whenever it recompiled itself.)
However, being wrong in detail isn't the same as being wrong in principle. Very few statements that use an un-circumscribed "all" are actually TRUE!!!, but that doesn't keep them from being true. You just need to use a bit of sense in interpreting them. Once you do, you realize that the "Many eyes..." statement is a good first approximation to the truth, and possibly a good second approximation. (I.e., it's pretty close to always true.) Depending on how you define many and shallow, of course. With proper (and reasonable) definitions I could make an argument that this particular case was, indeed, an example of that rule working out in practice just as it claims to work. (My chosen approach would be to adopt a historical perspective, but there are other approaches that would also work.)
OTOH, if you adopt a definition of many == three or more, and shallow == found within a day then the rule is a massive failure. To me this would reveal more a misunderstanding of the rule than a defect in it, but it would be a legitimately defensible interpretation of it from the viewpoint of simple English rules of speech.
I think we've pushed this "anyone can grow up to be president" thing too far.
Perhaps they didn't feel like doing the "no, it's supposed to work like that, you're wrong" dance.
Sure it's found, but after 20+ years? That's not what I call a good approach.
"Many eyes make bugs shallow" is equivalent to the "infinite number of monkeys..." thing ;).
In my experience it's better to have quality than quantity when it comes to the eyes used for finding bugs.
Any idiot can tell you about obvious bugs, and it's kind of a waste of time to see 1 million duplicate bug reports, because it's too slow to search through 100 million other bugs (with dupes) for dupes.
For UI stuff, you get the naive (as in not yet exposed to your evil software ;) ) users in and watch them, and even then you MUST have _trained_ eyes to watch them. The trained eyes can often spot problems the naive users are experiencing - the naive users may not even realize they are experiencing problems or realize what is wrong...
However... You are mistaken. Clearly that DID happen here. That is exactly what happened. It took a long time, granted, but it was found, it was fixed, and (seeing as no one was really bothered by it too much) it was even done in a "timely manner." (Albeit a very long time but it didn't seem to impact anyone to any great length.)
The bug was found, the code was open, the bug was fixed. Hell, it isn't even newsworthy. It's just a squished bug that had no real impact in the majority of people's experiences. The impact was so small that people coded around it, so be it. Open source doesn't even demand that it be bug free. Coding around an existing bug is perfectly okay (I think). Fixing the bug may have taken valuable developer time away from their core projects for all I know. Either way, it was found, it was fixed, and because of the open source nature of the software these two things were possible.
"So long and thanks for all the fish."
Hmmm....if the bug was found in 4.2BSD, then how do we know that that bug was not also in original AT&T UNIX that 4.2 BSD is derived from? One could always look in the source released by Caldera (now known as "The SCO Group") some years back.
...because currently, no Linux bug could possibly be older than roughly 17 years. :-)
Heck, how many other bugs have been fixed over the years?
These detracting arguments smack of FUD mongering...
Seven Days with Ubuntu Unity
You write: "This bug has been around for a long time, but is only visible if you have large directories and delete files from them in between calls to readdir and seekdir." Go back and read the link: the bug can be reliably reproduced in a directory with only 27 files, if the right file is deleted (the file whose directory entry is the first file in the second block). You assume that the Samba people didn't report the bug to any BSD people, but that's unclear. The problem is that they didn't have a reproducible way of demonstrating the bug, which tends to lead to responses like "we don't believe you" from upstream developers.
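The point that directory size is not the trigger can be illustrated with a toy model of the seekdir()/readdir() interaction described in the summary. This is a minimal Python sketch, not the BSD libc source: the class `ToyDirBlock`, its methods, and the list-of-slots layout are hypothetical stand-ins for a single directory block, and `seekdir_fixed` is only one plausible repair, not necessarily the patch that was committed. It assumes, as the summary says, that a deleted first entry stays in the block with its inode set to zero and that seekdir() repositions by calling readdir() repeatedly.

```python
class ToyDirBlock:
    """Hypothetical model of one directory block as a list of [inode, name] slots."""

    def __init__(self, names):
        # Inode numbers start at 1; inode 0 marks a deleted slot.
        self.slots = [[i + 1, name] for i, name in enumerate(names)]
        self.loc = 0  # index of the next slot to examine

    def telldir(self):
        return self.loc

    def readdir(self):
        # Return the next live name, silently skipping deleted slots.
        while self.loc < len(self.slots):
            inode, name = self.slots[self.loc]
            self.loc += 1
            if inode != 0:
                return name
        return None

    def delete_first(self):
        # The first slot stays in place with its inode set to zero.
        self.slots[0][0] = 0

    def seekdir_buggy(self, target):
        # Rewind, then call readdir() until the position reaches the target.
        # One call may consume a deleted slot plus a live one, so it can overshoot.
        self.loc = 0
        while self.loc < target:
            if self.readdir() is None:
                break

    def seekdir_fixed(self, target):
        # One possible repair (an assumption): step slot by slot, deleted or not.
        self.loc = 0
        while self.loc < target and self.loc < len(self.slots):
            self.loc += 1


def demo(fixed):
    d = ToyDirBlock(["a", "b", "c"])
    d.readdir()        # consume "a"
    pos = d.telldir()  # position of "b"
    d.delete_first()   # "a" is deleted between telldir() and seekdir()
    (d.seekdir_fixed if fixed else d.seekdir_buggy)(pos)
    return d.readdir()
```

In this model `demo(fixed=False)` returns "c" (the third entry) while `demo(fixed=True)` returns "b", and three entries are enough to show it. What matters is which slot was deleted, not how many files the directory holds.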
What the fuck is slashdot doing popping up "please take a survey" windows on me?
I am sure you will agree that the correct statement, sans flamebait modifications, does not warrant a "clear contradiction", as claimed by the many detractors of FOSS who are jumping at this opportunity to point out an example of a fixed bug that was not necessarily a security risk and say, "See, the OSS model is clearly flawed! BSD has a 25-year-old bug that was only fixed now!"
Take off your paranoid hat. Holy crap. I am an open source author myself. I just have always hated this particular argument.
Well then I was mistaken. I did not react out of paranoia however.
BSD has been checked over by 'quality' eyes--when it was used as the basis of NeXT/OSX, for example. They missed it too.
If the code wasn't open (i.e. if there weren't many eyes), this bug would have remained forever, or at least until the code was dumped.
Am I the only one who thinks it's quite impressive to have 25 year old code still being used and employed on new systems?
What?
(In other words, I believe in all things moderation and think zealotry is absurd and I probably shouldn't be confused with an open source zealot or even an advocate.)
Sounds like you're a bit of a moderation zealot...
Not A Sig
"this bug would have remained forever, or at least until the code was dumped"
The "good old" weaknesses of proprietary code make it likely to get dumped way before 25 years.
a) the company having the source could go bust, or decide to do something different.
b) people could lose track and/or understanding of the source and since it's not "mirrored and documented by everyone", it's gone.
c) someone could decide to throw it away and write a new version from scratch, and so the old version will vanish.
Often you don't have to wait longer than 7 years before the code is dumped.
I have found security bugs in proprietary software, and notified the vendor to get it fixed, but they had trouble getting the fixes right, in the end they only fixed it properly in version 5 (I first found it in version 2, found a different variation of it in version 3 and 4).
Lastly, there aren't that many quality eyes, and they often have more fun things to do, so they just copy old BSD code "as is"...
Why haven't I seen a press release about this? You guys are getting lazy.
(rot13) rpbzbab@tznvy.pbz
All too true - I sometimes joke with my girlfriend about how I'm zealous about moderation or I'm zealous in my anti-zealotry. I'm not sure why she stays with me.
If this were a proprietary system that were licensed out to various companies stricken by NDAs etc. it's quite likely that if one company discovered the bug the others would never learn about it.
That, or the individual who discovers it would be fired, blacklisted, and possibly be arrested and charged with some criminal act.
What?
Is the MPEG Chroma bug. That was created by someone who wrote one of the original MPEG decoders that was eventually sold/distributed to most of the companies making the first DVD players (pre-1993). This one just won't go away either - initially most of the DVD manufacturers refused to acknowledge it even existed (probably because they didn't want to recall millions of DVD players with non-upgradeable firmware). I still see it every now and then on TV (indicating one of the upstream broadcasting companies is still using equipment afflicted with the bug). I notice it most often when diagonal red lines end up staircased like they're poorly interlaced (see pictures in the above link).
That only holds true if "shallow bugs" meant "All bugs are found"...which implies bug-free software, and I sincerely hope that nobody has been zealous enough to make that claim to you.
Generally speaking, nobody can actually produce bug-free software. The goal in dealing with bugs is to asymptotically reduce both their number and severity.
Now the claim I've heard generally associated with the saying "shallow bugs" is that open source will on average approach zero bugs and lowest average severity faster than other methods, due to greater exposure/more people working on fixing them.
I'm not going to weigh in on that claim one way or another--there are plenty of people here who I'm sure can argue the point better than I can. I just wanted to say that if you've heard this claim expressed in terms of averages, be careful with regard to your reasoning when you're trying to refute it. A single exception doesn't negate an average, and I've always heard "shallow bugs" to be a statement about a statistical behavior, not a claim that open source software is universally bug-free past a certain age.
Think about it this way: all software has bugs. A corollary of this is that old software will have old bugs. That's really all we're seeing here.
By saying "hottest on record" you are comparing more than one point, assuming you didn't start recording temperatures today (in which case today would trivially be the hottest).
Maybe they didn't want to deal with the "lol, fuckoff stallman lovers" comments.
Damn you and your insensitivity to my moderate needs!
Surely a bug this old should be reclassified as a scarab beetle?
Steve Ballmer responded saying that not only does Windows have older and more easily accessible bugs than BSD, but that it also holds a patent on leaving bugs unpatched for a significant amount of time.
Samba could have reported it, but perhaps when they noticed in 2005, they also noticed that the July 2003 discussion on freebsd-hackers had already acknowledged the existence of the bug, but never actually fixed it.
GLaDOS for President 2016! "Well here we are again. It's always such a pleasure." -- GLaDOS, 2011
Wuh? Thousands of projects that prove the point, and one single bug that doesn't, so reject the whole argument?
You sound like those people that don't like evolution. The concept of shallow bugs is an approximate description of how things work, not a methodology. Also, if enough people stare at the code and use it, they will find the bugs that matter.
There is a fine line between being a cultivated citizen and being someone else's crop. - A. J. Patrick Liszkie
Because the bug was found in the libdir code in 4.2BSD, which was never in SysV. Every non-BSD system that implemented the same API did it themselves, and won't have the (same) bug.
Innovation happens elsewhere. -- Bill Joy
And most commercial companies have neither. There are only a fixed number (in absolute terms) of quality programmers; what are the odds they are working for your company?
Now MS, Google, etc. may be able to attract some good people, because of the positions and influence they hold, but most companies aren't that lucky.
So is it better to bet that you managed to snag some of these stars, or to leverage open source, where you have a larger quantity of coders that may be so-so but a better chance of having some of the excellent programmers in your community? In the latter case you would also have to worry about the sucky programmers trampling on the work of the stars.
I like the "balmer" tag...
As in "one who balms"?
Or, some Microsoft basher that can't spell?
I don't believe that many in the open source community dogmatically accept the shallow-bugs argument as a universally applicable rule of software development. It seems to me that any reasonable person would agree that
- Not all eyeballs are equal. Theo and Linus are vastly better at code reviews than the average Slashdot reader, for example. So, there is a wide range of eyeball quality.
- Bugs have a wide distribution of subtlety. The bug in this article seems to be high on the subtlety scale.
- The 'with enough eyes all bugs are shallow' statement is universally recognized as a simplification. For example, ESR states that a better statement is:
So, yes, if you start with the sound bite version of a 'law' and apply it to a statistical anomaly, you can 'disprove' the law. We now agree that the sound bite doesn't perfectly correlate with the 'real world'. So, you are now forced to decide if the relationship expressed in the 'law' is uncorrelated (bug fixing uncorrelated with eyeballs), anti-correlated (eyeballs hinder bug fixing) or correlated (eyeballs often help bug fixing). I believe that the law is still a good rule of thumb and that the correlation between 'many eyes' and 'bugs get found' is strong.
Think global, act loco
And not only that, there have been 6 year old *known* bugs which people just refuse to fix (like the Firefox large tooltip bugs).
I completely agree with you on the thousand monkeys. And it might be even worse. It may turn out to be something like the "El Farol" problem: since it is common knowledge that other people will look at the bugs, nobody will care to look at the bugs because, in the end, other people surely have looked at them.
Or, be honest, how many of you have really looked at Firefox (and done a complete verification, validation and accreditation testing after reading the test and then compiling it) before using it?
I know I didn't. After all, the code is there, and surely lots of people have already seen it.
Ubuntu is an African word meaning 'I can't configure Debian'
So, it is just another Open Source bug which the developers did not care to fix because it was not the itch to scratch. Similar to plenty of other bugs which are notified in bugzilla kind of systems and the developers just ignore for months and even years.
Well, not really. If you're saying it's the coldest/warmest day on record then you're comparing it to previous data points. Of course this only works when there are previous points on record, but you wouldn't really be saying it if there weren't, unless you're Captain Spin.
Even so, this still doesn't indicate a trend.
I know this... because Tyler knows this
No matter what legacy app you are using. If it is open source, you could fix it, if the bugfix would break it. Or you could pay someone to fix it for you.
The problem is that the stdio directory scanning routines cache multiple directory entries with a single getdirentries() system call, but then may try to 'seek' into the middle of that buffer later on.
Any filesystem based on a non-linear-file directory format, such as a B-Tree, will simply never produce consistent offsets or indices within such a buffer.
The only way to *REALLY* fix this is to add a cookie field to the filesystem-independent dirent structure (and if your BSD isn't using a filesystem-independent dirent structure, it needs to be first fixed to do that). lseek()ing to a directory cookie works just fine, and always will (or at least will far more robustly than trying to scan a re-cached buffer from getdirentries()).
When DragonFly went to a filesystem-independent dirent structure I very stupidly only added ~40 reserved bits to the dirent structure, instead of the 64 we need to properly implement per-entry directory cookies. I'm still pissed at myself for that gaffe.
In any case, a per-entry directory cookie effectively solves the problem. The only other way to get such cookies, if they can't be embedded in the dirent structure, is to create a new system call similar to getdirentries() but which also populates an array of directory cookies. FreeBSD and DragonFly have kernel implementations of readdir which supply per-entry directory cookies so it is really just a matter of creating the new system call and then making libc use it.
-Matt
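The per-entry cookie idea can be sketched as a toy model. This is a hedged illustration only, not DragonFly's or FreeBSD's kernel interface: the class `CookieDir` and its methods are hypothetical, and using a monotonically increasing counter as the cookie is an assumption made for simplicity (a real filesystem would derive cookies from its own on-disk structure). The property being shown is that seeking by cookie does not depend on any cached buffer index, so it survives deletions.

```python
import bisect


class CookieDir:
    """Hypothetical directory where every entry carries a stable cookie."""

    def __init__(self):
        self._next_cookie = 1
        self._cookies = []  # kept sorted because cookies only ever increase
        self._names = {}    # cookie -> entry name

    def add(self, name):
        cookie = self._next_cookie
        self._next_cookie += 1
        self._cookies.append(cookie)
        self._names[cookie] = name
        return cookie

    def remove(self, name):
        for cookie, entry in list(self._names.items()):
            if entry == name:
                del self._names[cookie]
                self._cookies.remove(cookie)
                return
        raise KeyError(name)

    def read_from(self, cookie):
        """Return (cookie, name) pairs for live entries at or after the cookie."""
        start = bisect.bisect_left(self._cookies, cookie)
        return [(c, self._names[c]) for c in self._cookies[start:]]
```

In this sketch, a reader that saved the cookie of "b" still resumes at "b" after "a" is removed, and resumes at the next surviving entry if "b" itself is removed.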
>large directories
Large as in with more than 25 files, so no, not so large.
As written above, Samba had reported the issue in 2005, but apparently it wasn't handled properly..
That's a different bug.
Of course, you're likely to be violating the EULA by running a non-free program in a debugger in the first place. In many ways, the value of free software is as much the freedom to do what you like with it, as the availability of source code...
"Given a large enough beta-tester and co-developer base, almost every problem will be characterized quickly and the fix will be obvious to someone."
Lurch's Corollary: Even after a hundred detailed bug reports have been posted on the official forums, Blizzard still won't fix it.
You can't say reporting it twice means no one ever saw it, since clearly at least three times it was noticed, the first two times when it was reported, then later when the samba devs coded around it after they were ignored by the bsd devs.
In this case, I would say the shallow bugs argument holds up perfectly, and annoying politics let us down.
No, we did report it. The answer at the time was "this is allowed by POSIX, deal with it", as can be seen in the bug report here:
https://bugzilla.samba.org/show_bug.cgi?id=4715
I did point out that no other POSIX system behaved like that, but that didn't seem to make much difference :-). Eventually I just added a parameter that allowed our open directory cache to be turned off on *BSD. Once it got into the hands of Marc Balmer he took us seriously and fixed the bug.
Jeremy.
Why do you think this bug exists in NetBSD? Have you looked at the source?
No, it's the same bug. If you'd read TFA and the posts I'd linked to, you'd have noticed that all 3 mention seekdir() and telldir(), in reference to seekdir() not moving to the correct location using the value returned by telldir().
From the link in my previous post titled July 2003 Discussion:
"On FreeBSD, seekdir() doesn't seem to behave as I expect it to."
"On other platforms, the first and second telldir() return the same value.
On the two FreeBSD machines I've tried it on, the first telldir() returns 1
and the second returns 0."
From the link in my previous post titled acknowledged the existence of the bug:
"It seems you're wrong; what follows is a quote from the SUSv3:
'If the most recent operation on the directory stream was a
seekdir(), the directory position returned from the telldir() shall
be the same as that supplied as a loc argument for seekdir().'"
From TFA:
"Samba makes use of telldir()/readir(), seekdir()/readdir() to build an internal cache of (large) directories to speed up directory accesses by CIFS clients (Windows machines)."
"Apparently there are two problems: seekdir() not returning to the position initially retrieved using telldir() and a performance problem."
I wonder if one could sue SCO for crappy code?
I mean, a burglar can sue the owner of the property they're burgling for leaving it in a dangerous condition, so why not this too?
That's if it were true, of course.
Max.
I find it kind of ironic that the Samba developers, rather than finding the code and fixing it, coded a workaround instead.
I realize that they got the whole "we know about this bug and it's not a priority" business, but this guy came along and fixed the bug... even though it's been known about since 2003, and Samba wrote a workaround in 2005, and now in 2008 someone finally wrote a fix for it...
What was so special about this code that the Samba developers couldn't have just written a patch rather than a workaround?
https://www.gnu.org/philosophy/free-sw.html
You think this only happens in the open source world? Let me show you what the "defect priority analysis" would look like at my work were we to receive a report about this bug:
Reproducible: Yes
Frequency of occurrence: Extremely low, only manifests for a very rare corner case.
Systems known to be impacted: None, systems that have noticed defect previously have already implemented a workaround.
Current known impact upon the functionality of the system: None
Systems currently using code where defect is present with no impact: All systems accessing a directory
Potential negative impact of an incorrect fix: Extremely high, potentially crippling filesystem traversal.
Proposed solution: Wait till people stop using DOS filesystems.
Why would I write a fix to give to you when I know you won't be adding it to your program at all?
Why would I write it at all? Specifically in this case, it's usually considered 'bad' for a user-level application to require you to patch your kernel, and only on a certain OS but none of the others. I realize this is done, but no one likes it.
Especially when you can replace 'if you are on BSD, you must patch your kernel' with the current 'you do nothing, the software will figure it out and it will just work'.
The Samba team cannot force the BSD team to change bits of their software. All the Samba team can control is their own code, which they used to code the workaround so it would still work, without having to tell BSD users "sorry, a feature of your OS (that every other OS calls a bug) prevents our software from running". Someone else would patch Samba to work around this non-bug 'feature' and then you get the same situation as now.
Of course, if the original BSD dev had dropped his ego and code-political agenda for a minute and just admitted it was a bug, none of this would have happened, and it would have been fixed right after it was reported the first time.
That's a rather weak nitpick.
We had a consistent crasher in the ATI driver when I was doing windows dev.
The most frustrating thing was to breakpoint the crash, dive past our lovely debugged source into incomprehensible graphics driver assembly.
All we could do was send a reproducing testcase to ATI and hope for the best.
It never did get fixed.
-- perl -e'print pack"H*","6e656d6f406d38792e6f7267"'
"But that's exactly my point, isn't it? The bug was only "visible" through its behavior, not its manifestation in code."
No, actually it was quite easy to identify from its code. The problem wasn't that it wasn't visible in the code; it was that you had to look at the code of 2 different programs to figure out why what was happening was happening. Since the problem was first noticed in very large directories (250,000+ entries), it was hard to reproduce and track what was happening.
But this guy came along and found something out: if you made 28 files, deleted file 25, then seeked to file 26, you'd get file 27!
The reason why it always did this is simple: file 25 is the first entry of the second block (24 entries per block). But until this guy had done all this work figuring out how to reproduce the problem (without 250,000 files), nobody had thought of looking at the code of 2 different programs and how they 'didn't work together'. It was a really simple rewrite: he just added an 'if called from subcall' check that fixed the automatic skipping of a deleted first entry.
So really the hard part was looking at 2 pieces of code and how they work together. Neither piece of code had a bug on its own; the bug was only triggered when a specific call landed on a piece of code that always skips a deleted first entry of a block.
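The failure described above can be sketched with a toy model. This is not the BSD libc code: `ToyDir`, `BLOCK`, `delete_first_of_block` and `entry_after_seek` are hypothetical names made up for illustration, the 24-entries-per-block figure is simply taken from the comment, and real directories also contain '.' and '..'. The model assumes only what the bug description states: readdir() skips entries whose inode is zero, and seekdir() reaches its target by rewinding to the start of the block and calling readdir() until it gets there.

```python
BLOCK = 24  # entries per directory block (toy value taken from the comment)


class ToyDir:
    """Toy directory stream; each slot is an [inode, name] pair."""

    def __init__(self, names):
        self.slots = [[i + 1, name] for i, name in enumerate(names)]
        self.loc = 0  # index of the next slot to read

    def delete_first_of_block(self, index):
        # Assumption: deleting the first entry of a block only zeroes its inode.
        assert index % BLOCK == 0
        self.slots[index][0] = 0

    def _readdir(self, skip_deleted):
        while self.loc < len(self.slots):
            inode, name = self.slots[self.loc]
            self.loc += 1
            if inode != 0 or not skip_deleted:
                return name
        return None

    def readdir(self):
        # What the application calls: deleted entries are never returned.
        return self._readdir(skip_deleted=True)

    def seekdir(self, target, fixed):
        # Rewind to the start of the block, then step forward to the target.
        self.loc = (target // BLOCK) * BLOCK
        while self.loc < target:
            # Old behaviour: one call can consume a deleted slot AND the
            # next live one, overshooting the target by one entry.
            self._readdir(skip_deleted=not fixed)


def entry_after_seek(fixed):
    d = ToyDir([f"file{i:02d}" for i in range(1, 29)])  # 28 files
    d.delete_first_of_block(24)  # file25, first entry of the second block
    d.seekdir(25, fixed)         # slot 25 holds file26
    return d.readdir()
```

Under these assumptions `entry_after_seek(fixed=False)` lands on file27, while `entry_after_seek(fixed=True)` lands on file26 as the caller expects.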
The annoying thing is that while this bug was 25 years old, anyone complaining about it got to play the 'prove that this bug is real' game, and that's really hard if you don't know the 'trick' that this guy found to make it reproducible.
Frankly, this kind of bug is easy to code: you write a function to work a certain way, then a different function calls it and uses it differently than normal, and that creates a hard-to-track-down bug. With Windows, this bug might never get caught, because reproducing it is hard, and Microsoft will just dump the code a few years later, with all the bugs still intact, for those using an 'unofficially supported' operating system.
With open source, there are very few reasons to abandon old source code, so hard-to-reproduce bugs (or bugs that are hard for the people who've noticed them) will take the kind of person willing to write programs to try and reproduce the bug, to actually pin down where it is being triggered when it's not immediately 'obvious' where in the code it is. It would be nice if there were MORE of these pedantic types, willing to spend a couple of days messing around with code trying to trigger a bug in a reproducible way...
But unless more people use open source, these pedantic types won't be exposed to the problems that others just gave up on reproducing.
I remember coding old IRC scripts 'around' bugs in various versions of mIRC. It really annoyed me, because those kinds of bugs wound up being fixed in later versions of mIRC, which would break my code that depended on things working the way I had noticed they worked (regardless of how they were documented)... so eventually I stopped upgrading mIRC, and then finally gave up on mIRC entirely, and then my script-coding days ended.
That's "scarab" beetle, and don't call me Surly.
"Also, if enough people stare at the code and use it, they will find the bugs that matter."
In this case it's not a matter of staring at code until your eyes bleed; you could stare at the 4.2BSD code all week and never notice how this bug got triggered.
In this case, you have to USE the program, Detect when the bug happens, and then Reproduce the bug. That last part was where everyone else failed.
They could create a program that created and deleted files, seeked through the entries, and detected the bug, but the next time they ran the code, the bug didn't show up, or it showed up in a different place every time.
The lack of consistency was why no one staring at the code could figure out what was happening... looking at half a dozen source files of basic operating system functionality without knowing why the problem was happening would have been useless.
a) large "tooltips" (by which I assume you mean img title attributes) render "properly" instead of truncated in FF3b5
b) truncating title attributes is not a bug
If you're going to work in Open Source projects related to Operating Systems, stay away. The dreaded "trade secrets" accusation could ruin your whole career.
Samba actually uses seekdir? Where?
My first guess hearing about this bug was that the actual reason for this problem is that any program doing anything with directories is going to either look up individual entries or read the entire thing, either all-at-once or incrementally, neither of which use would benefit in any way from telldir/seekdir.
BSD and the *nixes were designed to be simple, effective, modular operating systems. As long as you have the drivers and know how, you can easily port them over and install them on a variety of hardware. Then, thanks to their modular nature, you can then plug in all the extra bells and whistles you need for your particular system and go to town.
That is why they are still around and still popular. They are K.I.S.S., work as they are supposed to, and the modular code that is plugged into them can just be sloughed away when it becomes outdated, and newer, better code plugged in to modernize the OS as you go.
That's also why Windows has had so many problems over the years. Windows was designed to be everything you need in a single package. That means everything is all tied up together. So, unlike BSD and the *nixes, when part of the OS becomes outdated, MS can't just unplug the old stuff and plug in new stuff. It's all interlinked from the ground up. That means a large portion of development time is spent fixing bugs caused by new additions, which then cause even more problems down the line when you go to update again. It also makes it bloated, as legacy code ends up stuck in the mix because without it the patched-together additions wouldn't function right.
And, unfortunately for MS, their market dominance is based on the windows "feel" being familiar and backwards compatibility. If they could, I'm sure they'd re-write windows from the ground up, but now they are in a catch 22 where doing so might significantly kill their market share.
I'm guessing Bill and company sometimes look back and kick themselves for not having the guts to go for broke and re-do the OS from the ground up for Windows XP. Because, back then MS was still king, Apple was at its low point with a very small and stagnant market share, and the *nixes were still primarily a hard core enthusiast hobby. Today, if MS were to completely change Windows, they'd probably lose a significant amount of market share to a variety of alternatives.
You are who you are, let no one tell you different. But, never close your mind to a new point of view.
Not exactly - a test case was written that did it in exactly 28 files, and considering that block sizes were significantly smaller back then, it might've actually been even MORE common in old days.
"Champagne for my real friends - and real pain for my sham friends!" http://ericblade.postalboard.com/
You seem to argue that the BSD developers are not "quality" developers. Are you suggesting that commercial developers are better?
I've worked for two large commercial development groups. There were some excellent programmers, engineers and thinkers in all of them. But one thing about commercial development work is that it is often limited by financial constraints. Finding bugs doesn't add to the bottom line. When we would finish one software project we were on to the next.
I find that open source developers have a sense of personal responsibility for "their" code and they want to see it perfected.
Did you even read my post? I quote myself:
enough people stare at the code and USE IT
LS
There is a fine line between being a cultivated citizen and being someone else's crop. - A. J. Patrick Liszkie
This is not the first time the OpenBSD team has done an excellent job at finding obscure bugs that were lying around for one or two decades in every BSD derivative. Congratulations!
The change has already been incorporated into DragonFlyBSD and FreeBSD.
It's all to do with the old perspective of having only small directories.
If you're using a stream, that means that someone will have to fill a pipe/piece of memory somewhere at the call to opendir(). That works, but starts to have an impact when you have very large directories. Also, you risk seeing 'ghost' entries; entries that have been deleted in between the call to opendir() and readdir(). You can defend that by saying it's a transaction; the user 'sees' the directory as it was at the time of creation of the cursor (your DIR*). But then again, that would be only useful if you could do other things during the same transaction. And there is no such mechanism in UNIX.
In the meantime, the whole issue could be fixed by saying that when there are changes to the structure of a directory during an iteration of it, you forfeit your cursor (as you would in any other database-like system outside a transaction). Essentially, you would flag the DIR* to become useless. On top of that, of course, you would like to have an exclusive lock on a directory, so you can make sure that those changes can't be made. Either that, or you have to make your filesystem fully transactional.
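The 'forfeit your cursor' idea can be sketched in a few lines. This is a toy in-memory model, not any real filesystem or libc interface; `Directory`, `Cursor` and `StaleCursorError` are hypothetical names. It assumes a simple version counter: every structural change bumps it, and a cursor opened against an older version refuses to continue.

```python
class StaleCursorError(RuntimeError):
    """Raised when a cursor is used after the directory changed."""


class Directory:
    def __init__(self, names=()):
        self._names = list(names)
        self._version = 0  # bumped on every structural change

    def add(self, name):
        self._names.append(name)
        self._version += 1

    def remove(self, name):
        self._names.remove(name)
        self._version += 1

    def open(self):
        # Plays the role of opendir(): hands out a cursor (the DIR*).
        return Cursor(self)


class Cursor:
    def __init__(self, directory):
        self._dir = directory
        self._version = directory._version  # version seen at open time
        self._pos = 0

    def read(self):
        # Any add/remove since open() makes this cursor useless.
        if self._version != self._dir._version:
            raise StaleCursorError("directory changed during iteration")
        if self._pos >= len(self._dir._names):
            return None  # end of directory
        name = self._dir._names[self._pos]
        self._pos += 1
        return name
```

A caller that hits `StaleCursorError` would have to reopen the directory and start over, which is the price of not having locks or transactions in this model.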
Sorting is another thing; at opendir() the user must be able to choose offset, limit, field to sort on and search criteria. The WIN32 API went this way a bit, but it didn't go far enough. To expect userland to gobble, match, sort and paginate when the kernel can do it so much better - well, it works in a buffered way for small directories, as I said. But if we want true database-like capabilities on our filesystem (and I know I do) then the spec must be changed. Since all of the calls involved in this refer to process-global resources, you wouldn't necessarily have to change the C-API, you could just extend it.
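For comparison, this is roughly what userland has to do today to get offset, limit, sort and search on a directory: read everything, then filter, sort and slice. `list_page` and its parameters are hypothetical names for this sketch; `os.scandir` is the standard Python call. It only illustrates the 'gobble, match, sort and paginate' pattern and makes no claim about how a kernel-side version would look.

```python
import os


def list_page(path, offset=0, limit=None,
              key=lambda e: e.name, match=lambda e: True):
    """Return one sorted 'page' of entry names from the directory at path."""
    # The whole directory is read before anything can be sorted or sliced.
    with os.scandir(path) as it:
        entries = [e for e in it if match(e)]
    entries.sort(key=key)
    end = None if limit is None else offset + limit
    return [e.name for e in entries[offset:end]]
```

For example, `list_page(".", offset=20, limit=10)` would return entries 21 to 30 by name, but only after every entry in the directory has been read and sorted, which is the cost being complained about for very large directories.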
Religion is what happens when nature strikes and groupthink goes wrong.
Dude, don't dis my lexia.
Sorry Curt :(
Caesar si viveret, ad remum dareris.
Thanks for the link, it made for an interesting read. The "allowed by POSIX" argument is amusing given that if you flip the manual page (in the linked POSIX site) over to seekdir it clearly states that seekdir(telldir()) works as the Samba code needed it to. Ah well, at least the fix finally happened three years later.
I'm quite certain that my anecdotal evidence is sufficient for establishing trends without statistical studies thank you. For one, the bug got fixed, so obviously open source is better. For two, the bug existed so obviously open source sucks. For three, contradiction, so obviously banana sunday!
Bugs Slowly Decomposing
that's longer than microsoft!
I judge by developer track record, not by whether some thing is OSS or commercial.
:). So far people don't seem to complain about it that much. Except the config part, which honestly is not my fault - someone had the bright idea of outsourcing the development of the web config frontend for all our stuff to India (and we're cheaper than the Indians! ). Summary - now some people in other depts actually prefer to use SQL statements to configure our stuff... Sad.
I was in the IT security line for some years, in my experience developers who produce good code, continue to produce good code (and often better). Whereas developers who produce crap, tend to continue producing crap (maybe a bit better after years).
Just look at bugtraq, bug reports, and the sort of bugs that get found, and you'll have a good idea how good the developers are, and what the future holds for the code. Of course with OSS you often can see the mailing lists so that gives you even better insight on how the developers think.
So for example, I have reasonable confidence in the Postgresql developers, and that they will try to do the right thing for the long run.
The Linux kernel bunch seem a lot more "yeehaw cowboy", so it's a good thing the various distros are around, and you can pick a distro that matches your risk aversion levels
As for "personal responsibility", I'm not an OSS developer and I also have a sense of personal responsibility for my code. So far I have had very few bugs reported with the stuff I wrote for the company. I don't actually think I'm a great programmer just my IT security background makes me more paranoid than most coders (hey it's not paranoia when they're really out to get you!), and I'm really lazy, so I try to write/do things as correctly the first time round. Maybe I pick the easy projects too (I stick to using perl and avoid things that require C or C++ if I can). Leaves me a lot more time to post on slashdot
I think my dhcpd is better than the ISC's, but it's closed source so you'll just have to take my biased word for it
And even more important, they would have to code a workaround anyway, to unbreak older BSD systems running Samba.
Xfce: Lighter than some, heavier than others. Just right.
I am astounded that some people are claiming that this shows OSS does not work, BSD sucks, etc.
Operating Systems are extremely complex and created by man. It is therefore impossible for any of them to be absolutely perfect.
For a bug to be hidden for 25 years (the bug itself and not the symptom), it would have to be obscure and occur only in fringe cases. Well, this bug was obscure and only occurs in fringe cases!
If this were happening in closed source software, the bug would be near the bottom of priorities, since it only occurs under rare conditions. Nobody will bother going to the trouble of analyzing the binary, so to be fixed it would need to be taken seriously by the company behind the closed code.
In this OSS case, it only had to be taken seriously by one talented person.
BSDi and Apple didn't find it. Some person who was empowered with the code being open found it and fixed it. In fact, this fix to Apple's OSX, comes courtesy of the O in OSS.
Proof that OSS works, from one of the oldest OSS bases around. You would expect a bug to be pretty obscure in such old OSS code and that is exactly what this bug is. Is it any wonder that the BSDs are so incredibly stable?
Any chance this bug might be in AIX, HPUX, or Solaris? Just curious. Or did they fix it and not give it back!
"Of course of the original bsd dev would have dropped his ego and code-political agenda for a minute and just admit it was a bug, none of this would have happened, and it would have been fixed right after it was reported the first time."
well, it is true they did handle the problem poorly, the bug was reported first by hackers, then by a fellow FOSS development team, and both times the BSD developers wanted code that would reproduce the bug reliably.
But the other thing is though, none of this code was ever really written by any of the Free/Net/Open-BSD developers it was written by the 4.2 BSD guys... and the *BSD guys had never run into the problem(or if they had, they ignored it), so they believed it wasn't a real bug...
the problem is Very Hard to run into, outside of complex server usage... which makes me wonder, how come the yahoo BSD coder guys never ran into the problem? did they code their own code for file management?