>The whole reason I object to this objection is because you don't seem to understand the idea of "development release".
I understand the concept of "development release" just fine. I've been involved in more of them than you. I'll put this simply so even a moron like you can understand it.
Even for a development release, there are certain kinds of breakage and certain levels of carelessness that are unacceptable.
It's not a difficult concept, and what we're seeing in 2.3.x is that we're on the wrong side of that line. Some people just need to clean up their acts, and that's all there is to it.
>But this software is BETA, which is why it's in the UNSTABLE kernel numbering scheme.
Unlike some of the other BS people have posted in response to my original comments, this may be a genuine philosophical difference. You see, I've been working for ten years in a world where "beta" means that the people responsible for the product have done everything they could to ensure that it's free of major defects, but they want to get some real-world experience before they "close the book" and call it done. A beta isn't expected to be perfect, but it shouldn't have any _known_ defects of a certain severity level and adequate testing should already have been done to verify that nothing's in there that will later seem "obvious". Linux 2.3.x clearly does not meet this standard. Maybe this standard is too stringent and inappropriate for Linux 2.3.x, what with the cooperative development model and all. In fact, I believe that is probably the case. However, I think the current state of Linux 2.3.x fails to meet _any_ reasonable quality standard, even one more appropriate to the situation.
I still believe that there's a certain level of diligence that should be expected before beta, before alpha, before hand-off to QA, even before other team members get their hands on a new piece of code. Obvious standard tests should be run to check for regressions, for one thing, and the code should not leave that developer's hands while such obvious regressions exist. For example, I run Connectathon tests before I check in even to a development branch, and if I see a new failure I don't check in. That doesn't mean I'm special, either; it's basic stuff that should be required of every developer. What I'm seeing, and what I'm complaining about, is that even these very basic rules clearly are not being followed in Linux development.
Nobody in their right mind expects any software to be perfect, least of all something clearly labelled "unstable", but I think it's perfectly reasonable to expect that things won't be broken _more than they have to be_. And that's not just a philosophical thing, either. Detecting and fixing bugs sooner rather than later is also more efficient. What if fixing O_SYNC requires more than superficial changes, requiring everyone else to change their code yet again after they'd already changed it once to go along with the new VFS stuff? Wouldn't it have been better if the person who broke O_SYNC had been required to fix it _before_ the whole suite of related VFS changes was sent out to affect everyone?
>The same is true of O_SYNC: it is not vital to many people
Are you kidding? True, it's not vital to that many people, but to those who do need it there is no substitute. A filesystem that doesn't properly support O_SYNC (and yes, I'm aware that ext2fs never did, even before the current breakage) is simply not suitable for mission-critical apps. It's that simple.
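For anyone who hasn't used it, here is roughly what an application asks for when it opens a file with O_SYNC. This is only a minimal sketch on a POSIX system; the helper name write_durably is made up for illustration, and whether the data really reaches stable storage before write() returns depends on the filesystem honoring the flag, which is exactly the complaint above.

```python
import os
import tempfile


def write_durably(path: str, payload: bytes) -> int:
    """Write payload through a descriptor opened with O_SYNC.

    With O_SYNC each write() is supposed to return only after the data
    has been committed to stable storage. Hypothetical helper; the
    guarantee is only as good as the filesystem's O_SYNC support.
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC
    fd = os.open(path, flags, 0o600)
    try:
        written = 0
        # os.write may write fewer bytes than requested, so loop.
        while written < len(payload):
            written += os.write(fd, payload[written:])
        return written
    finally:
        os.close(fd)


def demo() -> bytes:
    """Write a record durably into a scratch directory and read it back."""
    with tempfile.TemporaryDirectory() as scratch:
        target = os.path.join(scratch, "journal.dat")
        write_durably(target, b"commit record\n")
        with open(target, "rb") as f:
            return f.read()
```

Reading the data back proves nothing about durability, of course; it only shows the call sequence a mission-critical app depends on.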
>Basically, Linux's VFS layer is a VERY powerful concept,
...which has been around longer than Linux itself. As a filesystem developer, I was stunned to find that Linux didn't have a VFS layer to speak of years ago. The value of such an abstraction, and the methods for implementing it, were old news even ten years ago when I started working on UNIX.
>Obviously, you have never developed software on such a huge scale as the linux kernel.
Actually, I have. I was in the OS groups at two of the early UNIX-SMP companies, and I've worked on a half-dozen other kernel-oriented projects where the complexity was comparable. The point is that it's precisely _because_ of the large scale of something like a kernel that a more disciplined approach is needed. Fast and loose works fine for small stuff. What we're seeing is precisely the sort of breakdown that always occurs when you apply the same approach to big stuff. It's avoidable, but only if the developers exhibit some maturity. As you have amply demonstrated, though, that is a quality often lacking in this community.
>Before you start insulting the (majority) of people that give their FREE TIME to develop YOUR KERNEL, realize there is a lot more to programming than your simplistic view.
Typical slashdot. Someone says something critical of Linux, and respondents assume that the poster is a non-programmer. Au contraire. The whole reason I object to seeing this kind of breakage is because I know this stuff can be done better. I've done it better myself. I've worked alongside hundreds of others who know about basic software hygiene, who developed many of the techniques and algorithms (and sometimes the code) that have since been recycled in Linux. Linux can be improved by a willingness to learn not just technical stuff but also organizational stuff from the experts, instead of having to learn every lesson and reinvent every wheel the hard way.
An example: on my current project, I have often made changes that have caused breakage in msync in the Connectathon tests. This corresponds exactly to the msync breakage in Linux right now. The difference? I go out of my way to _find_ that breakage by running appropriate tests, and then I _fix_ it before anyone else - even my own team members - is affected by it. There's no reason at all why an open-source developer can't do exactly the same thing. The reason many don't has nothing to do with philosophy or organization structures. It has only to do with individual laziness and lack of self-discipline.
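To give an idea of how little effort a basic check takes, here is a rough sketch of an msync round-trip test. It is not the Connectathon test, just an illustration in the same spirit; the function name msync_roundtrip is hypothetical, and it assumes a Unix system where Python's mmap.flush() calls msync() on a shared mapping.

```python
import mmap
import os
import tempfile


def msync_roundtrip(path: str, size: int = 4096) -> bool:
    """Modify a file through a shared mapping, msync it, then re-read it.

    Returns True if the bytes written through the mapping are visible
    via an ordinary read() after the flush. Hypothetical check, not
    taken from any standard test suite.
    """
    with open(path, "wb") as f:
        f.write(b"\0" * size)          # the file must be as large as the mapping
    with open(path, "r+b") as f:
        # On Unix the default mapping is MAP_SHARED with read/write access.
        with mmap.mmap(f.fileno(), size) as mapping:
            mapping[:5] = b"hello"
            mapping.flush()            # msync() on Unix
    with open(path, "rb") as f:
        return f.read(5) == b"hello"


def run_check() -> bool:
    """Run the round trip in a scratch directory."""
    with tempfile.TemporaryDirectory() as scratch:
        return msync_roundtrip(os.path.join(scratch, "mapped.dat"))
```

Pointing the path at an NFS mount instead of a scratch directory is the obvious variation for the regression discussed here.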
It's really sad to see so many items on the list that indicate regressions caused by earlier checkins/merges. For example:
>msync fails on NFS
>UMSDOS was broken by the fs changes
>Restore O_SYNC functionality
These are important pieces of functionality that worked once. Why were checkins/merges allowed that broke these? Wasn't the code reviewed? Didn't anyone test for these? Say all you want about "open source is better" and "debugging is parallel" but these sorts of things would never have slipped through the checkin/review process in any decent OS group (I should know, I've been in a few).
The msync and O_SYNC bugs would have shown up using any number of public-domain standard tests, and the people who broke them should never have forwarded the code to _anyone_ else with bugs like these still present. That's basic "software hygiene", and failing to require even that much just gives the whole Linux community a black eye. Why are we doing MS's PR folks' job for them?
>They didn't release source and the drivers don't use DRI in Xfree86 4.0. While it's nice they released drivers, they could at least follow standards.
There's a new "standard" every freaking week, whether it's being pushed down our throats by ISO, by MS, or by some bunch of geeks in a basement. It's unreasonable to suppose that anyone - open source or closed - will continue to support every piece of new hardware (not just their own, but new CPUs, new chipsets) and fix every bug in old drivers and support every "standard" in the universe, all in time for last night's dinner. They have to make a resource-allocation decision, and I don't think you or I can say we'd make that decision any better.
And please don't trot out the old canard about open source making everything better. With any complex piece of software someone has to make the calls about what goes into the release and what doesn't. And then somebody has to test it all. Changes made to benefit platform A still have to be tested for platforms B through Z. For every thousand people who whine about this feature or that standard, you're lucky to find one who can make any sort of positive contribution to help the process along.
>And of course, with closed source drivers, if it crashes you can't tell why, nor can you possibly fix it.
Yeah, like the average Linux user has clue one how to fix a bug in a video-card driver. There are drivers for much more common and much simpler devices - e.g. some Ethernet cards - that are unmaintained because there's nobody in the entire Linux community with both the skills and the interest to maintain them. You think you're going to find lots of people who will be able to fix problems in a driver for a device more complex than a modern CPU without messing things up even further? Dream on.
>Open-source software, subject to constant peer review, evolves and gets more secure over time. But as more crackers seek and find the better-hidden flaws in opaque binaries, closed-source software gets *less* secure over time.
If I wanted to go looking for security bugs, such as buffer underruns, it would be far easier with source than without. So it might appear that it would be _easier_ to compromise an open-source system.
But wait, you say, they'd be easier to find _if they were there_ but because everything's open to scrutiny all such "low-hanging fruit" would have been found, reported, and fixed in the open-source system long ago, whereas they'd remain "latent" in the closed-source program. Oh really? Are there more people who find and report bugs to people who can fix them, or more people who find bugs and either keep quiet or only distribute the information to other script kiddies? Are there more people of either type targeting Linux, or Windows? Do these factors perhaps make just a little bit of a difference in how the security of a system changes over time?
My point is not to say that open-source software is or is not in reality more secure than closed-source software, but that any such difference has little to do with availability of source. There may be a difference based on source availability, but that difference is overwhelmed by the basically non-technical difference based on how many "good guys" and how many "bad guys" have an interest in a particular platform. The statement that closed-source software becomes less secure over time is not based in any kind of facts or logic, though within any particular small sample it may seem true. As responsible, reasoned advocacy ESR's piece is barely half a step above "open good, closed bad" which is itself not even half a step above "Linux roolz, Windows sux". At least these almost-equivalent statements have the merit of brevity, in stark contrast to the pompous and verbose style we've all come to "enjoy" in ESR's writing.
>The source KPI remains consistent within a stable series. Since the source is open and the source KPI is consistent, who cares about the binary KPI?
I thought I was answering that question, but apparently not clearly enough. In brief, then: relying on people to change their source when you change an interface is an invitation to error, because sometimes they don't. Sometimes they're not around to do it. Sometimes they don't recognize that they need to change their code. Sometimes they change their code and introduce a whole new bug in the process. It's better for system stability if their code continues to work without intervention, thank you very much.
>Try 2.3. You aren't the only one who thought the old way sucked.
I'll believe it when I see it. I've seen enough false claims about what has improved in past releases that I'm just a little bit skeptical. That code had a _long_ way to go, and kt seems to have a lot of stuff about this filesystem or that filesystem still needing updates to comply with the new interface. My build of 2.3.99-pre4 just last night kept failing in umsdos because of this stuff. Still seems to be very much a work in progress, from where I sit.
>While one correct, fast, and maintainable solution can be replaced with another that is easier to interface with, no decrease in correctness, performance, or maintainability is considered acceptable
That's pretty much what I've been saying. Unfortunately, the new interface is often not provably faster or more correct than the old one - nobody even bothers writing or running the tests that would allow them to make such a statement out of knowledge rather than bravado - while the changes have severe implications for the correctness of other code that worked fine with the old interface.
In my experience, the people who are most afflicted by the "rewriting fixes everything" syndrome are junior programmers. Senior programmers have learned the hard way that rewriting something that's anywhere near functional _will_ introduce errors not present in the original, that doing it properly involves ten times more effort in regression testing than in coding, and that if you're not going to do it right you should leave well enough alone no matter how it offends your aesthetic sense. It's too bad that in the open-source community the people whose egos won't be satisfied until they've left their mark on the code - no matter how poorly justified their changes are - tend to outnumber and outyell the people who know and care about producing quality software. The signs of the creeping rot that results from all this marking of territory are everywhere in Linux, and some of the big dogs are the ones pissing on things the most.
>The nice thing about Linux is that when a failed effort to think of everything starts to cause problems, it gets replaced. In the proprietary world it stays around forever (IDE, the 640 limit, the list goes on).
Neither of those examples is of an attempt to think of everything. In fact, both are examples of the _lack_ of foresight that is antithetical to what I'm talking about.
>I've used Solaris. I admin it sometimes
And you suppose that I have less experience with Solaris? Or, for that matter, with Linux? Why? Remember, people who disagree with you may occasionally do so because _you_ are the one who needs educating.
Argh. It would have been nice if they had mentioned Dolphin Interconnect Solutions as a (the?) vendor of SCI-based interconnects, or that Dolphin supported VIA before and better than GigaNet. Dolphin had all sorts of problems, including production problems, but at a basic technology level they were always way ahead of anyone else. 3.2Gbps, 2.5us latency - that's the real deal. If they'd had the resources that Myricom's hype machine generates, they'd have cascadable mega-switches and much better economies of scale (lower prices) and nobody would ever consider buying GigaNet or MyriNet pieces of crap.
*sigh* But it was not to be. Dolphin ran up against compatibility and performance problems with PCI chipsets before anyone else did, and the drivers weren't stabilized soon enough, and they never really figured out who their market was, and they made the major mistake of being honest with customers while competitors were bullshitting about stuff as though they actually had it ready to ship when in fact it was barely even on the drawing boards. In the end the liars and cheats got the mind and market share, and Dolphin is barely eking out an existence nowadays.
I also take issue with the following from the article:
>HA clusters may perform load-balancing, but systems typically just keep the secondary servers idle while the primary server runs the jobs.
Bull. Idle standby is just _so_ early-90s. I worked on eight-node mutual-standby (i.e. load sharing with potential for full failover) clusters in '94. We were before most people, but not first. Nowadays almost nobody would buy an HA solution without this capability.
>Linux has no consistent or guaranteed kernel binary PI. That's right, none. Linus and everyone else doesn't see the need
Then they're idiots. Consistent interfaces are the lifeblood of continued interoperability, whether you're talking about GUI fluff or kernel grunge. If lots of people call a function or use a data structure, great care should be taken to preserve that item's behavior or change it only for very good reasons. (To this end, BTW, documenting that behavior in detail is a valuable step but all too rare in the Linux world. Documenting dependencies on specific behavior would be nice too.) Failure to preserve behavior will lead to other components failing, often in very subtle and non-obvious but no less severe ways, and in ways which require more than a mere recompile to address.
One of the biggest problems with Linux is the lack of good internal abstractions that allow people to do different things without interfering with one another. For example, as a filesystem developer I can only shake my head at the Linux VFS and buffer-cache layers. It seems like there's always some asshole changing a function or data structure or locking schema to make their own "neat idea" work, without considering the impact their change will have on others. I've seen project after project lose time from this lack of development discipline. Many times I've seen people make the situation worse by countering one hack with another instead of improving the underlying abstraction so it serves everyone's needs better.
Some code is "politically protected" by the fact that some Major Figure in Linux development will scream at you when you touch it. Other code is protected by being hard to understand, or in other words bad code is less likely to change. What's missing is a widespread recognition that the most important reason to protect code from change is the importance to other components of maintaining its current behavior even in small details.
>In Linux, when it's determined that an existing KPI (it's not an API if they're not applications after all) sucks, it gets replaced in the next development series.
And that's great. Sucky interfaces should be replaced with better ones. However, "sucky" includes "incompatible" as a major component. Major surgery should be undertaken only after understanding and agreement have been reached on the goals and ramifications, and all too often that is not the case.
Even when an interface is completely overhauled, I see no good reason not to maintain compatibility with at least version N-1 of that interface. Unfortunately, doing this usually requires foresight in the _original_ version of the code, leaving room for versions or capability flags so that both called and callers know which behavior to expect or provide. Sadly, the code most in need of an upgrade is inevitably also most marked by lack of programmer foresight. Humility is not a common trait in this community. Everyone thinks their version can never be improved upon, so they don't build in the features that allow future improvements to be made with minimum pain.
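Here's a toy sketch of what "leaving room for versions or capability flags" can look like. Everything in it is hypothetical (the Caps flags, CURRENT_VERSION, the negotiate function); it is not any real kernel interface, just the shape of the idea: the provider accepts version N and N-1, and both sides agree up front on which optional behavior is in play.

```python
from enum import IntFlag


class Caps(IntFlag):
    """Optional behaviors a caller may ask for (hypothetical flags)."""
    NONE = 0
    SYNC_WRITES = 1
    LARGE_FILES = 2


CURRENT_VERSION = 2                 # version N of the made-up interface
SUPPORTED = Caps.SYNC_WRITES        # what this provider actually implements


def negotiate(caller_version: int, requested: Caps) -> Caps:
    """Return the capabilities both sides may rely on.

    Version N callers get the intersection of what they asked for and
    what is supported. Version N-1 callers predate capability flags, so
    they get the old behavior with no optional features. Anything older
    is rejected loudly instead of failing in subtle ways later.
    """
    if caller_version == CURRENT_VERSION:
        return requested & SUPPORTED
    if caller_version == CURRENT_VERSION - 1:
        return Caps.NONE
    raise ValueError(f"unsupported interface version {caller_version}")
```

A caller written against version 1 keeps working unchanged, which is the whole point: nobody has to be around to patch it when version 2 ships.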
>In Solaris or Windows, where most people don't get decent performance anyway
That's a pretty offhand and inaccurate statement, at least for Solaris. It's not great, but neither is Linux.
>I don't, in this case, see how a tabbed MDI interface is much better than a tabbed SDI interface. In fact I've seen many Windows users work exactly like this; all windows maximised, and switching between them using tabbing or the taskbar.
When I referred to a tabbed MDI interface, I meant something much like the taskbar provided by the application (EditPlus) rather than using the Tab key. I want to be careful of comparing anything to bad SDI here, but the main reason I can't really get the same effect with virtual desktops and the original taskbar is that the original taskbar has some pathologically stupid behavior when it comes to things like clicking on the icon for a window that's already open but not frontmost.
>What MDI - in full, subwindow mode - does is to restrict the control I have over placement of windows, by confining all windows owned by one application to a rectangle, which itself obscures everything under it.
Well, yes, that's kind of what MDI is all about. I don't find MDI very useful except with subwindows maximized, and I can't recall seeing anyone else use it much that way either. Seems to me that it reproduces the original screen-clutter problem in a much smaller space, which only makes things worse. Again, this is a matter more of implementation than approach.
>With MDI, you're limited to grouping by owner application. I do not often find such grouping terribly useful.
Yes, this is a problem/limitation, which brings us back to the "one window, one application" fallacy. In a more component-based approach, such as a web browser or DevStudio, it's much less of an issue because components (rather than applications) which perform multiple tasks are allowed to coexist harmoniously within one frame and don't have to each open their own top-level windows to interact with the user.
>For one thing, there's an index to all toplevel windows, the taskbar
...which gets filled up rather quickly if the user has even the slightest ability to multitask. At that point the icons in the taskbar become indistinguishable from one another (which one of these telnet windows is on the test machine I need?) and the only way to resolve it is to use a multi-row taskbar - sucking up more precious screen space.
>M$'s idiocy in tying focus to stacking order
Interesting. The other pro-SDI advocate seemed to like this behavior.
First off, I've already explained how MDI solves the particular problem I face better than virtual desktops do. It's a common problem. I'd think it would be hard to argue that virtual desktops are better than MDI without addressing that scenario, but I guess that here on slashdot anything is possible.
>MDI kills that, because every application has its own way of selecting documents.
>...
>What makes it worse is that you can't easily tile documents side-by-side, only top-bottom, and because of the MDI nature
This sounds like a complaint about _poorly implemented_ or inconsistent MDI, not with MDI in general. I could create an SDI application that pops up 18 different windows to perform one task, and it would be every bit as annoying as MDI could ever be, but that doesn't invalidate SDI in general. Comparing well-done SDI to poorly-done MDI doesn't prove much of anything. The Office apps are notorious for flouting MS's own interface guidelines; despite their prevalence, they're not good examples when discussing the _inherent_ advantages or disadvantages of MDI.
This is why user-interface guidelines are a good thing. Such guidelines often don't so much say "do _this_" as they say "_if_ you do this, do it _this_ way" so that the user is presented with consistent and predictable behavior.
>Virtual desktops are the more elegant solution to the problems you describe.
I'd say about equally elegant, and then only when VD works (which it often doesn't on Windows, where any number of programs seem to circumvent the hooks that VD code relies on). More in a moment.
>It gives you more screen real estate
No, it does not. My screen is still exactly the same size, capable of displaying exactly the same amount of information simultaneously.
>In MDI apps I tend to maximize the top-level window anyway, so how is that worse than switching screens?
That's sort of my point: that virtual desktops and MDI are mostly interchangeable. If people - including Microsoft - adhered to UI standards in implementing MDI, much of the reason for having virtual desktops would go away, especially since you can have as many MDI windows as you want but very rarely is the number of virtual desktops dynamically changeable. The only thing VD can do that MDI can't is allow windows from multiple separate programs to be grouped in a workspace, and even that wouldn't be much of an issue if plugin/component interfaces were more commonly used. We could have some fun discussing the "one window, one application" fallacy that underlies much of the MDI vs. SDI debate, if we want.
An interesting question might be: how much would you like SDI if you didn't have a virtual-desktop crutch to lean on? Personally, a system without either MDI or virtual desktops (which I have also used for many years, BTW) is significantly less usable than a system with either one. We're really arguing MDI vs. virtual desktops, which is a fun exercise, but not really the same as comparing MDI vs. SDI.
>MDI is, IMHO, not a suitable interface for anything at all.
That's not a very humble opinion.
IMNSHO, MDI is very useful to avoid screen clutter. As a programmer, I often need to switch between viewing/editing a dozen or more source files at once, all as part of the same basic task. If I had each of these documents in separate windows, they'd be overlapping every which way and I'd go nuts trying to find the file I want. Virtual desktops don't offer a solution either, because these files are all related to the same task and I'm not going to split the views for one task arbitrarily across two screens. A simple tabbed-MDI interface provides the only reasonable way for me to manage this scenario. It's the same interface that generations of emacs programmers have used happily and productively, even if they never called it MDI.
Personally, I'd rather live in an all-MDI world than an all-SDI one, but I can accept that both have their place. Perhaps you can explain to me why it's A Good Thing to have a bunch of separate top-level windows instead of hierarchical windows, because in general I find the former a pain in the ass. Virtual desktops give some but not all of the benefits I derive from MDI, and generally strike me as a mere kludge to make SDI a little less awful, when a solution to SDI's problems already existed.
>If I recall, there were no trolls in Lord of the Rings; they're not organized by Mordor
Trolls are explicitly mentioned as being among the troops commanded by the Captain of Minas Morgul in the battle before the Gates of Mordor, staged to distract Sauron from the hobbit who was at that same moment climbing the slopes of Mount Doom with the Ring. Actually these were a special kind of troll bred by Sauron, called Olog-Hai. Pippin saved Beregond by killing one.
The Council of Elrond discussed this very issue, and two reasons were given.
Such a large well-armed war party would have drawn way too much unwelcome attention, and the "they'll never suspect this motley crew could possibly have anything so valuable" approach seemed more likely to succeed. This turned out to be true.
Many of the big decision-makers at the CoE believed that there was more to hobbits in general, and this hobbit (Frodo) in particular, than met the eye, particularly with respect to being able to resist the ring's mental "pull". This also turned out to be true.
Would you rather have a single program abend, or have the whole system crash? You get the former with the mainframe, and all too often even on "mature" flavors of UNIX (forget Windows) you get the latter instead.
>you mean apart from the fact that CE's kernel is based on NT's
Wrong. If I recall my history correctly, when faced with a new set of processors and a new type of system (in terms of memory hierarchies, I/O capabilities, etc.) MS tried two different approaches in parallel. One was to port NT, and one was to write a new OS originally called Pegasus. The latter approach won out.
I got this information from the intro of a book called Essential Windows CE Application Programming, by Robert Burdick. It may therefore not be totally authoritative, but it seems a little more believable than an unsubstantiated claim on Slashdot.;-)
>Am I the only one who sees some possible big, huge, gaping security holes here?
No, you're not. Your concern is right on target; this is one of many cases where you simply cannot link the files. Metadata, especially security metadata, affects the interpretation of data, so the same data with a different owner or permissions is in a very important sense not really the same data after all. I hope and expect that the folks who implemented this are well aware of such concerns.
>Encrypted files. Sounds like these would break the system
It depends on where the link interpretation happens relative to where the encryption/decryption happens, but you're probably right. There is an obvious solution, though: don't use this feature on EFS.
>Swap space. From what I know, M$ systems store swapped data as "files." Suppose two of these had the same content?
Actually, this would work as long as the virtual memory subsystem was above the link interpretation - which is a tough call given the hairy interdependent way these things work on NT. Nonetheless, there are plenty of reasons to make swap files exempt from this sort of linkage, and it would be easy to do so.
>The other piece implements the links,
>...
>there are no copy on write
Here's the piece you're missing, Sparky: "implementing the links" might or might not involve copy-on-write semantics. You don't know, I don't know, the evidence isn't there for us to know.
Now, here's what we do know. If they just follow links without doing COW, you're right: it's easy to do, it's nothing special, etc. It's also pretty darn useless, and even worse than useless, in ways that other posters have pointed out. Maybe MS is stupid enough that it took them this long to do something so trivial, and that they're willing to release such a broken version of the functionality. You and I might _want_ to think they're that stupid, but maybe they're not. These are the same people who wrote a brand-new filesystem that doesn't have "undetected data corruption" problems if you turn on async metadata writes - unlike the most commonly used filesystem on Linux. Unlike you, they do have Clue One, even if they're not gods and the MS marketroids got a little carried away portraying them as such.
We can't tell from one marketing bumsheet which way they actually did it, but the theory that they actually took the time to implement COW - doing the right thing for once - fits the evidence we do have much better than your "MS sux, perl roolz, crontab=transparent" theory does.
So much for "Plain Old Text". It thought the angle brackets around "linux/errno.h" were some weird sort of HTML tag, and stripped the whole construct out.
Error 28 from <linux/errno.h> and similarly on other UNIX flavors: "no space left on device". It's the error code an FS uses when it needs to allocate a block - for any purpose - and can't find one.
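If you'd rather not dig through the header, the same number is easy to look up from a script. A small sketch, assuming Linux (where ENOSPC is 28); the helper names describe and is_out_of_space are made up for illustration.

```python
import errno
import os


def describe(err: int) -> str:
    """Return the symbolic name, number, and system message for an errno."""
    name = errno.errorcode.get(err, "UNKNOWN")
    return f"{name} ({err}): {os.strerror(err)}"


def is_out_of_space(exc: OSError) -> bool:
    """True if an OSError reports that no block could be allocated."""
    return exc.errno == errno.ENOSPC
```

A program that writes files would typically catch OSError around the write and use a check like is_out_of_space to decide whether retrying makes any sense.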
>So then what happens when your file system is full? Modify a file, even make the file smaller than it once was,...
You're right, it's a tricky case, but I don't think MS's trick makes it all that much worse than it has already been for ages. In general, you don't know when a write to a file might cause a new block to be allocated - creating the potential for an ENOSPC - where there was only a hole before. In order to know that, your program would have to know when it's crossing a block boundary, and building that kind of knowledge into a program is generally a bad idea. Of course, any modern OS/FS allows you to explicitly preallocate space, and any even remotely non-idiotic implementation on MS's part would prevent files containing such explicit preallocations from being "coalesced". Of course, the folks at MS might well be idiots. ;-)
Another problem that has also existed for ages is that many programs write a file X by actually writing out tempfile Y, then renaming Y to X. There are actually some good reasons for doing this, but in the process the potential for an ENOSPC is increased. C'est la vie.
The trick MS is using doesn't necessarily create any new opportunity for this type of error, and it's something any decent program needs to deal with anyway.
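For reference, the tempfile-then-rename pattern mentioned above looks something like this. It's a minimal sketch, assuming a POSIX-like system where os.replace is atomic within one filesystem; the function name replace_atomically is hypothetical.

```python
import os
import tempfile


def replace_atomically(path: str, data: bytes) -> None:
    """Write data to a temp file Y next to path, then rename Y over path.

    Readers see either the old contents or the new, never a partial
    file. The cost: until the rename, the disk holds both copies, which
    is where the extra chance of ENOSPC comes from.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())     # get the data out before the rename
        os.replace(tmp_path, path)
    except BaseException:
        # Don't leave a stray temp file behind if anything failed.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

Any ENOSPC shows up while writing the temp file, so the original X is left untouched when the disk fills up, which is one of the good reasons for doing it this way.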
>Symlinks are great, but not ALL duplicate files should become them.
I strongly suspect that the real innovation here is using the ancient virtual-memory trick of "copy on write" to files. In your example, foo.conf and foo.conf.old would be links to the same data at first, but the moment you write foo.conf the link gets broken and a fresh copy of the data is created automagically so your updates to foo.conf don't affect foo.conf.old. Problem solved.
For reasons described in my earlier post I'm not sure this is a great idea, and it certainly isn't likely to "free up as much as 80 to 90 percent of the space on a server, allowing users to store as much as five to 10 times the information" as MS claims, but I'd be amazed if even MS would allow the problem you describe to occur. It's too obvious even for them. In fact, this may provide the answer to the "why did it take them so long" question. Plain old symlinks would be easy to add (been there, done that, on an MS platform) but adding the COW behavior could be tricky.
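To make the copy-on-write idea concrete, here is a toy in-memory model. It is purely a sketch of the behavior I suspect, not of how MS actually implemented anything; the class CowStore and all of its method names are invented for the example.

```python
class CowStore:
    """Toy model: file names point at shared data blobs until one is written."""

    def __init__(self):
        self._blobs = {}   # blob id -> bytearray holding the data
        self._names = {}   # file name -> blob id
        self._next_id = 0

    def _new_blob(self, data):
        blob_id = self._next_id
        self._next_id += 1
        self._blobs[blob_id] = bytearray(data)
        return blob_id

    def create(self, name, data):
        self._names[name] = self._new_blob(data)

    def link(self, new_name, existing_name):
        """Make new_name share existing_name's data without copying it."""
        self._names[new_name] = self._names[existing_name]

    def _sharers(self, blob_id):
        return sum(1 for b in self._names.values() if b == blob_id)

    def write(self, name, offset, data):
        blob_id = self._names[name]
        if self._sharers(blob_id) > 1:
            # Break the link: this name gets a private copy before the update.
            blob_id = self._new_blob(self._blobs[blob_id])
            self._names[name] = blob_id
        self._blobs[blob_id][offset:offset + len(data)] = data

    def read(self, name):
        return bytes(self._blobs[self._names[name]])

    def blob_count(self):
        """Number of distinct blobs still referenced by some name."""
        return len(set(self._names.values()))
```

With foo.conf and foo.conf.old linked to one blob, a write to foo.conf leaves foo.conf.old untouched and the blob count goes from one to two. That private copy is also where a real implementation would have to allocate blocks, which is the ENOSPC case discussed earlier.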
>The whole reason I object to this objection is because you don't seem to understand the idea of "development release".
I understand the concept of "development release" just fine. I've been involved in more of them than you. I'll put this simply so even a moron like you can understand it.
Even for a development release, there are certain kinds of breakage and certain levels of carelessness that are unacceptable.
It's not a difficult concept, and what we're seeing in 2.3.x is that we're on the wrong side of that line. Some people just need to clean up their acts, and that's all there is to it.
>But this software is BETA, which is why it's in the UNSTABLE kernel numbering scheme.
Unlike some of the other BS people have posted in response to my original comments, this may be a genuine philosophical difference. You see, I've been working for ten years in a world where "beta" means that the people responsible for the product have done everything they could to ensure that it's free of major defects, but they want to get some real-world experience before they "close the book" and call it done. A beta isn't expected to be perfect, but it shouldn't have any _known_ defects of a certain severity level and adequate testing should already have been done to verify that nothing's in there that will later seem "obvious". Linux 2.3.x clearly does not meet this standard. Maybe this standard is too stringent and inappropriate for Linux 2.3.x, what with the cooperative development model and all. In fact, I believe that is probably the case. However, I think the current state of Linux 2.3.x fails to meet _any_ reasonable quality standard, even one more appropriate to the situation.
I still believe that there's a certain level of diligence that should be expected before beta, before alpha, before hand-off to QA, even before other team members get their hands on a new piece of code. Obvious standard tests should be run to check for regressions, for one thing, and the code should not leave that developer's hands while such obvious regressions exist. For example, I run Connectathon tests before I check in even to a development branch, and if I see a new failure I don't check in. That doesn't mean I'm special, either; it's basic stuff that should be required of every developer. What I'm seeing, and what I'm complaining about, is that even these very basic rules clearly are not being followed in Linux development.
Nobody in their right mind expects any software to be perfect, least of all something clearly labelled "unstable", but I think it's perfectly reasonable to expect that things won't be broken _more than they have to be_. And that's not just a philosophical thing, either. Detecting and fixing bugs sooner rather than later is also more efficient. What if fixing O_SYNC requires more than superficial changes, requiring everyone else to change their code yet again after they'd already changed it once to go along with the new VFS stuff? Wouldn't it have been better if the person who broke O_SYNC had been required to fix it _before_ the whole suite of related VFS changes was sent out to affect everyone?
>The same is true of O_SYNC: it is not vital to many people
Are you kidding? True, it's not vital to that many people, but to those who do need it there is no substitute. A filesystem that doesn't properly support O_SYNC (and yes, I'm aware that ext2fs never did, even before the current breakage) is simply not suitable for mission-critical apps. It's that simple.
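For anyone who hasn't used it, this is roughly what relying on O_SYNC looks like from an application. It's a minimal sketch, assuming Python on a POSIX system; the helper name write_synchronously is mine, and the durability guarantee holds only if the filesystem underneath actually honors the flag, which is the whole point of the complaint.

```python
import os


def write_synchronously(path, data):
    """Write data so that each write() returns only after the data has
    reached stable storage, assuming the filesystem honors O_SYNC."""
    # O_SYNC is a POSIX flag; fall back to 0 where Python does not expose it.
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | getattr(os, "O_SYNC", 0)
    fd = os.open(path, flags, 0o600)
    try:
        written = 0
        while written < len(data):
            # os.write may write fewer bytes than asked; loop until done.
            written += os.write(fd, data[written:])
    finally:
        os.close(fd)
    return written
```

A database or journal that calls something like this has no way to tell that the kernel quietly ignored the flag. The write succeeds either way, and the difference only shows up after a crash.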
>Basically, Linux's VFS layer is a VERY powerful concept,
...which has been around longer than Linux itself. As a filesystem developer, I was stunned to find that Linux didn't have a VFS layer to speak of years ago. The value of such an abstraction, and the methods for implementing it, were old news even ten years ago when I started working on UNIX.
>Obviously, you have never developed software on such a huge scale as the linux kernel.
Actually, I have. I was in the OS groups at two of the early UNIX-SMP companies, and I've worked on a half-dozen other kernel-oriented projects where the complexity was comparable. The point is that it's precisely _because_ of the large scale of something like a kernel that a more disciplined approach is needed. Fast and loose works fine for small stuff. What we're seeing is precisely the sort of breakdown that always occurs when you apply the same approach to big stuff. It's avoidable, but only if the developers exhibit some maturity. As you have amply demonstrated, though, that is a quality often lacking in this community.
>Before you start insulting the (majority) of people that give their FREE TIME to develop YOUR KERNEL, realize there is a lot more to programming than your simplistic view.
Typical slashdot. Someone says something critical of Linux, and respondents assume that the poster is a non-programmer. Au contraire. The whole reason I object to seeing this kind of breakage is because I know this stuff can be done better. I've done it better myself. I've worked alongside hundreds of others who know about basic software hygiene, who developed many of the techniques and algorithms (and sometimes the code) that have since been recycled in Linux. Linux can be improved by a willingness to learn not just technical stuff but also organizational stuff from the experts, instead of having to learn every lesson and reinvent every wheel the hard way.
An example: on my current project, I have often made changes that have caused breakage in msync in the Connectathon tests. This corresponds exactly to the msync breakage in Linux right now. The difference? I go out of my way to _find_ that breakage by running appropriate tests, and then I _fix_ it before anyone else - even my own team members - is affected by it. There's no reason at all why an open-source developer can't do exactly the same thing. The reason many don't has nothing to do with philosophy or organization structures. It has only to do with individual laziness and lack of self-discipline.
It's really sad to see so many items on the list that indicate regressions caused by earlier checkins/merges. For example:
>msync fails on NFS
>UMSDOS was broken by the fs changes
>Restore O_SYNC functionality
These are important pieces of functionality that worked once. Why were checkins/merges allowed that broke these? Wasn't the code reviewed? Didn't anyone test for these? Say all you want about "open source is better" and "debugging is parallel" but these sorts of things would never have slipped through the checkin/review process in any decent OS group (I should know, I've been in a few).
The msync and O_SYNC bugs would have shown up using any number of public-domain standard tests, and the people who broke them should never have forwarded the code to _anyone_ else with bugs like these still present. That's basic "software hygiene", and failing to require even that much just gives the whole Linux community a black eye. Why are we doing MS's PR folks' job for them?
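To show how little effort a basic check takes, here is a sketch of an msync smoke test. It is not a Connectathon test and makes no claim to cover what that suite covers; it assumes Python on a Unix-like system, where mmap's flush() is documented to call msync(), and the function name msync_roundtrip_ok is invented for the example.

```python
import mmap
import os
import tempfile


def msync_roundtrip_ok(directory, size=4096):
    """Write through a shared mapping, flush it, and re-read via read().

    Returns True if the bytes written through the mapping are visible
    through the ordinary file interface afterwards.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        os.ftruncate(fd, size)
        payload = b"regression-check"
        with mmap.mmap(fd, size, access=mmap.ACCESS_WRITE) as mapping:
            mapping[:len(payload)] = payload
            mapping.flush()  # issues msync() on Unix
        os.lseek(fd, 0, os.SEEK_SET)
        return os.read(fd, len(payload)) == payload
    finally:
        os.close(fd)
        os.unlink(path)
```

Point it at a directory on the filesystem under test (an NFS mount, say) and run it before every checkin. It only checks visibility, not durability across a crash, but even a check this crude fails loudly when msync returns an error.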
There's a new "standard" every freaking week, whether it's being pushed down our throats by ISO, by MS, or by some bunch of geeks in a basement. It's unreasonable to suppose that anyone - open source or closed - will continue to support every piece of new hardware (not just their own, but new CPUs, new chipsets) and fix every bug in old drivers and support every "standard" in the universe, all in time for last night's dinner. They have to make a resource-allocation decision, and I don't think you or I can say we'd make that decision any better.
And please don't trot out the old canard about open source making everything better. With any complex piece of software someone has to make the calls about what goes into the release and what doesn't. And then somebody has to test it all. Changes made to benefit platform A still have to be tested for platforms B through Z. For every thousand people who whine about this feature or that standard, you're lucky to find one who can make any sort of positive contribution to help the process along.
Yeah, like the average Linux user has clue one how to fix a bug in a video-card driver. There are drivers for much more common and much simpler devices - e.g. some Ethernet cards - that are unmaintained because there's nobody in the entire Linux community with both the skills and the interest to maintain them. You think you're going to find lots of people who will be able to fix problems in a driver for a device more complex than a modern CPU without messing things up even further? Dream on.
>Open-source software, subject to constant peer review, evolves and gets more secure over time. But as more crackers seek and find the better-hidden flaws in opaque binaries, closed-source software gets *less* secure over time.
If I wanted to go looking for security bugs, such as buffer underruns, it would be far easier with source than without. So it might appear that it would be _easier_ to compromise an open-source system.
But wait, you say, they'd be easier to find _if they were there_ but because everything's open to scrutiny all such "low-hanging fruit" would have been found, reported, and fixed in the open-source system long ago, whereas they'd remain "latent" in the closed-source program. Oh really? Are there more people who find and report bugs to people who can fix them, or more people who find bugs and either keep quiet or only distribute the information to other script kiddies? Are there more people of either type targeting Linux, or Windows? Do these factors perhaps make just a little bit of a difference in how the security of a system changes over time?
My point is not to say that open-source software is or is not in reality more secure than closed-source software, but that any such difference has little to do with availability of source. There may be a difference based on source availability, but that difference is overwhelmed by the basically non-technical difference based on how many "good guys" and how many "bad guys" have an interest in a particular platform. The statement that closed-source software becomes less secure over time is not based in any kind of facts or logic, though within any particular small sample it may seem true. As responsible, reasoned advocacy ESR's piece is barely half a step above "open good, closed bad" which is itself not even half a step above "Linux roolz, Windows sux". At least these almost-equivalent statements have the merit of brevity, in stark contrast to the pompous and verbose style we've all come to "enjoy" in ESR's writing.
>The source KPI remains consistent within a stable series. Since the source is open and the source KPI is consistent, who cares about the binary KPI?
I thought I was answering that question, but apparently not clearly enough. In brief, then: relying on people to change their source when you change an interface is an invitation to error, because sometimes they don't. Sometimes they're not around to do it. Sometimes they don't recognize that they need to change their code. Sometimes they change their code and introduce a whole new bug in the process. It's better for system stability if their code continues to work without intervention, thank you very much.
>Try 2.3. You aren't the only one who thought the old way sucked.
I'll believe it when I see it. I've seen enough false claims about what has improved in past releases that I'm just a little bit skeptical. That code had a _long_ way to go, and kt seems to have a lot of stuff about this filesystem or that filesystem still needing updates to comply with the new interface. My build of 2.3.99-pre4 just last night kept failing in umsdos because of this stuff. Still seems to be very much a work in progress, from where I sit.
>While one correct, fast, and maintainable solution can be replaced with another that is easier to interface with, no decrease in correctness, performance, or maintainability is considered acceptable
That's pretty much what I've been saying. Unfortunately, the new interface is often not provably faster or more correct than the old one - nobody even bothers writing or running the tests that would allow them to make such a statement out of knowledge rather than bravado - while the changes have severe implications for the correctness of other code that worked fine with the old interface.
In my experience, the people who are most afflicted by the "rewriting fixes everything" mentality are junior programmers. Senior programmers have learned the hard way that rewriting something that's anywhere near functional _will_ introduce errors not present in the original, that doing it properly involves ten times more effort in regression testing than in coding, and that if you're not going to do it right you should leave well enough alone no matter how it offends your aesthetic sense. It's too bad that in the open-source community the people whose egos won't be satisfied until they've left their mark on the code - no matter how poorly justified their changes are - tend to outnumber and outyell the people who know and care about producing quality software. The signs of the creeping rot that results from all this marking of territory are everywhere in Linux, and some of the big dogs are the ones pissing on things the most.
>The nice thing about Linux is that when a failed effort to think of everything starts to cause problems, it gets replaced. In the proprietary world it stays around forever (IDE, the 640 limit, the list goes on).
Neither of those examples is of an attempt to think of everything. In fact, both are examples of the _lack_ of foresight that is antithetical to what I'm talking about.
>I've used Solaris. I admin it sometimes
And you suppose that I have less experience with Solaris? Or, for that matter, with Linux? Why? Remember, people who disagree with you may occasionally do so because _you_ are the one who needs educating.
Argh. It would have been nice if they had mentioned Dolphin Interconnect Solutions as a (the?) vendor of SCI-based interconnects, or that Dolphin supported VIA before and better than GigaNet. Dolphin had all sorts of problems, including production problems, but at a basic technology level they were always way ahead of anyone else. 3.2Gbps, 2.5us latency - that's the real deal. If they'd had the resources that Myricom's hype machine generates, they'd have cascadable mega-switches and much better economies of scale (lower prices) and nobody would ever consider buying GigaNet or MyriNet pieces of crap.
*sigh* But it was not to be. Dolphin ran up against compatibility and performance problems with PCI chipsets before anyone else did, and the drivers weren't stabilized soon enough, and they never really figured out who their market was, and they made the major mistake of being honest with customers while competitors were bullshitting about stuff as though they actually had it ready to ship when in fact it was barely even on the drawing boards. In the end the liars and cheats got the mind and market share, and Dolphin is barely eking out an existence nowadays.
I also take issue with the following from the article:
>HA clusters may perform load-balancing, but systems typically just keep the secondary servers idle while the primary server runs the jobs.
Bull. Idle standby is just _so_ early-90s. I worked on eight-node mutual-standby (i.e. load sharing with potential for full failover) clusters in '94. We were before most people, but not first. Nowadays almost nobody would buy an HA solution without this capability.
>Linux has no consistent or guaranteed kernel binary PI. That's right, none. Linus and everyone else doesn't see the need
Then they're idiots. Consistent interfaces are the lifeblood of continued interoperability, whether you're talking about GUI fluff or kernel grunge. If lots of people call a function or use a data structure, great care should be taken to preserve that item's behavior or change it only for very good reasons. (To this end, BTW, documenting that behavior in detail is a valuable step but all too rare in the Linux world. Documenting dependencies on specific behavior would be nice too.) Failure to preserve behavior will lead to other components failing, often in very subtle and non-obvious but no less severe ways, and in ways which require more than a mere recompile to address.
One of the biggest problems with Linux is the lack of good internal abstractions that allow people to do different things without interfering with one another. For example, as a filesystem developer I can only shake my head at the Linux VFS and buffer-cache layers. It seems like there's always some asshole changing a function or data structure or locking schema to make their own "neat idea" work, without considering the impact their change will have on others. I've seen project after project lose time from this lack of development discipline. Many times I've seen people make the situation worse by countering one hack with another instead of improving the underlying abstraction so it serves everyone's needs better.
Some code is "politically protected" by the fact that some Major Figure in Linux development will scream at you when you touch it. Other code is protected by being hard to understand, or in other words bad code is less likely to change. What's missing is a widespread recognition that the most important reason to protect code from change is the importance to other components of maintaining its current behavior even in small details.
>In Linux, when it's determined that an existing KPI (it's not an API if they're not applications after all) sucks, it gets replaced in the next development series.
And that's great. Sucky interfaces should be replaced with better ones. However, "sucky" includes "incompatible" as a major component. Major surgery should be undertaken only after understanding and agreement have been reached on the goals and ramifications, and all too often that is not the case.
Even when an interface is completely overhauled, I see no good reason not to maintain compatibility with at least version N-1 of that interface. Unfortunately, doing this usually requires foresight in the _original_ version of the code, leaving room for versions or capability flags so that both callee and callers know which behavior to expect or provide. Sadly, the code most in need of an upgrade is inevitably also most marked by lack of programmer foresight. Humility is not a common trait in this community. Everyone thinks their version can never be improved upon, so they don't build in the features that allow future improvements to be made with minimum pain.
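Here's a sketch of the kind of foresight I mean: an operations table that carries a version number and capability flags, so the caller can keep talking to a version N-1 implementation. Everything in it (FsOps, CAP_ASYNC_WRITE, call_write) is hypothetical and written in Python for brevity; it is not the Linux VFS interface or anyone else's.

```python
from dataclasses import dataclass
from typing import Callable

CAP_ASYNC_WRITE = 0x1  # hypothetical capability bit


@dataclass
class FsOps:
    """Hypothetical operations table a filesystem registers with a VFS-like core."""
    version: int
    capabilities: int
    write: Callable[..., int]


def call_write(ops: FsOps, data: bytes, sync: bool = True) -> int:
    """Dispatch to whichever calling convention the filesystem declared."""
    if ops.version >= 2:
        if not sync and not (ops.capabilities & CAP_ASYNC_WRITE):
            sync = True  # callee cannot do async writes; degrade safely
        return ops.write(data, sync)
    if ops.version == 1:
        # Version N-1 took only the data and was always synchronous.
        return ops.write(data)
    raise ValueError(f"unsupported interface version {ops.version}")
```

The old filesystem keeps working without anyone touching its source, and the new behavior is used only where the callee says it can provide it. The catch is that the version field had to be there in version 1.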
>In Solaris or Windows, where most people don't get decent performance anyway
That's a pretty offhand and inaccurate statement, at least for Solaris. It's not great, but neither is Linux.
>I don't, in this case, see how a tabbed MDI interface is much better than a tabbed SDI interface. In fact I've seen many Windows users work exactly like this; all windows maximised, and switching between them using tabbing or the taskbar.
When I referred to a tabbed MDI interface, I meant something much like the taskbar provided by the application (EditPlus) rather than using the Tab key. I want to be careful of comparing anything to bad SDI here, but the main reason I can't really get the same effect with virtual desktops and the original taskbar is that the original taskbar has some pathologically stupid behavior when it comes to things like clicking on the icon for a window that's already open but not frontmost.
>What MDI - in full, subwindow mode - does is to restrict the control I have over placement of windows, by confining all windows owned by one application to a rectangle, which itself obscures everything under it.
Well, yes, that's kind of what MDI is all about. I don't find MDI very useful except with subwindows maximized, and I can't recall seeing anyone else use it much that way either. Seems to me that it reproduces the original screen-clutter problem in a much smaller space, which only makes things worse. Again, this is a matter more of implementation than approach.
>With MDI, you're limited to grouping by owner application. I do not often find such grouping terribly useful.
Yes, this is a problem/limitation, which brings us back to the "one window, one application" fallacy. In a more component-based approach, such as a web browser or DevStudio, it's much less of an issue because components (rather than applications) which perform multiple tasks are allowed to coexist harmoniously within one frame and don't have to each open their own top-level windows to interact with the user.
>For one thing, there's an index to all toplevel windows, the taskbar
...which gets filled up rather quickly if the user has even the slightest ability to multitask. At that point the icons in the taskbar become indistinguishable from one another (which one of these telnet windows is on the test machine I need?) and the only way to resolve it is to use a multi-row taskbar - sucking up more precious screen space.
>M$'s idiocy in tying focus to stacking order
Interesting. The other pro-SDI advocate seemed to like this behavior.
First off, I've already explained how MDI solves the particular problem I face better than virtual desktops do. It's a common problem. I'd think it would be hard to argue that virtual desktops are better than MDI without addressing that scenario, but I guess that here on slashdot anything is possible.
>MDI kills that, because every application has its own way of selecting documents.
>...
>What makes it worse is that you can't easily tile documents side-by-size, only top-bottom, and because of the MDI nature
This sounds like a complaint about _poorly implemented_ or inconsistent MDI, not with MDI in general. I could create an SDI application that pops up 18 different windows to perform one task, and it would be every bit as annoying as MDI could ever be, but that doesn't invalidate SDI in general. Comparing well-done SDI to poorly-done MDI doesn't prove much of anything. The Office apps are notorious for flouting MS's own interface guidelines; despite their prevalence, they're not good examples when discussing the _inherent_ advantages or disadvantages of MDI.
This is why user-interface guidelines are a good thing. Such guidelines often don't so much say "do _this_" as they say "_if_ you do this, do it _this_ way" so that the user is presented with consistent and predictable behavior.
>Virtual desktops are the more elegant solution to the problems you describe.
I'd say about equally elegant, and then only when VD works (which it often doesn't on Windows, where any number of programs seem to circumvent the hooks that VD code relies on). More in a moment.
>It gives you more screen real estate
No, it does not. My screen is still exactly the same size, capable of displaying exactly the same amount of information simultaneously.
>In MDI apps I tend to maximize the top-level window anyway, so how is that worse than switching screens?
That's sort of my point: that virtual desktops and MDI are mostly interchangeable. If people - including Microsoft - adhered to UI standards in implementing MDI, much of the reason for having virtual desktops would go away, especially since you can have as many MDI windows as you want but very rarely is the number of virtual desktops dynamically changeable. The only thing VD can do that MDI can't is allow windows from multiple separate programs to be grouped in a workspace, and even that wouldn't be much of an issue if plugin/component interfaces were more commonly used. We could have some fun discussing the "one window, one application" fallacy that underlies much of the MDI vs. SDI debate, if we want.
An interesting question might be: how much would you like SDI if you didn't have a virtual-desktop crutch to lean on? Personally, I find a system without either MDI or virtual desktops (which I have also used for many years, BTW) significantly less usable than a system with either one. We're really arguing MDI vs. virtual desktops, which is a fun exercise, but not really the same as comparing MDI vs. SDI.
>MDI is, IMHO, not a suitable interface for anything at all.
That's not a very humble opinion.
IMNSHO, MDI is very useful to avoid screen clutter. As a programmer, I often need to switch between viewing/editing a dozen or more source files at once, all as part of the same basic task. If I had each of these documents in separate windows, they'd be overlapping every which way and I'd go nuts trying to find the file I want. Virtual desktops don't offer a solution either, because these files are all related to the same task and I'm not going to split the views for one task arbitrarily across two screens. A simple tabbed-MDI interface provides the only reasonable way for me to manage this scenario. It's the same interface that generations of emacs programmers have used happily and productively, even if they never called it MDI.
Personally, I'd rather live in an all-MDI world than an all-SDI one, but I can accept that both have their place. Perhaps you can explain to me why it's A Good Thing to have a bunch of separate top-level windows instead of hierarchical windows, because in general I find the former a pain in the ass. Virtual desktops give some but not all of the benefits I derive from MDI, and generally strike me as a mere kludge to make SDI a little less awful, when a solution to SDI's problems already existed.
>If I recall, there were no trolls in Lord of the Rings; they're not organized by Mordor
Trolls are explicitly mentioned as being among the troops commanded by the Captain of Minas Morgul in the battle before the Gates of Mordor, staged to distract Sauron from the hobbit who was at that same moment climbing the slopes of Mount Doom with the Ring. Actually these were a special kind of troll bred by Sauron, called Olog-Hai. Pippin saved Beregond by killing one.
The Council of Elrond discussed this very issue, and two reasons were given.
* ...or Eye. Get it?
Would you rather have a single program abend, or have the whole system crash? You get the former with the mainframe, and all too often even on "mature" flavors of UNIX (forget Windows) you get the latter instead.
>you mean apart from the fact that CE's kernel is based on NT's
;-)
Wrong. If I recall my history correctly, when faced with a new set of processors and a new type of system (in terms of memory hierarchies, I/O capabilities, etc.) MS tried two different approaches in parallel. One was to port NT, and one was to write a new OS originally called Pegasus. The latter approach won out.
I got this information from the intro of a book called Essential Windows CE Application Programming, by Robert Burdick. It may therefore not be totally authoritative, but it seems a little more believable than an unsubstantiated claim on Slashdot.
>Am I the only one who sees some possible big, huge, gaping security holes here?
No, you're not. Your concern is right on target; this is one of many cases where you simply cannot link the files. Metadata, especially security metadata, affects the interpretation of data, so the same data with a different owner or permissions is in a very important sense not really the same data after all. I hope and expect that the folks who implemented this are well aware of such concerns.
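One way to picture the rule: the key used to decide whether two files may share storage has to include the security metadata, not just the bytes. The sketch below uses POSIX owner, group, and mode bits as a stand-in (NT would have ACLs instead), is written in Python, and uses invented function names; I have no idea how MS's implementation actually decides.

```python
import hashlib
import os
import stat
from collections import defaultdict


def coalesce_key(path):
    """Key under which a file may share storage with others.

    Two files match only if content, owner, group, and permission bits
    all agree; identical bytes alone are not enough.
    """
    info = os.stat(path)
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return (digest.hexdigest(), info.st_uid, info.st_gid,
            stat.S_IMODE(info.st_mode))


def coalescable_groups(paths):
    """Group paths whose data could safely be linked together."""
    groups = defaultdict(list)
    for path in paths:
        groups[coalesce_key(path)].append(path)
    return [group for group in groups.values() if len(group) > 1]
```

Three files with identical contents, two at mode 0644 and one at 0600, yield a single group containing only the first two. The third stays separate even though its bytes match.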
>Encrypted files. Sounds like these would break the system
It depends on where the link interpretation happens relative to where the encryption/decryption happens, but you're probably right. There is an obvious solution, though: don't use this feature on EFS.
>Swap space. From what I know, M$ systems store swapped data as "files." Suppose two of these had the same content?
Actually, this would work as long as the virtual memory subsystem was above the link interpretation - which is a tough call given the hairy interdependent way these things work on NT. Nonetheless, there are plenty of reasons to make swap files exempt from this sort of linkage, and it would be easy to do so.
>Speed considerations.
Yep. Major hog there.
>The other piece implements the links,
>...
>there are no copy on write
Here's the piece you're missing, Sparky: "implementing the links" might or might not involve copy-on-write semantics. You don't know, I don't know, the evidence isn't there for us to know.
Now, here's what we do know. If they just follow links without doing COW, you're right: it's easy to do, it's nothing special, etc. It's also pretty darn useless, and even worse than useless, in ways that other posters have pointed out. Maybe MS is stupid enough that it took them this long to do something so trivial, and that they're willing to release such a broken version of the functionality. You and I might _want_ to think they're that stupid, but maybe they're not. These are the same people who wrote a brand-new filesystem that doesn't have "undetected data corruption" problems if you turn on async metadata writes - unlike the most commonly used filesystem on Linux. Unlike you, they do have Clue One, even if they're not gods and the MS marketroids got a little carried away portraying them as such.
We can't tell from one marketing bumsheet which way they actually did it, but the theory that they actually took the time to implement COW - doing the right thing for once - fits the evidence we do have much better than your "MS sux, perl roolz, crontab=transparent" theory does.