Domain: lkml.org
Stories and comments across the archive that link to lkml.org.
Comments · 526
-
Re:is it shipping to customers ?
RTS could make Red Hat happy by running a Black Duck analysis on their proprietary code and sharing the result
Bradley Kuhn addressed this already with two objections:
- Blackduck can only confirm that the code in question doesn't copy directly from code in its look-up database. It can't determine whether a given bit of modified code is a derivative work under copyright law and hence a possible GPL violation (where GPL code is involved).
- The Blackduck software is proprietary. While their clients may feel assured (and are perhaps indemnified against mistakes), copyright holders have no assurance that the software is exhaustive or accurate in its analysis.
In other words, a Blackduck assurance is a proprietary, "black box" assurance...worthless to third parties.
-
Re:Guilty by confusion.
Not so. The key issue is that NVIDIA does not ship a kernel with their driver, and that the driver is very obviously not based off the Linux kernel. But don't take my word for it.
-
Re:is it shipping to customers ?
You cannot, however, then turn around and distribute a Linux kernel with those proprietary modules, which is what RTS is doing. See this.
-
Re:is it shipping to customers ?
But then Alan finishes off nicely:
But RH could always sue him, or simply provide an open alternative I
guess (or indeed let secure boot and the RHEL plans for it put him out of
business) ;)
-
Red Hat can surely do better than speculate...
From the LKML
Your company appears to be shipping kernel features in RTS OS that are
not made available under the GPL...
I've heard such statements before. They remind me of SCO and their lawyers back in the last decade, when they accused Linux of containing copyrighted source code.
Result: Not good. I hope it isn't the case for Red Hat.
-
Well, after scaring everyone about FS corruption...
I guess it has come time to tell the truth.
First of all, the bug has never been bisected, and the whole story that hit Slashdot and some other news sites was based solely on Ted's speculation, which was never confirmed. In fact, at the end of the same day, Ted admitted that his hypothesis was wrong.
After a few days of investigation, the problem was traced to an experimental mounting option, which is not turned on by default and was intended for developers only. Accidentally, this option was not marked as "experimental", so it is available to users. https://lkml.org/lkml/2012/10/26/570
-
more clarifications
Perhaps the author of this summary could have been more precise. The bug is very unlikely to be triggered; here are some examples: https://lkml.org/lkml/2012/10/24/535 and http://phoronix.com/forums/showthread.php?74697-EXT4-Data-Corruption-Bug-Hits-Stable-Linux-Kernels&p=293446#post293446 . Indeed, it is a sensible precaution to downgrade to a safe version and wait for a patch. I have been using 3.6.2 on my two Gentoo boxes for a couple of days and nothing has happened. As a precaution I will downgrade until they release a fix.
-
Re:Hmmm...
Most binary kernel modules are assumed to be 'not derived works'
... just 'cause nobody wants to argue about every little thing.
The nature of this 'derived work' assumption was later refined and codified by the use of EXPORT_SYMBOL, which defines the kernel ABI that can reasonably be presumed common to non-Linux kernels, and EXPORT_SYMBOL_GPL, which defines a 'GPL only' kernel ABI that one would call specific to Linux. The presumption is then that a kernel module using an EXPORT_SYMBOL_GPL feature is being *written for* Linux and not being *ported to* Linux. In other words, using DMA-BUF implies that NVIDIA is writing a driver *for* Linux and not porting an existing driver (from Darwin or Solaris, for example).
Here is a discussion of a similar situation http://lwn.net/Articles/73121/
Here is one of Linus's original statements on kernel modules: http://linuxmafia.com/faq/Kernel/proprietary-kernel-modules.html
And here is a more recent one: https://lkml.org/lkml/2003/12/3/228
My (hypothetical) opinion is that if NVIDIA were to show their driver on OSX and that the use of DMA-BUF was an insignificant architectural change, then they would likely prevail with a non-derived-work argument should they choose to go forward. If, however, the use of DMA-BUF would be a significant architectural change, then the non-derived argument would not be easy to make.
-
Re:Still no TRIM on software RAID (md)
Support for TRIM on RAID linear/0/1/10 md devices was quite recently added. The patch series is here: https://lkml.org/lkml/2012/3/11/261. I can't find the actual merge now, but I believe it'll be in 3.7.
-
Dynamic ticks
Linux has had the dynamic ticks (CONFIG_NO_HZ) feature for a while, but that only shuts down the timer tick when the system is completely idle. There is a new feature in the works named "adaptive tickless", see announcement and a recent progress update, that will also shut down the timer tick when the system is running a single task.
-
Re:O_Direct Works Quite Nicely
Except on the many Linux versions where O_DIRECT doesn't work properly. I have kernels where it works as expected; ones where it quietly fails to sync to disk; and ones where using it causes a PANIC. It's never been a priority for that API to function correctly given that Linus thinks direct IO is totally braindamaged.
OMFG I missed that.
Another reason to move Linux away from being a real enterprise OS.
Why?
Because Linus is completely WRONG about there never being a reason for O_DIRECT.
If you ARE doing synchronous writes, why bother moving the data from userland memory, to kernel memory, and then to disk? The app's going to block anyway, so why do the extra copy? How about when you're reading or writing a few hundred gigabytes or more just once? Why cache that? It uses more memory just to slow things down. Yay, wonderful.
The only reason a page cache is useful is for coalescing small writes, or caching data that will be read again.
If you're doing large writes, or know you're not going to read that data again, the cache is wasted time and memory.
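For what it's worth, the write-once case argued for above might look roughly like the sketch below. It is only an illustration under assumptions: `write_once` and `BLOCK` are made-up names, the 4096-byte alignment is a guess (the real requirement depends on the device and filesystem), and the code falls back to an ordinary buffered write wherever O_DIRECT is missing or rejected.

```python
import mmap
import os

BLOCK = 4096  # assumed alignment; the real value depends on device and filesystem


def write_once(path, data):
    """Write data once, bypassing the page cache when O_DIRECT is usable.

    Returns True if the O_DIRECT path was used, False after a fallback.
    """
    # O_DIRECT generally wants the buffer address and length aligned,
    # so pad the payload up to a whole number of blocks.
    padded = (-(-len(data) // BLOCK) * BLOCK) or BLOCK
    buf = mmap.mmap(-1, padded)  # anonymous mappings are page-aligned
    buf[:len(data)] = data

    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    direct = getattr(os, "O_DIRECT", 0)  # 0 on platforms without the flag
    try:
        fd = os.open(path, flags | direct, 0o644)
        try:
            os.write(fd, buf)
            os.ftruncate(fd, len(data))  # drop the alignment padding
        finally:
            os.close(fd)
        used_direct = bool(direct)
    except OSError:
        # Some filesystems reject O_DIRECT; fall back to a buffered write.
        with open(path, "wb") as f:
            f.write(data)
        used_direct = False
    finally:
        buf.close()
    return used_direct
```

The fallback is there because, as the parent comment complains, whether O_DIRECT behaves at all varies by kernel and filesystem.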
-
Re:O_Direct Works Quite Nicely
Except on the many Linux versions where O_DIRECT doesn't work properly. I have kernels where it works as expected; ones where it quietly fails to sync to disk; and ones where using it causes a PANIC. It's never been a priority for that API to function correctly given that Linus thinks direct IO is totally braindamaged.
-
Oblig. reference for humorless mods
Linus gives a quick rundown of kernel version numbering.
The upside is even.x.x means Linus is no longer crazy and he might revert to the slower version rollout.
-
Actual source material
Very disappointed that the geniuses at "Network World" did not include a link to the original article. For articles like this it's much better to read the source material yourself and come to your own conclusions, without the sensationalism and ad-baiting.
-
slashverdicrap
This networkworld.com article gets submitted to /.:
A host of small modifications and a large number of system-on-a-chip and PowerPC fixes inflated the size of release candidate No. 7 for Version 3.5 of the Linux kernel, according to curator Linus Torvalds' RC7 announcement, made on Saturday.
LAST TIME AROUND: Linux kernel 3.4 released
Torvalds wasn't happy with the extensive changes, most of which he said he received Friday and Saturday, saying "not cool, guys" in the announcement. However, the occasionally combustible kernel curator didn't appear to view this as a major setback.
"Now, admittedly, most of this is pretty small. The loadavg calculation fix patch is pretty big, but quite a lot of that is added comments," he wrote, referring to the subroutine that measures system workload.
However, he noted, there were also the assorted changes for SoCs, PowerPC compatibility, USB and audio to be folded in, forcing a comparatively large RC7.
"Ok, so it's still not *huge*, but it's bigger than -rc6 was. I had hoped for less," wrote Torvalds.
He also hopes that it won't be necessary to deploy an eighth release candidate before Version 3.5 of the kernel can be properly rolled out, and urged the community to "go forth and test."
Among the biggest new features expected in Linux 3.5 is enhanced compatibility with the ARM processor family, which are used in a wide array of low-cost computing devices. Several ARM-related fixes are part of 3.5-RC7, according to the official announcement email and changelog.
The H-Online reported earlier today that the final version of Linux 3.5 should be deployed next weekend, if all goes well with RC7.
The h-online.com article the networkworld one is a rehashing of:
Over the weekend, Linus Torvalds reluctantly published a seventh release candidate (RC7) for the 3.5 Linux kernel. In the LKML announcement email, the Linux creator says that he originally thought another RC would not necessarily be required; however, a large number of small pull requests submitted by developers late last week necessitated an additional RC for testing, leading Torvalds to tell the developers, "Not cool, guys. Not cool."
These changes include media fixes, random SOC fixes and PowerPC fixes, as well as patches for the leap second bug that caused Linux systems to freeze because of permanent high CPU loads that resulted in increased power consumption and wasted electricity. "Ok, so it's still not *huge*, but it's bigger than -rc6 was," said Torvalds, adding, "I had hoped for less."
Linus has asked the kernel developers to test the rc7 release to "make sure it's all good", and is hoping that he "won't have to do an -rc8". Barring any major problems over the coming week, Linux 3.5 will likely be released next weekend. An overview of the changes made in the 3.5 kernel can be found in The H's Kernel Log mini-series "Coming in 3.5" which examines the various subsystem developments in the upcoming release.
Review each article and notice what is and what is not a link, and where the links lead.
-
Re:Hold on a second.
In the actual e-mail, it's about both size and change velocity:
Because I last week I thought that making an -rc7 was not necessarily realy required, except perhaps mainly to check the late printk changes. But then today and yesterday, I got a ton of small pull requests, and now I find myself releasing an -rc7 that is actually bigger than rc6 was.
-
Re:What about Windows and Mac?
And apparently neither did any desktop Linux systems.
Sigh, have to post anonymous because I don't want to undo a whole heap of moderating.
That is an unwarranted assumption. At 0000 UTC I was running a fully up to date RHEL6 on my desktop and both Firefox and Thunderbird went to 100% CPU and became non-responsive, VirtualBox got strange, and the entire desktop went to a low level of responsiveness as a consequence. Neither Firefox nor Thunderbird would close; I had to use kill on both of them. I elected to just reboot and be done with any lingering effects, and the system was fine after that.
What I saw exactly met this description.
-
Re:Our Red Hat servers had no issues at all
The bug is related to kernel version, IIRC (introduced somewhere in the 2.6 series, resolved in 3.2 or somesuch). So it depends what kernel the distros ran.
More like resolved yesterday (today being July 2, 2012 where I'm typing this).
-
Please read this lkml thread before commenting
This linux-kernel mailing list thread discusses a kernel bug that causes futexes to repeatedly time out, so that code using them (which might include POSIX mutexes and condition variables, if that's what glibc uses for them on Linux) might spin.
That's not the kernel-leap-year-handling bug that was fixed back in March, so it's not as if a properly-patched kernel wouldn't get hit by this (unless you define "properly-patched" as "includes the patch John Stultz came up with on July 1, 2012").
So, yes, this particular bug is Linux-specific (i.e., there's a reason why it hit Linux servers), and might not be the fault of the userland code running atop it (so it might not, for example, be Java's fault).
-
Re:Only Linux affected?
I'm managing a cluster of 2,400 nodes running FreeBSD, and AFAICS, none was tripped off by leap second NTP adjustments. On the other hand, 4 out of 180 Linux nodes crashed simultaneously at that very moment. All this is exceedingly weird, but may indeed point to a subtle bug in the Linux kernel (only?)
Could be, if "crashed" means "had some processes start spinning like mad". If it was a kernel-mode crash, that might be another bug.
-
Re:Extremely weird
Bad idea. It would have prevented kernels affected by the race-condition from crashing, but would have meant most of your running software would have been either hit by this bug or would have been on the mercy of a 17 year old pimple-faced coder.
I think I prefer a crash over the mayhem caused by banking-software not handling a leap-second correctly. That could bankrupt whole countries.
OK, I'm all for having UN*X kernels (including but not limited to Linux kernels) keep their internal time value as a counter initialized to (as best an approximation as possible of) the number of seconds that have elapsed between the Epoch and the time the counter is initialized, and have those calls that are expected to return "seconds since the Epoch" do so by converting a count of seconds that have elapsed since the Epoch into "seconds since the epoch" by subtracting out positive leap seconds and adding in negative leap seconds (preferably in userland). Then the 17-year-old pimple-faced coders can use the POSIX calls and pretend leap seconds don't exist, and the kernel can presumably not have to care about leap seconds and thus not have to worry about the insertion of leap seconds.
Oh, and "this bug" appears, from this LKML thread, to be due to the kernel caring about leap seconds, so it's not as if your software would have been hit by this bug if the stuff that caused the bug didn't happen to exist in the kernel in the first place.
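A minimal sketch of the "simple counter plus userland conversion" idea might look like this. Everything here is an assumption for illustration: the names are made up, the table holds only two leap seconds (a real one would have to list them all), and the elapsed counter is assumed to count only the leap seconds in that table. The conversion repeats 23:59:59 during the inserted second.

```python
import calendar
from bisect import bisect_right

# Illustrative, deliberately incomplete table: POSIX timestamps of the
# midnights that immediately follow an inserted (positive) leap second.
LEAP_FOLLOWED_BY = [
    calendar.timegm((2009, 1, 1, 0, 0, 0)),
    calendar.timegm((2012, 7, 1, 0, 0, 0)),
]

# Value of the elapsed-seconds counter at which each leap second
# (23:59:60) occurs: one extra tick per earlier leap second in the table.
LEAP_ELAPSED = [t + i for i, t in enumerate(LEAP_FOLLOWED_BY)]


def posix_from_elapsed(elapsed):
    """Convert a true elapsed-seconds count into POSIX 'seconds since the Epoch'.

    Subtracts one second for every leap second inserted at or before
    `elapsed`, so the POSIX value for 23:59:59 repeats during 23:59:60.
    """
    inserted = bisect_right(LEAP_ELAPSED, elapsed)
    return elapsed - inserted


leap = LEAP_ELAPSED[1]  # the 2012-06-30 23:59:60 tick in this model
print(posix_from_elapsed(leap - 1), posix_from_elapsed(leap), posix_from_elapsed(leap + 1))
```

With this split, the counter itself never steps backwards; only the derived POSIX value stalls for one second.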
-
Re:Extremely weird
From my own machines and comparing notes with some other people (all in all, about 3k servers) the bug seems to affect machines randomly. Known facts:
There's a kernel patch that fixes the supposed issue: https://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=6b43ae8a619d17c4935c3320d2ef9e92bdeed05d
I don't think that's the issue. The issue discussed in this lkml thread is a different issue with, presumably, a different John Stultz fix.
The fix has been posted at a lot of places:
/etc/init.d/ntp stop; date; date `date +"%m%d%H%M%C%y.%S"`; date; /etc/init.d/ntp start
Presumably meaning "workaround" rather than "fix".
(I'm all for switching unix time to a simple counter and leaving it to the calendar libs to put the leap seconds where necessary)
Sounds good to me, but I thought that was a good idea back in the late '80s; the POSIX people thought otherwise, so....
At least as I read RFC 5905, time stamps in NTP packets are essentially "simple counters", and count positive leap seconds and don't count seconds removed with negative leap seconds. I'm not sure what an NTP implementation is supposed to do with the "leap indicator"; that might be dependent on what sort of time the system is supposed to provide to applications. I don't know whether the Linux kernel giving a damn about leap seconds is due to it trying to supply "POSIX time", i.e. time represented as "seconds since the Epoch" rather than as the number of seconds that have elapsed since the Epoch (yes, the two are different), or if it has to do that to function as an NTP client.
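As a rough illustration of the two NTP pieces mentioned above, the sketch below pulls the leap indicator out of the first byte of an NTP packet header and converts the seconds field of an NTP timestamp to a Unix-style count. The function names are made up; the bit layout (2-bit LI, 3-bit version, 3-bit mode) and the 1900-based epoch follow my reading of RFC 5905, and the sketch says nothing about what a client should then do with the indicator.

```python
# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_UNIX_OFFSET = 2208988800

LEAP_INDICATOR = {
    0: "no warning",
    1: "last minute of the day has 61 seconds",
    2: "last minute of the day has 59 seconds",
    3: "unknown (clock unsynchronized)",
}


def parse_first_byte(byte):
    """Split the first header byte into (leap indicator, version, mode)."""
    li = (byte >> 6) & 0x3
    version = (byte >> 3) & 0x7
    mode = byte & 0x7
    return li, version, mode


def ntp_seconds_to_unix(ntp_seconds):
    """Convert the integer seconds part of an NTP timestamp (era 0) to Unix seconds."""
    return ntp_seconds - NTP_UNIX_OFFSET


li, version, mode = parse_first_byte(0x64)
print(LEAP_INDICATOR[li], version, mode)
```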
-
Re:FUD?
And there I was thinking that my 3.2.19 kernel was fairly up to date...
At this point, I'm not sure which Linux kernels, other than perhaps the one on John Stultz's machine, are sufficiently up to date.
-
Re:FUD?
the difference being this bug was patched already; it only affected systems that were not kept up to date.
A bug, perhaps. This bug, perhaps not.
-
Re:FUD?
The bug has already been fixed for months now
A bug might have been fixed for months now, but I don't think that's the bug here.
-
Re:Linux kernel unable to cope? I think not.
Possibly refers to some of the issues covered here: https://access.redhat.com/knowledge/articles/15145?amp
In particular, to this issue, which apparently first materialized with the recent leap second, and probably not to this issue, which might be the one fixed by this patch.
-
Re:Linux kernel unable to cope? I think not.
yes, an old one that was patched before this became an issue.
And a new one that is either in the process of being patched today (July 1, 2012) or that was patched today, as per the lkml thread that starts here.
the issue is for un-updated/unpatched versions of Linux
"Unpatched" with a patch that didn't exist before the problem showed up...
and shoddily written apps and java
Where "shoddily written" means "using futexes or using something that uses futexes"? I'm not sure I'd be so harsh about using futexes; something that lets you do locking mostly in userland doesn't seem like a bad idea offhand....
-
Re:Why now?
Considering leap-seconds happen every now and then, it seems odd that such fundamental things as Linux and Java can not handle it. AFAIK, it was just about four years ago since we last had a leap-second.
Perhaps the bug that was mentioned in the lkml thread that started with this message was introduced less than four years ago, so the code in question had never gotten exposed to a leap second except perhaps in testing (I don't know how hard it is to reproduce it; John Stultz wasn't initially able to reproduce it in his testing, but eventually succeeded).
-
Re:
The hard system lock bug due to a leap second was patched in 2.6.29, so either you've got some weird related bug, or something is very wrong.
Well, the weird related bug would arguably count as something being wrong. Apparently there is a bug in the handling of the insertion of positive leap seconds that could cause weird behavior with futexes, and that bug appears not to have been fixed until at least July 1, 2012 (I'm guessing John Stultz has worked up a patch).
-
Re:FUD?
the difference being this bug was patched already; it only affected systems that were not kept up to date.
I would believe that except some of the recent Linux kernels did NOT properly handle the leap second. https://lkml.org/lkml/2012/7/1/19 It was this improper handling of the time change associated with the leap second that sent some software into a tizzy, with the most common side effect being heavy CPU consumption. Some software seems to have issues regardless of this bug as well.
I agree with the original statement: if MS had done this, the tone of this article would be different.
-
Re:All of my servers were fine
Our problem was with a third party monitoring solution - its daemon process brought every single one of our servers to a near halt by consuming all available cpu cycles at the stroke of gmt midnight.
The OS itself was fine.
Well, if you're talking a Linux kernel, the part of the OS that dealt with leap seconds was not OK, and was "not OK" in a fashion that could cause processes using futexes to spin and consume all available CPU cycles when a leap second is introduced.
This monitoring software is common enough that it likely was behind a lot of the issues seen around the 'net.
...perhaps by virtue of either using futexes (in what I'm presuming is a legitimate fashion) or using something that uses futexes.
-
Re:You probably don't do much Java, then
So are you saying that, in addition to the Linux kernel glitch in question (which appears to cause some userland processes to spin)
Actually, I'm not sure that's the case. John Stultz's mail from July 1, 2012 speaks of a bug where clock_was_set() wasn't called after the leap second was added, and of a patch he was working on, so the bug in question might not have been fixed in March.
-
Re:What about Windows and Mac?
As far as I can tell, all current operating systems handled it fine. It's applications that have problems, mainly server-type apps that actually use the clock for important things.
Linux being heavily affected is just a side-effect of most servers running Linux (although apparently some older versions don't handle leap seconds so cleanly - maybe that has something to do with it?).
Yes, at least one of the problems appears to be a Linux kernel problem. However, as that thread indicates, the consequence of this isn't a kernel crash; it causes futexes to repeatedly time out (or, at least, causes futexes with timeouts to repeatedly time out). I'm guessing, perhaps incorrectly, that this might mean that code waiting for a futex gets a kernel wakeup due to a timeout, checks whether the condition being waited for has happened, discovers that it hasn't, sleeps in the futex again, gets a kernel wakeup due to a timeout, checks whether the condition being waited for has happened, discovers that it hasn't, sleeps in the futex again, lathers, rinses, repeats, so it makes no progress and chews up tons of CPU.
If so, then:
- this particular problem is specific to systems running Linux kernels with the problem (and hence specific to Linux);
- applications that don't themselves have issues with leap seconds might be affected by this;
so Linux being heavily affected might also be a side-effect of, well, some versions of the Linux kernel having a bug that's triggered by leap seconds.
However, unless an application happens to use futexes in a fashion that trips over the bug, it won't be affected. It might be server applications that are most likely to do so, meaning that you might not see it on, say, a desktop or handheld Linux machine, or even on some servers.
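The guessed-at wait loop above can be imitated in userland with a small simulation. This is not a raw futex: it uses Python's `threading.Condition`, and a timeout of 0 stands in (as an assumption) for a timer that the buggy kernel expires immediately. `wait_for` and `max_wakeups` are made-up names; the cap exists only so the demonstration terminates.

```python
import threading


def wait_for(cond, predicate, timeout, max_wakeups):
    """Typical timed wait loop: sleep, wake, re-check the condition, repeat.

    Returns how many times the waiter woke up before the predicate held
    or the (demo-only) wakeup cap was reached.
    """
    wakeups = 0
    with cond:
        while not predicate() and wakeups < max_wakeups:
            cond.wait(timeout)  # returns at once if the timeout expires immediately
            wakeups += 1
    return wakeups


cond = threading.Condition()
# The condition never becomes true and every wait "times out" instantly,
# so the loop makes no progress and just burns CPU until the cap stops it.
print(wait_for(cond, lambda: False, 0, 100000))
```

The loop itself is perfectly legitimate code; it only turns into a spin when the timeouts stop actually sleeping, which fits the point that well-behaved applications can be hit.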
-
Re:You probably don't do much Java, then
As it turns out, my biggest problem was customer-supplied software which uses its own Java JREs. We install a JRE by default and update it whenever possible, but some software (Adeptia, VLTrader, Alfresco) comes with its own ancient JRE and scripts to call that over the system-supplied Java.
Not a single machine crashed (we are very explicitly in charge of what OS-version there's running) but a lot of java locked up and had to be restarted.
So are you saying that, in addition to the Linux kernel glitch in question (which appears to cause some userland processes to spin), there are purely-userland problems? Or, if you're running on a Linux that doesn't have John Stultz's fix, is it that some JREs are vulnerable to the Linux kernel glitch and others aren't?
-
Re:What about Windows and Mac?
My guess is that Windows simply ignored it, so there never was a 61st second in a minute.
Well, if Microsoft's documentation of the SYSTEMTIME structure reflects the implementation, GetSystemTime(), the claim in that man page^W^WMSDN page that "The system time is expressed in Coordinated Universal Time (UTC)" notwithstanding, cannot acknowledge the existence of a 61st second in a minute ("The second. The valid values for this member are 0 through 59.", as the SYSTEMTIME page says).
But, just as on UN*X, you have "counter" and "human-style label" times (time_t, struct timeval, and struct timespec are examples of the former, and a struct tm as returned by, for example, gmtime() is an example of the latter, on UN*X), with the Windows versions of those being FILETIME and SYSTEMTIME respectively. That page on FILETIME says nothing about leap seconds - does it just keep counting over a positive leap second or does it stop or what? And, if it doesn't just keep counting over a positive leap second, does it just freeze for a whole second, or does it slow down over some period of time so that it eventually syncs up, or what?
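To make the "counter" versus "human-style label" distinction concrete on the UN*X side, here is a small sketch using Python's wrappers around the C library (`calendar.timegm` and `time.gmtime`). It only shows that consecutive values of a POSIX-style counter go straight from 23:59:59 to 00:00:00 around the 2012 leap second, so no counter value maps to a 23:59:60 label; it says nothing about what Windows does.

```python
import calendar
import time

# Counter value for the last ordinary second of 30 June 2012 (UTC).
before = calendar.timegm((2012, 6, 30, 23, 59, 59))

label_before = time.gmtime(before)     # human-style label for that count
label_after = time.gmtime(before + 1)  # label for the very next count

fmt = "%Y-%m-%d %H:%M:%S"
print(before, time.strftime(fmt, label_before))
print(before + 1, time.strftime(fmt, label_after))
```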
As for NTP, Microsoft has a page on "How the Windows Time service treats a leap second", which says
When the Windows Time service is working as a Network Time Protocol (NTP) client
The Windows Time service does not indicate the value of the Leap Indicator when the Windows Time service receives a packet that includes a leap second. (The Leap Indicator indicates whether an impending leap second is to be inserted or deleted in the last minute of the current day.) Therefore, after the leap second occurs, the NTP client that is running Windows Time service is one second faster than the actual time. This time difference is resolved at the next time synchronization.
(the author of which needs to be told what "inserted or deleted" implies - do they mean that, regardless of whether a leap second is inserted or deleted, the NTP client that is running Windows Time service is one second faster than the actual time?)
And then there's one more question: if there's anything in the NT kernel that deals with leap seconds, does any version have a glitch, as some versions of the Linux kernel do?
If not, then many of the other problems might not exist on Windows. This email from John Stultz, the author of the fix linked to in the previous paragraph, seems to indicate that at least some of the problems, if not all of them, stem from a kernel bug, so it might be that Java and company might be Just Fine on systems that don't have a kernel glitch of that sort (so they might work fine on at least some non-Linux systems, as well as on Linux systems with the bug fixed).
-
Re:How about...
How about...
...enabling users to upgrade the devices themselves?
No kidding. The fragmentation problem with Android comes from the fact that every hardware manufacturer effectively spins its own Linux distro for each device that they manufacture. There should be one or two Android distributions in the world, just like we have with desktop Linux distros.
It's probably related to this comment by Linus (in the same thread where he threatened to stop merging ARM patches altogether):
The long-term situation should be that you should be able to have ONE binary kernel "just work". That's where we are on x86. Really.
[...]
Now, some of it is quite understandable - ie real drivers for real hardware. But a _lot_ of it seems to be just descriptor tables, and I'm getting the very strong feeling that ARM people aren't even _trying_ to make it sane, and trying to standardize things, or trying to aim for the whole notion of "one kernel image, with much more hw description done elsewhere".
-
The explanation, deadlock...do kill the messenger
Described here (w/dump): https://lkml.org/lkml/2009/1/2/373
Simple version:
"Don't kill the messenger", except when the messenger is going to kill you. It's the printk sending notice that the leap second happened that deadlocks against the timer doing the leap second (both vying for xtime_lock). Call it a "feature" of the NTP code. Hence the "turn off NTPD" workaround: if ntpd doesn't get notified from somewhere upstream that it should implement the leap second, it won't notify the kernel about it, and the printk shouldn't happen.
-T
-
Re:btrfs needed the work
This is known as featuritis, and is anathema to the Unix way, where each part should do just one thing, and do it extremely well.
All btrfs does is manage a B-tree filesystem. All grep does is apply a regular expression to a string.
However, the UNIX way is not always even a good thing.
It is also the UNIX way to duplicate a single thing a hundred times for each little feature variation (grep, egrep, fgrep, most of Perl.) That can also be unpleasant for the end user (xterm, gnome-terminal, kterm, gterm, LXterm, terminator, editing Perl.) Great for a system administrator who is expert at their particular tool and only that tool but horrible for everyone else.
That's without getting into the UNIX Way for (lack of) documentation. Or how that one thing is so often the wrong thing so it doesn't matter how well that one tool does it.
btrfs is famously called a rampant layering violation. The roll-up of filesystem-management features in one place actually lets the developers avoid duplicating code (which may actually be about as non-UNIXy as you can get in some ways.) Code that now knows about certain information normally hidden from it can do things differently. This is sometimes better (rapid mkfs) or worse (fsck tool was apparently hard to write.)
In my opinion, it's not interesting for enterprise because you get mediocre features, like RAID support that doesn't cover RAID5, no online file system check
In my opinion, if your enterprise system depends on fsck and not good backups then you don't have an enterprise system. Yes, xfs_repair can do amazing things to mostly trashed disks. But one day your data will take a good fscking where the only surviving copy will be the backup copy.
RAID5 implementation from Intel is in the tree, but waiting until after the fsck is done. And btrfsck has been around since, oh, February? And the btrfs-progs you should be using with the 3.4 kernel have btrfsctl included?
I was hoping the RAID5 code was going to land in 3.4, actually. Reading the pull request says that RAID5/6 should be in 3.5. Oh, well.
Of course, if you have enough money to buy an "enterprise" solution, your SAN/NAS should be the thing doing RAID for you anyway.
My major criticism of btrfs is the horrid sync performance. Hosting virtual machines tends to require lots of small writes to disk that make btrfs incredibly non-performant.
btrfs has many sexy, sexy features for a world of enterprise SAN storage and virtual machine hosting. It has thin disks, balanced meta-data, flexible storage, SSD optimized modes, multiple snapshot layers, checksummed data on disk. All of this just because it does one thing and does it well: manage a B-Tree database.
Today it's just not there in the I/O department, sadly. Probably good for inside the virtual machine guests, though. Only testing will tell.
My money is on NILFS, if nothing else because Oracle gives people a bad taste in their mouths, but ICBW.
Wow, speaking of niche file systems. Log file systems have quite a long history. Of horrible performance and fragmentation. But if we all end up on SSDs, that won't matter. Underlying any file system you put on it, an SSD implements storage as a circular log and performance is fast enough to not depend on huge uncommitted disk caches.
-
Re:Please stop....
You [should] increment your major version 4.x.x.x when you release new major features
We include what I'd consider to be major new features in almost every release. Maybe you disagree about what should be considered "major". The point of the numbers is that it prevents us from bikeshedding about these minutiae every six weeks and lets us concentrate on what's important.
possibly one can make the argument ALL projects BUT chrome and ff use something in this light
I suppose you could make that argument, but it wouldn't be a very good one. Linux went from version 2.6 to version 3.0 with, in Linus's words, "no special landmark features or incompatibilities related to the version number change".
I wish we could stop arguing about the version numbers. It's the least important part of Firefox. You don't like it. We heard you. I'm sorry. Can we just let it go already?
-
Re:Good News
If the firmware has given us control of PCIe capabilities then it's valid for an operating system to configure ASPM more aggressively than the firmware did. A small number of devices object to this and exhibit various failure modes. Windows provides a mechanism to disable ASPM in the driver, indicated by the Needs=PciASPMOptOut statement in the .inf file. Trawling through Windows drivers has indicated the following set of hardware that disables ASPM in Windows but doesn't currently disable it in Linux. It makes sense for us to mimic Windows in this situation. (V2: send the version that actually builds)
Matthew Garrett, these patches did get noticed by Phoronix.
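For a sense of what that kind of trawl might look like, here is a minimal sketch that checks the text of an .inf file for the Needs=PciASPMOptOut statement quoted above. The function name is made up, and the parsing is deliberately naive: it assumes ';' starts a comment and ignores quoting and line continuations. It is not how the patch author went about it.

```python
import re

# Matches a "Needs = a, b, c" directive line, case-insensitively.
_NEEDS = re.compile(r"^\s*Needs\s*=\s*(.*)$", re.IGNORECASE)

def opts_out_of_aspm(inf_text):
    """Return True if the INF text contains a Needs line naming PciASPMOptOut."""
    for line in inf_text.splitlines():
        # Assumption: everything after ';' is a comment.
        line = line.split(";", 1)[0]
        match = _NEEDS.match(line)
        if not match:
            continue
        names = [name.strip().lower() for name in match.group(1).split(",")]
        if "pciaspmoptout" in names:
            return True
    return False
```

Run over a directory of extracted driver packages, something like this would give a first-pass list of candidates to check by hand.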
-
Re:Good job, wants some cheese for your whine?
> Instead of being so high and mighty
Are we really talking about the same developer who said:
"The number of bug reports we get from people with virtualbox loaded are
truly astonishing. It's GPL, but sadly that doesn't mean it's good.
Nearly all of these bugs look like random corruption. (corrupt linked lists,
corrupt page tables, and just plain 'weird' crashes).
This diff adds tainting to the module loader to treat it as we do with stuff
from staging/ (crap). With this tainting in place, automatic bug filing tools
can opt out of automatically filing kernel bugs, and inform the user to file
bugs somewhere more appropriate."
-
Re:Maybe they should switch to OpenBSD...
Last time I checked Apple runs their stuff on Windows Azure so maybe Kernel.org should do the same. I mean, Kernel.org has been hacked what now, two or three times? How many times has Windows Azure been hacked? Zero. So, just by looking at statistics moving to that platform could be a good move.
I mean, since we just went odd-version and have the Visual Basic rewrite imminent, being open towards new hosting platforms should be an option.
-
Link broken
The link to LKML in the article is broken. Here's the real one:
-
sounds very similar, but what are the solutions?
linus's views sound very similar to what i've written about, at some length on this subject: https://lkml.org/lkml/2011/7/1/473
the thing is that absolutely nobody has come up with any solutions. the only solution i've heard is the one that i recommended, and there's been no reaction or response to it, as of yet.
the problem is the sheer overwhelming diversity. therefore, the solution is to prioritise linux kernel patches that come from hardware syndicates or specifications that cover more than just the one hardware device. for example, a patch to add in ARM USB3 support would instantly be accepted, because it covers multiple hardware devices. for example, a reference board would be instantly accepted, because it allows companies plural to develop platforms based around it.
what people are forgetting is that there is no BIOS in the ARM world and no "common hardware platform" (PCI, PCIe, Northbridge, Southbridge - in most cases all of those things are gone. ARM CPUs with PCIe are exceptionally rare). and because there are often massive differences between CPUs, even on a minor upgrade from the same manufacturer, each hardware device has to have a custom-tailored device layout, and that means a custom-tailored linux kernel.
-
Re:version inflation
I think what he said was that if he ever went to 3.0, it would mean he had gone insane and rewritten the entire thing in Visual Basic.