This linux-kernel mailing list thread discusses a kernel bug that causes futexes to repeatedly time out, so that code using them (which might include POSIX mutexes and condition variables, if that's what glibc uses for them on Linux) might spin.
That's not the kernel leap-second-handling bug that was fixed back in March, so it's not as if a properly-patched kernel wouldn't get hit by this (unless you define "properly-patched" as "includes the patch John Stultz came up with on July 1, 2012").
So, yes, this particular bug is Linux-specific (i.e., there's a reason why it hit Linux servers), and might not be the fault of the userland code running atop it (so it might not, for example, be Java's fault).
That's a nice ideal, but the reality is that many up-to-date "stable" distribution releases are still using kernels which are susceptible to the leap second problem (and haven't had the patch back-ported to them).
To which of the, apparently, two or more leap second problems are you referring? (The latest one, causing the bogus futex timeouts and subsequent CPU-eating spinfests, is, apparently, having a fix developed today, July 1, 2012, so getting that patched would be a little difficult - especially getting it patched before the leap second is introduced. :-))
I'm managing a cluster of 2,400 nodes running FreeBSD, and AFAICS, none was tripped off by leap second NTP adjustments. On the other hand, 4 out of 180 Linux nodes crashed simultaneously at that very moment. All this is exceedingly weird, but may indeed point to a subtle bug in the Linux kernel (only?)
Could be, if "crashed" means "had some processes start spinning like mad". If it was a kernel-mode crash, that might be another bug.
Bad idea. It would have prevented kernels affected by the race condition from crashing, but would have meant most of your running software would have been either hit by this bug or would have been at the mercy of a 17-year-old pimple-faced coder.
I think I prefer a crash over the mayhem caused by banking-software not handling a leap-second correctly. That could bankrupt whole countries.
OK, I'm all for having UN*X kernels (including but not limited to Linux kernels) keep their internal time value as a counter initialized to (as best an approximation as possible of) the number of seconds that have elapsed between the Epoch and the time the counter is initialized, and have those calls that are expected to return "seconds since the Epoch" do so by converting a count of seconds that have elapsed since the Epoch into "seconds since the epoch" by subtracting out positive leap seconds and adding in negative leap seconds (preferably in userland). Then the 17-year-old pimple-faced coders can use the POSIX calls and pretend leap seconds don't exist, and the kernel can presumably not have to care about leap seconds and thus not have to worry about the insertion of leap seconds.
Oh, and "this bug" appears, from this LKML thread, to be due to the kernel caring about leap seconds, so it's not as if your software would have been hit by this bug if the stuff that caused the bug didn't happen to exist in the kernel in the first place.
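The counter-plus-conversion idea above can be sketched in a few lines of C. This is only an illustration under stated assumptions: the `leap_entry` table, its field names, and `posix_from_elapsed()` are hypothetical, not an existing kernel or libc interface, and a real table would have to come from something like the Olson leap seconds database.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical leap second table entry (an assumption for illustration).
 * `at` is the value of the elapsed-seconds counter during an inserted
 * leap second, or at the first second after a removed one. */
struct leap_entry {
    long long at;
    int sign;   /* +1: second inserted, -1: second removed */
};

/* Convert a true count of elapsed seconds into POSIX-style
 * "seconds since the Epoch" by subtracting out positive leap
 * seconds and adding in negative ones. */
static long long posix_from_elapsed(long long elapsed,
                                    const struct leap_entry *table,
                                    size_t n)
{
    assert(table != NULL || n == 0);
    long long posix = elapsed;
    for (size_t i = 0; i < n; i++) {
        if (table[i].at <= elapsed)
            posix -= table[i].sign;
    }
    return posix;
}
```

With a made-up table that has an inserted second at counter value 100, counter values 99 and 100 both map to 99, so the POSIX value repeats across the leap second while the underlying counter just keeps counting.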
The fix has been posted at a lot of places: /etc/init.d/ntp stop; date; date `date +"%m%d%H%M%C%y.%S"`; date; /etc/init.d/ntp start
Presumably meaning "workaround" rather than "fix".
(I'm all for switching unix time to a simple counter and leaving it to the calendar libs to put the leap seconds where necessary)
Sounds good to me, but I thought that was a good idea back in the late '80's; the POSIX people thought otherwise, so....
At least as I read RFC 5905, time stamps in NTP packets are essentially "simple counters", and count positive leap seconds and don't count seconds removed with negative leap seconds. I'm not sure what an NTP implementation is supposed to do with the "leap indicator"; that might be dependent on what sort of time the system is supposed to provide to applications. I don't know whether the Linux kernel giving a damn about leap seconds is due to it trying to supply "POSIX time", i.e. time represented as "seconds since the Epoch" rather than as the number of seconds that have elapsed since the Epoch (yes, the two are different), or if it has to do that to function as an NTP client.
yes, an old one that was patched before this became an issue.
And a new one that is either in the process of being patched today (July 1, 2012) or that was patched today, as per the lkml thread that starts here.
the issue is for un-updated/unpatched versions of Linux
"Unpatched" with a patch that didn't exist before the problem showed up...
and shoddily written apps and java
Where "shoddily written" means "using futexes or using something that uses futexes"? I'm not sure I'd be so harsh about using futexes; something that lets you do locking mostly in userland doesn't seem like a bad idea offhand....
Considering leap-seconds happen every now and then, it seems odd that such fundamental things as Linux and Java can not handle it. AFAIK, it was just about four years ago since we last had a leap-second.
Our problem was with a third party monitoring solution - its daemon process brought every single one of our servers to a near halt by consuming all available cpu cycles at the stroke of gmt midnight.
Actually, I'm not sure that's the case. John Stultz's mail from July 1, 2012 speaks of a bug where clock_was_set() wasn't called after the leap second was added, and of a patch he was working on, so the bug in question might not have been fixed in March.
As far as I can tell, all current operating systems handled it fine. It's applications that have problems, mainly server-type apps that actually use the clock for important things.
Linux being heavily affected is just a side-effect of most servers running Linux (although apparently some older versions don't handle leap seconds so cleanly - maybe that has something to do with it?).
Yes, at least one of the problems appears to be a Linux kernel problem. However, as that thread indicates, the consequence of this isn't a kernel crash; it causes futexes to repeatedly time out (or, at least, causes futexes with timeouts to repeatedly time out). I'm guessing, perhaps incorrectly, that this might mean that code waiting for a futex gets a kernel wakeup due to a timeout, checks whether the condition being waited for has happened, discovers that it hasn't, sleeps in the futex again, gets a kernel wakeup due to a timeout, checks whether the condition being waited for has happened, discovers that it hasn't, sleeps in the futex again, lathers, rinses, repeats, so it makes no progress and chews up tons of CPU.
If so, then:
this particular problem is specific to systems running Linux kernels with the problem (and hence specific to Linux);
applications that don't themselves have issues with leap seconds might be affected by this;
so Linux being heavily affected might also be a side-effect of, well, some versions of the Linux kernel having a bug that's triggered by leap seconds.
However, unless an application happens to use futexes in a fashion that trips over the bug, they won't be affected. It might be server applications that are most likely to do so, meaning that you might not see it on, say, a desktop or handheld Linux machine, or even on some servers.
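The guessed-at spin can be illustrated with a small C sketch. It deliberately does not call the real futex system call: `timed_wait_fn`, `buggy_wait()`, and `wait_for_condition()` are hypothetical stand-ins, and `buggy_wait()` simply models a timed wait that returns a timeout immediately, which is what the thread suggests the affected kernels ended up doing.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a timed wait (e.g. a futex wait with a
 * timeout). Returns 0 on a real wakeup or ETIMEDOUT on timeout. */
typedef int (*timed_wait_fn)(void *ctx);

struct retry_stats {
    unsigned long wakeups;  /* how many times the wait returned */
    bool satisfied;         /* whether the condition was ever seen */
};

/* Typical check-then-sleep loop: test the condition, sleep with a
 * timeout, re-test. max_wakeups only exists so the sketch terminates. */
static struct retry_stats wait_for_condition(const volatile int *flag,
                                             timed_wait_fn wait,
                                             void *ctx,
                                             unsigned long max_wakeups)
{
    assert(flag != NULL && wait != NULL);
    struct retry_stats st = {0, false};
    while (st.wakeups < max_wakeups) {
        if (*flag) {
            st.satisfied = true;
            break;
        }
        int rc = wait(ctx);
        st.wakeups++;
        if (rc != 0 && rc != ETIMEDOUT)
            break;  /* unexpected error: give up */
    }
    return st;
}

/* Models the suspected misbehavior: the "sleep" times out at once. */
static int buggy_wait(void *ctx)
{
    (void)ctx;
    return ETIMEDOUT;
}
```

If the wait never actually sleeps, the loop burns through every allowed iteration without the condition becoming true, which is the "makes no progress and chews up tons of CPU" behavior; the application code itself is doing nothing wrong.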
What we should have is what I've described above, time-zero and a counter. And translations from that to localized date time should be handled by a library.
A value that approximates the number of seconds that have elapsed since the Epoch. A Coordinated Universal Time name (specified in terms of seconds (tm_sec), minutes (tm_min), hours (tm_hour), days since January 1 of the year (tm_yday), and calendar year minus 1900 (tm_year)) is related to a time represented as seconds since the Epoch, according to the expression below.
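If the year is <1970 or the value is negative, the relationship is undefined. If the year is >=1970 and the value is non-negative, the value is related to a Coordinated Universal Time name according to the C-language expression, where tm_sec, tm_min, tm_hour, tm_yday, and tm_year are all integer types:
tm_sec + tm_min*60 + tm_hour*3600 + tm_yday*86400 + (tm_year-70)*31536000 + ((tm_year-69)/4)*86400 - ((tm_year-1)/100)*86400 + ((tm_year+299)/400)*86400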
The relationship between the actual time of day and the current value for seconds since the Epoch is unspecified.
How any changes to the value of seconds since the Epoch are made to align to a desired relationship with the current actual time is implementation-defined. As represented in seconds since the Epoch, each and every day shall be accounted for by exactly 86400 seconds.
Note:
The last three terms of the expression add in a day for each year that follows a leap year starting with the first leap year since the Epoch. The first term adds a day every 4 years starting in 1973, the second subtracts a day back out every 100 years starting in 2001, and the third adds a day back in every 400 years starting in 2001. The divisions in the formula are integer divisions; that is, the remainder is discarded leaving only the integer quotient.
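The POSIX expression that the quoted text and note describe can be written out as a small C function. The function name `posix_seconds()` and the use of `long long` are choices made for this sketch, not part of the standard's text.

```c
#include <assert.h>

/* POSIX "seconds since the Epoch" for a UTC label, following the
 * expression in the definition quoted above. tm_year is the calendar
 * year minus 1900 and tm_yday is days since January 1. All divisions
 * are integer divisions. */
static long long posix_seconds(int tm_sec, int tm_min, int tm_hour,
                               int tm_yday, int tm_year)
{
    assert(tm_year >= 70);  /* relationship is undefined before 1970 */
    long long year = tm_year;
    return tm_sec + tm_min * 60LL + tm_hour * 3600LL + tm_yday * 86400LL
         + (year - 70) * 31536000LL
         + ((year - 69) / 4) * 86400LL      /* add a day every 4 years */
         - ((year - 1) / 100) * 86400LL     /* subtract one every 100 */
         + ((year + 299) / 400) * 86400LL;  /* add one back every 400 */
}
```

For example, `posix_seconds(0, 0, 0, 182, 112)` (2012-07-01 00:00:00 UTC) evaluates to 1341100800, and so does `posix_seconds(60, 59, 23, 181, 112)`, the label of the inserted leap second 2012-06-30 23:59:60. That is one way to see that "seconds since the Epoch" has no distinct value for a leap second, since every day is accounted for by exactly 86400 seconds.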
If there were a UN*X API to get a count of seconds since the Epoch (in addition to, or instead of, a call to get "seconds since the Epoch"), and a UN*X API to convert those to UTC and local time labels, that would get what you want. Modulo making it work with NTP, the former could be implemented with less difficulty than a call to get "seconds since the Epoch", and the latter is called "the Olson code complete with the leap seconds database".
However, that would then require some mechanism to allow code to schedule something to happen at a given UTC label; simply calculating the UNIX time for that UTC label, getting the current UNIX time, and scheduling it for then-now seconds in the future is insufficient, as the UNIX time for a given UTC label in the future might change if a leap second is scheduled between then and now. (Note that supporting scheduling something to happen at a given local civil time label would already require correction of that sort to handle DST rule changes.) This would also have to do something if you schedule an event for YYYY-DD-MM 23:59:59 and a negative leap second occurs so that there is no 23:59:59 on YYYY-DD-MM; "something" might be "let somebody know and ask them to correct it" or "do it at 00:00:00 on the next day", perhaps depending on the reason why it's scheduled.
As it turns out my biggest problems was customer-supplied software which uses their own java jre's. We install a jre by default and update it whenever possible, but some software (Adeptia, VLTrader, Alfresco) comes with their own ancient jre and scripts to call that over system-supplied java.
Not a single machine crashed (we are very explicitly in charge of what OS-version there's running) but a lot of java locked up and had to be restarted.
So are you saying that, in addition to the Linux kernel glitch in question (which appears to cause some userland processes to spin), there are purely-userland problems? Or, if you're running on a Linux that doesn't have John Stultz's fix, is it that some JREs are vulnerable to the Linux kernel glitch and others aren't?
My guess is that Windows simply ignored it, so there never was a 61st second in a minute.
Well, if Microsoft's documentation of the SYSTEMTIME structure reflects the implementation, GetSystemTime(), the claim in that man page^W^WMSDN page that "The system time is expressed in Coordinated Universal Time (UTC)" notwithstanding, cannot acknowledge the existence of a 61st second in a minute ("The second. The valid values for this member are 0 through 59.", as the SYSTEMTIME page says).
But, just as on UN*X, you have "counter" and "human-style label" times (time_t, struct timeval, struct timespec are examples of the former, and a struct tm as returned by, for example, gmtime() is an example of the latter, on UN*X), with the Windows versions of those being FILETIME and SYSTEMTIME respectively. That page on FILETIME says nothing about leap seconds - does it just keep counting over a positive leap second or does it stop or what? And, if it doesn't just keep counting over a positive leap second, does it just freeze for a whole second, or does it slow down over some period of time so that it eventually syncs up, or what?
When the Windows Time service is working as a Network Time Protocol (NTP) client
The Windows Time service does not indicate the value of the Leap Indicator when the Windows Time service receives a packet that includes a leap second. (The Leap Indicator indicates whether an impending leap second is to be inserted or deleted in the last minute of the current day.) Therefore, after the leap second occurs, the NTP client that is running Windows Time service is one second faster than the actual time. This time difference is resolved at the next time synchronization.
(the author of which needs to be told what "inserted or deleted" implies - do they mean that, regardless of whether a leap second is inserted or deleted, the NTP client that is running Windows Time service is one second faster than the actual time?)
And then there's one more question: if there's anything in the NT kernel that deals with leap seconds, does any version have a glitch, as some versions of the Linux kernel do?
If not, then many of the other problems might not exist on Windows. This email from John Stultz, the author of the fix linked to in the previous paragraph, seems to indicate that at least some of the problems, if not all of them, stem from a kernel bug, so it might be that Java and company might be Just Fine on systems that don't have a kernel glitch of that sort (so they might work fine on at least some non-Linux systems, as well as on Linux systems with the bug fixed).
So far all I've heard about is affected Linux systems; did Windows and OS X do just fine?
The glitch mostly affected POSIX compliant operating systems as POSIX specifies a day as 86400.
So you're saying the glitch could affect OS X (or, at least, OS X Snow Leopard - although Leopard was also registered - but I'll bet Lion behaves, and Mountain Lion will behave, the same way)?
Not sure if this will convince you, but drawing down my feeds outside of a web browser helps my productivity--if I've got a browser open, then I'm viewing about 18 different tabs and refreshing feeds when I should be working on other things. Having Akregator running lets me read my rss feeds without the temptation of a browser when I have internet connectivity, yet still lets me read downloaded feeds without internet access.
What are you doing reading RSS feeds when you should be working on other things?
Given that a lot of the stuff I work on requires looking stuff up on the Intarwebs, I'm pretty much fucked there; there's always the temptation to browse, and sometimes I just need a break in the middle of hacking. So maybe that works for you, but it's 100% unconvincing, and 100% wrong, for me.
I'm no power user, but neither am I a complete idiot. I really like KDE4. I hit the kickoff, then type in the name of the program and I can run it. I know how to get under the hood and clank around if I need or want to with linux/KDE4. With OSX, everything is very kindergarten-simple...as long as you work their way. If you want to work your way and not theirs, it's up to you to change.
So if I want to view an RSS feed in my browser in KDE - i.e., work my way and not theirs - is it up to me to change, or does Konqueror now support reading feeds itself?:-)
I'm not a command-line commando, either. I'm smart enough to know how to use an application and just finicky enough to want to use it my way. Maybe I've been lucky, but KDE 4 has worked for me from 4.0 on.
KDE 1 worked fine for me, as my primary desktop environment, atop FreeBSD 3.0 many many years ago, and KDE3+FreeBSD 6-or-so was OK when my Mac notebook was getting its disk recovered, but that was when I decided that separate RSS reader applications were not the answer for me.
(Speaking of command lines, hopefully most modern UN*X+X11 combinations, whether Linux distributions or PC-BSD-style desktop *BSDs or..., have XSel as a standard package or even pre-installed, so you can do the same thing there that you can do on OS X with pbcopy and pbpaste.)
And others don't. Opinions differ on merits of different desktops; story at 11. "Desktop A rules, desktop B sucks" is, absent data from a broad population of users, a personal opinion, not a statement of fact
Yes, but what IS a fact is that some desktops allow users to configure their desktop the way they like it, with focus-follows-mouse, click-to-focus, and other properties. The problem is that most desktops do not; the designers think they know what's best for everyone, and refuse to allow any configuration at all. If all or most desktops allowed users to set these things, you wouldn't see all this complaining.
Yes, but "more configurable" and "less configurable" aren't ipso facto statements of objective merit.
"More configurable" is an advantage to the people who don't like the default configuration, and may be completely irrelevant to those who do.
As for "less configurable", at least when it comes to click-to-focus vs. focus-follows-mouse, some questions are:
whether introducing a vendor-supported focus-follows-mouse option would require work on the GUI code that takes away resources that could work on other parts of the GUI - it's quite possible that it would;
whether it would add a point of potential confusion for users - I personally don't think an extra knob, especially under an "advanced" pane, would be a problem here;
whether it would cause problems for existing applications - I have the impression that the focus-follows-mouse tweaks may break some Windows apps, although, apparently, Vista has a configuration option that gives focus-follows-mouse+autoraise, so perhaps those issues have been fixed; I don't know whether similar issues exist with OS X focus-follows-mouse+autoraise tweaks such as the one in MondoMouse, or whether it combines poorly with the single menu bar model of OS X; I don't know whether focus-follows-mouse-without-autoraise would be harder or have other issues on either of those platforms; note also that the GNOME Human Interface Guidelines speak of some issues that app developers have to worry about with focus-follows-pointer:
Note that point-to-focus places a number of restrictions on GNOME applications that are not present in environments such as MacOS or Windows. For example, utility windows shared between multiple document windows, like the toolbox in the GIMP Image Editor, cannot be context-sensitive— that is, they cannot initiate an action such as Save on the current document. This is because while moving the mouse from the current document to the utility window, the user could inadvertantly[sic] pass the pointer over a different document window, thus changing the focus and possibly saving the wrong document.
so I'm curious how Windows or OS X apps handle that case if focus-follows-pointer is turned on - autoraise might make that inadvertent focus change more obvious, but not everybody wants autoraise.
I'm a click-to-focus user myself, these days (I went that way when I had a Windows machine on my desk, even if most windows ended up being terminal emulators sshed into a UN*X box, as I figured if I got used to it I'd have fewer problems switching between different desktop environments, given that I could always turn it on for UN*X+X11), so I personally am fine with OS X in that department; people who like OS X and, presumably, don't mind click-to-focus shouldn't confuse that with "OS X IS THE BESTEST UNIX DESKTOP EVAR FOR EVERYBODY!!!!!!11111ONE!!!!!!!".
Somebody who loves focus-follows-mouse may want to ram his or her fist through the screen when using OS X, and that's a perfectly legitimate response for them - as long as they don't confuse it with "OS X IS TEH SUXXXOR FOR ALL UNIX USERS!!!!!111ONE!!!!!!".
Meeting increasing challenges of hardware, web standards, etc. is necessary (maybe,) but the thing that XP-7-8 has taught me is that needless complications are needless. Maybe it's time the open source community starts asking *why* a particular change is desirable or necessary to the userbase.
The user interfaces tend to become simpler and easier on the eye, while the functionality of the application itself has increased. Hiding complex functionality behind an easy-to-use interface is not a known strength of "typical" developers ;-)
The complexity of the non-user-interface-parts of applications has increased a lot. Web-browsers are a good example: While the interface got simplified during the last years, the engines showing web-pages got really complex and are maintained mostly by fulltime-developers in the meantime. There seems to be a similar trend in PIM-applications ("cloud"), chat-clients (one simple user-interface, a various number of protocols) and for desktop-search-engines (simple user-interface, really complex stuff going on behind the scenes).
At least for the example in the second paragraph, it's necessary to those members of the user base who want to be able to see Web sites that use Shiny Modern Web Features. If you want to have the developers of the Web standards or technologies that include those features, or the Web developers who use those features on those sites, ask themselves why that stuff is desirable or necessary to the user base, that might be a good idea, but the developers of the free-software Web browsers are probably somewhat stuck here, unless they want to limit themselves to a user base that doesn't use any sites that require the Shiny New Features.
You like the OSX desktop?
I hate it. It is like it was designed for children and gets in the way too often. I want focus follows mouse, I want to get rid of the idiot dock bar thing, I want menus on every screen not just the main monitor.
And others don't. Opinions differ on merits of different desktops; story at 11. "Desktop A rules, desktop B sucks" is, absent data from a broad population of users, a personal opinion, not a statement of fact (regardless of whether desktop A is the OS X desktop or $OTHER_UN*X_DESKTOP and whether desktop B is $OTHER_UN*X_DESKTOP or the OS X desktop); to make it a statement of fact, prepend "for me" and append "your mileage may vary" (and, yes, this applies to you and the person to whom you're replying).
(But it sounds as if Apple may be killing one thing I really liked about Safari relative to, for example, Konqueror - Safari, at least, had an RSS feed reader built in, so I didn't have to fuck around with Akregator. Note: if you want to defend the separation of RSS feed reading from Web browsing, please explain to me - in a fashion convincing to me; convincing to you, by itself, doesn't even come close to sufficing - why I would not want to read a feed of Web pages in a Web browser. But I digress....)
On top of it, SHIP WITH THE FUCKING GNUTOOLS YOU MORONS. The half baked commercial versions of these tools lack way too many features.
To which GNU tools are you referring? Developer tools? They used to ship GCC, but when it went to GPLv3 they decided to put their efforts behind Clang and LLVM instead. I don't know whether the current version of GDB is GPLv3, but they're putting their effort behind LLDB. (They may be "commercial" in the sense of being supported by a vendor, but they're free software.) They never used the GNU assembler or linker; they have their own APSL 2.0-licensed assembler and APSL 2.0-licensed linker; presumably if "half baked commercial versions of these tools" is referring to the assembler or linker, "commercial versions of these tools" means "...commercial assembler and linker" not "...commercial versions of the GNU assembler and linker".
Any time I parse something like "(Partnership).*(American|(Econom(y|ic)))", I immediately lump it into the right wing propaganda bin.
The Partnership for a New American Economy sounds more corporatist than hard-right; there's more than one right-wing propaganda bin, and they're not in, for example, the right-wing nativist bin.
(Oh, and given who heads up the list of co-chairs, the "chair" part is a bit amusing....)
From my own machines and comparing notes with some other people (all in all, about 3k servers) the bug seems to affect machines randomly. Known facts:
There's a kernel patch that fixes the supposed issue: https://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=6b43ae8a619d17c4935c3320d2ef9e92bdeed05d
I don't think that's the issue. The issue discussed in this lkml thread is a different issue with, presumably, a different John Stultz fix.
And there I was thinking that my 3.2.19 kernel was fairly up to date...
At this point, I'm not sure which Linux kernels, other than perhaps the one on John Stultz's machine, are sufficiently up to date.
the difference being this bug was patched already; it only affected systems that were not kept up to date.
A bug, perhaps. This bug, perhaps not.
The bug has already been fixed for months now
A bug might have been fixed for months now, but I don't think that's the bug here.
Possibly refers to some of the issues covered here: https://access.redhat.com/knowledge/articles/15145?amp
In particular, to this issue, which apparently first materialized with the recent leap second, and probably not to this issue, which might be the one fixed by this patch.
Perhaps the bug that was mentioned in the lkml thread that started with this message was introduced less than four years ago, so the code in question had never gotten exposed to a leap second except perhaps in testing (I don't know how hard it is to reproduce it; John Stultz wasn't initially able to reproduce it in his testing, but eventually succeeded).
The hard system lock bug due to a leap second was patched in 2.6.29, so either you've got some weird related bug, or something is very wrong.
Well, the weird related bug would arguably count as something being wrong. Apparently there is a bug in the handling of the insertion of positive leap seconds that could cause weird behavior with futexes, and that bug appears not to have been fixed until at least July 1, 2012 (I'm guessing John Stultz has worked up a patch).
Our problem was with a third party monitoring solution - its daemon process brought every single one of our servers to a near halt by consuming all available cpu cycles at the stroke of gmt midnight.
The OS itself was fine.
Well, if you're talking a Linux kernel, the part of the OS that dealt with leap seconds was not OK, and was "not OK" in a fashion that could cause processes using futexes to spin and consume all available CPU cycles when a leap second is introduced.
This monitoring software is common enough that it likely was behind a lot of the issues seen around the 'net.
...perhaps by virtue of either using futexes (in what I'm presuming is a legitimate fashion) or using something that uses futexes.
the patch was posted back in March.
https://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=6b43ae8a619d17c4935c3320d2ef9e92bdeed05d
Or not.
Actually, I'm not sure that's the case. John Stultz's mail from July 1, 2012 speaks of a bug where clock_was_set() wasn't called after the leap second was added, and of a patch he was working on, so the bug in question might not have been fixed in March.
As far as I can tell, all current operating systems handled it fine. It's applications that have problems, mainly server-type apps that actually use the clock for important things.
Linux being heavily affected is just a side-effect of most servers running Linux (although apparently some older versions don't handle leap seconds so cleanly - maybe that has something to do with it?).
Yes, at least one of the problems appears to be a Linux kernel problem. However, as that thread indicates, the consequence of this isn't a kernel crash; it causes futexes to repeatedly time out (or, at least, causes futexes with timeouts to repeatedly time out). I'm guessing, perhaps incorrectly, that this might mean that code waiting for a futex gets a kernel wakeup due to a timeout, checks whether the condition being waited for has happened, discovers that it hasn't, sleeps in the futex again, gets a kernel wakeup due to a timeout, checks whether the condition being waited for has happened, discovers that it hasn't, sleeps in the futex again, lathers, rinses, repeats, so it makes no progress and chews up tons of CPU.
If so, then:
so Linux being heavily affected might also be a side-effect of, well, some versions of the Linux kernel having a bug that's triggered by leap seconds.
However, unless an application happens to use futexes in a fashion that trips over the bug, it won't be affected. It might be server applications that are most likely to do so, meaning that you might not see it on, say, a desktop or handheld Linux machine, or even on some servers.
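To make the guess above concrete, here's a toy sketch of that wake-check-sleep loop. It is not futex or glibc code; `wait_for`, `healthy_timed_wait`, and `buggy_timed_wait` are names I made up for illustration, and the "buggy" wait just assumes the timeout fires immediately, which is only my reading of the symptom described in the thread.

```python
import time


def wait_for(condition, timed_wait, timeout, max_wakeups):
    """Loop the way a typical timed-wait caller does.

    Returns how many times the caller was woken before the condition
    became true (or before the safety cap max_wakeups was hit).
    """
    wakeups = 0
    while not condition() and wakeups < max_wakeups:
        timed_wait(timeout)  # may return only because the timeout expired
        wakeups += 1
    return wakeups


def healthy_timed_wait(timeout):
    # Stand-in for a timed wait that really blocks for the timeout.
    time.sleep(timeout)


def buggy_timed_wait(timeout):
    # Assumption: the timeout is treated as already expired, so the
    # "wait" returns at once and the caller goes straight back to checking.
    return


def count_wakeups(timed_wait, run_for=0.05, timeout=0.01, cap=1_000_000):
    """Count wakeups while waiting for a condition that turns true after run_for seconds."""
    deadline = time.monotonic() + run_for
    return wait_for(lambda: time.monotonic() >= deadline, timed_wait, timeout, cap)
```

With the healthy wait the loop wakes a handful of times and sleeps in between; with the buggy one it wakes thousands of times in the same interval and never sleeps, which would look like a process eating a whole CPU while making no progress.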
What we should have is what I've described above, time-zero and a counter. And translations from that to localized date time should be handled by a library.
Which, sadly, POSIX doesn't let you have as "UNIX time":
If there were a UN*X API to get a count of seconds since the Epoch (in addition to, or instead of, a call to get "seconds since the Epoch"), and a UN*X API to convert those to UTC and local time labels, that would get what you want. Modulo making it work with NTP, the former could be implemented with less difficulty than a call to get "seconds since the Epoch", and the latter is called "the Olson code complete with the leap seconds database".
However, that would then require some mechanism to allow code to schedule something to happen at a given UTC label; simply calculating the UNIX time for that UTC label, getting the current UNIX time, and scheduling it for then-now seconds in the future is insufficient, as the UNIX time for a given UTC label in the future might change if a leap second is scheduled between then and now. (Note that supporting scheduling something to happen at a given local civil time label would already require correction of that sort to handle DST rule changes.) This would also have to do something if you schedule an event for YYYY-DD-MM 23:59:59 and a negative leap second occurs so that there is no 23:59:59 on YYYY-DD-MM; "something" might be "let somebody know and ask them to correct it" or "do it at 00:00:00 on the next day", perhaps depending on the reason why it's scheduled.
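A small sketch of why "then-now" comes up short across a leap second, using the June 30, 2012 insertion. The helper names and the two-entry table are mine, purely for illustration; a real implementation would need the full leap seconds database, kept up to date.

```python
import calendar

# Unix times of the first second after each inserted leap second.
# Hypothetical mini-table: only the end-of-2008 and mid-2012 insertions.
LEAP_INSERTED_BEFORE = [
    calendar.timegm((2009, 1, 1, 0, 0, 0)),
    calendar.timegm((2012, 7, 1, 0, 0, 0)),
]


def unix_time(year, month, day, hour, minute, second):
    """POSIX-style "seconds since the Epoch" for a UTC label (no leap seconds)."""
    return calendar.timegm((year, month, day, hour, minute, second))


def elapsed_si_seconds(start, end):
    """Real elapsed seconds between two Unix times, given the table above."""
    extra = sum(1 for t in LEAP_INSERTED_BEFORE if start < t <= end)
    return (end - start) + extra


now = unix_time(2012, 6, 30, 23, 59, 50)
then = unix_time(2012, 7, 1, 0, 0, 10)
naive_delay = then - now                      # what "then-now" gives you
real_delay = elapsed_si_seconds(now, then)    # one second longer
```

Note also that the label 23:59:60 on June 30 and 00:00:00 on July 1 map to the same UNIX time here, which is the "seconds since the Epoch" vs. "count of seconds" distinction in a nutshell.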
As it turns out my biggest problem was customer-supplied software which uses their own java jre's. We install a jre by default and update it whenever possible, but some software (Adeptia, VLTrader, Alfresco) comes with their own ancient jre and scripts to call that over system-supplied java.
Not a single machine crashed (we are very explicitly in charge of what OS-version there's running) but a lot of java locked up and had to be restarted.
So are you saying that, in addition to the Linux kernel glitch in question (which appears to cause some userland processes to spin), there are purely-userland problems? Or, if you're running on a Linux that doesn't have John Stultz's fix, is it that some JREs are vulnerable to the Linux kernel glitch and others aren't?
My guess is that Windows simply ignored it, so there never was a 61st second in a minute.
Well, if Microsoft's documentation of the SYSTEMTIME structure reflects the implementation, GetSystemTime() , the claim in that man page^W^WMSDN page that "The system time is expressed in Coordinated Universal Time (UTC)" notwithstanding, cannot acknowledge the existence of a 61st second in a minute ("The second. The valid values for this member are 0 through 59.", as the SYSTEMTIME page says).
But, just as on UN*X, you have "counter" and "human-style label" times (time_t, struct timeval, struct timespec are examples of the former, and a struct tm as returned by, for example, gmtime() is an example of the latter, on UN*X), with the Windows versions of those being FILETIME and SYSTEMTIME respectively. That page on FILETIME says nothing about leap seconds - does it just keep counting over a positive leap second or does it stop or what? And, if it doesn't just keep counting over a positive leap second, does it just freeze for a whole second, or does it slow down over some period of time so that it eventually syncs up, or what?
As for NTP, Microsoft has a page on "How the Windows Time service treats a leap second", which says
(the author of which needs to be told what "inserted or deleted" implies - do they mean that, regardless of whether a leap second is inserted or deleted, the NTP client that is running Windows Time service is one second faster than the actual time?)
And then there's one more question: if there's anything in the NT kernel that deals with leap seconds, does any version have a glitch, as some versions of the Linux kernel do?
If not, then many of the other problems might not exist on Windows. This email from John Stultz, the author of the fix linked to in the previous paragraph, seems to indicate that at least some of the problems, if not all of them, stem from a kernel bug, so it might be that Java and company might be Just Fine on systems that don't have a kernel glitch of that sort (so they might work fine on at least some non-Linux systems, as well as on Linux systems with the bug fixed).
So far all I've heard about is affected Linux systems, did Windows and OS X do just fine?
The glitch mostly affected POSIX compliant operating systems as POSIX specifies a day as 86400 seconds.
So you're saying the glitch could affect OS X (or, at least, OS X Snow Leopard - although Leopard was also registered - but I'll bet Lion behaves, and Mountain Lion will behave, the same way)?
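For reference, the "a day is 86400 seconds" bit comes from the POSIX formula for "seconds since the Epoch", which can be sketched as below. The function names are mine; the arithmetic follows the formula in the POSIX base definitions as I remember it, with tm_year counted from 1900 and tm_yday counted from 0, and I've only written it for dates from 1970 on.

```python
import calendar
import time


def posix_seconds_since_epoch(tm_year, tm_yday, tm_hour, tm_min, tm_sec):
    """POSIX "seconds since the Epoch" for a UTC broken-down time.

    tm_year is years since 1900 and tm_yday is days since January 1,
    as in C's struct tm. Every day counts as exactly 86400 seconds,
    so an inserted leap second has no value of its own.
    """
    return (tm_sec + tm_min * 60 + tm_hour * 3600 + tm_yday * 86400
            + (tm_year - 70) * 31536000
            + ((tm_year - 69) // 4) * 86400
            - ((tm_year - 1) // 100) * 86400
            + ((tm_year + 299) // 400) * 86400)


def from_python_struct_time(tm):
    """Adapter: Python's struct_time uses full years and a 1-based tm_yday."""
    return posix_seconds_since_epoch(tm.tm_year - 1900, tm.tm_yday - 1,
                                     tm.tm_hour, tm.tm_min, tm.tm_sec)


# Round-trip a timestamp through gmtime() and the formula.
stamp = calendar.timegm((2012, 7, 1, 0, 0, 0))
roundtrip = from_python_struct_time(time.gmtime(stamp))
```

Any system that implements time_t this way has to do something special at 23:59:60, whether that's repeating a second, smearing, or ignoring it, so the definition applies to OS X as much as to Linux; whether a given kernel trips over it is a separate question.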
Not sure if this will convince you, but drawing down my feeds outside of a web browser helps my productivity--if I've got a browser open, then I'm viewing about 18 different tabs and refreshing feeds when I should be working on other things. Having Akregator running lets me read my rss feeds without the temptation of a browser when I have internet connectivity, yet still lets me read downloaded feeds without internet access.
What are you doing reading RSS feeds when you should be working on other things?
Given that a lot of the stuff I work on requires looking stuff up on the Intarwebs, I'm pretty much fucked there; there's always the temptation to browse, and sometimes I just need a break in the middle of hacking. So maybe that works for you, but it's 100% unconvincing, and 100% wrong, for me.
Of course, some people use more than that to avoid distractions, with some of them taking rather extreme steps.
I'm no power user, but neither am I a complete idiot. I really like KDE4. I hit the kickoff, then type in the name of the program and I can run it. I know how to get under the hood and clank around if I need or want to with linux/KDE4. With OSX, everything is very kindergarten-simple...as long as you work their way. If you want to work your way and not theirs, it's up to you to change.
So if I want to view an RSS feed in my browser in KDE - i.e., work my way and not theirs - is it up to me to change, or does Konqueror now support reading feeds itself? :-)
I'm not a command-line commando, either. I'm smart enough to know how to use an application and just finicky enough to want to use it my way. Maybe I've been lucky, but KDE 4 has worked for me from 4.0 on.
KDE 1 worked fine for me, as my primary desktop environment, atop FreeBSD 3.0 many many years ago, and KDE3+FreeBSD 6-or-so was OK when my Mac notebook was getting its disk recovered, but that was when I decided that separate RSS reader applications were not the answer for me.
(Speaking of command lines, hopefully most modern UN*X+X11 combinations, whether Linux distributions or PC-BSD-style desktop *BSDs or..., have XSel as a standard package or even pre-installed, so you can do the same thing there that you can do on OS X with pbcopy and pbpaste.)
And others don't. Opinions differ on merits of different desktops; story at 11. "Desktop A rules, desktop B sucks" is, absent data from a broad population of users, a personal opinion, not a statement of fact
Yes, but what IS a fact is that some desktops allow users to configure their desktop the way they like it, with focus-follows-mouse, click-to-focus, and other properties. The problem is that most desktops do not; the designers think they know what's best for everyone, and refuse to allow any configuration at all. If all or most desktops allowed users to set these things, you wouldn't see all this complaining.
Yes, but "more configurable" and "less configurable" aren't ipso facto statements of objective merit.
"More configurable" is an advantage to the people who don't like the default configuration, and may be completely irrelevant to those who do.
As for "less configurable", at least when it comes to click-to-focus vs. focus-follows-mouse, some questions are:
so I'm curious how Windows or OS X apps handle that case if focus-follows-pointer is turned on - autoraise might make that inadvertent focus change more obvious, but not everybody wants autoraise.
I'm a click-to-focus user myself, these days (I went that way when I had a Windows machine on my desk, even if most windows ended up being terminal emulators sshed into a UN*X box, as I figured if I got used to it I'd have fewer problems switching between different desktop environments, given that I could always turn it on for UN*X+X11), so I personally am fine with OS X in that department; people who like OS X and, presumably, don't mind click-to-focus shouldn't confuse that with "OS X IS THE BESTEST UNIX DESKTOP EVAR FOR EVERYBODY!!!!!!11111ONE!!!!!!!".
Somebody who loves focus-follows-mouse may want to ram his or her fist through the screen when using OS X, and that's a perfectly legitimate response for them - as long as they don't confuse it with "OS X IS TEH SUXXXOR FOR ALL UNIX USERS!!!!!111ONE!!!!!!".
Meeting increasing challenges of hardware, web standards, etc. is necessary (maybe,) but the thing that XP-7-8 has taught me is that needless complications are needless. Maybe it's time the open source community starts asking *why* a particular change is desirable or necessary to the userbase.
What Peter Penz said in TFBP was
At least for the example in the second paragraph, it's necessary to those members of the user base who want to be able to see Web sites that use Shiny Modern Web Features. If you want to have the developers of the Web standards or technologies that include those features, or the Web developers who use those features on those sites, ask themselves why that stuff is desirable or necessary to the user base, that might be a good idea, but the developers of the free-software Web browsers are probably somewhat stuck here, unless they want to limit themselves to a user base that doesn't use any sites that require the Shiny New Features.
You like the OSX desktop? I hate it. It is like it was designed for children and gets in the way too often. I want focus follows mouse, I want to get rid of the idiot dock bar thing, I want menus on every screen not just the main monitor.
And others don't. Opinions differ on merits of different desktops; story at 11. "Desktop A rules, desktop B sucks" is, absent data from a broad population of users, a personal opinion, not a statement of fact (regardless of whether desktop A is the OS X desktop or $OTHER_UN*X_DESKTOP and whether desktop B is $OTHER_UN*X_DESKTOP or the OS X desktop); to make it a statement of fact, prepend "for me" and append "your mileage may vary" (and, yes, this applies to you and the person to whom you're replying).
(But it sounds as if Apple may be killing one thing I really liked about Safari relative to, for example, Konqueror - Safari, at least, had an RSS feed reader built in, so I didn't have to fuck around with Akregator. Note: if you want to defend the separation of RSS feed reading from Web browsing, please explain to me - in a fashion convincing to me; convincing to you, by itself, doesn't even come close to sufficing - why I would not want to read a feed of Web pages in a Web browser. But I digress....)
On top of it, SHIP WITH THE FUCKING GNUTOOLS YOU MORONS. The half baked commercial versions of these tools lack way too many features.
To which GNU tools are you referring? Developer tools? They used to ship GCC, but when it went to GPLv3 they decided to put their efforts behind Clang and LLVM instead. I don't know whether the current version of GDB is GPLv3, but they're putting their effort behind LLDB. (They may be "commercial" in the sense of being supported by a vendor, but they're free software.) They never used the GNU assembler or linker; they have their own APSL 2.0-licensed assembler and APSL 2.0-licensed linker; presumably if "half baked commercial versions of these tools" is referring to the assembler or linker, "commercial versions of these tools" means "...commercial assembler and linker" not "...commercial versions of the GNU assembler and linker".
Any time I parse something like "(Partnership).*(American|(Econom(y|ic)))", I immediately lump it into the right wing propaganda bin.
The Partnership for a New American Economy sounds more corporatist than hard-right; there's more than one right-wing propaganda bin, and they're not in, for example, the right-wing nativist bin.
(Oh, and given who heads up the list of co-chairs, the "chair" part is a bit amusing....)