June 30th Leap Second Could Trigger Unexpected Issues

Posted by Soulskill on Friday June 19, 2015 @05:16AM from the quick,-everybody-expect-them-instead dept.

dkatana writes: On January 31, 2013, approximately 400 milliseconds before the official release of the EIA Natural Gas Report, trading activity exploded in Natural Gas Futures. It is believed that this was the result of some fast computer trading systems being programmed to act, and having one-second advance access to the report. On June 30th a leap second will be added to Coordinated Universal Time (UTC), the time scale distributed by the Network Time Protocol (NTP), to keep it synchronized with the slowly lengthening solar day. In this article, Charles Babcock gives a detailed account of the issues, and some disturbing possibilities: The last time a second needed to be added to the day was on June 30, 2012. For Qantas Airlines in Australia, it was a memorable event. Its systems, including flight reservations, went down for two hours as internal system clocks fell out of synch with external clocks.

The original author of the NTP protocol, Prof. David Mills at the University of Delaware, set a direct and simple way to add the second: Count the last second of June 30 twice, using a special notation on the second count for the record. Google will use a different approach: Over a 20-hour period on June 30, Google will add a couple of milliseconds to each of its NTP servers' updates. By the end of the day, a full second has been added. As the NTP protocol and Google timekeepers enter the first second of July, their methods may differ, but they both agree on the time.

But that could also be problematic. In adding a second to its NTP servers in 2005, Google ran into timekeeping problems on some of its widely distributed systems. The Mills sleight-of-hand was confusing to some of its clusters, as they fell out of synch with NTP time. Does Google's smear approach make more sense to you, or does Mills's idea of counting the last second twice work better? Do you have a better idea of how to handle this?
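
To make the smear idea in the summary concrete, here is a minimal Python sketch. It assumes a simple linear ramp over the 20-hour window mentioned above; the function name and constants are hypothetical, and Google's real implementation may shape the ramp differently.

```python
# Sketch of a leap-second "smear": spread one extra second over a window
# instead of repeating a second. A linear ramp is an assumption made here.

SMEAR_WINDOW_S = 20 * 3600  # 20-hour window, as described in the summary
LEAP_S = 1.0                # the one second that has to be absorbed


def smear_offset(elapsed_s: float) -> float:
    """Return how much of the leap second has been absorbed so far.

    elapsed_s is the number of seconds since the smear window opened.
    The result is clamped to the range 0.0 .. LEAP_S.
    """
    fraction = min(max(elapsed_s / SMEAR_WINDOW_S, 0.0), 1.0)
    return LEAP_S * fraction


if __name__ == "__main__":
    for hours in (0, 5, 10, 15, 20):
        absorbed_ms = smear_offset(hours * 3600) * 1000
        print(f"{hours:2d} h into the window: {absorbed_ms:.0f} ms absorbed")
```

With a ramp like this no client ever sees a repeated or 61st second, but during the window the smeared clock deliberately disagrees with true UTC by up to a second.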

Doesn't matter by StormShaman · 2015-06-19 05:21 · Score: 5, Informative

The only problem mentioned is that they fall out of sync with each other. If they're both otherwise fine, just pick one. Sounds like the disadvantages of either one aren't as big as the disadvantage of them not working well together.
Google is right by phishybongwaters · 2015-06-19 05:25 · Score: 5, Interesting

Typically when dealing with NTP you do not want big swings. In fact, a system using NTP that's too far out of sync won't sync back up correctly. One that is slightly out of sync will slowly come back into sync over a period of time, hours or even days. Both approaches could work, they really could, but I think adding a few milliseconds here and there is the better way to get this done, as long as the systems don't fall too far behind.
I work with Avaya voice equipment and we've been warning people about this for months and months. We've provided instructions on several methods to ensure this doesn't cripple your system, but it all depends on how your NTP is set up. I also foresee issues with just adding an extra second to the day: this is not going to work for a bunch of systems and will actually throw them out of sync, compared to Google's approach.
One of the solutions we've "provided" is to disable NTP shortly before the time rolls over, then enable it once it's July. That's a pain in the butt, but if you can afford the few minutes of service interruption, it solves all of the issues right there: you turn it off when it's synced, turn it back on, and it syncs to the new time.
The real issues come in, for my field at least, with logging. This is going to throw a wrench into syslogs if it's not taken care of, and with some of the platforms, it will literally cripple the system.
choose what standard to violate by at10u8 · 2015-06-19 05:27 · Score: 4, Informative

A problem for sysadmins is that the status quo of the standards requires that we choose which standard we want to violate. We can violate the specification of UTC by not counting 23:59:60, or we can violate POSIX by counting it, or we can violate both POSIX and the SI second by not actually keeping the system clock on UTC and instead using smeared seconds that are not suitable for tracking projectiles and other real-time applications. This problem is old, 50 years old, as seen in the 3 plots on this web page.
Dice: Please restore the Read More link. Thanks. by Anonymous Coward · 2015-06-19 05:29 · Score: 5, Insightful

I understand the desire to change things, but putting some social media Share link in place of the Read More link goes against the kind of website Slashdot is.
Please restore the original layout. Thanks.
Re:Sync by 0123456 · 2015-06-19 05:37 · Score: 4, Informative

I find it strange that a possible 1 second difference could cause so many issues.
It's not the time difference that causes problems per se, it's time going backwards. You presumably missed the fact that many Java servers crashed over the last leap second because of a kernel bug that screwed up their internal timers?
We had problems last time due to faults reported by external hardware when it saw the time jump backwards. I'll be at my desk when it happens this time to deal with any problems that come up.
And, given the chaos every leap second causes, hopefully we can finally convince the 'experts' to stop fiddling with time.
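
The "time going backwards" failure mode is usually avoided in application code by measuring intervals with a monotonic clock rather than the wall clock. A minimal Python sketch (the `timed` helper is a made-up name for illustration, not something from the kernel bug discussed above):

```python
import time


def timed(fn):
    """Run fn() and return (result, elapsed_seconds).

    time.monotonic() cannot go backwards and is not affected when the
    system clock is stepped, so the elapsed value is never negative.
    time.time() gives no such guarantee.
    """
    start = time.monotonic()
    result = fn()
    elapsed = time.monotonic() - start
    return result, elapsed


if __name__ == "__main__":
    result, elapsed = timed(lambda: sum(range(1000)))
    print(f"result={result}, took {elapsed:.6f} s")
```

This only protects interval measurements inside one process; it does nothing for timestamps exchanged with external hardware, which is the case described in the comment.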
Re:Dice: Please restore the Read More link. Thanks by enigma32 · 2015-06-19 05:37 · Score: 4, Informative

+1 - Mod parent up.
just a second by frovingslosh · 2015-06-19 05:50 · Score: 5, Funny

At least it is just a second. That sudden extra hour of daylight in the spring is really bad for my rose bushes.

--
I'm an American. I love this country and the freedoms that we used to have.
Re:Dice: Please restore the Read More link. Thanks by Art3x · 2015-06-19 05:57 · Score: 5, Insightful

I understand the desire to change things, but putting some social media Share link in place of the Read More link goes against the kind of website Slashdot is.
Please restore the original layout. Thanks.
+1 - Mod parent up.
+2. In a Slashdot comment, we must add links and formatting by typing HTML by hand. You would therefore think we know how to copy and paste a web address from Slashdot to Facebook, if that's what we really want to do. We don't need an icon to do it for us.
If you're going to add icons, switch the places for Share and Comments. Put the Share link to the right of the heading. Put the Comments link at the bottom. To me it seems more logical that way, it puts the Comments link back where it was.
Re:Wrong solution, wrong problem by mcelrath · 2015-06-19 06:03 · Score: 4, Informative

Also this is an awesome graph, and illustrates that the Earth is a horrible clock: https://upload.wikimedia.org/w...

--
1^2=1; (-1)^2=1; 1^2=(-1)^2; 1=-1; 1=0.
Re:Buggy software is buggy by petermgreen · 2015-06-19 06:06 · Score: 4, Interesting

Leap years and leap seconds are handled very differently.
The rules for leap years are according to a formula that has been fixed for hundreds of years. Computers typically handle them as part of their conversion from internal "time elapsed since epoch" data formats to "human" date formats and otherwise don't care much about them. Even the simplified formula of "leap year every 4 years"
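
For comparison, the fixed leap-year rule fits in a single expression. A minimal Python sketch of the Gregorian rule (the function name is only for illustration):

```python
def is_leap_year(year: int) -> bool:
    """Gregorian rule: every 4th year is a leap year, except century
    years, which are leap years only when divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


if __name__ == "__main__":
    for year in (1900, 2000, 2012, 2015):
        print(year, is_leap_year(year))
```

Nothing in this function depends on data published after the code was written, which is exactly what cannot be said for leap seconds.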
Leap seconds OTOH cannot be predicted in advance, so you cannot reliably convert "time elapsed since epoch including leap seconds" to "time elapsed since epoch excluding leap seconds" or "human datetime" for future datetimes, and to do it for past datetimes requires an up-to-date list of leap seconds.
Then there is the problem that "time elapsed since epoch excluding leap seconds", which is a common way to represent time (presumably due to the difficulty in converting "time elapsed since the epoch including leap seconds" to "human datetime"), simply cannot correctly represent the times around a leap second.
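
The representation problem can be seen directly from Python's standard library. This is only an illustration of the point, using `calendar.timegm` (which does plain arithmetic on the fields) and `datetime` (which validates them):

```python
import calendar
import datetime

# 23:59:60 UTC on 2015-06-30 is the leap second itself;
# 00:00:00 UTC on 2015-07-01 is the second that follows it.
leap = calendar.timegm((2015, 6, 30, 23, 59, 60, 0, 0, 0))
after = calendar.timegm((2015, 7, 1, 0, 0, 0, 0, 0, 0))

# Two different UTC instants end up with the same "seconds since epoch".
same_timestamp = (leap == after)

# datetime has no way to hold second 60 at all.
try:
    datetime.datetime(2015, 6, 30, 23, 59, 60)
    datetime_accepts_60 = True
except ValueError:
    datetime_accepts_60 = False

if __name__ == "__main__":
    print("same timestamp:", same_timestamp)
    print("datetime accepts :60:", datetime_accepts_60)
```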
The test case is also anything but simple. To test the code you have to inject fake leap seconds, but for a correct test leap seconds can only be injected at specific times (NTP, for example, increases its update rate around possible leap seconds), so either you can only run the test at specific times or your entire test environment needs to run on "fake time". This is a big problem if your tests need to interact with a system outside the test environment in a way that depends on time within the test environment being in sync with time outside the test environment.

--
note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
Re:Dice: Please restore the Read More link. Thanks by GoodNewsJimDotCom · 2015-06-19 06:20 · Score: 4, Informative

I thought Slashdot was dead. I thought they killed the comments until someone told me where to look.

--
God spoke to me
Re:Buggy software is buggy by 93+Escort+Wagon · 2015-06-19 06:22 · Score: 4, Insightful

The ITU-R has outlined 4 methods for the future of UTC. Methods A1, A2, B, C1, C2, and D are from various delegations of the international assembly, and they are in serious disagreement with each other.
That's silly. There's no reason for it. Let's just sit down and come up with a new standardized method that covers all of these use cases.

--
#DeleteChrome
Re:Dice: Please restore the Read More link. Thanks by war4peace · 2015-06-19 06:22 · Score: 5, Insightful

The way they changed the design is clickbait of sorts.
People trained their muscle memory to click that area to load more of the story or comments. Now they click and yell in frustration.
That's a really shitty way of luring people. Shame on you, Dice!

--
...gis sdrawkcab (usually not responding to ACs; don't bother posting as AC)
Re:Buggy software is buggy by RavenLrD20k · 2015-06-19 06:57 · Score: 4, Funny

The ITU-R has outlined 5 methods for the future of UTC [acma.gov.au]. Methods A1, A2, B, C1, C2, D, and E are from various delegations of the international assembly, and they are in serious disagreement with each other.
Re:Sync by ceoyoyo · 2015-06-19 07:36 · Score: 4, Informative

I'm not sure exactly what arguments each Linux distribution uses, but this is from the man page on ntpd:

-x
Normally, the time is slewed if the offset is less than the step threshold, which is 128 ms by default, and stepped if above the threshold. This option forces the time to be slewed in all cases. If the step threshold is set to zero, all offsets are stepped, regardless of value and regardless of the -x option. In general, this is not a good idea, as it bypasses the clock state machine which is designed to cope with large time and frequency errors. Note: Since the slew rate is limited to 0.5 ms/s, each second of adjustment requires an amortization interval of 2000 s. Thus, an adjustment of many seconds can take hours or days to amortize. This option can be used with the -q option.
My reading of that is that the normal adjustment uses slew. Step is used only when there's a big discrepancy, and you can use -x to use slew even in that case.
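
The arithmetic in that man page excerpt is easy to check. The Python sketch below uses only the two numbers quoted above (128 ms step threshold, 0.5 ms/s slew limit); the function names are made up for illustration and are not part of ntpd.

```python
STEP_THRESHOLD_S = 0.128  # default step threshold from the man page text
MAX_SLEW_PPM = 500        # 0.5 ms per second == 500 parts per million


def default_action(offset_s: float) -> str:
    """Slew below the step threshold, step otherwise (behaviour without -x)."""
    return "slew" if abs(offset_s) < STEP_THRESHOLD_S else "step"


def amortization_seconds(offset_s: float) -> float:
    """Time needed to slew away offset_s at the maximum slew rate."""
    return abs(offset_s) * 1_000_000 / MAX_SLEW_PPM


if __name__ == "__main__":
    for offset in (0.05, 1.0, 10.0):
        minutes = amortization_seconds(offset) / 60
        print(f"offset {offset:5.2f} s: default={default_action(offset)}, "
              f"slewing it away takes about {minutes:.1f} min")
```

By this calculation a one-second leap handled purely by slewing takes about 2000 s, a little over half an hour, during which the clock is knowingly off.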
The problem, and the IMHO correct solution. by arcade · 2015-06-19 07:55 · Score: 4, Interesting

First off, the problem with leap seconds and unix is that unix time isn't UTC. Unix time is defined as seconds since epoch, ignoring leap seconds. Unix time is 'lossy' in that the moment a leap second occurs can't be differentiated from the second before it. More information about that here: https://en.wikipedia.org/wiki/...
The problem is that POSIX.1 is plain stupid when it comes to leap seconds.
The correct solution to this problem would be as follows:
1. Fix POSIX.1 to define unix time as TAI.
2. Implement conversion routines in gettimeofday and other relevant functions.
3. Use a handy store for leapseconds.
Now, number 3 here is a bit tricky. Purists would probably want this in the TZ database or somesuch. This is well and good, but has the problem that the TZ files need to be packaged and updated on all the servers. If I remember correctly (please correct me if I'm wrong) Java is shipped with its own TZ files, and might also need them updated separately. Due to this, I think the most maintainable and portable way to do this across unixes would be to simply have an /etc/leapseconds file which lists the leapseconds since epoch. It does, however, depend on unix time being defined as TAI first.
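
A rough Python sketch of what such a leap second store and lookup could look like. The file format here is purely hypothetical (a UTC date on which a new TAI-UTC offset takes effect, followed by the offset in seconds), and only the three most recent entries are shown; a real table would go back to 1972.

```python
import bisect
import calendar

# Hypothetical contents of an /etc/leapseconds-style file.
# Deliberately truncated: only the latest three entries are listed.
SAMPLE = """\
# effective-date  TAI-UTC
2009-01-01 34
2012-07-01 35
2015-07-01 36
"""


def parse_table(text):
    """Return a sorted list of (unix_time_effective, tai_minus_utc)."""
    table = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        date, offset = line.split()
        year, month, day = (int(part) for part in date.split("-"))
        effective = calendar.timegm((year, month, day, 0, 0, 0, 0, 0, 0))
        table.append((effective, int(offset)))
    return sorted(table)


def tai_minus_utc(table, unix_ts):
    """Look up the TAI-UTC offset in force at the given Unix timestamp."""
    starts = [start for start, _ in table]
    index = bisect.bisect_right(starts, unix_ts) - 1
    if index < 0:
        raise ValueError("timestamp predates the table")
    return table[index][1]


table = parse_table(SAMPLE)

if __name__ == "__main__":
    just_before = calendar.timegm((2015, 6, 30, 23, 59, 59, 0, 0, 0))
    print(tai_minus_utc(table, just_before), tai_minus_utc(table, just_before + 1))
```

The lookup itself is trivial; the hard part, as the comment says, is getting an up-to-date copy of the table onto every machine.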

--
"Rune Kristian Viken" - http://www.nwo.no - arca
1. Re:The problem, and the IMHO correct solution. by at10u8 · 2015-06-19 09:11 · Score: 4, Informative
  
  Please look at this tzdist internet draft which is close to becoming an RFC. The tzdist protocol can communicate the list of leap seconds along with the list of time zones.
Re:Dice: Please restore the Read More link. Thanks by weilawei · 2015-06-19 08:49 · Score: 5, Interesting

I'm willing to accept that layouts change and I'll need to look in a new place--but the new location is actually terrible usability. Here's why:
First, I read the headline. Then, I read the summary. I'm moving down the page, and I'm scrolling the page, too. So, now I'm at the end of the summary, and the headline for any story with a long summary is now out of the window. Now, I need to scroll back up to see how many comments or to click to view those comments. Extra work, even if the summary isn't long.
Fitts' Law applies here. They've made the target smaller in diameter, and effectively placed it further away. That means clicking to view comments is noticeably harder.
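
As a rough illustration of the Fitts' Law argument, here is a Python sketch of the index of difficulty. It assumes the common Shannon formulation, ID = log2(D/W + 1), and the pixel values are made up; neither is taken from the actual Slashdot layout.

```python
import math


def index_of_difficulty(distance: float, width: float) -> float:
    """Fitts' index of difficulty in bits (Shannon formulation).

    distance: how far the pointer has to travel to reach the target
    width:    size of the target along the direction of travel
    """
    return math.log2(distance / width + 1)


if __name__ == "__main__":
    # Hypothetical numbers: a wide link close by vs. a small icon far away.
    near_and_wide = index_of_difficulty(distance=300, width=100)
    far_and_small = index_of_difficulty(distance=700, width=100 / 3)
    print(f"near, wide target: {near_and_wide:.2f} bits")
    print(f"far, small target: {far_and_small:.2f} bits")
```

Predicted movement time grows linearly with this index, so both shrinking the target and moving it further away push the cost up.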