Slashdot Mirror


June 30th Leap Second Could Trigger Unexpected Issues

dkatana writes: On January 31, 2013, approximately 400 milliseconds before the official release of the EIA Natural Gas Report, trading activity exploded in Natural Gas Futures. It is believed that was the result of some fast computer trading systems being programmed to act, and have a one-second advance access to the report. On June 30th a leap second will be added to the Network Time Protocol (NTP) to keep it synchronized with the slowly lengthening solar day. In this article, Charles Babcock gives a detailed account of the issues, and some disturbing possibilities: The last time a second needed to be added to the day was on June 30, 2012. For Qantas Airlines in Australia, it was a memorable event. Its systems, including flight reservations, went down for two hours as internal system clocks fell out of synch with external clocks.

The original author of the NTP protocol, Prof. David Mills at the University of Delaware, set a direct and simple way to add the second: Count the last second of June 30 twice, using a special notation on the second count for the record. Google will use a different approach: Over a 20-hour period on June 30, Google will add a couple of milliseconds to each of its NTP servers' updates. By the end of the day, a full second has been added. As the NTP protocol and Google timekeepers enter the first second of July, their methods may differ, but they both agree on the time.

But that could also be problematic. In adding a second to its NTP servers in 2005, Google ran into timekeeping problems on some of its widely distributed systems. The Mills sleight-of-hand was confusing to some of its clusters, as they fell out of synch with NTP time. Does Google's smear approach make more sense to you, or does Mills's idea of counting the last second twice work better? Do you have a better idea of how to handle this?

5 of 233 comments (clear)

  1. Google is right by phishybongwaters · · Score: 5, Interesting

    Typically when dealing with NTP you do not want big swings. In fact, a system using NTP that's too far out of sync, won't sync back up correctly. One that is slightly out of sync will slowly come back in sync over a period of time, hours or days even. Both approaches could work, they really could, but I think adding a few milliseconds here and there is a better way to get this done as long as the systems don't fall too far behind. I work with Avaya voice equipment and we've been warning people about this for months and months. We've provided instructions on several methods to ensure this doesn't cripple your system, but it all depends on how your NTP is setup. I also foresee issues with just adding an extra second to the day, this is not going to work for a bunch of systems and will actually throw them out of sync compared to googles approach. One of the solutions we've "provided" is to disable NTP shortly before the time roll over, then enable it once it's July. That's a pain in the butt, but if you can afford the few minutes of service interruption, it solves all of the issues right there, you turn it off when it's synced, turn it back on and it syncs to the new time. The real issues come in, for my field at least, with logging, this is going to throw a wrench into sys logs if it's not taken care of, and with some of the platforms, it will literally cripple the system.

  2. Re:Buggy software is buggy by petermgreen · · Score: 4, Interesting

    Leap years and leap seconds are handled very differently.

    The rules for leap years are according to a forumula that has been fixed for hundreds of years. Computers typically handle them as part of their conversion from internal "time elapsed since epoch" data formats to "human" date formats and otherwise don't care much about them. Even the simplified formula of "leap year every 4 years"

    Leap seconds OTOH cannot be predicted in advance so you cannot realiablly convert "time elapsed since epoch including leap seconds" to "time elapsed since epoch excluding leap seconds" or "human datetime" for future datetimes and to do it for past datetimes requires an up to date list of leap seconds.

    Then there is the problem that "time elapsed since epoch excluding leap seconds" which is a common way to represent time (presumablly due to the difficulty in converting "time elapsed since the epoch including leap seconds" to "human datetime" simply cannot correctly represent the times arround a leap second.

    The testcase is also anything but simple, to test the code you have to inject fake leap seconds, but for a correct test leap seconds can only be injected at specific times (NTP for example increases it's update rate around possible leap seconds) so either you can only run the test at specific times or your entire test environment needs to run on "fake time". This is a big problem if your tests need to interact with a system outside the test environment in a way that depends on time within the test environment being in sync with time outside the test environment.

    --
    note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
  3. Re:Doesn't matter, so why do it? by AthanasiusKircher · · Score: 2, Interesting

    Why do we even bother with this? Why can't we just let noon move a second. Even after a hundred years it won't make any difference. Time zones on average vary in the suns position by a whole hour so a 1 sec variation of the solar zenith makes no difference. Anstronomers will still be able to find there stars.

    Agreed. This is all nonsense. Even NIST admits that it's basically for legacy astronomical equipment. But any astronomer who needs real precision needs to deal with fractional-second corrections all the time now anyway, and there are published tables that allow one to do this. (For the current correction to convert from UTC to UT1, see here, which gives values accurate to +/-5 milliseconds.)

    If we ever got maybe a minute or more off, I could possibly see the reason for a correction. But a second? Who cares? As I said, the very small number of people who actually need to use UT1 mostly do fractional-second conversions all the time anyway, as leap seconds aren't precise enough to keep up with the continuous variation.

  4. The problem, and the IMHO correct solution. by arcade · · Score: 4, Interesting

    First off, the problem with leap seconds and unix is that unix time isn't UTC. Unix time is defined as seconds since epoch, ignoring leapseconds. Unix time is 'lossy' in that a the moment a leapsecond occurs can't be differentiated from the second before it. More information about that here: https://en.wikipedia.org/wiki/...

    The problem is that POSIX.1 is plain stupid when it comes to leapsecond.

    The correct solution to this problem would be as follows:
    1. Fix POSIX.1 to define unix time as TAI.
    2. Implement conversion routines i gettimeofday and other relevant functions.
    3. Use a handy store for leapseconds.

    Now, number 3 here is a bit tricky. Purists would probably want this in the TZ database or somesuch. This is well and good, but has the problem that the TZ files need to be packaged and updated on all the servers. If I remember correctly (please correct me if I'm wrong) Java is shipped with its own TZ files, and might also need them updated separately. Due to this, I think the most maintainable and portable way to do this across unixes would be to simply have an /etc/leapseconds file which lists the leapseconds since epoch. It does, however, depend on unix time being defined as TAI first.

    --
    "Rune Kristian Viken" - http://www.nwo.no - arca
  5. Re:Dice: Please restore the Read More link. Thanks by weilawei · · Score: 5, Interesting

    I'm willing to accept that layouts change and I'll need to look in a new place--but the new location is actually terrible usability. Here's why:

    First, I read the headline. Then, I read the summary. I'm moving down the page, and I'm scrolling the page, too. So, now I'm at the end of the summary, and the headline for any story with a long summary is now out of the window. Now, I need to scroll back up to see how many comments or to click to view those comments. Extra work, even if the summary isn't long.

    Fitts' Law applies here. They've made the target smaller in diameter, and placed it further away effectively. That means the difficulty of clicking to view comments is noticeably harder.