NTP's Fate Hinges On "Father Time"
Esther Schindler writes: In April, one of the open source code movement's first and biggest success stories, the Network Time Protocol, will reach a decision point, writes Charlie Babcock. At 30 years old, will NTP continue as the preeminent time synchronization system for Macs, Windows, and Linux computers and most servers on networks? Or will this protocol go into a decline marked by drastically slowed development, fewer bug fixes, and greater security risks for the computers that use it? The question hinges to a surprising degree on the personal finances of a 59-year-old technologist in Talent, Ore., named Harlan Stenn.
For the last three-and-a-half years, Stenn said he's worked 100-plus hours a week answering emails, accepting patches, rewriting patches to work across multiple operating systems, piecing together new releases, and administering the NTP mailing list.
100 hours a week to maintain NTP? How much of that comes down to answering emails that maybe don't need to be answered? I have a ton of respect for Mr. Stenn, whose name, incidentally, I don't think I'd ever heard until today despite having two systems in pool.ntp.org for years and using NTP in or on nearly every device I've owned for two decades. But there's simply no way there's enough going on in NTP to generate 100 hours of work per week. I see the changelog is active and fresh, but I still can't imagine those releases driving a "crunch time" developer schedule, week in and week out for 3+ years.
I hope he backs away from the email before tossing in his hat entirely. Engaging random strangers in thoughtful discussion is probably consuming a lot more of his time than it needs to, and maybe more than he's noticing.
If it is not broke, fix it until it is.
Is this what keeps projects alive?
I'm a good cook. I'm a fantastic eater. - Steven Brust
NTP doesn't just 'return a string of numbers'.
But either way, the article isn't about NTP the protocol, it's about one shitty implementation of NTP that I don't think anyone even uses anymore. Windows and OS X certainly don't.
The summary and headline are equivalent to saying 'Netscape is going out of business, HTTP in danger of disappearing'.
If he were to drop dead right this instant ... no one that matters would notice beyond his family.
Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
The reference implementation, which is the subject of the article, is what's used by pretty much everyone. Name a significant OS/distribution which doesn't use ntpd.
Oh, and that includes OS X, contrary to your incorrect claim:
--OS X Yosemite
You are correct about one thing - it is a shitty implementation. It doesn't even follow RFC 5905, which it's supposed to be the reference for.
"National Security is the chief cause of national insecurity." - Celine's First Law
And at the Google Summer of Code summits I've seen the NTP devs (I can't say if it was him or not) stand up in front of a couple hundred FOSS project leaders begging for help with their bus factor problem.
This sort of infrastructure code is at once niche, genuinely hard work, demanding the highest quality programming skill, unrewarded, non-sexy, and vitally important to the world.
So he patched for and worked with Apple, and they said "we'll see"?
If his time isn't valuable to him, why should it be to Apple? Next time they call he should tell them he can't talk to them unless they fax back a block time purchase agreement with either a check or credit card and a statement they won't charge back.
I've been on the "NTP Hackers" mailing list for ~15 years now. My last major effort was to develop a server-optimized, multi-threaded version of the core ntpd software. I was hoping for wire-speed packet processing on an embedded Linux platform, but had to settle for 3-500 Mbit/s because the target kernel version did not support multi-threaded targeting of incoming packets. That meant I needed a single receive thread to fetch the incoming packets, timestamp them and queue them up; all the other threads/cores would then grab them from that queue.
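For anyone trying to picture that layout, here is a minimal Python sketch of the single-receiver, many-workers pattern. It is an illustration only, not the ntpd code: the function name `run_pipeline`, the `packets` iterable standing in for a socket, and the `STOP` sentinel are all assumptions made for the example.

```python
import queue
import threading
import time


def run_pipeline(packets, n_workers=4):
    """One receiver timestamps and queues packets; workers drain the queue.

    `packets` is a stand-in for reads from a socket (hypothetical input).
    Returns a list of (packet, receive_timestamp) pairs in completion order.
    """
    work = queue.Queue()
    results = []
    results_lock = threading.Lock()
    STOP = object()  # sentinel telling a worker to exit

    def receiver():
        # The only thread that touches the "wire": timestamp as early as possible.
        for pkt in packets:
            work.put((time.monotonic(), pkt))
        # One sentinel per worker so every worker shuts down.
        for _ in range(n_workers):
            work.put(STOP)

    def worker():
        while True:
            item = work.get()
            if item is STOP:
                break
            rx_time, pkt = item
            # Real code would build and send a reply here.
            with results_lock:
                results.append((pkt, rx_time))

    threads = [threading.Thread(target=receiver)]
    threads += [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Calling `run_pipeline(range(100))` handles each packet exactly once. The single queue is the obvious bottleneck in this design, which fits with ending up below wire speed.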
Back to the "why are there bugs in such a trivial protocol?" question:
By far the biggest cause of required effort when trying to modify or optimize the NTPD distribution is the need to support a large number of OSes and an even larger number of OS versions, some of them more than 20 years old, even if the main targets are Unix-like or Windows.
The second problem is the need to support 30+ reference clocks, with all sorts of OS/version specific interfaces needed in order to timestamp events as accurately as possible.
The third and final major stumbling block is all the crypto stuff, which got added in order to be able to authenticate both time packets and monitoring/configuration requests, and this is where the latest major bugs have been found.
PHK (who is working on Ntimed) has spent a lot of time on NTP, including his time as a core FreeBSD hacker, when he made sure that FreeBSD had the best possible timekeeping kernel. This is the reason that my personal pool server has always used FreeBSD.
If your only need is to get 0.1s level time sync on a number of client only machines, then it really doesn't matter how you implement the NTP protocol, except that you should really try to measure and adjust the local clock frequency so as to track the reference time!
The default Windows time code implements Simple NTP (SNTP), which uses the NTP packet format but doesn't try to implement the proper control loop to steer the local clock. Instead it just yanks the OS clock, resulting in a sawtooth-like pattern of clock offsets.
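To make the sawtooth concrete, here is a rough Python simulation comparing "just step the clock" with "step and also correct the frequency". It is a noise-free toy model, not the actual ntpd or Windows algorithm: the `simulate` function, the 50 ppm drift and the 64 s poll interval are all assumptions chosen for illustration, and a real control loop (PLL/FLL) is considerably more careful.

```python
def simulate(drift_ppm=50.0, poll_s=64, polls=20, steer=False):
    """Return the clock offset (in seconds) observed at each poll.

    drift_ppm: assumed frequency error of the local oscillator.
    steer:     if True, also adjust the clock frequency from each
               measurement; if False, only step the clock (SNTP-style).
    """
    freq_error = drift_ppm * 1e-6   # seconds gained per second of real time
    correction = 0.0                # frequency correction applied so far
    offset = 0.0
    observed = []
    for _ in range(polls):
        # The clock drifts at the residual frequency error between polls.
        offset += (freq_error - correction) * poll_s
        observed.append(offset)
        if steer:
            # Naive frequency estimate: offset accumulated over one interval.
            correction += offset / poll_s
        # Both variants step the clock back onto the reference time.
        offset = 0.0
    return observed


stepped = simulate(steer=False)
steered = simulate(steer=True)
```

With these numbers the step-only clock climbs to about 3.2 ms (50 ppm times 64 s) before every correction, forever. The steered clock sees that error once, and then its offsets stay at essentially zero in this idealized model.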
Terje
"almost all programming can be viewed as an exercise in caching"