Time

Time is a very deceptive topic for programmers. "I know how time works", they all think. "After all, I use a calendar and clock every day!" Unfortunately, human civilizations have been measuring time for a long time[citation needed], and even our modern systems carry centuries of historical baggage. This article will attempt to give you a broad working knowledge of time, and how it pertains to the computer systems you will work on in your lifetime.

There are two major (and largely separate) concerns related to time: when things happen (instants in time), and how long things take (durations). We will cover each of them in turn.

No matter how confused you may become as you read this article, never forget that time always moves forward at the same rate everywhere. Nothing you do (switching time zones, changing clocks, etc.) can change this fact, and any confusion is simply the result of human notations for time. Unless you have to deal with relativity; in that case, god help you.

When things happen: instants in time

The first major aspect of time for a programmer is the study of when things happen. This is the realm of calendars and clocks.

A first important distinction is the difference between dates and times.

  • A date represents a particular day on a calendar. For example: May 21, 2022. The calendar you are primarily familiar with is the Gregorian calendar, although other calendar systems exist and have some limited use, particularly for holidays and other religious reasons.
  • A time on its own represents a time of day—hours, minutes, seconds, and perhaps more depending on precision. For example: 7:45:00 PM.

Combining a date and a time gives you a precise instant in time, and is commonly called a datetime or timestamp. However, one more ingredient is needed to remove ambiguity: time zones.
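
As a minimal sketch of this distinction, Python's standard `datetime` module keeps the three concepts as separate types. The choice of Python is an assumption made for illustration; the article does not prescribe a language.

```python
from datetime import date, time, datetime

# A date: a particular day on the calendar.
day = date(2022, 5, 21)

# A time of day, with no date attached.
time_of_day = time(19, 45, 0)  # 7:45:00 PM

# Combining both gives a datetime. Without a time zone it is "naive":
# it does not yet identify an unambiguous instant.
naive = datetime.combine(day, time_of_day)

print(naive.isoformat())     # 2022-05-21T19:45:00
print(naive.tzinfo is None)  # True
```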

Learning resources:

  • TODO: uh is there really anything good to link here [NOTE(hayden): I'm not really sure that anything is needed for this section. This is basically just covering the tip of the iceberg that most are already familiar with]

Time zones

You are already casually familiar with time zones. They reflect the very real fact that different locations on Earth experience the start and end of a day at different times. They are a major source of consternation for programmers, however, because they involve lots of human ambiguity and politics.

From a programmer's perspective, a time zone can be thought of as a bundle of the following:

  • A time offset from UTC
  • Rules for how this offset changes over time

The former is relatively easy to handle; the latter is fraught with politics and annoying edge cases, and will be discussed in the next section.

The time offset removes ambiguity by anchoring the time to UTC (Coordinated Universal Time), an international standard time that is continuous and unaffected by regional policy. A timestamp or datetime is thus a combination of date, time, and time zone or time offset. These three ingredients represent a fully-qualified, unambiguous instant in time.

It is worth noting that the time offset may not always be in increments of one hour. There are several time zones in current use that are offset by 30 or 45 minutes from the typical one-hour interval, and historical time zones that predate modern timekeeping may be offset by any arbitrary amount.
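
The sketch below illustrates how an offset anchors a wall-clock reading to UTC, using two offsets that are not whole hours. It uses fixed offsets only, so it deliberately ignores the "rules for how this offset changes" half of a time zone; the variable names are illustrative.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets from UTC (a real time zone also carries change rules).
india = timezone(timedelta(hours=5, minutes=30))  # UTC+05:30
nepal = timezone(timedelta(hours=5, minutes=45))  # UTC+05:45

instant_utc = datetime(2022, 5, 21, 14, 0, 0, tzinfo=timezone.utc)
in_india = instant_utc.astimezone(india)
in_nepal = instant_utc.astimezone(nepal)

print(in_india.isoformat())  # 2022-05-21T19:30:00+05:30
print(in_nepal.isoformat())  # 2022-05-21T19:45:00+05:45

# Different wall-clock readings, but the same instant in time.
print(in_india == in_nepal == instant_utc)  # True
```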

Resources:

Aside: UTC and the International Date Line

Broadly speaking, UTC (Coordinated Universal Time) is the date and time at the prime meridian (zero degrees longitude), and is not subject to daylight saving time or other discontinuities.

You may have also heard of GMT (Greenwich Mean Time). GMT is also centered at the prime meridian, but is UK-centric and therefore subject to some historical nonsense. UTC is the internationally standard successor, and the one appropriate for computer use.

The existence of the prime meridian necessitates a date cutoff point at roughly 180 degrees longitude. This is the International Date Line, the point at which you cross from UTC-12 to UTC+12 or vice versa, changing the date as necessary. [NOTE(hayden): Not sure how important this is, but an image might be useful here]

You might think that the International Date Line ensures that no two locations on Earth have a time difference greater than 24 hours. But naturally, there are in fact time zones whose offset from UTC is greater than 12 hours, such as UTC+14 in Kiribati's Line Islands. Rules are made to be broken.
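
To make this concrete, here is a small sketch comparing the same instant at two extreme offsets, UTC+14 and UTC-12. The instant chosen is arbitrary; the point is that the two local calendar dates can be two days apart.

```python
from datetime import datetime, timedelta, timezone

utc_plus_14 = timezone(timedelta(hours=14))    # e.g. Kiribati's Line Islands
utc_minus_12 = timezone(timedelta(hours=-12))  # e.g. Baker Island

instant = datetime(2022, 5, 21, 10, 30, tzinfo=timezone.utc)
ahead = instant.astimezone(utc_plus_14)
behind = instant.astimezone(utc_minus_12)

print(ahead.isoformat())   # 2022-05-22T00:30:00+14:00
print(behind.isoformat())  # 2022-05-20T22:30:00-12:00

# The two clocks differ by 26 hours, and the calendar dates by two days.
offset_gap = ahead.utcoffset() - behind.utcoffset()
date_gap_days = (ahead.date() - behind.date()).days
print(offset_gap, date_gap_days)  # 1 day, 2:00:00 2
```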

Resources:

Daylight saving time and locations

Many locations on Earth observe regular changes to their timekeeping, shifting their clocks forward or backward by one hour to put sunrise and sunset in a different place relative to their working day. Many people, especially programmers, think this is stupid, but we programmers are stuck with it until every government on Earth decides to abolish this practice.

The most important implication of daylight saving time is that you must not assume that a day is 24 hours long. A calendar day may be 23 or 25 hours long, depending on the direction of the transition, and a few places shift by less than an hour (Lord Howe Island in Australia shifts by 30 minutes), producing days of 23.5 or 24.5 hours. The visible impact of this is that the day will either repeat or skip a stretch of time, usually one hour, and this change may occur at different times of day depending on the location.

For example, time in Chicago on March 14, 2021 proceeded directly from 1:59:59 CST (Central Standard Time) to 3:00:00 CDT (Central Daylight Time). Likewise, on November 7, 2021, time proceeded directly from 1:59:59 CDT to 1:00:00 CST. Note that when an hour repeats, the time zone abbreviation (CDT versus CST) resolves the ambiguity.
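
The Chicago transitions can be checked with a short sketch. It uses two hand-written fixed offsets named `CST` and `CDT` as stand-ins; this is an assumption made to keep the example self-contained, and a real application would instead load the zone rules from the IANA time zone database (for example through Python's `zoneinfo` module).

```python
from datetime import datetime, timedelta, timezone

# Fixed-offset stand-ins for the two Chicago offsets.
CST = timezone(timedelta(hours=-6), "CST")
CDT = timezone(timedelta(hours=-5), "CDT")

# Spring forward: one second after 1:59:59 CST is 3:00:00 CDT.
before = datetime(2021, 3, 14, 1, 59, 59, tzinfo=CST)
after = (before + timedelta(seconds=1)).astimezone(CDT)
print(after.strftime("%H:%M:%S %Z"))  # 03:00:00 CDT

# Length of each transition day, from local midnight to local midnight.
spring_day = datetime(2021, 3, 15, tzinfo=CDT) - datetime(2021, 3, 14, tzinfo=CST)
fall_day = datetime(2021, 11, 8, tzinfo=CST) - datetime(2021, 11, 7, tzinfo=CDT)
print(spring_day)  # 23:00:00
print(fall_day)    # 1 day, 1:00:00

# The repeated hour: 1:30 occurs twice, one hour apart.
first = datetime(2021, 11, 7, 1, 30, tzinfo=CDT)
second = datetime(2021, 11, 7, 1, 30, tzinfo=CST)
print(second - first)  # 1:00:00
```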

To help you deal with daylight saving time, you may have heard mnemonics such as "spring forward, fall back", or the terms "gain an hour" and "lose an hour". These terms are unfortunately confusing and appear at times to be in conflict, so here is a cheat sheet:

| Coming from | Going to | Occurs in | Clock adjustment | Lived experience | Analogous to |
|---|---|---|---|---|---|
| Standard | DST | Spring | "Spring forward" | "Lose an hour" | Traveling east |
| DST | Standard | Fall | "Fall back" | "Gain an hour" | Traveling west |

A useful gut-check for all this is to remember that the Sun rises in the east and sets in the west. [NOTE(hayden): Is this just a heuristic? Technically it's a generalization and isn't always the case, but I don't know if that's being pedantic or if it's a useful clarification.] Traveling west would thus mean traveling with the Sun and experiencing more daylight—hence, "gaining an hour", and having to adjust your clock backward to account for the extra time in your day.

Aside: there are many conflicting explanations for who originally proposed the practice of daylight saving time, why governments actually chose to adopt this practice, and why it has continued to this day. This article does not attempt to clarify any of this, but the author chooses to believe that all this is Benjamin Franklin's greatest prank of all time.

Resources:

Leap days and leap seconds

Unfortunately, one year on the Gregorian calendar does not correspond exactly to one solar year. You're probably already aware of how this is handled; every four years, February has an extra day. (Except for years divisible by 100, except for those which are also divisible by 400.) This is not very hard to handle, and it will be accurate enough for at least the next 1000 years.
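
The leap year rule above fits in one line. The sketch below writes it out (the function name `is_leap_year` is illustrative) and cross-checks it against the standard library's `calendar.isleap`.

```python
import calendar

def is_leap_year(year: int) -> bool:
    """Every 4th year, except centuries, except centuries divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

for y in (1900, 2000, 2022, 2024):
    print(y, is_leap_year(y))
# 1900 False
# 2000 True
# 2022 False
# 2024 True

# Cross-check against the standard library over a wide range of years.
matches_stdlib = all(is_leap_year(y) == calendar.isleap(y) for y in range(1583, 3000))
print(matches_stdlib)  # True
```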

Unfortunately, there is a more difficult problem - the Earth's rotation is somewhat irregular. A variety of geological effects can cause the Earth's rotation to speed up or slow down, causing a solar day to step out of sync with our clocks. In order to avoid this drift, leap seconds are occasionally observed, extending or contracting the length of the day by one second. Unlike leap years, there is no set schedule for leap seconds - the Earth's rotation is not predictable enough. Computer systems must periodically download new timekeeping information so they always know of any upcoming leap seconds.

Since the year 1972, 27 leap seconds have been observed. At the time of this writing, a negative leap second (shortening the day) has never been observed; only positive leap seconds have been observed (lengthening the day). However, negative leap seconds are possible in theory, and they may someday be required.

All of this has some consequences for programmers:

  • You should not assume that a year has 365 days (because a leap year may make it 366).
  • You probably should not assume that a minute has 60 seconds (because a leap second may make it 59 or 61).
  • 23:59:60 may be a valid time, depending on the day.
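
Not every library can represent a leap second, which is one reason these assumptions leak into programs. As an illustration, Python's `datetime` rejects second 60, and its Unix timestamps pretend the leap second of December 31, 2016 never happened. This only shows one library's behavior; others may differ.

```python
from datetime import datetime, timezone

# A leap second was inserted at 2016-12-31T23:59:60Z.
try:
    datetime(2016, 12, 31, 23, 59, 60, tzinfo=timezone.utc)
    leap_second_representable = True
except ValueError:
    # datetime only accepts seconds in the range 0..59.
    leap_second_representable = False

print(leap_second_representable)  # False

# Unix time does not count the leap second either: the day looks like
# 86400 seconds even though 86401 seconds actually elapsed.
day_start = datetime(2016, 12, 31, tzinfo=timezone.utc).timestamp()
day_end = datetime(2017, 1, 1, tzinfo=timezone.utc).timestamp()
print(day_end - day_start)  # 86400.0
```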

The irregular and unpredictable nature of leap seconds has resulted in varying implementations. One common technique for dealing with leap seconds is essentially to ignore them: it is common for time synchronization servers to "smear" the leap second over a period of time, causing their clients to slowly drift until they match with real UTC again. In this case, clients are completely unaware that a leap second is occurring. More information about leap smearing can be found in this article:

[NOTE(ben): It kind of feels like we should have covered NTP in some limited way by now?] [NOTE(hayden): I think the logical flow for when it occurs below makes sense]

Resources:

Date representations, the Unix epoch, and the year 2038 problem

With all these concepts established, the question now is how to actually represent these in your application. Several representations are in common use:

  • Unix timestamp
    • Seconds since January 1, 1970 00:00:00 UTC, not counting leap seconds
    • Unambiguously stores an instant in time, but nothing else
    • Somewhat ambiguous for dates before 1972 (because UTC was not established yet)
    • Trivial computer representation (a single number)
    • Not human readable
  • NTP timestamp
    • Fractional seconds since January 1, 1900 UTC (even if UTC didn't exist back then, see above).
    • Uses 64 bit fixed point format. The 32 most significant bits denote the number of seconds, the 32 least significant bits denote fractional seconds. The smallest time increment is thus 2^{-32} seconds (232 picoseconds). The maximum representable time range is approximately 136 years.
  • NTP date
    • Uses 128 bits as follows: a 32-bit signed era number and a 96-bit fixed-point era offset (32 bits for seconds and 64 bits for fractional seconds).
    • Covers the whole universe's existence with a precision well below any duration that can be directly measured.
  • RFC 3339 date-time
    • Unambiguously stores an instant in time
    • Indicates offset from UTC (but not time zone in the proper sense - no DST)
    • Human-readable
    • Restricted subset of ISO 8601
      • You literally have no reason to use ISO 8601. Just don't. A copy of the spec costs hundreds of dollars, and it allows way more formats than anyone needs. Anyone who claims to accept "ISO 8601 timestamps" is wrong; they probably actually support RFC 3339 and a few variants. Meanwhile, RFC 3339 is freely available and has a small but complete set of features.
    • Contains useful definitions for both date and time encoding
Unix timestamp
- RFC 3339 and its pitfalls
- Other profiles (ISO8601)
- IANA tzdb
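
The following sketch shows one instant in three of the representations listed above. The helper `to_ntp_timestamp` and the constant `NTP_UNIX_DELTA` are illustrative names, not part of any library; the constant is the number of seconds between the 1900 and 1970 epochs. The last lines show the instant at which a signed 32-bit Unix timestamp runs out, which is the year 2038 problem named in the heading.

```python
import math
from datetime import datetime, timedelta, timezone

NTP_UNIX_DELTA = 2_208_988_800  # seconds from 1900-01-01 to 1970-01-01

instant = datetime(2022, 5, 21, 19, 45, 0, tzinfo=timezone(timedelta(hours=-5)))

# Unix timestamp: a single number, not human readable.
unix_ts = int(instant.timestamp())
print(unix_ts)  # 1653180300

# RFC 3339 date-time: human readable, carries the UTC offset.
print(instant.isoformat())  # 2022-05-21T19:45:00-05:00

def to_ntp_timestamp(unix_seconds: float) -> int:
    """Pack a Unix time into a 64-bit NTP timestamp (32.32 fixed point)."""
    whole = math.floor(unix_seconds)
    fraction = int((unix_seconds - whole) * 2**32)
    # The seconds field wraps every 2**32 seconds (about 136 years).
    return (((whole + NTP_UNIX_DELTA) & 0xFFFFFFFF) << 32) | fraction

print(hex(to_ntp_timestamp(unix_ts + 0.5)))

# Largest instant a signed 32-bit Unix timestamp can hold.
last_32bit = datetime.fromtimestamp(2**31 - 1, tz=timezone.utc)
print(last_32bit.isoformat())  # 2038-01-19T03:14:07+00:00
```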

What time is it actually: time synchronization

— Pardon me, do you have the time?
— When do you mean, now or when you asked me? This shit is moving, Ruth.

— George Carlin, Again! (1978)

Getting the time seems easy, right? Someone has it, you ask them, and you're done. Well, it's a little more complicated than that.

Let's say you want to set your watch, so you ask a friend for the time. But also imagine that voice takes a long, indeterminate amount of time to travel between you and them (alternatively, imagine you want to know what time it is with extreme precision). When your friend hears your question, it is already too late. They can give you the time it was when they heard the question, but they have no idea when you asked. Now, when you receive their answer, you have the same problem. You know neither when they heard your question nor when they answered. You only know that the time you get corresponds to some point between the moment you asked and the moment you heard the answer.

Now let's say you pick a random point in this interval, and adjust your own watch to that time. Maybe you make an error, but at least you're synchronized with your friend within some tolerance, right? No luck! Since it is extremely unlikely that both of your watches run at exactly the same consistent speed, your watch will drift and wander away from your friend's clock. You could ask again and again. But each time you will get a different error, so your watch will be very noisy (and still drift in between exchanges). Furthermore, it seems like a big waste of your... time.

Fortunately for us humans, we don't typically require high precision compared to the time it takes for sound to travel between two people speaking to each other. For computers, though, it can be crucial to be synchronized within a few microseconds, and network communications can take hundreds of milliseconds. Their clocks also drift and jitter, sometimes at an astonishingly high rate. Clearly there have to be ways to mitigate the errors generated by such adverse conditions.

The Network Time Protocol

The ubiquitous solution to time synchronization on computer networks is the Network Time Protocol (NTP). It synchronizes clocks to a common reference (usually Coordinated Universal Time) with an accuracy of a few milliseconds over the Internet, and can typically provide sub-millisecond accuracy on local area networks. NTP was pioneered by Prof. David Mills, whose webpage is a treasure trove of information about time synchronization. The protocol has been described in various RFCs, from its first version (RFC1059) to the current fourth version (RFC5905).

Other clock synchronization protocols exist, e.g. PTP (IEEE 1588-2019), and some implementations of NTP such as Chrony include different error mitigation techniques than those described by the specification.

I will focus here on NTP since it's the most documented and widely used, and it doesn't require specialized hardware like PTP.

The NTP Network

To maintain a consistent notion of time across the internet, you first need some reference time signal, which is typically provided by atomic clocks. Since it is neither practical nor desirable to connect every computer in the world to these atomic clocks, the time signal is distributed across a network of peers organized in several strata. Primary servers sit at stratum 1 and are directly connected (through wire or radio) to a reference clock. Peers at stratum 2 can synchronize to servers at stratum 1 or 2, and act as synchronization sources for peers at stratum 3. The layering continues up to stratum 15. Stratum 16 represents unsynchronized devices.

NTP architecture flow graph

The protocol selects sources in order to avoid loops and minimize the round-trip delay to a primary server. As such, peers collaborate to automatically reorganize the network to produce the most accurate time, even in the presence of failures in the network.

NTP Architecture

Now let's see the implementation of one node of the network. The following diagram shows the various stages of the pipeline used to collect time information and mitigate errors caused by network delays and clock inaccuracies. I'll explain each one in a short paragraph and give links to the relevant parts of the RFC, and to other references if you want to dig deeper.

NTP architecture flow graph

On-Wire Protocol

The first stage consists of getting time estimates from each peer. In the simplest mode of operation, the node polls a peer by sending a request packet timestamped with the transmission time, t_1. Upon reception of the request, the peer stores a timestamp t_2, processes the message, and sends a response packet containing t_1, t_2 and the response transmission time t_3. The node receives the response at time t_4.

NTP architecture flow graph

The node then computes the tuple (\delta, \theta, \varepsilon), where:

  • \delta is the round-trip delay.
  • \theta is the estimated offset of the server clock with respect to the local clock.
  • \varepsilon is the dispersion of the sample, i.e. the maximum error due to the frequency tolerances \varphi_p and \varphi_c of the peer's clock and of the client's clock, and the time elapsed since the request was sent.

$\begin{align*} \delta = (t_4 - t_1) - (t_3 - t_2) \,,\qquad \theta = \frac{(t_2 - t_1) + (t_3 - t_4)}{2} \,,\qquad \varepsilon = \varphi_p \times \varphi_c \times \delta \,. \end{align*}$

Note that peers also transmit the delay and dispersion statistics accumulated at each stratum from the root down to the peer, yielding the root delay and dispersion, \delta_r and \varepsilon_r. These statistics are used to assess the "quality" of the samples computed from each peer.

Simple NTP clients that are only synchronized to one server, that don't act as a source for other peers, and that don't need high precision can implement only the on-wire protocol and directly use \theta to correct their local clock. However, this estimate is vulnerable to network jitter, delay asymmetry, clock drift and wander. To get better precision, the samples produced by the on-wire protocol must be processed by a number of mitigation algorithms.
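
As a minimal sketch of the on-wire computation, the function below derives the round-trip delay and the offset from the four timestamps of one exchange. The `Exchange` type and the sample values are hypothetical, and dispersion is left out to keep the example short.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class Exchange:
    t1: float  # request sent (client clock)
    t2: float  # request received (server clock)
    t3: float  # response sent (server clock)
    t4: float  # response received (client clock)

def delay_and_offset(x: Exchange) -> Tuple[float, float]:
    """Return (round-trip delay, offset of the server relative to the client)."""
    delay = (x.t4 - x.t1) - (x.t3 - x.t2)
    offset = ((x.t2 - x.t1) + (x.t3 - x.t4)) / 2
    return delay, offset

# Hypothetical exchange: the server clock is 0.5 s ahead of the client,
# each network leg takes 40 ms, and the server spends 10 ms processing.
sample = Exchange(t1=100.000, t2=100.540, t3=100.550, t4=100.090)
delay, offset = delay_and_offset(sample)
print(round(delay, 3), round(offset, 3))  # 0.08 0.5
```

The offset is exact here only because both network legs take the same time; any asymmetry between the two legs shows up directly as an error in the estimated offset.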

Links: Mills, Analysis and Simulation of the NTP On-Wire Protocols, RFC5905 - section 8, SNTP (RFC4330)

Clock Filter

The clock filter is a shift register containing the last 8 samples produced by the on-wire protocol. The filter algorithm selects the sample with maximum expected accuracy. It is based on the observations that:

  • The samples with the lowest network delay were likely to exhibit a low error at the time they were produced.
  • Older samples carry less accuracy than newer ones due to clock drift.

As a new sample is pushed to the register, it evicts the oldest sample. The dispersion values of the other samples are then incremented by \varphi_c\times\Delta_t, where \Delta_t is the time elapsed since the last update from the server. A distance metric \lambda_i is associated with each sample:

\lambda_i = \frac{\delta_i}{2}+\varepsilon_i \,.

The samples are then sorted in a temporary list by increasing value of \lambda_i. The list is pruned to leave only samples whose dispersion is less than a given maximum. The first sample in the list is chosen as the new best candidate, likely to give the most accurate estimate of the time offset \theta. However it is only used if it is not older than the last selected sample.

The peer's dispersion \varepsilon_p is computed as an exponentially weighted average of the sorted samples' dispersions:

\varepsilon_p = \sum_{i=0}^{N-1}\frac{\varepsilon_i}{2^{(i+1)}} \,.

The server's jitter \psi_p is computed as the root mean square (RMS) of the differences between each offset and the first offset in the sorted list:

\psi_p = \frac{1}{N-1}\sqrt{\sum_{i=0}^{N-1}(\theta_0-\theta_i)^2} \,.
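
The core of the clock filter can be sketched as follows. This is a simplified illustration rather than the RFC 5905 implementation: it omits the aging of dispersions, the pruning by maximum dispersion, and the check that the chosen sample is not older than the last one used. The `Sample` type and the sample values are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Sample:
    delay: float       # round-trip delay (delta_i)
    offset: float      # clock offset (theta_i)
    dispersion: float  # dispersion (epsilon_i)

def clock_filter(samples: List[Sample]) -> Tuple[Sample, float, float]:
    """Return (best sample, peer dispersion, peer jitter)."""
    # Sort by the distance metric lambda_i = delta_i / 2 + epsilon_i.
    ranked = sorted(samples, key=lambda s: s.delay / 2 + s.dispersion)
    best = ranked[0]
    # Exponentially weighted sum of the sorted dispersions.
    peer_dispersion = sum(s.dispersion / 2 ** (i + 1) for i, s in enumerate(ranked))
    # Jitter relative to the offset of the first (best) sample.
    n = len(ranked)
    if n > 1:
        squares = sum((best.offset - s.offset) ** 2 for s in ranked)
        peer_jitter = math.sqrt(squares) / (n - 1)
    else:
        peer_jitter = 0.0
    return best, peer_dispersion, peer_jitter

register = [
    Sample(delay=0.080, offset=0.500, dispersion=0.001),
    Sample(delay=0.040, offset=0.502, dispersion=0.002),
    Sample(delay=0.120, offset=0.495, dispersion=0.001),
]
best, peer_dispersion, peer_jitter = clock_filter(register)
print(best.offset, peer_dispersion, peer_jitter)
```

With these values the second sample wins: its low delay outweighs its slightly higher dispersion.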

Links: Mills, Clock Filter Algorithm, RFC5905 - section 10

Clock Selection

The estimates we get from the clock filters of each peer can be in contradiction, either because of network perturbation or because some peers are faulty (or malicious!). The clock selection algorithm tries to find a majority group of peers with consistent time estimates (the "truechimers"), and eliminates the other peers (the "falsetickers").

The algorithm is based on the observation that for a non-faulty peer, the time offset error at the time of measurement is bounded by half the round-trip delay to the reference clock. This bound then increases with the age of the sample, due to the accumulated dispersion.

The true offset thus lies in the correctness interval [\theta - \lambda_r, \theta + \lambda_r] (\lambda_r being the accumulated distance to the root). Two peers whose correctness intervals do not intersect cannot possibly agree on the time, and one of them must be wrong. The algorithm tries to find the biggest majority of candidates whose correctness intervals share a common intersection, and discards the other candidates.
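
The idea of finding the largest group of intersecting correctness intervals can be sketched with a Marzullo-style sweep over the interval endpoints. This is a simplified illustration, not the exact intersection algorithm of RFC 5905; the function names and the peer values are hypothetical.

```python
from typing import List, Tuple

def largest_agreement(intervals: List[Tuple[float, float]]) -> Tuple[int, float, float]:
    """Return (count, low, high): the region covered by the most intervals."""
    edges = []
    for low, high in intervals:
        edges.append((low, 0, +1))   # interval opens (sorts first on ties)
        edges.append((high, 1, -1))  # interval closes
    edges.sort()
    best = count = 0
    best_low = best_high = float("nan")
    for i, (point, _, step) in enumerate(edges):
        count += step
        if step == +1 and count > best:
            # A new maximum starts here and lasts until the next endpoint.
            best, best_low, best_high = count, point, edges[i + 1][0]
    return best, best_low, best_high

def select_truechimers(peers: List[Tuple[float, float]]) -> List[int]:
    """peers: (offset, root distance) pairs. Return indices of the truechimers."""
    intervals = [(theta - dist, theta + dist) for theta, dist in peers]
    count, low, high = largest_agreement(intervals)
    if count <= len(peers) / 2:
        return []  # no majority agrees on the time
    return [i for i, (lo, hi) in enumerate(intervals) if lo <= low and high <= hi]

peers = [(0.500, 0.030), (0.510, 0.025), (0.495, 0.040), (0.900, 0.020)]
truechimers = select_truechimers(peers)
print(truechimers)  # [0, 1, 2]
```

The fourth peer's interval does not overlap the others, so it is treated as a falseticker.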

Links: Mills, Clock Select Algorithm, RFC5905 - section 11.2.1, Marzullo and Owicki, 1983

Cluster Algorithm

Once falsetickers have been eliminated, truechimers are placed in a survivor list, which is pruned by the cluster algorithm in a series of rounds.

For each candidate, the algorithm computes the selection jitter, which is the root mean square of offset differences between this candidate and all other peers. Candidates are ranked by their selection jitter and the candidate with the greatest one is pruned from the survivor list.

The cluster algorithm continues until a specified minimum of survivors remain, or until the maximum selection jitter is less than the minimum peer jitter (in which case removing the outlier wouldn't improve the accuracy of the selection).

To get some grasp of why it is designed that way, imagine all samples are coming from the same "idealized" peer, and that we select one "most likely accurate" sample. Then, as we already saw in the clock filter algorithm, we can compute the jitter of that idealized peer. The selection jitter is essentially a measure of the jitter implied by the decision to consider that sample the most accurate. Now to understand the termination condition, remember that each sample is produced by a peer that itself exhibits some jitter. There is no point in trying to get a better jitter from the combination of all samples, than the minimum peer jitter.
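
The pruning rounds can be sketched as below. The representation of a survivor as an (offset, peer jitter) pair, the function names, and the default minimum of three survivors are assumptions made for this illustration.

```python
import math
from typing import List, Tuple

Survivor = Tuple[float, float]  # (offset, peer jitter)

def selection_jitter(index: int, survivors: List[Survivor]) -> float:
    """RMS of the offset differences between one survivor and all the others."""
    offset = survivors[index][0]
    others = [o for j, (o, _) in enumerate(survivors) if j != index]
    return math.sqrt(sum((offset - o) ** 2 for o in others) / len(others))

def cluster(survivors: List[Survivor], min_survivors: int = 3) -> List[Survivor]:
    remaining = list(survivors)
    # Keep at least two entries so that every survivor has "others".
    while len(remaining) > max(min_survivors, 2):
        jitters = [selection_jitter(i, remaining) for i in range(len(remaining))]
        worst = max(range(len(remaining)), key=lambda i: jitters[i])
        min_peer_jitter = min(j for _, j in remaining)
        if jitters[worst] < min_peer_jitter:
            break  # removing the outlier would not improve accuracy
        del remaining[worst]
    return remaining

survivors = [
    (0.500, 0.002),
    (0.502, 0.003),
    (0.498, 0.002),
    (0.501, 0.004),
    (0.530, 0.002),
]
kept = cluster(survivors)
print([offset for offset, _ in kept])  # [0.5, 0.502, 0.501]
```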

Links: Mills, Cluster Algorithm, RFC5905 - section 11.2.2

Combine

The combined offset and jitter are computed as averages of the survivors' offsets and jitters, each weighted by the inverse of that survivor's root distance.
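
A weighted average of this kind can be sketched as follows. It follows the description in this section literally, which may differ in detail from the RFC 5905 reference code; the tuple layout and the values are hypothetical.

```python
from typing import List, Tuple

def combine(survivors: List[Tuple[float, float, float]]) -> Tuple[float, float]:
    """survivors: (offset, jitter, root distance). Return (offset, jitter)."""
    # Survivors closer to the root get proportionally more weight.
    weights = [1.0 / distance for _, _, distance in survivors]
    total = sum(weights)
    offset = sum(w * o for w, (o, _, _) in zip(weights, survivors)) / total
    jitter = sum(w * j for w, (_, j, _) in zip(weights, survivors)) / total
    return offset, jitter

combined_offset, combined_jitter = combine([
    (0.500, 0.002, 0.020),  # weight 50
    (0.504, 0.004, 0.040),  # weight 25
])
print(combined_offset, combined_jitter)
```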

Clock Discipline

The combined estimates are passed to the clock discipline algorithm, which implements a feedback control loop whose role is to direct the local system clock towards the reference clock. At its core, the clock discipline consists of the nonlinear combination of a phase-locked loop (PLL) and a frequency-locked loop (FLL). It results in a frequency correction and a phase correction.

The discipline is further enhanced by a state machine that is designed to adapt the polling interval, and the relative contribution of the PLL and FLL, in order to allow fast initial convergence and high stability in steady state. It also handles special situations such as bursts of high latency and jitter caused by temporarily degraded network conditions.

Links: Mills, Clock Discipline Algorithm, Mills, Clock State Machine, Mills, Clock Discipline Principles, RFC5905 - section 11.3

Clock Adjust

The clock adjust process is called every second to adjust the system clock. It sums the frequency correction and a percentage of the offset correction to produce an adjustment value. The residual offset correction then replaces the offset correction. Thus, the offset correction decays exponentially over time. When the time since the last poll exceeds the poll interval, the clock adjust process triggers a new poll.

It is worth noting that, because it is crucially important to preserve monotonicity, the system clock is normally not stepped backward or forward: instead, the system speeds the clock up or slows it down until the desired correction has been achieved.
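
The exponential decay of the offset correction can be illustrated with a short sketch. The fixed `fraction` of 0.25 is a hypothetical value chosen for readability; in a real implementation the fraction depends on the state of the clock discipline.

```python
from typing import Tuple

def clock_adjust(offset_correction: float, freq_correction: float,
                 fraction: float = 0.25) -> Tuple[float, float]:
    """One per-second step. Return (adjustment to apply, residual offset)."""
    step = offset_correction * fraction
    adjustment = freq_correction + step
    residual = offset_correction - step
    return adjustment, residual

# Start with 100 ms of offset to correct and no frequency correction.
residual = 0.100
adjustments = []
for second in range(5):
    adjustment, residual = clock_adjust(residual, freq_correction=0.0)
    adjustments.append(adjustment)

# Each second applies a smaller step than the one before.
print(adjustments)
print(residual)
```

Because each step is a fixed share of what remains, the clock is nudged gradually and never jumps.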

Links: RFC5905 - section 12

And this concludes our tour of NTP!

How long things take: durations

Computer clocks


TODO:

  • Style check:
    • em dashes
    • "Earth" and "Sun" are always capitalized
    • "time zone" is two words
    • "daylight saving time", not "daylight savings time"

EVERYTHING BELOW HERE IS OUR INITIAL BRAINSTORM

This will be our article three. Ben will get to this when he has time, unless someone else wants to jump in and lead the content of this.

  • Timezones
    • They are not always on the hour
    • Difference time offset and timezone
      • (locale stuff, formatting, DST transition dates, etc.)
  • Daylight savings
    • Days can be 23 or 25 hours long
    • Countries change their DST rules all the time
  • Dates, times, datetimes
    • Maybe you don't actually want to store full timestamps for literally everything
  • Time math is tricky
    • Adding 24 hours != adding one day (leap days / seconds)
  • Calendars
    • Historical calendars are very weird (kings just removing or adding days when they felt like it)
    • The calendar we use, and its extensions: Gregorian calendar, proleptic Gregorian calendar
    • Other calendars: Jewish calendar, Chinese lunar calendar (?), Hijri, Japanese calendar (?)
      • Primarily used for establishing holidays, but not really for common use any more (with the exception of China...?)
  • Computer clocks
    • CPU time vs. wall time
    • Monotonic clocks
      • Two kinds on Linux: one that smears forward, one that does not
    • Jiffies
  • Atomic clocks
  • NTP / chrony
  • Useful specs:
  • NOBODY USES ISO8601 STOP SAYING THAT THEY DO
    • NO ONE KNOWS WHAT IT IS
    • RFC3339 IS THE ONLY ONE ANYONE SHOULD USE
  • Accuracy of onboard clocks
    • Quartz oscillators - subject to drift
    • How to correct inaccurate clocks
  • Time smearing
  • Presentation and formatting?
  • The UNIX epoch, and the year 2038 problem
  • ISO "week date" (does this matter to anyone)
  • Falsehoods I think are actually important
    • "Always store time in UTC"
    • "Always store fully-qualified timestamps" i.e. datetimes with timezone, not just dates
    • "The smallest unit of time is X" (i.e. store the start and end of your time ranges precisely, never store 11:59:59)
  • Other advice
    • If anyone ever tells you to "store things for a month", clarify whether you need an actual calendar month or simply a 30-day period.

You get a time you trust - then you work with it in some way (???)

How to use dates and times

Time accuracy / how to get accurate times

Fun tangents

East Asian countries treat birthdays differently: https://en.wikipedia.org/wiki/East_Asian_age_reckoning. In particular, Koreans (and others??) all celebrate their birthday on January 1, and start counting at 1 instead of 0.