It is difficult to overstate the importance of synchronized time to modern computer systems. Our lives today depend on the financial transactions, telecommunications, power generation and delivery, high-speed manufacturing, and discoveries in “big physics,” among many other things, that are driven by fast, powerful computing devices coordinated in time with each other.
Since the first complete specification of Network Time Protocol (NTP) version 1 and its accompanying algorithms appeared in RFC 1059 in 1988, NTP has played a large role in time synchronization by keeping the clocks of networked computer systems synchronized to within milliseconds of each other. NTP has been deployed to a vast number of systems over the years, yet it hardly bears the burden of clock synchronization alone. When users want to coordinate events in time between multiple systems, they typically have many options for accomplishing this, all with different trade-offs.
One of the emerging alternatives to NTP is PTP (Precision Time Protocol). PTP is defined by IEEE standard 1588, published in November 2002 and based on early prototypes built at Agilent Technologies between 1990 and 1998. A revision of PTP with additional features and improved performance was published in 2008; it is known as PTP version 2 or IEEE 1588-2008 (all references to PTP in this article refer to this later version).
PTP and NTP are similar in that both are packet-based and send time-stamps over a network from a time reference device to one or more other devices. Additionally, both synchronize device clocks based on time offsets and network delays, and both support heterogeneous devices with varying clock time precision, resolution, and stability over varying amounts of physical separation. Each protocol has its unique strengths, and choosing one over the other often warrants an evaluation of a system’s environment, capabilities, and goals.
PTP is often chosen when the synchronization performance requirements of systems exceed the millisecond threshold of a typical NTP-based solution. When used with PTP-capable network hardware that has the ability to timestamp PTP packets precisely (something that is quickly becoming commonplace in industrial network interfaces), devices using PTP on a LAN (local-area network) can synchronize their clocks to within tens of nanoseconds of each other. Without hardware timestamping, referred to as a software-only configuration, PTP implementations can still achieve sub-millisecond precision.
NTP remains a popular synchronization technology, even as more PTP implementations have been made available to system designers on more platforms—both commercially and as freely available open source implementations. If PTP is available to a system designer and displays superior synchronization performance, why would NTP even be considered? Should PTP replace NTP altogether? If not, what do system designers need to know in order to choose the appropriate protocol? Perhaps the answers to these questions can be discerned from the circumstances that led up to the definition of PTP and a more in-depth look at this newer standard.
Measurement and Control Devices: Time for Something New?
Measurement and control devices have always been a vanguard for high-precision event synchronization. To achieve the degree of synchronization that devices of this nature require, signals sent over specialized cables can be used to synchronize events between devices. These cables, which are used exclusively for event synchronization, are often matched in length to ensure propagation delay is consistent. Synchronization using this dedicated cabling results in extremely high precision, where events can be coordinated to within picoseconds of each other across multiple devices in proximity. This type of synchronization is commonly referred to as signal-based.
While nearly unbeatable for applications requiring the most accurate synchronization possible, signal-based synchronization can be highly impractical and sometimes not possible. The dedicated cabling needed to synchronize separate devices can be cost prohibitive, and signal-based synchronization requires specialized hardware and software to generate and receive the signals on the cable. The signaling protocol can be proprietary, resulting in potential vendor tie-in, risk of discontinuation, or other legal or technical restrictions. The cables themselves are often subject to varying propagation delay over time and temperature, and as more devices are added to a system, the complexity of cabling multiple devices increases the maintenance burden and effort in troubleshooting failures. Signal-based synchronization also requires the devices be relatively close to each other and does not scale over long distances when compared with other synchronization mechanisms.
Meanwhile, with Ethernet becoming more and more ubiquitous in the laboratories and on factory floors where measurement and control devices are deployed, a need arose for these devices to be able to use a LAN or even a wide-area network (WAN) for control and data communication. NTP was even leveraged to set system time for these devices, but the need for dedicated event synchronization cables still existed. Despite the presence of all the basic ingredients for event synchronization using coordinated, distributed timekeepers (sometimes referred to as time-based synchronization), an acceptable technology that could use this infrastructure to replace signal-based synchronization had yet to be created.
Because signal-based implementations impose the constraints previously mentioned, time-based solutions using Ethernet were investigated further as a synchronization solution. At first glance, NTP seems like a good candidate for a low-cost time-based synchronization solution—and it is for many applications. Signal-based synchronization, however, provides an extremely high level of precision, and NTP version 3 (until recently the officially supported NTP release) provides only millisecond precision, which is not even close to being sufficient for applications using signal-based solutions. PTP was designed to meet the needs of the measurement and control industry and is capable of near-nanosecond precision while taking advantage of infrastructure that is similar to what NTP uses. A closer look at PTP reveals why it is successful for measurement and control applications—and, as it developed, many other applications as well.
PTP’s primary design goals have been listed in numerous presentations and documents, including the IEEE 1588 standard:1
- To provide sub-microsecond synchronization of real-time clocks in components of a networked distributed measurement and control system;
- To perform best with relatively localized systems typical of industrial automation and test and measurement environments;
- To be applicable to LANs supporting multicast communications (including but not limited to Ethernet);
- To provide a simple, administration-free installation;
- To support heterogeneous systems of clocks with varying precision, resolution, and stability; and
- To impose minimal resource requirements on networks and host components.
PTP meets these goals using a robust synchronization methodology, an algorithm that automatically and continuously maintains the proper device hierarchy for maximum accuracy, and specialized hardware (required only for optimal performance).
Synchronization methodology. At the heart of the PTP standard is the synchronization methodology. While similar to other time-based Ethernet synchronization protocols in concept, PTP’s synchronization methodology is unique and somewhat dependent on the particular hardware and application (power industry, telecommunications, among others) of a PTP deployment.
PTP defines a master-slave hierarchy based on criteria that describe a device’s timekeeping capability and the traceability of its time source. The master serves as the time reference for one or more slave devices. The process of selecting the master from the list of participating devices is defined in PTP’s Best Master Clock (BMC) algorithm, which is applied by each device at specific intervals. Devices (often referred to as ordinary clocks) may consider themselves masters either because they have not yet evaluated themselves against other clocks or have determined, according to BMC, that they now have better timekeeping ability than the current master. They will transmit Announce messages using UDP (User Datagram Protocol) multicast (by default) at configurable intervals. The other devices will process these Announce messages according to BMC and select the new master. If a master receives an Announce message from another potential master (known as a foreign master) and the device’s BMC indicates this foreign master should be master, the current master will transition to the slave state.
In addition to Announce messages, a master clock periodically transmits a Sync message using UDP multicast (by default), which is received by a slave clock. Each slave uses a Sync message to calculate the difference between its clock and the master clock. The message contains a timestamp from the master representing when it was issued (t1 in Figure 1); when the slave receives the Sync message, it records its time of receipt (t2). The time in the Sync message does not represent the precise time the message left the device, since it was not known until after it was sent. The master then sends a Follow-up message that includes the actual time the Sync message left the master as determined by specialized hardware (if equipped) or the network driver. The slave receives the Follow-up and uses that value as the actual t1.
At this point, the slave has two time values (t1 and t2) and can compute the offset between its timekeeper and the master’s. Unfortunately, the offset derived from t1 and t2 includes some unknown amount of additional propagation delay incurred by the network. To determine this delay and compute the actual offset between timekeepers, the slave issues a Delay Request message to the master and notes the time it was sent (t3). The master notes when, according to its timekeeper, it receives the Delay Request (t4) and issues a Delay Response message back to the slave containing t4. When the slave receives the Delay Response, it will have four timestamps (t1, t2, t3, and t4) and can compute the offset between its timekeeper and the master’s timekeeper while properly taking into account the network delay.
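The arithmetic behind this exchange can be sketched in a few lines of Python. This is a minimal illustration, not text from the standard: the names Exchange and offset_and_delay are hypothetical, timestamps are taken to be nanosecond counts, and the calculation assumes the network delay is the same in both directions (any asymmetry shows up as offset error).

```python
from dataclasses import dataclass


@dataclass
class Exchange:
    """Four timestamps from one Sync / Delay Request exchange (nanoseconds)."""
    t1: float  # master clock: Sync departure (from Sync or Follow-up)
    t2: float  # slave clock: Sync arrival
    t3: float  # slave clock: Delay Request departure
    t4: float  # master clock: Delay Request arrival


def offset_and_delay(x: Exchange) -> tuple[float, float]:
    """Return (offset, delay), where offset = slave time - master time.

    Assumes a symmetric path: master-to-slave delay equals slave-to-master.
    """
    forward = x.t2 - x.t1  # equals delay + offset
    reverse = x.t4 - x.t3  # equals delay - offset
    delay = (forward + reverse) / 2
    offset = (forward - reverse) / 2
    return offset, delay


# Slave runs 100,000 ns ahead of the master; one-way delay is 5,000 ns.
offset, delay = offset_and_delay(Exchange(t1=1000, t2=106000, t3=200000, t4=105000))
print(offset, delay)
```

The slave would then steer its timekeeper so that the computed offset approaches zero.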
Hardware timestamping and “software-only” configurations. To find the actual time the Sync message was sent from the master in order to insert it in the Follow-up, the master must know exactly when its network hardware was able to send the Sync message. This hardware is most likely the network interface’s physical transceiver (PHY) or other hardware that recognizes PTP packets and notes the precise time they were sent or received. The difference between when the master’s PTP software initiated the sending of this message (the estimated value of t1 included in the Sync message) and the time the PHY was able to send the signals on the physical media will not only vary, but will also be quite significant with respect to the sub-microsecond precision PTP is capable of. Therefore, the time the Sync message spends in the master’s network stack needs to be accounted for in order to achieve maximum accuracy (see Figure 2).
PTP defines another, slightly different synchronization mechanism that takes advantage of additional hardware support, if available. The Sync and Follow-up messages used to calculate the offset between the master and slave described earlier are used by a two-step clock. A one-step clock uses specialized network hardware not only to timestamp when a PTP Sync message leaves the device, but also to modify the outgoing Sync message’s t1 value with the actual departure time (see Figure 3).
This value is normally sent in the Follow-up message, but because the hardware makes it available in the Sync message, the Follow-up is redundant and therefore not needed. A slave device must also understand that its master is operating as a one-step clock. It can determine this by reading a bit field in the PTP message headers sent by the master. A one-step clock helps minimize network traffic while maintaining high synchronization performance and is therefore a requirement for PTP applications in certain industries.
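A slave's choice of t1 might look like the following sketch. The byte offset and bit mask used for the two-step flag reflect one reading of the IEEE 1588-2008 common header (a 34-byte header with the flag field starting at byte 6) and should be checked against the standard before use; the function names are hypothetical.

```python
PTP_HEADER_LEN = 34      # assumed length of the common PTP message header
FLAG_FIELD_OFFSET = 6    # assumed offset of the first flag-field octet
TWO_STEP_FLAG = 0x02     # assumed mask of the two-step bit in that octet


def is_two_step(header: bytes) -> bool:
    """Report whether the sender marked this message as two-step."""
    if len(header) < PTP_HEADER_LEN:
        raise ValueError("truncated PTP header")
    return bool(header[FLAG_FIELD_OFFSET] & TWO_STEP_FLAG)


def select_t1(sync_header: bytes, sync_timestamp, follow_up_timestamp=None):
    """Pick the value a slave should use as t1.

    One-step master: the Sync message already carries the precise time.
    Two-step master: wait for the Follow-up; return None until it arrives.
    """
    if is_two_step(sync_header):
        return follow_up_timestamp
    return sync_timestamp
```

Returning None for a two-step Sync with no Follow-up yet lets the caller hold the recorded t2 until the matching Follow-up arrives.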
PTP-aware network interface hardware can also simply timestamp PTP messages and correlate them with message IDs for later retrieval by PTP software. This capability allows a PTP master operating in two-step mode to send the precise t1 value in a Follow-up message to slaves. While this degree of hardware support does not make (accurate) one-step operation possible, it does make for superior two-step performance when compared with a software-only implementation. A PTP device is considered to be operating as a software-only clock if it has no hardware support. A software-only clock is limited to two-step operation and typically sends t1 values in the Follow-up messages that are retrieved from software components as low as possible in the software stack, usually from the driver level. Although software-only clocks are obviously not as accurate as those with hardware assist, they are still capable of achieving sub-millisecond precision.
The Best Master Clock algorithm. The BMC algorithm gives PTP devices the ability to maintain the desired synchronization hierarchy under changing network conditions. Describing the BMC algorithm itself is beyond the scope of this article, but needless to say, the BMC is a key part of the “simple, administration-free” aspects of PTP’s objectives. Any PTP device acting as either a master or slave continuously runs the BMC and uses it in determining if a new master needs to be selected, or if the device needs to transition out of the master state.
This type of state change can occur as the result of a number of conditions, all of which are reflected in a master’s (or potential master’s) Announce message. Masters and devices that could potentially be masters issue Announce messages at a configurable rate as part of the protocol. The Announce message contains all the pertinent information the BMC needs to determine if the current master should remain master or yield to a new master as a slave, or in the case of a slave, should start listening to a new master or become a master itself. Some of the attributes of an Announce message are the device’s time source (GPS, atomic clock, or free-running oscillator); the “priority” as determined by the local PTP administrators (which is used as an override mechanism and not required to be set for proper operation); the device’s clock ID (which typically includes the device’s MAC address); and other attributes used by the BMC.
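Although the full BMC algorithm is out of scope here, the idea of ranking Announce messages can be illustrated with a heavily simplified sketch. The field names and their order are assumptions drawn from a reading of the default algorithm in IEEE 1588-2008, with lower values treated as better; the sketch ignores topology (hop counts) and the state decisions that follow the comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnounceInfo:
    """Subset of the attributes a clock advertises (names are assumptions)."""
    priority1: int        # administrator override, compared first
    clock_class: int      # traceability of the time source
    clock_accuracy: int   # advertised accuracy code
    variance: int         # advertised stability
    priority2: int        # finer-grained administrator tiebreaker
    clock_identity: bytes  # unique ID, typically built from the MAC address

    def rank(self) -> tuple:
        # Tuples compare field by field, so earlier fields dominate.
        return (self.priority1, self.clock_class, self.clock_accuracy,
                self.variance, self.priority2, self.clock_identity)


def best_master(candidates):
    """Return the candidate with the lowest (best) rank."""
    return min(candidates, key=AnnounceInfo.rank)


gps = AnnounceInfo(128, 6, 0x21, 0x4E5D, 128, bytes.fromhex("001122fffe334455"))
free_running = AnnounceInfo(128, 248, 0xFE, 0xFFFF, 128, bytes.fromhex("000000fffe000001"))
print(best_master([free_running, gps]) is gps)
```

Because priority1 is compared first, an administrator can force a particular device to win regardless of its time source, which matches the override role of the "priority" attribute described above.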
Having every PTP device run the BMC and process Announce messages means administrators can simply power on a system and have a network of time-synchronized devices automatically configured for optimal performance, regardless of spontaneous network topology changes.
Boundary clocks. Because switches and routers effectively segment a PTP network, PTP introduced boundary clocks as a means of distributing a master clock to different parts of the network. The PTP standard describes a boundary clock as containing a single timekeeper disciplined by PTP but having multiple PTP ports in a domain. A port may serve as either the source of time (a PTP master) to devices attached to it or one that synchronizes the timekeeper (a PTP slave) to some other clock connected to that port. A boundary clock can be implemented to replace a traditional network switch or router in larger networks that are normally segmented by such devices. Because boundary clocks differ in operation from the PTP clocks previously described in this article, PTP differentiates the two by referring to them as either ordinary clocks or boundary clocks.
Each port of a boundary clock can be thought of as a separate ordinary clock instance that shares a single timekeeper with the boundary clock’s other ordinary clock instances. Only one port on the device can be in the slave state, which eliminates contentious use of the device’s timekeeper (two ports trying to adjust the time, for example). All other ports are considered masters to the devices on their respective segments.
The existence of boundary clocks requires PTP to use the term grandmaster to describe the master to the entire PTP network, since the slaves on a boundary-clock port consider the boundary clock to be their master. Each master port is responsible for handling the same duties as an ordinary clock master, which effectively hides all of the slaves from the boundary clock’s master. Likewise, a slave ordinary clock (or another boundary clock with the connected port in the slave state) is hidden from the PTP hierarchy “above” the boundary clock. A boundary clock does not pass the PTP synchronization messages from its slaves “up” to its master. Without this behavior, a grandmaster would be responsible for processing Delay Request messages and issuing Delay Response messages from and to, respectively, every slave device on the entire PTP network. In most cases it would not be able to run the protocol stack effectively.
A boundary clock, however, may still allow any eligible slave clock in the entire PTP network to be grandmaster. For example, if the grandmaster goes offline, the next most eligible slave device can announce itself as master (once its BMC algorithm has determined it’s appropriate to do so), and the boundary clock will transition the port connected to that slave to the slave state. That boundary clock will then have a port that was once in the slave state now in the master state to other ordinary and boundary clocks. Those clocks will then evaluate that new master with the BMC algorithm and transition appropriately, repeating this process for the rest of the hierarchy. Depending on the network topology, this situation may not be ideal as the number of hops in between this new master and a slave will have increased by one (the boundary clock connected to the new master), thus increasing any synchronization error accumulation.
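The port behavior described above can be modeled with a toy class. This is only a sketch: the class and method names are hypothetical, and it reduces each port to two states, whereas a real implementation tracks more states and drives the transitions from the BMC algorithm.

```python
class BoundaryClock:
    """Toy model: many ports, one shared timekeeper, at most one slave port."""

    def __init__(self, port_names):
        # Until a better master is heard, every port acts as a master.
        self.states = {name: "MASTER" for name in port_names}

    def select_slave_port(self, name):
        """Make `name` the only slave port; every other port becomes master."""
        if name not in self.states:
            raise KeyError(name)
        for port in self.states:
            self.states[port] = "MASTER"
        self.states[name] = "SLAVE"

    def slave_port(self):
        """Return the name of the slave port, or None if there is none."""
        for port, state in self.states.items():
            if state == "SLAVE":
                return port
        return None


bc = BoundaryClock(["port1", "port2", "port3"])
bc.select_slave_port("port1")   # grandmaster reachable through port1
bc.select_slave_port("port3")   # a better master later appears on port3
print(bc.states)
```

After the second call, port1 has moved from the slave state to the master state, which is the transition the failover scenario above describes.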
The use of boundary clocks and the resulting hierarchy of PTP devices must be considered in order to maximize systemwide synchronization precision (see Figure 4).
Boundary clocks can also be used for bridging networks that use different networking protocols (illustrated in Figure 5), since there is no requirement that PTP implementations use the same underlying communication media or technology. For example, a system can have some devices using Ethernet and others using DeviceNet, all synchronized to the same grandmaster through the use of capable boundary clocks. In this scenario, a boundary clock would have a DeviceNet-capable port connected to the DeviceNet devices, and another connected to Ethernet devices. The specific communication media is abstracted from the PTP clocks, allowing both types of devices to synchronize to the same PTP grandmaster regardless of that grandmaster’s media. In addition to different networking protocols, boundary clocks can also join PTP systems that use different delay calculation mechanisms, which are described later.
Transparent switches. Not all applications allow their PTP devices to be deployed in a manner that lends itself well to a balanced, treelike hierarchy. Systems are sometimes deployed in long linear or ring topologies, which can cause significant synchronization error accumulation when boundary clocks are used to join these segments. Because of this, PTP defines a device known as a transparent switch, which connects groups of PTP devices without segmenting the PTP network.
A transparent switch recognizes PTP messages passing through and notes each message’s residence time: the time the message spends in the switch, during which it is not yet visible to the intended PTP device. The residence time is added to the PTP message’s correction field just before the message is transmitted from the switch to the next device (see Figure 6). PTP clocks can then examine the received message’s correction field and apply it to their calculations. Even though the message was temporarily held up in the transparent switch (a nondeterministic behavior that normally introduces significant synchronization error), the correction field allows that time to be removed, as if the switch were never there (hence the name transparent switch).
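The bookkeeping involved is simple enough to sketch. The function names below are hypothetical, and the correction is kept as a plain nanosecond count for readability rather than in the on-wire encoding.

```python
def forward_through_switch(correction_ns, ingress_ns, egress_ns):
    """Return the correction value after one transparent switch.

    ingress_ns and egress_ns are read from the switch's own clock, so only
    their difference (the residence time) matters.
    """
    residence = egress_ns - ingress_ns
    if residence < 0:
        raise ValueError("egress precedes ingress")
    return correction_ns + residence


def corrected_transit(t1_ns, t2_ns, correction_ns):
    """Time from master to slave with all switch residence time removed."""
    return t2_ns - t1_ns - correction_ns


# A Sync message crosses two switches that hold it for 3,000 ns and 7,000 ns;
# the cabling contributes 500 ns in total, and the clocks happen to agree.
correction = forward_through_switch(0, ingress_ns=40_000, egress_ns=43_000)
correction = forward_through_switch(correction, ingress_ns=9_000, egress_ns=16_000)
print(corrected_transit(t1_ns=0, t2_ns=10_500, correction_ns=correction))
```

What remains after the subtraction is the cable delay plus any clock offset, which the delay measurement then separates.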
Unlike boundary clocks, transparent switches expose their slave devices to the PTP master. The transparent switch is typically interested in only a relative time (the time a message spends in the switch) and therefore does not need to have a timekeeper synchronized to the master’s time. The oscillators that “tick” in both the master and the switch, however, must tick at the same rate. Keeping this rate the same is known as syntonization. PTP specifies that transparent switches must be syntonized—not necessarily synchronized—to the master.
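One way a switch might estimate how fast its oscillator ticks relative to the master's is to compare the spacing of two Sync departure times with the spacing of their local arrival times. The sketch below is an illustration under stated assumptions, not a procedure quoted from the standard: the names are hypothetical, and the path delay is assumed not to change between the two messages.

```python
def rate_ratio(master_ts_a, master_ts_b, local_rx_a, local_rx_b):
    """Estimate master ticks per local tick from two Sync messages."""
    local_elapsed = local_rx_b - local_rx_a
    if local_elapsed <= 0:
        raise ValueError("local receive times must increase")
    return (master_ts_b - master_ts_a) / local_elapsed


def residence_in_master_units(local_residence_ns, ratio):
    """Scale a locally measured residence time to the master's rate."""
    return local_residence_ns * ratio


# The switch oscillator runs 100 ppm fast: it counts 1,000,100,000 ns while
# the master counts 1,000,000,000 ns.
ratio = rate_ratio(0, 1_000_000_000, 0, 1_000_100_000)
print(residence_in_master_units(1_000_100, ratio))
```

A switch could apply such a ratio to its residence-time measurements or use it to steer its oscillator; either approach keeps the reported residence times consistent with the master's rate.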
The peer delay mechanism. The synchronization model described earlier, where the slave issues a Delay Request message and the master responds with a Delay Response, is known as the delay request-response mechanism, or sometimes as end-to-end mode. PTP offers an alternative to this known as the peer-delay mechanism, or peer-to-peer mode, which can provide superior performance in certain situations. Because end-to-end mode and peer-to-peer mode cannot be used together, system designers have to evaluate which delay mechanism will provide the best results and design their systems accordingly.
In peer-to-peer mode, a device issues a Peer Delay Request message to its immediate neighbor, which may or may not be the device’s master. The receiving device responds with a Peer Delay Response message (and optionally a Peer Delay Response Follow-up after that if the device is operating in a two-step mode). This allows the requesting device to calculate the propagation delay for the individual segment.
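The link-delay calculation can be sketched as follows. The timestamp labels t1 through t4 here belong to the peer-delay exchange only and are assumed names, not the t1 through t4 of the Sync exchange described earlier; the link is assumed to be symmetric.

```python
def peer_link_delay(t1, t2, t3, t4):
    """One-way delay of a single link from a peer-delay exchange.

    t1: requester sends Peer Delay Request      (requester clock)
    t2: responder receives the request          (responder clock)
    t3: responder sends Peer Delay Response     (responder clock)
    t4: requester receives the response         (requester clock)

    Each difference uses a single clock, so the offset between the two
    devices cancels out.
    """
    round_trip = t4 - t1   # measured by the requester
    turnaround = t3 - t2   # time the responder held the message
    return (round_trip - turnaround) / 2


# The responder's clock is 1 ms ahead, the link delay is 250 ns each way,
# and the responder takes 4,000 ns to reply.
print(peer_link_delay(t1=1_000, t2=1_001_250, t3=1_005_250, t4=5_500))
```

Because the result does not depend on the offset between the two clocks, the measurement works on every link, whether or not the neighbor is the device's master.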
By knowing the exact propagation delay for each segment of a network path, peer-to-peer mode allows PTP to apply delay compensations between master and slaves that are more accurate than end-to-end mode allows when the intermediate switches choose different paths. Peer-to-peer mode specifies that a transparent switch adjusts the correction field not only with the residence time of Sync and Follow-up messages (just as a transparent switch operating in end-to-end mode does), but also with the delay previously calculated for the link the message came in on (see Figure 7).
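A short sketch shows how the correction accumulates along a path in peer-to-peer mode. The function names and the example topology are hypothetical, and values are plain nanosecond counts.

```python
def p2p_switch_update(correction_ns, residence_ns, ingress_link_delay_ns):
    """Correction after a peer-to-peer transparent switch.

    Adds the time spent inside the switch plus the measured delay of the
    link on which the message arrived.
    """
    return correction_ns + residence_ns + ingress_link_delay_ns


def slave_offset(t1_ns, t2_ns, correction_ns, last_link_delay_ns):
    """Offset (slave minus master) after removing every known path delay.

    The slave measures the delay of its own ingress link itself, so no
    Delay Request to the master is needed.
    """
    return t2_ns - t1_ns - correction_ns - last_link_delay_ns


# master --100 ns--> switch A (3,000 ns) --200 ns--> switch B (7,000 ns) --50 ns--> slave
correction = p2p_switch_update(0, residence_ns=3_000, ingress_link_delay_ns=100)
correction = p2p_switch_update(correction, residence_ns=7_000, ingress_link_delay_ns=200)

# The Sync left the master at t1 = 0; the slave, 40 ns ahead, stamps t2 = 10,390.
print(slave_offset(t1_ns=0, t2_ns=10_390, correction_ns=correction, last_link_delay_ns=50))
```

If the switches were to route the next Sync over a different set of links, the correction would reflect the new path immediately, since each link's delay has already been measured.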
This behavior means the master need not process Delay Request messages from each of its slaves; instead it concerns itself only with Peer Delay Requests and Responses for its immediate peer (a transparent switch or a PTP clock in the slave state). Because of this, transparent switches in peer-to-peer mode do not pass Delay Request or Delay Response messages. Peer-to-peer mode can therefore be more attractive than end-to-end mode to system designers concerned with network traffic, since a master exchanges delay messages only with its immediate peer rather than with every slave.
Profiles. PTP profiles allow organizations to specify selections of attribute values and optional features of PTP that, when using the same transport protocol, work together and achieve a performance that meets the requirements of particular applications.3 Profiles make PTP better suited for particular applications while adhering to the more general PTP standard. Profiles can specify several aspects of the standard. There are two “default” profiles: Delay Request-Response (often referred to as end-to-end mode) and Peer Delay (often referred to as peer-to-peer mode). Implementers must support at least one of these defaults. Profiles themselves are standardized and defined by a recognized standards organization that has jurisdiction over a particular industry (such as the IEC, IEEE, IETF, ANSI, or ITU). These organizations, as stated in the PTP standard, should consult the Precise Networked Clock Synchronization Working Group of the IM/ST (Instrumentation and Measurements/Sensor Technology) Committee for technical review.
PTP profiles not only change several aspects of the PTP standard, but also extend it. A profile may define its own BMC algorithm; configuration and monitoring (“management”) mechanism; path-delay mechanism (end-to-end or peer-to-peer); use of multicast or unicast; transport mechanism; node types; and any options that are required, permitted, or prohibited. Profiles may also define completely new transport mechanisms and data types. The flexibility that profiles have in morphing PTP to the needs of almost any particular application has proven useful to telecommunication and energy industries, among others.
Unicast. PTP was designed assuming a multicast communication mode, but support for unicast operation was eventually added as an optional feature. The PTP standard does not describe a unicast PTP implementation in detail, but instead describes several optional unicast features that can be used for an implementation “as long as the behavior of the protocol is preserved.”2 Some implementations may require that slave clocks use a configuration that specifies a list of known master clocks by protocol address (for example, a list of IP addresses when used over Ethernet) to discover the potential masters.
This unicast discovery mechanism is optional, meaning a unicast implementation could choose to use multicast for discovery of master clocks and unicast for all other messaging. Furthermore, this discovery mechanism may also require some amount of configuration to define the list of masters, since that list is most likely specific to a given system, which stretches the interpretation of the PTP objective to “provide a simple, administration-free installation.”1 Another optional implementation detail defined by PTP is the use of the unicast-negotiation mechanism, which involves sending specific signaling messages to master devices requesting that they respond with a unicast Announce, Sync, Delay Response, or Peer Delay Response to the signaling slave device. This flexibility in allowing unicast operation and providing several optional features to implement it allows profiles to define the specific unicast implementation details best suited for their applications.
Timescale. The timescale for a PTP network is defined by the grandmaster and can be one of two types: the default PTP timescale or an ARB (arbitrary) timescale.5 With the ARB timescale, the epoch is set by some predetermined procedure and can be set again using that procedure during normal operation. The PTP timescale uses the PTP epoch, and its unit of time is the SI second. The PTP epoch is 1 January 1970 00:00:00 TAI (International Atomic Time), which is 31 December 1969 23:59:51.999918 UTC (Coordinated Universal Time).
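For present-day timestamps, converting a PTP-timescale value to UTC amounts to subtracting the current TAI-UTC offset that the grandmaster distributes. The sketch below is an assumption-laden illustration: the function name is hypothetical, the offset is supplied by the caller (it has been 37 seconds since 1 January 2017), sub-microsecond digits are truncated, and behavior during a leap second itself is not handled.

```python
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ptp_to_utc(ptp_seconds, ptp_nanoseconds, current_utc_offset):
    """Convert a PTP-timescale timestamp to a UTC datetime.

    current_utc_offset is TAI minus UTC in whole seconds, valid at the
    instant being converted.
    """
    utc_seconds = ptp_seconds - current_utc_offset
    return UNIX_EPOCH + timedelta(seconds=utc_seconds,
                                  microseconds=ptp_nanoseconds // 1000)


print(ptp_to_utc(1_500_000_037, 0, current_utc_offset=37).isoformat())
```

Applications that log or display UTC need this step, while applications that only compare timestamps within the PTP network can work in the PTP timescale directly.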
Using the Right Tool for the Job: NTP or PTP?
The requirements of devices for the measurement and control industry are similar to those of many other industries—and many innovative outcomes have resulted from applying technology in ways that its designers had not originally considered—but the intended applications for any technology still should be considered before adopting it, regardless of the similarities it may have to the incumbent technology.
Objectives. As described earlier and stated in the standard, PTP was designed to be used over a LAN, or more specifically, “spatially localized systems with options for larger systems.”4 This is one of the more significant differentiators between PTP and NTP. The use of a LAN allows other PTP objectives to be fulfilled using techniques such as multicast for discovery and automatic selection of PTP masters, network equipment such as boundary clocks and transparent switches, and very high message exchange rates that may not be feasible over a WAN. A LAN also gives PTP some liberties that NTP does not usually have, such as assuming—with a reasonable degree of confidence—that unrelated network traffic and security risks are both low, given that LAN usage is usually confined and controlled.
In contrast, NTP is typically used over the Internet and is therefore subject to a large amount of nondeterministic delays from intermediate network elements (such as routers) and exposed to a far greater number of security threats (denial-of-service and man-in-the-middle attacks being some of the more obvious). It must accept these penalties or, in the case of security, account for these challenges.
Security in particular is worth highlighting, since PTP includes only an experimental extension to the protocol to address security concerns, whereas NTP defines the use of access-control lists and a variant of public-key cryptography called Autokey.6 Also note that NTP can use a multicast mode to discover servers automatically when used on a LAN, and PTP can operate in a unicast mode to be used over a WAN. Neither of these is the most common use of its protocol, however, and each may impose additional configuration costs.
Another PTP objective is administration-free operation, where the devices that make up a system can be deployed with little or no configuration yet still achieve optimum time synchronization for the given environment. Devices can be added, removed, or reconfigured while the system is in use, and the PTP devices that make up the system will automatically negotiate a new hierarchy in order to maintain optimum synchronization performance. PTP’s BMC algorithm is responsible for this behavior. NTP’s optimization algorithm does not permit the same degree of autonomy in allowing any device to become the equivalent of a PTP grandmaster if necessary, despite the inclusion of a dynamic discovery scheme in the latest NTP specification.7 Instead, NTP defines a series of mitigation algorithms to be used in finding the optimal network path,6 provided an NTP client has been configured to select among more than one server.
Synchronization methodologies. While the synchronization methodologies employed by PTP and NTP are similar in that they both ultimately compute clock offset and message delays, the protocols differ greatly in various mechanisms that must be considered when selecting the appropriate technology. For example, PTP relies on boundary clocks and transparent switches to achieve maximum performance in certain environments. Not including these devices or incorrectly using them could significantly reduce PTP performance.
More relevant to implementers of these specifications is the fact that PTP does not define a servo algorithm for applying the information PTP gives a device to the device’s oscillator. Instead, the servo definition is implementation-specific, and there are no guarantees that different PTP software stacks will exhibit the same synchronization behavior on the same device. In stark contrast, NTP “defines a highly evolved, adaptive-parameter, hybrid phase-frequency lock loop” used to adjust the device’s timekeeper from data provided by NTP.6
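To make the idea of a servo concrete, here is a minimal proportional-integral (PI) controller of the kind an implementation might use. It is not taken from the PTP standard or from any particular PTP stack: the class name, the gains, and the one-second update interval are all illustrative assumptions.

```python
class PiServo:
    """Toy PI servo: turns measured offsets into frequency adjustments."""

    def __init__(self, kp=0.7, ki=0.3):
        self.kp = kp          # proportional gain (illustrative value)
        self.ki = ki          # integral gain (illustrative value)
        self.integral = 0.0   # accumulated estimate of the frequency error

    def update(self, offset_ns):
        """Return a frequency adjustment in parts per billion (ppb).

        offset_ns is slave time minus master time; a positive offset means
        the slave is ahead, so the adjustment is negative (slow down).
        """
        self.integral += self.ki * offset_ns
        return -(self.kp * offset_ns + self.integral)


def simulate(servo, initial_offset_ns, drift_ppb, steps):
    """Run the servo against a clock with a constant frequency error.

    With a one-second interval, 1 ppb of frequency error adds 1 ns of
    offset per step.
    """
    offset = initial_offset_ns
    adjustment = 0.0
    for _ in range(steps):
        adjustment = servo.update(offset)
        offset += drift_ppb + adjustment
    return offset, adjustment


final_offset, final_adjustment = simulate(PiServo(), 1_000.0, 50.0, 200)
print(final_offset, final_adjustment)
```

In this simulation the offset settles near zero and the adjustment settles near the negative of the oscillator's drift. Different gains, filters, or update intervals change how quickly and how smoothly that happens, which is one reason two PTP stacks can behave differently on the same hardware.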
The timescales of the two technologies also differ. NTP uses UTC, while PTP (typically) uses TAI and a UTC offset. This difference could be significant to system designers who assume a specific timescale. Concerns about properly handling leap seconds might factor in as well: a leap second causes the entire NTP timescale to shift by one second, whereas the PTP timescale itself does not shift.
Performance expectations. A typical PTP-synchronized system can expect sub-microsecond synchronization precision, where “typical” includes hardware assist and a LAN. A typical NTP-synchronized system, meaning no specialized hardware and devices connected over a WAN, will achieve millisecond synchronization precision. When PTP is appropriately configured for use over a WAN, however, there may be little to no performance advantage over NTP.8
PTP has the capability to synchronize devices to within nanoseconds of each other over a common networking infrastructure, allowing system designers to replace synchronization solutions that are more expensive, limited, or both. NTP has similar use cases, but usually falls short for applications that require the level of performance typical of measurement and control systems. PTP’s BMC algorithm allows it to adapt to changing conditions to ensure devices always have the highest-quality time reference. PTP boundary clocks and transparent switches ensure high synchronization performance even in a non-ideal network topology. In contrast, NTP requires that all devices be configured to reference a predetermined set of time servers prior to use, and its performance suffers when messages have to traverse network elements such as switches.
PTP’s intended environment is different from NTP’s, however, and depending on the application, NTP may be a better choice. For example, NTP’s more established security mechanism and publicly available pool of time servers9 make it better suited for synchronizing time over the Internet when performance requirements permit.
PTP has filled a niche that NTP has not been able to, but it has not replaced it. Instead, PTP offers system designers a new synchronization tool to put in their toolboxes.