
Computing Needs Time

The passage of time is essential to ensuring the repeatability and predictability of software and networks in cyber-physical systems.

Most microprocessors are embedded in systems that are not first-and-foremost computers. Rather, these systems are cars, medical devices, instruments, communication systems, industrial robots, toys, and games. Key to them is that they interact with physical processes through sensors and actuators. However, they increasingly resemble general-purpose computers, becoming networked and intelligent, often at the cost of dependability.

Even general-purpose computers are increasingly asked to interact with physical processes. They integrate media (such as video and audio), and through their migration to handheld platforms and pervasive computing systems, sense physical dynamics and control physical devices. They don’t always do it well. The technological basis that engineers and computer scientists have chosen for general-purpose computing and networking does not support these applications well. Changes that ensure this support could improve them and enable many others.

The foundations of computing, rooted in Turing, Church, and von Neumann, are about the transformation of data, not physical dynamics. Computer scientists must rethink the core abstractions if they truly want to integrate computing with physical processes. That’s why I focus here on a key aspect of physical processes—the passage of time—that is almost entirely absent in computing. This is not just about real-time systems, which accept the foundations and retrofit them with temporal properties. Although that technology has much to contribute to systems involving physical processes, it cannot solve the problem of computers functioning in the physical world alone because it is built on flawed technological foundations.

Many readers might object here. Computers are so fast that surely the passage of time in most physical processes is so slow it can be handled without special accommodation. But modern techniques (such as instruction scheduling, memory hierarchies, garbage collection, multitasking, and reusable component libraries that do not expose temporal properties in their interfaces) introduce enormous variability and unpredictability into computer-supported physical systems. These innovations are built on a key premise: that time is irrelevant to correctness and is at most a measure of quality. Faster is better, if you are willing to pay the price in terms of power consumption and hardware. By contrast, what these systems need is not faster computing but physical actions taken at the right time. Timeliness is a semantic property, not a quality factor.

But surely the “right time” is expecting too much, you might say. The physical world is neither precise nor reliable, so why should we demand such properties from computing systems? Instead, these systems must be robust and adaptive, performing reliably, despite being built out of unreliable components. While I agree that systems must be designed to be robust, we should not blithely discard the reliability we have. Electronics technology is astonishingly precise and reliable, more than any other human invention ever made. Designers routinely deliver circuits that perform a logical function essentially perfectly, on time, billions of times per second, for years on end. Shouldn’t we aggressively exploit this remarkable achievement?

We have been lulled into a false sense of confidence by the considerable success of embedded software in, say, automotive, aviation, and robotics applications. But the potential is much greater; hardware and software design has reached a tipping point, where computing and networking can indeed be integrated into the vast majority of artifacts made by humans. However, as we move to more networked, complex, intelligent applications, the problems of real-world compatibility and coordination are going to get worse. Embedded systems will no longer be black boxes, designed once and immutable in the field; they will be pieces of larger systems, a dance of electronics, networking, and physical processes. An emerging buzzword for such systems is cyber-physical systems, or CPS.

The charter for the CPS Summit in April 2008 (ike.ece.cmu.edu/twiki/bin/view/CpsSummit/WebHome) says “The integration of physical systems and processes with networked computing has led to the emergence of a new generation of engineered systems: cyber-physical systems. Such systems use computations and communication deeply embedded in and interacting with physical processes to add new capabilities to physical systems. These cyber-physical systems range from miniscule (pacemakers) to large-scale (the national power grid). Because computer-augmented devices are everywhere, they are a huge source of economic leverage.

“…it is a profound revolution that turns entire industrial sectors into producers of cyber-physical systems. This is not about adding computing and communication equipment to conventional products where both sides maintain separate identities. This is about merging computing and networking with physical systems to create new revolutionary science, technical capabilities and products.”

The challenge of integrating computing and physical processes has been recognized for years,20 motivating the emergence of hybrid systems theories. However, progress is limited to relatively simple systems combining ordinary differential equations with automata. Needed now are new breakthroughs in modeling, design, and analysis of such integrated systems.

CPS applications, arguably with the potential to rival the 20th century IT revolution, include high-confidence medical devices, assisted living, traffic control and safety, advanced automotive systems, process control, energy conservation, environmental control, avionics, instrumentation, critical infrastructure control (electric power, water resources, and communications systems), distributed robotics (telepresence, telemedicine), defense systems, manufacturing, and smart structures. It is easy to envision new capabilities that are technically well within striking distance but that would be extremely difficult to deploy with today’s methods. Consider a city without traffic lights, where each car gives its driver adaptive information on speed limits and clearances to pass through intersections. We have all the technical pieces for such a system, but achieving the requisite level of confidence in the technology is decades off.

Other applications seem inevitable but will be deployed without benefit of many (or most) developments in computing. For example, consider distributed real-time games that integrate sensors and actuators to change the (relatively passive) nature of online social interaction.

Today’s computing and networking technologies unnecessarily impede progress toward these applications. In a 2005 article on “physical computing systems,” Stankovic et al.25 said “Existing technology for RTES [real-time embedded systems] design does not effectively support development of reliable and robust embedded systems.” Here, I focus on the lack of temporal semantics. Today’s “best-effort” operating system and networking technologies cannot produce the precision and reliability demanded by most of these applications.


Glib Responses

Calling for a fundamental change in the core abstractions of computing is asking a lot of computer science. You may say that the problems can be addressed without such a revolution. To illustrate that a revolution is needed, I examine popular but misleading aphorisms, some suggesting that incremental changes will suffice:

Computing takes time. This brief sentence might suggest that if only software designers would accept this fact of life, then the problems of CPS could be dealt with. The word “computing” refers to an abstraction of a physical process that takes time. Every abstraction omits some detail (or it wouldn’t be an abstraction), and one detail that computing omits is time. The choice to omit time has been beneficial to the development of computer science, enabling very sophisticated technology. But there is a price to pay in terms of predictability and reliability. This choice has resulted in a mismatch with many applications to which computing is applied. Asking software designers to accept the fact that computing takes time is the same as asking them to forgo a key aspect of their most effective abstractions, without offering a replacement.

If the term “computing” referred to the physical processes inside a computer, rather than to the abstraction, then a program in a programming language would not define a computation. One could define a computation only by describing the physical process. A computation is the same regardless of how it is executed. This consistency is, in fact, the essence of the abstraction. When considering CPS, it is arguable that we (the computer science community) have picked a rather inconvenient abstraction.


Embedded systems will no longer be black boxes, designed once and immutable in the field; they will be pieces of larger systems, a dance of electronics, networking, and physical processes.


Moreover, the fact that the physical processes that implement computing take time is only one reason the abstraction is inconvenient. It would still be inconvenient if the physical process were infinitely fast. In order for computations to interact meaningfully with other physical processes, they must include time in the domain of discourse.

Time is a resource. Computation, as expressed in modern programming languages, obscures many resource-management problems. Memory is provided without bound by stacks and heaps. Power and energy consumption are (mostly) not the concern of a programmer. Even when resource-management problems are important to a particular application, there is no way for a programmer to talk about them within the semantics of a programming language.

Time is not like these other resources. First, barring metaphysical discourse, it is genuinely unbounded. To consider it a bounded resource, we would have to say that the available time per unit time is bounded, a tautology. Second, time is expended whether we use it or not. It cannot be conserved and saved for later. This is true, to a point, with, say, battery power, which is unquestionably a resource. Batteries leak, so their power cannot be conserved indefinitely, but designers rarely optimize a system to use as much battery power before it leaks away as they can. Yet that is what they do with time.

If time is indeed a resource, it is a rather unique one. Lumping together the problem of managing time with the problems of managing other more conventional resources inevitably leads to the wrong solutions. Conventional resource-management problems are optimization problems, not correctness problems. Using fewer resources is always better than using more. Hence, there is no need to make energy consumption a semantic property of computing. Time, on the other hand, needs to be a semantic property.

Time is a nonfunctional property. What is the “function” of a program? In computation, it is a mapping from sequences of input bits to sequences of output bits (or an equivalent finite alphabet). The Turing-Church thesis defines “computable functions” as those that can be expressed by a terminating sequence of such bits-to-bits functions or mathematically by a finite composition of functions whose domain and co-domain are the set of sequences of bits.

In a CPS application, the function of a computation is defined by its effect on the physical world. This effect is no less a function than a mapping from bits to bits. It is a function in the intuitive sense of “what is the function of the system” and can be expressed as a function in the mathematical sense of a mapping from a domain to a co-domain.15 But as a function, the domain and co-domain are not sequences of bits. Why do software designers insist on the wrong definition of “function”?

Designers of operating systems, Web servers, and communication protocols already view programs reactively, as processing sequences of input/output events rather than as mappings from bits to bits. This view needs to be elevated from the theoretical level to the application-programmer level and augmented with explicit temporal dynamics.

Real time is a quality-of-service (QoS) problem. Everybody, from architect to programmer to user, wants quality. Higher quality is always better than lower quality (at least under constant resource use). Indeed, in general-purpose computing, a key quality measure is execution time (or “performance”). But time in embedded systems plays a different role. Less time is not better than more time, as it is with performance. If it were, it would be better for an automobile engine controller to fire the spark plugs earlier rather than later. Finishing early is not always a good thing and can lead to paradoxical behaviors in which finishing early causes deadlines to be missed.7 In an analysis that remains as valid today as when it was first written, Stankovic26 lamented the resulting misconception that real-time computing “is equivalent to fast computing” or “is performance engineering.” A CPS requires repeatable behavior much more than optimized performance.

Precision and variability in timing are QoS problems, but time itself is much more than a matter of QoS. If time is missing from the semantics of programs, then no amount of QoS will adequately address CPS timing properties.


Correctness

To solidify this discussion, I’ll now define some terms based on the formal computational model known as the “tagged signal model.”15 A design is a description of a system; for example, a C program is a design, so is a C program together with a choice of microprocessor, peripherals, and operating system. The latter design (a C program combined with these design choices) is more detailed (less abstract) than the former.

More precisely, a design is a set of behaviors. A behavior is a valuation of observable variables, including all externally supplied inputs. These variables may themselves be functions; for example, in a very detailed design, each behavior may be a trace of electrical signals at the system’s inputs and outputs. The semantics of a design is a set of behaviors.

In practice, a design is given in a design language that may be formal, informal, or some mixture of formal and informal. A design in a design language expresses the intent of the designer by defining the set of acceptable behaviors. Clearly, if the design language has precise (mathematical) semantics, then the set of behaviors is unambiguous. There could, of course, be errors in the expression, in which case the semantics will include behaviors not intended by the designer. For example, a function given in a pure functional programming language is a design. A designer can define a behavior to be a pair of inputs and outputs (arguments and results). The semantics of the program is the set of all possible behaviors that defines the function specified by the program. Alternatively, we could define a behavior to include timing information (when the input is provided and the output is produced); in this case, the semantics of the program includes all possible latencies (outputs can be produced arbitrarily later than the corresponding inputs), since nothing about the design language constrains timing.

A correct execution is any execution that is consistent with the semantics of the design. That is, given a certain set of inputs, a correct execution finds a behavior consistent with these inputs in the semantics. If the design language has loose or imprecise semantics, then “correct” executions may be unexpected. Conversely, if the design expresses every last detail of the implementation, down to printed circuit boards and wires, then a correct execution may, by definition, be any execution performed by said implementation. For the functional program just described, an execution is correct regardless of how long it takes to produce the output, because a program in a functional language says nothing about timing.

A repeatable property is a property of behaviors exhibited by every correct execution, given the same inputs; for example, the numerical value of the outputs of a pure functional program is repeatable. The timing of the production of the outputs is not. The timing can be made repeatable by giving more detail in the design by, for example, specifying a particular computer, compiler, and initial condition on caches and memory. The design has to get far less abstract to make timing repeatable.
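
To make this distinction concrete, consider a minimal sketch in standard C with POSIX timing calls (illustrative only, not from any particular system): the value returned by a pure function is a repeatable property of every correct execution, while its latency is not, because nothing in the semantics of C constrains timing.

    /* Illustrative sketch: the numerical result of a pure function is a
     * repeatable property of every correct execution; its latency is not,
     * because nothing in the semantics of C constrains timing. */
    #include <stdio.h>
    #include <time.h>

    static long fib(int n) {                 /* pure: bits in, bits out */
        return n < 2 ? n : fib(n - 1) + fib(n - 2);
    }

    int main(void) {
        for (int trial = 0; trial < 3; trial++) {
            struct timespec t0, t1;
            clock_gettime(CLOCK_MONOTONIC, &t0);
            long v = fib(30);                /* value: identical on every run */
            clock_gettime(CLOCK_MONOTONIC, &t1);
            double ms = (t1.tv_sec - t0.tv_sec) * 1e3 +
                        (t1.tv_nsec - t0.tv_nsec) / 1e6;
            printf("fib(30) = %ld in %.3f ms\n", v, ms);  /* time: varies */
        }
        return 0;
    }

Run repeatedly on one machine, or once each on two different machines, the printed value never changes, while the printed time almost always does.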

A predictable property is a property of behaviors that can be determined in finite time through analysis of the design. That is, given only the information expressed in the design language, it must be possible to infer that the property holds for every behavior of a correct execution. For a particular functional program, the numerical value of the outputs may be predictable, but given an expressive-enough functional language, it will always be possible to write programs whose outputs are not predictable. If the language is Turing complete, then the numerical value of the outputs may even be undecidable. Moreover, “finite time” is not sufficient in practice; to be usefully predictable, a property must be inferable by a programmer or a program-analysis tool in reasonable time.
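
The limits of predictability can likewise be illustrated with a tiny sketch (mine, for illustration): the value computed below is repeatable, but no known analysis of the design alone can establish that the loop terminates for every positive input, since that is the open Collatz conjecture.

    /* Illustrative sketch: the printed value is repeatable, but whether
     * this loop terminates for every positive n is the open Collatz
     * conjecture, so termination is not a predictable property of the
     * design. */
    #include <stdio.h>

    static long steps(long n) {
        long count = 0;
        while (n != 1) {
            n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
            count++;
        }
        return count;
    }

    int main(void) {
        printf("steps(27) = %ld\n", steps(27));   /* repeatable across runs */
        return 0;
    }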

Designs are generally abstractions of systems, omitting certain details. For example, even the most detailed design may not specify how behaviors change if the system is incinerated or crushed. However, an implementation of the design does have specific reactions to these events (albeit probably not predictable reactions). Reliability is the extent to which an implementation of a design delivers correct behaviors over time and under varying operating conditions. A system that tolerates more operating conditions or remains correct for a longer period of time is more reliable. Operating conditions include those in the environment (such as temperature, input values, timing of inputs, and humidity) but may also include those in the system itself (such as fault conditions like failures in communications and loss of power).

A brittle system is one in which small changes in the operating conditions or in the design yield incorrect behaviors. Conversely, a robust system remains correct with small changes in operating conditions or in design. Making these concepts mathematically precise is extremely difficult for most design languages, so engineers are often limited to intuitive and approximate assessments of these properties.


Requirements

Embedded systems have always been held to a higher reliability standard than general-purpose computing systems. Consumers do not expect their TVs to crash and reboot. They count on highly reliable cars in which computer controllers have dramatically improved both reliability and efficiency compared to electromechanical or manual controllers. In the transition to CPS, the expectation of reliability will only increase. Without improved reliability, CPS will not be deployed into such applications as traffic control, automotive safety, and health care in which human lives and property are potentially at risk.

The physical world is never entirely predictable. A CPS will not operate in controlled environments and must be robust to unexpected conditions and adaptable to subsystem failures. Engineers face an intrinsic tension between predictable performance and an unpredictable environment; designing reliable components makes it easier to assemble these components into reliable systems, but no component is perfectly reliable, and the physical environment will inevitably manage to foil reliability by presenting unexpected conditions. Given components that are reliable, how much can designers depend on that reliability when designing a system? How do they avoid brittle design?

The problem of designing reliable systems is not new in engineering. Two basic engineering tools are analysis and testing. Engineers analyze designs to predict behaviors under various operating conditions. For this analysis to work, the properties of interest must be predictable and yield to such analysis. Engineers also test systems under various operating conditions. Without repeatable properties, testing yields incoherent results.

Digital circuit designers have the luxury of working with a technology that delivers predictable and repeatable logical function and timing. This predictability and reliability holds despite the highly random underlying physics. Circuit designers have learned to harness intrinsically stochastic physical processes to deliver a degree of repeatability and predictability that is unprecedented in the history of human innovation. Software designers should be extremely reluctant to give up on the harnessing of stochastic physical processes.

The principle designers must follow is simple: Components at any level of abstraction should be made as predictable and repeatable as is technologically feasible. The next level of abstraction above these components must compensate for any remaining variability with robust design.

Some successful designs today follow this principle. It is (still) technically feasible to make predictable gates with repeatable behaviors that include both logical function and timing. Engineers design systems that count on these behaviors being repeatable. It is more difficult to make wireless links predictable and repeatable. Engineers compensate one level up, using robust coding schemes and adaptive protocols.

Is it technically feasible to make software systems that yield predictable and repeatable properties for a CPS? At the foundation of computer architecture and programming languages, software is essentially perfectly predictable and repeatable, if we consider only the properties expressed by the programming languages. Given an imperative language with no concurrency, well-defined semantics, and a correct compiler, designers can, with nearly 100% confidence, count on any computer with adequate memory to perform exactly what is specified in the program.

The problem of how to ensure reliable and predictable behavior arises when we scale up from simple programs to software systems, particularly to CPS. Even the simplest C program is not predictable and repeatable in the context of CPS applications because the design does not express properties that are essential to the system. It may execute perfectly, exactly matching its semantics (to the extent that C has semantics) yet still fail to deliver the properties needed by the system; it could, for example, miss timing deadlines. Since timing is not in the semantics of C, whether or not a program misses deadlines is irrelevant to determining whether it has executed correctly but is very relevant to determining whether the system has performed correctly. A component that is perfectly predictable and repeatable turns out not to be predictable and repeatable in the dimensions that matter. Such lack of predictability and repeatability is a failure of abstraction.

The problem of how to ensure predictable and repeatable behavior gets more difficult as software systems get more complex. If software designers step outside C and use operating system primitives to perform I/O or set up concurrent threads, they immediately move from essentially perfect predictability and repeatability to wildly non-deterministic behavior that must be carefully anticipated and reined in by the software designer.14 Semaphores, mutual exclusion locks, transactions, and priorities are some of the tools software designers have developed to attempt to compensate for the loss of predictability and repeatability.
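
A deliberately small sketch, assuming a POSIX threads environment, illustrates how quickly determinism is lost: two unsynchronized threads incrementing a shared counter interleave differently on every run, so even the final value, never mind the timing, is not repeatable.

    /* Illustrative sketch (POSIX threads assumed): two unsynchronized
     * read-modify-write loops on a shared counter. Lost updates make even
     * the final value unrepeatable from run to run. */
    #include <pthread.h>
    #include <stdio.h>

    static long counter = 0;

    static void *bump(void *arg) {
        (void)arg;
        for (int i = 0; i < 1000000; i++)
            counter++;                        /* not atomic: a data race */
        return NULL;
    }

    int main(void) {
        pthread_t a, b;
        pthread_create(&a, NULL, bump, NULL);
        pthread_create(&b, NULL, bump, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        printf("counter = %ld\n", counter);   /* rarely the expected 2000000 */
        return 0;
    }

Compiled with -pthread and run a few times, the program prints a different count almost every time.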

But computer scientists must ask whether the loss of predictability and repeatability is necessary. No, it is not. If we find a way to deliver predictable and repeatable timing, then we do not eliminate the core need to design robust systems but dramatically change the nature of the challenge. We must follow the principle of making systems predictable and repeatable, if technically feasible, and give up only when there is convincing evidence that delivering this result is not possible or cost-effective. There is no such evidence that delivering predictable and repeatable timing in software is not possible or cost effective. Moreover, we have an enormous asset: The substrate on which we build software systems (digital circuits) is essentially perfectly predictable and repeatable with respect to properties we most care about—timing and logical functionality.

Considering the enormous potential of CPS, I’ll now further examine the failure of abstraction. The figure here schematically outlines some of the abstraction layers on which engineers depend when designing embedded systems. In the 3D Venn diagram, each box represents a set of designs. At the bottom is the set of all microprocessors; an element of this set (such as the Intel P4-M 1.6GHz) is a particular microprocessor design. Above that is the set of all x86 programs, each of which can run on that processor; this set is defined precisely (unlike the previous set of microprocessors, which is difficult to define) by the x86 instruction set architecture (ISA). Any program coded in that instruction set is a member of the set; for example, a particular implementation of a Java virtual machine may be a member of the set. Associated with that member is another set—all JVM bytecode programs—each of which is (typically) synthesized by a compiler from a member of the set of all syntactically valid Java programs. Again, this set is defined precisely by Java syntax.

Each of these sets provides an abstraction layer intended to isolate a designer (the person or program that selects elements of the set) from the details below. Many of the best innovations in computing have come from careful and innovative construction and definition of these sets.

However, in the current state of embedded software, nearly every abstraction has failed. The ISA, meant to hide hardware implementation details from the software, has failed because its users care about timing properties the ISA cannot express. The programming language, which hides details of the ISA from the program logic, has failed because no widely used programming language expresses timing properties; timing is an accident of the implementation. A real-time operating system hides the details of individual programs from their concurrent orchestration, yet it fails if the timing of the underlying platform is not repeatable or if execution times cannot be determined. The network hides details of electrical or optical signaling from systems, but most standard networks provide no timing guarantees and thus fail to provide an appropriate abstraction. The result is that a system designer is stuck with a system design (not just an implementation) that is tied to particular silicon and wires.

All embedded systems designers face this problem. For example, aircraft manufacturers must stockpile (in advance) the electronic parts needed for the entire production line of a particular aircraft model to ensure they don’t have to recertify the software if the hardware changes. “Upgrading” a microprocessor in an engine control unit for a car requires thorough re-testing of the system.

Even “bug fixes” in the software or hardware can be extremely risky, since they can inadvertently change the system’s overall timing behavior.

The design of an abstraction layer involves many choices, and computer scientists have uniformly chosen to hide timing properties from all higher abstractions. Wirth30 says, “It is prudent to extend the conceptual framework of sequential programming as little as possible and, in particular, to avoid the notion of execution time.” However, in an embedded system, computations interact directly with the physical world, where time cannot be abstracted away.

Designers have traditionally covered these failures by finding worst-case execution time (WCET) bounds29 and using real-time operating systems (RTOSs) with well-understood scheduling policies.7 Despite recent improvements, these policies often require substantial margins for reliability, particularly as processor architectures develop increasingly elaborate techniques for dealing stochastically with deep pipelines, memory hierarchy, and parallelism.11,28

Modern processor architectures render WCET virtually unknowable; even simple problems demand heroic efforts by the designer. In practice, reliable WCET numbers come only with restrictive caveats on the program and the hardware, and software that satisfies those caveats is increasingly rare. Worse, any analysis that is done, no matter how tight the bounds, applies to only a specific program on a specific piece of hardware.

Any change in either hardware or software renders the analysis invalid. The processor ISA has failed to provide adequate abstraction. Worse, even perfectly tight WCET bounds for software components do not guarantee repeatability. The so-called “Richard’s anomalies,” explained nicely by Buttazzo,7 show that under popular scheduling policies (such as earliest-deadline-first, or EDF), having all tasks finish early can cause deadlines to be missed that would not have been missed had the tasks finished at the WCET bound. Designers must be very careful to analyze their scheduling strategies under worst-case and best-case execution times, along with everything in between.

Timing behavior in RTOSs is coarse and increasingly uncontrollable as the complexity of the system increases (such as by adding inter-process communication). Locks, priority inversion, interrupts, and similar concurrency issues break the formalisms, forcing designers to rely on bench testing that is often incapable of identifying subtle timing bugs. Worse, these techniques generally produce brittle systems in which small changes cause big failures.

While there are no absolute guarantees in life, or in computing, we should not blithely discard achievable predictability and repeatability. Synchronous digital hardware—the most basic technology on which computers are built—reliably delivers astonishingly precise timing behavior. However, software abstractions discard several orders of magnitude of precision. Compare the nanosecond-scale precision with which hardware raises an interrupt request to the millisecond-level precision with which software threads respond. Computer science doesn’t have to do it this way.


Solutions

The timing problems I raise here pervade computing abstractions from top to bottom. As a consequence, most specialties within the field have work to do. I suggest a few directions, all drawn from existing contributions, suggesting that the vision I’ve outlined, though radical, is indeed achievable. We do not need to restart computer science from scratch.

Computer architecture. The ISA of a processor provides an abstraction of computing hardware for the benefit of software designers. The value of this abstraction is enormous, including that generations of CPUs that implement the same ISA can have different performance numbers without compromising compatibility with existing software. Today’s ISAs hide most temporal properties of the underlying hardware. Perhaps the time is right to augment the ISA abstraction with carefully selected timing properties, so the compatibility extends to time-sensitive systems. This is the objective of a new generation of “precision timed” machines.9

Achieving timing precision is easy if system designers are willing to forgo performance; the engineering challenge is to deliver both precision and performance. For example, although cache memories may introduce unacceptable timing variability, cost-effective system design cannot do without memory hierarchy. The challenge is to provide memory hierarchy with repeatable timing. Similar challenges apply to pipelining, bus architectures, and I/O mechanisms.

Programming languages. Programming languages provide an abstraction layer above the ISA. If the ISA is to expose selected temporal properties and programmers wish to exploit the exposed properties, then one approach would be to reflect these properties in the languages.

There is a long and somewhat checkered history of attempts by language developers to insert timing features into programming languages. For example, Ada can express a delay operation but not timing constraints. Real-Time Java augments the Java model with ad hoc features that reduce the variability of timing. The synchronous languages5 (such as Esterel, Lustre, and Signal) lack explicit timing constructs but, in light of their predictable and repeatable approach to concurrency, can yield more predictable and repeatable timing than most alternatives. They are limited only by the underlying platform. In the 1970s, Modula-2 gave control over scheduling of co-routines, making it possible, albeit laboriously, for programmers to exercise coarse control over timing. As with the synchronous languages, the timing properties of programs developed in Modula-2 are not explicit in the program. Real-Time Euclid, on the other hand, expresses process periods and absolute start times.
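
For contrast with these language-level mechanisms, the following sketch shows roughly the best a programmer can do in mainstream C today, assuming a POSIX platform: the period is expressed only operationally, through absolute-time sleeps, and nothing in the language or the API makes it a checkable property of the program.

    /* Illustrative sketch (POSIX assumed): a 10-ms periodic loop built from
     * absolute-time sleeps. The period appears only operationally; the
     * language gives no way to state it as a property, and the actual
     * release jitter depends entirely on the operating system. */
    #include <time.h>

    #define PERIOD_NS 10000000L               /* 10 ms, an example period */

    int main(void) {
        struct timespec next;
        clock_gettime(CLOCK_MONOTONIC, &next);
        for (int k = 0; k < 100; k++) {
            /* ... the time-sensitive work for cycle k would go here ... */
            next.tv_nsec += PERIOD_NS;
            if (next.tv_nsec >= 1000000000L) {   /* carry into seconds */
                next.tv_nsec -= 1000000000L;
                next.tv_sec += 1;
            }
            /* Sleeping to an absolute time avoids cumulative drift but
             * guarantees nothing about when the wakeup actually occurs. */
            clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        }
        return 0;
    }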

Rather than create new languages, an alternative is to annotate programs written in conventional languages. For example, Lee16 gave a taxonomy of timing properties that must be expressible in such annotations. TimeC17 introduces extensions to specify timing requirements based on events, with the objective of controlling code generation in compilers to exploit instruction-level pipelining. Domain-specific languages with temporal semantics have taken hold in some domains. For example, Simulink, from The MathWorks, provides a graphical syntax and language for timed systems that can be compiled into embedded real-time code for control systems. LabVIEW, from National Instruments, which is widely used in instrumentation systems, recently added timed extensions. Another example from the 1970s, PEARL,22 was also aimed at control systems and could specify absolute and relative start times, deadlines, and periods.


In an embedded system, computations interact directly with the physical world, where time cannot be abstracted away.


However, all these programming environments and languages remain outside the mainstream of software engineering, are not well integrated into software engineering processes and tools, and have not benefited from many innovations in programming languages.

Software components. Software engineering innovations (such as data abstraction, object-orientation, and component libraries) have made it much easier to design large complex software systems. Yet today’s most successful component technologies—class libraries and utility functions—do not export even the most rudimentary temporal properties in their APIs. Although a knowledgeable programmer may be savvy enough to use a hash table rather than a linked list when random access is required, the API for these data structures expresses nothing about access times. Component technologies with temporal properties are needed and could provide an attractive alternative to real-time programming languages. An early example from the mid-1980s, Larch,4 provided a task-level specification language that integrated functional descriptions with timing constraints. Other examples function at the level of coordination language rather than specification language; a coordination language executes at runtime, whereas a specification language does not.
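
To make the point about APIs concrete, here is a purely hypothetical sketch of what it might mean for an ordinary component interface to export a temporal property; the WCET_NS annotation and both function names are invented for illustration and are not part of C or of any existing library.

    /* Purely hypothetical sketch: WCET_NS is an invented annotation, not
     * part of C or of any existing library, suggesting what it might mean
     * for an API to export a temporal property. */
    #define WCET_NS(n)   /* assumed: worst-case execution time in ns */

    /* Hash-table lookup: a constant per-call bound could be exported. */
    int table_get(const void *table, long key, double *value) WCET_NS(800);

    /* Linked-list lookup: time grows with list length, so no constant
     * bound can be exported, and the API should say so. */
    int list_get(const void *list, long key, double *value);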

Among coordination-language examples, Broy6 focused on timed concurrent components communicating via timed streams. Zhao et al.31 developed an actor-based coordination language for distributed real-time systems based on discrete-event systems semantics. New coordination languages, where the components are given using established programming languages (such as Java and C++), may be more likely to gain acceptance than new programming languages that replace established languages. When coordination languages are given rigorous timed semantics, designs function more like models than like programs.

Many challenges remain in developing coordination languages with timed semantics. Naïve abstractions of time (such as the discrete-time models commonly used to analyze control and signal-processing systems) do not reflect the true behavior of software and networks.23 The concept of “logical execution time”10 offers a more promising abstraction but ultimately relies on being able to achieve worst-case execution times for software components. This top-down solution depends on a corresponding bottom-up solution.
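
As a minimal sketch of the logical-execution-time idea10 (an illustration of the concept, not code from Giotto), inputs are read at the logical release instant and outputs are made visible only at the end of the period, so observable timing does not depend on how long the computation actually takes, provided it fits within the period; read_sensor, compute, and write_actuator are placeholders, and a POSIX platform is assumed.

    /* Illustrative sketch of the logical-execution-time idea (not code from
     * Giotto): read at the logical release instant, make the output visible
     * only at the end of the period. read_sensor, compute, and
     * write_actuator are placeholders. POSIX assumed. */
    #include <stdio.h>
    #include <time.h>

    #define PERIOD_NS 20000000L               /* 20-ms logical execution time */

    static double read_sensor(void)        { return 42.0; }     /* placeholder */
    static double compute(double x)        { return 0.5 * x; }  /* placeholder */
    static void   write_actuator(double y) { printf("out = %g\n", y); }

    int main(void) {
        struct timespec release;
        clock_gettime(CLOCK_MONOTONIC, &release);
        for (int k = 0; k < 10; k++) {
            double in  = read_sensor();       /* input at the release instant */
            double out = compute(in);         /* may finish early or late */
            release.tv_nsec += PERIOD_NS;     /* advance to the period's end */
            if (release.tv_nsec >= 1000000000L) {
                release.tv_nsec -= 1000000000L;
                release.tv_sec += 1;
            }
            clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &release, NULL);
            write_actuator(out);              /* output time fixed by the
                                                 period, not by compute() */
        }
        return 0;
    }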

Formal methods. Formal methods use mathematical models to infer and prove system properties. Formal methods that handle temporal dynamics are less prevalent than those that handle sequences of state changes, but there is good work on which to draw. For example, in interface theories,8 software components export temporal interfaces, and behavioral-type systems validate the composition of components and infer interfaces for compositions of components; for specific interface theories of this type, see Kopetz and Suri13 and Thiele et al.27

Various temporal logics support reasoning about the timing properties of systems.3 Temporal logics mostly deal with “eventually” and “always” properties to reason about safety and liveness, and various extensions support metric time.1,21 A few process algebras also support reasoning about time.19,24 The most accepted formalism for the specification of real-time requirements is timed automata and its variations.2
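
As a small illustrative example (not drawn from the cited works), a metric extension of temporal logic can state a bounded-response requirement directly:

    \[
      \Box\bigl(\mathit{request} \;\rightarrow\; \Diamond_{\le 5\,\mathrm{ms}}\ \mathit{actuate}\bigr)
    \]

that is, it is always the case that a request is followed by actuation within 5ms, a bound the untimed “eventually” operator cannot express.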

Another approach, widely used in instrumentation systems, applies static analysis of programs coupled with models of the underlying hardware.29 Despite gaining traction in industry, it suffers from fundamental limitations, the most important being brittleness: even small changes in either the hardware or the software invalidate the analysis. A less-important limitation, though worth noting, is that the use of Turing-complete programming languages and models leads to undecidability; not all programs can be analyzed.

All these techniques enable some form of formal verification. However, properties that are not formally specified cannot be formally verified. Thus, for example, the timing behavior of software that is not expressed in the software must be separately specified, and the connections between specifications and between specification and implementations become tenuous. Moreover, despite considerable progress in automated abstraction, scalability to real-world systems remains a challenging hurdle. Although offering a wealth of elegant results, the effect of most of these formal techniques on engineering practice has been small (though not zero). In general-purpose computing, type systems are formal methods that have had enormous effect by enabling compilers to catch many programming errors. What is needed is time systems with the power of type systems.


Designers must be very careful to analyze their scheduling strategies under worst-case and best-case execution times, along with everything in between.


Operating systems. Scheduling is a key service of any operating system, and scheduling of real-time tasks is a venerable, established area of inquiry in software design. Classic techniques (such as rate-monotonic scheduling and EDF) are well studied and have many elaborations. With a few exceptions,10,12 however, the field of operating systems has emphasized optimization over repeatability, since repeatability is not highly valued in general-purpose applications. Consider this challenge: to get repeatable real-time behavior, a CPS designer may use the notion of logical execution time (LET)10 for the time-sensitive portions of a system and best-effort execution for the less-time-sensitive portions. The best-effort portions typically have no deadlines, so EDF gives them lowest priority. However, the correct optimization is to execute the best-effort portions as early as possible, subject to the constraint that the LET portions match their timing specifications. Even though the LET portions have deadlines, they should not necessarily be given higher priority during program execution than the best-effort portions.

Designers of embedded systems deliberately avoid mixing time-sensitive operations with best-effort operations. Every cellphone in use has at least two CPUs, one for the difficult real-time tasks of speech coding and radio functions, the other for the user interface, database, email, and networking functionality. The situation is more complicated in cars and manufacturing systems, where distinct CPUs tend to be used for myriad distinct features. The design is this way, not because there are not enough cycles in the CPUs to combine the tasks, but because software designers lack reliable technology for mixing distinct types of tasks. Focusing on repeatability of timing behavior could lead to such a mixing technology; work on deferrable/sporadic servers18 may provide a promising point of departure.

Networking. In the context of general-purpose networks, timing behavior is viewed as a QoS problem. Considerable activity in the mid-1980s to mid-1990s led to many ideas for addressing QoS concerns, few of which were deployed with any long-lasting benefit. Today, designers of time-sensitive applications on general-purpose networks (such as voice over IP) struggle with inadequate control over network behavior.

Meanwhile, in embedded systems, specialized networks (such as FlexRay and the time-triggered architecture12) have emerged to provide timing as a correctness property rather than as a QoS property. A flurry of recent activity has led to a number of innovations, such as time synchronization (IEEE 1588), synchronous Ethernet, and time-triggered Ethernet. At least one of them—synchronous Ethernet—is encroaching on general-purpose networking, driven by demand for convergence of telephony and video services with the Internet, as well as by the potential for real-time interactive games. Introducing timing into networks as a semantic property rather than as a QoS problem could lead to an explosion of new time-sensitive applications, helping realize the CPS vision.


Conclusion

Realizing the potential of CPS requires first rethinking the core abstractions of computing. Incremental improvements will continue to help, but effective orchestration of software and physical processes requires semantic models that reflect properties of interest in both.

I’ve focused on making temporal dynamics explicit in computing abstractions so timing properties become correctness criteria rather than a QoS measure. The timing of programs and networks should be as repeatable and predictable as is technologically feasible at reasonable cost. Repeatability and predictability will not eliminate timing variability and hence not eliminate the need for adaptive techniques and validation methods that work with bounds on timing. But they do eliminate spurious sources of timing variability, enabling precise and repeatable timing when needed. The result will be computing and networking technologies that enable vastly more sophisticated CPS applications.


Acknowledgments

Special thanks to Tom Henzinger, Insup Lee, Al Mok, Sanjit Seshia, Jack Stankovic, Lothar Thiele, Reinhard Wilhelm, Moshe Vardi, and the anonymous reviewers for their helpful comments and suggestions.


Figures

Figure. Abstraction layers in computing.


    1. Abadi, M. and Lamport, L. An old-fashioned recipe for real time. ACM Transactions on Programming Languages and Systems 16, 5 (Sept. 1994), 1543–1571.

    2. Alur, R. and Dill, D.L. A theory of timed automata. Theoretical Computer Science 126, 2 (Apr. 1994), 183–235.

    3. Alur, R. and Henzinger, T. Logics and models of real time: A survey. In Real-Time: Theory in Practice: Proceedings of the REX Workshop, Vol. 600 LNCS, J.W. De Bakker, C. Huizing, W.P. De Roever, and G. Rozenberg, Eds. (Mook, The Netherlands, June 3–7). Springer, Berlin/Heidelberg, 1991, 74–106.

    4. Barbacci, M.R. and Wing, J.M. Specifying Functional and Timing Behavior for Real-Time Applications, Technical Report ESD-TR-86–208. Carnegie Mellon University, Pittsburgh, PA, Dec. 1986.

    5. Benveniste, A. and Berry, G. The synchronous approach to reactive and real-time systems. Proceedings of the IEEE 79, 9 (Sept. 1991), 1270–1282.

    6. Broy, M. Refinement of time. Theoretical Computer Science 253, 1 (Feb. 2001), 3–26.

    7. Buttazzo, G.C. Hard Real-Time Computing Systems: Predictable Scheduling Algorithms and Applications, Second Edition. Springer, Berlin/Heidelberg, 2005.

    8. de Alfaro, L. and Henzinger, T.A. Interface theories for component-based design. In Proceedings of the First International Workshop on Embedded Software, Vol. LNCS 2211. Springer, Berlin/Heidelberg, 2001, 148–165.

    9. Edwards, S.A. and Lee, E.A. The case for the precision timed (PRET) machine. In Proceedings of the Design Automation Conference (San Diego, CA, June 4–8). ACM Press, New York, 2007, 264–265.

    10. Henzinger, T.A., Horowitz, B., and Kirsch, C.M. Giotto: A time-triggered language for embedded programming. In Proceedings of the First International Workshop on Embedded Software, Vol. LNCS 2211. Springer, Berlin/Heidelberg, 2001, 166–184.

    11. Kirner, R. and Puschner, P. Obstacles in worst-case execution time analysis. In Proceedings of the Symposium on Object-Oriented Real-Time Distributed Computing (Orlando, FL, May 5–7). IEEE Computer Society Press, New York, 2008, 333–339.

    12. Kopetz, H. and Bauer, G. The time-triggered architecture. Proceedings of the IEEE 91, 1 (Jan. 2003), 112–126.

    13. Kopetz, H. and Suri, N. Compositional design of RT systems: A conceptual basis for specification of linking interfaces. In Proceedings of the Sixth IEEE International Symposium on Object-Oriented Real-Time Distributed Computing (Hakodate, Hokkaido, Japan, May 14–16). IEEE Computer Society Press, 2003, 51–60.

    14. Lee, E.A. The problem with threads. Computer 39, 5 (May 2006), 33–42.

    15. Lee, E.A. and Sangiovanni-Vincentelli, A. A framework for comparing models of computation. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 17, 12 (Dec. 1998), 1217–1229.

    16. Lee, I., Davidson, S., and Wolfe, V. Motivating Time as a First-Class Entity, Technical Report MS-CIS-87-54. Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA, Aug. (Revised Oct.) 1987.

    17. Leung, A., Palem, K.V., and Pnueli, A. TimeC: A Time-Constraint Language for ILP Processor Compilation, Technical Report TR1998-764. New York University, New York, 1998.

    18. Liu, J.W.S. Real-Time Systems. Prentice-Hall, Upper Saddle River, NJ, 2000.

    19. Liu, X. and Lee, E.A. CPO Semantics of Timed Interactive Actor Networks, Technical Report EECS-2006-67. University of California, Berkeley, May 18, 2006.

    20. Maler, O., Manna, Z., and Pnueli, A. From timed to hybrid systems. In Real-Time: Theory in Practice: Proceedings of the REX Workshop, Vol. 600 LNCS, J.W. De Bakker, C. Huizing, W.P. De Roever, and G. Rozenberg, Eds. (Mook, The Netherlands, June 3–7). Springer, Berlin/Heidelberg, 1991, 447–484.

    21. Manna, Z. and Pnueli, A. The Temporal Logic of Reactive and Concurrent Systems. Springer, Berlin, 1992.

    22. Martin, T. Real-time programming language PEARL: Concept and characteristics. In Proceedings of the Computer Software and Applications Conference, IEEE Press, 1978, 301–306.

    23. Nghiem, T., Pappas, G.J., Girard, A., and Alur, R. Time-triggered implementations of dynamic controllers. In Proceedings of 6th ACM & IEEE Conference on Embedded Software (Seoul, Korea, Oct. 23–25). ACM Press, New York, 2006, 2–11.

    24. Reed, G.M. and Roscoe, A.W. A timed model for communicating sequential processes. Theoretical Computer Science 58, 1–3 (June 1988), 249–261.

    25. Stankovic, J.A., Lee, I., Mok, A., and Rajkumar, R. Opportunities and obligations for physical computing systems. Computer 38, 11 (Nov. 2005), 23–31.

    26. Stankovic, J.A. Misconceptions about real-time computing: A serious problem for next-generation systems. Computer 21, 10 (Oct. 1988), 10–19.

    27. Thiele, L., Wandeler, E., and Stoimenov, N. Real-time interfaces for composing real-time systems. In Proceedings of Sixth ACM & IEEE Conference on Embedded Software (Seoul, Korea, Oct. 23–25). ACM Press, New York, 2006, 34–43.

    28. Thiele, L. and Wilhelm, R. Design for timing predictability. Real-Time Systems 28, 2–3 (Nov. 2004), 157–177.

    29. Wilhelm, R., Engblom, J., Ermedahl, A., Holsti, N., Thesing, S., Whalley, D., Bernat, G., Ferdinand, C., Heckmann, R., Mitra, T., Mueller, F., Puaut, I., Puschner, P., Staschulat, J., and Stenström, P. The worst-case execution-time problem: Overview of methods and survey of tools. ACM Transactions on Embedded Computing Systems 7, 3 (Apr. 2008), 1–53.

    30. Wirth, N. Toward a discipline of real-time programming. Commun. ACM 20, 8 (Aug. 1977), 577–583.

    31. Zhao, Y., Lee, E.A., and Liu, J. A programming model for time-synchronized distributed real-time systems. In Proceedings of the Real-Time and Embedded Technology and Applications Symposium (Bellevue, WA, Apr. 3–6). IEEE Computer Society Press, New York, 2007, 1–10.

    This work is supported in part by the Center for Hybrid and Embedded Software Systems at the University of California, Berkeley, which receives support from the U.S. National Science Foundation, Army Research Office, Air Force Office of Scientific Research, Air Force Research Lab, State of California Micro Program, and the following companies: Agilent, Bosch, Lockheed-Martin, National Instruments, and Toyota. For an extended version go to www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-30.html.

    DOI: http://doi.acm.org/10.1145/1506409.1506426
