Architecture and Hardware Review articles

Computation Takes Time, But How Much?

Timing analysis for hard real-time systems.

Posted Feb 1 2014

Introduction
Key Insights
Taking the Problem Apart
The Core of the Problem
Academic Development
Industrial Adoption
Open Questions and Future Challenges
Conclusion
Acknowledgments
References
Authors
Footnotes
Figures

A few years ago, in the pages of this magazine, Edward Lee argued that computing needs time.²³ This article focuses on the natural assumption that computing also takes time. We examine the problem of determining how much time. It is the problem of verifying the real-time behavior of safety-critical embedded systems. For such systems, for example, anti-lock brakes and airbags, punctual behavior is of utmost importance: If the controlling computations take too long, quality of service degrades or the systems fail completely—your braking distance is longer or your head hits the steering wheel, respectively.

Key Insights

The tremendous progress in microprocessor architecture not only increased average-case performance, but also the complexity of verifying the real-time behavior of programs executed on those architectures.
To derive useful execution time guarantees, static analyses must prove that speculation mechanisms of modern CPUs will indeed be effective during program runtime.
This is challenging due to timing anomalies and interdependencies between architectural components.

The basis for verifying the timeliness of system reactions is reliable information on the execution times of all computational tasks involved. It is the job of timing analysis, also called worst-case execution-time (WCET) analysis, to determine such information.

To avoid having to solve the halting problem, all programs under analysis must be known to terminate. Loops need bounded iteration counts and recursion needs bounded depth—either given explicitly in the program, determined by some analysis, or supplied by the programmer. Furthermore, computing the exact WCET of a program is not necessary. A conservative approximation, such as an upper bound on the execution times of a program, is adequate; and if low enough also sufficient to prove overall timeliness. Figure 1 illustrates the most important notions.

In the old days, such a conservative approximation, called the timing schema method, was proposed by Shaw.³³ Its goal is to determine bounds on the execution times of higher-level language programs. The idea is to work along the structurally inductive definition of high-level programming languages, such as along the syntax tree of programs: It starts with bounds on the execution times of atomic program elements and then computes bounds on the execution times of complex constructs by composing the execution times of their components.

For instance, the upper bound on the execution times of a conditional if b then s₁ else s₂ would be computed as: ub (if b then then s₁ else s₂) = ub(b) + max{ub(s₁), ub(s₂)}.

Today there are at least two reasons that render the timing schema method impractical, infeasible, or imprecise. The first one is compilers. Program transformations and optimizations performed by compilers render source code inadequate for timing analysis: Source code does not reveal the actually executed machine instructions. It does not show the control flow of the binary program executed on the target machine. Nor does it show the register or memory allocation of program variables and intermediate results. This uncertainty about the actually executed code has already been addressed in Shaw,³³ and has been found to be difficult. Since then, advances in optimizing compilers have only increased this uncertainty.⁴

The second reason is the tremendous progress in computer architecture aiming at ever-increasing (average-case) performance. In the old times, execution times of instructions were constants. With the advent of microprocessors incorporating deep pipelines, caches, and various other speculation concepts, the execution times of instructions became variable: The execution times of an instruction may be different for different occurrences of the instruction, that is, at different program points. Execution time may even differ for different executions of an instruction at a single program point. The variations are large: Execution times of instructions may vary by a factor of 100 or more. One could argue that, most of the time, execution is fast since the architectures are optimized for average-case performance. However, this is no foundation for deriving guarantees. On the other hand, starting the structural composition of the timing schema with extremely wide bounds can only result in overall bounds that are virtually useless.

Let us look into the reasons for the variability of instruction timing more closely: The different execution times of an instruction result from the different states the architecture may be in when execution of the instruction starts. For instance, the time for a load instruction depends on the state of the cache(s) and, maybe, also on the occupancy of the processor memory bus; the time for a conditional branch depends on the state of the branch prediction and may include the time necessary to recovery from misprediction. The architectural state in turn is the result of the execution history, which again is determined by the input to the program and the initial architectural state. Different initial states and different control-flow paths to a program point will result in a set of possible execution states, P, before the instruction at this program point is executed. Later, we describe how a phase called microarchitectural analysis computes invariants that characterize these sets of states.

We could then, at least conceptually, try to “expand” the instruction by its implementation in the underlying architectural platform, which is a huge finite-state machine. For this instruction, only a subset of the possible transitions through this finite-state machine are possible, namely those starting in states in P. They would lead to a new set of states reached by executing the instruction.

All paths not starting in states in P can be ignored in the search. Unfortunately, the rest would still be too large to be exhaustively explored. This is the state-space explosion problem encountered by many attempts to exhaustively explore state spaces.

The main measure used in microarchitectural analysis to counter this complexity threat is abstraction. It allows for a compact representation of sets of execution states: Information irrelevant for timing can be dropped completely. Timing-relevant information can be conservatively approximated such that it can be efficiently represented. As we will discuss, there are limits as to how much such an abstraction may forget and still be useful.

At the end of this introduction, it should be clear that timing analysis does not try to solve the halting problem. All programs under analysis are guaranteed to terminate. The complexity of several subtasks, in particular the huge state space to explore, is the problem to cope with.

Taking the Problem Apart

WCET analysis essentially is the search for a longest path through a program. This can be cast as the problem of constructing a weighted graph and finding a longest path in it: The graph nodes model program fragments, for example, basic blocks, that is, maximally long sequences of straight-line code. The graph edges model possible control flow. The node weights are upper bounds on the execution times of the program fragments, the edge weights are bounds on their traversal counts.

The basis for verifying the timeliness of system reactions is reliable information on the execution times of all computational tasks involved.

Following this scheme, a quasi-standard architecture for static timing analysis has emerged over the years, as shown in Figure 2. Here, we briefly explain the objective and main challenges of each subtask using the graph terminology.

Control-flow reconstruction³⁸ determines the control-flow graph itself. It reads the binary executable to be analyzed, reconstructs its control flow, and transforms it into an intermediate program representation. This is a nontrivial task due to dynamically computed control-flow successors, for example, machine code generated for switch statements; or function pointers whose values might not be determined easily. Some instruction-set architectures (ISAs) have additional surprises in store, for example those without a proper instruction for the return from subroutines.
Value analysis can be seen as an auxiliary analysis. It attempts to statically determine the values stored in registers and memory locations. Such information is needed for loop-bound analysis, to determine the execution time of arithmetic instructions whose timing depends on the values of their operands, for safely approximating effective addresses for data-cache analysis, and to resolve some more value-dependent timing issues. The general problem of value analysis is undecidable, and often several values are possible at a program point when control passes by this program point several times. One out of many approximations⁶ is an interval analysis, which computes enclosing intervals for the set of possible values in registers and memory locations.
Loop bound analysis^14,15 determines the edge weights of the graph. It identifies loops in the program and tries to determine bounds on the number of loop iterations. Similarly, recursion must be bounded. Encountered challenges are the analysis of computations on loop counters and loop exit conditions, as well as dependencies between loop counters in nested loops. As the general problem is undecidable, bounds may have to be provided by the user.
Control-flow analysis,^14,16 also known as infeasible path analysis, is an optional analysis. It determines the set of possible paths through the program more precisely in order to tighten the timing bounds. Its results are additional edge weights, but also more general path constraints. Note that loop bound analysis can be seen as a special, indispensable case of control-flow analysis.
Microarchitectural analysis determines the node weights of the graph. This subtask will be detailed below as it is the most complex one and provides the most interesting insights.
Global bound analysis,^1,24,39 also called path analysis, finally determines a longest path in the graph. One approach^24,39 conveniently relies on integer linear programming to do so: (a) Integer variables model the traversal count of nodes and edges. (b) A set of constraints models the control flow of the program using Kirchhoff’s law: The sum of incoming flow at a node must equal the sum of outgoing flow. The incoming flow at the program start node is fixed to 1. (c) Another set of constraints models loop bounds and other path constraints as determined by control-flow analysis. (d) The objective function is the scalar product of the traversal counts and weights of the nodes, that is, execution count times execution time. To compute upper bounds, the objective function is maximized.

The Core of the Problem

In modern computer architectures, speculation is overabundant, it is the normal mode of operation, and works astonishingly well: Caches speculate on data reuse; branch prediction speculates on the outcome of comparisons, pipelining speculates on the absence of data dependencies; among others. There are even speculations on speculation: Instruction prefetching and speculative execution hope for correct branch prediction. And microarchitectural analysis, which has to determine upper bounds on the execution times of program fragments, has to pay the bill.

Microarchitectural analysis. Computer architects are content when speculation works most of the time—average-case performance is what matters to them. To derive tight timing bounds, however, microarchitectural analysis must prove that the speculation mechanisms work—when they do.

This can possibly be done in many ways. We use abstract interpretation⁶ to compute invariants at each program point that characterize the set of all states the architecture can be in when control reaches this program point. These invariants describe safe information about the contents of the caches;^10,11 the state of the pipeline including the contents of all its queues and buffers as well as the occupancy of its units;^12,40 and the state of off-chip buses, memories, and peripherals. The computed invariants are used to exclude transitions in the architecture. For example, a cache-miss transition can be excluded if the invariant about the cache state allows it to predict a cache hit.

If this seems easy to you, let us indicate some pitfalls that rule out some “obvious” optimizations and limit scalability.

Pitfall timing anomalies. It looks tempting to only follow the worst-case transition, for example, the cache-miss transition, when the statically available information admits several possible transitions, for example, cache hit and cache miss. However, there are timing anomalies that make it difficult to decide which transition is the worst.

Intuitively, a timing anomaly is a situation where the local worst case does not entail the global worst case. Consider Figure 3, where a cache miss to memory block A—the local worst case—results in a globally shorter execution time than a cache hit. This is because it prevents a branch prediction that would erroneously prefetch B. Another example is given in Figure 4. Shortening instruction A leads to a longer overall schedule, because instruction B can now block the more important instruction C, which may only run on Resource 2.

In other words, greedy is not necessarily optimal when maximizing execution times. For details see Lundqvist and Stenström,²⁵ who introduced the notion of timing anomalies, or Reineke and Sen,³⁰ who present a formal definition in the context of WCET analysis.

Pitfall interdependencies. It looks tempting to decompose the architectural analysis into individual analyses of its components. However, architectural components interact in nontrivial ways.

For instance, consider caches in combination with branch prediction: If a branch is mispredicted, instructions are fetched from the wrong branch target before the misprediction is detected and fetching is redirected to the correct branch target. Those extra instruction fetches can evict blocks from the cache that might have been useful for future program execution. Conversely, a data-cache miss can delay the computation of a branch condition, which in turn can trigger a branch prediction, and ultimately speculative fetching and execution of code. If the initial data-cache access were a hit, none of this might have happened. Such (circular) interdependencies also exist between other architectural components.

In particular, the following attempt to analyze the influence of a cache on execution time is unsound:

Determine an execution time bound, t_h, for a program assuming that all cache accesses are hits.
Determine an upper bound, n, on the overall number of cache misses in any program execution.
Take t_h and add n times the cache-miss penalty, t_p, to obtain an upper bound on the execution time that includes cache misses.

In fact, there may be an execution of the program that takes longer, that is, t > t_h + n · t_p. A cache hit may entail a state change in another component such that an even greater penalty is incurred by that other component.

This all suggests that analyzing each architectural component independently of other components is either unsound or imprecise: Unsound if other components and their in influence are simply disregarded; imprecise if the analysis always has to account for the worst behavior of other components.

Dire consequences. As a consequence of timing anomalies, microarchitectural analysis must consider all transitions that cannot be ruled out statically. Due to the interdependencies, the currently practiced solution is to analyze all architectural components simultaneously. If one additionally considers this analysis is performed at the granularity of processor cycles, it becomes apparent that this subtask of WCET analysis is the most complex one.

The preceding discussion also illustrates that worst-case execution time—as other nonfunctional properties like maximal stack usage—is very difficult to check experimentally, such as, by testing and measurements. Identifying safe end-of-test criteria for these program properties is an unsolved problem. In consequence, the required test effort is high, the tests require access to the physical hardware, and the results are not complete. The advantage of static analysis is that it enables full control and data coverage, it can be easily automatized, and software developers can run tools from their workstation computers.

Architectural abstractions. As mentioned earlier, abstraction is the strong asset in microarchitectural analysis. It is used to efficiently represent the architectural state at program points and to allow for an efficient state-space exploration.

A feature common to all architectural abstractions is they completely abstract from data. WCET analysis is primarily interested in how long a computation takes; the actually computed values are of no direct interest. Only if values have an influence on the execution time of an instruction they become important. For instance, it makes a difference whether the address of a memory access maps to a fast SRAM or a slow external flash memory. Or whether the operand of a variable-latency floating-point multiplication is zero or not.

Information about values is maintained by the preceding value analysis and can be queried if necessary. Factoring out value analysis (as well as control-flow analysis and loop bound analysis) in this way is possible because it solely depends on the instruction set of the considered computing platform. How a particular instruction set is implemented by a microarchitecture, which determines the timing, is irrelevant for them. Hence, all those analyses can be performed at the instruction level, prior to microarchitectural analysis, which is performed at the cycle level. This improves analysis efficiency considerably.

Note that the whole truth may include pleasantries like exposed pipelines or delay slots. The intrepid reader may think about the ramifications of “undead code,” that is, unreachable code that gets executed speculatively; or, similarly, loop bounds that get exceeded by speculative execution.

The next step toward efficient representations is component-specific abstractions. Consider caches for instance. The timing-relevant information about caches is whether the memory block accessed at a program point is contained in the cache or not when execution reaches that program point. This essential information is called must– and may-cache information:¹⁰ Must-cache information is a set of memory blocks that definitely are contained in the cache whenever execution reaches that program point. May-cache information is a set of memory blocks that may be contained in the cache whenever execution reaches that program point. The former allows to predict hits; the latter allows to predict misses; and for the memory blocks in the set MayMust both hit and miss need to be taken into account.

Cache abstractions conservatively approximate this information. To be able to maintain useful information on cache updates, abstract caches contain more than just the must- and may-information. Compared to interval and congruence abstractions with rather simple encodings, that is, [l, u] and n mod m, respectively, the encoding and interpretation of abstract caches is more complicated. Yet abstract caches are elegant and more efficient than explicit encodings of sets of concrete cache states. For a recent overview of cache analysis and examples of cache abstractions, Grund¹¹ or refer to the earlier work.¹⁰

Regarding the success in abstraction, pipelines are a counterexample. Pipelines are much more heterogeneous, that is, they consist of a large number of small components, for example, fetch buffers, dispatchers, execution units, queues for pending load and store instructions, among others. Besides some minor abstractions, for example, abstraction of symmetrical units, no satisfactory abstraction of sets of pipeline states has been found so far. In lieu of compact abstract domains, the domain used for pipeline analysis essentially is a powerset domain: The architectural state of pipelines is approximated by sets of concrete pipelines. For an early example of such a pipeline analysis, see Ferdinand et al.;⁸ for a more complex example, see Thesing.⁴⁰

As explained earlier, all these analyses are performed simultaneously. Much like the actual hardware is made up of components, the microarchitectural analysis is made up of abstract domains for all components, which are composed using appropriate domain constructors. Overall, microarchitectural analysis is an abstract interpretation with a huge abstract domain. The number of states considered during the analysis of an average-sized basic block for a PowerPC 7448 can grow to seven-figure numbers.

Context sensitivity. Most static program analyses rely on the tacit assumption of control-flow abstraction: Considering the exact set of possible program paths (to a program point) is intractable. A simple abstraction is to approximate the set of possible paths (to a program point) by the set of all paths through the CFG (to that program point). The loss of this abstraction stems from considering infeasible paths through the CFG.

Regarding the WCET bound, infeasible shortcuts are not that problematic but infeasible detours are. Hence, as explained, there is control flow analysis, which determines infeasible paths and thereby narrows down the set of considered paths in order to tighten the timing bounds.

In general, the WCET problem requires analyses that are highly context-sensitive. That is, they need to distinguish between different possibilities how control reaches a program point. To see why, consider the execution of loops: The first iteration typically exhibits a different architectural behavior than subsequent iterations, for example, the instructions of the loop get loaded into the cache. If one would not distinguish between iterations one would have to conservatively take instruction cache misses into account for all loop iterations. This, however, would lead to a significant overestimation of the execution time.

Similar differences can be exhibited between different call sites of functions. After a first call some of the function’s code might remain in the cache. Loop bounds within the function might depend on a function parameter. Different parameter values might induce paths of different lengths through the function.

Essentially contexts allow to infer stronger invariants per (program point, context) -pair compared to a single invariant per program point.

To read more on context sensitivity and special kinds of context sensitivity that have emerged in WCET analysis, see Martin et al.²⁶

Conclusion. Modern CPUs feature an abundance of speculation mechanisms that increase (average-case) performance. To derive tight bounds on program execution times, static WCET analysis must prove that speculation mechanisms indeed work well. As we have seen, this is difficult for many reasons:

Interdependencies between architectural components forbid the analysis of individual components as well as the naive approach of assigning execution time penalties, for example, “add n times the cache miss penalty.” Analyses need to be integrated, which entails higher complexity and increased resource demand. At the same time, pruning of the analysis search space is hindered by timing anomalies and requires precise information about the architectural state. To obtain such precise information, context-sensitive analyses are required that need to distinguish between a large number of different execution histories. This is because the state of components like caches and branch predictors may depend on events in the distant past. Indeed some may never forget about their history.²⁹

On a more abstract level, CPUs are built to exploit runtime information during the execution of a single program path. WCET analysis must statically prove successful exploitation while efficiency dictates implicit consideration of all program paths. It is fair to say that for hard real-time systems, the high sophistication of some speculation mechanisms is a waste of silicon.

Academic Development

The editorial by Puschner and Burns²⁸ compactly describes the early work on WCET analysis in the 1980s and 1990s. It summarizes the contributions of the then active research groups. The more recent survey of Wilhelm et al.⁴³ takes a more problem-centric point of view. It discusses different approaches at the subtasks of timing analysis, contributing groups, as well as the strengths and limits of the tools available at that time.

In principle, one can distinguish between methods based on measurements, simulation, or static analysis. The WCET estimates determined by measurements or simulation are unsafe as inclusion of the worst-case combination of program input and initial architectural state can rarely be guaranteed. Static methods can provide guarantees but may suffer from overestimation. Performing measurements requires the hardware and tracing equipment; simulations and static analysis require models of the architecture.

Some research groups invested into academic prototypes, some of which made the leap to commercially available tools. The most widely known tools that are still maintained are:

aiT, a commercial tool by AbsInt GmbH, Saarbrücken, Germany, based on static analysis;
BoundT, a commercial tool by Tidorum Ltd., Helsinki, Finland, based on static analysis;
Chronos, an open source software developed by National University of Singapore, employs the SimpleScalar simulator for microarchitectural analysis;
OTTAWA, an open source software developed by University of Toulouse, France, based on static analysis;
RapiTime, a commercial tool by Rapita Systems Ltd., York, U.K., based on measurements; and,
SWEET, an academic prototype of Mälardalen University, Sweden, focusing on static control-flow analysis.

For a more complete list and a deeper discussion of functionalities and limitations we refer to Wilhelm et al.⁴³

Regarding sound static analysis approaches and saving open and upcoming challenges for later discussion, the most important developments were (a) the timing schema introduced by Shaw³³ and its extension; (b) the switch to the more suitable implicit path enumeration technique (IPET), initially proposed by Li and Malik,²⁴ which is still used today for global bound analysis; (c) dissection of the timing analysis task into controllable subtasks;⁸ (d) approaches to microarchitectural analysis, leading to complete models of complex processors;⁴⁰ (e) abstractions for the huge state spaces of cache architectures;^10,11 and (f) relaxations of the uninterrupted-execution assumption,² which we will discuss in further detail.

The WCET research group at Mälardalen University maintains a number of WCET benchmark programs^a used to evaluate and compare different types of WCET analysis tools and methods.¹³ Since 2006, the WCET community periodically performs a WCET Tool Challenge.^b The published results³⁷ give a good overview of the state of the different tools.

Industrial Adoption

Until the 1990s, timing analysis in industrial contexts was dominated by measurement-based techniques and simple counting methods. The strand of research on static methods reached a milestone in 1998 with the founding of the company AbsInt Angewandte Informatik GmbH. After preliminary discussions with the German TÜVs (technical inspection offices) the market potential of Abstract Interpretation based WCET analysis seemed to justify the commercialization effort. Airbus was among the first companies to recognize the potential of the novel technique. Airbus was instrumental in the European IST project Daedalus (Validation of critical software by static analysis and abstract testing), in the course of which AbsInt adapted its prototypical tool chain for WCET analysis to the industrial requirements for avionics software.^34,35,41 The first processors supported by the new commercial WCET analysis tool were Motorola PowerPC 755 and TI TMS470. Today, this tool, which implements the architectures described earlier, is known as aiT WCET Analyzer.

Initially reported results³⁵ pertain to the more complex PowerPC 755, whose variance of execution times for instructions is in the order of a factor of several hundreds. The comparison shows that the computed upper bound of a task typically is about 25% higher than the measured time for the same task, the real but non-calculable WCET being in between. Analysis time was 12 hours per program on average and the maximal memory demand was close to 3GiB. Since then, the aiT tool chain has been continuously improved by incorporating new research results and now supports some 20 processor targets.^c For simpler microcontrollers overestimation is below 10%.^d

Tool couplings to other development tools have been implemented. For instance between aiT and model-based code generators like Esterel SCADE⁹ and dSPACE TargetLink.²⁰ They enable the worst-case timing behavior to be continuously evaluated during software development. Links between analysis results and the model level enable timing information to be traced back to the model level. Errors can be detected early in the development process, thus avoiding late-stage integration problems. Another important process optimization can be realized by integrating tools for computing the worst-case execution time and the worst-case response time. A tool coupling between aiT and the SymTA/S tool from Symtavision provides a seamless approach to timing analysis: SymTA/S computes the end-to-end times and worst-case response times of the system based on the worst-case execution times computed by aiT.¹⁹ From a safety assurance perspective, typically model-based testing is used for showing functional program properties, and static analysis to prove the absence of non-functional program errors. Therefore it can be very beneficial to integrate model-based testing and analysis, which has been addressed by a tool coupling between aiT and the model-based testing tool BTC EmbeddedTester.¹⁸ Model-level information like execution model or environment specifications is automatically taken into account, avoiding duplicate effort for test and analysis setup. Tests and analyses can be launched seamlessly and produce unified result reports.

The worst-case execution time estimates determined by measurements or simulation are unsafe as inclusion of the worst-case starting conditions can rarely be guaranteed. Methods based on static analysis can provide guarantees but may suffer from overestimation.

During the last years, most of the relevant safety standards have been undergoing major revisions, for example, DO-178, IEC-61508, and CENEL-EC EN-50128. The norm ISO-26262 defining functional safety for road vehicles was published in the year 2011. All of them require to identify potential functional and nonfunctional hazards and to demonstrate that the software does not violate the relevant safety goals. All mentioned standards list the worst-case execution time to the software properties that have to be determined for real-time software. When used in the certification process of safety-critical systems, tools must be qualified according to the pertinent safety standard. To support this, AbsInt has developed Qualification Support Kits (QSK) for aiT, which can demonstrate the tool works correctly in the operational context of the user. Additionally, Qualification Software Life Cycle Data (QSLCD) reports are available that document the entire tool development process of aiT for all target processors and compilers, including all verification and quality assurance activities. QSK and QSLCD enable the tool qualification according to any safety standard to be performed in a mostly automatic way up to the highest criticality levels.²²

Customers of aiT come from all safety-critical industry sectors: Avionics and space, automotive, nuclear power plant control, healthcare technology, among others. Regrettably, most aiT customers do not agree to be referenced. Some who agreed can be found at http:/www.absint.com/success.htm. In 2010, aiT was used by NASA as an industry-standard tool for demonstrating the absence of timing-related software defects in the Toyota Motor Corporation Unintended Acceleration Investigation.²⁷

Open Questions and Future Challenges

In this article, we described a solution for the WCET-analysis problem. However, there are still shortcomings whose removal will increase general applicability. Here, we discuss these shortcomings as well as future threats to the viability of this solution.

For the sake of completeness, let us list the underlying assumptions of our approach. Regarding the program to be analyzed: termination, no self-modifying code, no dynamic memory allocation, and resolvable dynamic branch targets.

These are rather easy to satisfy and missing information can be supplied by the developer or derived from a higher-level model. Although the source code might be subject to regulations or norms, this is irrelevant for WCET analysis as it requires binaries as input.

Static analysis requires a truthful model of the architecture to be analyzed. Current abstract models are crafted by studying hardware documentation, by querying designers, by asking somebody who knows, and, if necessary, by performing reverse-engineering experiments. The upcoming alternative to this error-prone and laborious process is to derive abstract models from VHDL or Verilog specifications.³¹

One important assumption we make on the system level is that programs are executed in isolation. If program execution were interrupted, and other code were executed, the architectural state would be different when program execution would resume. Hence, the computed invariants on the architectural state at all following program points would be wrong. One needs additional analyses that allow to bound the interruption-caused increase in execution time.² Nonpreemptive scheduling, as found in avionics for example, is the easier choice though, at least concerning WCET analysis.

Necessary effort and achievable precision for the single-core setting meet industrial requirements. For most current multicore platforms, precision and/or complexity are unacceptable.

In fact one can state this assumption in a more general way: Analysis takes account only of events whose occurrence can be associated with specific program points. Other types of events that happen without such an association and that alter the architectural state have to be dealt with separately. Preemptive scheduling, possibly implemented using hardware interrupts, is only one example. Others include DRAM refreshes, DMA transfers, or even the parallel execution of programs on multiprocessor or multicore platforms.

The problem with parallel execution of programs is the induced interference on architectural resources that are shared between programs, for example, caches, interconnects, flash memories, peripherals, and so on. Some accesses to shared resources, in particular global variables, will be synchronized to guarantee the semantics of the co-running programs. For the rest, the order in which competing threads access shared machine resources is not statically fixed. This complicates timing analysis as any access to a shared resource might be delayed due to competing accesses. One approach to statically prove the absence of such delays is to analyze each program given abstractions of the resource access behaviors of all other co-running programs. One example is given by Schranzhofer et al.,³² but finding suitable abstractions for the general problem is unsolved.

To mitigate problems in timing analysis, one can try to reduce interferences on resources by smart configuration of the CPU or system board.^7,21 As an alternative to tilting at windmills, that is, trying to fix designs that are less favorable for WCET analysis, one can try to design the architectures appropriately in the first place.^42,44 Appropriately meaning that one can easily predict their behavior while they still exhibit high performance. The need for predictability was recognized early³⁶ and has since been inspected in several ways, for example, Henzinger,¹⁷ Bernardes,⁵ and Thiele.⁴² However, the understanding of predictability in the real-time community is rather implicit. A generally accepted formal definition is still to be found. The joint article by Axer et al.³ discusses predictability fundamentally and at several abstraction levels of systems.

Conclusion

Timing analysis has been made difficult by the use of high-performance microprocessors, which use caches, deep pipelines, out-of-order execution, and branch prediction to improve average-case performance. These architectural components have introduced a large variability of the execution times of individual instructions by the dependence on the execution state. The solution is to safely bound the execution times of sequences of instructions occurring in the program based on information about all possible execution histories leading to these occurrences. Static program analysis is used to compute such information. Reliable and precise upper bounds can be computed. Timing-analysis tools are in routine use in the safety-critical embedded-systems industries.

The possibility to determine execution-time bounds and the precision of the results depend heavily on properties of the underlying computer architecture. Trends in computer architecture and in software design threaten the applicability of established methods. The fact the timing analysis problem could be solved in a provably correct way is considered one of the success stories of formal methods. Continuation of this story will require more predictable architectures as well as advances in analysis technology.

Acknowledgments

We thank our many colleagues who have contributed to the described approach, including Christian Ferdinand, Florian Martin, Henrik Theiling, Michael Schmidt, and Stephan Thesing, Reinhold Heckmann, Daniel Kästner, and Jan Reineke.

The development of the technology was supported by the Transfer Project 14 of the Deutsche Forschungsgemeinschaft, the European IST project Daedalus, and the Transregional Research Center AVACS of the Deutsche Forschungsgemeinschaft.

Figures

Figure 1. Fictitious distribution of execution times of a task exemplifying lower and upper timing bounds (LB, UB) as well as best-case and worst-case execution time (BCET, WCET).

Figure 2. Main phases of a static timing analysis.

Figure 3. Speculation timing anomaly.

Figure 4. Scheduling timing anomaly. Arrows indicate dependencies.

Submit an Article to CACM

CACM welcomes unsolicited submissions on topics of relevance and value to the computing community.

You Just Read

Computation Takes Time, But How Much?

View in the ACM Digital Library

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and full citation on the first page. Copyright for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or fee. Request permission to publish from permissions@acm.org or fax (212) 869-0481.

DOI

10.1145/2500886

February 2014 Issue

Published: February 1, 2014

Vol. 57 No. 2

Pages: 94-103

Table of Contents

Join the Discussion (0)

Become a Member or Sign In to Post a Comment

The Latest from CACM

Explore More

News Apr 18 2024

Keeping AI Out of Elections

Bennie Mols

Artificial Intelligence and Machine Learning

BLOG@CACM Apr 17 2024

Technical Marvels

Herbert Bruderer

Computer History

BLOG@CACM Apr 16 2024

The Value of Data in Embodied Artificial Intelligence

Shaoshan Liu

Artificial Intelligence and Machine Learning

Shape the Future of Computing

ACM encourages its members to take a direct hand in shaping the future of the association. There are more ways than ever to get involved.

Get Involved

Communications of the ACM (CACM) is now a fully Open Access publication.

By opening CACM to the world, we hope to increase engagement among the broader computer science community and encourage non-members to discover the rich resources ACM has to offer.

Learn More

Key Insights

Taking the Problem Apart

The Core of the Problem

Academic Development

Industrial Adoption

Open Questions and Future Challenges

Conclusion

Acknowledgments

Figures

Computation Takes Time, But How Much?

DOI

February 2014 Issue

Related Reading

Join the Discussion (0)

Become a Member or Sign In to Post a Comment

Shape the Future of Computing

Communications of the ACM (CACM) is now a fully Open Access publication.