# Software Model Checking Takes Off

Although formal methods have been used in the development of
safety- and security-critical systems for years, they have not
achieved widespread industrial use in software or systems
engineering. However, two important trends are making the
industrial use of formal methods practical. The first is the
growing acceptance of model-based development for the design of
embedded systems. Tools such as MATLAB
Simulink^{6} and Esterel Technologies
SCADE Suite^{2} are achieving widespread
use in the design of avionics and automotive systems. The
graphical models produced by these tools provide a formal, or
nearly formal, specification that is often amenable to formal
analysis.

The second is the growing power of formal verification tools, particularly model checkers. For many classes of models they provide a "push-button" means of determining if a model meets its requirements. Since these tools examine all possible combinations of inputs and state, they are much more likely to find design errors than testing is.

Here, we describe a translator framework developed by Rockwell Collins and the University of Minnesota that automatically translates some of the most popular commercial modeling languages into the input languages of a variety of model checkers and theorem provers. We also describe three case studies in which these tools were applied to industrial systems; together they demonstrate that formal verification can be used effectively on real systems when properly supported by automated tools.

Model-based development (MBD) refers to the use of domain-specific, graphical modeling languages that can be executed and analyzed before the actual system is built. The use of such modeling languages allows the developers to create a model of the system, execute it on their desktops, analyze it with automated tools, and use it to automatically generate code and test cases.

Throughout this article we use MBD to refer specifically to software developed using synchronous dataflow languages such as those found in MATLAB Simulink and Esterel Technologies SCADE Suite. Synchronous modeling languages latch their inputs at the start of a computation step, compute the next system state and its outputs as a single atomic step, and communicate between components using dataflow signals. This differs from the more general class of modeling languages that include support for asynchronous execution of components and communication using message passing. MBD has become very popular in the avionics and automotive industries, and we have found synchronous dataflow models to be especially well suited for automated verification using model checking.
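The synchronous execution discipline can be sketched in a few lines of Python. The component below is a toy of our own devising (not taken from any of the tools above): inputs are latched at the start of a step, and the next state and outputs are computed as one atomic action.

```python
# Toy synchronous component (illustrative only): a counter that
# increments while its `enable` input is true.
def make_counter():
    state = {"count": 0}

    def step(inputs):
        # Latch the inputs at the start of the step...
        latched = dict(inputs)
        # ...then compute the next state and outputs as one atomic action.
        if latched["enable"]:
            state["count"] += 1
        # The output becomes visible on the dataflow signal only after
        # the step completes.
        return {"count": state["count"]}

    return step

counter = make_counter()
print(counter({"enable": True}))   # {'count': 1}
print(counter({"enable": False}))  # {'count': 1}
```

Because every step is a pure function of the latched inputs and the previous state, a whole model of such components has a well-defined transition relation, which is what makes this class of languages so amenable to model checking.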

Model checkers are formal verification tools that evaluate a
model to determine if it satisfies a given set of
properties.^{1} A model checker will
consider every possible combination of inputs and state, making
the verification equivalent to exhaustive testing of the model.
If a property is not true, the model checker produces a
counterexample showing how the property can be falsified.
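The exhaustive search a model checker performs can be sketched as a breadth-first exploration of the reachable states. The toy explicit-state checker below is our own illustration, not any production tool; it returns a counterexample trace when an invariant fails.

```python
from collections import deque

def check_invariant(initial, inputs, step, invariant):
    """Explore every reachable state; return None, or a counterexample trace."""
    seen = {initial}
    queue = deque([(initial, [initial])])
    while queue:
        state, trace = queue.popleft()
        if not invariant(state):
            return trace  # counterexample: a path from the initial state
        for i in inputs:
            nxt = step(state, i)
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, trace + [nxt]))
    return None  # the invariant holds in every reachable state

# Toy model: a saturating counter that is supposed to stay below 4 -- but doesn't.
trace = check_invariant(
    initial=0,
    inputs=["inc", "reset"],
    step=lambda s, i: 0 if i == "reset" else min(s + 1, 4),
    invariant=lambda s: s <= 3,
)
print(trace)  # [0, 1, 2, 3, 4]
```

Explicit-state tools such as SPIN refine this basic search with sophisticated state storage and reduction techniques; symbolic tools replace the per-state enumeration with operations on sets of states.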

There are many types of model checkers, each with its own
strengths and weaknesses. Explicit state model checkers such as
SPIN^{4} construct and store a
representation of each state visited. Implicit state (symbolic)
model checkers use logical representations of sets of states
(such as Binary Decision Diagrams) to describe regions of the
model state space that satisfy the properties being evaluated.
Such compact representations generally allow symbolic model
checkers to handle a much larger state space than explicit state
model checkers. We have used the BDD-based model checker
NuSMV^{5} to analyze models with over
10^{120} reachable states.

More recent model checkers, such as
SAL^{11} and Prover
Plug-In,^{9} use *satisfiability modulo
theories* (SMT) solvers for reasoning about infinite state
models containing real numbers and unbounded arrays. These
checkers use a form of induction over the state transition
relation to automatically prove that a property holds over all
executable paths in a model. While these tools can handle a
larger class of models, the properties to be checked must be
written to support inductive proof.
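This style of reasoning can be illustrated with a one-step induction check over a finite toy model; SMT-based tools perform the same base-and-step argument symbolically, over infinite domains.

```python
def prove_by_induction(initials, states, inputs, step, prop):
    """One-step induction over the transition relation (toy finite version).

    Base case: the property holds in every initial state.
    Step case: from ANY state satisfying the property (reachable or not),
    every successor also satisfies it.  Success proves the property on all
    executions; failure may be a real bug or merely a non-inductive property.
    """
    base_ok = all(prop(s) for s in initials)
    step_ok = all(
        prop(step(s, i)) for s in states if prop(s) for i in inputs
    )
    return base_ok and step_ok

# Toy example: a modulo-4 counter; "value < 4" is inductive.
ok = prove_by_induction(
    initials=[0],
    states=range(4),
    inputs=["tick"],
    step=lambda s, i: (s + 1) % 4,
    prop=lambda s: s < 4,
)
print(ok)  # True
```

A true but non-inductive property fails the step case even though no execution violates it, which is why properties often must be strengthened before these tools can prove them.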

As part of NASA's Aviation Safety Program (AvSP), Rockwell
Collins and the University of Minnesota developed a product
family of translators that bridge the gaps between some of the
most popular commercial modeling languages and several model
checkers and theorem provers.^{8} An
overview of this framework is shown in Figure
1.

These translators work primarily with the Lustre formal
specification language,^{3} but this is
hidden from the users. The starting point for translation is a
design model in MATLAB Simulink/Stateflow or Esterel Technologies
SCADE Suite/Safe State Machines. SCADE Suite produces Lustre
models directly. Simulink or Stateflow models can be imported
using SCADE Suite or the Reactis^{10}
tool and a translator developed by Rockwell Collins. To ensure
each Simulink or Stateflow construct has a well-defined
semantics, the translator restricts the models that it will
accept to those that can be translated unambiguously into
Lustre.

Once in Lustre, the specification is loaded into an abstract syntax tree (AST) and a number of transformation passes are applied to it. Each transformation pass produces a new Lustre AST that is syntactically closer to the target specification language and preserves the semantics of the original Lustre specification. This allows all Lustre type checking and analysis tools to be used as debugging aids during the development of the translator. When the AST is sufficiently close to the target language, a pretty printer is used to output the target specification.
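The pass structure can be sketched as follows; the AST encoding and the single pass shown are illustrative, not the actual translator code.

```python
# Illustrative pass: rewrite ("not_eq", a, b) into ("not", ("eq", a, b)),
# a semantics-preserving step toward a target language without "not_eq".
def expand_not_eq(ast):
    if isinstance(ast, tuple):
        head, *args = ast
        args = [expand_not_eq(a) for a in args]
        if head == "not_eq":
            return ("not", ("eq", *args))
        return (head, *args)
    return ast

def pretty_print(ast):
    if isinstance(ast, tuple):
        head, *args = ast
        return f"{head}({', '.join(pretty_print(a) for a in args)})"
    return str(ast)

PASSES = [expand_not_eq]  # a real translator chains a dozen to 60+ passes

def translate(ast):
    for apply_pass in PASSES:
        ast = apply_pass(ast)  # each intermediate AST is still valid Lustre
    return pretty_print(ast)   # emit the target-language text

print(translate(("not_eq", "x", "y")))  # not(eq(x, y))
```

Keeping every intermediate result a legal Lustre AST is the design choice that lets standard Lustre type checkers validate the output of each pass during translator development.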

We refer to our translator framework as a product family since most transformation passes are reused in the translators for each target language. Reuse of the transformation passes makes it much easier to support new target languages; we have developed new translators in a matter of days. The number of transformation passes depends on the similarity of the source and target languages and on the number of optimizations to be made. Our translators range in size from a dozen to over 60 passes.

The translators produce highly optimized specifications appropriate for the target language. For example, when translating to NuSMV, the translator eliminates as much redundant internal state as possible, making it very efficient for BDD-based model checking. When translating to the PVS theorem prover, the specification is optimized for readability and to support the development of proofs in PVS. When generating executable C or Ada code, the code is optimized for execution speed on the target processor. These optimizations can have a dramatic effect on the target analysis tools. For example, optimization passes incorporated into the NuSMV translator reduced the time required for NuSMV to check one model from over 29 hours to less than a second.

However, some optimizations are better incorporated into the
verification tools rather than the translator. For example,
predicate abstraction^{1} is a well-known
technique for reducing the size of the reachable state space, but
automating this during translation would require a tight
interaction between our translator and the analysis tool to
iteratively refine the predicates based on the counterexamples.
Since many model checkers already implement this technique, we
have not tried to incorporate it into our translator
framework.

We have developed tools to translate the counterexamples produced by the model checkers into two formats. The first is a simple spreadsheet that shows the inputs and outputs of the model for each step (similar to steps noted in the accompanying table). The second is a test script that can be read by the Reactis tool to step forward and backward through the counterexample in the Reactis simulator.
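The spreadsheet form can be produced with a few lines of code; the sketch below uses CSV and made-up signal names as a stand-in for the actual output format.

```python
import csv
import io

def counterexample_to_csv(steps):
    """Render a counterexample (one dict of signal values per step) as CSV:
    one column per step, one row per signal."""
    signals = sorted(steps[0])
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["signal"] + [f"step {i + 1}" for i in range(len(steps))])
    for sig in signals:
        writer.writerow([sig] + [step[sig] for step in steps])
    return out.getvalue()

# Hypothetical three-step trace with illustrative signal names.
trace = [
    {"start": 0, "door_closed": 1, "mode": "Setup"},
    {"start": 1, "door_closed": 1, "mode": "Cooking"},
    {"start": 0, "door_closed": 0, "mode": "Cooking"},
]
print(counterexample_to_csv(trace))
```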

Our translator framework currently supports input models written in Simulink, Stateflow, and SCADE. It generates specifications for the NuSMV, SAL, and Prover model checkers, the PVS and ACL2 theorem provers, and C and Ada source code.

To make these ideas concrete, we present a very small example,
the mode logic for a simple microwave oven shown in
Figure 2. The microwave initially starts in
*Setup* mode. It transitions to *Running* mode when the
*Start* button is pressed and the *Steps Remaining* to
cook (initially provided by the keypad entry subsystem) is
greater than zero. On transition to *Running* mode, the
controller enters either the *Cooking* or *Suspended*
submode, depending on whether *Door Closed* is true. In
*Cooking* mode, the controller decrements *Steps
Remaining* on each step. If the door is opened in
*Cooking* mode or the operator presses the *Clear*
button, the controller enters the *Suspended* submode. From
the *Suspended* submode, the operator can return to
*Cooking* submode by pressing the *Start* button while
the door is closed, or return to *Setup* mode by pressing
the *Clear* button. When *Steps Remaining* decrements
to zero, the controller exits *Running* mode and returns to
*Setup* mode.
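This mode logic can be rendered directly as a transition function. The Python sketch below is our own reconstruction of the behavior as described, with the door-open/Clear check taking priority over the cooking step; the state and signal names are ours.

```python
# Reconstruction (ours) of the microwave mode logic described above.
def microwave_step(mode, steps_remaining, start, clear, door_closed):
    if mode == "Setup":
        if start and steps_remaining > 0:
            # Enter Running; the submode depends on the door.
            mode = "Cooking" if door_closed else "Suspended"
    elif mode == "Cooking":
        if not door_closed or clear:
            mode = "Suspended"
        else:
            steps_remaining -= 1
            if steps_remaining == 0:
                mode = "Setup"  # exit Running when cooking completes
    elif mode == "Suspended":
        if clear:
            mode = "Setup"
        elif start and door_closed:
            mode = "Cooking"
    return mode, steps_remaining

mode, steps = microwave_step("Setup", 2, start=1, clear=0, door_closed=1)
mode, steps = microwave_step(mode, steps, start=0, clear=0, door_closed=0)
print(mode)  # Suspended
```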

Since this model consists only of Boolean values (*Start,
Clear, Door Closed*), enumerated types (mode), and two small
integers (*Steps Remaining* and *Steps to Cook* range
from 0 to 639, the largest value that can be entered on the
keypad), it is well suited for analysis with a symbolic model
checker such as NuSMV. A valuable property to check is that the
door is always closed when the microwave is cooking. In
CTL^{1} (one of the property
specification languages of NuSMV), this takes the form
AG(mode = Cooking → Door Closed): in every reachable state,
if the mode is *Cooking*, then *Door Closed* must be true.

Translation of the model into NuSMV and checking this property takes only a few seconds and yields the counterexample shown in Table 1.

In step 2 of the counterexample, we see the value of
*Start* change from 0 to 1, indicating the start button was
pressed. Also in step 2, the door is closed and *Steps
Remaining* takes on the value 1. As a result, the microwave
enters *Cooking* mode in step 2. In step 3, the door is
opened, but the microwave remains in *Cooking* mode,
violating our safety property.

To better understand how this happened, we use Reactis to step
through the generated counterexample. This reveals that instead
of taking the transition from *Cooking* to *Suspended*
when the door is opened, the microwave took the transition from
*Cooking* to *Cooking* that decrements *Steps
Remaining* because this transition has a higher priority
(priority 1) than the transition from *Cooking* to
*Suspended* (priority 2). Worse, the microwave would
continue cooking with the door open until *Steps Remaining*
becomes zero. Changing the priority of these two transitions and
rerunning the model checker shows that in all possible states,
the door is always closed if the microwave is cooking.

While this example is tiny, the two integers (*Steps
Remaining* and *Steps to Cook*) still push its reachable
state space to 9.8 × 10^{6} states. Also note that
the model checker does not necessarily find the "best"
counterexample. It actually would have been clearer if the
*Steps Remaining* had been set to a value larger than 1 in
step 2. However, this counterexample is very typical. In the
production models that we have examined, very few counterexamples
are longer than a few steps.
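The 9.8 × 10^{6} figure is consistent with a simple product over the state variables, assuming three Boolean inputs, three leaf modes, and the two 0-to-639 counters (this decomposition is ours; the model may group the variables slightly differently):

```python
# Our decomposition of the microwave's reachable state space.
booleans = 2 ** 3        # Start, Clear, Door Closed
modes = 3                # Setup, Cooking, Suspended
counters = 640 * 640     # Steps Remaining and Steps to Cook, each 0..639
total = booleans * modes * counters
print(total)  # 9830400, i.e., about 9.8 x 10^6
```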

To be of any real value, model checking must be able to handle
much larger problems. Three case studies on the application of
our tools to industrial examples are described here. A fourth
case study is discussed in Miller et
al.^{7}

*ADGS-2100 Window Manager.* One of the largest and most
successful applications of our tools was to the ADGS-2100
Adaptive Display and Guidance System Window
Manager.^{13} In modern aircraft, pilots
are provided aircraft status primarily through computerized
display panels similar to those shown in Figure
3. The ADGS-2100 is a Rockwell Collins product that
provides the heads-down and heads-up displays and display
management software for next-generation commercial aircraft.

The Window Manager (WM) ensures that data from different applications is routed to the correct display panel. In normal operation, the WM determines which applications are being displayed in response to the pilot selections. However, in the case of a component failure, the WM also decides which information is most critical and routes this information from one of the redundant sources to the most appropriate display panel. The WM is essential to the safe flight of the aircraft. If the WM contains logic errors, critical flight information could be unavailable to the flight crew.

While very complex, the WM is specified in Simulink using only
Booleans and enumerated types, making it ideal for verification
using a BDD-based model checker such as NuSMV. The WM is composed
of five main components that can be analyzed independently. These
five components contain a total of 16,117 primitive Simulink
blocks that are grouped into 4,295 instances of Simulink
subsystems. The reachable state space of the five components
ranges from 9.8 × 10^{9} to 1.5 ×
10^{37} states.

Ultimately, 563 properties about the WM were developed and checked, and 98 errors were found and corrected in early versions of the WM model. This verification was done early in the design process while the design was still changing. By the end of the project, the WM developers were checking the properties after every design change.

*CerTA FCS Phase I.* Our second case study was sponsored
by the U.S. Air Force Research Laboratory (AFRL) under the
Certification Technologies for Advanced Flight Critical Systems
(CerTA FCS) program in order to compare the effectiveness of
model checking and testing.^{12} In this
study, we applied our tools to the Operational Flight Program
(OFP) of an unmanned aerial vehicle developed by Lockheed Martin
Aerospace. The OFP is an adaptive flight control system that
modifies its behavior in response to flight conditions. Phase I
of the project concentrated on applying our tools to the
Redundancy Management (RM) logic, which is based almost entirely
on Boolean and enumerated types.

The RM logic was broken down into three components that could
be analyzed individually. While relatively small (they contained
a total of 169 primitive Simulink blocks organized into 23
subsystems, with reachable state spaces ranging from 2.1 ×
10^{4} to 6.0 × 10^{13} states), the RM
logic was replicated in the OFP once for each of the 10 control
surfaces on the aircraft, making it a significant portion of the
OFP logic.

To compare the effectiveness of model checking and testing at discovering errors, this project had two independent verification teams, one that used testing and one that used model checking. The formal verification team developed a total of 62 properties from the OFP requirements and checked these properties with the NuSMV model checker, uncovering 12 errors in the RM logic. Of these 12 errors, four were classified by Lockheed Martin as severity 3 (only severity 1 and 2 can affect the safety of flight), two were classified as severity 4, two resulted in requirements changes, one was redundant, and three resulted from requirements that had not yet been implemented in the release of the software.

In similar fashion, the testing team developed a series of tests from the same OFP requirements. Even though the testing team invested almost half as much time in testing as the formal verification team spent in model checking, testing failed to find any errors. The main reason for this was that the demonstration was not a comprehensive test program. While some of these errors could be found through testing, the cost would be much higher, both to find and fix the errors. In addition, the errors found through model checking tended to be intermittent, near simultaneous, or combinatory sequences of failures that would be very difficult to detect through testing. The conclusion of both teams was that model checking was shown to be more cost effective than testing in finding design errors.

*CerTA FCS Phase II.* The purpose of Phase II of the
CerTA FCS project was to investigate whether model checking could
be used to verify large, numerically intensive models. In this
study, the translation framework and model checking tools were
used to verify important properties of the Effector Blender (EB)
logic of an OFP for a UAV similar to that verified in Phase
I.

The EB is a central component of the OFP that generates the actuator commands for the aircraft's six control surfaces. It is a large, complex model that repeatedly manipulates a 3 × 6 matrix of floating point numbers. It takes 32 floating point inputs and a 3 × 6 matrix of floating point numbers, and outputs a 1 × 6 matrix of floating point numbers. It contains over 2,000 basic Simulink blocks organized into 166 Simulink subsystems, many of which are Stateflow models.

Because of its extensive use of floating point numbers and large state space, the EB cannot be verified using a BDD-based model checker such as NuSMV. Instead, the EB was analyzed using the Prover SMT-solver from Prover Technologies. Even with the additional capabilities of Prover, several new issues had to be addressed, the hardest being dealing with floating point numbers.

While Prover has powerful decision procedures for linear arithmetic with real numbers and bit-level decision procedures for integers, it does not have decision procedures for floating point numbers. Translating the floating point numbers into real numbers was rejected since much of the arithmetic in the EB is inherently nonlinear. Also, the use of real numbers would mask floating point arithmetic errors such as overflow and underflow.

Instead, the translator framework was extended to convert floating point numbers to fixed point numbers using a scaling factor provided by the OFP designers. The fixed point numbers were then converted to integers using bit-shifting to preserve their magnitude. While this allowed the EB to be verified using Prover's bit-level integer decision procedures, the results were unsound due to the loss of precision. However, if errors were found in the verified model, their presence could easily be confirmed in the original model. This allowed the verification to be used as a highly effective debugging step, even though it did not guarantee correctness.
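The conversion scheme can be sketched as follows; the scale factor and helper names here are illustrative, not the designers' actual factors.

```python
# Sketch of float-to-fixed conversion (illustrative scale factor only).
FRACTION_BITS = 8  # scale 2**8: a value v is stored as round(v * 256)

def to_fixed(x):
    return round(x * (1 << FRACTION_BITS))

def from_fixed(n):
    return n / (1 << FRACTION_BITS)

def fixed_mul(a, b):
    # Multiplying two scaled integers doubles the scale; shift back down.
    # The low-order bits dropped here are exactly the precision loss that
    # made the verification unsound (yet still effective for finding bugs).
    return (a * b) >> FRACTION_BITS

a, b = to_fixed(1.5), to_fixed(0.25)
print(from_fixed(fixed_mul(a, b)))  # 0.375
```

Once every signal is an integer, the model falls within the reach of bit-level integer decision procedures like Prover's.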

Determining what properties to verify was also a difficult problem. The requirements for the EB are actually specified for the combination of the EB and the aircraft model, but checking both the EB and the aircraft model exceeded the capabilities of the Prover Plug-In model checker. After consultation with the OFP designers, the verification team decided to verify whether the six actuator commands would always be within dynamically computed upper and lower limits. Violation of these properties would indicate a design error in the EB logic.

Even with these adjustments, the EB model was large enough that it had to be decomposed into a hierarchy of components several levels deep. The leaf nodes of this hierarchy were verified using Prover Plug-In, and their composition was verified using manual proofs. This approach also ensured that unsoundness could not be introduced through circular reasoning, since Simulink enforces the absence of cyclic dependencies between atomic subsystems.

Ultimately, five errors in the EB design logic were discovered and corrected through model checking of these properties. In addition, several potential errors that were being masked by defensive design practices were found and corrected.

The case studies described here demonstrate that model checking can be effectively used to find errors early in the development process for many classes of models. In particular, even very complex models can be verified with BDD-based model checkers if they consist primarily of Boolean and enumerated types. Every industrial system we have studied contains large sections that either meet this constraint or can be made to meet it with some alteration.

For this class of models, the tools are simple enough for developers to use them routinely and without extensive training. In our experience, a single day of training and a low level of ongoing mentoring are usually sufficient. This also makes it practical to perform model checking early in the development process while a model is still changing.

Running a set of properties after each model revision is a quick and easy way to see if anything has been broken. We encourage our developers to "check your models early and check them often." The time spent model checking is recovered several times over by avoiding rework during unit and integration testing.

Since model checking examines every possible combination of input and state, it is also far more effective at finding design errors than testing, which can only check a small fraction of the possible inputs and states. As demonstrated by the CerTA FCS Phase I case study, it can also be more cost effective than testing.

There are many directions for further research. As illustrated in the CerTA FCS Phase II study, numerically intensive models still pose a challenge for model checking. SMT-based model checkers hold promise for verification of these systems, but the need to write properties that can be verified through induction over the state transition relation makes them more difficult for developers to use.

Most industrial models used to generate code make extensive use of floating point numbers. Other models, particularly those that deal with spatial relationships such as navigation, make extensive use of trigonometric and other transcendental functions. A sound and efficient way of checking systems using floating point arithmetic and transcendental functions would be very helpful.

It can also be difficult to determine how many properties must be checked. Our experience has been that checking even a few properties will find errors, but that checking more properties will find more errors. Unlike testing, for which many objective coverage criteria have been developed, completeness criteria for sets of properties do not seem to exist. Techniques for developing or measuring the adequacy of a set of properties are needed.

As discussed in the CerTA FCS Phase II case study, the verification of very large models may be achieved by using model checking on subsystems and more traditional reasoning to compose the subsystems. Combining model checking and theorem proving in this way could be a very effective approach to the compositional verification of large systems.

The authors wish to thank Ricky Butler, Celeste Bellcastro, and Kelly Hayhurst of the NASA Langley Research Center, Mats Heimdahl, Yunja Choi, Anjali Joshi, Ajitha Rajan, Sanjai Rayadurgam, and Jeff Thompson of the University of Minnesota, Eric Danielson, John Innis, Ray Kamin, David Lempia, Alan Tribble, Lucas Wagner, and Matt Wilding of Rockwell Collins, Bruce Krogh of CMU, Vince Crum, Wendy Chou, Ray Bortner, and David Homan of the Air Force Research Lab, and Greg Tallant and Walter Storm of Lockheed Martin for their contributions and support.

This work was supported in part by the NASA Langley Research Center under contract NCC-01001 of the Aviation Safety Program (AvSP) and by the Air Force Research Lab under contract FA8650-05-C-3564 of the Certification Technologies for Advanced Flight Control Systems program (CerTA FCS) [88ABW-2009-2730].

1. Clarke, E., Grumberg, O. and Peled, D.
*Model Checking.* The MIT Press, Cambridge, MA, 2001.

2. Esterel Technologies. SCADE Suite Product Description; http://www.estereltechnolgies.com/

3. Halbwachs, N., Caspi, P., Raymond, P. and
Pilaud, D. The synchronous dataflow programming language Lustre.
*Proceedings of the IEEE 79*, 9 (1991),
1305–1320.

4. Holzmann, G. *The SPIN Model Checker:
Primer and Reference Manual.* Addison-Wesley Professional,
2003.

5. IRST. The NuSMV Model Checker; http://nusmv.irst.itc.it/

6. The Mathworks Simulink Product Description; http://www.mathworks.com/

7. Miller, S., Anderson, E., Wagner, L.,
Whalen, M. and Heimdahl, M.P.E. Formal verification of flight
critical software. In *Proceedings of the AIAA Guidance,
Navigation and Control Conference and Exhibit* (San Francisco,
CA, Aug. 15–18, 2005).

8. Miller, S., Tribble, A., Whalen, M. and
Heimdahl, M.P.E. Proving the Shalls. *International Journal on
Software Tools for Technology Transfer* (Feb. 2006).

9. Prover Technology. Prover Plug-In Product Description; http://www.prover.com/

10. Reactive Systems, Inc.; http://www.reactive-systems.com/

11. SRI International. Symbolic Analysis Laboratory; http://sal.csl.sri.com/

12. Whalen, M., Cofer, D., Miller, S., Krogh,
B., and Storm, W. Integration of formal analysis into a
model-based software development process. In *Proceedings of
the 12th International Workshop on Formal Methods for Industrial
Critical Systems* (Berlin, Germany, July 1–2, 2007).

13. Whalen, M., Innis, J., Miller, S. and Wagner, L. ADGS-2100 Adaptive Display & Guidance System Window Manager Analysis. NASA Contractor Report CR-2006–213952 (Feb. 2006); http://shemesh.larc.nasa.gov/fm/fm-collins-pubs.html/

**©2010
ACM 0001-0782/10/0200 $10.00**

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

The Digital Library is published by the Association for Computing Machinery. Copyright © 2010 ACM, Inc.
