At the University of Washington (UW) Medical Center, a radiotherapy system shoots high-powered radiation beams into the heads of patients to treat cancers of the tongue and esophagus. Any software error in the system could prove fatal, so engineers at the medical center have teamed with computer scientists from the university to ensure the system will not fail, and that the beam will shut off if prescribed settings go out of tolerance.
This is made possible by a process known as software verification, and verifying implementations of critical systems like that radiotherapy setup is one of Zachary Tatlock’s passions. More than three years ago, Tatlock was a Ph.D. candidate giving a talk at the university on his thesis research in program verification. The lead engineer for the medical center’s radiotherapy team was in the audience, and asked Tatlock how verification could be applied to that system. “That probably helped me get hired,” Tatlock recalls. Today, he is an assistant professor of computer science at UW and, with colleagues and students there, has been working with the medical center team ever since.
What makes software verification challenging in the case of the radiotherapy system is that the system is written in a variety of languages, so different techniques must be deployed to verify the software in its entirety. The system has about a dozen components, each with a different level of criticality.
Software for logging an event, for example, is not as critical as software that verifies the beam power has not become too high, Tatlock notes. “What we want to be able to do is ensure the reliability of all pieces,” he says. “We want to make sure there are no bugs that can affect the parts that are critical.” There are two or three components “where the rubber meets the road, and it’s super-critical to get them right,” he says.
The radiotherapy system team uses powerful verification methods ranging from automated theorem-proving tools to proofs written by hand and checked by a proof assistant (a program that checks the correctness of proofs in an expressive logic).
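To give a flavor of the hand-written, machine-checked style (the toy example below is ours, not code from the UW project, and both names in it are hypothetical): a safety property is stated against a specification, and the proof assistant mechanically checks every step of the proof. In the Lean proof assistant, it might look like this:

```lean
-- Toy illustration; `withinTolerance` and `beam_off_when_over` are
-- hypothetical names, not code from the UW radiotherapy project.
-- The spec: a power reading is in tolerance when it does not exceed
-- the prescribed limit.
def withinTolerance (power limit : Nat) : Prop :=
  power ≤ limit

-- The proof assistant checks this proof that any reading violating
-- the spec strictly exceeds the limit -- exactly the condition under
-- which an interlock must shut the beam off.
theorem beam_off_when_over (power limit : Nat)
    (h : ¬ withinTolerance power limit) : limit < power := by
  unfold withinTolerance at h
  omega
```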
Preventing Software Hacks
Generally, computer code is written without any formal process, and the main method of testing it is simply trying it out and seeing whether or not it works. Testing does not guarantee that every situation that might occur at runtime has been covered, nor that a malicious attacker who reads the program could not devise something clever to undermine it. Formal software verification relies on mathematical theorems and deductive reasoning to check the most critical aspects of a system. Proponents say the technique is making hacker-proof software possible.
“A lot of the ways attackers take over programs on machines and the Internet is by exploiting security vulnerabilities, and there are many kinds, such as the buffer overrun vulnerability, to which safe languages are just immune,” notes Andrew Appel, professor of computer science at Princeton University, who is considered an expert in the program verification field.
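To see what such a vulnerability looks like (the fragment below is our own illustration, not drawn from any system discussed in this article): in C, nothing prevents a copy from writing past the end of a fixed-size buffer, whereas a safe language detects the violation and halts.

```c
/* Illustrative only: the classic buffer overrun in C. A safe
   language such as Java raises a runtime error here instead of
   silently overwriting adjacent memory. */
#include <string.h>

void greet(const char *name) {
    char buf[8];
    /* If name holds more than 7 characters plus the terminator,
       strcpy writes past the end of buf and corrupts the stack --
       raw material for an exploit. A bounds-checked alternative:
       snprintf(buf, sizeof buf, "%s", name); */
    strcpy(buf, name);
}
```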
Formal software verification uses methods that don’t rely on running the program; rather, they analyze program text to prove things about its behavior on any possible input. Even using so-called “safe” languages such as Java to write programs doesn’t necessarily guarantee they are correct, says Appel; they can still have bugs and do wrong things, just nothing catastrophic. Safety is always important, but correctness is crucial when it comes to the critical infrastructure components of a system, he emphasizes.
Safety has proven to be an issue with critical systems before, and in at least one case put patients’ lives at risk. In 1982, Atomic Energy of Canada Limited (AECL) produced the Therac-25 radiation therapy machine. The system was involved in at least six accidents between 1985 and 1987 in which patients were given massive overdoses of radiation, sometimes hundreds of times greater than normal, resulting in death or serious injury.
Among the findings of a commission that investigated the Therac-25 was that AECL did not have the software code independently reviewed, nor did it ever test the system’s software and hardware until it was assembled at the hospital.
“So if you want your programs to be correct and not just safe, then you need to prove that your program behaves according to some specification,” Appel says. “You have to write down in a formal way what would be correct in terms of how its output relates to its input, and then you have to find a way to assure that for any possible input, your implementation will satisfy its specification.”
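One concrete way to write such a specification down is as a machine-readable contract stating, for every possible input, how the output must relate to it; a verification tool then proves the body satisfies the contract. Here is a sketch in the style of ACSL, the contract language used by the Frama-C verifier (the function and the bounds are hypothetical):

```c
/*@ requires 0 <= dose && dose <= 1000;
    assigns  \nothing;
    ensures  dose >  limit ==> \result == 1;
    ensures  dose <= limit ==> \result == 0;
*/
int beam_over_limit(int dose, int limit) {
    /* A deductive verifier (for example, Frama-C's WP plugin) proves
       this body satisfies the contract for every admissible input. */
    return dose > limit;
}
```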
Appel is a member of a research project called DeepSpec, whose mission is to examine the full functional correctness of software and hardware so programs run the way they are supposed to run. To do this, DeepSpec is building tools for verifying that programs, and software components such as OS kernels, provably conform to deep specifications: granular, precise descriptions of how software behaves, grounded in formal logic and mathematics.
Another DeepSpec member, Yale University computer science professor Zhong Shao, along with a team of researchers there, wrote an operating system called CertiKOS that uses formal verification to ensure its code behaves exactly as intended. “If the code and spec do not match, there is a bug or loophole,” explains Shao. “What a certified operating system provides you with is a guarantee there are no such loopholes; under every situation, the code will behave exactly like its specification.” This guarantees hackers cannot exploit the operating system, he says.
In addition to verification, what sets CertiKOS apart from other operating systems is its extensible kernel, meaning it can easily be customized and adapted for many different settings so more features can be added, Shao says. CertiKOS can also run multiple threads (small sequences of programmed instructions) simultaneously on multiple central processing unit (CPU) cores, a capability known as concurrency.
The major questions facing CertiKOS and systems like it have to do with “semantics engineering,” the process of defining specifications and proof methodologies so as to minimize the cost of revalidation, observes Suresh Jagannathan, a computer science professor at Purdue University.
“I don’t consider this a limitation of these systems as much as an aspect of formal verification that will likely be increasingly important as we gain more experience with verification tools and proof assistants, and as we achieve more success in using these tools for verifying realistic systems,” he says. It will be critical to determine what processes and methodologies need to be adopted to make proofs robust as specifications and implementations evolve, Jagannathan adds.
DeepSpec vs. Other Principles
The UW team uses DeepSpec principles to check the most critical, heavy-duty components of the radiotherapy system. To assess the parts of the system that are less critical, the team uses “lighter-weight, less-powerful techniques to ensure the correctness, so the guarantees for those parts aren’t as strong, but it’s a better engineering trade-off,” Tatlock says. That’s because the DeepSpec-style techniques typically require highly trained humans to prove that components function correctly, he says. “They take a lot of effort in that style, but you’re rewarded by having a much stronger guarantee.”
This raises the question: if DeepSpec techniques can provide absolute, iron-clad verification guarantees, why not use them all the time to avoid crashes and bugs in all types of systems? The reason, says Tatlock, is cost. The proof system used by DeepSpec and by the UW medical center radiotherapy team for the most critical components is too expensive to apply to every component, because it requires an expert to sit at a computer and type out proofs the computer can check, for every version of the system.
This can be time-consuming and inefficient. “Ideally, if you proved one version and made a small change, you’d only make a small change to the proof, but that’s not the way it works,” Tatlock says. Whenever even a small change is made to software code, “it can have very large consequences,” resulting in a big change in the proof. It might change some fact that’s relied upon throughout the rest of the proof, he adds.
Other techniques are less powerful, like bounded model checking, a form of exhaustive testing, Tatlock says. This involves taking some component and showing that every possible execution up to some bound is correct. “So I might make sure for the logging system, there’s no execution within 100 steps that ever crashes. I get that by automatically testing every execution up to 100 steps.” Yet, he reiterates, the guarantee isn’t as strong; bounded model checking up to 100 steps tells you nothing about the 101st step.
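A minimal sketch of what such a check can look like, in the style of the bounded model checker CBMC (the rotating log buffer below is hypothetical, not the UW team’s code):

```c
/* Hypothetical harness for a bounded model checker such as CBMC.
   Run with a bound covering the loop (e.g., cbmc harness.c
   --unwind 101), the tool explores every possible execution of up
   to 100 steps and checks the assertion on all of them. */
#include <assert.h>

int nondet_int(void);               /* CBMC: an arbitrary input value */
void __CPROVER_assume(_Bool cond);  /* CBMC: restrict the search      */

#define BUF_SIZE 128

int main(void) {
    int used = 0;
    for (int step = 0; step < 100; step++) {   /* the bound: 100 steps */
        int len = nondet_int();                /* any message length   */
        __CPROVER_assume(0 <= len && len <= 16);
        if (used + len > BUF_SIZE)
            used = 0;                          /* rotate the log       */
        used += len;
        assert(used <= BUF_SIZE);  /* never overflows within the bound */
    }
    return 0;
}
```

If the assertion can fail within the bound, a tool of this kind reports a concrete execution trace that triggers the failure; if not, nothing is learned about longer executions.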
Tatlock and his colleagues have built a suite of tools the engineers use in their regular development process. They include a checker that allows the engineers to formally describe the entire radiotherapy system to a computer and ensure the key components are individually correct. The researchers are now working on building verified replacements for those parts of the system. When all those individually verified components are put together, he says, the result is essentially an assurance of top-level safety.
The radiotherapy system is checked daily because “we want to make sure the code written by the engineers on that team will correctly turn off the beam if anything goes wrong,” Tatlock says. The work is similar to DeepSpec’s; it just emphasizes a different degree of automation.
While CertiKOS prevents one application from incorrectly reading the memory of another, the UW team does not use it because CertiKOS is a traditional Unix-style OS meant for running Unix-style apps, while most components of the radiotherapy system are embedded systems that run code directly on the hardware, he says.
Like Tatlock, Appel and Shao stress that only certain types of critical software need to be verified. Already, Appel points out, some verification tools are commercially viable. For example, CompCert, an optimizing C compiler developed in France, is being evaluated by Airbus and European certification agencies for use in compiling the fly-by-wire software that flies Airbus jetliners.
“Compared with the compiler Airbus currently uses, CompCert has the advantage of being proved correct, no matter what program is compiled,” he says.
Other aerospace organizations are also starting to use such tools. Formal verification tools “have been shown to be effective at finding defects in safety-critical digital systems including avionics systems,” according to a 2017 report from the U.S. National Aeronautics and Space Administration’s (NASA) Langley Research Center. The U.S. Federal Aviation Administration (FAA) has also issued guidelines that allow formal methods tools to replace some of the methods it uses for “certification credit.”
NASA acknowledges this will be a slow process, noting, “There are still many issues that must be addressed before formal verification tools can be injected into the design process for digital avionics systems.” The report points out, “Most developers of avionics systems are unfamiliar with even the basic notions of formal verification, much less which tools are most appropriate for different problem domains,” and skills and expertise will be needed to master the tools effectively.
The U.S. Defense Advanced Research Projects Agency (DARPA) has developed a program called High-Assurance Cyber Military Systems that takes a “clean-slate, formal methods-based approach to enable semi-automated code synthesis from executable, formal specifications.” The approach ensures a hacker-proof system and “illustrates that we’re now at a point where such systems can be deployed in truly mission-critical environments,” notes Jagannathan.
In recent years, the auto industry has become aware of how vulnerable cars are to hacking, and “they are eagerly looking for solutions from security researchers and verification researchers,” Appel says. He believes in the next decade there will be more industries using software verification, and more software available for purchase that has been verified.
For software verification to become widespread, however, there must be trusted compilers to translate formally verified programs, since those programs are written in high-level languages and their guarantees must carry down to the machine code that actually runs. Progress is being made, and when trusted compilers become widely available, hacker-proof software may no longer be a wistful notion, but a concrete reality.
Further Reading

Shao, Z.
Certified Software, Communications of the ACM 53, 12 (December 2010), 56–66.

Serna M., E., and Morales V., D.
State of the Art in the Research of Formal Verification, Ingeniería, Investigación y Tecnología 15, 4 (Oct.–Dec. 2014), 615–623.

D’Silva, V., Kroening, D., and Weissenbacher, G.
A Survey of Automated Techniques for Formal Software Verification, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 27, 7 (July 2008).

Souyris, J., Wiels, V., Delmas, D., and Delseny, H.
Formal Verification of Avionics Software Products, FM 2009: Formal Methods, Second World Congress, Eindhoven, The Netherlands, Nov. 2–6, 2009, Proceedings, 532–546.

Hoare, T.
The Verifying Compiler: A Grand Challenge for Computing Research. In Hedin, G. (Ed.), Compiler Construction, CC 2003, Lecture Notes in Computer Science, vol. 2622, Springer, Berlin, Heidelberg, 2003.