On August 5, 2012, 10:18 P.M. PST, a large rover named Curiosity made a soft landing on the surface of Mars. Given the one-way light-time to Mars, the controllers on Earth learned about the successful touchdown 14 minutes later, at 10:32 P.M. PST. As can be expected, all functions on the rover, and on the spacecraft that brought it to its destination 350 million miles from Earth, are controlled by software. This article discusses some of the precautions the JPL flight software team took to improve its reliability.
To begin the journey to Mars you need a launch vehicle with enough thrust to escape Earth's gravity. On Earth, Curiosity weighed 900 kg. It weighs no more than 337.5 kg on Mars because Mars is smaller than Earth. Curiosity began its trip atop a large Atlas V 541 rocket, which, together with fuel and all other parts needed for the trip, brought the total launch weight to a whopping 531,000 kg, or 590 times the weight of the rover alone.
The accompanying video of Gerard Holzmann was recorded at HotDep '12 (https://www.usenix.org/conference/hotdep12).
The article points out the difficulties in writing correct multithreaded code, because of the ever-present risks of race conditions and other subtle concurrency flaws. It describes how critical MSL software components were checked for race conditions via the use of logical assertions in the code and a powerful model checker. Such methods are the state of the art for safety-critical software and software running on spacecraft, in which requirements for software quality are extreme. The difficulty of scaling up such methods is hinted at by the fact that these methods were used only for critical software components and by the use of different methods for subsystems of different sizes.
Static code analysis tools, referred to elsewhere in the article, can be used to automatically find race conditions and other concurrency flaws without requiring the use of assertions. Their accuracy is not quite as high as methods involving model checking, but they can be used routinely by ordinary software engineers and they scale to million-line codebases. We have developed a static analysis tool that focuses specifically on concurrency defects in Java, called ThreadSafe (www.contemplateltd.com/threadsafe), and can wholeheartedly appreciate the difficulty of finding such flaws. Our experience with users is that concurrency defects are as much of a problem in commercial software as in systems like the one described, and tool support is desperately needed to avoid the risk of failures.
The following letter was published in the Letters to the Editor in the April 2014 CACM (http://cacm.acm.org/magazines/2014/4/173227).
Gerard J. Holzmann's article "Mars Code" (Feb. 2014) demonstrated a nonblocking implementation of concurrent double-ended queues, or deques,1 (previously shown to be incorrect by Doherty2) to not work through an application of Holzmann's own Spin model checker. However, the demonstration seemed too shallow. Assuming the writer process of the test driver is allowed to pushRight(0) and pushRight(1) before the reader process gets a chance to run, then the value of rv returned by the first succeeding popRight() in the reader process would definitely not be 0 and the assert(rv==i) would fail because i is 0; that is, the test driver is incorrect and could fail, even with a correct implementation of concurrent deques.
Holzmann's demonstration included a description of a failing run of his test driver exercising the concurrent deque implementation. Even though Holzmann clearly left out some details—the initialize function and complete output of the model checker—the failing run he described was definitely much simpler than the failure described by Doherty,2 indicating the failure detected was in the test driver, not in the concurrent deque implementation.
Naur is correct, and I thank him for his keen observation. The test driver used for the example was flawed. If we make the required changes we can show it takes minimally three processes to expose the bug in the original DCAS algorithm,1 as was also shown in Doherty.2 A corrected version of the example, with the model-checking result, is available from the author.
Gerard J. Holzmann
1. Detlefs, D.L. et al. Even better DCAS-based concurrent deques. In Distributed Algorithms, LNCS Vol. 1914, M. Herlihy, Ed. Springer-Verlag, Heidelberg, Germany, 2000, 59–73.
2. Doherty, S. Modelling and Verifying Non-blocking Algorithms That Use Dynamically Allocated Memory. Master's thesis, Victoria University, Wellington, New Zealand, 2004.
The article mentions that "In 145 code reviews held between 2008 and 2012 for MSL flight software, approximately 10,000 peer comments and 30,000 tool-generated reports were discussed"
This seems like a low number of code reviews for 4 years. Is this figure correct? If so, is there a reason for not doing code reviews more often?
Yaki Tebeka: each software module was peer reviewed at least once, and some more than once. (There were about 120 such modules -- one for each VxWorks task.)
So 145 code reviews is roughly what would be expected for this code. Note also that the peer code review was meant to be the final step before a code freeze for launch.
It would be good to do more frequent code reviews of course, but the time schedule for a mission like this is typically very tight.
A small point: a kilogram (kg) is a unit of mass, not weight, and the mass of Curiosity is the same on Earth and Mars or anywhere else in the universe. The weight is the product of the mass and the local acceleration due to free-fall. Weight is a force, and has units such as kg.m/s^2; this is what changes from Earth to Mars.
Displaying all 6 comments