In February, Apple revealed and fixed a Secure Sockets Layer (SSL) vulnerability that had gone undiscovered since the release of iOS 6.0 in September 2012. It left users vulnerable to man-in-the-middle attacks thanks to a short circuit in the SSL/TLS (Transport Layer Security) handshake algorithm introduced by the duplication of a goto statement. Since the discovery of this very serious bug, many people have written about potential causes. A close inspection of the code, however, reveals not only how a unit test could have been written to catch the bug, but also how to refactor the existing code to make the algorithm testable, as well as more clues to the nature of the error and the environment that produced it.
This article addresses five big questions about the SSL vulnerability: What was the bug (and why was it bad)? How did it happen (and how didn't it)? How could a test have caught it? Why didn't a test catch it? How can we fix the root cause?
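To make the shape of the bug concrete, here is a minimal sketch in C of how a duplicated goto can short-circuit a chain of checks, and how a small test can expose it. It is not Apple's code: `step_fn`, `verify_buggy`, `verify_fixed`, and `reports_all_failures` are hypothetical names invented for illustration, and the two "steps" merely stand in for the handshake's hash and verification calls.

```c
#include <assert.h>

/* Hypothetical stand-in for one handshake step (a hash update, the final
 * signature check, ...). Returns 0 on success, nonzero on failure.
 * These names are assumptions for illustration, not Apple's API. */
typedef int (*step_fn)(void);
typedef int (*verify_fn)(step_fn update, step_fn final_check);

static int step_ok(void)   { return 0; }
static int step_fail(void) { return -1; }

/* Buggy shape: the second `goto fail` is not guarded by the `if`, so it
 * always runs. When update() succeeds, err is still 0 at that point and
 * the function reports success without ever calling final_check(). */
static int verify_buggy(step_fn update, step_fn final_check)
{
    int err;

    if ((err = update()) != 0)
        goto fail;
        goto fail;                      /* duplicated line */
    if ((err = final_check()) != 0)     /* never reached */
        goto fail;

fail:
    return err;
}

/* Same logic with the duplicated line removed. */
static int verify_fixed(step_fn update, step_fn final_check)
{
    int err;

    if ((err = update()) != 0)
        goto fail;
    if ((err = final_check()) != 0)
        goto fail;

fail:
    return err;
}

/* Returns 1 if `verify` accepts good input and reports a failure injected
 * at each step; returns 0 otherwise. */
static int reports_all_failures(verify_fn verify)
{
    return verify(step_ok, step_ok) == 0
        && verify(step_fail, step_ok) != 0
        && verify(step_ok, step_fail) != 0;
}
```

Called from a test runner, `assert(reports_all_failures(verify_fixed))` passes, while the same assertion on `verify_buggy` aborts, because a failure injected into the final step is silently reported as success. Passing the steps in as function pointers is only one way to make each failure point reachable from a test; the point is that every exit path gets exercised with bogus input.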
I could not agree more about the unit testing and, if possible, agree even more about the evils of copy & paste coders.
However ... this bug would have been a shining red error in my Eclipse workspace since trivial static analysis would have detected it. There are better arguments for unit tests imho ...
@Peter Kriens: Part of the argument is that the bug could've taken another form, e.g. mismatched braces. Static analysis, while a very helpful tool that might've caught this one manifestation of the bug, likely wouldn't have caught mismatched braces. A unit test would have. A unit test such as the proof-of-concept test I wrote to accompany this article should be called for regardless, to ensure that the correct failure points are triggered given various flavors of bogus inputs. Then there's the design pressure to eliminate duplication, which I believe is directly responsible for the genesis of this bug.
Either way, it appears neither unit testing nor static analysis was brought to bear here, and possibly not even code review (or, at best, the diff was so big the reviewer was blind to the buried error). That points to the deeper cultural issue I raised, where subpar code quality is indicative of a development (and possibly corporate) culture that does not take its social responsibilities seriously enough, or at least had a dangerous lapse in this one case. Having most of the media bandwidth spent on talking about how hard it would've been to test for, or how the code was pretty good anyways, doesn't help solve the problem, as it's not just a technical problem. In addition, we need people to say something more than just "X would've caught it" or "goto is e-ville" or "Nobody should be using C anymore". None of these responses go far enough to help solve the problem.
What better arguments for unit testing are there? Seriously, if they exist, please share them. We need more concrete, compelling arguments that show the value of unit testing in particular, and code quality practices in general. Bear in mind that pure rational arguments often aren't effective on the other side of the chasm, to borrow the metaphor from Geoffrey Moore's "Crossing the Chasm". Such appeals get heads to nod among the like-minded Innovators and Early Adopters (described in Moore's book), but everyone else needs more meat, in terms of examples and, best of all, experience (hence the proof-of-concept test).
Part of my motivation in writing this was to inspire more people to share their concrete arguments and experiences, even if only internally in their own companies. Bugs such as these, where we can point clearly at both the code and the downstream implications, provide opportunities like no other to make the case that code quality matters, and that unit testing in particular is one of the best tools at our disposal for preventing a multitude of really straightforward coding errors early, while producing a host of beneficial second-order effects when done well. I also go into far greater depth in my "Goto Fail, Heartbleed, and Unit Testing Culture" article for Martin Fowler, linked from the final section of the article. (This article is seven pages in the printed CACM, including images; the other article, printed from Chrome, minus the acknowledgements at the end, is forty-seven.)