How Much Software Testing Is Enough?

Investing in a large amount of software testing can be difficult to justify, particularly for a startup company. For every line of production code written, an organization should, at minimum, invest an equivalent amount of developer time and lines of test code. The extra effort means that features can take longer to develop and deliver to customers. With the constant pressure of "Deliver Now!", it is very easy to skimp on testing in order to launch sooner. The real difficulty is that most developers are good enough that minimal testing is all they do: they confirm that their software works as expected, deploy it, and move on.

Companies can actually develop software like this for a long time. However, once the software grows beyond a basic level of complexity, the bugs that creep in through regressions or untested use cases make the application unstable. At this point the company must either (a) stop development and add the regression tests it failed to write earlier, or (b) continue a bad pattern in which a team of software testers chases regression bugs and writes test suites for the previous version of the software, while other developers concurrently create the next set of features (and bugs). Both patterns are flawed because fixing the problem this way takes longer than writing the tests continuously would have.

Test-driven development (an Extreme Programming practice) is arguably one of the best ways to ensure that software always has a test harness around it. The basic method is to write the tests first, watch them fail, and then write the code that makes them pass. This helps ensure that every method has at least one test case, written by the same developer who wrote the method. Developing the test harness concurrently with the software places the responsibility for testing on the developer who created the feature. The company saves overall development time because the tests are written by the people who know best what needs to be tested, and because the software can be tested automatically on every source code commit, which allows deployment on demand.
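A minimal sketch of the write-the-test-first cycle, in Python with the standard `unittest` module. The `format_price` function and its expected behavior are hypothetical, chosen only to show the order of work; nothing in this article prescribes them.

```python
import unittest


# Step 1 (red): write the tests before the code they exercise.
# Run at this point, every test fails with a NameError because
# format_price does not exist yet.
class FormatPriceTest(unittest.TestCase):
    def test_whole_dollars(self):
        self.assertEqual(format_price(1500), "$15.00")

    def test_cents_are_zero_padded(self):
        self.assertEqual(format_price(1505), "$15.05")

    def test_negative_amounts_are_rejected(self):
        with self.assertRaises(ValueError):
            format_price(-1)


# Step 2 (green): write just enough code to make the tests above pass.
def format_price(cents: int) -> str:
    """Format an integer number of cents as a dollar string, e.g. 1505 -> "$15.05"."""
    if cents < 0:
        raise ValueError("price cannot be negative")
    dollars, remainder = divmod(cents, 100)
    return f"${dollars}.{remainder:02d}"
```

Running this under Python's built-in `unittest` runner exercises all three cases. Each new method would get its tests written the same way, before its implementation, so the harness grows alongside the code.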

This leaves one critical hole in the testing process: how good is the test suite the developer created? This is the point where I put on a pragmatist’s hat. If your organization already has the discipline to test every method of your software, it is probably enough to ask developers to test just the "basic" behavior and to extend the test suite whenever a new bug emerges. The purpose of the test harness is to verify that the software works under its known assumptions; re-testing those assumptions on every check-in and deployment builds confidence that you are deploying correct software. The most dramatic bugs I have seen (with or without a test harness) have generally been caused by unanticipated events, and testing against the unknowable is difficult.

My favorite story about an unanticipated bug that a test harness would have helped with comes from my early tenure at Amazon. It was a bug I affectionately call "Karmic Revenge." The site was crashing on a subset of Amazon’s book catalog, disturbingly often on the search results page, and I was called in to identify the bug. (For the coders in the audience: I discovered that a data structure we were using was indexing an array at offset [-1], which caused the software to crash.) The catalog software had recently changed to use -1 as a flag meaning that no data was available. Unfortunately, this knowledge had not propagated to the search software. The "Karmic Revenge" was that the book exhibiting the problem was about "Memory Management in C." Additionally, for the superstitious, the bug was identified, debugged, and fixed on Friday, February 13, 1998. Some bugs you just can’t forget.

Had there been a test harness in place, perhaps this bug would never have made it to the production site. And if it had reached the site anyway, a new test could have been added to the harness once the bug was found, preventing future occurrences. However, that structure existed neither in the code nor at the organizational level. Better development practices will always reduce the likelihood of an error like this occurring and recurring.
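To make the failure concrete, here is a hedged Python sketch of the kind of regression test described above. The article does not show the original search code or its data layout, so `NO_DATA`, `record_for`, and the offset table are all hypothetical; the sketch only assumes, as in the story, that an offset of -1 flags an item with no data.

```python
import unittest
from typing import Optional, Sequence

# Hypothetical flag mirroring the catalog change described above:
# an offset of -1 means "no data available for this item".
NO_DATA = -1


def record_for(offsets: Sequence[int], records: Sequence[str], item: int) -> Optional[str]:
    """Return the record for catalog item `item`, or None if it has no data."""
    offset = offsets[item]
    if offset == NO_DATA:
        # Without this check the code would read records[-1]. In C that is an
        # out-of-bounds read (undefined behavior that can crash); in Python it
        # silently returns the last record, i.e. wrong data instead of a crash.
        return None
    return records[offset]


class NoDataFlagRegressionTest(unittest.TestCase):
    """Regression test added once the -1 flag bug was found, so it cannot return."""

    OFFSETS = [0, NO_DATA, 1]
    RECORDS = ["Record A", "Record B"]

    def test_item_without_data_returns_none(self):
        self.assertIsNone(record_for(self.OFFSETS, self.RECORDS, 1))

    def test_items_with_data_return_their_record(self):
        self.assertEqual(record_for(self.OFFSETS, self.RECORDS, 0), "Record A")
        self.assertEqual(record_for(self.OFFSETS, self.RECORDS, 2), "Record B")
```

The test checks the returned value rather than relying on a crash, so it would flag a missing sentinel check even where indexing at -1 fails silently. Once added, it runs on every check-in with the rest of the harness.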