During the 1980s, a great deal of emphasis was placed on software productivity improvements. New technologies were explored and advocated that were claimed to give “orders of magnitude” improvements. During the 1990s, the emphasis has swung to software quality. Again, new technologies have been explored and advocated to give similar improvements.
Yet there is little evidence of dramatic change in either the productivity of software developers or the quality of the products they produce. A recent study by Ed Yourdon and Howard Rubin reports, in fact, a 13% decline in productivity in recent years, and a substantial (but not dramatic) 75% quality increase over the 12 years since 1985. What happened to these projected improvements?
A careful examination of the claims of productivity and quality improvements shows a common thread running through them—that software development is sufficiently simple and well understood for dramatic improvements to be possible. But that belief is in stark contrast with the views of some of the top people in software engineering, like David L. Parnas and Frederick P. Brooks, who have said “software is hard” [to build], and “software is the most complex task ever undertaken by humanity,” and Tom DeMarco, who likens the breakthrough claims to laetrile, a “cruel fraud.” How can two camps with such divergent beliefs exist in the same discipline?
The purpose of this article is to investigate that question by identifying several specific productivity and quality improvements and exploring (via the research literature) what findings and data exist to support their value.
This type of study has not occurred in the literature before. There has been material in the literature that discusses ways of improving the software research process, by authors like Basili and Fenton. However, the goal of this article is to focus on the practical issue “What does research tell us about the value of new technologies?” rather than the research issue “How can we improve research in order to learn more about the value of new technologies?”
The “Improvements”
To explore the value of new technologies we will look at the literature investigating several of those claimed to have significant benefits:
- Structured Techniques
- Fourth Generation Languages (4GLs)
- Computer Aided Software Engineering (CASE)
- Formal Methods
- Cleanroom Methodology
- Process Models
- Object-Orientation
Structured techniques are defined as the use of structured analysis, structured design, structured programming, and/or any of a number of other techniques labeled as “structured.”
Although there is some disagreement in the literature about what the term fourth generation languages means (some say what distinguishes these languages is that they are nonprocedural), here we take the position that a 4GL is a higher-level, problem-focused language. Most contemporary 4GLs are intended for the database/report generation application domain.
By Computer Aided Software Engineering we mean any automated tool that supports the software person in the process of building or maintaining software. Most contemporary CASE tools support systems analysis and design, but we do not limit our definition to that.
Formal methods is a term that has been subject to a wide variety of definitions. Here we take the fairly narrow but traditional computer science view that it applies primarily to formal specification and formal verification.
Cleanroom methodology’s purpose is to remove errors from software. What distinguishes the approach is that programmers do no testing; instead, they do formal verification. Testing is performed only by independent testers, and it is based on “statistical” approaches.
Process models are descriptions of the appropriate process to be used for building software. The most famous process model is the Capability Maturity Model of the Software Engineering Institute at Carnegie Mellon University.
Object-orientation is the methodology that focuses software problem solving on the objects inherent in the problem to be solved, and on the generation of solution objects that address those problem objects.
Note that significant benefits have been claimed for other technologies, such as reuse or TQM. However, publication constraints preclude complete coverage of all such technologies in this article. The technologies chosen here are those for which benefit claims have been the most prominent and persistent.
The Findings
Here the relevant findings for each of the improvements listed previously will be considered and discussed.
Structured techniques. The earliest improvement touted as a breakthrough solution for software problems was the structured techniques. In the early 1970s, the techniques were advocated in a series of documents from IBM. Nearly all software professionals and students have by now been exposed to these techniques. Given the length of time that the techniques have been used in both practice and academia, one would expect numerous research findings identifying the benefits of the approach.
Those expectations are in fact false. Ten years after their first usage, a research study [12] reviewed the literature on the techniques and found “equivocal” results, and no solid benefit data, to support the use of the approach. It is important to note that the findings did not suggest that the structured techniques had no value; they found only that that value had never been determined.
That study, although it happened 10 years after the onset of the structured “revolution,” was published nearly 15 years ago. Have the intervening years been any more productive in evaluating the benefits of the structured techniques?
The answer is no. One recent study found “modest-measurable, but not overwhelming” advantages of structured design over more informal design approaches [7]. There have been few other objective attempts to evaluate these approaches.
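To make the structured approach concrete, here is a minimal sketch of its core idea, top-down stepwise refinement into single-entry/single-exit routines. All names and data are hypothetical, invented for illustration only:

```python
# Illustrative sketch of structured decomposition: one routine per step,
# each with a single entry and a single exit, composed top-down.
# The domain (an order summary) and all names are hypothetical.

def select_valid_orders(orders):
    # Step 1: filter out invalid records (single entry, single exit).
    return [o for o in orders if o["amount"] > 0]

def compute_total(orders):
    # Step 2: aggregate the amounts (single entry, single exit).
    return sum(o["amount"] for o in orders)

def summarize_orders(orders):
    """Top-level routine: compose the named steps into a (count, total) summary."""
    valid = select_valid_orders(orders)
    return len(valid), compute_total(valid)

print(summarize_orders([{"amount": 10}, {"amount": -5}, {"amount": 7}]))
```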
4GL. In striking contrast with the dearth of evaluative studies of the structured techniques, there has been some interesting research into the value of 4GLs. Three studies conducted in the late-1980s found specific benefits and costs regarding the use of 4GLs vs. traditional third generation languages, such as Cobol.
The studies, by Misra and Jalics, Matos and Jalics, and by Verner and Tate, summarized and referenced in [1], report the following results:
- Productivity of 4GLs vs. Cobol: in one study, 4GL source code was 29%–39% shorter (in lines) than Cobol, and the development process was 15% faster to 90% slower(!). In another, the 4GL programs were 11–22 times smaller. In the third, the 4GL required 4–5 times less effort. In short, there are productivity benefits to 4GLs, but they exhibit enormous variability.
- Performance of 4GLs vs. Cobol: in one study, the 4GL code was 15–174 times slower. In another, the 4GL was 6 times faster to 4 times slower. Once again, there are in general performance penalties for the use of 4GLs, but it is difficult to predict what they will be.
According to the researchers, these differences are due to the variability in the capability of the 4GLs. Whereas Cobol can be used to solve any business application problem, a 4GL may or may not be capable of solving the problem at hand, and if it can, it may or may not provide the desired productivity benefits (and there is a price to be paid in performance penalties).
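The flavor of the 4GL/3GL contrast can be sketched with SQL standing in for a declarative, problem-focused 4GL and an explicit loop standing in for Cobol-style procedural code. The table and data below are invented for illustration:

```python
# A rough sketch of the 4GL vs. 3GL contrast. The SQL query states WHAT the
# report should contain (4GL style); the loop spells out HOW to build the
# same report by hand (3GL style). Table and data are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 100), ("west", 50), ("east", 25)])

# "4GL" style: one declarative statement produces the whole report.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

# "3GL" style: the mechanics of grouping and summing, written out manually.
totals = {}
for region, amount in conn.execute("SELECT region, amount FROM sales"):
    totals[region] = totals.get(region, 0) + amount
manual = sorted(totals.items())

assert rows == manual  # both yield the same report
print(rows)
```

The brevity gap between the two styles mirrors the code-size findings above; the performance gap in the studies arises because the declarative layer hides the mechanics from the programmer.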
CASE. Computer-aided software engineering, once lauded as the technology that would automate our ability to build software, has more recently been denigrated by software experts. It isn’t just that most CASE tools became “shelfware,” unused by practitioners; it is more a matter of backlash when the claimed benefits turned out to be unachievable.
Thus, for CASE tools especially, the issue of evaluative research becomes vital. What do we really know about this highly-touted, much-maligned technology?
There are few evaluative research studies to help answer those questions. One, conducted by Lauber and Lempp and reported in [1], found these costs and benefits of CASE:
- An increase in the costs at the front end (requirements and design specification) of the life cycle.
- A decrease in the back end (implementation and checkout) costs.
- A net 9% savings.
A second evaluative study, reported in [5], found CASE productivity benefits to be 40% on one project and 128% on another.
A different but also interesting collection of numbers emerges from Myers [9]. This study, an opinion survey of CASE users and their managers, shows domain differences in CASE effectiveness:
- Information systems applications averaged 10% productivity benefits, with a maximum of 25%.
- Scientific applications averaged 10%, with a maximum of 17%.
- Real-time applications averaged 9%, with a maximum of 15%.
Once again, we find a variety of studies with somewhat contradictory findings.
Formal methods. Perhaps no improvement in software engineering is subject to more controversy than formal methods. This collection of techniques has been studied by computer scientists since the late 1960s, has been at the core of computer science curricula since the late 1980s, has been mandated into law for certain application domains in England…and yet remains little used. Why?
One problem is that formal methods remain an underdefined, underevaluated concept. Most definitions of the term match our definition earlier in this article—that formal methods consist of formal specification and formal verification. But a surprising number of other techniques creep into discussions of the topic, including languages for specifying programming language syntax and semantics, and even hardware verification techniques. (One paper on the topic defined formal methods as any “predictable, routinized, rigorous practice.”)
Perhaps because of this diversity of definitions, the benefits of formal methods—for all the controversy surrounding the topic—have been the subject of very few research studies.
The research literature contains only one formal methods study that has produced any hard numbers. In that study [11], the researchers found a 9% improvement in total development cost for a product on which formal specifications were used.
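As a concrete illustration of what a formal specification looks like, here is a minimal sketch of a sort routine's contract, expressed as an executable postcondition. Genuine formal verification would prove this property for all inputs; checking it on sample data, as below, only illustrates what the specification states. The function names are hypothetical:

```python
# Sketch of a formal specification for sorting: the output must be ordered
# and must be a permutation of the input. This states the contract precisely;
# formal verification would prove it holds for every possible input.

def meets_sort_spec(inp, out):
    """Postcondition: out is ordered and is a permutation of inp."""
    ordered = all(out[i] <= out[i + 1] for i in range(len(out) - 1))
    permutation = sorted(inp) == sorted(out)
    return ordered and permutation

xs = [3, 1, 2]
assert meets_sort_spec(xs, sorted(xs))          # a correct implementation
assert not meets_sort_spec(xs, [1, 2, 2])       # ordered, but not a permutation
```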
Cleanroom. Cleanroom, as we noted earlier, is an error-removal methodology. It proposes a radical set of changes to traditional error-removal techniques, especially in the areas of (a) decoupling the programmer from any testing activity, (b) requiring that the programmer do formal verification, and (c) requiring that all testing be “statistical” in nature (statistical testing uses a randomized test-case generator that produces test cases tailored to the “operational profile,” or normal usage, of the product). Another component of Cleanroom, the use of an independent test group for testing, is less radical.
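The statistical-testing idea can be sketched simply: draw test cases at random, weighted by the operational profile, so that the test load mirrors expected field usage. The profile and operation names below are hypothetical, invented for illustration:

```python
# Hedged sketch of Cleanroom-style "statistical" testing: a randomized
# test-case generator weighted by a hypothetical operational profile, so
# frequent operations in the field are frequent in testing too.
import random

operational_profile = {   # hypothetical normal-usage frequencies
    "query":  0.70,
    "update": 0.25,
    "delete": 0.05,
}

def generate_test_cases(n, seed=0):
    """Draw n operations at random, weighted by the operational profile."""
    rng = random.Random(seed)   # fixed seed makes the test run reproducible
    ops = list(operational_profile)
    weights = [operational_profile[op] for op in ops]
    return rng.choices(ops, weights=weights, k=n)

cases = generate_test_cases(1000)
print({op: cases.count(op) for op in operational_profile})
```

Because testing mirrors usage, reliability estimates drawn from such test runs speak to the failures users would actually encounter, which is the rationale the Cleanroom literature gives for the approach.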
A well-planned series of evaluative research studies has been conducted into Cleanroom’s value. A summary of those studies, with references to them, is presented in [2]. These studies are a model for the kind of evaluative research that could be conducted in the software field. The first study used students who were building software-in-the-small. The second study used industry lab people, a somewhat elite group, who were building something less than software-in-the-large. But the third and final study involved typical practitioner subjects working on an actual production software project. Thus an ever-more relevant series of studies was conducted in order to be able to measure the value of the technology in its intended setting.
The findings were, in some cases, spectacular. 91% of the errors were typically removed from the software product before the first test case was run. Computer time usage decreased by 70%–90%. Time spent fixing errors in rework was reduced by 95%. Productivity, at least in the first two studies, was improved by 70% [4].
But a closer analysis of these benefits is important. In the large project setting the productivity benefits of Cleanroom vanished (the other benefits did not). Further, the project software people disliked the approach. In the second and third studies, in which it would have been difficult to find study subjects familiar with formal verification, “rigorous inspection” by the programmers was allowed as a substitute.
Looking at the findings in this light, we see that:
- The most important part of the technique is the static error removal, since 91% of the errors were removed before any testing was begun, and it is through these static techniques that computer time usage was reduced.
- The static error removal technique of “rigorous inspection” was equally as effective as formal verification.
The “spectacular benefits” of Cleanroom, in other words, were derived not from the use of the full methodology but from rigorous inspection techniques, which are not really part of the methodology! That is not a new finding—numerous other studies of inspection techniques have found them to be capable of removing over 90% of software errors, at a lower cost than competing techniques. This is an important finding, but not one that supports using Cleanroom methodology as such.
Thus Cleanroom is an improvement with good solid data about its benefits, but for which interpretation of that data is vital.
Process Models. The best known process model, a model of the maturity of an organization’s software process, is the Software Engineering Institute’s (SEI) five-level Capability Maturity Model (CMM). What do we know about the benefits of the SEI’s CMM?
Even the strongest advocates of the SEI approach acknowledge that there is not a good answer to this question (no comprehensive SEI evaluative study has yet been conducted). The CMM has evolved from the knowledge of some veteran software managers, assisted by consultation from some of the best academic software minds in the country.
There are two published studies that explore the benefits of using the CMM. In Putnam [10] and Herbsleb et al. [6] we find some hard numbers. Companies that had moved from CMM level 1 to level 3 were examined in Putnam [10]. It was found that in doing so they had achieved the following benefits: schedule time was reduced by a factor of 1.7; peak staff needs were reduced by a factor of 3.2; and effort was reduced by a factor of 5.7. These are impressive numbers indeed. They are much larger than the comparable numbers for any of the techniques previously addressed in this article.
The article by Herbsleb et al. [6] looked at 13 companies that instituted CMM-based Software Process Improvement programs. It found productivity gains of 9%–67% per year (median 35%); and defect detection gains of 6%–25% per year (median 22%). These are, in general, considerably less impressive improvements than those identified in the previous study. Obviously, as with the findings regarding 4GLs, the CMM data reflects a consistent pattern of improvement using the technique, but enough variation in the findings to warrant more evaluative research studies.
Object-Orientation. Here again we have a fairly radical improvement, one that necessitates serious changes in how practitioners build software. The object-oriented approach, to think about a problem and its solution in terms of objects rather than the more traditional functional or data approaches, is radical indeed.
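The shift in thinking can be sketched briefly: the problem's inherent objects are modeled directly, with behavior attached to the data it affects, rather than decomposed into functions over plain data. The domain (a tiny library) and all names below are hypothetical:

```python
# Illustrative only: "problem objects" (a book) and "solution objects"
# (a library) modeled directly, with behavior bundled alongside state.
# Domain and names are invented for this sketch.

class Book:
    """A problem object: state and the behavior that changes it live together."""
    def __init__(self, title):
        self.title = title
        self.on_loan = False

    def check_out(self):
        self.on_loan = True

class Library:
    """A solution object that addresses the problem objects it holds."""
    def __init__(self, books):
        self.books = {b.title: b for b in books}

    def lend(self, title):
        book = self.books[title]
        book.check_out()   # delegate to the object that owns the state
        return book

lib = Library([Book("The Mythical Man-Month")])
assert lib.lend("The Mythical Man-Month").on_loan
```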
The OO approach has been the subject of long-term evaluative research by the NASA/Goddard Software Engineering Laboratory (SEL). The SEL is a government/industry/academic collaboration of NASA, Computer Sciences Corporation, and the University of Maryland Computer Science Department; the SEL conducted the Cleanroom studies described earlier. The SEL study concluded that OO is “the most influential methodology we have studied to date” [8].
But there is a problem with that finding. The SEL researchers, in studying OO approaches, coupled them with the use of the Ada programming language, and with a heavy emphasis on reuse. The finding is therefore about OO+Ada+reuse, not OO alone.
Some attempts to examine OO in a disaggregated form have been made. For example, there are two studies described in [3] on the claimed OO benefit of “naturalness,” the belief that problem solving via objects is more natural than traditional approaches. These studies both find that OO is not a natural approach; the functional approach appears to be much more effective, and the OO approach more problematic.
Conclusions
With respect to individual improvements, the conclusions are relatively easy. A brief summary of these findings is presented numerically in Table 1, and as discussion in what follows:
- Structured Techniques: Surprisingly few research findings.
- 4GL: Some good, but conflicting, research findings. The potential benefits, however, are at or near the top of the improvements explored here.
- CASE: A few research findings, mostly from practitioner surveys; modest productivity benefits, quality benefits to be determined.
- Formal Methods: Surprisingly unevaluated, relatively small benefits found to date.
- Cleanroom Methods: Excellent research data available showing spectacular benefits but subject to considerable interpretation. Rigorous inspection has the most support for its value, and the value appears to be significant.
- Process Models: Little evaluative data. What exists is encouraging (some very large improvement figures). Heavy use in practice.
- Object-Orientation: Early findings are promising, but with less supportive peripheral results.
Are there any discernible trends or conclusions that can be drawn from these findings? Most importantly, we need to examine the few numbers that do exist. Productivity and quality benefits, as seen from the studies discussed in this article, are typically in the range of 5%–70%, with the average in the vicinity of 20%–25%. (Interestingly, these figures are supported in a book by Robert Grady, which lists benefits ranging from 2%–35%.) Evaluative research studies into the productivity (or less often, quality) benefits of these improvements show modest but supportable benefits. There are a few outlying numbers, such as the 400%–500% benefits for 4GLs in one study, and the 170%–570% benefits for the CMM in another, but in general the initial findings show that the early excitement over the benefits of these technology improvements has probably been misplaced. This finding will not surprise those who have been sounding the alarm about excessive hype in our field, but it may—and should—surprise those who have been promulgating a belief in “breakthroughs.”
There is one additional finding worth highlighting here. In the study of CASE tools, one survey described the learning curve for the technology (see Figure 1). It is important to note that this kind of learning curve is probably valid for any new technology. There is an initial loss of productivity, then a slow improvement, reaching a maximum after some period of time. The scales will differ depending on the improvement involved, but the basic curve will remain the same. The importance of the curve is twofold: (1) practitioners exploring the use of new technology must be tolerant of the learning curve, and not expect immediate benefits; and (2) researchers exploring the merits of new technology must perform their evaluations in such a way that learning curve effects are minimized.
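The shape of that learning curve can be modeled simply: productivity starts below the old baseline, then recovers toward a higher plateau. The functional form and all parameter values below are assumptions chosen purely to illustrate the shape, not data from any study:

```python
# Hypothetical model of the new-technology learning curve: an initial dip
# below the old baseline, then exponential recovery toward a higher plateau.
# All parameters (dip, plateau, time constant) are invented for illustration.
import math

def productivity(month, dip=0.7, plateau=1.25, tau=6.0):
    """Relative productivity at a given month; 1.0 is the pre-adoption baseline."""
    return plateau - (plateau - dip) * math.exp(-month / tau)

curve = [round(productivity(m), 2) for m in range(0, 25, 6)]
print(curve)  # begins below the 1.0 baseline, climbs toward the plateau
```

Under this model, point (1) above corresponds to the early months spent below 1.0, and point (2) to evaluating only after the curve has neared its plateau.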
The findings of this article present a few glimmers of evaluative research light in an otherwise dark universe. Those bits of light suggest to us that most improvements to the software process offer modest advantages over traditional alternatives. They may not offer the breakthroughs that the field has so long sought—after all, both Brooks and Parnas warned us years ago about the shortage of “silver bullets”—but they show that we have at least been on the right track for a long time.