According to the U.S. Bureau of Labor Statistics, there were nearly three million people in the U.S. employed in the business of software as of May 2005. Software is arguably one of the principal sources of wealth in the modern world, and it is the mission-critical component of many of the most essential systems that support today’s society. Yet there is a surprising lack of solid data on how good we are at building the stuff.
Why We Don’t Count
There are many reasons for this lack of good information. Obviously, many companies are loath to make public how long their projects really take, or what the final software quality really is, especially if the projects are less than successful. Also, software is a very variable business: we employ a very wide range of development practices, methodologies, life cycles, technologies, and languages when we create software. These variables tend to make the data quite "noisy," which makes detailed causal analysis difficult and interpretation ambiguous. The noise also reduces the perceived value of the metrics, which discourages their further collection.
There are other reasons too. Software development has been an identifiable activity for around 50 years. The phrase "software engineering" has been used at least since the NATO Garmisch conference in 1968. But we don’t appear to have done a very good job of establishing professional engineering credentials or standards. The most significant factor is that, at its core, software is a knowledge repository. The knowledge that is stored in software largely relates to the application domain, and not to software as an entity itself. The purpose of application software is to manage what the application does, rather than the somewhat self-referential activity of managing what the software does. Since the potential application domains are very varied, the software-stored application knowledge has very many possible forms and structures. Except in a few isolated areas, we have not fixed on a consistent ontology to describe these much beyond a syntactic level. So the structure and representational forms of "software knowledge" are more historical and conventional than they are essential and empirical.
Underlying this is the uncomfortable fact that, if software is truly a knowledge storage medium, our metrics would need to empirically measure the quantity and complexity of knowledge itself. Unfortunately, there is simply no way to measure and quantify knowledge. It is a problem that has engaged philosophers for over two millennia and there is no easy answer to it.
The philosophical and practical problems of software measurement aside, if our profession aspires to the title of "engineering," we must get better at measurement.
Thankfully, there are people doing this. The results are usually interesting, sometimes puzzling, and occasionally startling.
The QSM Software Almanac
Quantitative Software Management (QSM) is a company that develops software project estimation and tracking tools that go by the name "SLIM" (short for Software LIfecycle Management). They also collect data from their customer base. QSM has recently published the 2006 IT Metrics edition of its software almanac.1 This column summarizes some of their findings.
The "Average" Project
QSM has been gathering data on software development since the late 1970s. Their 2006 software almanac is a study of findings from measurements of modern projects. The study covers 564 recent IT projects from 31 corporations in 16 industries in 16 countries. These projects were sampled from a database of over 7,000 projects going back nearly three decades. From the sample, a "typical" (median) project had the following characteristics:
- Mean project size was fewer than seven people, though the median project size was between one and three people (a skew illustrated in the short sketch after this list);
- Duration was less than eight months;
- The project required just under 58 staff months of effort;
- The primary language is still COBOL, though Java is gaining fast; and
- The median delivered code size is 9,200 LOC.
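The gap between the mean (just under seven people) and the median (one to three people) points to a strongly skewed sample: a handful of large projects pulls the average well above the typical case. The following is a minimal sketch using entirely hypothetical team sizes, not QSM's data, just to illustrate how that skew arises:

```python
import statistics

# Hypothetical team sizes (NOT QSM's data): mostly small projects
# plus a couple of large ones, giving a right-skewed sample.
team_sizes = [1, 1, 2, 2, 2, 3, 3, 3, 4, 5, 20, 37]

print("mean:  ", statistics.mean(team_sizes))    # ~6.9 people
print("median:", statistics.median(team_sizes))  # 3.0 people
```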
A few of the more interesting findings shown by the study are considered here.
More Life Cycle "Overlap"
In the period since 1999, we seem to be overlapping our project definition-analysis and analysis-construction phases a lot more than we had previously. The 50%–60% increase in overlap is probably due to the greater use of iterative life cycles, though we could postulate other causes. Interestingly, this overlap does not seem to vary with project size, which indicates we run small projects much the same way we run big ones. This is perhaps a little surprising, since small projects are usually considered better candidates for agile and iterative development.
Small is (Much, Much) Better
The effectiveness of large teams is seriously questioned by the data in this analysis. Large teams (29 people) create around six times as many defects as small teams (three people) and obviously burn through a lot more money. Yet the large team appears to produce about the same amount of output in only an average of 12 days less time. This is a truly astonishing finding, though it fits with my personal experience on projects over 35 years. This data also supports earlier observations and measurements made by QSM.
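A back-of-the-envelope calculation makes the trade-off explicit. The sketch below takes the team sizes, the roughly 6x defect ratio, and the 12-day schedule gap from the study, and assumes a purely hypothetical 150-working-day duration for the small team (the almanac gives medians, not a figure for this specific comparison):

```python
# Team sizes, the ~6x defect ratio, and the 12-day schedule gap come
# from the QSM comparison; the 150-working-day small-team duration is
# a hypothetical assumption for illustration only.
small_team, large_team = 3, 29
small_days = 150                    # hypothetical duration for the small team
large_days = small_days - 12        # the large team finishes ~12 days sooner

small_effort = small_team * small_days   # 450 person-days
large_effort = large_team * large_days   # 4,002 person-days

print(f"effort ratio: ~{large_effort / small_effort:.1f}x more person-days")  # ~8.9x
print("defect ratio: ~6x more defects (from the QSM data)")
```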
Best to Worst-in-Class
The "best in class" projects compared to the "worst in class" in this study tracked over the following ranges:
- Effort: 1/15 of the effort.
- Schedule: 1/5 of the duration.
- Team size: much smaller teams (only 9% had more than 10 people, versus 79% of the "worst in class" teams).
Not surprisingly, the best teams had better measurements, lower turnover, better application knowledge, and better management. The median delivered code size for the best- and worst-in-class projects was about the same. So, using delivered size as a measure of value, there are clear and huge differences between how good we can be and how bad we can be in this business.
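Treating those ranges as if they described a single pair of projects (which they do not, exactly), two derived ratios fall out of the figures above; a rough sketch:

```python
# Notional "worst in class" project expressed relative to a
# best-in-class baseline (1 unit of effort, 1 unit of schedule,
# same delivered LOC). Ratios only; no absolute QSM figures used.
effort_ratio   = 15   # ~15x the effort
schedule_ratio = 5    # ~5x the duration

avg_staffing_ratio = effort_ratio / schedule_ratio   # ~3x larger average team
cost_per_loc_ratio = effort_ratio                    # same delivered size, so ~15x cost/LOC

print(f"average staffing: ~{avg_staffing_ratio:.0f}x larger")
print(f"cost per delivered LOC: ~{cost_per_loc_ratio}x higher")
```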
Delivered Quality
One of the characteristics of bad projects is that they don’t tend to collect detailed data on just how bad they are. It is almost an axiom of software process and software metrics that the projects that are most likely to need them are the ones that are least likely to use them. For this reason, while there are plenty of anecdotes and opinions, QSM found it was not possible to empirically measure the difference between quality levels of best-to-worst projects.
Decreasing Size
It seems that projects have been getting smaller and smaller. From an average peak of 80K LOC in the mid-1980s and 55K LOC in the late 1990s, delivered code size has decreased to around 28K LOC in 2005, with a spike of delivered code around the end of the millennium. This decrease probably has a number of causes; the most significant is likely the more pervasive use of high-level languages, which tend to deliver more bang for the LOC.
We’re still not managing scope creep, though. Initial size estimates are off by around 20% compared to delivered size, and requirements growth is around 30%.
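As a worked illustration only: take a hypothetical project whose delivered size matches the 2005 median of roughly 28K LOC, and assume (the almanac summary does not say) that the ~20% estimation error is in the usual direction, underestimation, and that delivered LOC grow roughly in proportion to the ~30% requirements growth:

```python
# Hypothetical worked example; assumes the ~20% estimation error is
# underestimation and that delivered LOC scale with requirements growth.
delivered_loc       = 28_000   # ~2005 median delivered size
estimate_error      = 0.20     # delivered ≈ 1.2 * initial estimate
requirements_growth = 0.30     # scope grows ~30% during the project

initial_estimate = delivered_loc / (1 + estimate_error)       # ~23,300 LOC
day_one_scope    = delivered_loc / (1 + requirements_growth)  # ~21,500 LOC

print(f"initial size estimate : ~{initial_estimate:,.0f} LOC")
print(f"originally scoped work: ~{day_one_scope:,.0f} LOC")
```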
Big is Better
In contrast to the team size findings, projects that deliver larger systems do seem to have higher productivity, and this characteristic has stayed consistent over the years. Larger projects have significantly higher levels of productivity, indicating there are some economies of scale, but this is more likely a correlation than a causal relationship. Teams that build bigger systems usually have higher productivity, but probably not just because they are building bigger systems. It is likely that the larger systems are more important to the business and, because of that importance, attract higher-quality staff and more consistent resource allocation.
Productivity is Getting Lower
One of the most interesting things QSM found was that productivity of projects, after rising steadily and quite predictably for the 15 years from 1982 through 1997, has declined markedly since then. The primary productivity factor they tracked (a factor called "Productivity Index" or PI) is now around the same level as it was back in the late 1980s. It might be enlightening to postulate what reasons there could be for this decline.
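For context, QSM's PI is derived from Lawrence Putnam's published software equation, in which delivered size is proportional to effort^(1/3) x time^(4/3); the PI itself is an index mapped from the resulting process-productivity parameter via QSM's own tables. The sketch below shows only the general shape of that calculation, not QSM's exact PI formula, using the "typical project" figures quoted earlier (which mix mean effort with median size and duration):

```python
# Rough sketch of a Putnam-style process-productivity calculation.
# QSM's actual Productivity Index is mapped from this kind of
# parameter via its own proprietary tables, not reproduced here.

def process_productivity(size_loc: float,
                         effort_person_years: float,
                         duration_years: float) -> float:
    """Solve the software equation for the productivity parameter."""
    return size_loc / (effort_person_years ** (1 / 3) *
                       duration_years ** (4 / 3))

# "Typical project" figures from the almanac summary above:
# ~9,200 delivered LOC, ~58 staff-months of effort, ~8 months' duration.
pp = process_productivity(9_200, 58 / 12, 8 / 12)
print(f"process productivity parameter: ~{pp:,.0f}")   # roughly 9,300
```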
I would like to invite reader comments on, and interpretations of, these results, and whether they match other observations and measurements. It would be interesting if we, as a profession, were to think about how we could measure the reasons for these findings, and understand what these critical trends actually mean to our business.