"You can’t control what you can’t measure." So says Tom DeMarco at the beginning of his seminal book Controlling Software Projects.3 DeMarco was echoing Lord Kelvin who, a century or so earlier, had said "If you cannot measure it, you cannot improve it."
These statements seem rather intuitive, and both quotes have been used over the years to justify the development and application of metrics programs in the business of software. Certainly it would be difficult to determine whether you had actually improved something if there were no way to quantify the change, but there are other considerations in measurement.
Watts, on First
In the mid-1990s I attended a presentation by the late Watts Humphrey given to an audience of telecommunications executives where he gently chided them for investing so heavily in a measurement program that was, to his mind, rather ineffectual. Humphrey’s point was that this particular company was, at the time, an SEI CMM Level 1 organization. Such an organization is characterized by the Software Engineering Institute’s Capability Maturity Model as "Initial" or "ad hoc," meaning the processes used to build systems are recreated each time for each system and each project. In Humphrey’s view this meant that, to some extent, this company’s development process was out of control; or, perhaps more appropriately, nothing could be inferred from measurements on one project since the next project would likely use a quite different process. As Humphrey put it, the only thing measurement would truly show was that the process was out of control, and they already knew that. Therefore, he reasoned, the company should invest first in stabilizing its processes before it expended much effort in developing measurement programs.
This message received a cold response. The executives and the company had already invested heavily in their metrics program and they were proud of it. They believed it was instrumental in their technological success irrespective of their SEI level. So who was right? Should the control chicken come before the measurement egg or should it be the other way around?
Hawthorne Would
In purely metric terms, Humphrey had a point: if the engine in your car keeps surging uncontrollably, it is more important to fix it than to install a new speedometer. However, this company had a long track record of using measurements to generate change. They intentionally employed a form of the Hawthorne Effecta; measurement was sometimes intended to catalyze improvement rather than to measure. At one point this company instituted a large metrics program that collected data from regular feedback sessions between employees and supervisors simply to force managers and their direct reports to engage in those sessions and to develop feedback skills. Though measurement was touted as the purpose of the program, in reality it was much less significant than the process.
Core Metrics
At the core of systems development, there are some truly basic attributes that we need to measure. These attributes were summed up by Larry Putnam, Sr. and Ware Myers4 as:
- Size: usually the size of the delivered system, expressed in some useful unit. While few people actually care how "big" a system is, size is a proxy for the knowledge content, which is itself a proxy for the customer-realized value of a system.
- Productivity: how effective the organization is in turning resources of time and effort/cost into valuable software products.
- Time: how much calendar time is required to build the system/deliver the value.
- Effort/Cost: the amount of work and money required to deliver the value.
- Reliability/Quality: the relative amount of functioning system (read "correct" knowledge) versus non-functioning system ("incorrect" knowledge).
Simple as they are, even these basic measures present challenges to measurement.
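For concreteness, here is a minimal sketch of these five attributes gathered into a single record, with productivity and defect density derived from the others. The field names, units, and figures are illustrative assumptions, not a schema Putnam and Myers prescribe.

```python
from dataclasses import dataclass

@dataclass
class CoreMetrics:
    size: float             # delivered size, in whatever unit the organization counts
    duration_months: float  # calendar time to deliver
    effort_pm: float        # effort expended, in person-months
    cost: float             # money spent, in one currency
    defects: int            # defects found within some agreed window

    @property
    def productivity(self) -> float:
        """Size delivered per person-month of effort."""
        return self.size / self.effort_pm

    @property
    def defect_density(self) -> float:
        """Defects per unit of delivered size (a reliability proxy)."""
        return self.defects / self.size

# Hypothetical project figures, purely for illustration.
project = CoreMetrics(size=4800, duration_months=9, effort_pm=54,
                      cost=810_000, defects=120)
print(f"{project.productivity:.0f} units/pm, "
      f"{project.defect_density:.3f} defects/unit")
```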
Size: Measuring Knowledge
At the core of all software size metrics is this uncomfortable truth: what we really want to measure is the knowledge content of the delivered software, but there is no way to do that. There is no empirical way of quantifying knowledge. There is no unit for knowledge. There is not even a consistent definition of what knowledge is. However, there are proxies for the knowledge content that are quite measurable. They are always related to the substrate or medium on which knowledge is deposited, and we inevitably end up measuring the physical properties of this substrate. All else being equal, a system that is twice as "big" as another system will contain approximately twice as much knowledge. This size might be counted using many different units: requirements, stories, use cases, and so forth. Each of these units has some average knowledge content or "size" coupled with some variability or uncertainty in that knowledge content. The size of the unit times the number of units indicates the size of the system. The uncertainty in the size of the unit times the uncertainty in the count of the units indicates the uncertainty in the size of the system.2
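To see how the unit-level uncertainties roll up into system-level uncertainty, the sketch below samples both the per-unit knowledge content and the unit count from distributions and multiplies them; the spread of the product is the uncertainty in the system size. The distributions and all the numbers are hypothetical assumptions; a real metrics program would calibrate them from its own history.

```python
import random

random.seed(1)

# Hypothetical calibration: average "size" per use case and how much it varies,
# plus an uncertain estimate of how many use cases the system will need.
MEAN_UNIT_SIZE, SD_UNIT_SIZE = 40.0, 12.0     # e.g., size units per use case
MEAN_UNIT_COUNT, SD_UNIT_COUNT = 120.0, 20.0  # estimated number of use cases

TRIALS = 100_000
samples = []
for _ in range(TRIALS):
    unit_size = random.gauss(MEAN_UNIT_SIZE, SD_UNIT_SIZE)
    unit_count = random.gauss(MEAN_UNIT_COUNT, SD_UNIT_COUNT)
    samples.append(unit_size * unit_count)   # system size = unit size x unit count

mean = sum(samples) / TRIALS
sd = (sum((s - mean) ** 2 for s in samples) / TRIALS) ** 0.5
print(f"estimated size: {mean:,.0f} +/- {sd:,.0f} "
      f"({100 * sd / mean:.0f}% relative uncertainty)")
```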
Productivity: Measuring Effectiveness
This is a loaded term: it sounds like a manufacturing units-per-hour metric, but in software it really references the team’s or organization’s cooperative ability to acquire knowledge—to learn. It is this factor, more than any other, that determines how effectively we can build systems. We can usually see when a project team is effective or ineffective, but proactively ensuring such effectiveness is difficult, and many business practices (such as continually moving people on and off projects) make us less effective, though it is hard to quantify just how much.
Time: Measuring Duration
One would think this would be very easy: just look at the clock or the calendar. But some organizations do not even record and retain information on when their projects start, and even when they do, what they measure may be quite ambiguous. Imagine a project against whose time-recording system someone logged one hour of requirements activity two years ago, while the project team came on board yesterday. Did the project "start" two years ago or did it start yesterday? In this case the answer is pretty clear. But suppose a year ago five people recorded a month’s worth of work, then six months ago 10 person-months were recorded, and three months ago 20 person-months were recorded: now when did the project start? Such a slow ramp-up is not unusual, and it means there simply was no one discrete point in time when the project "started." And if we cannot define when a project started, clearly we cannot say how long it took to complete.
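The ambiguity is easy to make concrete. The sketch below replays a slow ramp-up like the one just described against two plausible but different definitions of "start"; the dates, the effort figures, and the five-person-month threshold are all invented for illustration, yet the two rules disagree by a factor of two on the project’s duration.

```python
from datetime import date

# Hypothetical effort log: (month effort was recorded, person-months recorded).
effort_log = [
    (date(2011, 1, 1), 0.01),   # one hour of requirements work, two years out
    (date(2012, 1, 1), 5.0),    # five people record a month's worth of work
    (date(2012, 7, 1), 10.0),
    (date(2012, 10, 1), 20.0),
]
delivered = date(2013, 1, 1)

def months_between(a: date, b: date) -> int:
    return (b.year - a.year) * 12 + (b.month - a.month)

# Rule 1: the project starts at the very first recorded minute of effort.
start_first = min(d for d, _ in effort_log)

# Rule 2: the project starts when monthly effort first reaches a threshold.
THRESHOLD_PM = 5.0
start_ramped = min(d for d, pm in effort_log if pm >= THRESHOLD_PM)

print(f"rule 1 duration: {months_between(start_first, delivered)} months")
print(f"rule 2 duration: {months_between(start_ramped, delivered)} months")
```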
Effort: Measuring Cost
Organizations routinely play with both the effort recorded (declining to record overtime, for instance) and the effort-to-cost ratio (by not paying for that overtime, or by hiring cheaper offshore resources). These practices introduce significant variance into such measures, even when they are actually taken.
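A small worked example shows how much this can distort the numbers. Assuming, hypothetically, a team that works 25% unlogged overtime, apparent productivity overstates the real figure by exactly that fraction:

```python
# Hypothetical figures: delivered size, logged effort, and unlogged overtime.
size_delivered = 4800          # units of delivered size
recorded_effort_pm = 54        # person-months actually logged
overtime_fraction = 0.25       # unlogged overtime, as a fraction of logged time

actual_effort_pm = recorded_effort_pm * (1 + overtime_fraction)

apparent = size_delivered / recorded_effort_pm
actual = size_delivered / actual_effort_pm
print(f"apparent productivity: {apparent:.1f} units/pm")
print(f"actual productivity:   {actual:.1f} units/pm "
      f"({100 * (apparent / actual - 1):.0f}% overstated)")
```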
Reliability: Measuring Quality
This is a perennial challenge in our business. Since defects are mostly deemed to be "bad," there are personal and organizational pressures that operate against consistent measurement. Since a "defect" is simply the identification of knowledge that was not gained, measuring defects suffers from some of the same challenges as measuring system size. In some ways, we can view defects as key and valuable indicators of the knowledge acquisition process.1 But we still have a lot of variability in how we define and measure them.
Intrinsic and Artificial Variability
In measurement of software processes and products, there are two types of variability: intrinsic and artificial. Intrinsic variability simply exists in the situation, whereas artificial variability is a function of the artifice of measurement. Intrinsic variability occurs, for example, in predicting the size of a finished system in order to estimate a project: before we build a system we simply cannot know exactly how big it will be. We can make an educated guess, we can compare against similar system history, we can extrapolate from what we have seen before, but we do not know. Artificial variability occurs due to variation in the format of metrics and how we collect them. If we count system use cases as a size metric and one person writes highly detailed requirements while another dashes off vague and perfunctory products, what "use case" means is quite different in each case; the knowledge content of one person’s use case can vary a lot from that of another’s. Similarly, when recording time, if one project counts its start as the very first moment any time is recorded for any activity while another waits until everyone is on board, the meaning of "start" will be quite variable.
We do not have control over the intrinsic variability—we cannot definitively know precisely how big a system will be before we build it, for instance—but we can and should manage the artificial variability. Good metrics programs do this: they apply structure to requirements document formats and density, they define usable and practical measures of time and effort, they carefully define what a defect is and how severe it might be. And they embrace the intrinsic uncertainty as something to be identified and quantified rather than denied and hidden away.
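What such structure might look like in practice is sketched below: one place where the program pins its conventions down. The severity levels, the five-person-month start rule, and the 90-day defect window are invented examples of the kinds of definitions a metrics program would agree on, not a published standard.

```python
from enum import Enum

class Severity(Enum):
    CRITICAL = "system unusable, no workaround"
    MAJOR = "important function wrong, workaround exists"
    MINOR = "cosmetic or documentation issue"

# One agreed definition per metric, so every project measures the same thing.
METRIC_DEFINITIONS = {
    "project_start": "first month in which recorded effort reaches 5 person-months",
    "use_case": "must have actor, trigger, main flow, and one alternate flow",
    "defect": "any Severity-classified failure reported within 90 days of release",
}

print("severities:", ", ".join(s.name for s in Severity))
for name, rule in METRIC_DEFINITIONS.items():
    print(f"{name}: {rule}")
```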
Get Control
The fact that some things are not particularly measurable does not mean measurements are not useful. The fact that there is uncertainty all around us does not mean we should pretend things are not uncertain. There is a lot we can do to remove or reduce artificial variability, and one of our primary reasons to do this is to expose the irremovable intrinsic uncertainty so we can measure it and make important decisions based upon it. To make metrics more usable and useful, we have to strip off this artificial variability, and we do this by looking at the metrics and the metrics-collection process and by applying some, well, control.
To reverse engineer the statements of Lord Kelvin and Tom DeMarco: if you do not improve it, you cannot measure it very well; and, to some extent, you cannot measure what you do not control.