The business of software
# The Goldilocks Estimate

Ben was hurrying to his next meeting when his boss stopped him in the hallway. "Ben, I'm heading up to the CEO's office for the budgeting meeting," he said. "Remember the system upgrade we talked about the other day? Could you give me a quick ballpark dollar figure that I can take to the boss just as an example? You won't be held to it, of course..."

A drive-by estimate occurs when a senior manager corners a subordinate and demands an immediate answer to the estimation questions: "When can we get this project done?" "How much will it cost?" and "How many people do we need?" Depending on how much pressure is applied, the unhappy estimator must produce *some* numbers and do it quickly, usually without much research. Estimates derived this way are normally of low quality, and making any kind of critical business decision based on them is dangerous and often costly.

At the opposite extreme, sometimes estimation can be a process that goes on and on and on. To make an estimate "safer" and more "accurate," organizations may try to remove all the uncertainty in the project and the data they use to create the estimate. One way to reduce uncertainty is simply to spend more time and effort analyzing the situation, since spending more time and effort on something almost always produces a better outcome. However, when estimating a project, the work we have to do to remove the uncertainty in the estimate is pretty much the same work we have to do to actually run the project. I have seen companies where, in order to decide if they should run a project and what resources it will need, they de facto run the project and use up the resources.

Ideally, an estimation process would aim for the Goldilocks "sweet spot" where we spend enough time and effort on the estimate to be sufficiently confident in the result, but not so much that we overengineer the estimate and actually start working the project.

Most uncertainty reduction processes (such as inspections, testing, and estimation) follow an exponential decay curve. Initially there is a lot of low-hanging fruit: items that are quite clear and easy to find and fix. As we progressively clean up the more obvious issues, the remaining ones tend to be more obscure, ambiguous, and difficult to find, and it takes us longer and longer to resolve the next uncertain item.

If we allocate a certain number of people to produce an estimate and let them work until we are "satisfied" with the result (whatever that might mean), the cost profile will be linear and ascending with respect to time and effort. The combination of these two graphs produces a U-shaped profile, as shown in the figure here. Too far to the left and the likely cost of a poor estimate will be high. Too far to the right and the work done in producing the estimate is not balanced by its value. Indeed, the project might actually be going ahead without its funding being reviewed and approved.
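The shape of this trade-off can be sketched numerically. The snippet below is an illustration only, not a model taken from the DSN work: it assumes the expected cost of a poor estimate decays as `risk0 * exp(-decay * t)` while the cost of estimating grows as `rate * t`. The function names and every parameter value are hypothetical.

```python
import math

def total_cost(t, risk0, decay, rate):
    """Expected cost of a poor estimate plus the cost of estimating for t hours."""
    return risk0 * math.exp(-decay * t) + rate * t

def optimal_effort(risk0, decay, rate):
    """Effort t that minimizes total_cost, from setting the derivative to zero.

    -risk0 * decay * exp(-decay * t) + rate = 0
        =>  t = ln(risk0 * decay / rate) / decay
    If risk0 * decay <= rate, the curve only rises, so the minimum is at t = 0.
    """
    if risk0 * decay <= rate:
        return 0.0
    return math.log(risk0 * decay / rate) / decay

# Hypothetical inputs: $500,000 initially at risk, 5% decay per hour of
# estimating effort, and $100 per staff hour spent on the estimate.
risk0, decay, rate = 500_000.0, 0.05, 100.0
t_star = optimal_effort(risk0, decay, rate)
print(f"Lowest total cost at about {t_star:.0f} staff hours")
```

Under these assumptions the minimum moves to the right when more money is at risk and to the left when estimating effort is expensive, which is the qualitative behavior the U-shaped profile describes.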

The optimal estimate time and cost occur at the lowest point of this U-shaped curve, and we can calculate where that should be. It will not be the same point for the early feasibility assessment of a very large complex system as for the late planning of a very small simple system, so we need to adjust for different types of projects. NASA's Deep Space Network (DSN) project [1] developed a mechanism for this calculation based on just two simple parameters:

**Target Cost of Project**. This is the *goal cost* of the project as first envisaged in the project concept. It is not the estimated cost of the project (which has not been calculated yet). The rationale is, when we expect and plan to spend a lot of money on a project, we should also expect to spend proportionally more for the estimate, simply because more is at risk.

**The Business Practice being Supported**. Very early in a project, it is normal that the data available is sparse, ambiguous, and of low quality. At this point in time, the business practice being supported is a conceptual one: should we consider this project? If we were to invest in it, would we get a reasonable return? For such estimates, investing too much time and effort is usually not worthwhile. The estimate produced will be (or should be) only used to approve the project for continued analysis or to reject it altogether. In the latter case, developing a highly detailed and expensive estimate for a project that will not even start is not a good use of resources.

Later in a life cycle, assuming the project has been given the go-ahead, we have better data and we are at the point of committing significant resources to move the project forward. Therefore it is worth spending more time and money to produce an estimate. Here the business practice being supported is one of financial budgeting and resource allocation.

Later still, when the resources have been allocated and the project is ready to launch, we need even more detailed estimates and we often have high-quality data to support them. These estimates provide the bounding box inside which the project plan will sit, and they support the project and manpower planning activity.

**The Formula**. The formula devised by NASA is a simple one:

**Cost_of_Estimate = Practice_Parameter × Target_Cost^0.35**

Where the values of **Practice_Parameter** are related to the estimation phase by Table 1, and the exponent of 0.35 is fixed (for NASA).

**Using the Formula**. Since the formula uses only two simple parameters, it is easy to apply: a project with a Target Cost of (say) $10m should expect to spend the amounts given in Table 2. Dividing the cost by a personnel rate gives a simple effort value, and applying a personnel loading factor gives an approximate schedule for producing the estimate. Using an average personnel rate of $100 per staff hour and three people each allocated half-time (four hours of an eight-hour day) to producing these estimates, our $10m project estimates would take the effort and time given in Table 3.
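The arithmetic can be sketched in a few lines of Python. This is a minimal illustration, not NASA's tooling: the function names are mine, and the Practice_Parameter value of 24 is an assumption, back-calculated so that a $10m Target Cost at $100 per staff hour gives roughly the 68 staff hours mentioned in this column. It does not reproduce Table 1.

```python
def cost_of_estimate(practice_parameter, target_cost, exponent=0.35):
    """Dollar budget for the estimate: Practice_Parameter * Target_Cost ^ 0.35."""
    return practice_parameter * target_cost ** exponent

def effort_hours(estimate_cost, rate_per_staff_hour):
    """Convert the estimate budget into staff hours at a flat personnel rate."""
    return estimate_cost / rate_per_staff_hour

def schedule_days(hours, people, hours_per_person_per_day):
    """Elapsed working days, assuming effort divides evenly across the team."""
    return hours / (people * hours_per_person_per_day)

# Assumed Practice_Parameter of 24 (see the note above); $10m Target Cost.
cost = cost_of_estimate(24, 10_000_000)
hours = effort_hours(cost, rate_per_staff_hour=100)
# Three people at half-time of an eight-hour day: 4 hours each per day.
days = schedule_days(hours, people=3, hours_per_person_per_day=4)

print(f"Estimate budget: ${cost:,.0f}")
print(f"Effort: {hours:.0f} staff hours over about {days:.1f} working days")
```

The schedule function divides effort evenly across the team, so it is only reasonable for light staffing levels like the one used here.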

There are some caveats to using this approach intelligently, three of which I will address here. One is that the effort/time ratio is not linear for heavy staff loading: we cannot get the 68-staff-hour estimate in one hour of elapsed time by putting 68 people on the task, for instance.

Second, the calculation is driven by the **Target Cost**, which is only, well, a target. What happens if the target is way off? A common result of producing an estimate using the time and effort determined by this formula is that we find the Target Cost is actually not achievable. This is, of course, the point of creating the estimate. In this case it is perfectly appropriate to expend whatever extra time and effort is indicated by the **new** (estimated) cost to further refine the estimate.
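One way to read "extra time and effort" is as the difference between the estimating budget at the revised cost and the budget at the original target. The sketch below takes that interpretation, which is my assumption rather than something the formula's authors state. The helper name `extra_estimate_budget` is hypothetical, and 24 is an assumed Practice_Parameter chosen for illustration.

```python
def cost_of_estimate(practice_parameter, target_cost, exponent=0.35):
    """Dollar budget for the estimate: Practice_Parameter * Target_Cost ^ 0.35."""
    return practice_parameter * target_cost ** exponent

def extra_estimate_budget(practice_parameter, target_cost, revised_cost,
                          exponent=0.35):
    """Additional estimating budget implied when the expected cost is revised.

    Returns zero if the revised cost is at or below the original target.
    """
    before = cost_of_estimate(practice_parameter, target_cost, exponent)
    after = cost_of_estimate(practice_parameter, revised_cost, exponent)
    return max(0.0, after - before)

# Hypothetical case: the first estimate suggests $20m rather than the $10m target.
extra = extra_estimate_budget(24, 10_000_000, 20_000_000)
ratio = cost_of_estimate(24, 20_000_000) / cost_of_estimate(24, 10_000_000)

print(f"Extra estimating budget: ${extra:,.0f}")
print(f"Estimating budget grows by a factor of {ratio:.2f}")
```

Because the exponent is well below 1, doubling the expected project cost raises the estimating budget by only about a quarter, so refining the estimate after a missed target is comparatively cheap.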

Third, the formula assumes the quality of data depends only on the phase and the target cost, so the Planning Estimates for two $10m projects would cost the same to produce. If this is not the case, for example if one project is dealing with well-known quantities while the other is something completely new, some adjustment to the estimate effort might be necessary. The key thing is to control the investment in the estimate to achieve the most cost-effective result.

After Ben's boss had taken the ballpark numbers to the CEO they were, quite predictably, cast in concrete and used to fund the project, though only after they were trimmed somewhat to take out the "fat." The resulting project was a case study in under-resourcing and experienced enormous overruns. Having learned its lesson, the company demanded much greater "accuracy" of future estimates, to the point that it was decreed that the project manager would be held personally responsible for any variation from the estimate of more than 5%, plus or minus (!). Under these guidelines, producing the "estimate" on a subsequent project consumed close to 35% of the expected budget before it was decided the project should be canceled. Both extremes, of course, drew the public ire of senior executives.

Clearly, in the business of software, our estimation effort needs to be "just right" if we are not to suffer the anger of Papa Bear.

1. Remer, D.S. and Buchanan, H.R. A model for the cost of doing a cost estimate. *The Telecommunications and Data Acquisition Progress Report* 42-110, NASA Jet Propulsion Laboratory, Pasadena, CA (1992), 278–283.

The Digital Library is published by the Association for Computing Machinery. Copyright © 2012 ACM, Inc.
