Is There a 10x Gap Between Best and Average Programmers? And How Did It Get There?

Georgia Institute of Technology professor Mark Guzdial

Plenty of people on the Internet claim that there are orders of magnitude of difference between the best and the average programmers. See this article, which claims that some programmers are ten times better than others.

Joel Spolsky tells us (see here) that there is a 5x or 10x gap between the best and average programmers, so companies need to hire the very best programmers.

Still, I don’t think it’s a stretch to believe this data shows 5:1 or 10:1 productivity differences between programmers. But wait, there’s more! If the only difference between programmers were productivity, you might think that you could substitute five mediocre programmers for one really good programmer. That obviously doesn’t work. Brooks’ Law, "adding manpower to a late software project makes it later," is why. A single good programmer working on a single task has no coordination or communication overhead. Five programmers working on the same task must coordinate and communicate. That takes a lot of time. There are added benefits to using the smallest team possible.
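The coordination cost described in the passage above can be made concrete. Brooks observed that when every member of a team must coordinate with every other member, the number of pairwise communication paths grows as n(n-1)/2. Here is a minimal sketch in Python, assuming every pair of programmers needs its own channel (the function name is illustrative, not taken from any of the cited sources):

```python
def communication_channels(team_size: int) -> int:
    """Number of distinct pairs in a team, assuming every pair must coordinate."""
    return team_size * (team_size - 1) // 2


# One programmer has nobody to coordinate with; five programmers have ten pairs.
for n in (1, 5, 10):
    print(f"{n} programmer(s): {communication_channels(n)} channel(s)")
```

Under that assumption, a solo programmer has zero channels to maintain while a team of five has ten, which is one way to see why five average programmers do not simply add up to one very good one.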

Alan Eustace of Google has been quoted saying that the best are 300 times better than the average (see article here).

One top-notch engineer is worth "300 times or more than the average," explains Alan Eustace, a Google vice president of engineering. He says he would rather lose an entire incoming class of engineering graduates than one exceptional technologist. Many Google services, such as Gmail and Google News, were started by a single person, he says.

Eustace isn’t saying the same thing as the others with his 300x claim. He’s not saying that some programmers produce 300 times more useful code than others. He’s saying that they’re more valuable, which may mean that they have business insights or other disciplinary knowledge that makes them particularly valuable.

For this post, let’s stick to the programmer productivity claims and explore the 10x argument. There isn’t much reason to believe it, and even if it’s true, it describes a design goal more than a truth about the world of programmers.

The Empirical Evidence

Steve McConnell wrote a chapter in Making Software (Oram and Wilson, eds., O’Reilly 2011) arguing that we do have empirical evidence of an order-of-magnitude (10:1) difference between the best and average programmers. He later wrote a blog post that follows up on that chapter and responds to criticism of it. See the post here.

The general finding that "There are order-of-magnitude differences among programmers" has been confirmed by many other studies of professional programmers (Curtis 1981, Mills 1983, DeMarco and Lister 1985, Curtis et al. 1986, Card 1987, Boehm and Papaccio 1988, Valett and McGarry 1989, Boehm et al 2000).

As I reviewed these citations once again in writing this article, I concluded again that they support the general finding that there are 10x productivity differences among programmers. The studies have collectively involved hundreds of professional programmers across a spectrum of programming activities. Specific differences range from about 5:1 to about 25:1, and in my judgment that collectively supports the 10x claim. Moreover, the research finding is consistent with my experience, in which I have personally observed 10x differences (or more) between different programmers. I think one reason the 10x claim resonates with many people is that many other software professionals have observed 10x differences among programmers too.

That last part is important. Many of us have met programmers who are just amazing, who seem at least ten times better than the rest of us. We might not even know what we mean by "better" (smarter? faster? more productive?), but we definitely have a sense of "better." Our everyday perception, though, can be easily swayed. If we could carefully measure "better," we could assess the 10x claim.

Laurent Bossavit critiques the programmer productivity studies in his book The Leprechauns of Software Engineering: How folklore turns into fact and what to do about it (2013, Lean Pub, download the book here). Bossavit spends two chapters exploring the literature on the 10x claim. He points out flaws in many of the studies, including measuring time to completion in one study and number of lines of code generated in another (obviously dissimilar objectives), and comparing programmers working in machine language with those using Algol.

Dickey pointed out that the 28:1 ratio was observed because "subject 7 required 170 hours to program the ‘algebra’ program in a batch environment, in machine language (while) subject 3 required 6 hours to program the same problem in JTS (ALGOL) in a time-shared environment."

Sackman shouldn’t have directly compared the best and worst performances in the entire set, argued Dickey, but rather the best and worst among programmers placed by the experimental setup under comparable conditions.

Dickey concludes: "After accounting for the differences in the classes, only a range of 5:1 can be attributed to programmer variability."
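The arithmetic behind the 28:1 figure is easy to check. The sketch below uses only the two completion times quoted above. The helper names and the record layout are my own, chosen for illustration; the study's full data set is not reproduced here.

```python
# The two completion times quoted from Dickey above, in hours.
# Note that the two subjects worked under different conditions.
observations = [
    {"subject": 7, "hours": 170, "condition": "batch, machine language"},
    {"subject": 3, "hours": 6, "condition": "time-shared, JTS (ALGOL)"},
]


def spread_ratio(hours):
    """Ratio of the slowest to the fastest completion time."""
    return max(hours) / min(hours)


def ratios_by_condition(rows):
    """Slowest-to-fastest ratio computed separately within each condition."""
    groups = {}
    for row in rows:
        groups.setdefault(row["condition"], []).append(row["hours"])
    return {condition: spread_ratio(hrs) for condition, hrs in groups.items()}


# Pooling across conditions reproduces the headline figure: 170 / 6.
pooled = spread_ratio([row["hours"] for row in observations])
print(f"pooled ratio: {pooled:.1f}:1")  # pooled ratio: 28.3:1
```

Calling ratios_by_condition on these two records returns a trivial 1.0 for each condition, because each condition contains a single subject. Reproducing Dickey's 5:1 would require the full set of times within each condition. The point of the sketch is only that the pooled ratio mixes the effect of the environment with the effect of the programmer.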

There’s a Better Question to Ask

Bossavit argues that the studies cited in support of the 10x claim are poorly done, and that none of the results has been replicated, even though replication is an important part of any scientific claim. Part of what makes the studies poor is inconsistency. Just as it’s hard to tell what we mean by "better," there is no clear way to measure programming "productivity." Lines of code? Should it be more lines, or fewer lines that achieve the same objective? Or maybe more correct lines of code? Less time to completion? Less time to debug? He ends his discussion of the 10x claim with this statement:

The 10x claim is "not even wrong", and the question of "how variable is individual programmer productivity" should be dissolved rather than answered.

I agree with his recommendation, but for a different reason.

None of the cited studies controlled for expertise or training. In other words, they might be comparing people with decades of experience to those newly hired. Even those studies that occur in classroom settings do not account for time spent developing expertise in computing outside the classroom (which we know from decades of studies can be sizable — see discussion here).

While there is little sound empirical evidence arguing for the 10x difference, I am not worried that there might be an order of magnitude of difference between the best and worst programmers. It’s simply the wrong question to ask. The important questions are: How did the best programmers get so good? Can we replicate the process?

In any field, there is a difference between the best and worst practitioners. There is no reason to believe that the differences are somehow "innate," that there are people who are born programmers. (I have made this argument before.) There is every reason to believe that we can design educational opportunities to develop the best programmers! Few of the "best" programmers today got there through intentional educational interventions. Computing educators aim to understand how the "best" got there, and how to provide educational opportunities that help others get there. Education is a design problem. Computing education is still quite new, and we have a long way to go to catch up. We are just starting to learn how to teach computing and programming better.

From Joel Spolsky’s and Alan Eustace’s perspective, I understand why they want to hire the very best programmers available. But from our perspective as computing educators, that’s the wrong direction. Rather than filter for just the best and send them off to industry, we need to figure out how to help students become the best.
