You can find lots of people on the Internet claiming that there are order-of-magnitude differences between the best and the average programmers. See this article, which makes the claim that some programmers are ten times better than others.
Joel Spolsky tells us (see here) that there is a 5x or 10x gap between the best and average programmers, so companies need to hire the very best programmers.
Still, I don’t think it’s a stretch to believe this data shows 5:1 or 10:1 productivity differences between programmers. But wait, there’s more! If the only difference between programmers were productivity, you might think that you could substitute five mediocre programmers for one really good programmer. That obviously doesn’t work. Brooks’ Law, "adding manpower to a late software project makes it later," is why. A single good programmer working on a single task has no coordination or communication overhead. Five programmers working on the same task must coordinate and communicate. That takes a lot of time. There are added benefits to using the smallest team possible.
Alan Eustace of Google has been quoted saying that the best are 300 times better than the average (see article here).
One top-notch engineer is worth "300 times or more than the average," explains Alan Eustace, a Google vice president of engineering. He says he would rather lose an entire incoming class of engineering graduates than one exceptional technologist. Many Google services, such as Gmail and Google News, were started by a single person, he says.
Eustace isn’t claiming the same thing as the others with his 300x figure. He’s not saying that some programmers produce 300 times more useful code than others. He’s saying that they’re more valuable, which may mean that they have business insights or other disciplinary knowledge that makes them particularly valuable.
For this post, let’s stick to the programmer-productivity claims and explore the 10x argument. There isn’t much reason to believe it, and even if it’s true, it describes a design goal more than a truth about the world of programmers.
Steve McConnell wrote a chapter in Making Software (Oram and Wilson, eds., O’Reilly 2011) arguing that we do have empirical evidence that there is a magnitude (10:1) difference between the best and average programmers. He wrote a blog post that follows up on that chapter, responding to criticism of his earlier chapter. See the post here.
The general finding that "There are order-of-magnitude differences among programmers" has been confirmed by many other studies of professional programmers (Curtis 1981, Mills 1983, DeMarco and Lister 1985, Curtis et al. 1986, Card 1987, Boehm and Papaccio 1988, Valett and McGarry 1989, Boehm et al 2000).
As I reviewed these citations once again in writing this article, I concluded again that they support the general finding that there are 10x productivity differences among programmers. The studies have collectively involved hundreds of professional programmers across a spectrum of programming activities. Specific differences range from about 5:1 to about 25:1, and in my judgment that collectively supports the 10x claim. Moreover, the research finding is consistent with my experience, in which I have personally observed 10x differences (or more) between different programmers. I think one reason the 10x claim resonates with many people is that many other software professionals have observed 10x differences among programmers too.
That last part is important. Many of us have met programmers who are just amazing, who seem at least ten times better than the rest of us. We might not even know what we mean by "better" (smarter? faster? more productive?), but we definitely have a sense of "better." Our everyday perception, though, can be easily swayed. If we could carefully measure "better," we could assess the 10x claim.
Laurent Bossavit critiques the programmer productivity studies in his book The Leprechauns of Software Engineering: How folklore turns into fact and what to do about it (2013, Lean Pub, download the book here). Bossavit spends two chapters exploring the literature on the 10x claim. He points out flaws in many of the studies, including measuring time to completion in one study and number of lines of code generated in another (obviously dissimilar objectives), and comparing programmers working in machine language with those using Algol. The famous 28:1 ratio, for example, comes from Sackman’s experiments, which Dickey later critiqued.
Dickey pointed out that the 28:1 ratio was observed because "subject 7 required 170 hours to program the ‘algebra’ program in a batch environment, in machine language (while) subject 3 required 6 hours to program the same problem in JTS (ALGOL) in a time-shared environment."
Sackman shouldn’t have directly compared the best and worst performances in the entire set, argued Dickey, but rather the best and worst among programmers placed by the experimental setup under comparable conditions.
Dickey concludes: "After accounting for the differences in the classes, only a range of 5:1 can be attributed to programmer variability."
Bossavit argues that the studies cited in support of the 10x claim are poorly done, with no replication of any result — an important part of any scientific claim. Part of what makes the studies poor is inconsistency. Just as it’s hard to tell what we mean by "better," there is no clear way to measure programming "productivity." Lines of code? Should it be more lines, or fewer lines that achieve the same objective? Or maybe more correct lines of code? Less time to completion? Less time to debug? He ends his discussion of the 10x claim with this statement:
The 10x claim is "not even wrong", and the question of "how variable is individual programmer productivity" should be dissolved rather than answered.
I agree with his recommendation, but for a different reason.
None of the cited studies controlled for expertise or training. In other words, they might be comparing people with decades of experience to those newly hired. Even those studies that occur in classroom settings do not account for time spent developing expertise in computing outside the classroom (which we know from decades of studies can be sizable — see discussion here).
While there is little sound empirical evidence for the 10x difference, I am not worried that there might be an order-of-magnitude difference between the best and worst programmers. It’s simply the wrong question to ask. The important questions are: How did the best programmers get so good? Can we replicate the process?
In any field, there is a difference between the best and worst practitioners. There is no reason to believe that the differences are somehow "innate," that there are people who are born programmers. (I have made this argument before.) There is every reason to believe that we can design educational opportunities to develop the best programmers! Few of the "best" programmers today got there through intentional educational interventions. Computing educators aim to understand how the "best" got there, and how to provide educational opportunities to get there. Education is a design problem. Computing education is still quite new, and we have a long way to go to catch up. We are just starting to learn how to teach computing and programming better.
From Joel Spolsky’s and Alan Eustace’s perspective, I understand why they want to hire the very best programmers available. But from our perspective as computing educators, that’s the wrong direction. Rather than filter for just the best and send them off to industry, we need to figure out how to help students become the best.
A few objections to a number of otherwise fine points.
First of all, I would claim it is meaningless to compare the best with the worst. There are programmers out there who are so bad that they actually have a negative impact on the projects they are part of; most of us will have met some. In this, programming is probably fairly unique. Outside of direct sabotage, in most other lines of work an honest effort helps, even if there are significant differences up to the top of the crop. It doesn't really matter whether it is lack of education, lack of experience, or that some will simply never understand programming well enough; they will nonetheless provide the bottom data points, and with their productivity being negative, I have a hard time seeing how one could make a meaningful comparison to anybody else.
I would also like to object in part to your criticizing the studies for not controlling for expertise or training. It depends on the aim of the study, but a difference in productivity is a difference in productivity, whether it can be remedied or not. Obviously there are many things that affect a particular programmer's productivity, including the organization of the work, the size of the program, and the tools (such as programming languages) chosen to solve the problem. It might not be fair to dismiss an assembler programmer for poor productivity, since his problem might primarily be the tools available to him, but that does not change the effect on the project: if you are using assembler, you will take a serious hit to productivity.
All of this is complicated by the fact that controlled measurement of productivity is a very hard problem. If one runs a controlled experiment and asks a team to produce a particular application based on some specification, and then asks the exact same team to build the same application from the exact same specification, they will in all probability perform not just better but much better the second time. Nothing has changed except the team's understanding of the problem and its challenges.
It is not that productivity is so hard to define; one way or another it boils down to how many lines of code a programmer has delivered to the finished product, divided by the time it took to deliver the product. Moreover, this number appears to be very low: one study (for which I have lost the reference, so take it as anecdotal) claimed it to be 1.5 lines per programmer per day. Imagine how sweet programming would be if one could know exactly which 1.5 lines to write!
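As a toy illustration of why that lines-per-day definition is slippery (all names and numbers here are invented, purely for illustration): two programmers who deliver the same feature in the same time can score very differently under it if one writes more lines than the other.

```python
# Invented example: the same feature, delivered two ways.
# A naive lines-delivered / time metric rewards verbosity.

def loc_per_day(lines_delivered, days):
    """Naive productivity: lines of code delivered per day."""
    return lines_delivered / days

verbose = loc_per_day(lines_delivered=600, days=10)  # 60.0 LOC/day
concise = loc_per_day(lines_delivered=120, days=10)  # 12.0 LOC/day

# Same feature, same elapsed time, yet the metric rates the
# verbose programmer as five times "more productive".
print(verbose / concise)  # 5.0
```

The sketch makes the earlier point concrete: before we can argue about 10x gaps, we have to decide whether more lines, or fewer lines achieving the same objective, counts as higher productivity.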
Yes, there are many things that affect productivity; if you bog a programmer down in meetings, his productivity will dwindle. But no matter what the cause is, low productivity remains low productivity; the project will not really care whether it is the programmer's or somebody else's fault that nothing happens.
I can, however, wholeheartedly agree with you that it would be silly to give up on *improving* the productivity of programmers, even the average ones. Not everybody can be the best; it would be strange if programming were that different from all other crafts. But everybody can improve, either personally (discipline, experience, insight) or by getting better support (empowerment, languages, tools, organization), and that applies to everybody, even the top 1%.
Thank you for your comments!
You don't suggest an alternative to comparing the best to the worst. Perhaps we should compare the best to the average programmer? But if we don't know the worst, how can we determine the average? The average depends on the whole distribution, worst included.
The literature on measuring programmer productivity shifts between measuring the best to the average and the best to the worst. Makes me wonder how the gap can consistently be "10x" if we're comparing to the middle of the range *or* the bottom of the range. Surely the average isn't the same as the worst.
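To make the arithmetic concrete, here is a toy sketch with invented productivity scores (arbitrary units, purely illustrative): for any fixed set of scores, the best-to-worst ratio and the best-to-average ratio are different quantities, so a single "10x" figure cannot describe both comparisons at once.

```python
# Invented productivity scores (arbitrary units) for ten programmers.
# The distribution is skewed: one standout, many middling performers.
scores = [1, 2, 3, 3, 4, 4, 4, 4, 5, 20]

best = max(scores)
worst = min(scores)
average = sum(scores) / len(scores)

print(best / worst)    # 20.0 -> "20x" compared to the worst
print(best / average)  # 4.0  -> only "4x" compared to the average
```

With these made-up numbers the gap is "20x" or "4x" depending on the baseline, which is exactly why a claim that shifts between "best vs. average" and "best vs. worst" cannot consistently come out to 10x.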
I don't think that anyone would be surprised to learn that an NBA player is more than 10 times better than the average 50 year old playing pick-up basketball. Rather, we'd be surprised that anyone is even measuring the difference. An NBA player trains for basketball his whole life. Of course, he's better than an amateur. We can make a similar argument for just about any field, including those that matter more to team productivity. I bet that a simultaneous translator of French at the UN is more than 10 times better at French than a high school student in their first semester. In other fields, if you're not good enough to contribute to the team, you go back to school.
We measure productivity of programmers because we don't think that there's anything we can do about it. In any other field, we would simply say, "Train. Learn. Practice. Get better." It's only because computing education is so poorly understood and poorly practiced that we think it makes sense to measure the productivity gap between the best and average/worst programmers.