Communications of the ACM

Towards Empirical Answers to the Core Problems of Software Engineering

Bertrand Meyer

This article is adapted from the presentation of a panel that I will chair at ESEC/FSE 2013 in Saint Petersburg on August 22. The other confirmed panelists are Harald Gall, Mark Harman and Giancarlo Succi.


For all the books on software engineering, and the articles, and the conferences, a remarkable number of fundamental questions, so fundamental indeed that just about every software project runs into them, remain open. At best we have folksy rules, some possibly true, others doubtful, and others (such as "adding people to a software project delays it further"[1]) wrong to the point of absurdity. Researchers in software engineering should, as their duty to the community of software practitioners, try to help provide credible answers to such essential everyday questions.

The purpose of this panel discussion is to assess what answers are already known through empirical software engineering, and to define what should be done to get more.

"Empirical software engineering" applies the quantitative methods of the natural sciences to the study of software phenomena. One of its tasks is to subject new methods — whose authors sometimes make extravagant and unsupported claims — to objective scrutiny. But the benefits are more general: empirical software engineering helps us understand software construction better.

There are two kinds of target for empirical software studies: products and processes. Product studies assess actual software artifacts, as found in code repositories, bug databases and documentation, to infer general insights. Process studies assess how software projects proceed and how their participants work; as a consequence, they can share some properties with studies in other fields that involve human behavior, such as sociology and psychology. (It is a common attitude among computer scientists to express doubts: "Do you really want to bring us down to the standards of psychology and sociology?" Such arrogance is not justified. These sciences have obtained many results that are both useful and sound.)

Empirical software engineering has been on a roll for the past decade, thanks to the availability of large repositories, mostly from open-source projects, which hold information about long-running software projects and can be subjected to data mining techniques to identify important properties and trends. Such studies have already yielded considerable and often surprising insights about such fundamental matters as the typology of program faults (bugs), the effectiveness of tests and the value of certain programming language features.

Most of the uncontested successes, however, have been from the product variant of empirical software engineering. This situation is understandable: when analyzing a software repository, an empirical study is dealing with a tangible and well-defined artifact; if any of the results seems doubtful, it is possible and sometimes even easy for others to reproduce the study, a key condition of empirical science. With processes, the object of study is more elusive. If I follow a software project working with Scrum and another using a more traditional lifecycle, and find that one does better than the other, how do I know what other factors may have influenced the outcome; and even if I bring external factors under control how do I compare my results with those of another researcher following other teams in other companies? Worse, in a more realistic scenario I do not always have the luxury of tracking actual industry projects since few companies are enlightened enough to let researchers into their developments; how do I know that I can generalize to industry the conclusions of experiments made with student groups?

Such obstacles do not imply that sound results are impossible; studies involving human behavior in psychology and sociology face many of the same difficulties and yet do occasionally yield important and credible insights. But these obstacles explain why there are still few incontrovertible results on process aspects of software engineering. This situation is regrettable since it means that projects large and small embark on specific methods, tools and languages on the basis of hearsay, opinions and sometimes hype rather than solid knowledge.

No empirical study is going to give us all-encompassing results of the form "Agile methods yield better products" or "Object-oriented programming is better than functional programming". We are entitled to expect, however, that they help practitioners assess some of the issues that await every project. They should also provide a perspective on the conventional wisdom, justified or not, that pervades the culture of software engineering. Here are some examples of general statements and questions on which many people in the field have opinions, often reinforced by the literature, but crying for empirical backing:

  • The effect of requirements faults: the famous curve by Boehm is buttressed by very old studies on special kinds of software (large mission-critical defense projects). What do we really lose by not finding an error early enough?
  • The cone of uncertainty: is that idea just folklore?
  • What are the successful techniques for shortening delivery time by adding manpower?
  • The maximum compressibility factor: is there a nominal project delivery time, and how much can a project decrease it by throwing in money and people?
  • Pair programming: when does it help, when does it hurt? If it has any benefits, are they in quality or in productivity (delivery time)?
  • In iterative approaches, what is the ideal time for a sprint under various circumstances?
  • How much requirements analysis should be done at the beginning of a project, and how much deferred to the rest of the cycle?
  • What predictors of size correlate best with observed development effort?
  • What predictors of quality correlate best with observed quality?
  • What is the maximum team size, if any, beyond which a team should be split?
  • Is it better to use built-in contracts or just to code assertions in tests?

When asking these and other similar questions relating to core aspects of practical software development, I sometimes hear "Oh, but we know the answer conclusively, thanks to so-and-so’s study". This may be true in some cases, but in many others one finds, in looking closer, that the study is just one particular experiment, fraught with the same limitations as any other.

The principal aim of the present panel is to find out, through the contributions of the panelists (top contributors to empirical software engineering, who have helped to bring the field to its current level of success and respect), which questions have useful and credible empirical answers already available, whether or not widely known. The answers must indeed be:

  • Empirical: obtained through objective quantitative studies of projects.
  • Useful: providing answers to questions of interest to practitioners.
  • Credible: while not necessarily absolute (a goal difficult to reach in any matter involving human behavior), they must be backed by enough solid evidence and confirmation to be taken as a serious input to software project decisions.

An auxiliary outcome of the panel should be to identify fundamental questions on which credible, useful empirical answers do not exist but seem possible, providing fuel for researchers in the field.

To mature, software engineering must shed the folkloric advice and anecdotal evidence that still pervade the field and replace them with convincing results, established with all the limitations but also all the respectability of quantitative, scientific empirical methods.

[1] From Brooks’s Mythical Man-Month.




The title is problematic: what is the CORE of software engineering? Is there one that all software theoreticians and practitioners agree on? I doubt it. If software engineering is NOT empirical, how can we expect to get any empirical answers?

It does not help the article's argument when the 1st sentence is missing a word or some words. It does not read right. I guess that the word "all" should be between the words "about" and "software".

The 3rd paragraph begins "Empirical software engineering applies the quantitative methods of the natural sciences to the study of software phenomena." One gets the impression that the argument is tripping up on the word "engineering". Since before the Industrial Revolution, the notion of engineering has dealt primarily with mechanical aspects of nature or human constructs. What defines "mechanical" is simply actions that are reversible. What the phrase "software engineering" does is impose a mechanistic illusion of reversibility on software. At the level of bits and bytes, software is perfectly reversible. Zeros can be flipped to ones and back. However, the importance of software today is remote from bits and bytes. This year, 2013, marks the 65th anniversary of Shannon's Mathematical Theory of Communication. He was careful to point out that for the engineering problem of signaling between two points, the bits he defined were meaningless.

I think both software theoreticians and practitioners agree on at least one thing in software: what we do with software today has meaning. This quality has resisted the quantitative methods of engineering in part because software is all about information's meaning. Until we are ready and willing to accept this conundrum, we will be caught in an infinite loop.

CACM Administrator

Corrected omission of word "every" in first sentence, thanks. -- BM

Michael Erdmann

I guess gathering empirical data is always the first step in making a theory (model), and in the first place in identifying the "thing" your theory is about. So I would say clearly YES ... I guess it is science we are talking about here.

On the other hand, the two areas of analysis, "artifacts ..." and "..projects..", match my daily experience. Creating good code is more a technical issue, but getting software through a development organisation and placed in the market is more a "governance" issue. I guess for the first part the tools and the interpretation of the data are well understood. For the second part I see management very often failing ... here I am not sure how empirical methods could help ...


On reading this article I wondered to myself, what about empirical electrical engineering, empirical mechanical engineering, empirical chemical engineering, etc? I simply don't know the answer to this set of questions. Are there or have there been major empirical "movements" in these disciplines? Of course putative results are tested experimentally, but my sense is that such empiricism is a test of a model against nature. This feels very different than comparing the performance of human teams that are matched but for a difference in, say, design method employed. --Kevin Sullivan, University of Virginia

Michael Erdmann

Hallo Anonymous;

You don't just make a theory and test it; that is only half of the story. In physics, e.g., (empirical) observations motivate hypotheses. Prediction and verification by some community make it a theory.

But you are right; empirical physics ... never heard of it :-) I share your understanding; using empirical data is an integral part of engineering.
E.g. in mechanical engineering, long tables of material constants are used for construction. Such tables I am missing completely in SW engineering. - Michael Erdmann

Michael Erdmann

While reading my previous comments I realized that I am ignoring the objective of this panel; sorry for this.
Regarding "...The principal aim of the present panel is to find out, through the contributions of the panelists..": frankly speaking, in my daily work over the last 30 years in SW engineering I did not come across any person collecting data for academic studies, which I find very strange. Could it be that academia has an issue in getting data from the industry?

