I recently submitted a paper to a conference, and when I got the reviews back, I noticed a reviewer had asked me specifically to take out the experiments (in this case, simulations) that I had presented partly as motivation, and partly to demonstrate my results. The conference was, unsurprisingly, a theory conference; according to the reviewer the experiments were not only unnecessary, they detracted from the paper.
Jeffrey Ullman’s Viewpoint in this issue of Communications observes that many computer science areas are overly focused on experiments as validation for their work. I do not disagree; he raises several great points. My own experience on several program committees for systems conferences, as well as an author of papers for such conferences, is that experiments are, with very rare exception, a de facto requirement. How can you know if it is really a good idea, the thinking goes, unless there are experiments to back it up? Indeed, in some settings, even simulations are called into question; experiments should be done in real, deployed systems, as simulations can oversimplify what goes on in the real world. As Ullman points out, there are glaring problems with this framework. In particular, we run the risk of losing strong ideas because they do not fall into a framework where it is natural to build a system around them.
But when I put on my theory hat, I find similar fervor, except the sign has changed. In many theory conferences, and perhaps especially in the top ones, experiments are rarely seen. While they are not expressly forbidden, anecdotally I am not the only one who has received reviews where the very idea of including experimental results was called into question, even for algorithmic work. In my opinion, this mind-set represents a dramatic change from when I started out doing research, although I believe the shift has occurred slowly and gradually over time, so that people have not really noticed. And I worry that, in the theory community, the pendulum has now swung too far against experiments. In part, I simply believe that many theoretical results can be greatly enhanced by including proper experimentation, both in settings where the work is naturally tied to practice and in settings where it is not. I also worry the theory community is separating itself further from the rest of computer science, to its own detriment.
To be clear, I understand not all theory papers require experiments. For example, almost all papers providing lower bounds would not need them, nor would many purely mathematical papers. But I think they are more useful than many people believe. A short, incomplete list of reasons to have experiments includes the following.
Demonstrating the potential for practice. This first reason to provide experimental results is arguably the most important one: to show the potential utility of ideas for practice. I would argue that, ideally, a significant amount of theoretical work would make its way, directly or indirectly, into computer science applications. Providing experimental results, or possibly even code, should ease the path for practitioners. The best running times and the best results need not be expected; those can be improved later, and theoreticians should not be expected to be the top coders. However, even showing that the ideas can be implemented and run successfully in reasonable time will encourage system builders to examine them more closely, and more readily lead to their adoption.
Improving real-world computer applications is not the only goal of theoretical computer science (TCS), but it should be a major goal. Otherwise, TCS runs the risk of isolating itself from the rest of the field. Other computer scientists may come to view theoretical work as largely unhelpful or unrelated to their goals, which would seem to relegate TCS to a subfield of mathematics rather than of computing. While some researchers may think that would be fine, I think the vibrant activity of TCS stems in large part from its connection to the real-world problems from the rest of computer science.
Offering verification of one’s work. The notion that experiments help validate one’s work can be uncomfortable for those with a mathematical bent. The proofs of theorems constitute the results, and they supposedly speak for themselves. But, as we all know, proofs can contain subtle errors. I once asked a student to run a simulation for a published paper studying a problem on Markov chains, where the authors proved a result about some value at equilibrium; I do not recall what, but say it was 0.212. The student’s simulation showed a result of 0.106. It turns out a factor of 2 had somehow not been canceled somewhere along the way, and this mistake had gone unnoticed by the authors and through several reviews. While in context this was not really a serious error, there are many other examples I know of where some simple simulation experiments would have caught problems before publication. And in some cases, the errors were much more serious.
While such computations generally cannot completely verify that a theoretical result is correct, they can demonstrate that the results are not obviously wrong. In many cases, they are relatively simple to set up. As a reader, I appreciate seeing that this check has been performed.
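To make this concrete, the following is a minimal sketch of the kind of quick sanity check described above. The small Markov chain and the "claimed" equilibrium value in it are invented purely for illustration and are not from the paper in question; the point is simply that a short simulation can flag a derivation that is off by, say, a factor of 2.

```python
import random

# Hypothetical example: a small Markov chain whose stationary distribution
# we believe we have derived on paper. The transition probabilities and the
# claimed value below are invented purely for illustration.
P = {
    0: [(0, 0.5), (1, 0.5)],
    1: [(0, 0.25), (1, 0.25), (2, 0.5)],
    2: [(1, 0.5), (2, 0.5)],
}

CLAIMED_PI_2 = 0.4  # value "proved" on paper for the equilibrium probability of state 2

def step(state):
    """Take one random transition from the current state."""
    r, acc = random.random(), 0.0
    for nxt, p in P[state]:
        acc += p
        if r < acc:
            return nxt
    return P[state][-1][0]

def estimate_pi(target, steps=1_000_000, burn_in=10_000):
    """Estimate the stationary probability of `target` by long-run frequency."""
    state, hits = 0, 0
    for t in range(steps + burn_in):
        state = step(state)
        if t >= burn_in and state == target:
            hits += 1
    return hits / steps

if __name__ == "__main__":
    est = estimate_pi(2)
    print(f"simulated: {est:.3f}  claimed: {CLAIMED_PI_2:.3f}")
    # A large discrepancy (for example, off by a factor of 2) is a strong
    # hint that something in the paper-and-pencil derivation went wrong.
```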
Understanding whether results are tight or not. In many algorithmic papers, while both upper and lower bounds on the running time of an algorithm might be presented, the two bounds might not be of the same order asymptotically. Where does the true bound lie? Again, experiments might not be able to provide an answer (perhaps the worst case is not seen in natural examples), but they can provide guidance or lead to conjectures on where the true running time lies.
Even if one has the right asymptotics, theoretical analysis is usually very loose in considering the constant factors. In some cases, constant factors from proofs can be so large that it seems the algorithm would never be practical. But even if the analysis is not tight, the algorithm might still be suitable for practice. More generally, there might be a big difference between a constant factor of 10 and a constant factor of 100. An order of magnitude may not matter in the asymptotic analysis of an algorithm, where theorists typically ignore constant factors in the running time, but it can make a significant difference in other contexts. Experiments can help give insight into such performance issues.
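As a sketch of how such an experiment might look, the following toy benchmark times a stand-in algorithm (ordinary sorting, whose Θ(n log n) bound is well known) on inputs of increasing size and fits a power law to the measurements. The fitted exponent and leading constant give a rough empirical read on the asymptotics and the constant factors; the input sizes, trial counts, and choice of algorithm are arbitrary illustrative assumptions, not a rigorous methodology.

```python
import math
import random
import time

def algorithm_under_study(data):
    """Stand-in for the algorithm being analyzed; here simply sorting,
    which has a known Theta(n log n) bound, so the fit can be checked."""
    return sorted(data)

def measure(n, trials=5):
    """Median wall-clock time on random inputs of size n."""
    times = []
    for _ in range(trials):
        data = [random.random() for _ in range(n)]
        start = time.perf_counter()
        algorithm_under_study(data)
        times.append(time.perf_counter() - start)
    return sorted(times)[len(times) // 2]

if __name__ == "__main__":
    sizes = [2**k for k in range(12, 19)]
    points = [(n, measure(n)) for n in sizes]

    # Least-squares fit of log t = a * log n + b, so t ~ e^b * n^a.
    xs = [math.log(n) for n, _ in points]
    ys = [math.log(t) for _, t in points]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - a * mean_x
    print(f"empirical exponent ~ {a:.2f}, leading constant ~ {math.exp(b):.2e}")
```

For a Θ(n log n) algorithm the fitted exponent typically comes out only slightly above 1; a fitted exponent or constant wildly out of line with the analysis is the kind of warning such a simple experiment can provide.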
Approximation algorithms provide another example of where experiments can provide insight into the tightness of theoretical analysis. An approximation algorithm outputs a solution guaranteed to be close to optimal, and theoretical analysis bounds how far from optimal the answer can be. As with running time, experiments can provide guidance as to how close the theoretical analysis is to the true approximation bound. Of course, experiments can also show how useful the approximation algorithm will be in practical situations.
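As one concrete illustration, the sketch below runs the classic maximal-matching 2-approximation for vertex cover against a brute-force optimum on tiny random graphs and records the worst observed approximation ratio; the graph sizes and edge probability are arbitrary choices for illustration. Seeing how far the observed ratio sits below the proved bound of 2 is exactly the kind of evidence about tightness, and about practical usefulness, discussed above.

```python
import itertools
import random

def greedy_matching_cover(edges):
    """Classic 2-approximation for vertex cover: take both endpoints
    of a maximal matching."""
    cover, matched = set(), set()
    for u, v in edges:
        if u not in matched and v not in matched:
            matched.update((u, v))
            cover.update((u, v))
    return cover

def optimal_cover_size(n, edges):
    """Brute force over all vertex subsets; only feasible for tiny n."""
    for k in range(n + 1):
        for subset in itertools.combinations(range(n), k):
            s = set(subset)
            if all(u in s or v in s for u, v in edges):
                return k
    return n

if __name__ == "__main__":
    random.seed(0)
    worst = 0.0
    for _ in range(200):
        n = 10
        edges = [(u, v) for u in range(n) for v in range(u + 1, n)
                 if random.random() < 0.3]
        if not edges:
            continue
        approx = len(greedy_matching_cover(edges))
        opt = optimal_cover_size(n, edges)
        worst = max(worst, approx / opt)
    # The proof guarantees a ratio of at most 2; the observed worst case
    # on these random instances is typically noticeably smaller.
    print(f"worst observed ratio: {worst:.2f} (proved bound: 2)")
```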
Spurring insight into further or future problems. Experimental results can push the imagination in new directions. In several recent papers, my co-authors and I ran simulations, and the insights we gained from them led us to improve and refine our theoretical results. In that case, the experiments helped us before the paper was published. But it stands to reason that some readers, on seeing experimental results in the final paper, might themselves notice something interesting that leads them to consider an important related or new question. Some might argue that including such simulations is unnecessary when writing the paper, as the simulations have done their job and the theory can and should stand on its own. I disagree. Writing about the experiments can help others understand how the theoretical results were developed, develop experiments of their own when needed, and start exploring extensions or related problems.
Several of the preceding arguments make the case that experiments in theoretical papers can be useful even when the work is not necessarily motivated by practice. But practical significance is, still, very important. A scientific community ultimately reveals, through its decisions on such things as paper acceptances and awards, how it values empirical results and practical significance (real or potential). I think, today, the TCS community undervalues these, and I do not see this as the best course forward for TCS in the long run.
Theoreticians inclined toward more practical problems have found some relief by publishing outside of theoretical conferences, in venues that appear to be more interested in this kind of work. Conferences such as the World Wide Web Conference (WWW), the International Conference on Machine Learning (ICML), and the IEEE Conference on Computer Communications (INFOCOM) take a number of papers that mix theory and experimental results, to great effect. Such conferences can provide a more direct way of connecting theoreticians with interested practitioners. It is not clear this approach is the best solution to the question of how to treat such research, as these venues reinforce the separation of the larger theory community from our more practical colleagues. But for now these conferences provide outlets for some research that combines theory and practice.
There are also positive signs that many theorists wish to stay connected with practice. Workshops regularly appear that emphasize the utility of empirical work to theoretical computer science and try to bridge the theory-practice divide. Arguably, however, these workshops cannot do enough on their own to counteract larger trends.
To the extent that various subcommunities harden their views toward experiments, treating them as requirements, as Ullman describes, or as mere distractions in more theoretical areas, this rigidity and lack of openness does not seem healthy for computer science. The relatively rapid cross-fertilization of ideas between theory and practice has historically been a great strength of computer science research. Increased specialization as our field matures may make such cross-fertilization more difficult. But gaining a deeper understanding of computing through a suitable mix of theoretical analysis and well-designed experiments should be a goal we all work toward together.