I read "Automated Program Repair" with interest (Dec. 2019, p. 56–65). This is exciting technology that, if successful, holds out the promise of substantially improving software quality. While the article highlights systems developed by the first and third authors (GenProg, SemFix, Angelix), it omits quantitative data that can provide a more complete picture of the capabilities of extant program repair systems. My hope is this quantitative data can help researchers and practitioners better understand the capabilities and current limitations of this promising technology.
The most complete evaluation of the GenProg system was reported in Le Goues et al.,1,2 which examines results for a superset of the defects originally considered in Le Goues et al.3 Unfortunately, as reported in Qi et al.7 and communicated to the authors of Le Goues3 in fall of 2014, the experimental setup contains a variety of test harness and test script issues. When these issues are corrected, the results show that GenProg does not fix 55 of 105 bugs, as one might reasonably expect from reading the title of the article. Instead, GenProg fixes only two bugs, highlighting the remarkable ineffectiveness of GenProg as an automatic patch generation system. Moreover, only 69 of the reported 105 bugs are bugs; the remaining 36 are deliberate functionality changes.
I note this ineffectiveness may not be widely recognized—despite being informed of these results in fall of 2014, and despite the publication of Qi,7 at press time, websites maintained by the authors of GenProg still do not reflect the corrections required to accurately represent the capabilities of the GenProg system (for example, see https://squareslab.github.io/genprog-code/).
For comparison, the Prophet system,6 the current state of the art on this benchmark set, generates correct patches for 18 of the 69 defects. But for another 21 defects, Prophet generates incorrect patches that nevertheless validate. This situation requires developers to manually filter the validated patches, with developer evaluation effort and false positives an important concern.
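As a rough illustration of the arithmetic behind these Prophet figures, the sketch below computes how many defects receive a validated patch and what share of those receive a correct one. The counts (69 defects, 18 correct, 21 validated but incorrect) come from the paragraph above. The function and variable names are hypothetical, and the sketch assumes each defect is counted once, as either correctly patched or incorrectly patched.

```python
from fractions import Fraction

# Counts taken from the letter; the names are illustrative only.
DEFECTS = 69               # benchmark entries that are actual bugs
CORRECT = 18               # defects for which a correct patch is generated
INCORRECT_VALIDATED = 21   # defects with validating but incorrect patches


def patch_stats(defects, correct, incorrect_validated):
    """Summarize patch outcomes, assuming one outcome per defect."""
    validated = correct + incorrect_validated
    return {
        "validated": validated,
        # Share of all defects that end up with a correct patch.
        "fix_rate": Fraction(correct, defects),
        # Share of defects with a validated patch whose patch is correct.
        "precision": Fraction(correct, validated),
    }


stats = patch_stats(DEFECTS, CORRECT, INCORRECT_VALIDATED)
print(f"defects with a validated patch: {stats['validated']}")
print(f"correct-fix rate: {float(stats['fix_rate']):.1%}")
print(f"validated patches that are correct: {float(stats['precision']):.1%}")
```

Under these assumptions, 39 defects have a validated patch, about 26% of the 69 defects are fixed correctly, and a little under half of the defects with a validated patch have a correct one, which is why manual filtering remains necessary.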
These quantitative results can provide insight into why current commercial automatic patch generation systems such as those discussed in the article focus on specific defect classes such as null dereference defects. Focusing on these classes enables the development of more narrowly tailored techniques that can aspire to fix a larger proportion of the defects with fewer false positives.4,5
In the near term, I think we can expect patch generation systems that focus on specific defect classes to play an increasingly prominent role in maintaining large software systems. Because of the substantial redundancy present in and across most large software systems, as well as the availability of multiple sources of information such as revision histories present in software repositories, I would expect efforts directed at broader classes of defects to pay off in the future. Of course, accurate reporting of relevant results can play an important role in helping the field progress.
Martin Rinard, Cambridge, MA, USA
CS + CS
I read "When Human-Computer Interaction Meets Community Citizen Science" (Feb. 2020, p. 31–34) with interest given my own multidisciplinary exploration of similar territory. The authors do a nice job of describing the increasingly wide range of citizen science activities. Many of those leading the expansion of citizen science refer to it as CS, a challenge for those of us who use that term for computer science. That recent expansion has also been occasioned by the launch and growth of online platforms, laying a foundation for the intersection of the two kinds of CS, as is implicit in the article.
I led a small team at RAND that has published two small reports on community citizen science. The Promise of Community Citizen Science9 came out in 2017; Community Citizen Science: From Promise to Action8 came out in 2019. So, while we would like to think we were the ones to introduce the concept, we applaud the work of Yen-Chia Hsu and Illah Nourbakhsh and hope that we can find a way to collaborate.
Marjory S. Blumenthal, Washington, D.C., USA
Editor-in-Chief response
It's great to see excitement and energy in this important area!
Andrew A. Chien, Chicago, IL, USA