Academic computer science has an odd relationship with software: Publishing papers about software is considered a distinctly stronger contribution than publishing the software. The historical reasons for this paradox no longer apply, but their legacy remains. This limits researchers who see the open-source software movement as an opportunity to make a scholarly contribution. Expanded definitions of scholarship acknowledge both application and discovery as important components.1 One obstacle remains: evaluation. To raise software to the status of a first-class contribution, we propose "best practices" for the evaluation of the scholarly contribution of open-source software.
Typically, scholars who develop software do not include it as a primary contribution for performance reviews. Instead, they write articles about the software and present the articles as contributions. This conflation of articles and software serves neither medium well. An article describes an original intellectual contribution consisting of an idea, the argument for its importance and correctness, and supporting data. In contrast, software is more often an implementation of prior ideas in a usable form. It bridges the often considerable gap between an idea and the practical application of that idea. The original idea and its implementation represent distinct kinds of contribution.
The critical gap is the perceived incomparability of these two contributions. Lacking a concise description adapted to the traditional practices of performance review committees, software is difficult to evaluate as a scholarly contribution and is often relegated to second-class status. We propose a framework for common assessment based on widely accepted definitions of scholarship. Within this general framework, we consider the material and procedures that a performance review committee uses to evaluate a publication. We then describe how software can be summarized in a compatible form of bibliographic citation and supplementary material.
An Expanded Definition of Scholarship
Boyer1 proposed a model of scholarship with four components: discovery, integration, application, and teaching. Scholarship of discovery is the pursuit of knowledge for its own sake. Scholarship of integration works to overcome the isolation and fragmentation of academic disciplines. Scholarship of application applies new knowledge to solve consequential problems in society at large. Scholarship of teaching and learning is the principled study of teaching and learning. In this expanded definition of scholarship, the application of knowledge is a partner to discovery, not a servant. Similar taxonomies have been proposed within computer science by Tsichritzis6 and Denning.2
The current standard for evaluation of scholarship, which emphasises publication of papers describing original research, effectively limits itself to acknowledging scholarship of discovery. This predominance of research is a relatively recent phenomenon. The transition from an emphasis on teaching and application of knowledge to an emphasis on research began around 1945 and was effectively complete by the late 1980s. To quote Glassick, et al.,3 "The academy… gave short shrift to the application of knowledge, despite the country’s increasing need for expert advice to cope with growing social, economic, technological, and environmental problems. Many colleges and universities have been loath to bestow academic rewards on faculty members who concentrate on applying knowledge instead of discovering it."
When Boyer and his collaborators presented their expanded notion of scholarship to faculty, university administrators, and professional bodies, they found broad support. They also found a common barrier to its adoption: All groups identified a need for objective standards, like those applied to scholarship of discovery, which could be applied to the evaluation of scholarship of integration, application, or teaching. Nonetheless, the development of objective standards for evaluating an individual’s contributions across the full range of scholarship has languished.
The need for such standards has been recognised in computing. In a report on broadening the CS research agenda, a U.S. National Research Council committee specifically recommended developing mechanisms for evaluating interdisciplinary and applications-oriented work.4 This recommendation may have its greatest impact through criteria for open-source software, because such software can fill gaps in the current research process, accelerating the diffusion of results from the scholarship of discovery. Where a publication can only present knowledge passively, software is the instantiation of discovery in useable form. This availability is particularly relevant at the interfaces of computer science and other disciplines.5
Scholarly Contributions in Open-Source Software
There are many definitions of open-source software, but for the purposes of this article we use the criterion that the source code is freely available to anyone who wishes to use it, learn from it, or review it. Free access to code is necessary for meeting the gold standard of scientific reporting: A study’s tools and methods must be described clearly enough for its results to be replicated. Only the original code provides enough detail for replication. Pseudocode of an algorithm is insufficient. Software that does not meet the minimal standard of freely available code should not be considered a scholarly contribution.
What types of scholarship are embodied in a work of open-source software? In general, open-source software emphasizes scholarship of integration, application, and teaching and learning, and deemphasizes scholarship of discovery. These other forms of scholarship require distinct skills and additional effort. Translating an algorithm, typically developed under the scholarship of discovery and disseminated as a publication, into a useful working program includes activities such as: factoring the system into coherent parts, crafting detailed data structures, handling exceptional cases and real-world pathologies, defining a user interface, developing import/export capabilities and other tools for interoperation, porting the system to multiple operating systems and environments, carefully testing individual units and the whole system, and documenting the system. These skills and efforts deserve recognition for their distinct contribution.
Here, we provide some illustrative examples locating open-source software within the other scholarships. These examples presuppose that the software in question is well written. Badly written software is no better than a badly written paper.
Some aspects of open-source software are relatively easy to associate with particular types of scholarship. Scholarship of application is obvious: software is, almost by definition, the application of an implementation of one or more algorithms to solve some specific problem. Particularly in its application to problems of commercial interest, open-source software is an efficient means of technology transfer. In general, where the software attacks a problem of significance to society, there is a component of scholarship of application.
Software is an ideal environment for practicing the scholarship of integration. Programs that solve interesting problems often consist of many algorithms, drawn from many disciplines within and outside of computer science. Such programs are often a collaborative effort of many developers with varying backgrounds.
Open-source software can also serve the scholarship of teaching and learning, by offering beginning software developers extended, real-life examples of the craft of software creation, as practiced by experts in the field. One can learn about the techniques of formal proof construction by reading general descriptions, but skilled use of those techniques is learned by reading and creating proofs. Similarly, one can learn about techniques for creating good software by reading software engineering texts, but the skilled use of those techniques is learned by reading and creating software.
Some aspects of open-source software are not scholarly contributions, but instead deserve recognition as service to the community. The maintainers of a software repository should receive credit akin to that accorded to the editorial board of a journal.
Finally, simply making source code available does not, in and of itself, constitute a scholarly contribution. Posting unmaintained, undocumented proof-of-concept code to a personal Web site is not a scholarly contribution. At best, it may supplement some other contribution. Contributing documented, robust, portable software to an established repository, where it can be easily found and where a prospective user has some expectation of long-term availability, is a scholarly contribution.
Explicit and Implicit Statements of Contribution
When evaluating submissions for a performance review, committee members assess publications based on a terse summary, the standard bibliographic entry.
What information is extracted from a typical bibliographic entry? There is some explicit information. The title indicates the subject of the contribution, and the number of authors allows a rough estimate of an individual’s contribution. The page count indirectly specifies the contribution size. The classification of the publication venue (reviewed/unreviewed, journal, monograph, and self-published) provides an estimate of the exclusivity of the venue. The publication date gives an estimate of the recency of the contribution. Supplementary data may explicitly document exclusivity (such as acceptance ratio) or impact (such as citations).
From this explicit data, a committee member can infer additional information, with a degree of confidence dependent on their knowledge of the subject area. A committee member familiar with the area can infer the likely significance and difficulty of the contribution from the title. A member actively working in the area may recognise the names of the authors, and infer the relative contribution of the individual under review. Similarly, knowledge of the area allows one to infer considerable information from the publication venue: the editorial emphasis, reviewing criteria, intended audience, exclusivity, probable extent of the contribution, possible impact of the contribution, and other items. Unless the title or venue explicitly indicates otherwise (a survey paper, for example), the contribution will be assumed to fall into the category of scholarship of discovery.
"Likely," "probable," and "possible" are important qualifiers. For a typical short-term (annual or biennial) review, the details of individual contributions are typically not assessed; rather, a committee member forms a (more or less) informed estimate of the worth of a contribution based on their knowledge of the area, authors, and publication venue. In most cases, the individual under review can count on at least one (and usually two or three) committee members with sufficient expertise to make a reasonably informed judgement and persuade the remaining members of its validity.
Bibliographic Format for an Open Source Contribution
Because the bulk of the information communicated by a bibliographic description of a publication is implicit, committee members’ level of comfort declines dramatically when the form of the contribution is unfamiliar. In particular, they are not comfortable with an equally cryptic summary of an open-source software contribution. Consider a bibliographic format for open-source software containing a title, author, publication venue, size, publication date, and a short note.
L. Hafer, dylp LP code. Coin-OR, http://www.coin-or.org, 44,000 lines source and embedded documentation, 88 pp. documentation, November, 2005. Coin-OR classification level: 4.
The publication venue will typically specify an organisation and a URL; line count takes the place of page count. The information provided is directly analogous to the information provided for a standard publication. It should offer the same explicit and implicit information to a knowledgeable committee member.
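As a rough illustration of how regular this format is, the entry above can be generated from a small record. This is a minimal sketch, not a proposed standard: the class name `SoftwareCitation`, its field names, and the `format` method are assumptions introduced here, and only the field values come from the example entry.

```python
from dataclasses import dataclass


@dataclass
class SoftwareCitation:
    """Hypothetical record holding the fields named in the text."""
    authors: str
    title: str
    venue: str         # hosting organisation
    url: str
    source_lines: int  # line count takes the place of page count
    doc_pages: int
    date: str
    note: str = ""     # short note, e.g. a venue-specific classification

    def format(self) -> str:
        """Render the record in the layout of the example entry."""
        entry = (
            f"{self.authors}, {self.title}. {self.venue}, {self.url}, "
            f"{self.source_lines:,} lines source and embedded documentation, "
            f"{self.doc_pages} pp. documentation, {self.date}."
        )
        return f"{entry} {self.note}" if self.note else entry


# Values taken from the example entry in the text.
dylp = SoftwareCitation(
    authors="L. Hafer",
    title="dylp LP code",
    venue="Coin-OR",
    url="http://www.coin-or.org",
    source_lines=44000,
    doc_pages=88,
    date="November, 2005",
    note="Coin-OR classification level: 4.",
)
print(dylp.format())
```

Keeping the note as free text reflects the fact that each venue's classification scheme is different and will usually need its own explanation.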
However, open-source software is in its infancy when compared to traditional publication, so there is only a limited record for many open-source venues. Only a subset of researchers in computer science actually create software, and only a subset of those professionally participate in open-source projects. An open-source contributor cannot assume that any members of a reviewing committee will have the background knowledge necessary for an informed evaluation from a three-line bibliographic entry for the contribution. Furthermore, centuries of convention circumscribe bibliographic descriptions of publications; there are no equivalent conventions for software. For unfamiliar forms of contribution, more information must be provided.
Supplementing the Bibliographic Entry
Just as with standard publication, the bulk of the information in the example citation is implicit, supplied by the reader’s inside knowledge. For the benefit of the reviewing committee, this information must be made explicit by providing material to supplement the bibliographic entry. We suggest the following.
The individual submitting an open-source contribution should clearly state the type of scholarship (discovery, integration, application, or teaching and learning) embodied in the contribution to set a context for the evaluation. The example represents scholarship of integration, bridging the considerable gap between the theoretical description of the algorithm and an implementation using the techniques expected in modern simplex codes.
The publication venue may or may not provide implicit information about the quality of a contribution. The open-source hosting site SourceForge, which imposes minimal conditions for acceptance of a project, offers very little implicit information. On the other hand, the Apache Foundation sets a high bar: projects must be approved at several levels, and code contributions require approval through a vote of senior developers. An individual claiming an open-source contribution should explicitly state any acceptance criteria of the hosting site. Coin-OR sets a moderate bar for acceptance. Once accepted, a project is assigned a classification on a scale of 1-5 using a number of the factors mentioned in the following paragraphs; the classification is subject to review over the project’s life cycle. To reiterate a point made earlier, the classification encodes a wealth of information for a reviewer familiar with Coin-OR, but most members of a review committee will require an explanation.
Open-source software differs from traditional publication in its requirement for ongoing development and maintenance. If releases are used as evidence of a significant new contribution, the release policy should be explicitly stated and the contribution summarized.
Other supplementary information should address software process and project maturity, as well as indicators of content and impact. Most of these items will need to be stated explicitly, and only rarely will all of them (or even a large fraction) be available and applicable for a single contribution. To evaluate project and process maturity:
- How many active developers work on the project, and what is the level of interaction and review among developers?
- What is the level of software process maturity of the project? What is the expected level of portability across hardware and software platforms? Does the project use a standard configuration and build system? A version control system? Defined releases? Regression testing? A procedure for bug tracking?
- Is there user documentation? Developer documentation?
- What are the expectations for maintenance and enhancement?
To evaluate content (absolute and in the context of the enclosing project):
- What is the size of the contribution, in some widely used software complexity metric? Where the contribution is inseparable from a larger project, what is the percentage of the larger project affected by the contribution?
- Does the individual claiming the contribution receive an acknowledgement of authorship in the source code?
- What is the contribution? New functionality? Enhancements to existing functionality? Refactoring?
- How much of the code directly implements the (theoretical) algorithm? How much more is required to achieve a usable implementation? How much more to achieve a high quality, robust implementation?
To evaluate usage and significance of a contribution:
- What are the download statistics for the project? If separable from the enclosing project, what are the download statistics for the specific contribution?
- Are there identifiable instances where the contribution has been included in other projects, or cited as an influence?
- Is the software available through third-party sites? Is the software linked to by third-party sites?
- Are there third-party evaluations of the contribution with respect to alternative software?
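Because only some of these indicators will be available for any one contribution, it may help a submitter to track which ones the supplementary material actually addresses. The following is a minimal sketch under stated assumptions: the indicator names are hypothetical paraphrases of the questions above, the function `coverage` is introduced here for illustration, and the result is a list of gaps, not a score.

```python
from __future__ import annotations

# Hypothetical indicator names paraphrasing the three groups of questions.
CHECKLIST = {
    "maturity": ["active_developers", "software_process",
                 "documentation", "maintenance_plan"],
    "content": ["size", "authorship_acknowledged",
                "kind_of_change", "algorithm_vs_support_code"],
    "impact": ["downloads", "reuse_or_citation",
               "third_party_availability", "third_party_evaluations"],
}


def coverage(supplied: dict[str, str]) -> dict[str, list[str]]:
    """Return, per category, the indicators the submission leaves unstated."""
    return {
        category: [name for name in names if name not in supplied]
        for category, names in CHECKLIST.items()
    }


# Only the two facts stated in the example bibliographic entry are supplied.
submission = {
    "size": "44,000 lines source and embedded documentation",
    "documentation": "88 pp. documentation",
}
missing = coverage(submission)
for category, names in missing.items():
    total = len(CHECKLIST[category])
    print(f"{category}: {len(names)} of {total} indicators not stated")
```

Run against the example entry alone, the sketch reports that most indicators are unstated, which is the gap the supplementary material is meant to close.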
The Review Process
The most important implicit indicator of the presence of a scholarly contribution in a publication is that it has undergone a formal review before acceptance for publication. Open-source software has no equivalent of the formal review process used for standard publications, nor is such a framework likely to appear in the near future. What, then, is reasonable to expect?
Where quasi-formal review exists, it is typically framed as the development of a consensus among developers that a contribution has sufficient utility and technical quality for inclusion in the main body of the project’s code. This review can be bruising, but it is not strictly at arm’s length.
There is, however, a rigorous informal review process by arm’s-length reviewers. An open-source software contribution is subject to relentless and pervasive review for relevance and correctness by a diverse user community. A "bad review" for relevance means that the software goes nowhere, never establishing a community of users large enough to produce the critical mass of developers required to maintain it. If the software passes the relevance review, it will be subject to continuous review for correctness by the same user community.
The two types of review described above do not capture one important aspect of traditional publication review, the informed assessment of the novelty and significance of the contribution by individuals with expertise in the field of the contribution. This is critical for the scholarship of discovery. We assert that the review processes for open-source software focus instead on the criteria most important to the scholarships of integration and application: the quality, correctness, and relevance of the contribution. By definition, in these scholarships absolute novelty is of lesser importance than the skill and insight required to bridge the gap between abstract algorithm and usable software.