Computer science is both a science and an art. Its scientific aspects range from the theory of computation and algorithmic studies to code design and program architecture. Yet, when it comes time for implementation, there is a combination of artistic flair, nuanced style, and technical prowess that separates good code from great code.
Like art, code is simultaneously subjective and non-subjective. The non-subjective aspects of coding include "hard" ideas that must be followed to create good code: design patterns, project structures, the use of common libraries, and so on. Although these concepts lay the foundation for developing high-quality, maintainable code, it is the nuances of a programmer's technique and tools (alignment, naming, use of white space, use of context, syntax highlighting, and IDE choice) that truly make code clear, maintainable, and understandable, while also giving code the ability to clearly communicate intent, function, and usage.
This separation between good and great code occurs because every person has an affinity for his or her own particular coding style based on his or her own good (or bad) habits and preferences. Anyone can write code within a design pattern or using certain "hard" techniques, but it takes a great programmer to fill in the details of the code in a way that is clear, concise, and understandable. This is important because just as every person may draw a unique meaning or experience from a single piece of artwork, every developer or reader of code may infer different meanings from the code depending on naming and other conventions, despite the architecture and design of the code.
From another angle, programming may also be seen as a form of "encryption." In various ways the programmer devises a solution to a problem and then encrypts the solution in terms of a program and its support files. Months or years later, when a change is called for, a new programmer must decrypt the solution. This is usually not an enviable task, which can mainly be blamed on a failure of clear communication during the initial "encryption" of the project. Decrypting information is simple when the necessary key is present. So, too, is understanding old code when special attention has been paid to what the code itself communicates.
To address this issue, some works have defined a single coding standard for an entire programming language [7], while others have acquiesced to accepting naming conventions as long as they are consistent [6]. Beautiful code has been defined in general terms as readable, focused, testable, and elegant [1]. The more extreme case is the invention of an entire programming language built around a concrete set of ideals, such as Ruby or Python. Ruby emphasizes brevity, simplicity, flexibility, and balance [4]. The principles behind Python are clear in the Zen of Python [5], where the focus lies on beauty, simplicity, readability, and reliability.
Our approach to this issue has been to develop a system of coding guidelines (available online [3]). While these guidelines come from an educational environment, they are designed to be useful to practitioners as well. The guidelines are based on a few broad principles that capture fundamental aspects of communication and elevate the notion of coding conventions to a higher level. Using these conventions will also improve the sustainability of a code base. This article looks at these underlying principles.
One area not considered here is the use of syntax highlighting or IDEs. While either one may make code more readable (because of syntax highlighting or code folding, among others) and easier to manage (for example, quickly looking up or refactoring functions and/or variables), our guidelines have been developed to be IDE and color neutral. They are meant to reflect foundational principles that are important when writing code in any setting. Also, while IDEs can help improve readability and understanding in some ways, the features found in these tools are not standard (consider the different features found in Visual Studio, Eclipse, and VIM, for example). Likewise, syntax highlighting varies greatly among environments and may easily be changed to match personal preference. The goal of the following principles is to build a foundation for good programming that is independent of the programming IDE.
In a recent ACM Queue article, Poul-Henning Kamp [2] makes the fascinating point that much of the style of programming languages stems from the ASCII character set and typewriter-based terminals. Programming languages make no use of the graphical properties and options of modern devices. While code must be written with the clarity of good English grammar, it is not English text. Instead it is more like math and tables.
This is a far-reaching principle. First, it speaks directly to the use of fonts. Do not use a variable-width (proportional) font for program code, as code is not text. Fixed-width fonts (for example, Courier and Data Gothic) look appealing and allow easy alignment of code. Proportional (variable-width) fonts prevent proper alignment, and even more importantly, do not "look like" code.
While one should continue to think of a program as a sequence of actions or as an algorithm at a high level, each section of code should also be thought of as a presentation of a chart, table, or menu. In figures 1, 2, and 3 notice the use of vertical alignment to show symmetry. This is a powerful method of communication.
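As a minimal sketch of this table-like layout (the function dayName and its values are hypothetical and are not taken from the article's figures), a C++ switch can be aligned so that each case reads as one row of a two-column table:

```cpp
#include <string>

// Hypothetical example: map a day number (0-6) to a short name.
// Each case is one row of a table: the value, then the result.
std::string dayName(int day)
{
    switch (day) {
        case 0:  return "Sun";
        case 1:  return "Mon";
        case 2:  return "Tue";
        case 3:  return "Wed";
        case 4:  return "Thu";
        case 5:  return "Fri";
        case 6:  return "Sat";
        default: return "invalid";
    }
}
```

Because the return statements share a column, a missing or duplicated row tends to stand out at a glance. The alignment relies on a fixed-width font, as noted above.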
When a long line of code spills onto multiple lines, we suggest breaking and realigning the code. For example, instead of
A programmer creates a name for something with full knowledge of its use, and often many names make sense when one knows what the name represents. Thus, the programmer has this problem: creating a name based on a concept. The true challenge, however, is precisely the opposite: inferring the concept based on the name! This is the problem that the program reader has.
Consider the simple name sputn, taken from the common C++ header file <iostream.h>. An inexperienced or unfamiliar programmer may suddenly be barraged with questions such as: Is it an integer? A pointer? An array or a structure? A method or a variable? Does sp stand for saved pointer? Is sput an operation to be done n times? Do you pronounce it sputn or s-putn or sput-n or s-put-n?
We advocate basing names on conventional English usage, in particular, simple, informal, abbreviated English usage. Consider the following more specific guidelines:
Some examples of this broad principle are shown in Figure 4.
There is an interesting but small issue when considering examples such as:
Although countFiles is a good name, it is not an optimal name, since it is a verb. Verbs should be reserved for procedure calls that have an effect on variables. For functions that have no side effects on variables, use a noun or noun phrase. One does not usually say
We suggest that
is a slight improvement. More importantly, this enforces the general rule that verbs denote procedures, and nouns or adjectives denote functions.
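The verb and noun rule can be illustrated with a small C++ sketch. Both names below (fileCount and removeBackups) and the convention that backup files end in "~" are assumptions made for this illustration; they are not the authors' own examples.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// A function with no side effects: named with a noun phrase.
int fileCount(const std::vector<std::string>& directory)
{
    return static_cast<int>(directory.size());
}

// A procedure that changes its argument: named with a verb.
// Assumption for this sketch: backup files end in '~'.
void removeBackups(std::vector<std::string>& directory)
{
    auto isBackup = [](const std::string& name) {
        return !name.empty() && name.back() == '~';
    };
    directory.erase(std::remove_if(directory.begin(), directory.end(), isBackup),
                    directory.end());
}
```

A call such as n = fileCount(dir) then reads like an expression, while removeBackups(dir) reads like a command, which matches what each one does.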
All other things being equal, shorter programs are always better. As an example, local variables that are used as index variables may be named i, j, k, and so on. An array index used on every line of a loop need not be named any more elaborately than i; a name such as elementNumber obscures the details of the computation through excessive description. A variable that is rarely used may deserve a long name: for example, MaxPhysicalAddr. When variable names are long, especially if there are many of them, it quickly becomes difficult to see what's going on. A variable name can often be shortened by relying on the context in which it is used. For example, the variable Store in a stack implementation rather than
Major variables (objects) that are used frequently should be especially short, as seen in the examples in Figure 5. For major variables that are used throughout the program, a single letter may encourage program clarity.
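A rough sketch of how context allows short names, assuming a hypothetical fixed-size integer stack (the type Stack, its members, and the helper sum are invented for this illustration):

```cpp
#include <cstddef>

// Hypothetical fixed-size stack of ints. Inside Stack, the short names
// store, top, and Max need no "stack" prefix; the context supplies it.
struct Stack {
    static constexpr std::size_t Max = 100;

    int         store[Max];
    std::size_t top = 0;                          // number of items held

    bool empty() const { return top == 0;   }
    bool full()  const { return top == Max; }
    void push(int x)   { store[top++] = x;  }     // caller checks full()
    int  pop()         { return store[--top]; }   // caller checks empty()
};

// An index used on every line of a loop is simply i.
int sum(const int a[], std::size_t n)
{
    int total = 0;
    for (std::size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}
```

The design choice here is that the enclosing type or the loop itself carries the meaning, so the names inside it can stay short without becoming cryptic.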
While written and spoken communication may reach a high level of clarity, it is often left wanting of meaning if not accompanied by the personal touch of nonverbal cues and tendencies. An individual's body language helps clarify the spoken word. In a similar sense, the programmer relies on white space (what is not said directly) in the code to communicate logic, intent, and understanding.
An example is the use of blank lines between conceptually different sections of code. Blank lines should improve readability as they separate logically different segments of the code and thus provide the literary equivalent of a section break. Appropriate places to use blank lines include:
Consider the code listing in Figure 6. Individual blank spaces should also be used to show the logical structure within a single statement. Strategic blank spaces within a line simplify the parsing done by the human reader. At a minimum, blank spaces should be included after the commas in argument lists and around the assignment operator "=" and the redirection operators "<<" and ">>". On the other hand, blank spaces should not be used for unary operators such as unary minus (-), address of (&), indirection (*), member access (.), increment (++), and decrement (--).
Also, if it makes sense, put two to three statements on one line. This practice has the effect of simplifying the code, but it must be used with discretion and only where it is sensible to do so.
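The spacing rules above might look as follows in C++. This is only a sketch; the function flipNegatives is hypothetical and exists solely to bring the operators together in one place.

```cpp
// Hypothetical example: count the negative entries of a[0..n-1]
// and negate each of them in place.
int flipNegatives(int a[], int n)
{
    int count = 0;

    for (int i = 0; i < n; i++) {
        int* p = &a[i];                       // no space after unary &
        if (*p < 0) { *p = -*p;  count++; }   // two short statements, one line
    }
    return count;
}
```

Blank spaces surround the assignment and comparison operators and follow each comma and semicolon in the loop header, while the unary operators (-, &, *, ++) stay attached to their operands.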
The case statement used in Figure 1 brings up a general point: very simple decision statement structures can be tersely presented, showing the alternative code simply, and, if possible, without braces, as in the example in Figure 7.
It is not uncommon for simple conditions to be mutually exclusive, creating a kind of generalized case statement. This, as is common practice, can be printed as a chain, as in Figure 8.
Of course, it may be that the structures are truly nested, and then one must use either nested spacing or functions to indicate the alternatives. Again, the general point is to let the structure drive the layout, not the syntax of the programming language.
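A chain of mutually exclusive conditions could be laid out as in the following sketch. The function grade and its thresholds are assumptions chosen for illustration and do not reproduce Figure 7 or Figure 8.

```cpp
#include <string>

// Hypothetical example: a generalized case statement written as a chain.
// The conditions are mutually exclusive, so braces are omitted and the
// alternatives are aligned in columns.
std::string grade(int score)
{
    if      (score >= 90) return "A";
    else if (score >= 80) return "B";
    else if (score >= 70) return "C";
    else if (score >= 60) return "D";
    else                  return "F";
}
```

The layout follows the logical structure (five alternatives at one level) rather than the nesting that the language grammar technically assigns to else-if.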
In the brace wars, we do not take a strong stand on the various preferences shown in Figure 9, but we do feel strongly that the indent is vital, as it is the indent that shows the structure.
The ability to communicate clearly is an issue that is faced in all facets of the human experience. Programmers must achieve a level of clarity, continuity, and beauty when writing code. This means focusing on the code and its clarity, balance, and symmetry, not on its length or comments. While this concept does not advocate the removal of comments or negate their use and importance in appropriate situations, it does suggest that programmers must use comments wisely and judiciously. The focus should be on developing code that, for the most part, clearly communicates intent and functionality. This practice will automatically reduce the need for many comments.
Although the guidelines presented here are used in an educational setting, they also have merit in industrial environments. Students who are educated using these guidelines will most likely use them (or some variant) as they enter industry. To demonstrate this, we have developed an example that applies these guidelines to two very different styles. The first is the Unix style. It is terse, often making use of vowel deletion, and is often found in realistic applications such as operating-system code. This is not to imply that all or most system programmers use this style, only that it is not unusual. Figure 10 shows a small example of this style.
We call the second style the textbook style, as illustrated in Figure 11. Again, this in no way means to imply that all or most textbooks use this style, only that the style in the example is not unusual. In this style the focus is on learning. This means that there is frequent commenting, and the code is well spread out. For the purposes of learning and understanding the details of a language, this style can be excellent. From a practical perspective or for any program of some scale, this style does not work well as it can be overwhelming to use or to read. Moreover, this style makes it difficult to see the overall design, as if one is stuck under the trees and cannot see the forest around.
Figure 12 is a rework of the function in figures 10 and 11, using the guidelines discussed here to make a smooth transition between academic and practical code. This figure shows a balance of both styles, relying more directly on the code itself to communicate intent and functionality clearly. Compared with the textbook style, the resultant code is shorter and more compact while still clearly communicating meaning, intent, and functionality. When compared with the Unix style, the code is slightly longer, but the meaning, intent, and functionality are clearer than the original code.
Figure 13 illustrates the guidelines presented here in another setting. This is a function taken from a complex program (10,000 lines) related to power-system reliability and energy use regarding PHEVs (plugin hybrid electric vehicles). The program makes numerous calculations related to the effect that such vehicles will have on the current power grid and the effect on generation and transmission systems. This program attempts to evaluate the reliability of power systems by developing a model for reliability evaluation using a Monte Carlo simulation.
While the previous examples show the merit of the guidelines presented here, one argument against such guidelines is that making changes to keep a certain coding style intact is time consuming, particularly when a version-control system is used. In the face of a time-sensitive project or a project that most likely will not be updated or maintained in the future, the effort may not be worthwhile. Typical cases include class projects, a Ph.D. thesis, or a temporary application.
If, however, the codebase in question has a long lifespan or will be updated and maintained by others (for example, an operating system, server, interactive Web site, or other useful application), then almost any changes to improve readability are important, and the time should be taken to ensure the readability and maintainability of the code. This should be a matter of pride, as well as an essential function of one's job.
The authors would like to thank David Marcus and Poul-Henning Kamp for their insightful comments while completing this work, as well as the software engineering students who have contributed to these guidelines over the years.
1. Heusser, M. Beautiful code. Dr. Dobb's (Aug. 2005); http://www.ddj.com/184407802.
2. Kamp, P-H. Sir, please step away from the ASR-33! ACM Queue 8, 10 (2010); http://queue.acm.org/detail.cfm?id=1871406.
3. Ledgard, H. Professional coding guidelines. Unpublished report, University of Toledo, 2011; http://www.eng.utoledo.edu/eecs/faculty_web/hledgard/softe/upload/.
4. Molina, M. What makes code beautiful. Ruby Hoedown, 2007.
5. Peters, T. The Zen of Python. PEP (Python Enhancement Proposals). Aug. 20, 2004; http://www.python.org/dev/peps/pep-0020/.
6. Reed, D. Sometimes style really does matter. J. Computing Sciences in Colleges 25, 5 (2010), 180-187.
7. Sun Developer Network. Code conventions for the Java programming language, 1999; http://java.sun.com/docs/codeconv/.
Figure 1. Use of vertical alignment to show symmetry.
Figure 2. Example of cluttered presentation.
Figure 4. Examples of basing names on conventional English usage.
Figure 5. Keeping names short and simple.
Figure 6. Example of code that uses white space well.
Figure 7. Decision statement structure, tersely presented.
Figure 8. Case statement presented as a chain.
Figure 9. Examples of K&R, ANSI, and Whitesmiths coding styles.
Figure 10. Example of a systems-programming coding style.
Figure 11. Example of a textbook coding style.
Figure 12. Example of a coding style using the guidelines presented here.
Figure 13. Realistic and complex example of code following the guidelines presented here.
©2011 ACM 0001-0782/11/1200 $10.00
Hi, you are not taking your twitter communication serious, are you?
The article is extremely inconsistent with regard to the examples.
The guidelines described here are NOT applied to the examples.
A very good example is Fig. 13: http://deliveryimages.acm.org/10.1145/2050000/2043191/figs/f13.jpg
I hope it is just a technical problem like tabs vs. spaces and an editing process that is not used to handling source code.
Please consider fixing this article, it is just embarrassing.
Thanks for the comments and reading of the paper! There are some errors in the examples, and we appreciate you bringing this to our attention. We will see if we can get them corrected.
--Robert C. Green
The errors in the figures were very distracting, e.g., Fig. 1 was a disaster:
-- several duplicate case labels
-- "case default" is invalid syntax
-- System.out.printIn is not defined
Whoever wrote this gets an "F".
All writers in Communications should compile their examples before submitting -- it's an old rule that still holds.
Thanks for your article!
I would have liked to see how this relates to Knuth's Literate Programming... Any ideas on that?
While I'm not very familiar with literate programming, I believe there is definitely some overlap with what we have presented. For example, consider this excerpt from Knuth's Literate Programming:
"The practitioner of literate programming can be regarded as an essayist, whose main concern is with exposition and excellence of style. Such an author, with thesaurus in hand, chooses the names of variables carefully and explains what each variable means. He or she strives for a program that is comprehensible"
This is a statement that could have easily been included in this paper and clearly works well with what we have presented. This is particularly true regarding the guideline "Let Simple English be Your Guide."
One difference appears to be our guideline of "Focus on the Code, Not the Comments." Literate programming seems to be focused deeply on comments and documentation instead of letting the code speak for itself through its design.
Again, this is a cursory comparison, and it would be interesting to perform a deeper and more thorough comparison with our guidelines.
Thanks a lot for your inspiring article. Every developer should strive for better code quality and readable variable/class/method names. My gold standard when it comes to code quality is "Clean Code: A Handbook of Agile Software Craftsmanship (Robert C. Martin)". There are a number of differences to your suggestions, for example Robert C. Martin does not recommend the horizontal alignment of code, because he thinks the real problem is the length of the lists - and I agree with that, although I used to align my code for years (it's still useful with certain programming languages or assembler code, when you can't avoid long lists).
The following letter was published in the Letters to the Editor of the April 2012 CACM (http://cacm.acm.org/magazines/2012/4/147353).
As an admirer of the "artistic flare, nuanced style, and technical prowess that separates good code from great code" explored by Robert Green and Henry Ledgard in their article "Coding Guidelines: Finding the Art in the Science" (Dec. 2011), I was disappointed by the authors' emphasis on "alignment, naming, use of white space, use of context, syntax highlighting, and IDE choice." As effective as these aspects of beautiful code may be, they are at best only skin deep.
Beauty may indeed be in the eye of the beholder, but there is a more compelling beauty in the deeper semantic properties of code than layout and naming. I also include judicious use of abstraction, deftly balancing precision and generality; elegant structuring of class hierarchies, carefully trading between breadth and depth; artful ordering of parameter lists, neatly supporting common cases of partial application; and efficient reuse of library code, leveraging existing definitions with minimum effort. These are subjective characteristics, beyond the reach of objective scientific analysis (matters of taste, not of fact), and so represent aspects of the art rather than the science of software.
Formalizing such semantic properties is more difficult than establishing uniform coding conventions; we programmers spend our professional lifetimes honing our writing skills, not unlike novelists and journalists. Indeed, the great American essayist Ralph Waldo Emerson (1803-1882) anticipated the art in the science of software like this: "We ascribe beauty to that which is simple; which has no superfluous parts; which exactly answers its end; which stands related to all things; which is the mean of many extremes." It is to this standard I aspire.
The following letter was published in the Letters to the Editor of the April 2012 CACM (http://cacm.acm.org/magazines/2012/4/147353).
Along with the solid rules of programming laid out by Robert Green and Henry Ledgard (Dec. 2011), I add: Since programs are meant to be read, they should also be spell-checked, with each name parsed into separate words that are checked against a dictionary, and common shortenings and abbreviations (such as "num" for "number" and "EL" for "estimated lifetime") included in the dictionary to help standardize ways of expressing names and making programs more readable.
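The spell-checking idea in this letter can be sketched in C++ as follows. The functions words and misspelled, and the splitting rule (break at underscores and before each uppercase letter), are assumptions made for this illustration; the letter does not prescribe an algorithm.

```cpp
#include <cctype>
#include <set>
#include <string>
#include <vector>

// Split an identifier such as "maxPhysicalAddr" or "file_count"
// into lowercase words. A new word starts at '_' or at an uppercase letter.
std::vector<std::string> words(const std::string& name)
{
    std::vector<std::string> out;
    std::string cur;

    for (char c : name) {
        unsigned char u = static_cast<unsigned char>(c);
        bool boundary = (c == '_') || std::isupper(u);
        if (boundary && !cur.empty()) { out.push_back(cur);  cur.clear(); }
        if (c != '_') cur += static_cast<char>(std::tolower(u));
    }
    if (!cur.empty()) out.push_back(cur);
    return out;
}

// Return the words of name that are not found in dict.
std::vector<std::string> misspelled(const std::string& name,
                                    const std::set<std::string>& dict)
{
    std::vector<std::string> bad;
    for (const std::string& w : words(name))
        if (dict.count(w) == 0) bad.push_back(w);
    return bad;
}
```

With a dictionary that contains "max" and "physical" but not "addr", misspelled("maxPhysicalAddr", dict) reports only "addr", which a team could then either add to the dictionary as an approved abbreviation or spell out. One limitation of this simple splitting rule is that all-uppercase abbreviations such as "EL" are broken into single letters.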
The exception to this spelling rule, as suggested in the article, is the locality of reference, such that index variables like "i" do not have to be spelled out when used locally within a loop. However, newer program constructs (such as for_each) would mostly eliminate the need for such variables. Meanwhile, parameter names should have fuller names, so their intent could be determined by reading the header without also having to refer to the implementation.
Moreover, the code in the article's Figure 13 (an example of the kind of code covered in the article) could have been broken into multiple routines at the points each comment was inserted. This would have separated the flow from the details and made the code easier to understand. This way, each of the smaller functions would have been simpler to test, with testability a proven indicator of code quality.
The following letter was published in the Letters to the Editor of the March 2012 CACM (http://cacm.acm.org/magazines/2012/3/146236).
In the same way cultural myths like the Bhagavat Gita, Exodus, and the Ramayan have been told and retold down through the ages, the article "Coding Guidelines: Finding the Art in the Science" by Robert Green and Henry Ledgard (Dec. 2011) should likewise be retold to help IT managers, as well as corporate management, understand the protocols of the cult of programming. However, it generally covered only the context of writing educational, or textbook, and research coding rather than enterprise coding. Along with universities, research-based coding is also done in the enterprise, following a particular protocol or pattern, appreciating that people are not the same from generation to generation even in large organizations.
Writing enterprise applications, programmers sometimes write code as prescribed in textbooks (see Figure 11 in the article), but the article made no explicit mention of Pre or Post conditions unless necessary. Doing so would help make the program understandable to newcomers in the organization.
Keeping names short and simple might also be encouraged, though not always. If a name is clear enough to show the result(s) the function or variable is meant to provide, there is no harm using long names, provided they are readable. For example, instead of isstateavailableforpolicyissue, the organization might encourage isStateAvailableForPolicyIssue (in Java) or is_state_available_for_policy_issue (in Python). Compilers don't mind long names; neither do humans, when they can read them.
In the same way understanding code is important when it is read, understanding a program's behavior during its execution is also important. Debug statements are therefore essential for good programming. Poorly handled code without debug statements in production systems has cost many enterprises significant time and money over the years. An article like this would do well to include guidance as to how to write good (meaningful) debug statements as needed.
San Antonio, TX
The following letter was published in the Letters to the Editor of the March 2012 CACM (http://cacm.acm.org/magazines/2012/3/146236).
The article by Robert Green and Henry Ledgard (Dec. 2011) was superb. In industry, too little attention generally goes toward instilling commonsense programming habits that promote program readability and maintainability, two cornerstones of quality.
Additionally, evidence suggests that mixing upper- and lowercase contributes to loss of intelligibility, possibly because English uppercase letters tend to share same-size blocky appearance. Consistent use of lowercase may be preferable; for example, "counter" may be preferable to "Counter." There is a tendency to use uppercase with abbreviations or compound names, as in, say, "CustomerAddress," but some programming languages distinguish among "CustomerAddress," "Customeraddress," and "customerAddress," so ensuring consistency with multiple program components or multiple programmers can be a challenge. If a programming language allows certain punctuation marks in names, then "customer_address" might be the optimal name.
It is also easy to construct programs that read other programs and return metrics about programming style. White space, uppercase, vertical alignment, and other program characteristics can all be quantified. Though there may be no absolute bounds separating good from bad code, an organization's programmers can benefit from all this information if presented properly.
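A program of the kind this letter describes might start from a sketch like the one below. The struct StyleMetrics, the function measure, and the particular metrics chosen are assumptions for illustration; the letter does not name specific measurements beyond white space, uppercase, and alignment.

```cpp
#include <cctype>
#include <sstream>
#include <string>

// Hypothetical set of simple style metrics for one source file.
struct StyleMetrics {
    int lines      = 0;   // total number of lines
    int blankLines = 0;   // lines that are empty or only spaces and tabs
    int longest    = 0;   // length of the longest line, in characters
    int letters    = 0;   // alphabetic characters
    int uppercase  = 0;   // uppercase alphabetic characters
};

// Scan the source text line by line and accumulate the metrics.
StyleMetrics measure(const std::string& source)
{
    StyleMetrics m;
    std::istringstream in(source);
    std::string line;

    while (std::getline(in, line)) {
        int length = static_cast<int>(line.size());

        m.lines++;
        if (line.find_first_not_of(" \t") == std::string::npos) m.blankLines++;
        if (length > m.longest) m.longest = length;

        for (char c : line) {
            unsigned char u = static_cast<unsigned char>(c);
            if (std::isalpha(u)) m.letters++;
            if (std::isupper(u)) m.uppercase++;
        }
    }
    return m;
}
```

Ratios such as blankLines to lines, or uppercase to letters, could then be compared across files or programmers. This sketch counts characters in comments and string literals along with code, so a practical tool would likely need a tokenizer for the language being measured.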