Computing Applications Kode Vicious

Sanity vs. Invisible Markings

Tabs vs. spaces
  1. Article
  2. Author
programming code surrounded by white space

back to top 

Dear KV,

My team resurrected some old Python code and brought it up to version 3. The process was made worse by the new restriction of not mixing tabs and spaces in the source code. An automatic cleanup that allowed the code to execute by replacing the tabs with spaces caused a lot of havoc with the comments at the ends of lines. Why does anyone make a language in which white space matters this much?

White Out

Dear White,

Ever edited a Makefile? Although there is a long tradition of the significant use of white space in programming languages, all traditions need to change. In Python, many people have taken issue with the choice to have white space—and not braces—to indicate the limits of blocks of code, but since the developers did not change their minds on this with version 3 of Python, I suspect we are all stuck with it for quite a bit longer, and I am quite sure that there will be other languages, big and small, where white space remains significant.

If I could change one thing in the minds of all programming language designers, it would be to impress upon them—forcefully—the idea that anything that is significant to the syntactic or structural meaning of a program must be easily visible to the human reader, as well as easily understood by the systems used by developers.

Let’s deal with that last point first. Making it easy for tools to understand the structure of software is one of the keys to having tools that help programmers prepare proper programs for computers. Since the earliest days of software development, programmers have tried to build tools that show them—before the inevitable edit-compile-test-fail-edit endless loop—where there might be issues in the program text. Code editors have added colorization, syntax highlighting, folding, and a host of other features in a desperate, and some might say fruitless, attempt to improve the productivity of programmers.

When a new language comes along, it is important for these signifiers in the code to be used consistently; otherwise your editor of choice has little or no ability to deploy these helpful hints to improve productivity. Allowing any two symbols to represent the same concept, for example, is a definite no-no. Imagine if you could have two types of braces to delineate blocks of code, just because two different parts of the programming community wanted them, or if there were multiple syntactic ways to dereference a variable. The basic idea is there must be one clear way to do each thing that a language must do, both for human understanding and for the sanity of editor developers. Thus, the use of invisible, or near-invisible, markings in code, especially tabs and spaces, to indicate structure or syntax.

Invisible and near-invisible markings bring us to the human part of the problem—not that code editor authors are not human, but most of us will not write new editors, though all of us will use editors. As we all know, once upon a time computers had small memories and the difference between a tab, which is a single byte, and a corresponding number of spaces (8) could be a significant difference between the size of source code stored on a precious disk, and also transferred, over whatever primitive and slow bus, from storage into memory.

Changing the coding standard from eight spaces to four might improve things, but let’s face it, none of this has mattered for several decades. Now, the only reason for the use of these invisible markings is to clearly represent the scope of a piece of code relative to the pieces of code around it.

In point of fact, it would be better to pick a single character that is not a tab and not a space and not normally used in a program—for example, Unicode code point U+1F4A9—and to use that as the universal indentation character. Editors would then be free to indent code in any consistent way based on the user’s preferences. The user could have any number of blank characters used per indent character—8, 4, 2, some prime number, whatever they like—and programmers could choose their very own personal views of the scope. On disk, this format would cost only one character (two bytes) per indent, and if you wanted to see the indent characters, a common feature of modern editors, you flip a switch, and voila, there they all are. Everyone would be happy, and we would finally have solved the age-old conundrum of tabs vs. spaces.


q stamp of ACM Queue Related articles
on queue.acm.org

File-System Litter
Kode Vicious

A Generation Lost in the Bazaar
Poul-Henning Kamp

Demo Data as Code
Thomas A. Limoncelli

Join the Discussion (0)

Become a Member or Sign In to Post a Comment

The Latest from CACM

Shape the Future of Computing

ACM encourages its members to take a direct hand in shaping the future of the association. There are more ways than ever to get involved.

Get Involved

Communications of the ACM (CACM) is now a fully Open Access publication.

By opening CACM to the world, we hope to increase engagement among the broader computer science community and encourage non-members to discover the rich resources ACM has to offer.

Learn More