Typing Without Typing

There is a considerable literature on automatic type inference, particularly for functional programming languages starting with the work of Hindley and Milner. In looking at the practice of object-oriented development in a statically typed programming language, my colleagues and I came to the conclusion that type inference is not a substitute for static typing but a complement, and belongs in the programming environment rather than in the programming language. The YouTube video given as reference illustrates the integration of the concept in a development environment, but let me first explain what it is all about. (See also the Appendix to this article, which defines static typing and auxiliary concepts precisely.)

Programming languages, if intended for the development of serious software, should be statically typed. Surprisingly, some resistance to this concept persists, and one can even see proposals for new untyped languages. The advantages of static typing are, however, overwhelming: reliability, through the guaranteed avoidance of an entire class of execution-time failures (type mismatches); and easier debugging and maintenance, through the greater clarity that explicit declarations bring to program texts.

In a statically typed programming language, programmers must indeed specify a type for every entity (variable, symbolic constant, routine). This requirement is easy to justify by the goal of program clarity, and also by the helps it offers to programmers: having to declare a type for every entity encourages you to focus your thinking and to make sure you know what you are doing. In some cases, however, it can become a burden, particularly for:

Trivial declarations: you may need a local variable, for example an integer, for the temporary benefit of some computation, and may resent having to declare it as i: INTEGER. Not a big deal for a couple of variables, but develops into a nuisance if it happens all the time in the daily practice of writing programs.
Complex types: an example arises in Eiffel for programs that make a heavy use of "agents". The agent mechanism introduces into this object-oriented language the equivalent of a functional sub-language, allowing programmers to define objects that represent routines and, more generally, higher-order functionals as in a functional language. The declarations can, however, be tedious to write.

Here is an example of using agents and declaring their types. If you have a procedure (method)

800×600

Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 place_call (source, destination: SUBSCRIBER; priority: INTEGER)

you may want to pass it as a functional object, for example to produce a UI where a call to place_call will take place whenever the program’s user clicks a certain button. To achieve this result it suffices to turn the procedure into an agent:

800×600

Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 call_button.left_click_set_action (agent place_call)

800×600 The procedure set_action, which ensures that an event of a given type such as left_click, occurring on a given graphical element such as call_button, triggers a given operation such as place_call, must declare its argument as denoting a procedure with the signature illustrated by place_call: two arguments of type SUBSCRIBER and one argument of type INTEGER. This gives the declaration
800×600

set_action (operation: PROCEDURE [ANY,
TUPLE [SUBSCRIBER, SUBSCRIBER, INTEGER])

Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 The TUPLE in PROCEDURE types makes it possible to describe procedures with an arbitrary signature (number and type of arguments). Although the example remains simple, writing it is already a bit unwieldy. The nuisance level increases with higher-level functionals: an agent representing a procedure that takes another procedure as argument will be of a type such as PROCEDURE [ANY, TUPLE [PROCEDURE [ANY, TUPLE […]], …].

This is the kind of situation where automatic type inference can become useful. Renouncing static typing would make no sense: if anything, such sophisticated uses require more static typing than simpler mechanisms because they can badly go wrong, and the type system will catch errors. In addition, the full declarations as illustrated above are useful as documentation when you read the program: you want to know what kind of procedure set_action expects. But spelling everything out can be bothersome when you write the program, if the context would suffice to determine the types.

Type inference is the process of automatically deducing types that the programmer has not explicitly specified. Type inference is often proposed as a replacement for making the language explicitly typed, and hence requiring programmers to include declarations, but it is more productive to treat is as a complement to this approach. After all, programmers often view variable declaration as a benefit, not a burden, since they help them clarify their thinking. But they may need support in writing them, especially in the two cases mentioned: trivial declarations of local variables, and complex types as needed for higher-level functionals.

This analysis suggests that type inference should leave the programming language and its type system alone, and instead find its place in the development environment (IDE). It has led to the automated type fixing in the latest release of EiffelStudio. The process (for the EiffelStudio user) is very simple: if you are lazy, you can omit the type declarations of some local variables. At first you get, as has always been the case, a compiler error message, or several messages if you omitted many types. The new element is a little "Fix" button next to the error message, enabling you to ask the compiler to come up with a proposed type for a missing declaration, or some of them, or all of them.

Alexander Kogtenkov, a key contributor to the type inference mechanism, has released a short video demo [1]. Watch in particular what happens starting at 1:13; in what may be the most concentrated functionality moment in the history of YouTube programming videos: a whole host of missing declarations, including some really complex ones with TUPLE and agents (PROCEDURE) types pop up, magically inferred by the mechanism.

We have not yet had the time to publish the underlying theory; let me simply indicate that it uses information on where a variable is used and where it is set to find a type that is compatible with all. The usual benefits of static typing remain:

Reliability: if there is an incompatibility, meaning that the tool cannot infer a type that guarantees type safety, the programmer has to go back and find out what is wrong.
Program clarity: type inference is feature of the IDE, not affecting the type system of the programming language. Once type inference has fixed the program, the types will appear in the text, for the benefit of future readers of the program. (It may be possible to hide them for conciseness, but again this is only in the graphical environment, not in the actual text.)

At the same time, type inference frees programmer from a considerable burden, for types that are tedious to write out or too trival to bother. Initial reaction shows that programmers already use the mechanism extensively, and greatly appreciate the simplification.

Reference

[1] Alexander Kogtenkov: Automated Fixing in EiffelStudio, YouTube video demo, 18 July 2014, available here.

Appendix: static typing

Here is the definition of static typing as used in this article, starting with the definitions of supporting concepts: type mismatch, type-safe program, static type checker.

A type mismatch is an attempt, at execution time, to apply an operation to a value for which the operation is not defined: for example, if the value is an integer and you try to add a character (rather than another integer) to it; or if it is an object denoting a person and instead of applying a relevant operation, such as "compute age", you mistakenly try an operation defined for other kinds of object, such as addition.

A program is type-safe if none of its executions can result in a type mismatch.

A static type checker is a tool that can analyze a program and determine whether it is type-safe. "Static" means that the type checker proceeds solely by analyzing program texts; it does not need to execute the corresponding programs, as a dynamic tool would.

We are only interested in a static type checker that is sound, meaning that it will not miss any violation: if it accepts a program, then it provides a mathematical-like guarantee that none of its executions can ever produce a type mismatch.

In principle it is easy to write a sound static type checker: just make it reject all programs. This observation leads to introducing another criterion, beyond soundness, for a checker to be useful. Consider a programming language whose definition includes a "type system": a set of "type rules" (requiring for example that all variables be declared with an explicit type, and imposing restrictions on using these variables, for example integer variables may only use integer operations). Then a static type checker is complete if it accepts any program that satisfies the type rules. The goal is to obtain static type checkers that are both complete and sound: they accept all programs that satisfy the rules; and they only accept type-safe programs.

A programming language is statically typed if it is possible to write a sound and complete static type checker for it.

A statically typed programming language includes type rules, requiring programs to satisfy certain conditions. Typically, type rules fall into two categories: first, programmers must write type declarations for variables, associating them with specific type names such as INTEGER; second, all operations on such variables must satisfy specific constraints associated with their declared types (for example a variable declared of type INTEGER may only participate in operations defined for that type). Inheritance and other advanced mechanisms lead to further type rules, all intended to guarantee that programs will be type-safe. The set of all type rules for a language is called its type system.

In practice, a static type checker is usually not a separate tool but one of the functions of a compiler: the compiler will refuse to generate code for programs found not to be type-safe.

The alternative to static typing is dynamic typing, whereby programs in the given programming language may produce type mismatches during execution. The terms "typed" and "untyped" are often used as synonyms for statically and dynamically typed. Note, however, that every programming language is in the end typed: since a type mismatch implies that an operation cannot be applied, it will eventually be detected, if only at execution time, for example by causing the program to crash. The point of static typing is to eliminate all potential type mismatches before attempting any execution.

For most interesting problems of computer science, achieving both soundness and completeness is an undecidable problem. Type safety is an exception to this usual property: it is possible to design realistic type systems, such as those of successful statically typed programming languages, which compilers can enforce, and which at the same time guarantees the absence of type errors. This achievement is only possible because it limits itself to a specific kind of run-time errors, those captured by the type system. Other kinds of execution-time errors are still possible for a program that has been successfully compiled. The scope of type systems has, however, expanded in recent years, to rule out new kinds of error previously thought to be beyond the reach of type rules; for example, the Eiffel type system includes void safety, which guarantees the absence of null-pointer dereferencing (x.f where x is a null pointer or void reference).