Automating Proofs

An example of a four-color map. The four-color map theorem says no more than four colors are required to color the regions of a two-dimensional map so no two adjacent regions have the same color.

Over the past two decades, mathematicians have succeeded in bringing computers to bear on the development of proofs for conjectures that have lingered for centuries without solution. Following a small number of highly publicized successes, the majority of mathematicians remain hesitant to use software to help develop, organize, and verify their proofs.

Yet concerns linger over usability and the reliability of computerized proofs, although some see technological assistance as being vital to avoid problems caused by human error.

Troubled by the discovery in 2013 of an error in a proof he co-authored almost 25 years earlier, Vladimir Voevodsky of the Institute for Advanced Study at Princeton University embarked on a program to not only employ automated proof checking for his work, but to convince other mathematicians of the need for the technology.

Jacques Carette, assistant professor in the department of computing and software at McMaster University in Ontario, Canada, and a promoter of the idea of ‘mechanized mathematics’, says, “There are both technical and social forces at work. Even though mathematicians doing research are trying to find new knowledge, they are quite conservative about the tools they use. Tools can take some time to adopt: tools such as (the algebra systems) Maple and Mathematica took a solid 20 years before they became pervasive.

“Personally, once I got over the hurdle of learning how these things work, it now speeds me up. I can be bold in my conjectures and the computer will tell me that I’m wrong. The computer is good at spotting problems in the small details: things that humans are typically really bad at.”

Jeremy Avigad, a professor in the philosophy and mathematical sciences departments at Carnegie Mellon University, says of formal proof technology: “I believe that it will become commonplace. It’s a natural progression. We care that our math is precise and correct and we now have a technology that helps in that regard. But the technology is not yet ready for prime time. There is a gap: it’s usability.”

For mathematicians working on problems that seemed insurmountable if tackled purely by hand, the usability gap was less of an issue than trust in the results by others. The length and complexity of a proof of the 1611 conjecture by Johannes Kepler on the most efficient method for packing spheres that was developed by Thomas Hales while working at the University of Michigan in the late 1990s led to a reviewing process that took four years to complete. Even after that length of time, reviewers claimed they could only be 99% certain of the proof’s correctness. To fully demonstrate the proof’s correctness, Hales started a collaborative project called FlySpeck built on the use of automated proof-checking software. The group completed its work early in 2015 with the publication of a paper that described the process used.

In building the original proof of the Kepler conjecture, Hales and his colleagues had to develop software that performed computation in a way that lent itself to building a reliable proof. A large part of the proof lies in a lengthy series of inequalities that needed to be demonstrated using computation. Rather than rely on conventional floating-point arithmetic and the imprecision that introduces, the group had to develop software to perform interval arithmetic and automated differentiation using Taylor expansions “so we were able to get rigorous upper and lower bounds,” Hales explains.

“Once I got over the hurdle of learning how these things work, it now speeds me up. I can be bold in my conjectures and the computer will tell me that I’m wrong.”

Similar issues of trust greeted the first computerized proof of the theorem that argued only four colors are needed to distinguish between adjacent regions on a 2D map. Forty years ago, working at the University of Illinois at Urbana-Champaign, Kenneth Appel and Wolfgang Haken developed computer programs to demonstrate there were no counterexamples to the theorem. Still, the programs were tedious to check by hand.

In 2005, Georges Gonthier of Microsoft Research and Benjamin Werner of French research institute Inria used a computerized proof assistant, Coq, to develop a proof that did not rely on counterexamples generated by programs. Moving to a position where the Coq kernel could be trusted involved less manual inspection.

“Most proof systems are built on a very small trusted base. In terms of what is carrying out inferences, that code base is very small. Then on top of that, you write very complicated software that parses the code, but ultimately, it calls this very small thing that actually checks the proof,” Avigad explains. “People have also built mechanisms where you can have a complex proof system that checks the proof and then have another checker check the proof independently. The probability of there being a mistake in one and the other having the same mistake is very low.”

Avigad adds: “Mathematicians are great pragmatists. People will do what they need to do and if they see something is useful, they will apply it. But right now the feeling is that there is a very steep learning curve. You spend some time learning logic and formal methods, then spend some time learning to prove what are really quite trivial formal theorems.”

A key stumbling block is the level of detail required by the computer to prepare a formal proof. Despite the apparent rigor of mathematical notation and language, “there is a lot that’s left implicit,” Avigad says. “In a proof, there are many small inferences. Specifying each one explicitly is not something we typically do.”

Automated proof assistants need far more detail from the mathematician, says Tobias Nipkow, a member of the theorem-proving group at the Technical University of Munich’s department of information technology. “We are still at the stage where in order to formalize anything non-trivial, you frequently have to include a significant amount of foundational material that a mathematician would simply take for granted.”

One approach is to build online archives of formalized mathematical knowledge that could be accessed by proof tools. An example of such a repository is the Archive of Formal Proofs (http://afp.sourceforge.net/), which uses Sourceforge to hold a collection of proof libraries and examples written for the Isabelle prover software, one of the tools used for the FlySpeck project.

“Equally important are more powerful automatic proof procedures. This would enable mathematicians to take bigger steps in their proofs and reduce the tedium of computer-assisted proofs,” Nipkow adds.

Carette says a key problem in developing more usable proof assistants with the help of online repositories is the breadth of techniques needed to produce the software. “The technical challenges involved in these two enterprises are quite different. They tend to be done by different people.”

A further issue is the tendency for branches of mathematics to form silos that use quite different techniques to approach the problem of building proofs. Large-scale proofs such as FlySpeck rely on extensive computation, as well as the analytical steps associated with mathematical proofs.

“Systems to do computation have been entirely different to those that do proofs,” says Carette.

Even with trust in the results and increased usability of the tools used to develop computerized proofs, what happens if the proof is only ever processed by a machine, without calling for a human to understand what it contains?

Says Avigad, “Formal verification can cover the correctness part of a proof, but it doesn’t convey the knowledge.”

Adds Carette, “Being able to understand the proof is a real issue. But a trend that’s started recently is to write a paper that says in the introduction, ‘this paper is a reformulation of a set of results’. You start with a set of proofs that are machine-checked, but the paper you write is an explanation of what is going on. That appears to work. People feared the explanation step would go away, but it hasn’t.”

Even with formal verification as a basis, there remain fears that errors will still creep into proofs, says Avigad. “People in the community are very sensitive to this, but software has been built on an architecture that contains safeguards designed to maintain correctness.”

As the technology becomes more widespread, a rapid shift in acceptance could take place in certain sub-disciplines, if not in mathematics as a whole, Nipkow says. “The ease with which mathematics can be formalized with a proof assistant depends to some degree on the subject area. As a result, in some areas we may see more of an enthusiasm for such formalizations develop. “I don’t really see that yet. But, I do expect to see further, isolated formal landmark proofs,” Nipkow says, pointing to examples such as Hales’ proof of the Kepler conjecture and the work on the four-color map theorem.

Carette adds, “You may see situations where, at the highest level, if the paper doesn’t come with a computer-assisted proof, it’s likely that the result will be rejected. A rapid change from almost nobody using the technology to computer-assisted proofs becoming a de facto standard? It can happen.”

Having better understanding among those writing the software for mathematicians will help improve the spread of the technology, Carette says. “I’m not sure we understand the process that mathematicians go through when they do their work to be able to predict which technical advance will provide the biggest win. What is being discovered more and more now is that proof alone is insufficient.

“Mathematics is made up of a lot of things: proof; computation; knowledge management; plotting graphs. All of these have to come together to make a useful system.”

What is required, in effect, is a successor to the Maple software, originally developed in 1980 at the University of Waterloo to support automated proofs with an extensive set of related tools.

However, there is a problem with incentivizing the development of more complete automated mathematical assistants. “The work is slowed down by the amount of development effort needed that isn’t rewarded. If you don’t get a paper out of a development, academia doesn’t recognize it as work. That’s changing, but slowly,” says Carette, pointing to the work performed by research institutes, where more time is available for staff to develop software.

Avigad says he expects computers ultimately “will help us find new ways to do mathematics that are currently too complex for us to do manually. It’s a very exciting time. Over the decades, mathematics will change.”

Figures

Figure. An example of a four-color map. The four-color map theorem says no more than four colors are required to color the regions of a two-dimensional map so no two adjacent regions have the same color.