Static program analysis is a key component of many software development tools, including compilers, development environments, and verification tools. Practical applications of static analysis have grown in recent years to include tools by companies such as Coverity, Fortify, GrammaTech, IBM, and others. Analyses are often expected to be sound in that their result models all possible executions of the program under analysis. Soundness implies the analysis computes an over-approximation in order to stay tractable; the analysis result will also model behaviors that do not actually occur in any program execution. The precision of an analysis is the degree to which it avoids such spurious results. Users expect analyses to be sound as a matter of course, and desire analyses to be as precise as possible, while being able to scale to large programs.
Soundness would seem essential for any kind of static program analysis. Soundness is also widely emphasized in the academic literature. Yet, in practice, soundness is commonly eschewed: we are not aware of a single realistic whole-programa analysis tool (for example, tools widely used for bug detection, refactoring assistance, programming automation, and so forth) that does not purposely make unsound choices. Similarly, virtually all published whole-program analyses are unsound and omit conservative handling of common language features when applied to real programming languages.
The typical reasons for such choices are engineering compromises: implementers of such tools are well aware of how they could handle complex language features soundly (for example, assuming that a complex language feature can exhibit any behavior), but do not do so because this would make the analysis unscalable or imprecise to the point of being useless. Therefore, the dominant practice is one of treating soundness as an engineering choice.
In all, we are faced with a paradox: on the one hand we have the ubiquity of unsoundness in any practical whole-program analysis tool that has a claim to precision and scalability; on the other, we have a research community that, outside a small group of experts, is oblivious to any unsoundness, let alone its preponderance in practice.
Soundness is not even necessary for most modern analysis applications, however, as many clients can tolerate unsoundness.
We introduce the term soundy for such analyses. The concept of soundiness attempts to capture the balance, prevalent in practice, of over-approximated handling of most language features, yet deliberately under-approximated handling of a feature subset well recognized by experts. Soundiness is in fact what is meant in many papers that claim to describe a sound analysis. A soundy analysis aims to be as sound as possible without excessively compromising precision and/or scalability.
Our message here is threefold:
The typical (published) whole-program analysis extolls its scalability virtues and briefly mentions its soundness caveats. For instance, an analysis for Java will typically mention that reflection is handled "as in past work," while dynamic loading will be (silently) assumed away, as will be any behavior of opaque, non-analyzed code (mainly native code) that may violate the analysis' assumptions. Similar "standard assumptions" hold for other languages. Indeed, many analyses for C and C++ do not support casting into pointers, and most ignore complex features such as
Could all these features be modeled soundly? In principle, yes. In practice, however, we are not aware of a single sound whole-program static analysis tool applicable to industrial-strength programs written in a mainstream language! The reason is sound modeling of all language features usually destroys the precision of the analysis because such modeling is usually highly over-approximate. Imprecision, in turn, often destroys scalability because analysis techniques end up computing huge resultsa typical modern analysis achieves scalability by maintaining precision, thus minimizing the datasets it manipulates.
Soundness is not even necessary for most modern analysis applications, however, as many clients can tolerate unsoundness. Such clients include IDEs (auto-complete systems, code navigation), security analyses, general-purpose bug detectors (as opposed to program verifiers), and so forth. Even automated refactoring tools that perform code transformation are unsound in practice (especially when concurrency is considered), and yet they are still quite useful and implemented in most IDEs. Third-party users of static analysis resultsincluding other research communities, such as software engineering, operating systems, or computer securityhave been highly receptive of program analyses that are unsound, yet useful.
While an unsound analysis may take arbitrary shortcuts, a soundy analysis that attempts to do the right thing faces some formidable challenges. In particular, unsoundness frequently stems from difficult-to-model language features. In the accompanying table, we list some of the sources of unsoundness, which we segregate by language.
All features listed in the table can have significant consequences on the program, yet are commonly ignored at analysis time. For language features that are most often ignored in unsound analyses (reflection,
We strongly feel that:
a. We draw a distinction between whole program analyses, which need to model shared data, such as the heap, and modular analysesfor example, type systems. Although this space is a continuum, the distinction is typically well understood.
The Digital Library is published by the Association for Computing Machinery. Copyright © 2015 ACM, Inc.
Hear hear! I frequently make points like this to counter unreasonable claims by some tool vendors that their offerings are invariably sound. Most end users are prepared to believe such claims with little skepticism because they don't know to ask "under what assumptions?"
Disclaimer: I lead the team at GrammaTech that develops CodeSonar, a tool that is definitely not sound, nor soundy for that matter.
The use of the "soundiness" resonated with me for several reasons. First of all, Colbert's use of "truthiness" came to mind. He, of course, was talking opinion ruling over fact.
But, this current usage bears directly on the issue of truthfulness in many ways but on these two, in particular: 1) if a computational system is not sound, when do we know if it is reliable (ignore, please, for the moment, the multitude of issues being raised - but be cognizant that many system developers think that they have no problem in this regard due to the numerical nature of their domain processing) operationally; 2) how do we increase awareness of lurking problems in a climate where hot-dogging via code (encouraged by the companies involved) throw out systems, continually, and seemingly on the fly and do we try to alter the willy-nilly patterns that seem so prevalent the past decade or so?
In short, everywhere "lmost" true is our norm? Just as "soundiness" is something to engineer, so too is "truth" itself. Hence, my focus: truth engineering.
Displaying all 2 comments