In 2020, poor-quality software systems led to financial losses of approximately USD 2.08 trillion in the U.S. alone.19 Formal methods, such as bounded model checking (BMC), help to improve software quality, but they often fail to scale to the size and complexity of software. To solve these problems, we have been developing two frameworks—VeriAbs and VeriFuzz—with novel techniques, tools, and strategies.
VeriAbs
Aim. Prove code correctness or generate tests that demonstrate correctness violations.
Innovation. VeriAbs implements several novel verification techniques, such as loop shrinking,20 aimed at verifying programs with complex loops, and improving the operational efficiency of popular verification techniques, such as k-induction.3,9 Further, in our experience, the application of an appropriate sequence of techniques is more effective at verifying complex programs than any single technique. As shown in Figure 1, VeriAbs implements four novel strategies aimed at verifying different kinds of code, where each strategy consists of a sequence of verification techniques:
Figure 1. VeriAbs architecture (S: Program Safe, F: Property Fails, U: Unknown).
Strategy 1 comprises only k-induction, which is effective on loops with unstructured control flow. This does not occur in real-world code but in synthetic programs to challenge verification engines. In practice, we switch off the first strategy and retain k-induction as the final step of the default strategy.
Strategy 2 is useful for verifying quantified properties, such as whether arrays are sorted. This kind of code is commonly found in the automotive sector, where domain-specific parameters and configurations are coded using arrays and are accessed using pointers. This strategy correctly verifies 87% of all quantified properties in automotive applications.17
Strategy 3 aims to verify implementations of reactive systems and email clients that admit inputs to loops with very short ranges. As a result, 50% of the properties of such benchmarks in the International Competition on Software Verification (SV-COMP)5 could be verified using this strategy.
Strategy 4 is the default strategy of VeriAbs and handles programs containing linear acceleration of variables, which are very common in practice. This strategy works well in practice; it helped not only with the verification of real-world software13 but also to eliminate false positives from those generated by static analysis tools,10 such as the TCS Embedded Code Analyzer (ECA).18
Impact. The main technical novelty of VeriAbs is in the various abstractions it implements to scale program verification, which resulted in four publications7,13,17,20 and six patents. The architectural advances in VeriAbs are published in a tool paper1 and several competition-contribution papers.2,9,12,15 To benchmark against other verifiers, VeriAbs has participated in SV-COMP since 2017, competing with more than 30 state-of-the-art verifiers. The results show that VeriAbs solves about 20% more programs than the competition,5 helping VeriAbs win a gold medal in SV-COMP’s ReachSafety category each year since 2019.
VeriAbs has also been used in different industrial projects.8,10,14,16,21 This experience helped us refine the strategies in VeriAbs to successfully verify code sizes of 1.5 MLOC for properties such as array index out of bounds, division by zero, variable value overflow and underflow, and null pointer dereferences. VeriAbs verified around 60% of these properties across domains including automotive, office automation, networking, device drivers, and security.
VeriFuzz
Aim. Automatic test generation for the discovery of complex software errors related to functionality, security, and privacy.
Innovation. Figure 2 shows the architecture of the VeriFuzz framework, which combines coverage guided fuzzing (CGF), BMC, and static analysis (SA) technology in novel ways for automatic test-case generation for programs. VeriFuzz uses the open source engines AFL24 and CBMC6 respectively for CGF and BMC, and TCS’s in-house SA framework PRISM.18
Figure 2. VeriFuzz architecture.
The input to VeriFuzz is a program P and a desired coverage ∅. First, VeriFuzz extracts some syntactic features and classifies P into a category K, which determines the parameters to the BMC and the fuzzer. Next, P is instrumented to measure coverage, and BMC is then invoked to generate an initial test corpus Tc. The fuzzer invokes Pe, the executable of instrumented P, with the initial corpus Tc and measures the initial coverage. Each test from Tc is mutated several times to generate a set of new test inputs Tg. Pe is repeatedly executed with each test input tg ∈ Tg and coverage is measured. The fitness check determines whether tg should be added to Tc or not. This process is repeated until ∅ is achieved. VeriFuzz is efficient (takes only a few minutes) and effective (able to achieve ∅).
Impact. VeriFuzz combines the strengths of BMC and CGF for efficient and effective test generation,22 which led to a nomination for a best paper award. The architectural advances in VeriFuzz are published in competition-contribution papers.11,23 To benchmark it against state-of-the-art test engines, VeriFuzz has participated in the International Competition on Software Testing (Test-Comp)4 each year since 2019, competing with more than 10 state-of-the-art test engines and evaluated on more than 3,000 C benchmarks. VeriFuzz has earned seven gold and five silver medals in Test-Comp and discovered errors 5- to 30-times faster than other top tools. In the industry, VeriFuzz was recently evaluated on an already well-tested smart contract implementation, where it exposed security errors, especially in the handling of Unicode string character buffers that were hitherto undiscovered.
Join the Discussion (0)
Become a Member or Sign In to Post a Comment