Thesis

Abstraction via Program Transformation (doctoral thesis)

In the field of computer-aided verification, abstraction techniques are indispensable as they play a critical role in reducing the complexity of the verification process to manageable sizes. However, these techniques are usually tightly integrated into tools, resulting in undesired complexity and neglect of reusable design. If one wants to employ a particular abstraction in several tools, the common approach is to implement the abstraction separately for each tool. However, this results in unnecessary duplication of an effort and inconsistent results as different implementations may vary. This thesis aims to address this challenge by developing a solution to reduce the duplication and complexity of abstraction. The overarching objective is to create a tool-independent program abstraction and utilize it to design program analysis techniques that can be applied by any tool, thereby reducing its complexity and allowing abstraction reusability. What is common between tools is the program representation they operate with. One such representation that has gained popularity is the LLVM intermediate representation of the Clang compiler. It can serve as a shared source of truth between tools and encode additional information for program analysis. In this doctoral thesis, we propose a solution to tool-independent abstraction by reconceiving it in terms of the intermediate representation. The key idea of the presented approach is to express abstract semantics in terms of intermediate representation, resulting in the abstraction being understandable to any tool which uses LLVM IR. We refer to this approach as compilation-based abstraction, as it essentially compiles abstract semantics into the program being analyzed. The compilation-based abstraction approach has broad applicability in the field of program analysis. This thesis demonstrates its potential in recasting symbolic execution as program abstraction, enabling explicit tools to perform symbolic analysis or even run symbolic programs natively. The focus will also be on more complex program primitives such as C arrays or strings, which require elaborated abstraction. The approach is extended to encompass aggregate types and library-level abstractions. Additionally, attention is given to dynamic memory abstraction, specifically the problem of ambiguous dynamic memory layout, which is often overlooked in analysis tools. Furthermore, the compilation-based abstraction approach opens up new possibilities for refinement techniques that can enhance the precision of the employed abstraction. This thesis explores the potential for instrumenting refinement directly into the program and counterexample-guided syntactic abstraction of program representation. In summary, this thesis provides an autonomous solution for instrumenting abstraction into program representation. The presented approach addresses various issues related to program domain interactions, control flow, and dynamic memory. It includes adaptations of symbolic execution, software model checking, and syntactic refinement loops. Moreover, by enabling the native execution of compiled abstraction, this approach not only streamlines the program abstraction design process but also enhances the efficiency of program analysis. Our main contributions culminated in the development of an LLVM Abstraction & Refinement Tool – LART. This tool implements all discussed abstractions and has been shown to be effective both in conjunction with other tools and in producing natively executable abstractions.
Abstractions via Program Transformations (advanced master thesis)

In computer-aided verification, most of the tools leverage abstraction techniques to reduce the complexity of analyzed systems. Even though these techniques are widely adopted, they are usually tightly integrated into tools, causing undesired complexity, and neglect any reusable design. In my research, I focus on the investigation of abstraction-based techniques used in program verification as abstract interpretation, counterexample guided abstraction or symbolic execution. For these techniques, I devise self-contained alternatives and general-purpose analyses. As a solution for self-contained abstraction, I propose a transformation-based method. This method utilizes instrumentation to insert abstraction directly to the program and consequently quits of responsibility for abstraction from the verification tools. Up until now, I have developed transformation-based symbolic execution and string abstraction for explicit model checkers. In the future, I plan to generalize the transformation-based technique for a wider variety of abstract domains such as predicate or pointer abstraction. Furthermore, I intend to adapt traditional abstraction refinement techniques to the self-contained abstraction.
Symbolic Model Checking via Program Transformations (master thesis)

To show reliability of software, developers usually reach out for testing and static analysis. However, to prove correctness, all behaviours of a program need to be checked. In this respect, formal verification methods aim to provide an automated approach to verification. A big obstacle are inputs because they massively increase the number of behaviours of the program. In this thesis, we present a technique which enables verification tools to perform automated checking of programs with inputs. A generally known approach is to interpret operations with input values in an abstract way. In this case, abstraction needs to be implemented in the verification tool. We propose that instead, an abstraction can be compiled into the program. Hence, the program can be verified by a tool even though the tool itself does not support abstraction of inputs. We implement the proposed approach as an LLVM-to-LLVM transformation which inserts an abstraction to the program. The applicability of the approach is demonstrated by transforming programs to represent their inputs symbolically. This, in turn, enables an essentially explicit-state model checker to verify the program. For evaluation purposes, we have chosen DIVINE as the model checker.
Introduction to Data Structures in Examples (bachelor thesis)

This thesis presents methods and structures proposed for teaching and home study of data structures. The thesis describes creation process of concrete exercises and their solutions. A part of exercises cover practical implementations, that contains code templates to be solved and a package of tests. Themes covered in the thesis are elementary data structures such as stack, queue, linked list and their modifications. Moreover, the thesis covers dynamic data structures for searching: binary heap, searching trees, red-black trees and B-trees. The thesis also comprehends data structures for the implementation of dictionaries.

2022

Verification of Programs Sensitive To Heap Layout.
By Lauko, H., Korenčik, L. and Ročkai, P.
In ACM Trans. Softw. Eng. Methodol., vol. 31, no. 4

Most C and C++ programs use dynamically allocated memory (often known as a heap) to store and organize their data. In practice, it can be useful to compare addresses of different heap objects, for instance, to store them in a binary search tree or a sorted array. However, comparisons of pointers to distinct objects are inherently ambiguous: The address order of two objects can be reversed in different executions of the same program, due to the nature of the allocation algorithm and other external factors.This poses a significant challenge to program verification, since a sound verifier must consider all possible behaviors of a program, including an arbitrary reordering of the heap. A naive verification of all possibilities, of course, leads to a combinatorial explosion of the state space: For this reason, we propose an under-approximating abstract domain that can be soundly refined to consider all relevant heap orderings.We have implemented the proposed abstract domain and evaluated it against several existing software verification tools on a collection of pointer-manipulating programs. In many cases, existing tools only consider a single fixed heap order, which is a source of unsoundness. We demonstrate that using our abstract domain, this unsoundness can be repaired at only a very modest performance cost. Additionally, we show that, even though many verifiers ignore it, ambiguous behavior is present in a considerable fraction of programs from software verification competition (sv-comp).
From Spot 2.0 to Spot 2.10: What’s New?
By Duret-Lutz, A., Renault, E., Colange, M., Renkin, F., Gbaguidi Aisse, A., Schlehuber-Caissier, P., Medioni, T., Martin, A., Dubois, J., Gillard, C. and Lauko, H.
In Computer Aided Verification

Spot is a C++17 library for LTL and }}\backslashomega }}ω-automata manipulation, with command-line utilities, and Python bindings. This paper summarizes its evolution over the past six years, since the release of Spot 2.0, which was the first version to support }}\backslashomega }}ω-automata with arbitrary acceptance conditions, and the last version presented at a conference. Since then, Spot has been extended with several features such as acceptance transformations, alternating automata, games, LTL synthesis, and more. We also shed some lights on the data-structure used to store automata.
On the Optimization of Equivalent Concurrent Computations. 2022

In this submission, we explore the use of equality saturation to optimize concurrent computations. A concurrent environment gives rise to new optimization opportunities, like extracting a common concurrent subcomputation. To our knowledge, no existing equality saturation framework allows such an optimization. The challenge with concurrent environments is that they require non-local reasoning since parallel computations are inherently unrelated and disjoint. This submission presents a new approach to optimizing equivalent concurrent computations: extending e-graphs to capture equal concurrent computations in order to replace them with a single computation.
LART: Compiled Abstract Execution.
By Lauko, H. and Ročkai, P.
In Tools and Algorithms for the Construction and Analysis of Systems

LART – LLVM Abstraction and Refinement Tool – originates from the divine model-checker, in which it was employed as an abstraction toolchain for the llvm interpreter. In this contribution, we present a stand-alone tool that does not need a verification backend but performs the verification natively. The core idea is to instrument abstract semantics directly into the program and compile it into a native binary that performs program analysis. This approach provides a performance gain of native execution over the interpreted analysis and allows compiler optimizations to be employed on abstracted code, further extending the analysis efficiency. Compilation-based abstraction introduces new challenges solved by lart, like domain interaction of concrete and abstract values simulation of nondeterministic runtime or constraint propagation.

2021

2020

Abstracting Strings for Model Checking of C Programs.
By Lauko, H., Olliaro, M., Cortesi, A. and Ročkai, P.
In Applied Sciences: Static Analysis Techniques (SAT)

Data type abstraction plays a crucial role in software verification. In this paper, we introduce a domain for abstracting strings in the C programming language, where strings are managed as null-terminated arrays of characters. The new domain M-String is parametrized on an index (bound) domain and a character domain. By means of these different constituent domains, M-Strings captures shape information on the array structure as well as value information on the characters occurring in the string. By tuning these two parameters, M-String can be easily tailored for specific verification tasks, balancing precision against complexity. The concrete and the abstract semantics of basic operations on strings are carefully formalized, and soundness proofs are fully detailed. Moreover, for a selection of functions contained in the standard C library, we provide the semantics for character access and update, enabling an automatic lifting of arbitrary string-manipulating code into our new domain. An implementation of abstract operations is provided within a tool that automatically lifts existing programs into the M-String domain along with an explicit-state model checker. The accuracy of the proposed domain is experimentally evaluated on real-case test programs, showing that M-String can efficiently detect real-world bugs as well as to prove that program does not contain them after they are fixed.
On Symbolic Execution of Decompiled Programs.
By Korenčik, L., Ročkai, P., Lauko, H. and Barnat, J.
In Software Quality, Reliability, and Security (QRS)

In this paper, we present a combination of existing and new tools that together make it possible to apply formal verification methods to programs in the form of x86_64 machine code. Our approach first uses a decompilation tool (remill) to extract low-level intermediate representation (LLVM) from the machine code. This step consists of instruction translation(i.e. recovery of operation semantics), control flow extraction and address identification. The main contribution of this paper is the second step, which builds on data flow analysis and refinement of indirect (i.e. data-dependent) control flow. This step makes the processed bitcode much more amenable to formal analysis.To demonstrate the viability of our approach, we have compiled a set of benchmark programs into native executables and analysed them using two LLVM-based tools: DIVINE, a software model checker and KLEE, a symbolic execution engine. We have compared the outcomes to direct analysis of the same programs.

2019

Extending DIVINE with Symbolic Verification Using SMT.
By Lauko, H., Štill, V., Ročkai, P. and Barnat, J.
In Tools and Algorithms for the Construction and Analysis of Systems (TACAS)

DIVINE is an LLVM-based verification tool focusing on analysis of real-world C and C++ programs. Such programs often interact with their environment, for example via inputs from users or network. When these programs are analyzed, it is desirable that the verification tool can deal with inputs symbolically and analyze runs for all inputs. In DIVINE, it is now possible to deal with input data via symbolic computation instrumented into the original program at the level of LLVM bitcode. Such an instrumented program maintains symbolic values internally and operates directly on them. Instrumentation allows us to enhance the tool with support for symbolic data without substantial modifications of the tool itself. Namely, this competition contribution uses SMT formulae for representation of input data.
String Abstraction for Model Checking of C Programs.
By Cortesi, A., Lauko, H., Olliaro, M. and Ročkai, P.
In Model Checking Software (SPIN)

Automatic abstraction is a powerful software verification technique. In this paper, we elaborate an abstract domain for C strings, that is, null-terminated arrays of characters. We describe the abstract semantics of basic string operations and prove their soundness with regards to previously established concrete semantics of those operations. In addition to a selection of string functions from the standard C library, we provide semantics for character access and update, enabling automatic lifting of arbitrary string-manipulating code into the domain.

2018

Symbolic Computation via Program Transformation.
By Lauko, H., Ročkai, P. and Barnat, J.
In Theoretical Aspects of Computing (ICTAC)

Symbolic computation is an important approach in automated program analysis. Most state-of-the-art tools perform symbolic computation as interpreters and directly maintain symbolic data. In this paper, we show that it is feasible, and in fact practical, to use a compiler-based strategy instead. Using compiler tooling, we propose and implement a transformation which takes a standard program and outputs a program that performs a semantically equivalent, but partially symbolic, computation. The transformed program maintains symbolic values internally and operates directly on them; therefore, the program can be processed by a tool without support for symbolic manipulation.

2017

Model Checking of C and C++ with DIVINE 4.
By Baranová, Z., Barnat, J., Kejstová, K., Kučera, T., Lauko, H., Mrázek, J., Ročkai, P. and Štill, V.
In Automated Technology for Verification and Analysis (ATVA)

The fourth version of the DIVINE model checker provides a modular platform for verification of real-world programs. It is built around an efficient interpreter of LLVM code which, together with a small, verification-oriented operating system and a set of runtime libraries, enables verification of code written in C and C++.
Optimizing and Caching SMT Queries in SymDIVINE.
By Mrázek, J., Jonáš, M., Štill, V., Lauko, H. and Barnat, J.
In Tools and Algorithms for the Construction and Analysis of Systems (TACAS)

This paper presents a new version of the tool SymDIVINE, a model-checker for concurrent C/C++ programs. SymDIVINE uses a control-explicit data-symbolic approach to model checking, which allows for the bit-precise verification of programs with inputs, by representing data part of a program state by a first-order bit-vector formula. The new version of the tool employs a refined representation of symbolic states, which allows for efficient caching of smt queries. Moreover, the new version employs additional simplifications of first-order bit-vector formulas, such as elimination of unconstrained variables from quantified formulas. All changes are documented in detail in the paper.

2016

SymDIVINE: Tool for Control-Explicit Data-Symbolic State Space Exploration.
By Mrázek, J., Bauch, P., Lauko, H. and Barnat, J.
In Model Checking Software (SPIN)

We present SymDIVINE: a tool for bit-precise model checking of parallel C and C++ programs. It builds upon LLVM compiler infrastructure, hence, it uses LLVM IR as an input formalism. Internally, SymDIVINE extends the standard explicit-state state space exploration with SMT machinery to handle non-deterministic data values. As such, SymDIVINE is on a halfway between a symbolic executor and an explicit-state model checker. The key differentiating aspect present in SymDIVINE is the ability to decide about equality of two symbolically represented states preventing thus repeated exploration of the state space graph. This is crucially important in particular for verification of parallel programs where the state space graph is full of diamond-shaped subgraphs.