$ Chapter 2 — A Gentle Introduction to Static Analysis
Chapter 2 — A Gentle Introduction to Static Analysis
21 April 2021

2.2 Abstraction

Example: Semantic Property of Interest: Reachability

Realistic application for C programs:

  • where it dereference a null pointer

  • where it writes over a dangling pointer


Definition: Abstraction
We call abstraction a set 𝒜 of logical properties of program states, which are called abstract properties or abstract elements. A set of abstract properties is called an abstract domain.

Definition: Concretization
Given an abstract element aa of 𝒜, we call concretization the set of program states that satisfy it. We denote it by γ(a)\gamma(a)

Abstraction is not unique – some abstractions yield simpler computer representations and less costly algorithms than others:

Definition: Best Abstraction

We say that aa is the best abstraction of the concrete set SS if and only if

  • Sγ(a)S \in \gamma(a), and

  • for any aa^{\prime} that is an abstraction of SS (i.e., Sγ(a)S \in \gamma(a^{\prime})), then aa^{\prime} is a coarser abstraction than aa

If SS has a best abstraction, then the best abstraction is unique. When it is defined, we let α denote the function that maps any concrete set of states into the best abstraction of that set of states.

The best abstraction may not be available. But the impossibility to define or compute the best abstraction is in no way a serious flow for the analysis, as it just lead to conservative but sound results.

Think of the "reachability" semantic property of a program that acts on 2D space:

abstraction

idea

intervals abstraction

the abstract elements of the interval abstract are defined by constraints of the form lXX,XhX,lYY,YhYl_{\rm{X}} ≤ \rm{X}, \rm{X} ≤ h_{\rm{X}}, l_{\rm{Y}} ≤ \rm{Y}, \rm{Y} ≤ h_{\rm{Y}}

  • always has the best abstraction

  • non-relational

  • simpler, more efficient representation

convex polyhedra abstraction

the abstract elements of the convex polyhedra abstract domain are conjunctions of linear inequality constraints, e.g.

  • XY0.5\rm{X} - \rm{Y} ≥ -0.5

  • X2.5\rm{X} ≤ 2.5

  • may not have the best abstraction

  • relational

  • more expressive, but more complex and less efficient

2.3 A Computable Abstract Semantics: Compositional Style

A compositional approach to static analysis: to analyze a sequence of commands, "composes" the analyses of each sub-command

2.3.1 Abstraction of Initialization

2.3.2 Abstraction of Post-Conditions

Definition: Sound analysis by abstract interpretation in compositional style
We consider a static analysis function analysis that inputs a program and an abstract pre-condition and returns an abstract post-condition. We say that analysis is sound if and only if the following condition holds:
If an execution of p from a state (x, y) generates that state (x′, y′),
then for all abstract element aa such that (x, y) ∈ γ(aa),
(x′, y′) ∈ γ(analysis(p, aa))

The definition above entails that:

2.3.3 Abstraction of Non-Deterministic Choice

Abstract interpretation will produce an over-approximation of both cases, as the union of two sets of abstract elements.

Note
This is another reason for the incompleteness of the analysis: it cannot express precise disjunctive properties (the several solutions will be presented in section 5.1).

2.3.4 Abstraction of Non-Deterministic Iteration

An interesting fact about static analysis
No analysis can ensure that an iteration will terminate with the abstract post-condition (even if computed), because the halting problem cannot be computed exactly in finite time.

Algorithm: analysis(iter{p},a)\texttt{analysis(iter\{p\}}, a)

R ← a;
repeat
    T ← R;
    R ← widen(R, analysis(p, R));
until inclusion(R, T)
return T;
Note

Widening is another source of potential incompleteness, but fortunately there are many techniques to make the analysis of loops more precise.

Example: Loop unrolling

To unroll the first iteration of the loop into the union of {{}; b} can ease the effect of the succeeding widenings.

program that acts on 2D space

abstract iteration

original

init({(x,y) | 0 ≤ y ≤ 2x and x ≤ 0.5})
iter{
  translation(1, 0.5)
}

abstract-iteration

loop unrolled

init({(x,y) | 0 ≤ y ≤ 2x and x ≤ 0.5})
{} or {
  translation(1, 0.5)
}
iter{
  translation(1, 0.5)
}

abstract-iteration-loop-unrolled

2.3.5 Verification of the Property of Interest

The analysis discussed so far actually computes as intermediate results over-approximations for all the interesting states of the input program, and thus we can just monitor them and use it for the verification of the property of interest.

2.4 A Computable Abstract Semantics: Transitional Style

Idea: compute, from the outset, all occurring intermediate states

2.4.1 Semantics as State Transitions

The goal of the analysis: to collect all the states occurring in all possible transition sequences of the input program.

The algorithm works very similarly to the "compositional style", but it will compute states per statement and iterates until all the statement states get converged.

2.5 Core Principles of a Static Analysis

The three-stage approach for static analysis:

  1. selection of the semantics and properties of interest

  2. choice of the abstraction

  3. derivation of the analysis algorithms from the semantics and from the abstraction