memcad

The MemCAD analyzer
README

MemCAD is a static analyzer to carry out shape analysis of C programs.
This file states the main principles underlying the analysis and shows
its basic use. The reference manual provides more detailed information
about the parameterization of the analysis.

Presentation

MemCAD stands for *Mem*ory *C*omposite *A*bstract *D*omain.
It is an experimental tool for program verification, that focuses mainly
on memory properties.

MemCAD is a static analyzer for C programs, specialized in the
inference and verification of memory properties. It performs a
forward analysis by abstract interpretation. More precisely, it
starts from abstract pre-conditions and computes abstract
post-conditions. Unless specified otherwise, the pre-condition
assumes an empty memory state (no variable is defined) The analysis is
designed to be sound, which means that the invariants it computes
should over-approximate the concrete behaviors of the analyzed program
(though, note that MemCAD is offered with no guarantee, even though
the design intent is to ensure its soundness ---e.g., the analyzer may
not be free of implementation bugs). For instance, when a new local
variable is defined and before it is uninitialized, it is assumed to
hold any possible value. To allow for the verification of a code
fragment under some assumption (e.g., that a function is called in a
state where some structure are already built and stored in memory),
the analyzer features specific assumption commands.

During the analysis, MemCAD attempts to prove the absence of memory
errors and assertion violations.
In particular, it attempts to prove:

  • the absence of memory errors, such as the dereference of null or
    invalid pointers;

  • the absence of accesses (read or write) outside the bounds of
    memory blocks (that includes array out of bound errors);

  • that regular assertions are satisfied;

  • that structural assertions (that are described in a specific
    language) are also satisfied, which means that certain structural
    invariants hold.

The main use of MemCAD consists in verifying the preservation of
structural assertions such as the preservation of well-formed dynamic
structures such as lists, trees, possibly nested into arrays.

The analysis is conservative and may fail to prove properties that
hold in all concrete executions. Messages reporting the analyzer
cannot prove the correctness of some operations do not necessarily
mean that there exists a concrete execution where they do not hold, or
even that the analyzer cannot prove them under different settings. In
many cases, they stem for imprecision in the analysis, that could
either be solved using other settings, or require some deeper
refinements of the abstract interpreter implemenation (addition of an
abstract domain, refinement of abstract transfer functions and
analysis algorithms). In the current version of the MemCAD
implementation, shape properties (well formedness of structures with
respect to purely pointer properties) and basic numerical properties
(linear constraints) can be specified and are handled rather
precisely. Also, set abstraction allow to verify the preservation of
shared data structures. The analysis handles basic relations between
pointer properties, numerical properties and sharing properties, but
cannot handle precisely all such properties (the current version is
not able to generalize numerical invariants on red-black trees, for
instance).

The abstract domain of MemCAD is highly parametric and modular. The
analyzer relies on a library of abstract domains that can often be
enabled or disabled independently. Some domains require some
parameterization, which can be performed either automatically or
manually.

The MemCAD analyzer inputs C programs, and optional user-supplied
description of data-structures of interest. The purpose of the
data-structure descriptions is to parameterize the abstract domain of
the analyzer. The data-structure descriptions can be inferred
automatically when the structures involve no sharing or pointer
relations (as in doubly-linked lists or trees with parent pointers).
In the other hand, basic structures such as singly-linked trees or
binary trees can be inferred from their type definitions. Not all C
features are supported and the programs to analyze must be compilable
by the clang compiler. Moreover, MemCAD does not aim at handling all
features of the C programming language. The following C features are
not currently supported by MemCAD, although support of several of
these could be added easily (the main purpose of MemCAD is to design,
implement and assess memory abstractions, not numerical abstractions):

  • floating point variables;

  • inlined assembly code;

  • recursive functions;

  • function pointers;

  • pointers used as arrays (though core of the analyzer could
    handle this feature well, the frontend and typing phase currently
    do not).

Basic use

We consider a very simple program manipulating singly-linked lists to
illustrate a first analysis example, and assume that this example
program is stored in a file named "list.c" located in the
current directory:

struct elt* next;
  int data;
} elt;

typedef elt* list;

void main() {
  list l;
  int i = 0;
  _memcad("add_inductive(l,elt)");
  while(l != 0){
    l->data = i++;
    l = l->next;
  }
  _memcad("check_inductive(l,elt)");
}

First, let us note that the meta-command _memcad is used to
control the analysis, and state assumptions and proof goals using a
meta-syntax specific to the analyzer.
Two instances of this command are used here:

  • the first one consists of the meta-command add_inductive
    and requires the analyzer to assume at the point where it appears the
    value stored in variable l satisfies some inductive predicate in
    separation logic named elt (this predicate describes a structure
    that can either be inferred from type definitions or set manually; more
    details on this predicate will be provided below);

  • the second one consists of the meta-command check_inductive
    and requires the analysis to prove that the value stored in variable
    l still satisfies the inductive predicate named elt.

Essentially, those two commands are used here so as to specify an abstract
pre-condition (to assume) and an abstract post-condition (to prove).

Then, the analysis is launched by the command below:

./analyze -a -auto-ind list.c

The -a option instructs the executable to perform the analysis.
The -auto-ind option activates automatic inference of the set
of inductive definitions to be used for the analysis from the type
definitions in file "list.c".
This generates one inductive definition elt to be used as a
parameter of the abstract domain before actually launching analysis.
Thus, the execution is split into two phases:

  1. collection of inductive definitions computed from the type
    definitions (under the assumption that no sharing is used), and
    instantiation of the abstract domain; essentially, this phase
    computes an inductive definition in separation logic by
    a.elt ::= (emp, a=0) / (a.n -> b * a.d -> c * n.elt, a!=0)
    (internally, the analysis takes the memory layout into account
    and utilizes numeric offsets rather than field names)

  2. forward abstract interpretation of the program using this
    inductive definition as a parameter of the abstract domain, and
    starting from the entry point of the program with an empty memory
    state (no variable, and no heap space); this phase over-approximates
    soundly each variable definition and program statement;
    while doing so, the _memcad meta-commands respectively
    lets the analysis make a memory assumption (which would be satisfied
    if replaced by any code fragment that onstructs a valid structure,
    that can be described by elt), and requires it to verify
    a memory property.

Note that an automatically generated inductive definition is named
after the name of the struct it is derived from. In our example, the
struct is named elt, so elt must be used as
the inductive name in both the add_inductive and in the
check_inductive MemCAD commands.

The analyzer output will end like this:

[...]
Final status report (0.1901 sec spent)
- assertions proved:     1 (context)
- assertions not proved: 0 (context)

If the count on the line 'assertions not proved' is non zero, it means
the program could not be verified (i.e., at least one memory assertion
could not be proved).

The automatically generated inductive definition can be found in the file
named 'auto.ind'.

Advance use in the documentation

The documentation in pdf format provides detailed information about
more use cases, options, and the parameterization of the analysis.

Install
Published
22 Oct 2021
Sources
memcad-v1.1.0.tar.gz
sha512=7e37933fb3c2b67d166906f1ac38855b6819d4ce8f015f84b98012a513cf254eea5081b5d37fe69689dead565f8a1973418a023c9cee76426a5bdb93815c681f
Dependencies
Reverse Dependencies