Formal verification of the skip list of Tezos includes the work on verification of the following properties :
- Validity of a cell (containing
content
and a list ofback pointers
), - Validity of a dereferencing operator,
- Validity of encoding functions for
cell
andpointer
, - Validity of backpointers,
- Verification of
equality
definition, - Verification of group of functions and definitions serving the organization
of data-structure :
next
,genesis
,best_skip
,mem
,pp
and others, - Validity of a
back_path
(the list of backpointers), - Uniqueness of
express line
(back_path
, list of pointers, generated with usage ofbest_skip
function).
note
The skip-list is a relative of the linked list, but it has its own advantages
and characteristics. The skip-list consists of two paths : main
and optimized
.
the Main
path is a common sorted linked list, and optimized
path is an
express line
where elements are skipped. The latter gives skip-list a good
complexity for the main operations: O(log n). This is a noticeable
advantage that distinguishes skip-list from data structures with complexity
O(n), where search and insertion algorithms are forced to visit every node.
Axioms and Predicates​
In the Tezos' skip_list implementation the maintenance of dereferencing operator
and the last cell of the sequence is left to the client. Which means that for
us specifics of this operator and final cell are opaque. In order to verify
the properties of skip_list
with usage of these parties we had to give them
our specification.
There are several ways to fill such gaps: to create an axiom, to add a property or a validity predicate.
So in this case an axiom is what properties our object has. If we substitute into the axiom, an object that does not have the expected properties, we will, of course, get a contradiction.
Axiom index_ptr_eq : forall `{FArgs} {content_type ptr_type : Set}
(deref : ptr_type -> option (cell content_type ptr_type))
(cell_1_ptr cell_2_ptr : ptr_type)
(cell1 cell2 : cell content_type ptr_type),
deref cell_1_ptr = Some cell1 ->
deref cell_2_ptr = Some cell2 ->
cell1.(cell.index) =i cell2.(cell.index) = true ->
cell_1_ptr = cell_2_ptr.
Fact f : False.
Proof.
assert FArgs by
(constructor; unfold Skip_list_repr.S_Parameters; constructor; exact 0).
assert (cell unit bool) by
(constructor;
[ exact tt | constructor; [ exact nil | exact None ] | exact 0 ]).
pose (index_ptr_eq (fun b : bool => Some H0) true false H0 H0).
intuition.
Qed.
In our work we prefer validity predicates, and we add the predicates in our proofs.
Module Cell_Index.
Module Valid.
Definition t `{FArgs} {content_type ptr_type : Set}
(deref : ptr_type -> option (cell content_type ptr_type)) : Prop :=
forall (cell_1_ptr cell_2_ptr : ptr_type)
(cell1 cell2 : cell content_type ptr_type),
deref cell_1_ptr = Some cell1 ->
deref cell_2_ptr = Some cell2 ->
cell1.(cell.index) =i cell2.(cell.index) = true ->
cell_1_ptr = cell_2_ptr.
End Valid.
End Cell_Index.
Validity of a back_path
​
The function back_path
produces a valid path according to valid_back_path
.
Proof of two final lemmas devoted to express line
of the skip_list
: validity
of such path and its uniqueness.
The first step in this work was the creation of the so-called proof skeleton.
In the proof skeleton all auxiliary statements were extracted and formulated
as a separate lemmas. Generally we started at the last main lemma, going back,
defining smaller helper auxiliary lemmas, leaving them Admitted
for the first
draft of the proof.
At the heart of this procedure was the proof of correct work of auxiliary
function valid_path
. Proof was made by induction and required
one_step_of_reduction
to be retrieved as a separate stand-alone proved
statement.
Some functions from the skip_list_repr
code are marked by
#[bypass_check(guard)]
. This means that Coq can not guess the decreasing
argument of such functions, which in turn, means that the function is not total,
we have no guarantee that it will not diverge. We created Simulations
for such
functions, equipped by fuel (a natural number) in order to be able to use them
in proofs and reason about them. Our simulations do not have guards
, but we
had to axiomatize equality of initial functions and our simulations.
Having fuel in the definitions also gives us more room to maneuver. For example, restricting the amount of fuel with predicates gives us more flexibility later in proofs by induction, while using a dedicated, opaque function is easier and more reliable, since we don't need to have any idea of how much fuel we need.
Parameter bp_compute_fuel : forall `{FArgs} {A B : Set}
(deref : A -> option (cell B A)) (cell_ptr : A) (target_index : int),
Datatypes.nat.
During this work we proved many properties of path
, generated by
function back_path
, such as
- All elements of
path
can not beNone
, - Pointer to the
target index
is the last pointer in thepath
, - If
path
has just one pointer - thispath
is valid, - Function
pack_path
would always return at least one pointer, - Cell indices in the
path
are strictly ordered, - Special properties of functions
mem
,best_skip
,back_pointer
.
We devoted a separate blog-post to the process of verification of one of stated
functions - the function mem
.
Future work​
For future work, we would like to:
- Prove lemma
back_path_is_unique
. - Get rid of
axioms of equivalence
. As it was mentioned above, we had to axiomatize equality of initial functions and our simulations.
We hope that it will be possible to solve this issue with the RecDef
library.
replacing {struct}
parameter by {measure}
and convincing Coq that function will
stop work when {measure}
reached. For measure we can use the value of
target_cell_index
. This question requires additional research.