Skip to main content

Verifying the skip-list

· 5 min read

Formal verification of the skip list of Tezos includes the work on verification of the following properties :

  • Validity of a cell (containing content and a list of back pointers),
  • Validity of a dereferencing operator,
  • Validity of encoding functions for cell and pointer,
  • Validity of backpointers,
  • Verification of equality definition,
  • Verification of group of functions and definitions serving the organization of data-structure : next, genesis, best_skip, mem, pp and others,
  • Validity of a back_path (the list of backpointers),
  • Uniqueness of express line (back_path, list of pointers, generated with usage of best_skip function).

The skip-list is a relative of the linked list, but it has its own advantages and characteristics. The skip-list consists of two paths : main and optimized. the Main path is a common sorted linked list, and optimized path is an express line where elements are skipped. The latter gives skip-list a good complexity for the main operations: O(log n). This is a noticeable advantage that distinguishes skip-list from data structures with complexity O(n), where search and insertion algorithms are forced to visit every node.

Axioms and Predicates

In the Tezos' skip_list implementation the maintenance of dereferencing operator and the last cell of the sequence is left to the client. Which means that for us specifics of this operator and final cell are opaque. In order to verify the properties of skip_list with usage of these parties we had to give them our specification.

There are several ways to fill such gaps: to create an axiom, to add a property or a validity predicate.

So in this case an axiom is what properties our object has. If we substitute into the axiom, an object that does not have the expected properties, we will, of course, get a contradiction.

Axiom index_ptr_eq : forall `{FArgs} {content_type ptr_type : Set}
(deref : ptr_type -> option (cell content_type ptr_type))
(cell_1_ptr cell_2_ptr : ptr_type)
(cell1 cell2 : cell content_type ptr_type),
deref cell_1_ptr = Some cell1 ->
deref cell_2_ptr = Some cell2 ->
cell1.(cell.index) =i cell2.(cell.index) = true ->
cell_1_ptr = cell_2_ptr.

Fact f : False.
assert FArgs by
(constructor; unfold Skip_list_repr.S_Parameters; constructor; exact 0).
assert (cell unit bool) by
[ exact tt | constructor; [ exact nil | exact None ] | exact 0 ]).
pose (index_ptr_eq (fun b : bool => Some H0) true false H0 H0).

In our work we prefer validity predicates, and we add the predicates in our proofs.

Module Cell_Index.
Module Valid.
Definition t `{FArgs} {content_type ptr_type : Set}
(deref : ptr_type -> option (cell content_type ptr_type)) : Prop :=
forall (cell_1_ptr cell_2_ptr : ptr_type)
(cell1 cell2 : cell content_type ptr_type),
deref cell_1_ptr = Some cell1 ->
deref cell_2_ptr = Some cell2 ->
cell1.(cell.index) =i cell2.(cell.index) = true ->
cell_1_ptr = cell_2_ptr.
End Valid.
End Cell_Index.

Validity of a back_path

The function back_path produces a valid path according to valid_back_path. Proof of two final lemmas devoted to express line of the skip_list: validity of such path and its uniqueness.

The first step in this work was the creation of the so-called proof skeleton. In the proof skeleton all auxiliary statements were extracted and formulated as a separate lemmas. Generally we started at the last main lemma, going back, defining smaller helper auxiliary lemmas, leaving them Admitted for the first draft of the proof.

At the heart of this procedure was the proof of correct work of auxiliary function valid_path. Proof was made by induction and required one_step_of_reduction to be retrieved as a separate stand-alone proved statement.

Some functions from the skip_list_repr code are marked by #[bypass_check(guard)]. This means that Coq can not guess the decreasing argument of such functions, which in turn, means that the function is not total, we have no guarantee that it will not diverge. We created Simulations for such functions, equipped by fuel (a natural number) in order to be able to use them in proofs and reason about them. Our simulations do not have guards, but we had to axiomatize equality of initial functions and our simulations.

Having fuel in the definitions also gives us more room to maneuver. For example, restricting the amount of fuel with predicates gives us more flexibility later in proofs by induction, while using a dedicated, opaque function is easier and more reliable, since we don't need to have any idea of how much fuel we need.

Parameter bp_compute_fuel : forall `{FArgs} {A B : Set}
(deref : A -> option (cell B A)) (cell_ptr : A) (target_index : int),

During this work we proved many properties of path, generated by function back_path, such as

  • All elements of path can not be None,
  • Pointer to the target index is the last pointer in the path,
  • If path has just one pointer - this path is valid,
  • Function pack_path would always return at least one pointer,
  • Cell indices in the path are strictly ordered,
  • Special properties of functions mem, best_skip, back_pointer.

We devoted a separate blog-post to the process of verification of one of stated functions - the function mem.

Future work

For future work, we would like to:

  • Prove lemma back_path_is_unique.
  • Get rid of axioms of equivalence. As it was mentioned above, we had to axiomatize equality of initial functions and our simulations.

We hope that it will be possible to solve this issue with the RecDef library. replacing {struct} parameter by {measure} and convincing Coq that function will stop work when {measure} reached. For measure we can use the value of target_cell_index. This question requires additional research.