We did quite a lot of work at the end of 2021. We present here our main results and progress on the formal verification of the protocol of Tezos. This effort aims to discover hidden bugs and prevent new ones, using mathematical methods and the proof system Coq.
Repr filesโ
There are (at the time of writing) 37 files ending with _repr.ml
in the Tezos protocol. These files describe basic data structures made to be stored in the storage system.
They contain the definition of many data-encodings to define serialization to binary or JSON format. We verified most of these data-encodings, to check that the serialization functions are bijections. We give an example of verification of _repr.ml
file in Receipt_repr.v. We designed an automated procedure to verify data-encodings. We describe this procedure in a previous blog post. For example, from receipt_repr.ml we verify the encoding balance_encoding
, composed of 189 lines of OCaml, fully automatically. The proof is in Receipt_repr.balance_update_encoding_is_valid.
We deduced many invariants on the Tezos data structures from the definitions of the data-encodings. Typical invariants are the positivity of numerical values or the constant length of hashes. We check that these invariants are valid for many primitive functions creating new values.
We verified the serialization functions of kind RPC_arg.v and Path_encoding.v. These functions serialize values for the API calls or the computation of paths in the key-value database of Tezos.
Storage systemโ
We started verifying the storage system storage.ml with specification and proofs in Storage.v. We mainly looked at the storage description Storage_description and Storage_generated. The goal is to show a simulation with standard OCaml data structures, as explained at the beginning of Storage.v. That is only a start for this task, and there is still a lot to do to verify the storage.
Michelson interpreterโ
We translated, using coq-of-ocaml, the current version of the Michelson interpreter to Coq in Script_interpreter.v
. This required adding the corresponding annotations in the OCaml code to translate the pattern-matching on GADTs. You can see these annotations in our branch in script_interpreter.ml. This file contains around 2.000 lines of OCaml code. We started the verification process of the interpreter defining a dependently typed version of the Michelson AST, but this work is still in progress. Having a dependently typed AST will help the verification by avoiding unsafe cast operations which are necessary for the translation of OCaml GADTs.
Integersโ
We formalized the various kinds of OCaml integers (int
, int32
, int64
, and Z.t
) with Coq's integer type Z. For bounded integers, we apply modulo operations on the result of each primitive, showing that the results stay within the expected bounds. We think that translating all the integers to the same type Z
simplifies the proofs by:
- limiting type conversions;
- allowing a direct usage of the automated tactic lia on linear arithmetic.
As an important application, we verify:
- many primitives from the saturation arithmetic library Saturation_repr.v, and
- the correctness of overflow checks, for example in Tez_repr.v.
Totalโ
As a result, we now have around 10.000 lines of Coq proofs. As a reference, there are about 45.000 lines of OCaml in the .ml
files of the protocol. We plan to continue our work, focusing on the Michelson interpreter and the storage system.