Skip to main content

· 4 min read
Tait van Strien

Tezos uses the json-data-encoding library throughout the protocol to serialize data for use in RPC, writing to disk or placing in a block. RPC is a synchronous operation. Consequently, it is paramount to verify the correctness of code as well as the logical equivalence and semantics of performance-enhanced procedures to proven ones.

This library has been subject to stack overflows, as seen in issues like this. We will translate and verify this library to validate its logic and simplify debugging these issues.

· 5 min read

After each new release of the protocol there comes a need to validate it. One of the options would be to start from scratch, but fortunately there is no need for that since many parts of the protocol are only slightly changed or stay the same. That allows us to reuse old proofs in consequent versions and speed up the validation.

Of course there is a small number of proofs which stop working due to the changes and that has to be repaired. This process can sometimes be tricky since it requires an understanding of how different functions and proofs depend on each other, but that makes it very interesting. In this post, we will show examples of broken proofs, discuss reasons for their failure, and show techniques we used to fix issues.

· 7 min read
Guillaume Claret

Property based tests are a way to find bugs in the code by running a boolean function (representing an expected property) on randomly generated inputs. In this blog post, we show how we translate these tests to corresponding Coq theorems. Once we have a formal representation, we prove them correct on any inputs. Thanks to this technique, we can:

  • have a bridge between the specification given by the tests and the specification in Coq;
  • make sure that the properties verified in the tests are true with a mathematical degree of certainty.

We have done this work with students from the University of Groningen:

· 5 min read
Guillaume Claret

The backward compatibility is an important property to maintain between two releases of the protocol of Tezos. For example, this ensures that existing smart contracts still work after a protocol upgrade. That property can be hard to ensure by the protocol developers as the codebase's size increases. At the time of writing, there are 86,346 lines of code in src/proto_alpha/lib_protocol in the master branch of Tezos, counting all the .ml and .mli files.

We propose using formal verification to check that the backward compatibility is preserved after each protocol release, with a mathematical degree of certainty. More precisely, we show that for every inputs and storage contexts, a smart contract working with version Jakarta of Tezos will generate the exact same output running on the next version K. Even if there are infinitely many possible inputs, we can verify all of them by writing a mathematical proof in the formal system Coq.

In this blog post, we describe our plan for this verification effort.

· 8 min read
Natalie Klaus

OCaml developers at Tezos often use fold_left function in the code. Maybe the reason is that it is a bit easier to write efficient code using fold_left than fold_right.

fold_left and fold_right from OCaml library: List

val fold_left : ('a -> 'b -> 'a) -> 'a -> 'b list -> 'a

fold_left f init [b1; ...; bn] is f (... (f (f init b1) b2) ...) bn.

val fold_right : ('a -> 'b -> 'b) -> 'a list -> 'b -> 'b

fold_right f [a1; ...; an] init is f a1 (f a2 (... (f an init) ...)).
Not tail-recursive.

And from formal verification point of view, it is much easier to prove statements about fold_right either than about fold_left. Why? The reason is in how these functions are organized. Formal verification often uses mathematical induction as an instrument for proving statements on the lists.

· 4 min read
Natasha Klaus

As you may already know from the article Simulations - dependently-typed version, in the Proto_alpha folder we keep code, generated by the coq-of-ocaml. Many of the fixpoints here are preceded by the annotation #[bypass_check(guard)], which means that guard checking is locally disabled. And this means that we can not consider these functions to be total. In such cases, proof engineers usually feel irritation, because "total" is one of their favorite words. We know that total functions never crush and they always return a well-typed result within a finite time.


Thanks to Alan Turing we know that the halting problem - the difficulty of determining whether a specific program terminates or not - is undecidable. Coq can't determine if a function is total in general, according to the original definition of totality. Instead, it analyzes a function's syntax and makes a conservative approximation.

Our current goal is that the doubles defined in the folder Simulations, whose equivalence with their prototypes from Proto_alpha should be proved, would be total (have enabled Coqs guard checking on fixpoint). We already mentioned the Guard Checking question here.

Coq is analyzing the function's syntax to consider it to be total. One way to convince Coq that a function is total is to show that there is a decreasing argument that converges to the base case.

Let's consider the function parse_ty_aux from Proto_alpha and its clone dep_parse_ty_aux from the Simulations folder. parse_ty_aux is having {struct node_value} annotation. The point of the {struct ident} annotation is to tell the Coq, which argument decreases along the recursive calls. node_value : Alpha_context.Script.node is designed in such a way that it is not trivial to show that the function converges. Coq can not detect that each and every recursion call is done on the direct subterm of a given node_value argument (and it is indeed actually not the case).

· 7 min read
Guillaume Claret

In OCaml, we create a compare function for many data structures to have:

  • an ordering, and:
  • an equality function.

These compare functions should behave as follows:

  • compare x y returns -1 when x is "lesser" than y;
  • compare x y returns 0 when x is "equal" to y;
  • compare x y returns 1 when x is "greater" than y.

The compare functions are useful to implement many algorithms and derive data structures such as maps and sets. Having an issue in the consistency of such functions can lead to bugs in the code using them. An example of inconsistency is to have both:

  • compare x y = 1 and:
  • compare y x = 1

for some values of x and y. A complex example of compare function in the code of Tezos is located in the Script_comparable.v file to compare arbitrary Michelson values.

In this blog post we will see how we specify the compare functions and verify them.

· 6 min read
Natasha Klaus

During our work on verification of Tezos source code, we faced a problem: some proofs were taking too long time to execute. For example, function which proves equality of parse_ty_aux and dep_parse_ty_aux was executing about 5 minutes, according to CI (Not acceptable! We couldn't even merge this proof!).

For the information why we actually needed to prove the equality of parse_ty_aux and dep_parse_ty_aux, and why we had to create a function-simulation dep_parse_ty_aux, please check this article: Simulations - dependently-typed version

What is the reason of such a long execution period? First of all the length of original parse_ty_aux is about 500 lines of code. So the term itself is rather big, and each step of its reduction is a time-taking procedure.

Many other functions that also need to be verified are even longer, for example the Fixpoint parse_data_aux is having 4932 lines of code. Therefore, it is very important that the proofs for such a long functions are optimized.

· 5 min read
Natasha Klaus

During our work on verification of Tezos source code, we run into an interesting question: how to implement OCaml's GADTs in Coq? The essence of this problem, as well as its solution, were described in the article: Translating the Michelson interpreter where we show how we introduce casts in the generated Coq code.

In this article, we will consider subsequent work in this direction. To reach one of our main goals, proof of the compatibility of the parsing and unparsing functions for the Michelson types and values, we need to avoid casts. A powerful Coq's instrument, the dependent types, will help us to solve the casts problem. We create a kind of clone of coq-of-ocaml's code, which is a dependently typed version.

The whole process can be structured as follows:

  • the Simulations folder (along with its purpose and content),
  • definition of dependent types,
  • definition of functions with dependent types (analogs of functions with casts),
  • proof of the equivalence between functions with dependent types and functions with casts,
  • proof of assertions and properties for functions with dependent types.

and this is how it is all organized in the repository:

  • we keep files, generated by coq-of-ocaml in folder Proto_alpha
  • we keep dependently-defined types and functions in folder Simulations
  • we keep our proofs in folder Proofs

· 4 min read
Natasha Klaus

For the Proof Engineer, the formalization of code in Coq is like an amazing adventure. But sometimes it can be tedious and monotonous due to the same repetitive actions that programmers should perform. Fortunately, Coq provides us with proof automation which helps us significantly increase our productivity.

In this post, we will show how we use/write tactics in our work on Tezos code formalization. In particular, we will touch the following topics:

  • presentation of some features of Ltac (the Coq's tactic language),
  • recursive proof search,
  • built-in Coq tactics