Skip to main content

Translating the Michelson interpreter

ยท 7 min read
Guillaume Claret

Verifying the implementation of the interpreter of Michelson smart contracts is of importance because one of the strengths of Tezos is to be able to formally verify the smart contracts. In order to make this verification sound, we need to show that the smart contracts verification framework ๐Ÿฌ Mi-Cho-Coq is coherent with the actual implementation of Michelson. In this blog post, we show how we translated the Michelson interpreter from the OCaml language to Coq. This is the first step to show the coherence of Mi-Cho-Coq with the implementation.

The main file of the interpreter is script_interpreter.ml (around 1,800 lines). The abstract syntax tree of smart contracts is given in script_typed_ir.mli (around 1,600 lines). We translate the OCaml code to Coq using coq-of-ocaml. A difficulty in translating the interpreter is that it heavily relies on GADTs in OCaml, a feature that does not exist in Coq. We show how by adding code annotations and cast axioms we can do this translation.

The generated Coq code for the interpreter is in Script_interpreter.v.

Why are GADTs hard to translate to Coqโ€‹

Here is an example of GADT in OCaml, to have a tag-less representation of values:

type _ ty =
| Bool : bool ty
| Int : int ty

let sum (type a) (t : a ty) (x1 : a) (x2 : a) : a =
match t with
| Bool -> x1 || x2
| Int -> x1 + x2

The values x1 and x2 are tag-less (without the need to store a constructor in memory to differentiate between bool and int), but we can still pattern-match on them using the _ ty typing information. The interpreter of Michelson uses this technique to have an optimized representation of values. If we naively translate this interpreter to Coq, we get:

Inductive ty : Set -> Set :=
| Bool : ty bool
| Int : ty int.

Definition sum {a : Set} (t_value : ty a) (x1 x2 : a) : a :=
match t_value with
| Bool => orb x1 x2
| Int => add x1 x2
end.

This does not compile with the error:

In environment
a : Set
t_value : ty a
x1 : a
x2 : a
The term "x1" has type "a" while it is expected to have type "bool".

A technique to go around this issue is to use the convoy pattern:

Definition sum {a : Set} (t_value : ty a) : a -> a -> a :=
match t_value with
| Bool => fun x1 x2 => orb x1 x2
| Int => fun x1 x2 => add x1 x2
end.

This works by getting the values x1 and x2 after the match, once we know the type a. We already see a non-trivial code change to make to do the translation to Coq (what happens if the variables x1 and x2 are also used before the match, for example?). But even with such techniques, there are some programs that cannot be translated. For example, with a specialized version:

let sum_only_int (t : int ty) (x1 : int) (x2 : int) : int =
match t with
| Int -> x1 + x2

it will be very hard to tell Coq that the branch for Bool is impossible, as this would require reasoning about equalities between types. Unless we use supplementary axioms, it is impossible to distinguish between types with the same cardinality in Coq.

note

With GADTs the OCaml compiler reasons by unification over the types. In Coq, with dependent types, we reason by computing over values. We believe this is why GADTs do not have a simple direct translation in dependently typed systems.

How do we translate GADTsโ€‹

With our tool coq-of-ocaml, we choose to translate GADTs to Coq by:

  • removing all the type parameters of the GADTs (these are there for decoration and do not impact the runtime behavior of the code);
  • adding dynamic casts in Coq when needed, following user annotations in the OCaml source.

Here is how we annotate our sum function:

let sum (type a) (t : a ty) (x1 : a) (x2 : a) : a =
match[@coq_match_gadt_with_result] t, x1, x2 with
| Bool, (x1 : bool), (x2 : bool) -> x1 || x2
| Int, (x1 : int), (x2 : int) -> x1 + x2

We add the @coq_match_gadt_with_result attribute on the match, and add the two variables x1 and x2 in the match to annotate them with their type for each matching branch. These type annotations are verified by the OCaml compiler. Then coq-of-ocaml generates the following Coq code which compiles:

Definition sum {a : Set} (t_value : ty) (x1 : a) (x2 : a) : a :=
match (t_value, x1, x2) with
| (Bool, x1, x2) =>
let '[x2, x1] := cast [bool ** bool] [x2, x1] in
cast a (orb x1 x2)
| (Int, x1, x2) =>
let '[x2, x1] := cast [int ** int] [x2, x1] in
cast a (add x1 x2)
end.

We see that we cast all the values at the entrance and exit of each match branch. We define the cast primitive as an axiom:

Axiom cast : forall {A : Set} (B : Set), A -> B.

together with the property that it behaves as the identity function when we know that the two types are equal:

Axiom cast_eval : forall {A : Set} {x : A}, cast A x = x.

We think that these axioms are reasonable as they are verified by the OCaml compiler during type-checking. While doing the proofs, this is up to the person writing the proof to show that types are equal in order to eliminate the cast axioms. For impossible branches, we add an attribute @:

let sum_only_int (t : int ty) (x1 : int) (x2 : int) : int =
match[@coq_match_with_default] t with
| Int -> x1 + x2

generating the translated code:

Definition sum_only_int (t_value : ty) (x1 : int) (x2 : int) : int :=
match t_value with
| Int => add x1 x2
| _ => unreachable_gadt_branch
end.

where unreachable_gadt_branch is also an axiom, and should never be accessible given the right pre-conditions over sum_only_int.

Which changes did we make in the Tezos codeโ€‹

We did the changes and annotations which we just described on these small examples on the Tezos code. Our changes are in a separated branch guillaume-claret@proto_alpha-coq-of-ocaml which we maintain separately in order not to impact the performance of the production code. We regularly rebase our changes on the master branch in order to stay up to date with respect to the last changes.

Most of our changes were on the interpreter itself and a few ones on auxiliary functions, like the functions computing the gas cost of instructions in script_interpreter_defs.ml. The final Coq code for the interpreter is visible in Script_interpreter.v.

Next stepsโ€‹

Our next steps are to:

  1. Show that the cast axioms can be removed by writing a proven equal definition of the interpreter using dependent types. This will also simplify further reasoning on the interpreter.
  2. Show various small properties about the interpreter (like that the gas cost of the instructions is always positive), and define the invariants about the Michelson AST.
  3. Show that the implementation of Michelson is coherent with Mi-Cho-Coq. We will do a proof by simulation for that.