Verifying the implementation of the interpreter of Michelson smart contracts is of importance because one of the strengths of Tezos is to be able to formally verify the smart contracts. In order to make this verification sound, we need to show that the smart contracts verification framework ๐ฌ Mi-Cho-Coq is coherent with the actual implementation of Michelson. In this blog post, we show how we translated the Michelson interpreter from the OCaml language to Coq. This is the first step to show the coherence of Mi-Cho-Coq with the implementation.

The main file of the interpreter is script_interpreter.ml (around 1,800 lines). The abstract syntax tree of smart contracts is given in script_typed_ir.mli (around 1,600 lines). We translate the OCaml code to Coq using coq-of-ocaml. A difficulty in translating the interpreter is that it heavily relies on GADTs in OCaml, a feature that does not exist in Coq. We show how by adding code annotations and cast axioms we can do this translation.

The generated Coq code for the interpreter is in Script_interpreter.v.

## Why are GADTs hard to translate to Coqโ

Here is an example of GADT in OCaml, to have a tag-less representation of values:

`type _ ty =`

| Bool : bool ty

| Int : int ty

let sum (type a) (t : a ty) (x1 : a) (x2 : a) : a =

match t with

| Bool -> x1 || x2

| Int -> x1 + x2

The values `x1`

and `x2`

are tag-less (without the need to store a constructor in memory to differentiate between `bool`

and `int`

), but we can still pattern-match on them using the `_ ty`

typing information. The interpreter of Michelson uses this technique to have an optimized representation of values. If we naively translate this interpreter to Coq, we get:

`Inductive ty : Set -> Set :=`

| Bool : ty bool

| Int : ty int.

Definition sum {a : Set} (t_value : ty a) (x1 x2 : a) : a :=

match t_value with

| Bool => orb x1 x2

| Int => add x1 x2

end.

This does not compile with the error:

`In environment`

a : Set

t_value : ty a

x1 : a

x2 : a

The term "x1" has type "a" while it is expected to have type "bool".

A technique to go around this issue is to use the convoy pattern:

`Definition sum {a : Set} (t_value : ty a) : a -> a -> a :=`

match t_value with

| Bool => fun x1 x2 => orb x1 x2

| Int => fun x1 x2 => add x1 x2

end.

This works by getting the values `x1`

and `x2`

after the `match`

, once we know the type `a`

. We already see a non-trivial code change to make to do the translation to Coq (what happens if the variables `x1`

and `x2`

are also used before the `match`

, for example?). But even with such techniques, there are some programs that cannot be translated. For example, with a specialized version:

`let sum_only_int (t : int ty) (x1 : int) (x2 : int) : int =`

match t with

| Int -> x1 + x2

it will be very hard to tell Coq that the branch for `Bool`

is impossible, as this would require reasoning about equalities between types. Unless we use supplementary axioms, it is impossible to distinguish between types with the same cardinality in Coq.

##### note

With GADTs the OCaml compiler reasons by **unification** over the **types**. In Coq, with dependent types, we reason by **computing** over **values**. We believe this is why GADTs do not have a simple direct translation in dependently typed systems.

## How do we translate GADTsโ

With our tool `coq-of-ocaml`

, we choose to translate GADTs to Coq by:

- removing all the type parameters of the GADTs (these are there for decoration and do not impact the runtime behavior of the code);
- adding dynamic casts in Coq when needed, following user annotations in the OCaml source.

Here is how we annotate our `sum`

function:

`let sum (type a) (t : a ty) (x1 : a) (x2 : a) : a =`

match[@coq_match_gadt_with_result] t, x1, x2 with

| Bool, (x1 : bool), (x2 : bool) -> x1 || x2

| Int, (x1 : int), (x2 : int) -> x1 + x2

We add the @coq_match_gadt_with_result attribute on the `match`

, and add the two variables `x1`

and `x2`

in the `match`

to annotate them with their type for each matching branch. These type annotations are verified by the OCaml compiler. Then `coq-of-ocaml`

generates the following Coq code which compiles:

`Definition sum {a : Set} (t_value : ty) (x1 : a) (x2 : a) : a :=`

match (t_value, x1, x2) with

| (Bool, x1, x2) =>

let '[x2, x1] := cast [bool ** bool] [x2, x1] in

cast a (orb x1 x2)

| (Int, x1, x2) =>

let '[x2, x1] := cast [int ** int] [x2, x1] in

cast a (add x1 x2)

end.

We see that we cast all the values at the entrance and exit of each match branch. We define the `cast`

primitive as an axiom:

`Axiom cast : forall {A : Set} (B : Set), A -> B.`

together with the property that it behaves as the identity function when we know that the two types are equal:

`Axiom cast_eval : forall {A : Set} {x : A}, cast A x = x.`

We think that these axioms are reasonable as they are verified by the OCaml compiler during type-checking. While doing the proofs, this is up to the person writing the proof to show that types are equal in order to eliminate the `cast`

axioms. For impossible branches, we add an attribute `@`

:

`let sum_only_int (t : int ty) (x1 : int) (x2 : int) : int =`

match[@coq_match_with_default] t with

| Int -> x1 + x2

generating the translated code:

`Definition sum_only_int (t_value : ty) (x1 : int) (x2 : int) : int :=`

match t_value with

| Int => add x1 x2

| _ => unreachable_gadt_branch

end.

where `unreachable_gadt_branch`

is also an axiom, and should never be accessible given the right pre-conditions over `sum_only_int`

.

## Which changes did we make in the Tezos codeโ

We did the changes and annotations which we just described on these small examples on the Tezos code. Our changes are in a separated branch guillaume-claret@proto_alpha-coq-of-ocaml which we maintain separately in order not to impact the performance of the production code. We regularly rebase our changes on the `master`

branch in order to stay up to date with respect to the last changes.

Most of our changes were on the interpreter itself and a few ones on auxiliary functions, like the functions computing the gas cost of instructions in script_interpreter_defs.ml. The final Coq code for the interpreter is visible in Script_interpreter.v.

## Next stepsโ

Our next steps are to:

- Show that the
`cast`

axioms can be removed by writing a proven equal definition of the interpreter using dependent types. This will also simplify further reasoning on the interpreter. - Show various small properties about the interpreter (like that the gas cost of the instructions is always positive), and define the invariants about the Michelson AST.
- Show that the implementation of Michelson is coherent with Mi-Cho-Coq. We will do a proof by simulation for that.