Skip to main content

Michelson

Gitlab

The main difficulty in verifying the implementation of Michelson is the use of GADTs in OCaml. For the OCaml developers, it gives:

  • guaranty that the type-checker and interpreter produce well-typed terms;
  • tagless matching on the values.

On our side, this:

  • complexifies the maintenance for the translation with coq-of-ocaml;
  • generates a lot of cast and cast_exists dynamic casts in the match, preventing the proof work.

As a solution we:

  • manually define a dependent version of the Michelson's implementation, doing the translation from GADTs to dependent types by hand;
  • formally verify that this dependent translation is equivalent to the OCaml version to make sure it is correct.

We have to do this translation to dependent types by hand as coq-of-ocaml does not support it. This is also the opportunity to clean a few other things, like writing definitions with verified termination.

Simulations

The way we proceed is the following:

  1. We re-define everything in a Simulations/ folder, prefixing all the new definitions with dep_ for dependent-type (and avoiding name collisions) to manually translate the GADTs of Michelson. For the AST, we use the notion of "type family" to have types parameterized by values rather than other types.
  2. We show that these new definitions are equal to the OCaml definitions with the casts. These equality lemmas are the only place where we talk about these casts. As other proofs, they are in the Proofs/ folder.
  3. Once this is done, we can reason about the dependently typed code for Michelson.

Architecture of the simulations

The starting file is Script_family with the type Ty.t describing the types of values that can appear in Michelson. With Ty.t come:

  • Stack_ty.t: like Ty.t but for the stacks, so a list of Ty.t,
  • Ty_to_set : the actual Set implementing the values corresponding to a certain Ty.t in the GADT version of the interpreter with cast. These are the types of values on which the interpreter computes.
  • With_family.ty_to_dep_Set: the actual Set implementing the values corresponding to a certain Ty.t in the dependent version of the interpreter. This type is almost the same as Ty.to_Set apart from:
    • the lambda type, as in the dependent version we need annotations on the input and output stacks;
    • the set, map, and big_map types, as we avoid using existential types for the implementation of these data structures in the simulation to simplify the proofs;
    • the contract type that is parametrized by its type.
  • All the usual types used in the OCaml implementation, parametrized by Ty.t instead of a polymorphic type used by a GADT in OCaml. Many of these types are defined in Simulations/Script_typed_ir.v. For each of them, we also provide a conversion function back to the non-dependent version (the reverse direction is impossible in general). For example, there are both:

A tricky thing is the handling of the Script_comparable.dep_compare_comparable function over Michelson values. We need this function to define the dependent version of set, map, and big_map types using the corresponding _Set.Make and Map.Make functors from OCaml. What we do:

  • we define this comparison function over Ty.to_Set, because at this point With_family.ty_to_dep_Set is not yet defined since it depends on the sets and maps types;
  • ideally, it should be defined over With_family.ty_to_dep_Set but we use the fact that the two coincide over comparable types;
  • we use the two functions With_family.to_value and With_family.of_value to go back and forth from With_family.ty_to_dep_Set to Ty.to_Set as the key values (on which the comparison function runs) do not live in the same space as the other values. The function With_family.of_value is partial as it is only complete on comparable types, but this is enough for the keys. Compatibility lemmas between to_value and of_value are given in Proofs/Script_typed_ir.v.

The function With_family.to_value is important to go from values of the dependent interpreter to the GADT interpreter.

Simulation proof

The main technique to write simulations is to:

  • look at the OCaml code;
  • look at the translated Coq code;
  • look at other examples.

The main goal is to remove axioms, in particular:

  • cast/cast_exists
  • unreachable_gadt_branch
  • non-termination
  • assert or other side-effects

We take an example to see how to define the simulations and simulation proofs with unparse_comparable_data in script_ir_translator.ml. In OCaml the code is:

let[@coq_struct "ty"] rec unparse_comparable_data :
type a loc.
loc:loc ->
context ->
unparsing_mode ->
a comparable_ty ->
a ->
(loc Script.michelson_node * context) tzresult Lwt.t =
fun ~loc ctxt mode ty a ->

In the translated Coq with GADTs:

#[bypass_check(guard)]
Fixpoint unparse_comparable_data {loc a : Set}
(loc_value : loc) (ctxt : Alpha_context.context) (mode : unparsing_mode)
(ty : Script_typed_ir.comparable_ty) (a_value : a) {struct ty}
: M? (Alpha_context.Script.michelson_node loc × Alpha_context.context) :=

We write the simulation as follows:

Fixpoint dep_unparse_comparable_data {loc : Set} {a}
(loc_value : loc) (ctxt : Alpha_context.context) (mode : unparsing_mode)
(ty : With_family.ty a) (a_value : With_family.ty_to_dep_Set a) {struct ty}
: M? (Alpha_context.Script.michelson_node loc × Alpha_context.context) :=

We lost the fact that the type should be comparable, but we will get that property back in the proofs as a pre-condition. Note how we translated:

  • types parametrized by a,
  • the type a alone.

We also used this simulation to have a definition that terminates without disabling the termination checker with #[bypass_check(guard)]. The lemma to verify this simulation is stated as follows:

Fixpoint dep_unparse_comparable_data_eq {a} {loc : Set}
(l_value : loc) (ctxt : Alpha_context.context) (mode : unparsing_mode)
(ty : With_family.ty a) (value : With_family.ty_to_dep_Set a) {struct ty} :
With_family.is_Comparable ty →
dep_unparse_comparable_data l_value ctxt mode ty value =
unparse_comparable_data l_value ctxt mode
(With_family.to_ty ty) (With_family.to_value value).
Proof.
(* ... *)

Note how we use With_family.to_ty and With_family.to_value to convert elements from the dependent world to the GADT world.

Eliminate cast and cast_exists

The Coq translation of code using a lot of GADTs uses a lot of cast and cast_exists axioms. They are defined in the Basics.v file of coq-of-ocaml. These axioms make dynamic casts in Coq and are necessary as we remove the type parameters of the GADTs during the translation (without these casts, the match operations in Coq would not have enough information to type-check). When doing the proofs, we need to show that these axioms are actually like the identity. Generally, to eliminate the casts in a function f we proceed in two steps:

  1. defining a function dep_f in the Simulations/ folder, using dependent types and no casts;
  2. showing the two are equal, with a proof in the Proofs/ folder.

All the cast can be eliminated with a rewrite cast_eval, unless there is an error in the user's dependent definitions. The cast_exists is harder, as we need to give values for the existential variables, and the unification of Coq sometimes needs help to find them. There are a few tactics to help:

  • rewrite_cast_exists_eval_eq from coq-of-ocaml but it does not work very well;
  • rewrite_cast_exists from /docs/environment/v7 that works better (recommended approach);
  • sometimes we need to do a match goal by hand. Here is an example of match goal that was useful:
match goal with
| _ : ?t |- context[cast_exists ?T _] =>
rewrite (cast_exists_eval_1 (T := T) (E1 := t))
end

The axiom cast_exists_eval_1 is defined in /docs/environment/v7 too.