Michelson
The main difficulty in verifying the implementation of Michelson is the use of GADTs in OCaml. For the OCaml developers, it gives:
- guaranty that the type-checker and interpreter produce well-typed terms;
- tagless matching on the values.
On our side, this:
- complexifies the maintenance for the translation with
coq-of-ocaml
; - generates a lot of
cast
andcast_exists
dynamic casts in thematch
, preventing the proof work.
As a solution we:
- manually define a dependent version of the Michelson's implementation, doing the translation from GADTs to dependent types by hand;
- formally verify that this dependent translation is equivalent to the OCaml version to make sure it is correct.
We have to do this translation to dependent types by hand as coq-of-ocaml
does not support it. This is also the opportunity to clean a few other things, like writing definitions with verified termination.
Simulations​
The way we proceed is the following:
- We re-define everything in a
Simulations/
folder, prefixing all the new definitions withdep_
for dependent-type (and avoiding name collisions) to manually translate the GADTs of Michelson. For the AST, we use the notion of "type family" to have types parameterized by values rather than other types. - We show that these new definitions are equal to the OCaml definitions with the casts. These equality lemmas are the only place where we talk about these casts. As other proofs, they are in the
Proofs/
folder. - Once this is done, we can reason about the dependently typed code for Michelson.
Architecture of the simulations​
The starting file is Script_family
with the type Ty.t
describing the types of values that can appear in Michelson. With Ty.t
come:
Stack_ty.t
: likeTy.t
but for the stacks, so a list ofTy.t
,Ty_to_set
: the actualSet
implementing the values corresponding to a certainTy.t
in the GADT version of the interpreter withcast
. These are the types of values on which the interpreter computes.- With_family.ty_to_dep_Set: the actual
Set
implementing the values corresponding to a certainTy.t
in the dependent version of the interpreter. This type is almost the same asTy.to_Set
apart from:- the
lambda
type, as in the dependent version we need annotations on the input and output stacks; - the
set
,map
, andbig_map
types, as we avoid using existential types for the implementation of these data structures in the simulation to simplify the proofs; - the
contract
type that is parametrized by its type.
- the
- All the usual types used in the OCaml implementation, parametrized by
Ty.t
instead of a polymorphic type used by a GADT in OCaml. Many of these types are defined in Simulations/Script_typed_ir.v. For each of them, we also provide a conversion function back to the non-dependent version (the reverse direction is impossible in general). For example, there are both:- With_family.ty a type of signature
Ty.t -> Set
corresponding to Script_typed_ir.ty - With_family.to_ty a function of signature
forall {a : Ty.t}, With_family.ty a -> Script_typed_ir.ty
- With_family.ty a type of signature
A tricky thing is the handling of the Script_comparable.dep_compare_comparable function over Michelson values. We need this function to define the dependent version of set
, map
, and big_map
types using the corresponding _Set.Make
and Map.Make
functors from OCaml. What we do:
- we define this comparison function over
Ty.to_Set
, because at this pointWith_family.ty_to_dep_Set
is not yet defined since it depends on the sets and maps types; - ideally, it should be defined over
With_family.ty_to_dep_Set
but we use the fact that the two coincide over comparable types; - we use the two functions With_family.to_value and With_family.of_value to go back and forth from
With_family.ty_to_dep_Set
toTy.to_Set
as the key values (on which the comparison function runs) do not live in the same space as the other values. The functionWith_family.of_value
is partial as it is only complete on comparable types, but this is enough for the keys. Compatibility lemmas betweento_value
andof_value
are given in Proofs/Script_typed_ir.v.
The function With_family.to_value
is important to go from values of the dependent interpreter to the GADT interpreter.
Simulation proof​
The main technique to write simulations is to:
- look at the OCaml code;
- look at the translated Coq code;
- look at other examples.
The main goal is to remove axioms, in particular:
cast
/cast_exists
unreachable_gadt_branch
- non-termination
assert
or other side-effects
We take an example to see how to define the simulations and simulation proofs with unparse_comparable_data
in script_ir_translator.ml
. In OCaml the code is:
let[@coq_struct "ty"] rec unparse_comparable_data :
type a loc.
loc:loc ->
context ->
unparsing_mode ->
a comparable_ty ->
a ->
(loc Script.michelson_node * context) tzresult Lwt.t =
fun ~loc ctxt mode ty a ->
In the translated Coq with GADTs:
#[bypass_check(guard)]
Fixpoint unparse_comparable_data {loc a : Set}
(loc_value : loc) (ctxt : Alpha_context.context) (mode : unparsing_mode)
(ty : Script_typed_ir.comparable_ty) (a_value : a) {struct ty}
: M? (Alpha_context.Script.michelson_node loc × Alpha_context.context) :=
We write the simulation as follows:
Fixpoint dep_unparse_comparable_data {loc : Set} {a}
(loc_value : loc) (ctxt : Alpha_context.context) (mode : unparsing_mode)
(ty : With_family.ty a) (a_value : With_family.ty_to_dep_Set a) {struct ty}
: M? (Alpha_context.Script.michelson_node loc × Alpha_context.context) :=
We lost the fact that the type should be comparable, but we will get that property back in the proofs as a pre-condition. Note how we translated:
- types parametrized by
a
, - the type
a
alone.
We also used this simulation to have a definition that terminates without disabling the termination checker with #[bypass_check(guard)]
. The lemma to verify this simulation is stated as follows:
Fixpoint dep_unparse_comparable_data_eq {a} {loc : Set}
(l_value : loc) (ctxt : Alpha_context.context) (mode : unparsing_mode)
(ty : With_family.ty a) (value : With_family.ty_to_dep_Set a) {struct ty} :
With_family.is_Comparable ty →
dep_unparse_comparable_data l_value ctxt mode ty value =
unparse_comparable_data l_value ctxt mode
(With_family.to_ty ty) (With_family.to_value value).
Proof.
(* ... *)
Note how we use With_family.to_ty
and With_family.to_value
to convert elements from the dependent world to the GADT world.
Eliminate cast
and cast_exists
​
The Coq translation of code using a lot of GADTs uses a lot of cast
and cast_exists
axioms. They are defined in the Basics.v file of coq-of-ocaml
. These axioms make dynamic casts in Coq and are necessary as we remove the type parameters of the GADTs during the translation (without these casts, the match
operations in Coq would not have enough information to type-check). When doing the proofs, we need to show that these axioms are actually like the identity. Generally, to eliminate the casts in a function f
we proceed in two steps:
- defining a function
dep_f
in theSimulations/
folder, using dependent types and no casts; - showing the two are equal, with a proof in the
Proofs/
folder.
All the cast
can be eliminated with a rewrite cast_eval
, unless there is an error in the user's dependent definitions. The cast_exists
is harder, as we need to give values for the existential variables, and the unification of Coq sometimes needs help to find them. There are a few tactics to help:
rewrite_cast_exists_eval_eq
fromcoq-of-ocaml
but it does not work very well;rewrite_cast_exists
from /docs/environment/v7 that works better (recommended approach);- sometimes we need to do a
match goal
by hand. Here is an example ofmatch goal
that was useful:
match goal with
| _ : ?t |- context[cast_exists ?T _] =>
rewrite (cast_exists_eval_1 (T := T) (E1 := t))
end
The axiom cast_exists_eval_1
is defined in /docs/environment/v7 too.