Skip to main content

Verifying a tail recursive definition

ยท 5 min read
Guillaume Claret

An issue with recursive functions is that they may raise a stack-overflow error if the number of recursive calls is too high. To avoid these errors, many functions in Tezos are written in tail-recursive style. This style is usually harder to read and this could lead to mistakes in the code. We show in this post how we verify a tail-recursive function by comparing it to its naive non tail-recursive version.

What to verifyโ€‹

In the protocol file script_repr.ml we define the following function to compute the size of a Michelson expression:

let rec micheline_fold_aux node f acc k =
match node with
| Micheline.Int (_, _) -> k (f acc node)
| Micheline.String (_, _) -> k (f acc node)
| Micheline.Bytes (_, _) -> k (f acc node)
| Micheline.Prim (_, _, subterms, _) ->
micheline_fold_nodes subterms f (f acc node) k
| Micheline.Seq (_, subterms) ->
micheline_fold_nodes subterms f (f acc node) k

and[@coq_mutual_as_notation] [@coq_struct "subterms"] micheline_fold_nodes
subterms f acc k =
match subterms with
| [] -> k acc
| node :: nodes ->
micheline_fold_nodes nodes f acc @@ fun acc ->
micheline_fold_aux node f acc k

let micheline_fold node init f = micheline_fold_aux node f init (fun x -> x)

let micheline_nodes node = micheline_fold node 0 @@ fun n _ -> n + 1

We first define a generic micheline_fold fold function over Michelson nodes, using a continuation-passing style. The continuation is the function k, representing what remains to be done. We then define the micheline_nodes function to compute the number of nodes per Michelson expression.

We would like to show that this definition is equivalent to the following Coq code, where the recursion is written in a more direct style:

Module Reference_micheline_nodes.
Reserved Notation "'micheline_nodes_list".

Fixpoint micheline_nodes (node : Script_repr.node) : int :=
let micheline_nodes_list := 'micheline_nodes_list in
match node with
| Micheline.Int _ _ | Micheline.String _ _ | Micheline.Bytes _ _ => 1
| Micheline.Prim _ _ subterms _ | Micheline.Seq _ subterms =>
micheline_nodes_list subterms +i 1
end

where "'micheline_nodes_list" :=
(fix micheline_nodes_list
(subterms : list Script_repr.node) {struct subterms} : int :=
match subterms with
| [] => 0
| cons n nodes => micheline_nodes n +i micheline_nodes_list nodes
end).

Definition micheline_nodes_list := 'micheline_nodes_list.
End Reference_micheline_nodes.

The translationโ€‹

We use the @coq_mutual_as_notation annotation in OCaml to translate the mutually recursive definition to Coq. With this attribute, we generate a recursive definition inside another recursive definition, instead of a mutually recursive definition. This is simpler to reason about than a mutually recursive definition. We use a notation to make the two recursive definitions available at top-level.

Here is the generated Coq translation, available in src/Proto_alpha/Script_repr.v:

Reserved Notation "'micheline_fold_nodes".

Fixpoint micheline_fold_aux {A B : Set}
(node_value : node) (f : A -> node -> A) (acc_value : A) (k : A -> B) : B :=
let micheline_fold_nodes {A B} := 'micheline_fold_nodes A B in
match node_value with
| Micheline.Int _ _ => k (f acc_value node_value)
| Micheline.String _ _ => k (f acc_value node_value)
| Micheline.Bytes _ _ => k (f acc_value node_value)
| Micheline.Prim _ _ subterms _ =>
micheline_fold_nodes subterms f (f acc_value node_value) k
| Micheline.Seq _ subterms =>
micheline_fold_nodes subterms f (f acc_value node_value) k
end

where "'micheline_fold_nodes" :=
(fun (A B : Set) => fix micheline_fold_nodes
(subterms : list (Micheline.node location Michelson_v1_primitives.prim))
(f : A -> node -> A) (acc_value : A) (k : A -> B) {struct subterms} : B :=
match subterms with
| [] => k acc_value
| cons node_value nodes =>
micheline_fold_nodes nodes f acc_value
(fun (acc_value : A) => micheline_fold_aux node_value f acc_value k)
end).

Definition micheline_fold_nodes {A B : Set} := 'micheline_fold_nodes A B.

Definition micheline_fold {A : Set}
(node_value : node) (init_value : A) (f : A -> node -> A) : A :=
micheline_fold_aux node_value f init_value (fun (x : A) => x).

Definition micheline_nodes (node_value : node) : int :=
micheline_fold node_value 0
(fun (n : int) =>
fun (function_parameter : node) =>
let '_ := function_parameter in
n +i 1).

The proofโ€‹

We verify that the definition in OCaml is equivalent to the simplified reference definition in Coq with this lemma, verified in src/Proto_alpha/Proofs/Script_repr.v:

Lemma micheline_nodes_like_reference (node : Script_repr.node)
: Script_repr.micheline_nodes node =
Reference_micheline_nodes.micheline_nodes node.

To verify this lemma, we first assert that calling micheline_fold_aux with a continuation k is like calling the continuation k on micheline_fold_aux:

Fixpoint micheline_fold_aux_continuation {A : Set}
(node : Script_repr.node) (f : A -> Script_repr.node -> A) {struct node}
: forall (acc_value : A) {B : Set} (k : A -> B),
Script_repr.micheline_fold_aux node f acc_value k =
k (Script_repr.micheline_fold_aux node f acc_value id).

Then we verify the use of the function micheline_fold_aux for our particular case:

Fixpoint micheline_nodes_like_reference_aux (node : Script_repr.node) acc
: Script_repr.micheline_fold_aux node (fun acc '_ => acc +i 1) acc id =
Reference_micheline_nodes.micheline_nodes node +i acc.

From there we conclude our proof. Overall, all the proofs we have written there are quite straightforward. We primarily use symbolic evaluation and induction. We also use the automated lia tactic for arithmetic questions related to the addition +i.