In this blog post, I propose to make you discover how we deal with internal errors
and show that they do not occur while running the protocol! I will first give you
a glimpse of the so-called error monad (but without going into the details of
the notion of monad ๐). You actually don't need to have any knowledge of monads in computer programming or elsewhere to read this article: we will take them step by step. But
if you are already familiar on them, you can skim through the first parts of this post and have a look at the monadic let ... in
s we use and how we articulate them with validity predicates. We conclude by giving a first glimpse at how we use these validity predicates to manually specify information which is lost during the automatic translation
from OCaml to Coq.
Internal Errorsโ
What is an internal error?โ
Let us present incrementally our methodology to check that internal errors do not occur. But what is an internal error? Well, internal errors are those that should never occur whatsoever. They may be put in regard to user errors, e.g., when a user produces a bad (for instance, ill-typed) block intentionally or not:
- First, we select a list of internal errors. We have hitherto mostly focused on Tezos's "internal error" and storage errors. What should be considered as an internal error (vs. a user error) is a matter of consensus between Tezos developers and this list may be updated.
- Then, we verify the protocol one function at a time. We shall refine this
later, but let's say for now that, for each function
f
, we want to state and prove a theorem of the form:If the inputs
x1
, ...,xn
off
satisfy some pre-conditionsP1
,...,Pn
, then:- if
f
succeeds, then the outputf x1 ... xn
satisfies some post-conditionP
. - if
f
fails (i.e., raises errors), then it didn't raise any internal error.
- if
Of course, pre-conditions are pivotal to prove such a statement, and must be
specified carefully (more on this below). Checking a post-condition ensures
compositionality between our various validity lemmas, in the sense that the
output of a function f
may be taken as an input of another function g
.
The example below will illustrate all of this.
The Error Monadโ
For each type A
, we define a new type M? A
called the error monad (or
the result monad) of A
. Intuitively, every e : M? A
is morally an expression of
type A
which either evaluates to a value a : A
, or raises an error in the
course of its evaluation.
Concretely, the error monad M? A
is built from A
(and another type parameter B
that
we mostly put under the carpet) with two constructors:
Ok a
(witha : A
) specifies a computation inA
which succeeds and outputs a resulta
Error b
specifies an errorb
. Basically,b
can be thought as string (or a list of strings) which contains associated error message + some extra-information.
The definition and notation of M? A
can be found
in Error_monad.v
and in Pervasives.v. This actually emulates the result monad of
Tezos. Monads are a very rich structure in logic, category theory, theoretical computer science as well as in practical programming, but we may blissfully ignore all their subtleties here.
Then, checking that internal errors do not occur is about verifying that the only
Error b
s that may pop up are not in the list of internal errors, with the
function not_internal.
Monadic let
and monadic composition (bind)โ
One way to compute the composition of two functions f : A -> B
and g : B -> C
on an input a : A
is to write let b := f a in g b
.
Likewise, one can "compose" two functions f : A -> M? B
and g : B -> M? C
when applied to an input a : A
. It can be written as:
let? b := f a in g b
The so-called monadic let? ... in
works the same as a regular let ... in
except that it propagages any error that occurs in f
in the expected way, e.g. if f
or g
raise an error then let? b := f a in g b
raises the same error. Thus, let? ... in
allows performing what is called
monadic composition (which is sometimes referred to as the
bind
operator of the monad). This bind
operator for the monad error is written >>=?
in the OCamlcode.
If you look at the Tezos repository, you will see that monadic composition is pervasive in the code of the protocol!
Remark: Formally, let? ... in
is defined by (pseudo-)pattern-matching as follows:
- If
f a = Error foo
, thenlet? b := f a in g b
outputsError foo
. - If
f a = Ok b0
andg b0 = Error bar
, thenlet? b := f a in g b
outputsError bar
. - If
f a = Ok b0
andg b0 = Ok c0
, thenlet? b := f a in g b
outputsc0
.
Specifying validityโ
Validity predicates for valuesโ
For most types A
, we need to define what it is to be valid for an object of
A
. Intuitively, the validity predicate on a given type A
must be simple,
natural and in fine (i.e., when proofs are completed) ensure that a valid input
cannot cause an internal error (notice that the criteria must indeed be
natural to avoid (implicitly) tautological definition such as "a
is valid
when a
does not raise any internal error when taken as a parameter.")
For instance, on the type Int32
, validity is about not being out of bounds.
Module Valid.
Definition t (n : int32) : Prop :=
Int32.min_int โค n โค Int32.max_int.
This predicate allows us for instance to check that there is no overflow while doing operations such as addition or
multiplication when suitable pre-conditions are met. To have an example, have a look at the proof file of
Saturation_repr.v
, e.g. at the lemma
add_is_valid
, which
uses a validity predicate which is very similar to Int32.Valid.t.
above.
Validity properties for functionsโ
Whereas specifying validity predicates on various types pertains to the realm of definitions, verifying that given protocol function are valid pertains to that of theorems and proofs.
First of all, to state the lemmas about the validity of a certain function, we
resort to the letP?
binder, which may be thought of as a particular case of letP?
,
when the second function g
is actually a predicate arriving in Prop
, the type
of propositions.
Thus, for e : M? A
and P : A -> Prop
, depending on the case, the proposition
letP? a := e in P a
means that:
- When the evaluation of
e
succeeds and outputsa0 : A
(i.e.,e
evaluates toOk a0
), thenletP? a := e in P a
iffa0
satisfies the predicateP
; - When the evaluation of
e
raises an error (i.e.,e
evaluates toError b
), thenletP? a := e in P a
iffb
is not internal.
Actually, letP?
is a notation wrapping the operator bind_prop
which does pattern-matching on M? A
In practice, the verification of a function f
articulates the letP?
both with
validity pre- and post-conditions on the inputs and outputs of f
as follows:
Lemma f_is_valid : (* in mock-code *)
forall (x1 ... xn), Valid1 x1 /\ ... /\ Validn xn
-> letP? (y1,...,yp) := f x1 ... xn in
Valid1' y1 /\ ... /\ Validp' yp
As expected, the above lemma specifies that, when the inputs of f
are valid, then
if the computation of f
succeeds, it outputs a valid value, and if it does not, it
does not cause any internal error. Notice that the lemma f_is_valid
is an (almost)
formal version of the kind of properties we desired to capture, as sketched in Paragraph What is an internal error?
Building up validity predicatesโ
Main principlesโ
Usually, the nature of the validity predicate for a type A
follows the nature of
A
, e.g., for records
(i.e., inductive types with named fields), the validity predicate often takes the
shape of a record with the same field names. For instance,
step_constants
has three fields, amount
and balance
of type Tez_repr.t
and level
of type
Script_int.n_num.t
. The validity predicate
of step_constants
is then canonically defined as follows:
Module step_constants.
Module Valid.
Record t (sc : step_constants) : Prop := {
amount : Tez_repr.Valid.t sc.(step_constants.amount);
balance : Tez_repr.Valid.t sc.(step_constants.balance);
level : Script_int.n_num.Valid.t sc.(step_constants.level);
}.
End Valid.
End step_constants.
As the example of step_constants
above illustrates, validity predicates are built on
top of another, e.g. if a type C
has a constructor
Ctor: A -> B -> C
, then the validity predicate of C
should look like:
t x :=
match x with
| Ctor a b => A.Valid.t a -> B.Valid.t b -> t (Ctor a b)
(* snip for other constructors *)
end.
For instance, the validity predicate for the type Ticket_scanner.has_tickets
, which
has a unique constructor Has_tickets : Ticket_scanner.Ticket_inspection.has_tickets
-> Script_typed_ir.ty -> Ticket_scanner.has_tickets
should be:
Module has_tickets.
Module Valid.
Inductive t {a : Ty.t} (ht : Ticket_scanner.has_tickets) (ty : Script_typed_ir.ty) -> Prop :=
| Has_tickets a ht ty :
a = Script_typed_ir.ty.Valid.to_Ty_t ty ->
Ticket_inspection.has_tickets.Valid.t a ht ->
Script_typed_ir.ty.Valid.t ty ->
t (Ticket_scanner.Has_tickets ht ty).
End Valid.
End has_tickets.
As a side note, the predicate has_tickets.Valid.t
is not yet updated as this blog
post is being written, but it will match the above definition soon.
note
An interesting feature of our work at Formal Land can be noticed, whether it is for validity predicate or specifying internal errors, any modification we make is propagated through the whole Coq code;
- For instance, changing a validity predicate for a type
A
will automatically modify the validity predicate for any typeB
that uses the typeA
as a field; - or adding another error in the internal error list will modify everywhere the
not_internal
checks.
Then, Coq type-checker will catch every failure that could come from any modification. This ensures that our work does not regress silently when the protocol evolves.
Specifying lost information in validity predicatesโ
Look at the implicit parameter a : Ty.t
in the validity predicate of has_tickets
quoted above. It comes from the fact that in the OCaml code, the type has_tickets
is parametrized with 'a
, but observe that in the Coq code (which was
automatically produced by coq-of-ocaml
), has_tickets
does not have a type
parameter. Indeed, one of the roles of validity predicates is to also retrieve all the
relevant information that was lost during the automatic translation (beside
setting conditions that cannot be specified in OCaml types, such as integer
bounds). This topic is subtle and we may come back to it in the future.
To conclude this blog entry, I'll just mention the fact that the use of validity predicates goes beyond verifying that internal errors do not occur!. It is only on valid inputs that we can hope to describe the behavior of a function. The use of validity predicates is thus crucial to capture the specification of Michelson functions, beyond the fact that they may or (hopefully) may not raise internal errors!