The data_encoding library is used throughout the Tezos codebase for various serializing/deserializing tasks. Because of this, in our effort to formally verify the protocol, there are many proofs involving encoding.
Simply speaking, data_encoding
has a series of primitives for encoding/decoding scalar
data types and combinators for non-scalar data types, combinators work by
combining primitives together with functions and letting users compose smaller
encodings into bigger ones.
Having axioms and lemmas for primitives and combinators enable us to verify the encodings, but in practice, these proofs can be very mechanical.
For example here is a proof involving data_encoding.
Lemma fixed_encoding_is_valid
: Data_encoding.Valid.t (fun _ => True) Constants_repr.fixed_encoding.
eapply Data_encoding.Valid.implies.
eapply
Data_encoding.Valid.conv,
Data_encoding.Valid.obj10;
constructor;
try first [
apply Data_encoding.Valid.uint8 |
apply Data_encoding.Valid.uint16 |
apply Data_encoding.Valid.int31
];
eapply Data_encoding.Valid.list_value;
eapply Data_encoding.Valid.conv_with_guard;
apply Data_encoding.Valid.int64_value.
intro x; destruct x; intuition.
repeat constructor.
Qed.
All these Data_encoding.Valid.*
lines can now be replaced by a simple call of
data_encoding_auto
. This tactic will try to apply each of the combinator axioms
from Data_encoding.Valid
and then use the Data_encoding_db
hint database to
solve the remaining subgoals regarding data_encoding
primitives. If the proof
cannot be fully solved it will remain subgoals that are specific for each encoding.
Now here is the same proof but with data_encoding_auto
.
Lemma fixed_encoding_is_valid
: Data_encoding.Valid.t (fun _ => True) Constants_repr.fixed_encoding.
Data_encoding.Valid.data_encoding_auto.
intro x; destruct x; intuition.
repeat constructor.
Qed.
Some data_encoding
combinators involve arbitrary functions; these are case
for encoding variants of sum types, and conv
for providing a conversion
between encodings. Because of this it is not always possible to fully automate the
encoding proofs, but we can leave all the mechanical work for the machine so we
can focus on things that are specific for each encoding.
As we write more encoding proofs we can increment the hints database. Since encodings have a nesting nature, we can use these hints to prove bigger encodings making it easier to write encoding proofs and avoiding repetitive proofs.