The **backward compatibility** is an important property to maintain between two releases of the protocol of Tezos. For example, this ensures that existing smart contracts still work after a protocol upgrade. That property can be hard to ensure by the protocol developers as the codebase's size increases. At the time of writing, there are 86,346 lines of code in src/proto_alpha/lib_protocol in the `master`

branch of Tezos, counting all the `.ml`

and `.mli`

files.

We propose using formal verification to check that the backward compatibility is preserved after each protocol release, with a **mathematical** degree of certainty. More precisely, we show that for every inputs and storage contexts, a smart contract working with version Jakarta of Tezos will generate the exact same output running on the next version K. Even if there are infinitely many possible inputs, we can verify all of them by writing a mathematical proof in the formal system Coq.

In this blog post, we describe our plan for this verification effort.

## Strategyโ

### The property to verifyโ

The general property we are showing is the following commutative diagram:

`input & storage J โโโ smart contract J โโโโโบ output J`

โ โ โ

โ โ โ

migrate migrate migrate

โ โ โ

โ โ โ

โผ โผ โผ

input & storage K โโโ smart contract K โโโโโบ output K

Given a migration function `migrate`

migrating values from the version J of the protocol to the version K, we show that the two computations:

- executing a smart contract in version J and then migrating the result;
- migrating a smart contract and its input to version K and then executing it;

yield the same result.

For that, we plan to import the two protocol versions to Coq in two different folders `Proto_J`

and `Proto_K`

. We use the translator coq-of-ocaml to do this import, converting the OCaml code of the implementation of Tezos to Coq.

### Michelson's implementationโ

The Michelson interpreter in the Tezos protocol contains the three following main files:

- script_typed_ir.mli: the definition of the abstract syntax tree of Michelson;
- script_interpreter.ml: the main function of the interpreter, dispatching over each instruction case (there are around 150 different instructions);
- script_ir_translator.ml: the type-checker/printer of smart contracts, handling the serialization/deserialization of Michelson programs to and from the storage system.

These three files amount to around 10,000 lines of code in total and are calling various auxiliary files. A difficulty with the verification of the Michelson implementation, apart from its size, is the use of GADTs in OCaml. Unfortunately, we do not have a clean translation of GADTs to Coq. Instead, we:

- Erase all the type parameters of the GADTs of OCaml, and add dynamic casts in the
`match`

operations. These casts are unsafe but verified by the type-checker of OCaml. They are necessary as we lost information by erasing the type parameters from the GADTs. This step is done**automatically**by`coq-of-ocaml`

. - Define a new Michelson implementation in Coq using dependent types. We do this step
**manually**. To make this new definition, we start from the code generated by`coq-of-ocaml`

and add dependent types to it. We describe this process in detail in our Guide on Michelson. We then**prove**this transformation to be correct by writing a simulation proof between the code with dependent types and the translated code with dynamic casts.

We will prove the backward compatibility working on the dependent version of the Michelson's implementation.

### Shared definitionsโ

We define most of the types twice for protocol J and protocol K. For example, we have two abstract syntax trees for Michelson in version J and K, even if the two are almost the same. To have a conversion between all of these types, we will make a `migrate`

function for each type, going from the version J to the version K. For the case of the Michelson tree, this function will associate each constructor of the tree in version J to the corresponding constructor in version K.

Having different types for the two protocol versions will force us to duplicate many of our proofs. Fortunately, we can share primitive types such as integers and lists. We do so by sharing the implementation of the environment of the two protocols, as it is already done in OCaml. The environments of the protocols define all the primitives that a protocol can access. It includes a part of the standard library of OCaml, plus additional primitives for cryptography and access to the database.

Sharing the definitions and the proofs over the environment should help to simplify many of the proofs.

## What we have todayโ

Today we are defining the dependent version of the Michelson's implementation of protocol J. We have a complete definition of the abstract syntax tree, a mostly complete definition for the interpreter, and a partial definition for the type-checker/printer. We are writing the proofs of equivalence between the dependent version and the version translated by `coq-of-ocaml`

as we write the dependent definitions.

## Nextโ

Next week, we will start translating the development protocol proto_alpha to Coq usingย `coq-of-ocaml`

. This development version should become the protocol K in a few weeks, so it is a good idea for us to start working with it. Then we will:

- port our proofs and dependent implementation of Michelson to the version K;
- write migration functions between the data types of version J and K;
- express the theorem of backward compatibility, stating that the interpreter of K is backward compatible with J;
- make the actual proof; this will include showing that the various interpreter sub-functions are compatible.

An example of change planned for version K is the removal of the `kinfo`

payload from most of the instructions. This change should improve the performance of the Michelson interpreter as the `kinfo`

payload is not used in most cases. We will show that this global refactorization is safe and preserves the semantics of Michelson.