In the near future, Ethereum will transition to a new Proof of Stake consensus protocol. Instead of the miners used today, Ethereum 2.0, relies on validators to create and validate the blocks of transactions that are added to the distributed ledger. The protocol is designed to be fault-tolerant to up to ⅓ of Byzantine (i.e. malicious or dishonest) validators. Economic staking mechanisms are in place to try and keep dishonest validators under ⅓: validators have to stake some ETH, and if they are dishonest they may be slashed and lose their stake. The process of staking ETH is handled by the Beacon Chain staking deposit contract on the Ethereum network. A validator sends a transaction to "deposit some Ether" by calling the deposit function of the contract. The deposit contract was deployed in November 2020 and, at the time of writing, has topped 7,157,820 ETH ($25 billion USD) in deposits.

This blog post is based on this research paper, an extended abstract of which is to be published at the 24th international symposium on Formal Methods 2021.

Why should we formally verify the Deposit Contract?

The Eth2 deposit contract is a mission-critical component of the Ethereum 2.0 consensus mechanism, also known as the Beacon Chain. Any bugs in the deposit contract could result in inaccurate tracking of deposits, missing registered validators, or cause crashes and downtimes. Any of these compromise the integrity and availability of Ethereum.

Bugs can trigger runtime errors like division-by-zero or array-out-of-bounds. In a networked environment these vulnerabilities can be exploited by attackers to disrupt or take control of the computer system. Other types of bugs can also compromise the business logic of a system e.g., an implementation may contain subtle errors that make the system deviate from its intended specifications (for instance using [katex]\text{\verb|+=|}[/katex] in C/C++ instead of [katex]\text{\verb|=+|}[/katex]; this PVS-studio link has more examples in C++).

It is hard to guarantee that programs and smart contracts implement their intended business logic, have no common runtime errors, and terminate properly. There are notorious examples of smart contract vulnerabilities that have been exploited and publicly reported. In 2016, a reentrance vulnerability in the Decentralized Autonomous Organization (DAO) smart contract was exploited to steal more than $50 Million.

The deposit contract implements the functions, deposit and get_deposit_root. These rely on sophisticated data structures and algorithms to record the list of deposits so that they can be communicated efficiently over the network. The history of deposits is summarized as a unique number, a hash, computed using a Merkle (or Hash) tree. The tree is built (or updated) incrementally after each deposit using the elegant incremental Merkle tree algorithm (Progressive merkle tree, Vitalik Buterin). The algorithm is efficient and concise but as pointed out previously:

The efficient incremental algorithm leads to the deposit contract implementation being unintuitive, and makes it non-trivial to ensure its correctness.

Park, D., Zhang, Y., Rosu, G.: End-to-end formal verification of ethereum 2.0 deposit smart contract. CAV 2020.

Why not write tests?

To catch bugs in the deposit contract, testing is probably the first thing to do. While testing the contract (e.g. with random inputs, or property-based testing) is useful and can detect bugs, it has some severe limitations:

Program testing can be used to show the presence of bugs, but never to show their absence!

E. W. Dijkstra (1970), Notes On Structured Programming.

Unlike testing, program verification aims at proving the absence of bugs. The idea is to use rigorous mathematical and logical reasoning about programs, more precisely about implementations vs specifications. One such logical framework was proposed by Floyd and Hoare in the 1960s. It creates a distinction between how the program computes the result (its implementation) and what the expected result of the program is (the specification). Program, implementations are typically written in a standard programming language e.g., a list-sorting algorithm in Java, and the specification as a logical (first-order logic) statement e.g., the list is sorted.

Automated program verification with Dafny

Program correctness can be precisely stated using so-called Hoare triples of the form:

\{ P \} \quad \textit{Prog} \quad \{ Q \}

Such a triple is valid if and only if: assuming the pre-condition [katex]P[/katex] holds before program [katex]\textit{Prog}[/katex] is executed, then the post-condition [katex]Q[/katex] is guaranteed to hold when [katex]\textit{Prog}[/katex] terminates. Floyd logic provides logical rules that can be used to prove that a Hoare triple is valid. Note that there is a termination condition in the previous statement. A triple does not capture the property that a program necessarily terminates. If we want to prove termination, say for a program with a single while loop, there are ranking functions. A ranking function [katex]f[/katex] is a measure computed using the program variables, that strictly decreases every time the loop body is executed. If the function [katex]f[/katex] takes its values in a well-founded set (in which there are no infinite strictly decreasing sequences of values), the existence of a ranking function is a proof of termination for the while loop.

Proving correctness and termination of programs with Floyd logic and ranking functions used to be a long and tedious process: one had to follow Floyd logic rules and write lengthy (ironically error-prone) pen and paper proofs.

Recently, a lot of progress has been made to improve the verification experience and its soundness. Dafny is a verification-friendly programming language in which you can write logical pre- and post-conditions, functional or object-oriented programs. Dafny automates the verification of Hoare triples: it checks every single step of a proof which greatly reduces the likelihood of reasoning flaws. For example, in a proof by induction the Dafny verifier checks that 1) the base case is valid, 2) the induction step is valid and 3) the induction is well-founded. In Dafny, the verifier also checks the programs for standard runtime errors like division-by-zero, array-out-of-bounds.

This blog post by my colleague Joanne Fuller provides a quick introduction to what you can do with Dafny.

So what does the proof look like?

The algorithms in the deposit contract simulate the computation of a synthesised attribute, hash, on a binary tree (Merkle tree) without building the tree. As Merkle trees are perfect (all levels are full), computing a synthesised attribute on such a tree takes exponential time in the height [katex] h[/katex] of the tree: all the nodes of the tree have to be processed and a Merkle tree has [katex]2^{h + 1} - 1[/katex] nodes.

In our proof of the deposit contract, we followed a standard dynamic programming technique to derive recursive linear-time and space algorithms to compute the attribute on a tree. We prove that these recursive algorithms are correct in the sense that they compute the same value as if we had built a Merkle tree.

The details of the proofs (including accompanying paper and video presentations) can be found in this GitHub repository. At a high-level, the proof shows that some invariants are preserved by the functions of the deposit contract. These invariants guarantee:

  • The absence of runtime errors: like division-by-zero, over/underflows and array-out-of-bounds (accessing elements in an array outside of the range of indices of the array).

  • Functional correctness: the values returned by the functions in the contract are the same as the values that would be returned if we had built a Merkle tree.

Finally, we show that the functions implemented in the deposit contract (proposed in Progressive merkle tree, Vitalik Buterin) compute the same values as the recursive algorithms. By (provable and Dafny-checked) transitivity this shows that the deposit contract is correct.

Writing a Dafny proof

The sample code below (Algo 1) is a slightly simplified version of the Dafny source code of get_deposit_root . It illustrates how correctness and termination can be specified in Dafny. The pre-conditions are written using requires and the post-conditions using ensures. The predicate Valid is an invariant (it is a pre- and post-condition of get_deposit_root) and captures some essential properties that are needed to prove the contract correctness. The post-condition at line 4 specifies that the result computed by get_deposit_root is the same as the one that would be obtained if we had build a Merkle tree. The absence of array-out-of-bounds errors in dereferencing the array branch (lines 19 and 21) are proved using the loop invariants (lines 12 and 13): the Dafny verifier checks that these are indeed loop invariants and this in turn ensures that in the loop body, the value of h is always a valid array index.

The ranking function (line 15) uses the fact that the variable h strictly increases in the loop body (line 24) and, as TREE_HEIGHT is constant, the difference TREE_HEIGHT - h strictly decreases. Together with the invariant at line 12 that proves TREE_HEIGHT - h >= 0, the value of TREE_HEIGHT - h:

  1. is an integer and

  2. strictly decreases at each iteration of the loop and

  3. is bounded from below

which imply that the while loop at line 10 always terminates.

How hard is it to design a proof?

Program verification is challenging because it is not always possible to automatically synthesise a proof for a given Hoare triple. I had to create many loop invariants and other lemmas to design a proof of correctness (and termination) for the deposit contract. The nice thing is they are all written in Dafny!

In a single language we can write programs, specifications, and proofs (as programs). You only need the Dafny verifier and the deposit contract Dafny source code to check any of the proofs and to reproduce the verification results.

It took me roughly 12 weeks to first write a sketch pen and paper proof and then write all the details and proofs in Dafny. The code base has approximately 3500 lines of code, 90% of which is proofs. The proofs are written as (functional) programs, loop invariants or theorems (lemmas in Dafny). The Dafny verifier checks the proofs but can also figure out some steps automatically. There are several strategies or heuristics that can be applied by the verifier to synthesise tedious steps of a proof for instance induction, or reasoning about integers.

Conclusion

We have designed a machine-checkable correctness (and termination) proof for the Solidity version of the deposit contract. The proof ensures that the deposit contract is correct and free of runtime errors. We have also identified some (provably correct) new optimizations of some of the functions of the contract (see this explanation). We hope that this work can serve as a guide for new smart contract verification projects.

If you are interested in this project, feel free to contact us to discuss how the Trustworthy Smart Contracts's team at Consensys might help you to design efficient, bug-free reliable smart contracts, email myself and Joanne Fuller.