# Formal Verification of Ethereum 2.0 Part 1: Fixing the Array-Out-of-Bound Runtime Error

In part one of this series dedicated to Formal Verification of Ethereum 2.0, Joanne Fuller from ConsenSys R&D illustrates how to locate, understand, and fix a subtle array-out-of-bound runtime error within the state transition component.

The Phase 0 Beacon Chain reference implementation describes the state machine that every Beacon node has to implement in the decentralised Ethereum 2.0 ecosystem. The reference implementation comprises three main sections:

1. State Transition,

2. Simple Serialize (SSZ) library, and

3. The Merkleise library.

The state transition is the most critical component and describes how the state is to be updated upon processing of a new block.

Below is research that supports how to locate, understand, and fix a subtle array-out-of-bound runtime error within the state transition component. This error is dependent on the number of active validators as summarised below and can therefore be fixed by imposing an upper bound on the number of active validators.

Number of active validators (V) | Array-out-of-bound error? |
---|---|
V ≤ 4,194,304 | None ✅ |
4,194,304 < V < 4,196,352 | At least one input triggers the error |
4,196,352 ≤ V | All inputs trigger the error |

Detecting this error would be difficult using any other strategy, e.g. code peer review or testing. In contrast, using formal verification we can not only detect and fix the error but also mathematically and mechanically prove the correctness of the fix, which goes beyond what testing techniques can do. This is a technical blog (written in a tutorial style) with the aim of allowing readers, even those not completely familiar with formal verification, to understand how these techniques are useful in the blockchain context.

##### The Project

The project to formally verify Phase 0 of the Ethereum 2.0 Beacon Chain reference implementation using the verification-friendly language Dafny was completed earlier this year.

The primary goal was to formally verify the absence of runtime errors (under/overflows, array-out-of-bounds, division by zero); our secondary goal (not covered in this blog) was to prove functional correctness.

For a more detailed description of the whole project, you can refer to our paper *Formal Verification of the Ethereum 2.0 Beacon Chain*. To see the full project code, please refer to our GitHub repository: https://github.com/ConsenSys/eth2.0-dafny.

##### An Overview of the State Transition

Let’s begin with a simplified view of the Beacon Chain system (Figure 1). Time is logically divided into *epochs* and each epoch is then divided into a fixed number of *slots*. Blocks are associated with a particular slot; however, a slot may be empty, in which case no block is processed. As shown in Figure 1, the arrival of a block causes the state to change, as does the passing of an epoch boundary (even if no further blocks have arrived). This view of the system is a simplified one; please note that the `BeaconState` is actually updated at every slot, but in the case of empty slots that aren’t at an epoch boundary, this update is simply an increment to the `slot` field.

Now let’s consider the `state_transition`; it governs how the `BeaconState` is updated. The `state_transition` is a complex procedure but at the highest level it comprises two distinct steps: `process_slots` and then `process_block` (Figure 2).

The `state_transition` procedure is triggered when a new block arrives for a given slot. The first step (`process_slots`) progresses the `BeaconState` through any empty slots so that the state can *‘catch up’* from the last recorded block/slot to the current one. The `BeaconState` is updated as each empty slot is processed (i.e. the `slot` field is incremented) and, if the empty slot is an epoch boundary, additional processing occurs to implement, for example, justification and finalisation, the rewards and penalties, slashing, etc.
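The two-step structure described above can be sketched as a minimal Python model. This is an illustration only: the `epochs_processed` counter is my own (not a spec field), and the bodies of the real `process_slot`, `process_epoch`, and `process_block` are elided.

```python
from dataclasses import dataclass

SLOTS_PER_EPOCH = 32  # Phase 0 constant


@dataclass
class BeaconState:
    slot: int = 0
    epochs_processed: int = 0  # illustrative counter, not a real spec field


def process_slots(state: BeaconState, slot: int) -> None:
    # Step 1: advance the state through every (possibly empty) slot.
    while state.slot < slot:
        # The real spec's per-slot processing (root caching) happens here.
        if (state.slot + 1) % SLOTS_PER_EPOCH == 0:
            # Epoch boundary: the real spec runs epoch processing here
            # (justification/finalisation, rewards and penalties, slashings).
            state.epochs_processed += 1
        state.slot += 1


def state_transition(state: BeaconState, block_slot: int) -> None:
    process_slots(state, block_slot)  # catch up to the arriving block's slot...
    # ...then Step 2: the block itself would be processed here (elided).
```

For example, a block arriving at slot 70 drives the state through two epoch boundaries (after slots 31 and 63) before block processing begins.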

Once the state slot is equal to the slot of the block that has just arrived, the second step (`process_block`) can begin. Within the block processing section further complexity arises. In particular, the `process_operations` section contains many components, such as processing the proposer and attester slashings, attestations, any new deposits and any voluntary exits; each component affects its own subset of the `BeaconState` fields.

##### Using Formal Verification to find an Array-out-of-bound Runtime Error

We are now ready to look at the details of an array-out-of-bound runtime error discovered during our project. This error serves as an important example of the benefits of formal verification and in particular, demonstrates its power when dealing with complex calculations that span multiple levels of function calls and include less familiar operations such as floor division.

##### Where does the error occur?

The function `get_attesting_indices` is called within the `process_rewards_and_penalties`, `process_justification_and_finalisation`, and `process_attestation` components of the `state_transition` (Figure 3). In the final line of `get_attesting_indices` (Figure 4), the set of validator indices to be returned is determined by enumerating over `committee` such that `committee[i]` is included if `bits[i]` is set to 1.

The `bits` parameter is a `Bitlist` with a maximum length of `MAX_VALIDATORS_PER_COMMITTEE` (i.e. 2048). Hence, if `|committee| > MAX_VALIDATORS_PER_COMMITTEE` then an array-out-of-bound runtime error will occur.
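The failing access is easy to model. Below is a minimal, self-contained sketch of that final line, with the spec's `Bitlist` replaced by a plain Python list; apart from the `MAX_VALIDATORS_PER_COMMITTEE` constant, the names are my own.

```python
MAX_VALIDATORS_PER_COMMITTEE = 2048  # Phase 0 constant


def attesting_indices(committee, bits):
    # Mirrors the final line of get_attesting_indices: committee[i] is
    # included iff bits[i] is set. If |committee| > len(bits), the read
    # bits[i] is exactly the array-out-of-bound access.
    return set(v for i, v in enumerate(committee) if bits[i])


bits = [1] * MAX_VALIDATORS_PER_COMMITTEE
ok = attesting_indices(list(range(2048)), bits)  # fine: |committee| = 2048
# attesting_indices(list(range(2049)), bits) raises IndexError: the
# committee is one member longer than the bitlist.
```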

##### Can this occur?

This is where we see the power of using a tool like Dafny. When this function was coded in Dafny, the verifier automatically checked for runtime errors and in this case identified a potential array-out-of-bound error. (Note, I use the word ‘potential’ as there are instances where the verifier may not have enough information to determine correctness, or may ‘time out’ while trying to verify.) So we had to trace the problem and further understand the calculation of `committee`. The aim was to determine whether additional information (such as pre- and postconditions) would facilitate verification, or whether a bug existed and, if so, under what specific circumstances it would be triggered.

##### Tracing the calculation

The `committee` variable is set by calling `get_beacon_committee`, which in turn calls `compute_committee`, as shown in the Python code below (Figures 5 and 6).

Starting with the `return` statement in `compute_committee` we see that a sequence of length *end-start* is returned, hence:

`|compute_committee(indices, seed, index, count)|` = `end` - `start` = `(len(indices) * (index+1)) // count` - `(len(indices) * index) // count`
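This length formula can be sanity-checked numerically; the values of `n` (for `len(indices)`) and `count` below are hypothetical, chosen only for illustration. A useful by-product of the floor-division form is that the committee sizes over all `index` values always sum to `len(indices)`, i.e. the committees partition the validator set.

```python
n = 300_000   # hypothetical len(indices): number of active validators
count = 2_048  # hypothetical count: committees per epoch


def committee_len(n, index, count):
    # |compute_committee(...)| = end - start, per the formula above.
    start = n * index // count
    end = n * (index + 1) // count
    return end - start


sizes = [committee_len(n, i, count) for i in range(count)]
# Because the start of one committee is the end of the previous one,
# the sizes telescope and sum exactly to n.
```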

When `compute_committee` is called from `get_beacon_committee` we see that:

- `indices` = `get_active_validator_indices`
- `seed` = `get_seed`
- `index` = `(slot % SLOTS_PER_EPOCH) * committees_per_slot + index`
- `count` = `committees_per_slot * SLOTS_PER_EPOCH`

Note that the `index` parameter in `get_beacon_committee` is different from the `index` parameter in `compute_committee`.

Making these substitutions:

`end` = `(len(get_active_validator_indices) * (((slot % SLOTS_PER_EPOCH) * committees_per_slot + index) + 1)) // (committees_per_slot * SLOTS_PER_EPOCH)`

`start` = `(len(get_active_validator_indices) * ((slot % SLOTS_PER_EPOCH) * committees_per_slot + index)) // (committees_per_slot * SLOTS_PER_EPOCH)`

As `committees_per_slot` is determined by a call to `get_committees_per_slot` and is a function of the number of active validators:

`committees_per_slot` = `max(1, min(MAX_COMMITTEES_PER_SLOT, len(get_active_validator_indices) // SLOTS_PER_EPOCH // TARGET_COMMITTEE_SIZE))`

It will be a value between 1 and 64 inclusive.
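The clamped formula above is easy to turn into a standalone helper; the constants are the Phase 0 spec values, while the function name is my own.

```python
SLOTS_PER_EPOCH = 32
TARGET_COMMITTEE_SIZE = 128
MAX_COMMITTEES_PER_SLOT = 64


def committees_per_slot(active_validators: int) -> int:
    # Clamped to the range [1, 64]; reaches the upper bound once
    # active_validators >= 64 * 32 * 128 = 2**18 = 262,144.
    return max(1, min(MAX_COMMITTEES_PER_SLOT,
                      active_validators // SLOTS_PER_EPOCH // TARGET_COMMITTEE_SIZE))
```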

Substituting this formula for `committees_per_slot` into `end` and `start`, we conclude that *end-start* is a function of:

- `len(get_active_validator_indices)`,
- `slot % SLOTS_PER_EPOCH`, where 0 <= `slot % SLOTS_PER_EPOCH` < 32, and
- `index`, where 0 <= `index` < `committees_per_slot` (i.e. the committee `index` parameter of `get_beacon_committee`).

Putting all of this together, when `get_beacon_committee(state, data.slot, data.index)` is called from `get_attesting_indices` for a state with a particular number of active validators, it can be called for multiple (`data.slot`, `data.index`) combinations. Also:

- as `committees_per_slot` depends on the number of active validators, the range of `index` also depends on the number of active validators; and
- once the number of active validators is sufficiently large, there will be a maximum of 32 × 64 = 2048 of these (`slot % SLOTS_PER_EPOCH`, `index`) combinations to consider for a state with a particular number of active validators.

To determine which combinations of (`len(get_active_validator_indices)`, `slot % SLOTS_PER_EPOCH`, `index`) cause the `committee` length to exceed `MAX_VALIDATORS_PER_COMMITTEE` we need to analyse the above formulas. The use of floor division makes this analysis more than simple algebra, but we can use visualisations to help understand what is happening.
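Exhaustive enumeration offers a complementary check to the visualisations. The script below is my own reconstruction of the formulas above (using the spec constants): for a given number of active validators it counts how many (`slot % SLOTS_PER_EPOCH`, `index`) combinations produce a committee longer than `MAX_VALIDATORS_PER_COMMITTEE`.

```python
SLOTS_PER_EPOCH = 32
TARGET_COMMITTEE_SIZE = 128
MAX_COMMITTEES_PER_SLOT = 64
MAX_VALIDATORS_PER_COMMITTEE = 2048


def overflowing_combinations(active_validators: int) -> int:
    # Count the (slot % SLOTS_PER_EPOCH, index) pairs whose committee
    # length end - start exceeds MAX_VALIDATORS_PER_COMMITTEE.
    per_slot = max(1, min(MAX_COMMITTEES_PER_SLOT,
                          active_validators // SLOTS_PER_EPOCH // TARGET_COMMITTEE_SIZE))
    count = per_slot * SLOTS_PER_EPOCH  # committees per epoch
    bad = 0
    for slot in range(SLOTS_PER_EPOCH):
        for index in range(per_slot):
            flat = slot * per_slot + index  # index passed to compute_committee
            start = active_validators * flat // count
            end = active_validators * (flat + 1) // count
            if end - start > MAX_VALIDATORS_PER_COMMITTEE:
                bad += 1
    return bad
```

Running it at the boundary values reproduces the summary table: at 2^{22} = 4,194,304 validators no combination overflows, one validator more produces the first overflowing combination, and at 2^{22} + 2^{11} = 4,196,352 all 2048 combinations overflow.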

The relationship between the number of active validators and `committees_per_slot` is shown in Figure 7. Notice that once there are 2^{18} = 262,144 or more active validators, `committees_per_slot` will reach its upper bound of 64.

The value of `committees_per_slot` then determines the number of (`slot % SLOTS_PER_EPOCH`, `index`) combinations to consider for a state with a particular number of active validators.

In Figure 8 we plot the number of (`slot % SLOTS_PER_EPOCH`, `index`) combinations, as well as how many of these input combinations produce `|committee| = end - start > MAX_VALIDATORS_PER_COMMITTEE`. We observe that the number of (`slot % SLOTS_PER_EPOCH`, `index`) combinations reaches its maximum of 32 × 64 = 2048 at 2^{18} = 262,144 active validators.

We can also observe that there are zero (`slot % SLOTS_PER_EPOCH`, `index`) combinations where `|committee| > MAX_VALIDATORS_PER_COMMITTEE` until the number of active validators exceeds 2^{22} = 4,194,304, as shown in the section of the graph reproduced in Figure 9.

Hence, once the number of active validators exceeds 2^{22} = 4,194,304, there are (`slot % SLOTS_PER_EPOCH`, `index`) combinations that cause `|committee| > MAX_VALIDATORS_PER_COMMITTEE`. Further, once the number of active validators reaches 2^{22} + 2^{11} = 4,196,352, all (`slot % SLOTS_PER_EPOCH`, `index`) combinations will result in the error.

##### Example: 4,194,355 active validators

When the number of active validators is 4,194,355, multiple (`slot % SLOTS_PER_EPOCH`, `index`) combinations result in the error. Figure 10 shows the particular combinations that do (i.e. the red blocks).

##### Conclusion

This previously undocumented bug was difficult to detect. It required many hours of effort to model the dynamics of the problem; the analysis was quite complex due to the multiple interrelated parameter calculations and the use of floored integer division. The ability of Dafny to detect the problem in the first place is significant and provides clear justification for the use of formal verification techniques in the blockchain context. The full analysis of this bug has been reported as an issue to the reference implementation GitHub repository, where it was confirmed by the reference implementation writers.

##### Further Reading

If you are interested in reading more about formal verification in Ethereum 2.0, check out the paper by my colleague Franck Cassez discussing the work undertaken to formally verify the Eth2 Deposit Smart Contract.

For regular updates and news on the Merge, visit the ConsenSys Merge Knowledge Base, where you’ll also find Ethereum 2.0 archives with key milestones achieved and other essential resources for developers.