Measuring The Health Of A Stateless Ethereum Ecosystem
In the previous blog in this series, we described the design and quantification of the Stateless Ethereum Bayesian network (BN) model.
As a reminder, a simplified, high-level view of this model is shown in Figure 1. It consists of four sub-models: Block creation, Witness creation, Ethereum network and Block propagation.
In Figure 2 we can see an expanded view of the BN model, with the factors that are common between sub-models clearly visible.
Now that we have a fully quantified model, we can run it and explore scenarios such as: if the overall latency in the system is high, how does that affect the health of the Ethereum ecosystem?
Running the sub-models
Designing a system as an object-oriented BN (OOBN) model is a more modular approach to modelling. Consequently, each sub-model can be run independently as a model in its own right. In this blog we therefore run each of the sub-models before finally running the combined model to predict the probability of a healthy Ethereum ecosystem once Stateless Ethereum is implemented. This probability is based on the data, expert knowledge and Stateless implementation assumptions outlined in the previous two blogs of this series.
Running the Ethereum network model
We can now enter ‘evidence’ into the model and propagate this new information through the model to observe predicted changes in probabilities. By ‘evidence’ we mean that we set one or more factors in a model to be in a particular state, instead of the default probability distribution across all its states.
The Ethereum network model has five key factors: Ethereum node type, Peer location, Node location, Node bandwidth and Network latency.
Running this model shows the marginal probabilities for each factor (Figure 4).
For example, we may want to know: “If we have a node running in Europe, how is that likely to affect network latency?”
To represent this scenario, we need to set node location in the model to Europe. We do this by double-clicking on Europe.
In Figure 5, the red line shows that we have entered this information in the model and the node is definitely in Europe (100%).
Running this scenario through the model, we are able to see the updated probabilities and how they compare to the initial probabilities.
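The evidence-and-propagation step can be illustrated outside Hugin with exact inference by enumeration on a two-factor fragment (node location → network latency). This is a minimal sketch: the state names, prior and conditional probabilities are hypothetical placeholders, not the model's quantified values, and Hugin itself uses junction-tree propagation rather than brute-force enumeration. Hard evidence simply discards every joint configuration that disagrees with it; the normalizing constant is the probability of the evidence itself, the same kind of quantity reported later for the Figure 16 scenarios.

```python
from itertools import product

# Hypothetical prior over node location (placeholder values, not the model's).
P_LOCATION = {"Europe": 0.4, "North America": 0.4, "Asia": 0.2}
LATENCIES = ["low", "medium", "high"]

# Hypothetical CPT: P(latency | location).
P_LATENCY_GIVEN_LOCATION = {
    "Europe": {"low": 0.6, "medium": 0.3, "high": 0.1},
    "North America": {"low": 0.5, "medium": 0.35, "high": 0.15},
    "Asia": {"low": 0.3, "medium": 0.4, "high": 0.3},
}

def joint(location, latency):
    """P(location, latency) via the chain rule for location -> latency."""
    return P_LOCATION[location] * P_LATENCY_GIVEN_LOCATION[location][latency]

def query(target, evidence):
    """Return (posterior of target given evidence, probability of the evidence).

    target is "location" or "latency"; evidence maps factor names to states.
    """
    scores = {}
    p_evidence = 0.0
    for location, latency in product(P_LOCATION, LATENCIES):
        assignment = {"location": location, "latency": latency}
        # Hard evidence: skip configurations that contradict the observed states.
        if any(assignment[name] != state for name, state in evidence.items()):
            continue
        p = joint(location, latency)
        p_evidence += p
        scores[assignment[target]] = scores.get(assignment[target], 0.0) + p
    if p_evidence == 0:
        raise ValueError("evidence has zero probability")
    posterior = {state: p / p_evidence for state, p in scores.items()}
    return posterior, p_evidence

prior_latency, _ = query("latency", {})
europe_latency, p_europe = query("latency", {"location": "Europe"})
print({s: round(p, 3) for s, p in prior_latency.items()})   # {'low': 0.5, 'medium': 0.34, 'high': 0.16}
print({s: round(p, 3) for s, p in europe_latency.items()})  # {'low': 0.6, 'medium': 0.3, 'high': 0.1}
print(round(p_europe, 3))                                   # 0.4
```

Comparing the two printed distributions is the text-mode equivalent of comparing the updated and initial bars in Figure 5.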
We can ask many other questions of the model and observe how scenarios affect the probabilities of other factors.
Running the Block creation model
The block creation model (Figure 6) includes information about block gas limit, difficulty, block gas used, block contents, number of transactions per block, state entries updated and block creation time.
The quantification of this model used block information from the same blocks that were used to create witnesses. The witness size and creation time are captured in the Witness creation BN model.
Running the block creation model produces the marginal probability distributions shown in Figure 7.
In the same way as for the Ethereum network model, we can now enter evidence for one or more factors of the block creation process and look at the predicted changes in the other factors.
For example, we observe in Figure 8 that when the block creation time increases substantially, to around 40-50 seconds per block, the probability distribution of the number of updated state entries shifts to around 200-400 entries.
We also see that the probability distribution of the number of transactions per block is expected to change accordingly.
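Shifts like these are read from conditional distributions learned from discretized block data. A minimal sketch of the simplest such approach, maximum-likelihood estimation of a conditional probability table by frequency counting, is shown below. The bin edges, block records and helper names (`to_bin`, `learn_cpt`) are hypothetical, and the actual model may have used a different discretization or learning method.

```python
from bisect import bisect_right
from collections import Counter, defaultdict

# Hypothetical bin edges; not the model's actual discretization.
TIME_EDGES = [10, 20, 40]       # bins: <10, 10-20, 20-40, >=40 seconds
ENTRY_EDGES = [100, 200, 400]   # bins: <100, 100-200, 200-400, >=400 entries

def to_bin(value, edges):
    """Map a continuous value to the index of its discretization bin."""
    return bisect_right(edges, value)

def learn_cpt(records):
    """Estimate P(entry bin | time bin) by relative frequency counting.

    records is an iterable of (block_creation_time_s, state_entries_updated).
    """
    counts = defaultdict(Counter)
    for creation_time, entries in records:
        counts[to_bin(creation_time, TIME_EDGES)][to_bin(entries, ENTRY_EDGES)] += 1
    cpt = {}
    for time_bin, entry_counts in counts.items():
        total = sum(entry_counts.values())
        cpt[time_bin] = {b: n / total for b, n in sorted(entry_counts.items())}
    return cpt

# Hypothetical block records: (creation time in seconds, state entries updated).
blocks = [(5, 80), (8, 120), (12, 150), (15, 90), (42, 250), (45, 310), (48, 450)]
cpt = learn_cpt(blocks)
print(cpt[3])  # {2: 0.6666666666666666, 3: 0.3333333333333333}
```

Time bins with no observations get no row in this sketch; a practical implementation would add a prior (for example Laplace smoothing) or fall back on expert knowledge, which is how the post describes filling gaps where empirical data was unavailable.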
Running the Witness creation model
As mentioned in the previous blog, we used information from implementing the Stateless Ethereum witness specification in a fork of Hyperledger Besu to quantify this BN model.
Difficulty, state entries updated and block gas limit have already been dealt with in the Block creation BN, but are required in this model since they have an influence on the size and creation time of the witnesses, as shown by the arrows (edges) in Figure 9.
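Because each sub-model is an object in the OOBN, the factors linking two sub-models can be identified mechanically. The sketch below encodes the factor lists named in this post as plain Python sets (the names are paraphrased from the text, not Hugin node identifiers, and may not match Figure 9 exactly) and recovers the three factors the Witness creation model shares with the Block creation model.

```python
# Factor names as listed in this post (paraphrased; hypothetical identifiers).
SUB_MODELS = {
    "Block creation": {
        "block gas limit", "difficulty", "block gas used", "block contents",
        "transactions per block", "state entries updated", "block creation time",
    },
    "Witness creation": {
        "difficulty", "state entries updated", "block gas limit",
        "witness size", "witness creation time",
    },
}

def shared_factors(a, b):
    """Return the factors two sub-models have in common, sorted for display."""
    return sorted(SUB_MODELS[a] & SUB_MODELS[b])

print(shared_factors("Block creation", "Witness creation"))
# ['block gas limit', 'difficulty', 'state entries updated']
```

In an OOBN such shared factors act as interface nodes, so the Witness creation model can presumably reuse what was quantified in the Block creation model rather than re-estimating it.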
As before, to explore the potential effects of different scenarios on witness size and witness creation time, we enter ‘evidence’ into the model and then rerun it to propagate the new information.
It is important to note that when the witness creation process is eventually implemented in Ethereum, compression of witnesses is very likely to also be implemented. This would reduce the size of witnesses, but at the cost of increased time to create witnesses. When we developed this model, we did not have access to any witness compression information.
Therefore, the witness implementation used for the BN model has no compression. At the time of writing, Verkle tries appear to be the preferred approach to reducing witness sizes. Other techniques considered previously include binary trie structures and code chunking / Merkleization.
Figure 11 shows that when witness sizes are large, the witness creation time probabilities change quite noticeably. Without this evidence, at least half of the creation times (55%) are expected to be at or below 10 seconds; once we enter large witness sizes, this drops to only 14%.
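A statement such as “at or below 10 seconds” is read off a discretized creation-time factor by summing the probabilities of the bins that lie entirely at or below the threshold. A minimal sketch follows; the bin edges, the helper name `prob_at_or_below` and both distributions are hypothetical illustrations, not the distributions in Figure 11.

```python
# Hypothetical discretization of witness creation time: (lower, upper) seconds.
BINS = [(0, 5), (5, 10), (10, 20), (20, 60)]

def prob_at_or_below(distribution, threshold):
    """Sum the probabilities of bins lying entirely at or below the threshold.

    distribution holds one probability per bin in BINS. Raises ValueError if
    the threshold falls inside a bin, since that bin's split is unknown.
    """
    total = 0.0
    for (lower, upper), p in zip(BINS, distribution):
        if upper <= threshold:
            total += p
        elif lower < threshold:
            raise ValueError("threshold falls inside a bin")
    return total

baseline = [0.2, 0.4, 0.3, 0.1]           # hypothetical, no evidence
large_witness = [0.05, 0.15, 0.4, 0.4]    # hypothetical, large-witness evidence
print(round(prob_at_or_below(baseline, 10), 2),
      round(prob_at_or_below(large_witness, 10), 2))  # 0.6 0.2
```

The check for a threshold inside a bin matters in practice: a question can only be answered exactly when its threshold coincides with a bin edge of the discretized factor.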
Running the Block propagation model
Of the ten factors that make up the Block propagation model (Figure 12), five have been quantified in other sub-models, viz. witness creation time, block creation time, node bandwidth, network latency and ‘Block producer?’.
The remaining five factors, block & witness processing time, uncle rate, block propagation time, node status, and node keeps up with the head of the chain, were informed by a mixture of empirical data and expert knowledge.
Running the fully quantified block propagation model results in the probabilities shown in Figure 13.
If block propagation times become high while all else remains the same, uncle rates are predicted to be high (Figure 14). Moreover, the proportion of nodes syncing with the chain is expected to rise from 15% to 26%, a relative increase of roughly 73%.
Running the Combined model
Finally, we run the combined model, which runs all the sub-models together. It predicts that the Ethereum ecosystem with a basic Stateless implementation, i.e. without witness compression, is expected to be healthy (Figure 15).
There are three caveats to this result. Firstly, the model was quantified using only a subset of Ethereum mainnet data (26,595 blocks), which predates EIP-1559. Secondly, the output from the implementation of the Stateless Ethereum witness specification would be different if a technique such as Verkle tries were used to reduce witness sizes. Lastly, we used expert knowledge to quantify the relative influence of parent factors and to quantify key factors for which empirical data was not available, and expert knowledge can introduce bias into the model.
For these reasons, the predicted probability that the Ethereum ecosystem is healthy should not be interpreted as an exact result, but rather as a reference point for assessing changes in probability when exploring how various scenarios affect the predicted health of the ecosystem.
In other words, by observing the difference in predicted ecosystem health between the best- and worst-case scenarios, we gain a better understanding of the potential impact of particular situations.
An interesting scenario suggested to us is: “How big can the maximum witnesses be before you lose more than 10%/20%/50% of the non-mining nodes and 1%/5%/10% of mining nodes on the network?” This question is particularly relevant after the London hard fork, since maximum block sizes can now be double what they previously were, which may affect witness sizes. With the information in the model discussed here, i.e. pre-EIP-1559, we can ask a related but simpler question: “For a non-mining node and a very large witness, how does that affect the ability of a node to keep up with the head of the chain?” This gives some insight into expected changes to the status quo.
We selected two variations on this question; the graphical outcomes are shown in Figure 16. The probability of observing this combination of evidence is 23.7% for the second-largest witness size range (Figure 16(b)) and 5.9% for the largest (Figure 16(c)). In other words, these scenarios are not expected to happen very often.
In the less severe scenario (Figure 16(b)), the probability of keeping up with the head of the chain drops from 65.2% to 58.2%, a decrease of 7.0 percentage points or a relative change of 10.8%. In the more severe scenario (Figure 16(c)), i.e. the largest witnesses, the probability drops even further, to 54.1% (11.1 percentage points lower), a relative change of 16.9%.
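The difference between an absolute change in percentage points and a relative change is easy to blur, so the arithmetic is sketched below using the rounded probabilities quoted in this post (including the syncing-node figures from the Block propagation model). Computed from these rounded inputs, the relative changes come out at about 10.7% and 17.0%; the small differences from the quoted 10.8% and 16.9% presumably reflect rounding of the model's unrounded output. The helper name `changes` is illustrative.

```python
def changes(before, after):
    """Return (absolute change in percentage points, relative change in percent)."""
    absolute = after - before
    relative = 100 * absolute / before
    return absolute, relative

# (before, after) probabilities in percent, as quoted in the text.
scenarios = {
    "keeps up with head, second-largest witnesses": (65.2, 58.2),
    "keeps up with head, largest witnesses": (65.2, 54.1),
    "nodes syncing, high propagation time": (15.0, 26.0),
}
for name, (before, after) in scenarios.items():
    absolute, relative = changes(before, after)
    print(f"{name}: {absolute:+.1f} pp, {relative:+.1f}% relative")
# keeps up with head, second-largest witnesses: -7.0 pp, -10.7% relative
# keeps up with head, largest witnesses: -11.1 pp, -17.0% relative
# nodes syncing, high propagation time: +11.0 pp, +73.3% relative
```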
An online, interactive version of this BN model is hosted on Hugin's demo website, so anyone can interact with the current version of the model.
Introducing Stateless Ethereum into a functioning, stable ecosystem makes for an interesting study. We have a good understanding of Ethereum mainnet and its behaviour, which is largely predictable, and a lot of historical information that we can use to study and predict future behaviour. However, introducing Stateless Ethereum could generate behaviour that we did not anticipate. At best, we can run simulations or experiments in test environments before implementation to study emerging behaviour and discover potential issues. In that sense, the behaviour of the altered Ethereum ecosystem is largely unknown and may be less predictable.
Modelling the known processes and quantifying them using empirical data, supplemented with the new processes and representations of our knowledge about these processes, enables us to gain a more complete picture of the potential repercussions to other parts of the network.
The ability of Bayesian networks to combine diverse data sources, such as empirical data, model output and expert knowledge, gives us a view of the proposed integrated system that takes into account all information available at present. However, the presence of expert knowledge can introduce bias into the model. Conducting sensitivity analysis and model verification helps to identify and mitigate such bias.
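One-way parameter sensitivity analysis shows how strongly an expert-elicited number drives an output: vary a single conditional probability across a plausible range and record the change in the outcome of interest. The sketch below does this on a hypothetical two-factor fragment (block propagation time → node keeps up with the head of the chain); every probability and name in it is a placeholder, not a value from the model.

```python
# Hypothetical fragment: propagation time (low/high) -> node keeps up (yes/no).
P_PROPAGATION_HIGH = 0.3     # placeholder prior P(propagation = high)
P_KEEPS_UP_GIVEN_LOW = 0.8   # placeholder, treated as a fixed empirical value

def p_keeps_up(p_keeps_up_given_high):
    """Marginal P(keeps up) as a function of the elicited P(keeps up | high)."""
    return ((1 - P_PROPAGATION_HIGH) * P_KEEPS_UP_GIVEN_LOW
            + P_PROPAGATION_HIGH * p_keeps_up_given_high)

# Sweep the expert-elicited parameter across a plausible range.
for elicited in (0.2, 0.4, 0.6):
    print(f"P(keeps up | high) = {elicited:.1f} -> P(keeps up) = {p_keeps_up(elicited):.3f}")
# P(keeps up | high) = 0.2 -> P(keeps up) = 0.620
# P(keeps up | high) = 0.4 -> P(keeps up) = 0.680
# P(keeps up | high) = 0.6 -> P(keeps up) = 0.740
```

In this fragment the marginal is linear in the elicited parameter, and its slope (0.3, the prior probability of a high propagation time) measures how far an error in that expert estimate would propagate to the output.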
Bayesian network modelling is typically an iterative process, and the model presented here is its first iteration. Additional versions may be created to include new and emerging research and current data, such as post-EIP-1559 Ethereum.
If you have any feedback on this project, suggested improvements to the model structure, additional data sources or scenarios that could be included in the online model, we would love to hear from you.
The evidence and parameter sensitivity analyses, although not presented here, are available on request.
Feel free to get in touch with me via email.