ConsenSys Research

Measuring Blockchain Decentralization

An approach to quantifying decentralization of the Ethereum network over time

Part 3 of the ConsenSys Research Interoperability Series. Originally delivered as a talk at Devcon V, this article identifies and quantifies metrics of decentralization on the Ethereum mainnet with visualizations and robust data.

By Everett Muzzy and Mally Anderson

The Importance of Decentralization

This is the third piece in a series exploring the state and future of interoperability and decentralization in the blockchain ecosystem. In this article, we unpack and examine the dimensions and importance of decentralization. In our last article, we staged the argument for Ethereum to serve as the base settlement layer of the future blockchain-powered Web3 ecosystem. Our argument, in brief, was that not every blockchain will need to prioritize absolute decentralization. Rather, the Web3 future will be pluralistic, composed of a multitude of blockchains with varying degrees of decentralization, privacy, confidentiality, functionality, etc. What all these blockchains should share, however, is an ‘anchor’ to a base trust layer: in other words, a global settlement platform onto which all other chains can export their states at periodic intervals. This base trust layer would provide irrevocable security and finality for the entire blockchain ecosystem, granting other protocols built on top of it the ability to maximize particular functionalities even if they require a compromise in decentralization (and thus security).

For this vision to manifest, however, the blockchain ecosystem must collectively decide to build on top of whatever protocol can prove itself to be the most decentralized. We set out on this effort by first proposing and exploring a new comparative measurement called Decentralized Transactions per Second, or DTPS, as an alternative to throughput. We outlined this approach in our previous article. As we sought to measure existing protocols’ transactions per second and current extent of decentralization, however, we realized that comparing most metrics of decentralization across protocols is like comparing apples and oranges, and arguably no protocol today is sufficiently decentralized and built out to serve as that base settlement layer. After all, the entire blockchain ecosystem is quite young, and it takes time to build network effects and to scale protocols. We therefore amended our search. Instead of asking which protocol can prove itself now to be most decentralized, we wanted to ask the question: “What does the evolution of decentralization look like across different protocols over time to indicate which one will be best suited?” We have started by focusing on Ethereum.

So: what do we actually talk about when we talk about decentralization? How can we objectively measure its extent and monitor its evolution over time? Obviously it is not a binary is-or-is-not condition, but a complex and emergent process that will change as the network grows. What data can we objectively measure on Ethereum right now, and watch change over time? What’s actually happening on mainnet and what does it tell us about the progress we’re making, or not making?

After arriving at our approach, we set out to answer some very specific questions: 

  1. Is Ethereum actually getting more decentralized over time?
  2. Are there metrics that show the network getting more centralized?
  3. Does the data reveal areas we need to focus on addressing or changing?
  4. Given the trends we’re observing, can we make any meaningful predictions about the future?
  5. Which of these metrics can we compare across protocols?
Figure 1: The Subsystems of Ethereum’s architecture that affect its decentralization

Methodology: Subsystems of Decentralization

Our approach to measuring Ethereum’s decentralization over time began with determining which elements of Ethereum’s architecture––both on- and off-chain––most significantly impact its decentralization. We identified 19 key subsystems spread across 4 categories to investigate at this stage in the research (Figure 1), attempting to anchor our conclusions in on-chain data as much as possible. It is important to note that we have omitted some data points we consider important, but were not on-chain or necessarily quantifiable––including concepts like the strength & distribution of power grids on which nodes run and the legal jurisdictions and relative stability of the countries in which large numbers of nodes are hosted. 

There are other people talking about measuring decentralization and/or the nebulous concept of decentralization itself, and we have been trying to situate our approach and our conclusions within the existing discussion. Angela Walch, for example, criticises the blockchain ecosystem’s overuse of the word ‘decentralization’ without having a specific definition. She argues that the vagueness of the term is starting to bleed into legal and regulatory decisions. When it comes to defining and measuring decentralization, however, she warns against a pitfall called “Gresham’s Law of Measurement,” which states that “easy-to-calculate quantitative metrics tend to crowd out more relevant but difficult-to-measure assessments.” She goes further, stating that “succumbing to Gresham’s Law of Measurement means allowing measurability to trump meaningfulness. In other words, easily calculated quantitative metrics may provide the illusion of measurability while in actuality not being meaningful” (footnote 1).

We acknowledge that some metrics we explore in this piece, such as token holding percentages among whales (large ETH holders), might not be considered the most important or revelatory measures of decentralization. The true power vectors in blockchain networks likely lie in more ambiguous areas, like the relationships between core devs and major miners. A relationship, however, is difficult to quantify, and we still believe there is utility in starting from the ground up and quantifying as much as possible so we have objective starting points for those more difficult, nuanced studies.

For as many of the data points as we could, we tracked their evolution quarter over quarter as far back as possible––many from the earliest days of Ethereum through gradual adoption, rampant speculation, the big hacks, CryptoKitties, the bubble of early 2018, and subsequent course correction into 2019. Much of the data in this article is provided by Alethio, a data analytics company providing real-time access and analysis into on-chain Ethereum activity. The graphs in this article can be found on Alethio’s public Tableau, under the “Measuring Decentralization” dashboard.

Ecosystem

Account Growth: Total vs. Active
Graph 1: Total Account Growth v. Active Address Growth | 2015 – 2019

Graph 1 shows the growth of accounts on the Ethereum network. The x-axis is time (represented quarter over quarter from 2015) and the y-axis shows the number of addresses. The blue line shows the cumulative growth in all the addresses created on the network over time, and the red line shows the number of active addresses over time. “Active addresses” is defined as the number of distinct addresses that have transacted or made contract calls at least once in that quarter.
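To make the two lines in Graph 1 concrete, here is a minimal sketch of how they could be derived from raw transaction records. It is not the query Alethio ran: the record layout `(date, sender, receiver)`, the function names, and the choice to count both sender and receiver as "active" are assumptions made for illustration, and the sample records are invented.

```python
from collections import defaultdict
from datetime import date


def quarter_of(day: date) -> str:
    """Label a date with its calendar quarter, e.g. 2019-Q2."""
    return f"{day.year}-Q{(day.month - 1) // 3 + 1}"


def account_growth(transactions):
    """Return (quarter, cumulative_addresses, active_addresses) rows.

    `transactions` is an iterable of (date, sender, receiver) tuples.
    Assumption: an address is "active" in a quarter if it appears on
    either side of at least one transaction or contract call.
    """
    active = defaultdict(set)
    for day, sender, receiver in transactions:
        active[quarter_of(day)].update((sender, receiver))

    seen = set()  # every address observed so far (the cumulative line)
    rows = []
    for quarter in sorted(active):
        seen |= active[quarter]
        rows.append((quarter, len(seen), len(active[quarter])))
    return rows


# Invented sample records, for illustration only.
sample = [
    (date(2015, 8, 1), "0xa", "0xb"),
    (date(2015, 9, 1), "0xa", "0xc"),
    (date(2015, 11, 1), "0xc", "0xd"),
]
for row in account_growth(sample):
    print(row)
```

The cumulative count can only grow, while the active count resets every quarter, which is why the two lines can diverge as they do in Graph 1.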

As expected, the blue line demonstrates a steady increase in the number of addresses on the Ethereum network. We see active addresses, however, more or less flattening after the bubble of Q4 2017 (footnote 2). The immediate story this graph tells could be that people are simply using the network less after the bubble, as the number of active addresses has more or less stayed flat the past few quarters despite a growth in overall addresses.

Graph 2: Number of Transactions & Contract Calls | 2015 – 2019

Moreover, when we look at the number of transactions and contract calls over time (Graph 2), we see the cumulative number of records quarter-over-quarter more or less aligning with the number of active addresses we saw in Graph 1, including the recent uptick in Q2 2019. What this could indicate, though it would require more research to confirm, is that we see a consistent level of activity despite the growth in the overall number of addresses. In other words, the number of people being ‘active’ on the network stays pretty consistent, and they are transacting a pretty consistent amount quarter-over-quarter. 

There are two ways to look at this possible conclusion. First, one could argue that it points to Ethereum’s continuing utility, as well as the resiliency of network participants committed to using Ethereum even in the face of price fluctuations. Second, one could argue that it indicates a consistent point of centralization on the Ethereum network, with most activity relying on a relatively small group of users who keep transacting over time. We will need to investigate how many of these active addresses are repeat vs. one-off users of the network to better understand what this data means in terms of decentralization.

Account Growth: Total vs. Non-Zero
Graph 3: Total Account Growth v. Non-0 Account Growth | 2015 – 2019

Graph 3 shows account growth over time by total addresses (the gray line) alongside the growth in addresses that hold a “meaningful non-zero” ETH balance (the orange line). We did not think we would get an appropriate view of things if we looked at addresses with an absolute 0 ETH balance, so we defined the threshold by the average transaction fee in ETH in 2019. All the accounts holding a smaller balance than that are considered zero-balance, as it is unlikely for them to be able to cover the gas fee to execute a transaction (footnote 3).
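The threshold rule described above can be sketched in a few lines. This is an illustration only: the function name, the use of wei as the unit, and the threshold value are assumptions, and the article does not state the actual average fee it used.

```python
def count_meaningful_nonzero(balances_wei, avg_fee_wei):
    """Count addresses whose balance could cover an average transaction fee.

    `balances_wei` maps address -> balance in wei (integers avoid
    floating-point rounding). Balances below `avg_fee_wei` are treated
    as zero-balance, following the rule described in the text.
    """
    return sum(1 for balance in balances_wei.values() if balance >= avg_fee_wei)


# Invented balances and a hypothetical threshold, for illustration only.
balances = {"0xa": 0, "0xb": 10, "0xc": 5_000, "0xd": 120_000}
hypothetical_avg_fee_wei = 5_000
print(count_meaningful_nonzero(balances, hypothetical_avg_fee_wei))
```

Note that a single fixed threshold, taken from 2019 fees, is applied to every historical quarter, so the count for early quarters depends on that choice.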

Graph 3 shows a fairly steady linear increase in non-zero addresses quarter-over-quarter with no major bumps, even during dramatic price fluctuations. This data does not necessarily prove there has been a steady increase in the number of individuals holding ETH, as addresses are pseudonymous, but it is not a far-off conclusion. That is good news for the decentralization of Ethereum, suggesting that we can expect a consistently growing number of ETH holders on the network over time, even in the face of price volatility. Moreover, we could suggest that the growing delta between the non-zero addresses and total addresses is increasingly made up by smart contract addresses. This evolution may indicate that the network is still being used as a means of direct peer-to-peer transaction and dapp interaction (i.e. the actions that require a positive-ETH balance), but is also increasingly being used for smart contract functionality. Overall, this would indicate that Ethereum has been supporting more diverse, and thus more decentralized, types of on-chain business logic over time.

Growth in DEXs and DeFi

Graph 4: Defi Use | 2015 – 2019

Decentralized Finance (DeFi), also called Open Finance, has been a major area of growth in the blockchain ecosystem over the last year. The term “DeFi,” however, rests on the assumption that the protocols, dapps, and logic these financial instruments are being built on are themselves decentralized. It is not enough just to claim that any financial tool on a blockchain is decentralized. 

Graph 4 above shows the cumulative percentage of addresses on Ethereum that have transacted with a DeFi protocol (including DEXes) over time. For example, in Q2 2019, all the addresses that had interacted with a DeFi platform since 2015 made up 0.69% of all the addresses on Ethereum as of that quarter (~88 million). This graph appears to show DeFi usage as a percentage decreasing over time, suggesting that DeFi adoption is not growing at the same rate as the number of new network addresses. That conclusion, however, did not line up with what we can observe anecdotally about the ecosystem’s adoption of DeFi, so we looked at the data a different way.
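A short sketch may help show why a cumulative percentage like this can fall even while the number of DeFi users rises. The function name and input layout are assumptions for illustration, and the figures are invented rather than taken from the Alethio data.

```python
def cumulative_defi_share(defi_addresses_by_quarter, total_addresses_by_quarter):
    """Return (quarter, cumulative_defi_addresses, percent_of_all_addresses).

    `defi_addresses_by_quarter` maps quarter -> set of addresses that
    touched a DeFi contract in that quarter; `total_addresses_by_quarter`
    maps quarter -> total addresses existing as of that quarter.
    """
    seen = set()
    rows = []
    for quarter in sorted(defi_addresses_by_quarter):
        seen |= defi_addresses_by_quarter[quarter]
        total = total_addresses_by_quarter[quarter]
        rows.append((quarter, len(seen), 100 * len(seen) / total))
    return rows


# Invented numbers: DeFi users grow from 2 to 3, but total addresses
# quadruple, so the percentage still drops.
defi = {"2019-Q1": {"0xa", "0xb"}, "2019-Q2": {"0xb", "0xc"}}
totals = {"2019-Q1": 100, "2019-Q2": 400}
for row in cumulative_defi_share(defi, totals):
    print(row)
```

Because the denominator is all addresses ever created, fast address growth alone is enough to push the percentage down.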

Graph 5: Defi usage, split up between DEX use (top) and non-DEX Defi use (bottom) | 2015 – 2019

When we split up the graph to show DEX usage compared to other DeFi platform usage, we see a different story. The bar charts in Graph 5 show the change in decentralized exchange (DEX) usage (in red) and non-DEX DeFi usage (in orange) over time. Just as in Graph 4, the bars show the cumulative number of Ethereum addresses that participated in a DEX or DeFi contract. For example, in Q2 2019, all the addresses since 2015 that have interacted with a non-DEX DeFi platform made up 0.018% of all the addresses on Ethereum as of that quarter (footnote 4).

If we just look at the DEX chart (red), we see a decline in DEX usage on the Ethereum network over the course of 2018 and 2019, following the price bubble. In the grand scheme of things, the reduction is small (just tenths of a percentage point). The decrease likely has to do with the decreasing number of transactions over the same time period (as we will see in a later graph, the number of transactions and contract calls on Ethereum has fallen since early 2018 highs) and the increasing number of new addresses on Ethereum.

However, when we look at non-DEX DeFi growth and adoption (orange), we see that people’s interaction with a growing diversity of DeFi applications is increasing dramatically. These opposing trends deserve more analysis, but they suggest to us that early in the DeFi ecosystem, access to decentralized finance-related applications was largely restricted to DEXes. Now, with more DeFi options, people are diversifying their financial activities on Web3 platforms. Though DEX usage is decreasing, people are still putting their ETH to use on a decentralized playing field, without being forced to go through centralized parties.

Overall, therefore, we can say that, though DeFi usage as a whole has decreased over the past few quarters in relation to the growing number of Ethereum addresses, that decrease is largely due to a decrease in DEX usage specifically. Non-DEX DeFi usage has increased dramatically in the same time frame. Because DEXes have been around for much longer, they have a larger user base and skew the data to suggest a decline in DeFi usage. In reality, the newest wave of DeFi has introduced a host of new protocols, and adoption is rising. From a decentralization perspective, this means the ecosystem is generally moving toward a greater number of active use cases, dapps, and smart contracts. More options for people to execute decentralized finance means fewer central points of failure for the ecosystem.

Tokens/Coins

Top 10, 100, 1000
Graph 6: ETH Ownership, percentage of total | 2015 – 2019

Graph 6 illustrates ETH ownership of the top 10 (red), 100 (yellow), and 1000 (green) addresses compared to the remaining supply (gray) over time, all of which are demonstrated as a percentage of total cumulative supply as of that quarter.

The story this chart tells is fairly clear. The top 10 and 100 addresses on the Ethereum network own a steadily decreasing percentage of the total ETH supply over time. This downward trend may just be the passive result of increasing supply diluting the percentage of the top whales, but the trend is still significant to overall ownership decentralization. Interestingly, the top 1000 addresses have had a recent increase in their percentage ownership of total ETH supply. Some of the larger accounts from the top 10 and 100 have possibly been ‘pushed down’ into lower tiers in recent quarters, which could account for the recent uptick in the percentage overall of ETH owned by the top 1000 accounts.
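The top-N ownership figures, and the dilution effect mentioned above, can be illustrated with a small sketch. The function name and the balances are assumptions made for illustration; they are not drawn from the data behind Graph 6.

```python
def top_n_share(balances, n):
    """Percentage of total supply held by the n largest balances."""
    ordered = sorted(balances, reverse=True)
    total = sum(ordered)
    if total == 0:
        return 0.0
    return 100 * sum(ordered[:n]) / total


# Invented balances (in ETH), for illustration only.
early = [50, 30, 10, 5, 5]
print(top_n_share(early, 1), top_n_share(early, 2))

# Dilution: the largest holder keeps the same 50 ETH, but newly issued
# supply ends up with other addresses, so the top holder's share falls.
later = early + [25, 25, 25, 25]
print(top_n_share(later, 1))
```

In the second call the whale has sold nothing, yet its share halves, which is the passive dilution the paragraph above describes.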

Looking at the overall trend since 2015, we see that ETH ownership is becoming more dispersed among addresses. We cannot necessarily assume that more addresses holding smaller amounts of ETH means more new unique individuals participating in the network (addresses are pseudonymous). However, we see the number of non-zero addresses increasing (Graph 3) and the concentration of the top 10 and 100 decreasing alongside each other over time. This could suggest that, contrary to the popular narrative, the crypto bubble was not overwhelmingly followed by whales and holders buying back crypto at all-time lows just to make a buck off the eventual market uptick. Rather, the negatively correlated numbers could suggest that more and more new people began accumulating ETH at a steady rate after the bubble burst, which, alongside growing ETH circulation, has reduced the percent concentration of whale holders.

With the ecosystem as young as it is, the unequal concentration of wealth this early on is not necessarily a major red flag for decentralization in the longer term. This is where we return to our methodology in approaching the quantification of decentralization. Rather than attempting to place a judgement today on whether or not Ethereum’s token ownership is ‘decentralized enough’ to perform certain roles in finance, government, business, etc., we instead look at the trend over time, which is moving in an encouraging direction.

Looking ahead, however, ETH concentration in the hands of a few becomes a concern when the network shifts to a Proof-of-Stake (PoS) consensus algorithm in 2020 with the release of Ethereum 2.0. In PoS, influence on the network becomes more closely correlated with ETH ownership, but in theory, owning the amount of ETH required to become a staker is still a lower barrier to entry than becoming a miner. As the Beacon chain grows more functional and as PoS replaces Proof-of-Work (PoW), it will be important to watch out for staking power concentrating in the hands of a few.

Token Circulating Value in ETH
Graph 7: ETH Circulation Volume vs. ERC-20 Circulating Volume | 2015 – 2019

Graph 7 shows the circulating volume of ETH (green line & values on left y-axis) graphed alongside the circulating volume of select ERC-20 tokens (bar chart), which is shown as value in ETH (values on right y-axis). 

The green line on this graph shows the total amount of circulating ETH, i.e. ETH moving between addresses, quarter over quarter. It is essentially correlated to the price of ETH, with the spike in circulating ETH aligning with the price high in late 2017 / early 2018. The bar chart shows the volume of a few significant ERC-20 tokens circulating quarter over quarter. The token volume is represented in ETH units, collected from the amount of ETH they were traded for quarter over quarter (footnote 5). The tokens we measured are the top 10 by market cap plus a few interesting or notable ones we felt were valuable to look at, such as DAI, 0x, Matic, and Loom. 

The purpose of this graph was to see if activity on the network is getting more diverse from both a utility and speculation perspective. What it shows is that, despite a relatively stagnant ETH price recently, the ETH value in circulating tokens is increasing dramatically. Not only is the circulating value of tokens increasing, but the diversity and market share of tokens are increasing, too, suggesting that users are using more ERC-20 tokens and doing more with them across the board. It is particularly important to highlight the four stablecoins that appear on the bar chart in the past few quarters. Stablecoins offer no opportunity to speculate, bolstering our conclusion that the growth in both ETH-value and diversity of ERC-20 tokens in the past few quarters is due to people diversifying their activity on the network. This means people have more options, that network activity is not contained in just a few protocols, and that the network is steadily decentralizing along this metric.

Protocol

Mining Pools and Miners
Graph 8: Mining Pool Block Production and Mining Payout as percentages of totals | 2015 – 2019

Graph 8 shows the growing concentration of mining pools over time, as measured by percentage of total block production (top graph) and percentage of total addresses paid out per quarter (bottom graph). In each graph, each color corresponds to the same mining pool––for instance, the green bars on the bottom of each graph are all Ethermine.

The top graph shows the percentage of blocks for which each miner was responsible quarter over quarter. The largest producers are the mining pools. In Q3 2019, for example, we see that Ethermine was responsible for 23.80% of the blocks mined by mining pools that quarter, up from 12.45% in Q3 2016.

The bottom graph shows, of the on-chain addresses that are paid mining rewards, what percentage each mining pool was responsible for paying. It is important to note that this data is only particularly insightful for mining pools that pay out mining rewards directly to miners’ on-chain addresses. Mining pools that pay miners out through direct deposit or other off-chain methods cannot be tracked in this data set.

Over time, we see that four pools have started to dominate the mining pool landscape: Ethermine, F2Pool, SparkPool, and Nanopool. Collectively, they have edged out past competitors like MiningPoolHub and DwarfPool1 over the past year. Today, those four main pools account for over 72% of quarterly block production and pay out to over 83% of the miners across mining pools.

In particular, the data shows a potentially concerning dominance in block production between Ethermine and SparkPool, which today account for just under 50% of the blocks produced per quarter. Together, Ethermine and Nanopool pay out to nearly 70% of the miners on-chain.
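Concentration figures like these reduce to a simple share calculation over block producers. The sketch below is illustrative only: the function names are assumptions, the pool labels are placeholders, and the block counts are invented rather than taken from Graph 8.

```python
from collections import Counter


def pool_shares(block_producers):
    """Map each pool label to its percentage of the blocks in the sample.

    `block_producers` is one label per block, e.g. the pool name
    attached to the address that mined it.
    """
    counts = Counter(block_producers)
    total = sum(counts.values())
    return {pool: 100 * n / total for pool, n in counts.most_common()}


def top_k_share(shares, k):
    """Combined percentage held by the k largest pools."""
    return sum(sorted(shares.values(), reverse=True)[:k])


# Invented sample of ten blocks from three placeholder pools.
blocks = ["pool_a"] * 5 + ["pool_b"] * 3 + ["pool_c"] * 2
shares = pool_shares(blocks)
print(shares)
print(top_k_share(shares, 2))
```

Running the same calculation per quarter, with k set to 2 or 4, would give the kind of "top two pools" and "top four pools" figures quoted above.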

Graph 9: Miner relationship with mining pools, visualized by payouts | 11.03.19

The concentration of influence among a few mining pools is certainly not ideal, but is not necessarily a major concern. Miners are supposedly pool-agnostic; they will migrate to whichever pool offers the best incentives. If we assume rational behavior by miners, then if a single pool were to reach a hashrate close to 50% or visibly collude with other pools to mount a 51% attack, miners would abandon those pools to protect their income.

We wanted to test that assumption, so we visualized miners’ relationships to different pools. We pulled the data for payout transactions of all mining pools for the 24-hour period of November 3, 2019. In Graph 9, each of the dense, colored circles around the edges represents a mining pool address, with the cluster of dots around it showing the addresses that received a payout. The red dots in the center received mining rewards from more than one mining pool (the size of each red dot correlates to the number of payouts). We can see that, compared to the number of miners in each pool, the overlap rate is low, but it still indicates that there are miners, just within this 24-hour period, who for one reason or another did not remain ‘loyal’ to just one pool.
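The "red dots" in this kind of visualization correspond to payout recipients that appear under more than one pool. A minimal sketch of that check, assuming payouts are available as `(pool, recipient)` pairs, might look like the following. The function name, the input layout, and the sample data are all hypothetical.

```python
from collections import defaultdict


def multi_pool_recipients(payouts):
    """Return {recipient: payout_count} for recipients paid by 2+ pools.

    `payouts` is an iterable of (pool_address, recipient_address) pairs
    taken from payout transactions in the observation window.
    """
    pools_by_recipient = defaultdict(set)
    payouts_by_recipient = defaultdict(int)
    for pool, recipient in payouts:
        pools_by_recipient[recipient].add(pool)
        payouts_by_recipient[recipient] += 1
    return {
        recipient: payouts_by_recipient[recipient]
        for recipient, pools in pools_by_recipient.items()
        if len(pools) > 1
    }


# Invented payouts, for illustration only.
sample = [
    ("pool_1", "miner_a"),
    ("pool_2", "miner_a"),  # miner_a is paid by two different pools
    ("pool_1", "miner_b"),
    ("pool_1", "miner_b"),  # two payouts, but from the same pool
    ("pool_2", "miner_c"),
]
overlap = multi_pool_recipients(sample)
print(overlap)
print(len(overlap) / len({recipient for _, recipient in sample}))
```

The final ratio is one way to express the overlap rate; a payout address can stand for several people or one person can control several addresses, so it only approximates miner behavior.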

This is a new data set, and will need more investigation to determine just how agnostic miners really are over time.

Graph 10: Number of miners and mining pools | 2015 – 2019

For the time being, we are still proceeding with the assumption that the concentration of influence among a few mining pools is certainly not ideal, but that miners can be assumed to be pool-agnostic. 

However, the number of mining pools and miners over time, as shown by Graph 10, demonstrates a distinct decline in both over the past year. The graph shows the change in the number of miners (red line and left y-axis) alongside the number of mining pools (orange line and right y-axis). Since the market bubble, both have dropped, in particular the number of miners maintaining the network through mining pools. In short, what this means is that fewer miners are active on fewer mining pools, and fewer mining pools are responsible for network maintenance.

As a side note, it is important to reemphasize that the number of miners on this graph is not an exact number of mining pool actors. We identified the number of miners in mining pools based on on-chain payout addresses. Some mining pools pay their miners off-chain through traditional methods like bank deposits, and we cannot account for these miners (so the number could be higher when considering unaccounted miners). On the other hand, since we tracked data quarter-over-quarter, it is possible that we have captured some duplication. The miners represented by those red dots in the previous graph would be counted as two miners in this quarter of 2019 (so the number could be lower if we allow for duplicate-counting).

Overall, mining pools are an area of increasing centralization on the Ethereum network. Lower ETH prices, reduced block rewards, and a fairly stagnant hashrate have meant that fewer miners are incentivized to join the network, and the laws of efficiency have concentrated the influence over the network into the hands of fewer mining pools. 

The switch to PoS in the next year will redefine, or at least reset, this area of centralization. Until then, it is prudent for the Ethereum ecosystem to keep an eye on the concentration of mining pools to ensure we do not trend too closely towards potentially harmful centralization and imbalance of power.

Nodes

Nodes by Country
Graph 11: Node distribution and concentration | 2018 – 2019

Any blockchain network is composed of distributed nodes, which form the core of the network’s infrastructure. They are therefore important to consider in any exploration of decentralization. We focused on geographical distribution of nodes over time, expecting that geographic diversity would generally grow over time even if the overall number of nodes had peaked when the price was highest in early 2018.

We quickly found out that our first hurdle was actually finding this data. Node data is notoriously difficult to gather, harder to validate, and (as we discovered) practically impossible to track historically. The data in this animation is from NodeTracker on Etherscan, which graciously gave us access to the historical data they have, going back to October 2018. This animation shows the node count by country over the last year. 

The animation is a heat map, so the warm colors are the highest node counts, while the cooler colors are the lower node counts. In general, we see some unfortunately bare areas that stay pretty consistent, particularly in Africa and the Middle East. Of countries that have maintained nodes over time, however, we are seeing fairly uniform fluctuations rather than random sudden spikes or drops in particular jurisdictions. And despite the opportunity for greater distribution in parts of Africa and the Middle East, the data demonstrates impressive geographic decentralization across the world and a variety of legal and political systems.

We have to take node data with a grain of salt, of course. As we mentioned, it is difficult to obtain, verify, or track over time. And numbers alone don’t tell the complete story of node decentralization. An opportunity for this data is to look at node distribution alongside legal and jurisdictional attitudes towards blockchain. A country with high node concentration but uncertain or increasingly negative regulations on blockchain could negatively impact the future decentralization of a network. If that country or jurisdiction enacted restrictions against certain blockchain activity (a direct ban of mining or even indirectly affecting adoption through token or website bans), a significant portion of the protocol’s total nodes could potentially fall off, reducing the network’s overall security and possibly shifting power to other areas. Additionally, there could be an opportunity to better understand how power grids are spread across the globe, and understand what portion of nodes could be “taken out” by compromising just a few crucial power lines. We admit we do not know enough about the global power grid to consider today if this is a potential risk, but we believe it warrants more investigation for any geographically distributed system.

Node Size versus ETH Price
Graph 12: Node Count vs. ETH price (left) & vs. Node Size (right) | 2018 – 2019

There has been a lot of fluctuation in the total count of full nodes running the Ethereum network (footnote 6). Right now, for example, the number is about half of what it was around this time last year. On the surface, this looks a lot like centralization––there are fewer nodes overall and presumably fewer people running nodes. There are a lot of reasons why this could be, but here are two factors we looked at to see if there is a correlation.

The graph on the left illustrates the first assumption, which is: Maybe when the price is high, there is greater opportunity to make money, so the number of nodes increases. The graph demonstrates that is actually not the case, at least over the course of 2019, which is (again) all the data we were able to get. When we look at the node count in blue vs. the ETH price in green over time, it looks like there is actually a negative correlation. The node count was at one of its lowest points in June 2019 when the price was highest, and it was quite high during the price dip at the end of the summer. So even if this correlation may have been true historically, it does not seem to be true in today’s ecosystem.
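A visual impression of negative correlation like this one can be checked with a correlation coefficient. The sketch below computes a plain Pearson coefficient; the function name is an assumption, the series are invented to show the mechanics, and nothing here reproduces the actual node or price data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series.

    Returns a value in [-1, 1]; negative values mean the series tend to
    move in opposite directions. Raises ZeroDivisionError if either
    series is constant.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Invented monthly values, for illustration only: node count falls
# while price rises, so the coefficient comes out negative.
node_counts = [9000, 8000, 7000, 7500]
eth_prices = [150, 250, 300, 200]
print(round(pearson(node_counts, eth_prices), 3))
```

With only a year of data points, any such coefficient is weak evidence, and the same function could be pointed at node count versus node size to compare the second assumption.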

The graph on the right shows the second assumption: Maybe as the average node size grows with more data being added to the blockchain over time, fewer people will see the value in running a node. We graphed the node count in blue against the total size of a default node on the Geth client in red and the Parity client in orange (footnote 7). About 97% of all Ethereum nodes are running one of those two clients, and about 95% of the nodes on the network are default nodes as opposed to archive nodes, which have a much higher data burden.

Obviously, the average node size is more or less increasing over time as more blocks are mined and more data is stored. It seems reasonable to assume that as the default node size gets bigger, it gets more expensive and takes more energy to keep a node running and synced, so perhaps fewer people are doing so. It looks like that is the case when we look at the graph on the right––or, at least, we see a clearer relationship than there is with the price of ETH. As we move toward PoS and sharding the network, the node size burden will not be as much of an issue, so maybe this attrition will not continue on the same trend. There are also some interesting experiments happening around the ecosystem to find ways to make nodes cheaper and easier to run. We will keep tracking this data set through the changes of the coming year on the network.

Conclusion

Figure 2: Some decentralization subsystems that can be compared across protocol architectures, regardless of consensus algorithm.
Comparing Across Protocols

When we initially scoped out the value of measuring decentralization and our approach to this research, our goal was to come up with a framework that we could apply to any blockchain protocol. We are, after all, advocating against maximalist thinking, so there is not a lot of utility in focusing just on Ethereum when we are trying to create a comparative metric. 

The reality has been a lot more complex. When we began comparing subsystems of decentralization across protocols, we recognized quickly that most do not translate easily across different architectures. Decentralization means different things depending on the consensus algorithm and the amount and diversity of activity on the network; maximal decentralization for a Proof-of-Authority (PoA) blockchain looks very different than it would for a PoW or a PoS blockchain. The next step in our project would be to identify as much objective data across protocols as possible in order to compare subsystems of decentralization on each protocol.

So what conclusions can we take away from all this? From our initial five questions, at this stage in the research, we can fairly confidently determine:

  1. Is Ethereum actually getting more decentralized over time?
    1. Yes. Most data––particularly function call diversity, Defi growth, token distribution, etc.––demonstrate that Ethereum has grown more decentralized over time.
  2. Are there metrics that show the network getting more centralized?
    1. Yes. Miners and mining pools remain the most centralized part of the network over time, and the trend could indicate more centralization down the line. We suspect that node attrition is also a significant source of centralization, but we do not have enough historical data to prove it concretely here.
  3. Does the data reveal areas we need to focus on addressing or changing?
    1. With the planned launch of the Beacon chain in early 2020, we will have both a PoS chain and a PoW chain (through Ethereum 1.x) in existence, even though the Beacon chain will not yet be operational. We will need to focus on continued mining concentration as Ethereum 1.x carries on, while also keeping an eye on the rollout of the Beacon chain to ensure that forms of centralization unique to PoS do not arise, particularly in staking power.
  4. Given the trends we’re observing, can we make any meaningful predictions about the future?
    1. Token ownership will continue to decentralize on Ethereum. That trend, combined with the move toward staking in PoS (and away from energy-intensive PoW), which should lower the barrier to entry for becoming a steward of the blockchain, suggests that we are on a steady path towards fairer distribution of governance across the network.
  5. Which of these metrics can we compare across protocols?
    1. Very few, and they are particularly difficult to compare across protocols with different foundational architectures, namely different consensus algorithms.

Beyond the provisional answers to our initial set of questions, we reached a few big-picture conclusions about the state of decentralization in the blockchain ecosystem:

  1. Over the lifetime of the network, complexity has grown considerably and more layers of activity now happen off-chain, where we cannot easily observe them through available data. That will be even more true over time, so in some sense decentralization will become even harder to quantify and track. But that also makes the security and decentralization of the base settlement layer that much more important to watch on mainnet.
  2. It is crucial to have accessible historical data about any public network so everyone can understand how the blockchain is evolving. In the Ethereum ecosystem, we are lucky to have robust tools such as Alethio and Etherscan. But data can be surprisingly hard to find, and this very zoomed-out, big-picture data is harder still to interpret. Thank you to Etherscan.io for giving us access to some of their non-public historical Node Tracker data and to our colleagues at Alethio, especially Danning Sui and Momo Araki, for helping us pull this data and create these visualizations.
  3. Lastly: The larger point of this piece is not to claim that we are all doing an especially good or bad job of decentralizing Ethereum, or even to make a value judgment about our general progress. It is obvious that activity on Ethereum is getting more diverse, that developer mindshare is growing all the time, that we are making steady progress on the security front, and that the introduction of use cases like DeFi has resulted in interesting advancements we did not cover in this article. Even though no one has determined how to measure decentralization (let alone define what is ‘sufficiently decentralized’ or ‘maximally decentralized’), it is safe to argue that Ethereum is far ahead of most other protocols.

The benefit of working with open, permissionless systems is transparent access to data. The challenge, of course, is that the sheer amount of data requires separating signal from noise and identifying the most crucial and interesting information to dissect. We hope the information presented here has painted a clearer picture of Ethereum’s decentralization. We will continue to track and refine all of these metrics as we strive towards a more objective measurement of decentralization. 

We hope our colleagues in the blockchain ecosystem will add their own thoughts on the subsystems, discoveries of new data, and interpretations or findings. In the meantime, the graphs pictured here are available on Alethio’s public Tableau, under the “Measuring Decentralization” dashboard. 

Please contact us with feedback at [email protected].

Footnotes
  1. Walch, Angela. “Deconstructing ‘Decentralization’: Exploring the Core Claims of Crypto Systems.” SSRN, 2019.
  2. The sudden spike in new and active addresses in Q4 2016 is due to the Shanghai DDoS attack during Devcon 2 in China.
  3. That definition of “meaningful non-zero” is, of course, quite arbitrary. We welcome suggestions for how to define that threshold more accurately.
  4. Including 29 protocols, among them dYdX, 0x, TokenJar, Airswap, Kyber Network, IDEX, STARBIT, Paradex, RadarRelay, TheTokenStore, DDEX, EtherDelta, TheOcean, OasisDex, ETHERC, Ethfinex, Uniswap, Loopring, imToken, Eidoo, MakerDAO, Compound, Synthetix, MolochDAO, Augur, NUO Network, Set, and InstaDapp.
  5. The token circulating volume is pulled from on-chain DEX trading data. This data is pulled only from ETH-to-token trades, not from direct token-to-token trades. That being said, most DEXes execute token-to-token trades by using ETH as a bridge, so those trades are captured in this data.
  6. A default full node receives new transactions and data, verifies state, and stores recent state for syncing purposes. It has the full history of every block and every transaction. An archive node has a complete record of historical states for every account and contract on the entire blockchain, not just transactions.
  7. The significant drop in the default Geth node size in July 2019 coincides with the release of version 1.9.0, which reduced the database size, among other changes. See this blog post for more detail: https://blog.ethereum.org/2019/07/10/geth-v1-9-0/.

This article reflects the research and conclusions of the authors and does not necessarily represent ConsenSys’ official conclusions.

***

About the Authors

Everett Muzzy is a writer and researcher at ConsenSys. His writing has appeared in Hacker Noon, CryptoBriefing, Moguldom, and Coinmonks.

Mally Anderson is a writer and researcher at ConsenSys. Her writing has appeared in MIT’s Journal of Design and Science, MIT’s Innovations, Quartz, and Esquire.