Bonsai Tries: A Big Update for Small State Storage in Hyperledger Besu
Hyperledger Besu, the open-source, Java-based Ethereum client for both Mainnet and private networks and part of our ConsenSys Quorum product suite, recently released version 21.1.0. This version contains many changes and improvements. Most notable is the addition of an experimental Bonsai Trie storage format, which marks a major change in how the client stores Ethereum state.
From a Forest of Tries to a Bonsai Trie
Until now, Hyperledger Besu, like most Ethereum Mainnet clients, has used what’s called a “forest of tries” to store Ethereum’s complex and vital state. (As a refresher, state is defined as a set of variables describing a certain system at a specific time, which you can read more about here.) A trie (pronounced “try”) is a tree-shaped data structure, and when tries are strung together they snapshot what the Ethereum blockchain looks like at a point in time. Transactions, account updates, smart contract changes, and metadata are all stored in tries within tries (for an explainer of Merkle trees see our handy infographic here).
As Ethereum gains more adoption and use, its “state” grows increasingly large. Archiving it and synchronizing it across the network becomes challenging: fully archiving on-chain data for audit can take weeks (with about 8.4 TB of data at time of writing). As the state gets larger, reading from and writing to the network takes more time. It can even cause storage challenges for the nodes that run on Ethereum for day-to-day use. Over time, storage overhead accumulates and can make running nodes prohibitively expensive for those without specialized or expensive hardware.
How It Works
Enter Bonsai Tries, a new storage format for Besu. The experimental “bonsai” flag, when set, starts a Besu node with a new “flat” storage structure. Instead of keeping a large forest of “trees” in storage, Bonsai keeps only the most recent trie plus a trie log layer. This log layer is a smaller store of changes that, when needed, can be used to reconstruct the complete history of the tries, not just the most recent one. This reduces storage and makes reads of Ethereum’s current state much faster (O(1) account lookups), though lookups farther back take a little more time. To get into the weeds, the bonsai approach also provides implicit tree pruning, reduced disk usage by nodes, faster synchronization of nodes, and fewer cases of the client “running blocks behind” the network.
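To make the idea of “one current state plus a log of changes” concrete, here is a minimal sketch in Python. It is a toy model written for this post, not Besu’s actual implementation: the names FlatStateStore, TrieLog, apply_block, and state_at are hypothetical, the state is a plain dictionary rather than a Merkle trie, and account deletions are not modeled. It only illustrates why current reads are a single lookup while older states are rebuilt by walking the logs backwards.

```python
from dataclasses import dataclass, field


@dataclass
class TrieLog:
    """Hypothetical per-block change record: key -> (value before, value after)."""
    block: int
    changes: dict = field(default_factory=dict)


class FlatStateStore:
    """Toy model: one flat map for the head state plus one log per block."""

    def __init__(self):
        self.head = 0    # block number the flat state corresponds to
        self.state = {}  # flat map of account -> value
        self.logs = {}   # block number -> TrieLog

    def apply_block(self, updates):
        """Advance the head by one block, recording the prior values in a log."""
        self.head += 1
        log = TrieLog(self.head)
        for key, new in updates.items():
            log.changes[key] = (self.state.get(key), new)
            self.state[key] = new
        self.logs[self.head] = log

    def get(self, key):
        """Current-state read: a single dictionary lookup."""
        return self.state.get(key)

    def state_at(self, block):
        """Rebuild an older state by undoing logs from the head backwards."""
        if not 0 <= block <= self.head:
            raise ValueError("unknown block")
        snapshot = dict(self.state)
        for number in range(self.head, block, -1):
            for key, (old, _new) in self.logs[number].changes.items():
                if old is None:
                    snapshot.pop(key, None)  # key did not exist before this block
                else:
                    snapshot[key] = old
        return snapshot


store = FlatStateStore()
store.apply_block({"alice": 10})
store.apply_block({"alice": 7, "bob": 3})
store.apply_block({"bob": 5})

current_bob = store.get("bob")     # 5, read straight from the flat map
state_block_1 = store.state_at(1)  # {"alice": 10}, rebuilt by undoing blocks 3 and 2
```

In this sketch the cost of state_at grows with the number of blocks between the head and the requested block, which mirrors the trade-off described above: recent data is cheap to read, and older data takes more work to reconstruct.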
Now let’s look at some numbers, broken down by storage approach. With Bonsai, a full archive of the Ethereum network is less than 1 TB. In the older Forest mode, our tests showed that nearly 12.5 TB were needed for the same archive. Even nodes running in Fastsync mode (designed to operate faster and lighter than a full archive) become much larger over time than those running Bonsai. When paired with Fastsync, Bonsai synchronized with the Ethereum network 11 hours faster in our tests than Forest mode. After two months, we recorded a ~554 GB difference in size between Fastsync Forest and Bonsai nodes. For those running nodes over time, this is a marked improvement for infrastructure. All of this culminates in faster read and storage performance for Besu nodes everywhere.
Get Started Today
We are so excited to get this update into your hands. To try it, start an empty Besu node with the flag:
--Xdata-storage-format=BONSAI
This feature is experimental, so you may want to avoid it in production environments without sufficient testing. We would love your feedback on the new features (and any bugs you might find); come join us on our developer community Discord! If you’re new to developing with Besu, check out the docs here and try running your own Besu node by following the ConsenSys Quorum Developer Quickstart.