How Ethereum Works Part 1: Cryptography, Consensus, and Transactions
By Mike Goldin
This is Part I of a two-part series and focuses on understanding Bitcoin. Part II, “Ethereum: Bitcoin Plus Everything,” focuses on the ways in which Ethereum extends Bitcoin’s blockchain technology.
Time sure does fly. A new generation of programmers is already leapfrogging over Bitcoin and directly into Ethereum. These programmers may never have taken much interest in cryptocurrencies, but they’re drawn to Ethereum by its support for trustless transactions of arbitrary (Turing-complete) complexity. While I can’t blame them for wanting to jump right into Ethereum, Ethereum derives a huge share of its underlying technology directly from Bitcoin. This series of articles will describe just enough of Bitcoin for budding Ethereum developers to understand how Ethereum works under the hood, then explore some ways new Ethereum developers can start building a sense of contract-orientation, just as they once built their sense of object-orientation.
A blockchain is fundamentally a public record of state changes. Anybody can audit a blockchain’s state changes over time and prove for themselves, with mathematical certainty, whether those changes were made in accordance with the blockchain’s rules. In the case of Bitcoin, those rules are simple: Bitcoins can’t be double-spent, and their origin must be traceable back to the mining of a valid block (more on mining later). To begin our journey, I recommend reading the Bitcoin white paper even if you know nothing about cryptography; we’ll loop back to the cryptographic underpinnings afterwards. The white paper is short, and only its first six sections are of real interest to us.
Cryptographic Underpinnings of Blockchain Transactions
As somebody with absolutely no background in cryptography, I found myself confused as early as the white paper’s description of transactions. The scheme it describes is designed to prove ownership of an asset at a given time by tracing that asset’s history of owners up to the present. But what’s a public key? What’s hashing? How do these help us prove ownership of an asset? If you’ve never taken a cryptography class, Khan Academy’s videos on these topics give a good high-level description of these important functions.
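To make “hashing” concrete, here is a small Python sketch using the standard library’s `hashlib`. The transaction strings are made-up examples, not Bitcoin’s real transaction format; the point is only that a hash is deterministic, fixed-length, and changes completely when the input changes even slightly.

```python
import hashlib

def sha256_hex(message: str) -> str:
    """Return the SHA-256 digest of a UTF-8 string as 64 hex characters."""
    return hashlib.sha256(message.encode("utf-8")).hexdigest()

a = sha256_hex("Owner 1 pays Owner 2 1 BTC")
b = sha256_hex("Owner 1 pays Owner 2 1 BTC")  # identical input
c = sha256_hex("Owner 1 pays Owner 2 2 BTC")  # one character changed

print(a == b)  # True: hashing is deterministic
print(a == c)  # False: a tiny change yields an unrelated digest
print(len(a))  # 64 hex characters, i.e. 256 bits
```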
Now let’s look back at the white paper’s transaction diagram. Here we hash Owner 2’s public key together with the previous transaction (the one in which the Bitcoin entered Owner 1’s possession) into a single digest (the output of the hash function). Owner 1 then signs this digest with her/his private key. Later, anyone can verify Owner 2’s claim on that Bitcoin by feeding a signature verification algorithm three things: the state change Owner 2 claims occurred (the previous transaction plus Owner 2’s public key, signifying a transfer of that Bitcoin to Owner 2), Owner 1’s public key (signifying that the transfer originated from Owner 1), and the signature itself (published on the blockchain when the transaction took place). Verification succeeds only if that exact state change was signed with Owner 1’s private key. Repeating this check recursively, all the way back to the Bitcoin’s origin, proves that the complete chain of ownership is valid.
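The sign-then-verify step in the diagram can be sketched in Python. This is a minimal, from-scratch ECDSA over secp256k1 (the curve Bitcoin uses), written with only the standard library so every step is visible; it needs Python 3.8+ for `pow(x, -1, m)`, is not constant-time, and must never be used for real keys. The `tx_digest` byte layout and the names `make_keypair`, `sign`, `verify`, and `tx_digest` are illustrative assumptions, not Bitcoin’s actual transaction serialization or signature-hash rules.

```python
import hashlib
import secrets

# secp256k1 domain parameters: curve y^2 = x^3 + 7 over the prime field FIELD_P
FIELD_P = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEFFFFFC2F
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141  # group order
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def point_add(p1, p2):
    """Add two curve points; None represents the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % FIELD_P == 0:
        return None  # P + (-P) is the point at infinity
    if p1 == p2:
        lam = 3 * x1 * x1 * pow(2 * y1, -1, FIELD_P)  # tangent slope (curve coefficient a = 0)
    else:
        lam = (y2 - y1) * pow((x2 - x1) % FIELD_P, -1, FIELD_P)  # chord slope
    lam %= FIELD_P
    x3 = (lam * lam - x1 - x2) % FIELD_P
    return x3, (lam * (x1 - x3) - y1) % FIELD_P

def scalar_mult(k, point):
    """Compute k * point by double-and-add."""
    result = None
    while k:
        if k & 1:
            result = point_add(result, point)
        point = point_add(point, point)
        k >>= 1
    return result

def make_keypair():
    private_key = secrets.randbelow(N - 1) + 1
    return private_key, scalar_mult(private_key, G)

def sign(private_key, digest):
    z = int.from_bytes(digest, "big")
    while True:
        k = secrets.randbelow(N - 1) + 1  # fresh secret nonce for every signature
        r = scalar_mult(k, G)[0] % N
        s = pow(k, -1, N) * (z + r * private_key) % N
        if r and s:
            return r, s

def verify(public_key, digest, signature):
    r, s = signature
    if not (1 <= r < N and 1 <= s < N):
        return False
    z = int.from_bytes(digest, "big")
    w = pow(s, -1, N)
    point = point_add(scalar_mult(z * w % N, G), scalar_mult(r * w % N, public_key))
    return point is not None and point[0] % N == r

def tx_digest(prev_tx_hash, next_owner_public_key):
    """Double SHA-256 of (previous transaction hash || next owner's public key)."""
    x, y = next_owner_public_key
    payload = prev_tx_hash + x.to_bytes(32, "big") + y.to_bytes(32, "big")
    return hashlib.sha256(hashlib.sha256(payload).digest()).digest()

# Owner 1 transfers a coin to Owner 2
owner1_private, owner1_public = make_keypair()
_, owner2_public = make_keypair()
prev_tx_hash = hashlib.sha256(b"hypothetical tx paying Owner 1").digest()

digest = tx_digest(prev_tx_hash, owner2_public)
signature = sign(owner1_private, digest)
print(verify(owner1_public, digest, signature))  # True

# Claiming a different recipient changes the digest, so the same signature fails
_, impostor_public = make_keypair()
print(verify(owner1_public, tx_digest(prev_tx_hash, impostor_public), signature))  # False
```

Note that the verifier only ever uses Owner 1’s public key; the private key never leaves Owner 1’s hands.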
You probably have a lot of questions at this point and are already considering edge cases: how does the ownership chain begin? How are specific sums of Bitcoin specified for transfer? Amazingly, there’s not much more of Bitcoin you need to understand for Ethereum’s sake. Next we’ll cover sections three through six of the white paper and see how some of those edge cases are resolved.
The Role of Mining in Creating Network Consensus on the Blockchain
All Bitcoin transactions are broadcast to the entire network, where miners collect them, verify each transaction’s validity (essentially using the method described previously), and include the valid ones in a “block.” The contents of the block are then hashed together with an incrementing number (called a “nonce”) until the resulting output begins with a certain number of leading zero bits. The network dynamically adjusts the required number of leading zeroes (the “difficulty”) so that a block is mined every 10 minutes on average.
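The mining loop can be sketched in a few lines of Python. In this toy version, an assumption made for illustration, an arbitrary byte string stands in for the block contents, an 8-byte nonce is appended, and the result is run through double SHA-256 (as Bitcoin does). Real Bitcoin hashes a fixed 80-byte block header and compares the digest against a numeric target rather than counting zero bits, though the two ideas are closely related.

```python
import hashlib
from typing import Tuple

def block_hash(contents: bytes, nonce: int) -> bytes:
    """Double SHA-256 of the block contents followed by the nonce."""
    data = contents + nonce.to_bytes(8, "little")
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the front of a digest."""
    return len(digest) * 8 - int.from_bytes(digest, "big").bit_length()

def mine(contents: bytes, difficulty_bits: int) -> Tuple[int, bytes]:
    """Increment the nonce until the hash has enough leading zero bits."""
    nonce = 0
    while True:
        digest = block_hash(contents, nonce)
        if leading_zero_bits(digest) >= difficulty_bits:
            return nonce, digest
        nonce += 1

contents = b"previous block hash | tx1 | tx2"  # hypothetical block contents
nonce, digest = mine(contents, difficulty_bits=16)
print(nonce, digest.hex())
```

Each extra bit of difficulty doubles the expected number of attempts (about 2^bits on average), which is the knob the network turns to keep blocks roughly 10 minutes apart as total hash power changes.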
Because the outputs of hash functions are unpredictable, finding a valid hash that the rest of the network will accept requires both luck and computing power. The more computational power a miner possesses, the greater their chance of finding a valid hash before anyone else on the network. When a valid block is “mined,” it is broadcast to and tested by the rest of the network. Before accepting the block as the newest canonical network state, the other nodes check that the included state transitions make sense given the most recent canonical block, that all the transaction signatures are valid, and that the block and the provided nonce hash to a valid digest. The block is then added to the head of the existing “chain” of previously mined blocks in each node’s local database.
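Two of those checks can be sketched in Python. This toy model is an illustrative assumption: transactions are plain strings, signature and double-spend checks are skipped entirely, and the difficulty is tiny. It keeps only the structural rules: every block must commit to the hash of the block before it, and every block’s nonce must produce a digest that meets the difficulty. `Block`, `mine`, and `validate_chain` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass
from typing import List

DIFFICULTY_BITS = 12  # toy difficulty; real Bitcoin's target is vastly harder

@dataclass
class Block:
    prev_hash: bytes         # hash of the previous block (all zeros for the first)
    transactions: List[str]  # stand-ins for signed transactions
    nonce: int = 0

    def block_hash(self) -> bytes:
        data = (self.prev_hash + "|".join(self.transactions).encode()
                + self.nonce.to_bytes(8, "little"))
        return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def meets_difficulty(digest: bytes) -> bool:
    """True if the top DIFFICULTY_BITS bits of the digest are zero."""
    return int.from_bytes(digest, "big") >> (256 - DIFFICULTY_BITS) == 0

def mine(block: Block) -> Block:
    while not meets_difficulty(block.block_hash()):
        block.nonce += 1
    return block

def validate_chain(chain: List[Block]) -> bool:
    for i, block in enumerate(chain):
        if not meets_difficulty(block.block_hash()):
            return False  # the nonce does not yield a valid digest
        expected_prev = chain[i - 1].block_hash() if i else bytes(32)
        if block.prev_hash != expected_prev:
            return False  # the block does not extend its predecessor
    return True

genesis = mine(Block(bytes(32), ["coinbase: 25 BTC to miner A"]))
second = mine(Block(genesis.block_hash(), ["A pays B 1 BTC", "coinbase: 25 BTC to miner B"]))
chain = [genesis, second]
print(validate_chain(chain))  # True

genesis.transactions[0] = "coinbase: 2500 BTC to miner A"  # try to rewrite history
print(validate_chain(chain))  # False: the altered block breaks the hash links
```

Because each block commits to its predecessor’s hash, rewriting an old block means redoing the proof-of-work for that block and every block after it.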
How does the ownership chain begin? The Bitcoin protocol allows every block to include one special transaction that pays the block’s miner a fixed amount of new Bitcoin (25 BTC at the time of writing; the amount halves every 210,000 blocks). Those coins are created out of thin air, which incentivizes miners to support the network. A Bitcoin can be verified as non-counterfeit if its ownership chain can be traced back to one of these special transactions. Indeed, this is the only way that new Bitcoin can be created.
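The halving schedule is easy to express in code. This Python sketch follows the well-known rule: the subsidy starts at 50 BTC and halves every 210,000 blocks, computed in whole satoshis (100,000,000 per BTC) with a right shift, so it eventually rounds down to zero. The function name `block_subsidy` is an assumption for illustration.

```python
COIN = 100_000_000          # satoshis per BTC
HALVING_INTERVAL = 210_000  # blocks between halvings

def block_subsidy(height: int) -> int:
    """Newly created satoshis the miner of block `height` may claim."""
    halvings = height // HALVING_INTERVAL
    return (50 * COIN) >> halvings  # integer halving; reaches 0 after 33 halvings

for height in (0, 210_000, 420_000, 630_000, 840_000):
    print(height, block_subsidy(height) / COIN)
# 0 50.0
# 210000 25.0
# 420000 12.5
# 630000 6.25
# 840000 3.125

total = sum(HALVING_INTERVAL * block_subsidy(i * HALVING_INTERVAL) for i in range(64))
print(total / COIN)  # just under 21 million
```

Summing the subsidy over every block gives a total just under 21 million BTC, which is where Bitcoin’s famous supply cap comes from.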
Bitcoin is safe so long as honest, non-colluding nodes control a majority of the network’s computational power. A malicious user controlling more than 50% of that power who pays for his coffee with Bitcoin could drink his brew and then mine an alternative chain of blocks (starting from the state just before his transaction) that omits his coffee purchase. Even if the network were a few blocks ahead of him by the time he started mining his fraudulent chain, with a majority of the network’s hash power he would inevitably catch up. Because the Bitcoin protocol treats the longest chain (strictly, the chain with the most accumulated proof-of-work) as canonical, the malicious user would get away with his theft even if abandoned branches of the blockchain hinted at his deceit.
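Section 11 of the white paper puts numbers on this race. The Python sketch below is a port of the C function given there: it estimates the probability that an attacker with a share q of the hash power ever overtakes the honest chain when the merchant waits for z blocks after the payment before handing over the coffee. As the paper notes, that probability is 1 once q is at least half.

```python
import math

def attacker_success_probability(q: float, z: int) -> float:
    """Chance that an attacker with hash-power share q overtakes the honest
    chain after the recipient has waited for z confirmations."""
    p = 1.0 - q  # honest share of hash power
    if q >= p:
        return 1.0  # a majority attacker catches up with certainty
    lam = z * (q / p)  # expected attacker progress while z honest blocks appear
    total = 1.0
    for k in range(z + 1):
        poisson = math.exp(-lam) * lam ** k / math.factorial(k)
        total -= poisson * (1 - (q / p) ** (z - k))
    return total

for q in (0.1, 0.3, 0.45, 0.51):
    print(q, [round(attacker_success_probability(q, z), 7) for z in (0, 2, 6)])
```

For q = 0.1 and z = 5 this returns about 0.0009137, matching the table in the white paper; for any q of 0.5 or more it returns 1.0 no matter how long the merchant waits.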
Extra Credit: UTXO in Bitcoin
In Bitcoin, one’s holdings are not a single pool of divisible currency units to withdraw from and deposit into. Instead, Bitcoin users deal in “unspent transaction outputs” (UTXOs). If a miner wants to spend some of their freshly minted 25 BTC on a 1 BTC coffee, they must specify the entire 25 BTC (a UTXO) as the transaction’s input and specify two outputs: one sending 1 BTC to the vendor’s address, and a second sending 24 BTC back to the purchaser’s address as “change” (ignoring transaction fees). To buy another coffee, that 24 BTC serves as a single UTXO. Several UTXOs can be combined as inputs to pay for a more expensive item, and the vendor receives the payment as a new, single UTXO. Ethereum does not use UTXOs; instead, it tracks a balance for each account, which transactions debit and credit directly.
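The coffee example translates directly into code. In this Python sketch, `TxOutput` and `build_payment` are hypothetical names: real Bitcoin outputs carry locking scripts rather than owner strings and are identified by the transaction that created them. The fee defaults to zero to match the example above; in a real transaction it is simply whatever the inputs exceed the outputs by.

```python
from dataclasses import dataclass
from typing import List

COIN = 100_000_000  # satoshis per BTC

@dataclass(frozen=True)
class TxOutput:
    owner: str   # stand-in for an address
    amount: int  # satoshis

def build_payment(inputs: List[TxOutput], payee: str, amount: int,
                  payer: str, fee: int = 0) -> List[TxOutput]:
    """Spend every input UTXO in full, pay `amount` to the payee, and send
    whatever is left (minus the fee) back to the payer as change."""
    total_in = sum(utxo.amount for utxo in inputs)
    change = total_in - amount - fee
    if change < 0:
        raise ValueError("inputs do not cover the amount plus fee")
    outputs = [TxOutput(payee, amount)]
    if change > 0:
        outputs.append(TxOutput(payer, change))
    return outputs

# The miner spends a 25 BTC coinbase output on a 1 BTC coffee
coinbase = TxOutput("miner", 25 * COIN)
first = build_payment([coinbase], "vendor", 1 * COIN, "miner")
print([(o.owner, o.amount / COIN) for o in first])  # [('vendor', 1.0), ('miner', 24.0)]

# The 24 BTC change is now a single UTXO that funds the next coffee
second = build_payment([first[1]], "vendor", 1 * COIN, "miner")
print([(o.owner, o.amount / COIN) for o in second])  # [('vendor', 1.0), ('miner', 23.0)]

# Several UTXOs combined as inputs become one new UTXO for the vendor
big = build_payment([TxOutput("miner", 3 * COIN), TxOutput("miner", 2 * COIN)],
                    "vendor", 5 * COIN, "miner")
print([(o.owner, o.amount / COIN) for o in big])  # [('vendor', 5.0)]
```

An account-based system like Ethereum’s would instead subtract 1 BTC-equivalent from one stored balance and add it to another, with no change output at all.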
You now have all the knowledge you need to understand the fundamentals of how the Bitcoin blockchain works. It’s head-spinning stuff! Don’t feel bad if you don’t totally get it yet: try sleeping on it for a few days and it’ll start coming together in your dreams. Once it does, you might be able to begin imagining how something a lot like the Bitcoin blockchain could be used to verify the public outputs of arbitrarily complex computations when the inputs and program code are publicly shared as well. In fact, that’s what we’ll explore in Part II of this series, “Ethereum: Bitcoin Plus Everything.”