How To Safely Migrate Your Ethereum 2.0 Validator Client
This article is the last in a four-part series on how to run your own Eth2 validator. If you are new to this series, be sure to check out Part 1: Getting Started, Part 2: Setting Up Your Client and Part 3: Installing Metrics and Analyzing P&L.
Back in November 2020, I set up an Ethereum 2.0 validator client on Amazon Web Services (AWS). Since then, the Beacon chain has launched and I’ve written two additional pieces documenting the journey. This included registering for Infura as an Ethereum 1.0 endpoint, installing and setting up Teku as an Eth2 client, and analyzing my node’s metrics.
This last installment covers safely migrating my validator from one cloud provider (AWS) to another (Digital Ocean). Having the same validator keys running on two different instances could result in the slashing and freezing of my staked ether, which would not be good.
Also, Proof of Stake blockchains present a unique trust issue for new clients that are syncing. We’ll discuss this and ways to solve it.
Here we go!
- Initializing New Instance
- Syncing and Weak Subjectivity
- Slashing Prevention
Initializing New Instance
In Part 1, I mentioned that I was considering using my 8GB RAM Raspberry Pi but didn’t want to have to worry about internet connectivity, making sure the power stayed on, overheating and speed, or my dog kicking over my laptop while I’m away. After an 80-hour power outage in Texas last week, I’m really glad I decided to go with a cloud-based service instead.
Some readers were dubious about my decision to spin up a 16GB AWS node for a single validator, and those readers have been proven correct. Before genesis, it was difficult to say definitively what the upper-end network load would be. And there are still concerns about the increased load brought on by the merge of Ethereum 1.0 and 2.0 scheduled for this year.
AWS is great for large fleets of validators. For my single validator, however, I’ve decided to migrate to Digital Ocean. (See Part 2 for my discussion of virtual versus local hosting.)
Another reason I bulked up the validator, perhaps unnecessarily, may lie in my mental image of a miner on a major blockchain: huge server farms in mainland China or, in 2017, dozens of GPUs turning a friend’s garage into a sauna.
Matt Garnett helped correct this bias by reminding me of the original design premise for Ethereum 2.0: network security decoupled from enormous amounts of computing power. He pointed out the “Raspberry Pi” computing unit benchmark proposed by Gavin Wood in 2015. As recently as 2019, Justin Drake spoke publicly about his hope that Eth2 validators could run on the new Raspberry Pi 4.
Digital Ocean Droplet
Using Mara Schmiedt and Collin Meyers’ Validator Guide, I purchased the “Basic” Droplet (Digital Ocean’s term for an instance) with 8GB of RAM and 160GB of storage. For current network conditions, it’s probably overkill on memory, but the 4GB RAM Droplet doesn’t have enough processing power.
If you choose Ubuntu 20.04 as the operating system, you can follow Part 2’s setup (you can also follow Somer Esat’s excellent installation tutorials). You don’t have to specify networking rules for P2P traffic, because Digital Ocean Droplets don’t have a firewall enabled by default, so those ports are already open. For that same reason, it’s very important to set up SSH key authentication and disable root login.
(The one weird quirk I found with Digital Ocean was adding an SSH key after deploying the instance. This seems to be a common issue with folks, so if you’re having trouble with this, here’s a canonical thread I found to be helpful.)
You’ll also want to set up SSH because we’re going to use scp to export a few things from Instance 1 (AWS) to Instance 2 (DO):
- Validator Keys
- Current Network State
- Slashing Protection
The validator keys are needed to run our Teku client as the validator on the new instance. The Current Network State and Slashing Protection are crucial for making our migration as safe and fast as possible.
Syncing and Weak Subjectivity
Proof of Stake, the new consensus mechanism for Ethereum 2.0, has significant differences compared to Proof of Work. One of those is the concept of finality: when the Beacon chain finalizes an epoch, it’s taking a snapshot of all the activity and balances on the network. That snapshot, called a checkpoint, might as well be its own genesis block. The network is not going back.
It’s a common misunderstanding that Proof of Work also offers this finality. In fact, Proof of Work chains, like Bitcoin, never fully guarantee the chain won’t be reorganized. It’s more that, over time, the probability of a chain reorganization becomes successively smaller with each block confirmation. At a certain point, the probability of a reorganization for a particular block becomes infinitesimally small. This is why, on Bitcoin and Ethereum 1.0, a transaction is considered “safely included in the chain” only after a certain number of blocks are confirmed after the one containing it.
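To put a number on “successively smaller”: the Bitcoin whitepaper models an attacker with a fraction q of the hash power (honest miners hold p = 1 - q) who is z blocks behind as a gambler’s-ruin problem, where q_z is the probability the attacker ever catches up. This simplified form ignores how far the attacker got while those z blocks were being mined, so treat it as intuition rather than a security bound:

```latex
q_z =
\begin{cases}
1, & p \le q \\
\left(\frac{q}{p}\right)^{z}, & p > q
\end{cases}
\qquad \text{e.g. } q = 0.1,\ p = 0.9,\ z = 6:\quad
q_z = \left(\tfrac{1}{9}\right)^{6} = \tfrac{1}{531441} \approx 1.9 \times 10^{-6}
```

The probability shrinks exponentially with each confirmation but never reaches zero, which is the contrast with a finalized Beacon chain checkpoint.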
There is a security weakness in finality, though. From the Teku docs:
If ⅓ of validators withdraw their stake and continue signing blocks and attestations, they can form a chain which conflicts with the finalized state. If your node is far enough behind the chain head to not be aware that they’ve withdrawn their funds, the exited validators can trick you into following the wrong chain. (https://docs.teku.consensys.net/en/latest/Concepts/Weak-Subjectivity/)
Well-behaved validators who have successfully and properly exited the chain can sell their private keys on the black market to a malicious actor. (There is no financial disincentive for them to do this, as their funds have safely exited the protocol.) That malicious actor can then amass enough old keys to build a conflicting chain and fool nodes coming back online after a long time offline, which is known as a long-range attack. Meredith Baxter has an excellent explanation of this attack.
What’s the solution? Weak Subjectivity Checkpoints. These are pointers to a relatively recent network state finalized by a two-thirds supermajority of validators. If a node with relatively scarce network information wants to sync to the Beacon chain, it can start with the genesis block and the weak subjectivity checkpoint. As the node communicates with other peers, it can check that it hasn’t been led astray by making sure the chain it ends up following matches the state recorded in the weak subjectivity checkpoint.
Where does one get these checkpoints? That’s a tricky question. Teku Product Lead Ben Edgington shares this insight:
It’s up to the user to set their trust level and act accordingly. One suggestion is for client teams to set the checkpoints since they are implicitly trusted by their users in any case. As a client dev, I don’t really like this, but I suppose it’s the reality. If a bunch of block explorers, the EF, all the client teams, a few exchanges, some staking services, are all advertising the same checkpoint you’re very unlikely to go wrong. Having a diversity of inputs is good to avoid cartels.
Teku provides the start-up flag --ws-checkpoint, which accepts the checkpoint for syncing.
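Ben’s advice about a diversity of inputs can be made a little more mechanical. Below is a minimal shell sketch, assuming you have already copied the checkpoint string from several sources by hand, and assuming the BLOCK_ROOT:EPOCH format that Teku’s documentation described for --ws-checkpoint at the time of writing (check the docs for your version). The function name ws_checkpoint_flag is hypothetical; it prints the flag only if every source agrees and every string looks well-formed.

```shell
# Hypothetical helper: print a --ws-checkpoint flag only if every source agrees.
# Assumed format (check your Teku version's docs): 0x<64 hex chars>:<epoch>
ws_checkpoint_flag() {
  if [ "$#" -lt 2 ]; then
    echo "need checkpoints from at least two independent sources" >&2
    return 1
  fi
  local first="$1" cp
  for cp in "$@"; do
    # Reject anything that doesn't look like <block_root>:<epoch>
    if ! printf '%s\n' "$cp" | grep -Eq '^0x[0-9a-fA-F]{64}:[0-9]+$'; then
      echo "malformed checkpoint: $cp" >&2
      return 1
    fi
    # Exact string comparison: any disagreement means stop and investigate
    if [ "$cp" != "$first" ]; then
      echo "sources disagree: $first vs $cp" >&2
      return 1
    fi
  done
  printf '%s%s\n' '--ws-checkpoint=' "$first"
}
```

For example, ws_checkpoint_flag "$FROM_EXPLORER" "$FROM_CLIENT_TEAM" "$FROM_EXCHANGE" (hypothetical variable names) prints a flag you could paste into the ExecStart line; a mismatch is a reason to dig further rather than to pick one source.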
Another option with Teku is --initial-state. This is only available on Teku right now and requires a path or URL to an SSZ-encoded state file. It can reduce sync time to a matter of seconds, which is fantastic, particularly if you’re concerned about validator downtime.
For now, the best sources for --initial-state are your own nodes. The best use “is if you’re maintaining a number of nodes and need to spin up new ones from time to time,” according to Teku Blockchain Protocol Engineer Adrian Sutton. This is what I’ll use when switching on my new validator instance. Later in this post, I’ll show you how to export it safely from Teku.
Slashing Prevention
The last concept to discuss before migration is the ever-dreadful-sounding slashing. Slashing is the financial disincentive against validators for submitting bad data to the network. The penalty is forfeiting a portion of your stake and being politely escorted to the door. Beaconcha.in shows there have been 133 validators slashed as of this writing (although, curiously, Beaconscan lists 132?).
Needless to say, we want to avoid being slashed. Luckily, slashing only applies to behavior that violates the protocol; we don’t get slashed simply for being offline. (An offline validator does lose a small amount for missed duties, but that’s a far cry from slashing.)
However, a common reason for slashing is the exact circumstance I’m attempting now: someone inadvertently running the same validator key on two different instances. To the network this looks like a validator acting maliciously, because the two instances can end up signing conflicting messages, such as two different attestations for the same epoch.
Luckily, slashing protection exists in the form of EIP-3076, “A standard format for transferring a key’s signing history allows validators to easily switch between clients without the risk of signing conflicting messages.” It’s a JSON file listing the blocks and attestations the client has signed. It’s exported by one client and consumed by another, in a separate process from actually running the node. In Teku, we export the slashing protection file from our first instance using the command teku slashing-protection export.
We’ll then send the slashing protection file to the new instance and feed it into the new validator client with the command teku slashing-protection import.
We do this before turning our validator on for the first time to prevent our client from accidentally submitting slashable activity.
(For more information about EIP-3076 and slashing, please check out Ethereum Cat Herders’ recent episode interviewing Sacha Saint-Leger, Michael Sproul and Danny Ryan about its development and implementation.)
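It helps to know the shape of the file you’re moving. Per EIP-3076, the interchange file has a metadata object (interchange_format_version and genesis_validators_root) and a data array with one entry per validator pubkey, each listing signed_blocks and signed_attestations. The sketch below is a hypothetical helper, summarize_interchange, that prints the format version and the number of validator entries using plain sed and grep. It’s a crude text match rather than a JSON parser, so prefer jq if you have it installed.

```shell
# EIP-3076 top-level shape (values elided):
# { "metadata": { "interchange_format_version": "5", "genesis_validators_root": "0x..." },
#   "data": [ { "pubkey": "0x...", "signed_blocks": [...], "signed_attestations": [...] } ] }

# Hypothetical helper: one-line summary of an interchange file (text matching only).
summarize_interchange() {
  local file="$1" version keys
  [ -s "$file" ] || { echo "missing or empty: $file" >&2; return 1; }
  # metadata.interchange_format_version is a quoted string, e.g. "5"
  version=$(sed -n 's/.*"interchange_format_version"[[:space:]]*:[[:space:]]*"\([0-9]*\)".*/\1/p' "$file" | head -n 1)
  [ -n "$version" ] || { echo "no interchange_format_version found" >&2; return 1; }
  # one "pubkey" field per validator entry in the "data" array
  keys=$(grep -o '"pubkey"' "$file" | wc -l | tr -d '[:space:]')
  echo "format_version=$version validators=$keys"
}
```

Run against the file exported in step 3 below, a single-validator setup like mine should report one validator entry; zero entries would suggest the export read from the wrong data directory.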
With those two concepts out of the way, let’s get down to the nitty-gritty. It’s not hard to do this, I just had to triple-check that I had the steps correct. I would advise anyone attempting it to do the same! Jumbling them up would be problematic, to say the least. Here’s the rundown:
- Download initial state from the first Teku node (AWS)
- Stop first Teku node (AWS)
- Export slashing protection data from first node (AWS)
- Transfer initial state and slashing protection data from first Teku node (AWS) to second Teku node (DO)
- Import slashing protection data to second Teku node (DO)
- Start second Teku node (DO) using initial state from first Teku node (AWS)*
*For the extra-paranoid, Meredith Baxter suggests starting the second Teku client with --p2p-enabled=false while the client is consuming the initial state, to prevent communication with other nodes. If you do this, be sure to restart the second Teku client without --p2p-enabled=false so it can communicate with the network.
Here are the commands for each of the steps, broken down and detailed:
1) Download initial state from the first Teku node (AWS)
This has to be done while the first Teku node is still running. Download your current network state from your Teku client’s REST API (enabled by the --rest-api-enabled=true flag in our systemd service) by entering the following API call in the first node’s terminal (we’re assuming Teku is running in the background):
curl -X GET "http://localhost:5051/teku/v1/beacon/blocks/finalized/state" --output initial-state.ssz
This will download the initial state as initial-state.ssz into whatever directory you’re currently in.
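One way to make step 1 a little safer, as a sketch: add curl’s --fail flag, which makes curl exit with an error on an HTTP error status instead of writing the error response into initial-state.ssz, and then sanity-check the file. SSZ is a binary encoding, so an empty file, or one starting with { as a JSON error body likely would, isn’t a usable state. check_state_file is a hypothetical helper, and it only catches obvious failures; it does not validate the state itself.

```shell
# Same endpoint and port as above, plus --fail so an HTTP error is not saved as the state:
#   curl --fail -X GET "http://localhost:5051/teku/v1/beacon/blocks/finalized/state" --output initial-state.ssz

# Hypothetical sanity check on the downloaded file (catches obvious failures only).
check_state_file() {
  local f="$1"
  [ -s "$f" ] || { echo "empty or missing: $f" >&2; return 1; }
  # SSZ is binary; a leading "{" suggests a JSON error message instead of a state
  if [ "$(head -c 1 "$f")" = "{" ]; then
    echo "looks like JSON, not SSZ: $f" >&2
    return 1
  fi
  echo "ok: $f ($(wc -c < "$f" | tr -d '[:space:]') bytes)"
}
```

Usage: check_state_file initial-state.ssz, run before stopping the first node, so you can simply re-download if it fails.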
2) Stop first Teku node (AWS)
I’m assuming you have a similar setup to Part 2 (Ubuntu 20.04), specifically the systemd service we set up for Teku. If that’s the case, stop the first node with:
sudo systemctl stop teku
Double-check that it has indeed stopped by running:
sudo systemctl status teku
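If you’d rather not eyeball the status output, systemctl is-active prints just the unit’s state. Here’s a minimal sketch with a hypothetical teku_is_stopped helper that treats only inactive and failed as safely stopped; anything else, including activating or deactivating, counts as possibly still signing. It only knows about the systemd unit, so a Teku process started by hand would slip past it (pgrep -f teku can help there).

```shell
# Hypothetical helper: interpret the state printed by `systemctl is-active`.
# Only "inactive" and "failed" mean the unit is not running; "active",
# "activating", "deactivating" and "reloading" all mean it may still sign.
teku_is_stopped() {
  case "$1" in
    inactive|failed) return 0 ;;
    *) return 1 ;;
  esac
}

# On the first node:
#   teku_is_stopped "$(systemctl is-active teku)" && echo "stopped" || echo "STILL RUNNING"
# From your desktop, before starting the second node (AWS_HOST is a placeholder):
#   teku_is_stopped "$(ssh -i PATH/TO/SSH/KEY ubuntu@AWS_HOST systemctl is-active teku)"
```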
3) Export slashing protection data from first node (AWS)
Now that the node has stopped, we need to export the slashing protection data. To do so, run the following command:
sudo teku slashing-protection export --to=slashing-protection --data-path .local/share/teku/
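This exports the slashing protection data to a file named slashing-protection in the current directory. Make sure --data-path points at the same data directory your first node actually used (the path above is relative, so run it from your home directory; if your service file sets --data-base-path=/var/lib/teku, as the one in step 6 does, point the export there instead).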
4) Transfer initial state and slashing protection data from first Teku node (AWS) to second Teku node (DO)
Since both of my instances have SSH set up, we can use scp to copy the network state and slashing protection from one to the other. There might be a way to pipe these, but what I read suggested scp doesn’t allow piping (if someone can find a way, let me know!)
Here are the commands to transfer the two files out of the first node (AWS) to our desktop, assuming you didn’t change the filenames from above:
scp -i PATH/TO/SSH/KEY ubuntu@AWS_INSTANCE_INFO.REGION.amazonaws.com:/home/ubuntu/initial-state.ssz ~/Desktop
scp -i PATH/TO/SSH/KEY ubuntu@AWS_INSTANCE_INFO.REGION.amazonaws.com:/home/ubuntu/slashing-protection ~/Desktop
Here are the commands to transfer the files from your Desktop to your second node:
scp -i PATH/TO/SSH/KEY ~/Desktop/slashing-protection ubuntu@DO_NODE_IP:~
scp -i PATH/TO/SSH/KEY ~/Desktop/initial-state.ssz ubuntu@DO_NODE_IP:~
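scp runs over SSH, which already protects the data in transit, but a checksum round trip confirms that the files landing on the second node are the exact ones you exported (and not, say, stale copies left on your desktop from an earlier attempt). A minimal sketch using sha256sum from GNU coreutils, which Ubuntu ships; record_checksums and verify_checksums are hypothetical wrappers that expect to run in the directory holding the files.

```shell
# On the first node (AWS), after steps 1 and 3:
#   sha256sum initial-state.ssz slashing-protection > migration.sha256
# Copy migration.sha256 along with the other two files, then on the second node (DO):
#   sha256sum -c migration.sha256

# Hypothetical wrappers for the two halves, run in the directory holding the files.
record_checksums() {
  sha256sum initial-state.ssz slashing-protection > migration.sha256
}

verify_checksums() {
  # prints "<file>: OK" per file; exits non-zero if any file differs or is missing
  sha256sum -c migration.sha256
}
```

As for piping: OpenSSH’s scp has a -3 option that copies between two remote hosts by routing the data through your local machine, which would skip the desktop hop. With a different key for each host, both keys would need to be available to ssh (for example, loaded into ssh-agent) rather than passed with a single -i.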
5) Import slashing protection data to second Teku node (DO)
Before we start our second node, we need to feed in the slashing protection to make sure we don’t get slashed.
sudo teku slashing-protection import --data-path=/var/lib/teku --from=./slashing-protection
You should get a success message from the client when it’s done.
6) Start second Teku node (DO) using initial state from first Teku node (AWS)
We use the same systemd service script for Teku that we used previously with one exception:
ExecStart=/home/ubuntu/teku-20.11.1/bin/teku --network=mainnet --eth1-endpoint=INFURA_ETH1_HTTP_ENDPOINT_GOES_HERE --initial-state=FILENAME --validator-keys=/home/ubuntu/validator_key_info/KEYSTORE-M_123456_789_ABCD.json:/home/ubuntu/validator_key_info/validator_keys/KEYSTORE-M_123456_789_ABCD.txt --rest-api-enabled=true --rest-api-docs-enabled=true --metrics-enabled --validators-keystore-locking-enabled=false --data-base-path=/var/lib/teku
The --initial-state=FILENAME flag is where we put the location of the initial-state.ssz file we transferred from our first node. Use the full path (for example /home/ubuntu/initial-state.ssz): a system service’s working directory is / by default, so a bare filename won’t resolve to your home directory.
Previous versions of Teku required us to remove this flag from the systemd service once it had started, but newer versions ignore it after the first run. Check your version and adjust accordingly.
Once we have altered and saved our systemd file, we reload systemd’s configuration so it picks up the changes, then start Teku and cross our fingers!
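Before starting the service, a small pre-flight check on the --initial-state value can catch a relative path (a system service’s working directory defaults to /) or an empty file. This is a sketch with a hypothetical initial_state_ok helper; run it as the user your service runs as (for example with sudo -u), since readability depends on who is asking.

```shell
# Hypothetical pre-flight check for the value passed to --initial-state.
initial_state_ok() {
  local f="$1"
  case "$f" in
    /*) ;;  # absolute path: fine
    *) echo "use an absolute path, e.g. /home/ubuntu/initial-state.ssz" >&2; return 1 ;;
  esac
  # must exist, be readable by the current user, and be non-empty
  [ -r "$f" ] && [ -s "$f" ] || { echo "not readable or empty: $f" >&2; return 1; }
}

# Usage: initial_state_ok /home/ubuntu/initial-state.ssz && echo "ready"
```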
sudo systemctl daemon-reload
sudo systemctl start teku
FYI, I quadruple-checked that my first node was not running before starting this one!
As before, check to make sure Teku is booting up and running okay:
sudo systemctl status teku
If you get any errors, you can get more details by running:
sudo journalctl -f -u teku.service
Once you’re satisfied it’s running okay, run the following command so Teku starts automatically whenever the instance reboots:
sudo systemctl enable teku
Thus concludes the final installment of this first series. I’m sure there will be more excitement as developments continue and we’ll do our best to update this and provide more resources when needed.
Thank you: Aditya Asgaonkar, Meredith Baxter, Ben Edgington, Adrian Sutton, Alex Tudorache, and James Beck.