My Journey to Becoming a Validator on Ethereum 2.0, Part 2
Note: You can still become a validator on the Ethereum 2.0 network! There will be a wait-time to be activated as a validator — as of December 4, 2020, the wait-time is roughly 9.9 days. See the steps to staking in Part 1 of this series.
This article is the second in a four-part series on how to run your own Eth2 validator. If you are new to this series, be sure to check out Part 1: Getting Started, Part 3: Installing Metrics and Analyzing P&L and Part 4: Safely Migrating Your Eth2 Node.
- Validator Configuration Considerations
- Future Proofing Hardware
- To Run or Not To Run an Eth1 Client?
- Virtual vs. Local Hosting
- Eth2 Client Choice and Avoiding Penalties
- Setting up AWS instance
- Operating System
- SSH Keys and Instance Launch
- Installing Teku
- Install binary
- Create non-root user
- Create systemd service
This is a post I’m writing as an employee of ConsenSys and someone planning to stake on the beacon chain. The former statement means I prioritize ConsenSys products (ConsenSys products are typically best in class for Ethereum, and I also have access to engineering teams who can help me answer questions and troubleshoot). The latter statement means I’m optimizing for cost and ease of use: I don’t have thousands of ETH to yield substantial rewards, so I am taking some shortcuts. These are decisions I’ve made to make staking on Ethereum 2.0 as straightforward and accessible for individuals as possible but come with trade-offs to decentralization and privacy. However, you can follow the broad strokes of the tutorial below and make different choices. In fact, if you can do that, I would encourage you to!
Last, staking in Ethereum 2.0 is highly experimental and involves a long-term commitment (I’m allotting three years). Please do not participate if you are not comfortable with the high level of risk that comes with something still in development. If you’re unsure about whether you should stake, please join the ConsenSys Discord and ask in the #teku-public channel.
In the previous post, we discussed the reasons for Ethereum 2.0’s deployment and walked through staking 32 ETH in the Ethereum 1.0 mainnet Deposit Contract. We touched on key generation and how the staking process on Launchpad links Ethereum 1.0 to 2.0.
On November 23rd, the minimum amount of staked ETH required to launch Ethereum 2.0 (524,288 ETH, or 16,384 validators at 32 ETH each) was met. People can continue to stake in the contract, and the number of validators has risen to over 33,000 as of Dec 4th.
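The relationship between staked ETH and validator count is simple arithmetic, since every validator is backed by a 32 ETH deposit. Here is a minimal shell sketch of the conversion (the function names are my own, made up for illustration):

```shell
#!/bin/sh
# Each validator is backed by a 32 ETH deposit.
ETH_PER_VALIDATOR=32

# Number of validators that a given amount of staked ETH can fund.
validators_for_stake() {
  echo $(( $1 / ETH_PER_VALIDATOR ))
}

# Total ETH staked by a given number of validators.
stake_for_validators() {
  echo $(( $1 * ETH_PER_VALIDATOR ))
}

validators_for_stake 524288   # genesis threshold in ETH, prints 16384
stake_for_validators 33000    # validator count as of Dec 4th, prints 1056000
```

In other words, the 33,000 validators mentioned above represent a little over a million ETH locked in the Deposit Contract.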
While it was extremely exciting to make it into the Genesis block as a validator, seconds later I had a similar experience to my colleague Aaron Davis in our internal ConsenSys staking channel: For what crazy task had I signed up? Luckily, I have access to incredibly brilliant and technical people charitable enough to give this rube some tips, pointers and insight about running staking infrastructure. I hope to pass on a fraction of that valuable advice here to any other interested parties.
That’s what the first part of this article will be: What are some things you should consider when picking hardware and software to run an Ethereum 2.0 validator client? The second part will walk through the specific hardware / software combination I have chosen for my validator client. The choices you make for your configuration will depend on your resources, personal inclination and technical capacity. I’ll do my best to highlight how personal biases and circumstances informed my choices.
Last, before we jump into it, I want to reiterate these posts are almost like journal entries for my experience staking 32 ETH (albeit journal entries with extensive technical asides). As such, I may change my approach a bit as the beacon chain progresses. For example, I went in thinking I would definitely be using AWS. As you’ll read below, I’m now reconsidering that. I’m also going to be very clear about the financial element of staking. Staking is a way of supporting the Ethereum ecosystem, but for sustainable public use, it also should be accessible to and possible for folks who have the ETH to do so.
Future Proofing Hardware
The basic requirements for running a validator today are relatively easy to satisfy. Mara Schmiedt and Collin Meyers’ Validator Guide on Bankless has a good rundown of minimum requirements. The most challenging aspect of determining Ethereum 2.0 validator equipment is balancing the current needs of the Beacon Chain Phase 0 with any future, currently unknown demands as development continues. (This is not a concern if you’re comfortable keeping a close watch on your validator and are able to quickly and easily address changes.)
Ben Edgington, Teku Project Manager and publisher of Eth2.news, provided me with some edge cases where the network might demand more of the validator client. Short-term, the biggest concern would be something like the Medalla time server incident, in which a bug and subsequent fix in the Prysm client halted finalization on the testnet (roughly speaking, the network couldn’t “produce blocks”). Since there was no finality on the network (no “confirmed blocks”), validators had to hold many more transactions in their RAM than normal.
Machines with 8GB RAM—which would have been more than enough under normal circumstances—began encountering “out of memory” issues which may have led to slashing. A spike like this, though unusual, is not out of the question for Phase 0 mainnet.
Future configuration ambiguities for the network are the merging of Ethereum 1.0 and 2.0 and the introduction of sharding. We still don’t know when those changes will happen or exactly what they will require. I’d like to have a strong CPU backbone going into Phase 0 (4 virtual cores, 16GB RAM with 100GB SSD) and then focus my attention for future adjustments around storage space (leaving wiggle room here). Please note this may turn out to be overkill and eat up staking rewards.
Those are the “known unknowns” of the future. Now let’s discuss the main configuration decision points for validators today.
To Run or Not to Run an Eth1 client?
It’s a rite of passage I try to subject our bootcamp students to as often as possible: syncing an Ethereum 1.0 client. It’s an open secret that actually hosting a “full” Ethereum 1.0 node is an act of love rather than a hardened, Prisoner’s Dilemma solution. “Full” must be put in quotes because even Ethereum core developers have a tough time agreeing on a definition of what “full node” actually means.
One thing we can all agree on: It takes more time and storage to sync an Ethereum 1.0 client than you’d imagine. Our validators need to have a way of querying the Ethereum 1.0 network database (we’ll get into why a bit later). If you’d like to save the space and headache of syncing locally, you can use an Infura endpoint, which is available for free with registration.
Even though this saves significant storage and logistical concern, it does sacrifice a certain amount of decentralization for the Eth1 and Eth2 networks simultaneously. If Infura were to go down, which is exceedingly rare but does occur, that would cause a ripple effect across the Ethereum 2.0 validators using it for their Ethereum 1.0 endpoint. Something to consider!
(Just to be clear: the difficulty of syncing an Ethereum full node has to do with the nature of the Ethereum world state, not with the Ethereum 1.0 core developers who have done an amazing job dealing with this extremely challenging problem.)
Virtual vs Local Hosting
The next consideration for me was hosting a validator node locally (in my home) or virtually (on a virtual service provider like Amazon Web Services (AWS), Digital Ocean, etc.). As I mentioned in the previous piece, I don’t trust myself to run a consistent validator node from home, even on an old laptop stored away somewhere. Clumsy and goofy, I would either kick it over or forget about it.
I’m opting to run my node on AWS because that’s what I’m most familiar with. (After going through this whole process, however, I’m second-guessing this decision. I’ll discuss this later.) The trade-off here is again decentralization: if AWS goes down or has any issues (like it did recently), I’m at their mercy. If enough people are running nodes on AWS, it can affect the larger Ethereum network.
People will probably self-select for this one. Local hosting is a special kind of project and not everyone has the time, resources or commitment required. While virtual hosting is a centralizing force, I’m opting to go with it due to its ease-of-use and (hopefully) guaranteed uptime.
If you would like to run a validator node locally, you can still follow the general direction of this tutorial, use Somer Esat’s excellent series of tutorials for different clients, or even sync a Raspberry Pi Model 4B with 8GB RAM using Ethereum on ARM. (Syncing on a Raspberry Pi is still very experimental, and folks should wait until the Ethereum on ARM team confirms its stability.)
Eth2 Client Choice and Avoiding Penalties
Another major choice is the Ethereum 2.0 client among the existing clients: Lighthouse, Lodestar, Nimbus, Prysm and Teku. I am heavily biased towards Teku and not savvy enough to debate the finer points of the clients. This article is a decent overview of client performance on Medalla. (Keep in mind that Medalla demanded much more from processors than mainnet will.)
Proof of Stake incorporates explicit disincentives to encourage correct behavior on the network. These take the form of dinging validators for being offline and the more dramatic move of slashing actors for taking malicious action against the network, knowingly or otherwise.
The most common issue will be making sure your validator is available to the network. This means making sure your validator is online. There’s the DevOps-approach to this issue—creating the monitoring and alerting system to alert you if your validator goes offline—that I won’t cover here.
But there is another way to be unavailable to the network, and that is if your client begins misbehaving for one reason or another. After the time server bug caused a network slowdown on Medalla, Eth2 core developers came together to develop “[a] standard format for transferring a key’s signing history allows validators to easily switch between clients without the risk of signing conflicting messages.” Not all clients have this protection, but Teku does. Here’s where you can read about using Teku’s Slash Protection (runs by default) including migrating between Teku nodes or a non-Teku to Teku node.
If you do have trouble with your client and need to resync with the network from scratch, you need to be aware of Weak Subjectivity. Proof of Work allows anyone to go back to the genesis block of the network and trustlessly build the network state from scratch. Proof of Stake, however, has a catch in that regard: if a third of the network’s validators at a certain point exit yet continue to sign blocks and attestations, they can feed a false network state to a node that is syncing. The attack works if the malicious actors reach the syncing node before it has synced past the point where those validators withdrew their funds. You can read more about it here, but the short answer is you need to have either a “weak subjectivity checkpoint” or an encoded state file, essentially an archive of the network. Teku provides start-up flags for both.
The last penalty concern is the most serious: slashing. Slashing occurs when a staker works against the rules of the network. It’s different from being penalized for being offline: it means actively working against the network by submitting conflicting validator information. There are some really great materials for learning more about slashing and how to prevent it.
The main takeaway is: don’t run one validator key on multiple clients. Apparently this is what caused the first slashing event ever, which occurred on Dec 2nd. There have been a number of slashings since! If you’re migrating from one instance to another, quadruple-check that you’re not going to have two machines working from the same key.
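One small sanity check that can help during a migration is to compare the public keys in two keystore directories, for example the keys you are about to load on a new machine against a copy of the keys from the old one. The sketch below is only that, a sketch: it assumes the keystore files are the JSON files produced by the deposit tool, each with a "pubkey" field, and the function names are made up for illustration. It cannot tell you whether another machine is actually running; it only spots the same key sitting in two places.

```shell
#!/bin/sh
# Print the pubkey of every keystore JSON file in directory $1, one per line.
list_pubkeys() {
  grep -ho '"pubkey"[[:space:]]*:[[:space:]]*"[0-9a-fA-F]*"' "$1"/*.json 2>/dev/null \
    | sed 's/.*"\([0-9a-fA-F]*\)"$/\1/' | sort -u
}

# Print pubkeys that appear in both directories; empty output means no overlap.
shared_pubkeys() {
  { list_pubkeys "$1"; list_pubkeys "$2"; } | sort | uniq -d
}

# Usage (both paths are placeholders):
#   shared_pubkeys /PATH_TO_OLD_KEYS /PATH_TO_NEW_KEYS
```

If the command prints anything, the same validator key exists in both directories, and only one of the two machines should ever be started with it.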
AWS + Infura + Teku Validator Configuration Specs
My Genesis block setup:
Operating System: Ubuntu Server 20.04 LTS (HVM) (Amazon Web Service)
Processor: Intel Xeon Platinum 8000 series processor, 3.1 GHz. (Amazon Web Service)
Memory: 4 virtual cores, 16GB RAM (Amazon Web Service)
Storage: 100GB SSD, 300 / 3000 IOPS (Amazon Web Service)
Ethereum 1.0 client: Infura endpoint (free tier)
Ethereum 2.0 client: Teku
Looking through all the above considerations, I was unsure of the best approach to building a validator setup. For myself, I’d like to pick a machine and generally not worry about changing it for at least two years. This helps with overall validator cost (I can get a significant discount from locking in with a virtual provider for a few years) and I’m not particularly agile with spinning up servers. This future-proofing or “over-spec’ing” approach hopefully makes my life over the next two years a bit easier.
Initially, I was confident AWS was the best virtual platform and it’s the service I’ll use for this post and the next. However, after going through the whole process, I realized AWS might be overkill for the individual developer. AWS’ real strength seems to be its capacity to dynamically scale up to meet demand, which comes at a premium cost. This makes economic sense for a large-scale, enterprise-level project, but the current requirements of an individual Ethereum 2.0 client do not call for such rigor.
I’m going to continue with AWS but am also entertaining the option of running an instance on Digital Ocean, which may be more appropriate for an individual developer. More on that at a later date.
Setup Infura Account
I’m choosing to use Infura as my Ethereum 1.0 endpoint. For now, the beacon chain is watching the Deposit Contract on Ethereum 1.0 to activate new stakers when they have properly deposited 32 ETH and submitted appropriate BLS signatures.
In the future, the beacon chain will start testing further processing by starting to use state information from Ethereum 1.0 to create parallel checkpoints on the beacon chain.
As we mentioned above, there are two main ways to have visibility to the Ethereum 1.0 network. One is to sync and actively maintain an Ethereum 1.0 node, which creates a local Ethereum 1.0 state database. The other is to use a service like Infura, which provides an easy Ethereum 1.0 endpoint, accessible through HTTPS or WebSockets.
Sign up for an account here. Once you have an account, click the Ethereum logo on the left-hand side.
Click “Create New Project” in the upper right-hand corner.
Name your project (mine is “Eth 2 Validator”), go to “Settings,” make sure your network endpoint is set to “Mainnet,” and copy the HTTPS endpoint:
We’ll be using this later for our Teku client start-up command!
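Before wiring the endpoint into Teku, it can be reassuring to check that it answers at all. A rough sketch of how I would do that from a terminal follows: it sends the standard eth_blockNumber JSON-RPC request with curl and converts the hex answer to a decimal block number. The function names are made up for illustration, and the network call only works once you substitute your own endpoint for the placeholder.

```shell
#!/bin/sh
# JSON-RPC request body asking an Eth1 node for its latest block number.
REQUEST='{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}'

# Pull the hex "result" value out of a JSON-RPC response read from stdin.
extract_result() {
  sed -n 's/.*"result"[[:space:]]*:[[:space:]]*"\(0x[0-9a-fA-F]*\)".*/\1/p'
}

# Convert a hex quantity such as 0x10 to decimal.
hex_to_dec() {
  echo $(( $1 ))
}

# Ask the endpoint ($1) for its latest block number. Needs curl and network access.
latest_block_hex() {
  curl -s -X POST -H 'Content-Type: application/json' --data "$REQUEST" "$1" \
    | extract_result
}

# Usage, with the placeholder replaced by your own endpoint:
#   hex_to_dec "$(latest_block_hex INFURA_ETH1_HTTP_ENDPOINT_GOES_HERE)"
hex_to_dec 0x10   # prints 16
```

A block number that keeps climbing when you repeat the call is a good sign that the endpoint is live and pointed at a syncing network.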
Setting up AWS instance
Setting up an EC2 instance on Amazon is straightforward. We’ll make a few tweaks here and there to help our virtual instance play well with Teku.
Create an AWS account (different from an Amazon.com account) and log in to the AWS Console. Go to the EC2 page, which will look something like this:
Click the “Launch Instance” button. You’ll then see the following screen:
This is where we select what machine image (which I think of as an operating system) we would like to use on our virtual instance. I’m selecting Ubuntu Server 20.04, which is a Linux-based Server environment. The Ubuntu Server environment has a few key optimization differences from the Ubuntu Desktop environment. The main difference for our purposes is Ubuntu Server lacks a Graphical User Interface (GUI). Where we’re going, there’s only a command line! Brings me back to my Apple II days.
After selecting our operating system, we then choose our instance type:
I find this menu quite overwhelming, so I’m going to break it down a bit for you. Here we’re selecting the computing power of our machine: its CPU cores and RAM. This is the “raw” processing power and “active memory” of the machine, separate from the storage (hard drive) of a device. Think of it as the engine of our virtual instance, on which we will run our Ubuntu Server operating system. AWS groups these into instance families denoted by the letter / number combination in the far-left column.
The instance families have different hardware arrangements just as different car engines have different configurations of pistons, plugs and fuels to meet different demands. We will focus on two of their “general computation” instance families, the m5 and t3.
I’m selecting the m5.xlarge instance, which provides 4 virtual computing cores (vCPUs) and 16GB RAM. After running Ethereum 2.0 mainnet for a day or so, my machine has not used more than 4% of the available vCPU. As mentioned in the “Future Proofing” section above, the Ethereum 2.0 network demands will only grow. But for the next few months, absent any prolonged major network spikes, I would most likely be fine with an m5.large instance (2 virtual cores / vCPUs, 8GB RAM).
Technical folks more savvy than myself have also recommended the t3.large instance as a reasonable option for Ethereum 2.0. t3.large has 2 vCPUs and 8GB memory, same as m5.large, but the t3 family is built for more “burstable” network activity (spikes in CPU) rather than the m5 family built for consistent CPU loads.
The last thing to mention before we move on to storage is price. AWS is expensive compared to other options like Digital Ocean. You pay for CPU (your instance family) and storage (your hard drive) separately, based on what you use. CPU is paid for by the hour. Since validators have to be online 24 hours a day, you can use the price table below (from December 2020) to make some rough calculations:
These are on-demand prices. AWS does provide something called Reserved Instance pricing, where if you promise to have a virtual instance from a year to three years, you can get up to 50-60% cost reduction on the above price table. (Thanks to Jason Chroman for this tip!)
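To make those rough calculations concrete, here is a small sketch that turns an hourly rate into a yearly bill for an always-on instance and then applies a Reserved Instance style discount. The hourly rate in the example is a hypothetical placeholder, not a quoted AWS price, and the function names are my own; plug in the number from the current price table for your instance type and region. Storage is billed separately and is not included.

```shell
#!/bin/sh
# Yearly cost in USD of an instance billed by the hour and running 24/7.
# $1 = hourly rate in USD (take this from the current AWS price table)
yearly_cost() {
  awk -v rate="$1" 'BEGIN { printf "%.2f\n", rate * 24 * 365 }'
}

# Apply a percentage discount, e.g. for a Reserved Instance.
# $1 = cost in USD, $2 = discount in percent
discounted_cost() {
  awk -v cost="$1" -v pct="$2" 'BEGIN { printf "%.2f\n", cost * (100 - pct) / 100 }'
}

HOURLY_RATE=0.20   # hypothetical placeholder, NOT a real AWS price
on_demand=$(yearly_cost "$HOURLY_RATE")
echo "On-demand per year: $on_demand"
echo "With a 50% discount: $(discounted_cost "$on_demand" 50)"
```

Comparing the result against expected staking rewards is a quick way to see how much of the reward the hosting bill would eat.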
From the EC2 homepage, click the “Reserved Instances” on the left-hand menu, shown below:
Click on “Purchase Reserved Instance”:
In the menu that pops up, put in the instance type details and the amount of time desired to see pricing for (I’m choosing m5.xlarge and a 36-month term):
Click “Search” and see the price options:
There’s a significant price discount, over 50% I believe, but I’m locked in for three years. Once you purchase the Reserved Instance, AWS then applies it to an existing virtual box or will apply it once it is launched. Remember this does NOT include storage space (hard-drive).
Note: I’m not doing this yet, as I’m not yet convinced AWS is the best option for an individual staking one to three Ethereum 2.0 validator nodes. I’m running an instance with on-demand pricing to see how it goes before committing.
Going back to our instance launch process, we’re moving on to the “Add Storage” tab
The brilliant technical people I consulted recommended a storage amount of 100GB General Purpose SSD. Storage is typically not a bottleneck with Eth2 clients. However, this is without running a full Eth1 client. For Eth1 storage, a conservative guesstimate would be about 1TB. Be sure to account for this if you’re not using Infura.
IOPS in the image above stands for input/output operations per second: roughly, how many reads and writes the drive can serve each second. This is a classic bottleneck for full Eth1 node syncing. If you’re looking to sync a full Eth1 client on this machine and you’re having issues, this could be a place to look.
Skipping over “Add Tags,” move on to “Configure Security Group.” These are the different openings created for different kinds of incoming and outgoing communication with the instance.
AWS automatically opens the traditional SSH port (22), as that’s the main way we’ll interact with the instance. Coin Cashew and Somer Esat’s excellent guides both recommend disabling password access for SSH; as we’ll see when we launch the instance, AWS already does this by default and relies on a keypair instead. However, it is good to change your SSH port to a random number between 1024 and 65535. This is to prevent malicious actors from network-scanning the standard SSH port. See how to secure your SSH port generally here and specifically for AWS here.
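If you’d rather not pick the number yourself, here is a minimal sketch for drawing a random port in that range. It assumes the shuf utility from GNU coreutils, which ships with Ubuntu; the function name is made up for illustration.

```shell
#!/bin/sh
# Pick a random non-privileged port for SSH (1024-65535).
random_ssh_port() {
  shuf -i 1024-65535 -n 1
}

port=$(random_ssh_port)
echo "Candidate SSH port: $port"
# To apply it, set these directives in /etc/ssh/sshd_config on the instance
# and restart the SSH service:
#   Port <the number printed above>
#   PasswordAuthentication no
# Allow the new port in the AWS security group BEFORE closing your current
# session, or you may lock yourself out. Draw again if you happen to get
# 9000, which Teku uses for peer-to-peer traffic.
```

Whatever port you end up with, you would then add it to your ssh and scp commands (ssh takes -p, scp takes -P).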
We have to add two security rules to accommodate the Teku client, and both have to do with peer-to-peer communication. Blockchain networks are decentralized in the sense that nodes talk directly to each other. Rather than consulting a central node, an individual node will develop and maintain an understanding of the network state by “gossiping” with many nodes. This means when one client handshakes with another, they swap information about the network. Done enough times with different nodes, information propagates throughout the network. Currently, my Eth2 Validator node has 74 peers with which it’s chatting.
Teku communicates with other nodes on port 9000, so we’ll open that up for both UDP and TCP, two different kinds of communication protocols.
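I set these rules through the console, but the same two rules can also be expressed with the AWS command-line tool. The sketch below only prints the commands rather than running them, so you can review them first. It assumes the AWS CLI is installed and configured; sg-EXAMPLE is a hypothetical security group ID and the function name is made up for illustration.

```shell
#!/bin/sh
# Print (not run) the AWS CLI calls that open a port to everyone
# for both TCP and UDP on the given security group.
# $1 = security group ID, $2 = port number
print_ingress_rules() {
  for proto in tcp udp; do
    echo "aws ec2 authorize-security-group-ingress --group-id $1 --protocol $proto --port $2 --cidr 0.0.0.0/0"
  done
}

# sg-EXAMPLE is a placeholder; use the ID of your instance's security group.
print_ingress_rules sg-EXAMPLE 9000
```

The open 0.0.0.0/0 range is deliberate here: peers can connect from anywhere, which is the point of a peer-to-peer port.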
Afterwards, it should look something like this:
SSH Keys and Instance Launch
Last, go to “Review and Launch,” an overview of the choices made. Once approved, there will be a pop-up menu about SSH keys. I’m not showing mine because it contains sensitive information: namely, the keypair used to authenticate and log in to the virtual instance via SSH (local command line). If you don’t already have a pair, AWS will generate one for you. You must download this and treat it like an Ethereum private key! It’s the only way to connect to your instance and AWS will not save it for you.
Once everything is hunky-dory, this window will appear:
Okay! With that done, let’s move on to accessing and securing our instance, then installing and running Teku!
The main way to access the AWS instance is through SSH, “a cryptographic protocol for operating network services securely over an unsecured network.” As mentioned earlier, AWS by default disables password authentication for accessing the instance. You can only use the keypair generated before the instance launch. The keypair should have a .pem file ending.
AWS provides a clean way to get your SSH command. After clicking on the running instance from the main EC2 page, there’s a button in the upper right-hand corner that says “Connect”:
On the next page will be an SSH command specific to your instance. It will be structured like this:
ssh -i "PATH_TO_AWS_KEYPAIR.pem" ubuntu@INSTANCE_IDENTIFIER.compute-ZONE.amazonaws.com
Entering this command into a terminal will begin the SSH session. The first time, the local machine will ask if you’d like to trust the ECDSA fingerprint provided by AWS. This is to prevent a man-in-the-middle attack and, if concerned, a user can get their instance’s fingerprint following these steps.
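One stumbling block worth knowing about: ssh refuses to use a private key file that other users on your machine can read. Restricting the .pem file to owner-read-only avoids that error. The sketch below demonstrates the permission change on a throwaway file so it is safe to run anywhere; the function name is made up for illustration, and on your own machine you would point it at your downloaded keypair instead.

```shell
#!/bin/sh
# Restrict a private key file so only its owner can read it.
lock_down_key() {
  chmod 400 "$1"
}

# Demonstrate on a throwaway file rather than a real key.
demo_key=$(mktemp)
lock_down_key "$demo_key"
ls -l "$demo_key" | cut -c1-10   # permission string, prints -r--------
rm -f "$demo_key"

# Real usage (path is a placeholder):
#   lock_down_key "PATH_TO_AWS_KEYPAIR.pem"
```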
In a terminal separate from the current SSH session, transfer the validator key files needed to run Teku. In the previous blog post, we walked through staking 32 ETH and obtaining validator keys for Ethereum 2.0. At the end, we were left with this file structure:
We need to transfer the validator_key_info directory to our virtual instance. The secure copy command (scp) allows us to do this securely. Adapt the generic scp command below using the path to the directory above and the host from the previous SSH command:
scp -r -i "PATH_TO_AWS_KEYPAIR.pem" /PATH_TO_KEYS/eth2deposit-cli/validator_key_info/ ubuntu@INSTANCE_IDENTIFIER.compute-ZONE.amazonaws.com:~
(Note the “:~” at the end of the whole command.)
You should see a file transfer occur. If you navigate back to your SSH session and type in ls, you should see the transferred directory.
Now that we have the validator files we need, we’re going to install Teku. First, we have to update existing programs and install the required Java systems:
sudo apt update && sudo apt install default-jre && sudo apt install default-jdk
Double-check that Java installed successfully with:
java -version
Find the latest stable Teku release here. Copy the link address to the tar.gz file, then download it from your SSH session. Here’s what mine looked like; your version will most likely be different:
Decompress the downloaded file with the following command. If you have a different version, substitute that file name for teku-20.11.1.tar.gz:
tar -zxvf teku-20.11.1.tar.gz
For cleanliness’ sake, remove the tar.gz file:
rm teku-20.11.1.tar.gz
After all these steps, here’s what your home directory should look like (Teku version number and contents may be different):
Create a non-root user
This step is copied from Somer Esat’s excellent Ubuntu / Teku tutorial.
We’re going to create a non-root user called teku who can operate Teku. Run the following:
sudo useradd --no-create-home --shell /bin/false teku
We’re going to create a custom data directory for Teku as well, then give the teku user access to it:
sudo mkdir /var/lib/teku && sudo chown -R teku:teku /var/lib/teku
Create systemd service
This step is adapted from Somer Esat’s excellent Ubuntu / Teku tutorial
This step will make a service that will run Teku in the background. It will also allow the machine to automatically restart the service if it stops for some reason. This is a necessary step to make sure our validator runs 24-7.
Create the service file by using the nano text editor:
sudo nano /etc/systemd/system/teku.service
In this file (which should be empty), we’re going to put a series of directives for systemd to follow when we start the service. Here’s the code below; you’ll have to substitute the following items we’ve collected throughout this journey:
- Infura Eth1 HTTP Endpoint
- validator_key_info directory path with two valid key-related files
- Custom data path (/var/lib/teku)
Put those values in the code below, then copy it all into the nano text editor:
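[Unit]
Description=Teku Beacon Node
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
# Run as the non-root teku user created earlier
User=teku
Group=teku
# Restart the service automatically if it stops
Restart=always
RestartSec=5
ExecStart=/home/ubuntu/teku-20.11.1/bin/teku --network=mainnet --eth1-endpoint=INFURA_ETH1_HTTP_ENDPOINT_GOES_HERE --validator-keys=/home/ubuntu/validator_key_info/KEYSTORE-M_123456_789_ABCD.json:/home/ubuntu/validator_key_info/validator_keys/KEYSTORE-M_123456_789_ABCD.txt --rest-api-enabled=true --rest-api-docs-enabled=true --metrics-enabled --validators-keystore-locking-enabled=false --data-base-path=/var/lib/teku

[Install]
WantedBy=multi-user.target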
Press Ctrl-X, then type “Y” and press Enter to save your changes.
We have to tell systemd to reload its configuration so it picks up the new service file:
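Since the service file is built from a template, it is easy to leave one of the placeholder values in by accident. Here is a small, optional sketch for catching that before starting the service. It simply counts the lines that still contain one of the placeholder strings used in this post; the function name is made up for illustration, and the demo runs against a throwaway file.

```shell
#!/bin/sh
# Count the lines in file $1 that still contain a tutorial placeholder.
placeholder_lines() {
  grep -c -e 'GOES_HERE' -e 'KEYSTORE-M_123456_789_ABCD' "$1"
}

# Demo on a throwaway file that still has the Infura placeholder in it.
demo=$(mktemp)
echo 'ExecStart=teku --eth1-endpoint=INFURA_ETH1_HTTP_ENDPOINT_GOES_HERE' > "$demo"
echo "Lines with placeholders: $(placeholder_lines "$demo")"   # prints 1
rm -f "$demo"

# Real usage on the instance:
#   placeholder_lines /etc/systemd/system/teku.service
```

A result of 0 means every placeholder has been replaced; anything else means the file needs another pass in nano.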
sudo systemctl daemon-reload
Start the service:
sudo systemctl start teku
Check to make sure it’s starting okay:
sudo systemctl status teku
If you see any errors, get more details by running:
sudo journalctl -f -u teku.service
You can stop the Teku service by running:
sudo systemctl stop teku
Check the Teku troubleshooting page for common errors or check out the Teku discord, which is monitored by the team.
Once you feel you have things ironed out, enable the service so it starts automatically whenever the machine boots:
sudo systemctl enable teku
There you have it! Things should be cooking along right now. When inspecting the Teku service, you will see a series of logs noting a Sync Event; this is your validator syncing the beacon chain. Once it reaches the head, those logs will change to read Slot Event, and you will also see your attestation performance and block proposals.
On December 1st at 12pm UTC, the Beacon Chain’s first blocks were validated. The first block came from Validator 19026, with the enigmatic graffiti, “Mr F was here.” Twelve seconds later came the next block, graffiti indicating the validator might be located in Zug, Switzerland. The Eth2 Beacon Chain grew steadily, block by block every 12 seconds. Then came the next hurdle: would enough validators be online to finalize the first Epoch? Yes! 82.27% of the validators attested to the validity of the Epoch 0, the proverbial ground floor of the Beacon Chain. You can read more about the Beacon Chain launch, and what’s next, here.
We are now on Epoch 760, which means the Beacon Chain has been running smoothly for a little over three days.
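If you want to translate an epoch number into elapsed time yourself, here is a rough sketch. It relies on the 12-second slot time mentioned above plus the beacon chain’s 32 slots per epoch, a parameter I haven’t covered in this post; the function name is made up for illustration.

```shell
#!/bin/sh
SECONDS_PER_SLOT=12
SLOTS_PER_EPOCH=32   # beacon chain parameter, not stated elsewhere in this post

# Seconds elapsed since genesis at the start of a given epoch.
epoch_to_seconds() {
  echo $(( $1 * SLOTS_PER_EPOCH * SECONDS_PER_SLOT ))
}

secs=$(epoch_to_seconds 760)
echo "Epoch 760 starts $secs seconds after genesis"   # 291840 seconds
echo "That is $(( secs / 3600 )) hours, or $(( secs / 86400 )) full days"   # 81 hours, 3 full days
```

Each epoch lasts 6.4 minutes, so the epoch counter is a handy clock for how long the chain has been live.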
Here’s a shot from my perspective of the genesis moment, using the setup described in this post:
In the next installment, we’ll do a recap of how things are going. I’m going to access the metrics from Teku, discuss the cost of running AWS, and briefly discuss the state of the network.
Resources and links
Thanks to James Beck, Meredith Baxter, Jason Chroman, Aaron Davis, Chaminda Divitotawela, Ben Edgington, The Dark Jester, Somer Esat, Joseph Lubin, Collin Meyers, Nick Nelson, Mara Schmiedt, Adrian Sutton, and Alex Tudorache for support and technical assistance.