Setting up Monitoring for Hyperledger Besu in AWS

DISCLAIMER: In October 2020 PegaSys was renamed to Quorum.
This post was written by PegaSys Protocol Engineer Joshua Fernandes. To learn more about setting up Hyperledger Besu in AWS, read this blog.
This guide is a walkthrough of setting up Prometheus and Grafana on AWS to automatically monitor Besu nodes.
Outcome:
We will be setting up a single EC2 instance that will run Prometheus & Grafana via docker-compose.
An alternative is to set the two up manually and apply the config for each as described below.
You can also put this configuration into something like a Launch Configuration used by an Auto Scaling Group, or run the containers directly via ECS or EKS.
Steps:
1. We use Prometheus' EC2 scraper (EC2 service discovery, configured with ec2_sd_configs), which discovers scrape targets from EC2 instances.
The private IP address is used by default, but you can switch to the public IP with relabelling.
Following best practice, we avoid storing AWS keys in the config and instead first set up an IAM role that the instance will use.
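To preview what the EC2 scraper will discover, you can run the same DescribeInstances lookup with the AWS CLI. This is a minimal sketch, assuming the AWS CLI is installed and configured and that your Besu nodes carry a Name tag starting with besu- (the prefix used in the config in step 7); the function name list_besu_targets is hypothetical.

```shell
# Hypothetical helper: list running instances whose Name tag starts with "besu-",
# printing Name, private IP and public IP (the CLI prints None if there is no public IP).
# Requires the AWS CLI and credentials allowed to call ec2:DescribeInstances.
list_besu_targets() {
  region="${1:-ap-southeast-2}"   # use the same region as in prometheus.yml
  aws ec2 describe-instances \
    --region "$region" \
    --filters "Name=tag:Name,Values=besu-*" "Name=instance-state-name,Values=running" \
    --query 'Reservations[].Instances[].[Tags[?Key==`Name`]|[0].Value, PrivateIpAddress, PublicIpAddress]' \
    --output text
}
```

For example, `list_besu_targets ap-southeast-2`. The second column is the private IP Prometheus scrapes by default; the third is the public IP you could relabel to.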
2. Create an IAM role for the instance by going to the IAM service in the AWS console.
Set the trusted entity (the principal allowed to assume the role) to the AWS service EC2: ec2.amazonaws.com.
Set the IAM policy to AmazonEC2ReadOnlyAccess. This is an AWS managed policy.
Note: if you attach the instance to a load balancer or similar system, you will also need to attach a custom policy.
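If you prefer the AWS CLI to the console, the role can be created as below. This is a sketch assuming the AWS CLI is configured with IAM permissions; the role name besu-prometheus-role, the trust-policy file name and the function name are hypothetical. Unlike the console, the CLI does not create an instance profile for you, so the sketch creates one with the same name as the role.

```shell
# Hypothetical helper: create an EC2 instance role for Prometheus with the
# AWS managed AmazonEC2ReadOnlyAccess policy, plus an instance profile so the
# role can be attached to an instance. Names are placeholders.
create_prometheus_role() {
  role="${1:-besu-prometheus-role}"
  policy_file="${2:-./ec2-trust-policy.json}"

  # Trust policy: allow the EC2 service to assume this role.
  cat > "$policy_file" <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF

  aws iam create-role --role-name "$role" \
    --assume-role-policy-document "file://$policy_file" || return 1
  aws iam attach-role-policy --role-name "$role" \
    --policy-arn arn:aws:iam::aws:policy/AmazonEC2ReadOnlyAccess || return 1
  # The console creates the instance profile automatically; the CLI does not.
  aws iam create-instance-profile --instance-profile-name "$role" || return 1
  aws iam add-role-to-instance-profile --instance-profile-name "$role" --role-name "$role"
}
```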

3. Create a security group that allows inbound access on port 3000 (Grafana) and port 9090 (Prometheus) from your CIDR range, e.g. 0.0.0.0/0. Ideally, lock this down to selected trusted IPs.
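An equivalent AWS CLI sketch, assuming the monitoring instance lives in an existing VPC; the group name besu-monitoring and the function name are hypothetical. The optional third argument is an assumption beyond the original steps: Prometheus must reach each Besu node on its metrics port (9545, see step 11), so if the nodes use their own security group, that group needs an inbound rule from this one.

```shell
# Hypothetical helper: create the monitoring security group, open 3000 and 9090
# to one CIDR range, and optionally let it reach Besu metrics (9545) on the
# nodes' security group. Prints the new group ID.
create_monitoring_sg() {
  vpc_id="$1"
  cidr="$2"        # prefer a narrow range over 0.0.0.0/0
  besu_sg="$3"     # optional: security group of the Besu nodes

  sg_id="$(aws ec2 create-security-group \
    --group-name besu-monitoring \
    --description "Prometheus and Grafana for Besu" \
    --vpc-id "$vpc_id" \
    --query GroupId --output text)" || return 1

  for port in 3000 9090; do
    aws ec2 authorize-security-group-ingress \
      --group-id "$sg_id" --protocol tcp --port "$port" --cidr "$cidr" >/dev/null || return 1
  done

  if [ -n "$besu_sg" ]; then
    aws ec2 authorize-security-group-ingress \
      --group-id "$besu_sg" --protocol tcp --port 9545 \
      --source-group "$sg_id" >/dev/null || return 1
  fi
  echo "$sg_id"
}
```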

4. Create an instance using the Amazon Linux 2 AMI, the Ubuntu 18.04 AMI or equivalent. We will use the Ubuntu AMI for the rest of this tutorial. For this example, we use an instance type of t3.small and set the volume size to 50GB. Select the IAM role and security group from steps 2 and 3. Ensure that the instance gets a public IP or comes up in a public subnet.
Note: In a real production setup, use a large second volume that will persist if the instance/AZ fails. The size will depend on the scrape interval, the number of nodes and the retention period (Prometheus' default is 15 days).
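The same launch with the AWS CLI might look like this. It is a sketch: the AMI ID, key pair, subnet and security group are placeholders, /dev/sda1 is assumed to be the root device name of the Ubuntu AMI, and the function name is hypothetical. The Name tag is deliberately not prefixed with besu-, so the keep rule in step 7 will not treat the monitoring instance as a Besu node.

```shell
# Hypothetical helper: launch the monitoring instance described in step 4.
# Prints the new instance ID.
launch_monitoring_instance() {
  ami_id="$1"; key_name="$2"; subnet_id="$3"; sg_id="$4"
  profile="${5:-besu-prometheus-role}"   # instance profile for the IAM role from step 2
  aws ec2 run-instances \
    --image-id "$ami_id" \
    --instance-type t3.small \
    --key-name "$key_name" \
    --subnet-id "$subnet_id" \
    --security-group-ids "$sg_id" \
    --associate-public-ip-address \
    --iam-instance-profile "Name=$profile" \
    --block-device-mappings '[{"DeviceName":"/dev/sda1","Ebs":{"VolumeSize":50,"VolumeType":"gp2"}}]' \
    --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=monitoring}]' \
    --query 'Instances[0].InstanceId' --output text
}
```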
5. Once the instance is up, ssh into the machine and install packages:
sudo apt-get update && sudo apt-get install -y apparmor apt-transport-https ca-certificates curl build-essential

# install docker
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo sh -c 'echo "deb [arch=amd64] https://download.docker.com/linux/ubuntu bionic stable" > /etc/apt/sources.list.d/docker.list'
sudo apt-get update && sudo apt-get -y install docker-ce
# lets you run docker without sudo; log out and back in for this to take effect
sudo usermod -aG docker $USER

# install docker-compose
sudo curl -L https://github.com/docker/compose/releases/download/1.24.1/docker-compose-`uname -s`-`uname -m` -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose
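Before continuing, you can confirm the tools are on your PATH. This small helper is hypothetical, not part of any package.

```shell
# Hypothetical helper: report any required command that is missing from PATH.
# Returns non-zero if at least one is missing.
require_cmds() {
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

After logging back in, `require_cmds docker docker-compose && docker info >/dev/null` should succeed without sudo.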
6. Create the following folder structure to store the data and config for Prometheus and Grafana. Use any location that works for you; we will use /data.
sudo mkdir -p /data/prometheus/config && sudo mkdir -p /data/prometheus/data
sudo mkdir -p /data/grafana/config && sudo mkdir -p /data/grafana/data
# 472:472 is the user:group the Grafana container runs as
sudo mkdir -p /var/log/grafana/ && sudo chown -R 472:472 /var/log/grafana/
sudo chown -R $USER:$USER /data
7. Create the Prometheus config file prometheus.yml in /data/prometheus/config/ with the contents below.
Note:
- The relabel config below sets the instance label to the instance's Name tag instead of its private IP. Customize this to suit your requirements.
- Update the region parameter to the region you are using.
- Update the regex parameter to match the Name-tag prefix you use for your Besu nodes.
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.

# Alertmanager configuration
alerting:

# Load rules once and evaluate them according to the global 'evaluation_interval'.
rule_files:

scrape_configs:
  - job_name: prometheus
    scrape_interval: 15s
    scrape_timeout: 10s
    metrics_path: /metrics
    scheme: http
    static_configs:
      - targets: [ localhost:9090 ]

  - job_name: besu-nodes
    scrape_interval: 15s
    scrape_timeout: 10s
    metrics_path: /metrics
    scheme: http
    ec2_sd_configs:
      - region: ap-southeast-2
        port: 9545
    relabel_configs:
      - source_labels: [__meta_ec2_tag_Name]
        regex: besu-.*
        action: keep
      # Use the name as the instance label
      - source_labels: [__meta_ec2_tag_Name]
        target_label: instance
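To catch YAML or config mistakes before starting the stack, promtool (shipped inside the Prometheus image) can check the file. A sketch, assuming Docker is installed and using the same image tag as the docker-compose.yml in step 8; the function name is hypothetical.

```shell
# Hypothetical helper: validate prometheus.yml with the promtool binary from
# the Prometheus image, mounting the config directory read-only.
check_prometheus_config() {
  config_dir="${1:-/data/prometheus/config}"
  docker run --rm \
    -v "$config_dir":/config:ro \
    --entrypoint promtool \
    registry.hub.docker.com/prom/prometheus:v2.10.0 \
    check config /config/prometheus.yml
}
```

Keep in mind that Prometheus relabel regexes are anchored, so besu-.* has to match the whole Name tag.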
8. Create the following docker-compose.yml file in /data.
Note:
Update the AWS_REGION environment variables to the region you are using.
Grafana supports a number of authentication mechanisms for secure login. Select the one that works best for your needs. The example below uses a simple admin/password scheme, where the password is the value of the GF_SECURITY_ADMIN_PASSWORD environment variable:
---
version: "3.4"
services:
  prometheus:
    container_name: prometheus
    labels:
      service_name: prometheus
    image: "registry.hub.docker.com/prom/prometheus:v2.10.0"
    restart: always
    network_mode: "host"
    ports:
      - "9090:9090"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - /data/prometheus/config/prometheus.yml:/etc/prometheus/prometheus.yml
      - /data/prometheus/data/:/prometheus
    command:
      - '--log.format=json'
      - '--config.file=/etc/prometheus/prometheus.yml'
    logging:
      driver: "json-file"
      options:
        max-size: "200k"
        max-file: "10"
    environment:
      AWS_REGION: "ap-southeast-2"
    deploy:
      resources:
        limits:
          cpus: '0.50'
          memory: 1024M
        reservations:
          cpus: '0.25'
          memory: 512M
    cap_add:
      - NET_ADMIN

  grafana:
    container_name: grafana
    labels:
      service_name: grafana
    image: "registry.hub.docker.com/grafana/grafana:6.2.4"
    restart: always
    network_mode: "host"
    ports:
      - "3000:3000"
    volumes:
      - /data/grafana/data/:/var/lib/grafana/
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - /var/log/grafana/:/var/log/grafana/
    environment:
      AWS_REGION: "ap-southeast-2"
      GF_SECURITY_ADMIN_PASSWORD: "SuperSecretPassword"
      GF_LOG_MODE: "file"
    deploy:
      resources:
        limits:
          cpus: '0.50'
          memory: 1024M
        reservations:
          cpus: '0.25'
          memory: 512M
    cap_add:
      - NET_ADMIN
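docker-compose can parse and validate the file without starting anything. A sketch; the function name is hypothetical. Note that the deploy.resources limits are a Swarm feature: plain docker-compose up ignores them unless you run docker-compose with the --compatibility flag (available since Compose 1.20).

```shell
# Hypothetical helper: check that docker-compose.yml parses. With --quiet,
# docker-compose prints nothing on success and the error otherwise.
check_compose_file() {
  compose_file="${1:-/data/docker-compose.yml}"
  docker-compose -f "$compose_file" config --quiet
}
```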
9. Update permissions of the folder so the containers can read and write.
sudo chown -R $USER:$USER /data
sudo chmod -R 777 /data/grafana /data/prometheus
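chmod 777 works but gives every local user write access. A tighter alternative is to hand each data directory to the user its container runs as. This sketch assumes the prom/prometheus image runs as nobody (UID 65534) and Grafana 6.x as UID 472 (the same ID used in step 6); verify both for the image versions you run. The function name is hypothetical.

```shell
# Hypothetical alternative to chmod 777: give each data directory to the UID its
# container runs as, and remove group/other write access. Must run as root.
set_monitoring_permissions() {
  base="${1:-/data}"
  chown -R 65534:65534 "$base/prometheus/data" || return 1   # prom/prometheus: nobody
  chown -R 472:472 "$base/grafana/data" || return 1          # grafana/grafana 6.x
  chmod -R u+rwX,go-w "$base/prometheus/data" "$base/grafana/data"
}
```

Paste it into a root shell (for example after `sudo -i`) and run `set_monitoring_permissions /data`.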
10. Start docker-compose
cd /data && docker-compose up -d
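To confirm both services came up, you can poll their HTTP endpoints. This sketch assumes curl is installed; wait_for_http is a hypothetical helper. Prometheus exposes a /-/ready endpoint, and for Grafana the sketch simply waits for the login page.

```shell
# Hypothetical helper: poll a URL every 2 seconds until it returns HTTP 200,
# giving up after the given number of tries (default 30).
wait_for_http() {
  url="$1"
  tries="${2:-30}"
  i=0
  code=""
  while [ "$i" -lt "$tries" ]; do
    code="$(curl -s -o /dev/null -w '%{http_code}' "$url")"
    if [ "$code" = "200" ]; then
      echo "up: $url"
      return 0
    fi
    i=$((i + 1))
    sleep 2
  done
  echo "timed out: $url (last status: $code)" >&2
  return 1
}
```

For example, `wait_for_http http://localhost:9090/-/ready && wait_for_http http://localhost:3000/login`. Once Besu nodes are running, Status -> Targets in the Prometheus UI (or http://localhost:9090/api/v1/targets) lists what the besu-nodes job discovered.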
11. Add the following options to your Besu startup command to enable metrics collection:
--metrics-enabled --metrics-host=0.0.0.0 --metrics-port=9545
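From the monitoring instance, you can check that a node actually serves metrics before looking for it in Prometheus. A sketch, assuming curl and the node's private IP; check_besu_metrics is hypothetical and only counts the # TYPE lines of the Prometheus text format.

```shell
# Hypothetical helper: fetch http://HOST:PORT/metrics and report how many metric
# families (lines starting with "# TYPE") it exposes. Fails if there are none.
check_besu_metrics() {
  host="$1"
  port="${2:-9545}"   # the value passed to --metrics-port
  count="$(curl -s "http://$host:$port/metrics" | grep -c '^# TYPE')"
  if [ "$count" -gt 0 ]; then
    echo "$host:$port exposes $count metric families"
  else
    echo "no metrics from $host:$port" >&2
    return 1
  fi
}
```

For example, `check_besu_metrics 10.0.1.5`. If this times out, check that the node's security group allows port 9545 from the monitoring instance.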
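12. Set up the Grafana datasource
Once compose has started both services, go to http://<IP>:3000 and log in with the credentials from above (admin / SuperSecretPassword).
Select Add data source -> Prometheus and enter the details. Because both containers use host networking, the URL is http://localhost:9090 with Access set to Server (default). Click Save & Test.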
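The datasource can also be created through Grafana's HTTP API, which is handy for scripting. A sketch, assuming the admin credentials from step 8; the function name is hypothetical. In the API, access "proxy" corresponds to Server access in the UI.

```shell
# Hypothetical helper: create the Prometheus datasource via POST /api/datasources
# using basic auth as the Grafana admin user.
add_prometheus_datasource() {
  admin_password="$1"
  grafana_url="${2:-http://localhost:3000}"
  curl -s -u "admin:$admin_password" \
    -H 'Content-Type: application/json' \
    -X POST "$grafana_url/api/datasources" \
    -d '{"name":"Prometheus","type":"prometheus","url":"http://localhost:9090","access":"proxy","isDefault":true}'
}
```

For example, `add_prometheus_datasource SuperSecretPassword`.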

13. Set up the Besu dashboard
https://grafana.com/grafana/dashboards/10273
Select Dashboards -> Manage -> Import, enter 10273 in the Grafana.com Dashboard field and click Load.
Select the Prometheus datasource you just created and click Import.

This takes you to the Besu dashboard, which shows any Besu nodes that have metrics enabled.
14. Preconfiguring Grafana with the Prometheus datasource and the Besu dashboard: this can be done by saving the datasource YAML and dashboard JSON in a provisioning folder that is mounted into the container at runtime. Please refer to the official Grafana documentation for details.
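As a starting point for provisioning, this sketch writes a datasource file and a file-based dashboard provider. It assumes you add a volume such as /data/grafana/config/provisioning:/etc/grafana/provisioning to the grafana service in docker-compose.yml; the file names, the provider name besu and the function name are hypothetical. Because /data/grafana/data is mounted at /var/lib/grafana, a dashboard JSON saved as, for example, /data/grafana/data/dashboards/besu.json is visible at the provider path.

```shell
# Hypothetical helper: write Grafana provisioning files for the Prometheus
# datasource and a file-based dashboard provider into the given directory.
write_grafana_provisioning() {
  out="${1:-/data/grafana/config/provisioning}"
  mkdir -p "$out/datasources" "$out/dashboards" || return 1

  cat > "$out/datasources/prometheus.yml" <<'EOF'
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true
EOF

  # Grafana loads any dashboard JSON found under options.path.
  cat > "$out/dashboards/besu.yml" <<'EOF'
apiVersion: 1
providers:
  - name: besu
    orgId: 1
    folder: ''
    type: file
    options:
      path: /var/lib/grafana/dashboards
EOF
}
```

Dashboards downloaded from grafana.com for import may reference the datasource as ${DS_PROMETHEUS}; file provisioning does not fill in such inputs, so you may need to replace them with the datasource name (Prometheus) in the JSON.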
15. Logrotate for Grafana (optional but recommended)
We recommend doing this for Besu too, if it’s not already done.
Create the config file /etc/logrotate.d/grafana with the contents below:
/var/log/grafana/*.log {
    rotate 7
    missingok
    daily
    compress
    copytruncate
}
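To install the rule and confirm logrotate parses it, here is a sketch (function name hypothetical; run it as root, since it writes to /etc/logrotate.d). logrotate -d runs in debug mode and only reports what it would do. copytruncate copies the log and truncates the original in place, so Grafana keeps writing to the same file without needing a signal.

```shell
# Hypothetical helper: write the Grafana logrotate rule and, if logrotate is
# installed, dry-run it in debug mode (nothing is actually rotated).
install_grafana_logrotate() {
  conf="${1:-/etc/logrotate.d/grafana}"
  cat > "$conf" <<'EOF'
/var/log/grafana/*.log {
    rotate 7
    missingok
    daily
    compress
    copytruncate
}
EOF
  if command -v logrotate >/dev/null 2>&1; then
    logrotate -d "$conf"
  fi
}
```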
16. Kubernetes SD config
For Kubernetes-based setups, please use the Kubernetes scraper (kubernetes_sd_configs) instead. For a working example, please see our besu-kubernetes repo.
17. Additional system metrics, e.g. disk space, CPU etc. (optional but recommended)
You can also add system metrics to this setup so you capture much more information and can alert on it. Install the Prometheus node exporter and add an extra job to scrape_configs in the prometheus.yml from step 7.
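One way to do this is to run node_exporter in Docker on each host you want to monitor and add a matching job. This sketch adapts the Docker command from the node_exporter README; the container name, the job name node and the function names are hypothetical, and it assumes port 9100 (node_exporter's default) is open from the monitoring instance.

```shell
# Hypothetical helper: run node_exporter on this host with host networking and
# the host root filesystem mounted read-only at /host.
run_node_exporter() {
  docker run -d --name node-exporter --restart always \
    --net=host --pid=host \
    -v "/:/host:ro,rslave" \
    quay.io/prometheus/node-exporter:latest \
    --path.rootfs=/host
}

# Hypothetical helper: print a scrape job, indented to sit under scrape_configs,
# that discovers EC2 instances by Name-tag prefix and scrapes port 9100.
node_exporter_scrape_job() {
  region="${1:-ap-southeast-2}"
  regex="${2:-besu-.*}"
  cat <<EOF
  - job_name: node
    ec2_sd_configs:
      - region: $region
        port: 9100
    relabel_configs:
      - source_labels: [__meta_ec2_tag_Name]
        regex: $regex
        action: keep
      - source_labels: [__meta_ec2_tag_Name]
        target_label: instance
EOF
}
```

Append the output of `node_exporter_scrape_job ap-southeast-2` to prometheus.yml, then restart Prometheus with `docker restart prometheus`.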
18. Load balancer & HTTPS config
Create an Application Load Balancer and target groups. Use ACM (AWS Certificate Manager) to generate SSL/TLS certificates.
Register the instance with the target groups of whichever services you wish to expose. Generally this is only Grafana, with some auth mechanism, while Prometheus is kept internal only.
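A CLI sketch of the Grafana-only setup, assuming an issued ACM certificate (for example requested with `aws acm request-certificate --domain-name grafana.example.com --validation-method DNS` and validated), an ALB security group that allows 443, and an instance security group that allows 3000 from the ALB. All names are hypothetical placeholders.

```shell
# Hypothetical helper: expose Grafana (port 3000) through an HTTPS ALB.
# Prints the listener ARN.
expose_grafana_via_alb() {
  vpc_id="$1"; subnets="$2"; alb_sg="$3"; instance_id="$4"; cert_arn="$5"

  tg_arn="$(aws elbv2 create-target-group --name grafana-tg \
    --protocol HTTP --port 3000 --vpc-id "$vpc_id" \
    --health-check-path /login \
    --query 'TargetGroups[0].TargetGroupArn' --output text)" || return 1

  aws elbv2 register-targets --target-group-arn "$tg_arn" \
    --targets "Id=$instance_id" >/dev/null || return 1

  # $subnets is a space-separated list (at least two AZs), left unquoted on purpose
  # so each subnet becomes its own argument.
  lb_arn="$(aws elbv2 create-load-balancer --name grafana-alb \
    --subnets $subnets --security-groups "$alb_sg" \
    --query 'LoadBalancers[0].LoadBalancerArn' --output text)" || return 1

  aws elbv2 create-listener --load-balancer-arn "$lb_arn" \
    --protocol HTTPS --port 443 \
    --certificates "CertificateArn=$cert_arn" \
    --default-actions "Type=forward,TargetGroupArn=$tg_arn" \
    --query 'Listeners[0].ListenerArn' --output text
}
```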