web3-infra
Sizing and Isolating Archive/RPC Nodes on AWS
A field note on adding several archive/RPC nodes with narrow Terraform plans, separate security groups, explicit disk sizing, and private RPC boundaries.
The goal was simple: add archive/RPC capacity without letting Terraform touch unrelated running hosts. That boundary drove most of the work. The safer pattern is to enable only the intended node definitions, give each node its own security group, size the data volumes explicitly, and keep any already-tuned high-spec node as an exception rather than a default template.
Terms used here
| Term | Meaning |
|---|---|
| Archive node | A node that keeps enough historical state for old blocks, receipts, traces, or balances to be queried. |
| RPC | The HTTP API used by applications and tools to query node state. On EVM chains this is usually JSON-RPC. |
| P2P | The peer-to-peer network port a node uses to discover and exchange data with other nodes. |
| Security group | AWS firewall rules attached to an instance or network interface. |
| EBS | AWS block storage attached to an EC2 instance. The data volume is where the node keeps its chain database. |
| gp3 | An EBS volume type where size, IOPS, and throughput can be provisioned separately. |
| Terraform plan | The preview of what Terraform will create, update, or replace before applying infrastructure changes. |
What changed
The node profiles looked like this:
| Profile | Client direction | Instance class | Data volume |
|---|---|---|---|
| Large EVM archive node | Erigon archive node | r7i.2xlarge | 4096 GiB |
| Medium EVM archive node | Erigon plus consensus client where required | r7i.2xlarge | 2048 GiB |
| Substrate-style RPC node | Chain client with archive/RPC workload | r7i.2xlarge | 4096 GiB |
| Additional Substrate-style RPC node | Same client family, separate network profile | r7i.2xlarge | 4096 GiB |
These profiles kept gp3 data volumes with explicitly provisioned IOPS and throughput. The disk sizes were deliberately above the current published archive-size references, because blockchain databases do not fail gracefully when the disk becomes tight. The spare room is part of the operating budget, not wasted space.
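The headroom argument is easy to make concrete with a little arithmetic. This is a minimal sketch, assuming you know the chain's published archive-size reference and an observed daily growth rate. The 80% alert threshold and the sample numbers are hypothetical, not figures from this rollout.

```python
def headroom_fraction(volume_gib: float, reference_gib: float) -> float:
    """Share of the volume still free if the database matched the reference size."""
    return 1.0 - reference_gib / volume_gib


def days_until_threshold(volume_gib: float, used_gib: float,
                         growth_gib_per_day: float,
                         threshold: float = 0.8) -> float:
    """Days until usage crosses `threshold` (a fraction of the volume).

    Returns 0.0 if usage is already past the threshold and infinity if the
    database is not growing.
    """
    limit_gib = volume_gib * threshold
    if used_gib >= limit_gib:
        return 0.0
    if growth_gib_per_day <= 0:
        return float("inf")
    return (limit_gib - used_gib) / growth_gib_per_day


if __name__ == "__main__":
    # Hypothetical inputs: 4096 GiB volume, 2600 GiB reference, +5 GiB/day.
    print(f"headroom at reference size: {headroom_fraction(4096, 2600):.0%}")
    print(f"days until 80% full: {days_until_threshold(4096, 2600, 5):.0f}")
```

With those hypothetical inputs the script reports 37% headroom and about 135 days before an 80% alert would fire.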
An already-tuned node should stay on its larger setup unless there is a separate reason to change it. That node had shown different resource pressure from hardfork handling, client behavior, and storage throughput. Using the most expensive host as the default baseline would have raised every new node’s cost before there was any evidence that the new nodes needed it.
Network boundary
The security group change mattered as much as the instance change. Each new node got its own security group. P2P stayed public where the chain requires public peer discovery, while RPC and metrics stayed private.
The rule I used was:
| Traffic | Boundary |
|---|---|
| P2P | Open to the internet on the chain-specific P2P port. |
| RPC | Available only from trusted private network ranges or explicitly approved sources. |
| Metrics | Scraped by the monitor host, not exposed publicly. |
| SSH | Managed separately from application ports. |
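The boundary table can also be checked mechanically before the rules reach AWS. This is a minimal sketch that treats ingress rules as plain data (a hypothetical shape, not the EC2 API): P2P may be public, RPC and metrics must come from RFC 1918 ranges or an explicit allowlist, and SSH is flagged because it belongs in a separately managed group. It only understands IPv4 private ranges, and the port numbers in the example are illustrative defaults.

```python
from dataclasses import dataclass
from ipaddress import ip_network

# RFC 1918 private IPv4 ranges, treated here as "trusted private network ranges".
PRIVATE_V4 = [ip_network(c) for c in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


@dataclass(frozen=True)
class IngressRule:
    port: int
    cidr: str
    purpose: str  # "p2p", "rpc", "metrics", or "ssh"


def is_trusted_source(cidr: str, approved: frozenset = frozenset()) -> bool:
    """True if `cidr` is an explicitly approved source or inside an RFC 1918 range."""
    net = ip_network(cidr, strict=False)
    if str(net) in approved:
        return True
    if net.version != 4:
        return False  # this sketch only knows the IPv4 private ranges
    return any(net.subnet_of(private) for private in PRIVATE_V4)


def boundary_violations(rules: list, approved: frozenset = frozenset()) -> list:
    """Return problems; an empty list means the rules match the boundary table."""
    problems = []
    for rule in rules:
        if rule.purpose == "p2p":
            continue  # public P2P is expected where the chain needs peer discovery
        if rule.purpose == "ssh":
            problems.append(f"port {rule.port}: SSH should live in its own managed group")
        elif not is_trusted_source(rule.cidr, approved):
            problems.append(f"port {rule.port} ({rule.purpose}) open to {rule.cidr}")
    return problems


if __name__ == "__main__":
    rules = [
        IngressRule(30303, "0.0.0.0/0", "p2p"),     # illustrative EVM P2P port
        IngressRule(8545, "10.0.0.0/16", "rpc"),    # illustrative VPC range
        IngressRule(6060, "0.0.0.0/0", "metrics"),  # deliberate mistake: public metrics
    ]
    for problem in boundary_violations(rules):
        print(problem)
```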
For some EVM nodes, the client process itself binds RPC and metrics to 127.0.0.1. That is still compatible with access from another EC2 instance, as long as the request is routed through an approved local proxy, tunnel, or host-level forwarding path. Opening the security group is not enough when the service is bound to loopback, and binding the service to every interface is rarely the right first move.
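The loopback point comes down to which address the listener is bound to, something no security group rule can change. A small sketch of that classification, assuming the bind address is read from the client's configuration; the function name and category labels are hypothetical.

```python
from ipaddress import ip_address


def classify_bind(bind_host: str) -> str:
    """Say who can reach a listener, based only on its bind address."""
    if bind_host == "localhost":
        return "loopback"
    addr = ip_address(bind_host)
    if addr.is_loopback:
        # Only local processes (a proxy, tunnel, or forwarder) can connect.
        return "loopback"
    if addr.is_unspecified:
        # 0.0.0.0 or :: listens on every interface; the firewall is the only boundary.
        return "all-interfaces"
    # Reachable through the one interface that owns this address.
    return "single-interface"
```

A "loopback" result means a security group change alone will not make the port reachable from the monitor host; some local forwarding path is still required.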
Client versions
The Erigon-based nodes were pinned to upstream Erigon v3.4.4. Archive mode remained explicit with --prune.mode=archive.

For Substrate-style nodes, the client image was pinned instead of using a floating latest tag. Floating client versions are convenient until a restart pulls a different binary than the one that was tested. For archive/RPC infrastructure, version drift should be a deliberate change.
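Floating references are easy to catch before deploy. A minimal sketch, assuming "pinned" means an explicit non-latest tag or a sha256 digest. A digest is the stricter pin, since a tag can be re-pushed upstream.

```python
def image_pin_status(ref: str) -> str:
    """Return "digest", "tag", or "floating" for a container image reference."""
    if "@sha256:" in ref:
        return "digest"
    # A tag can only appear in the last path component, so a registry port
    # such as "registry.local:5000/node" is not mistaken for a tag.
    last_component = ref.rsplit("/", 1)[-1]
    if ":" not in last_component:
        return "floating"  # no tag: the runtime resolves the implicit "latest"
    tag = last_component.rsplit(":", 1)[1]
    return "floating" if tag == "latest" else "tag"
```

For example, an illustrative reference such as erigontech/erigon:v3.4.4 reports "tag", while a bare example/substrate-node or example/substrate-node:latest reports "floating".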
Some EVM networks need one extra piece: a consensus client. An Erigon execution client alone is not enough for those post-merge networks. The execution and consensus clients authenticate their local Engine API connection with a shared JWT secret. A file-based secret is safer than putting it directly into a container command line, because command-line arguments are easy to expose through process and container inspection.
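Generating that shared secret as a file can be sketched in a few lines. The sketch assumes the common Engine API secret format (32 random bytes, hex-encoded) and a hypothetical path. Both clients would then be pointed at the same file, for example through a read-only mount, rather than receiving the secret as an argument.

```python
import os
import secrets


def write_jwt_secret(path: str) -> None:
    """Create `path` with owner-only permissions holding a fresh 32-byte hex secret.

    O_EXCL makes this fail with FileExistsError instead of silently rotating a
    secret that a running client may already be using.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(secrets.token_hex(32))  # 32 random bytes -> 64 hex characters
```

Running it once at provisioning time, with a hypothetical path such as /data/jwt.hex, keeps the secret out of docker inspect output and process listings.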
Keeping the plan small
The Terraform side was checked with a targeted plan before applying. The expected shape was:
| Expected | Avoided |
|---|---|
| Intended EC2 nodes | Replacing out-of-scope nodes |
| Data volumes for the intended nodes | Modifying unrelated databases or application resources |
| Node-specific security groups | Reusing a broad open security group |
| Monitor rules where needed | Accidental global network changes |
That review matters more than the exact command. The plan has to say the same thing as the change request. If it wants to replace a running node that was out of scope, the right answer is to stop and reduce the plan, not to apply and clean up later.
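Part of that comparison can be automated by reading the saved plan as JSON. This sketch assumes the plan was saved with terraform plan (using -target for the intended resources and -out=plan.out) and exported with terraform show -json plan.out. The module name in the allowlist is hypothetical. It flags anything outside the intended addresses and any in-scope delete or replace, so a human still reads the plan but does not have to spot a stray replacement by eye.

```python
import json


def in_scope(address: str, prefixes: tuple) -> bool:
    """True if `address` is one of the prefixes or a resource nested under one."""
    return any(
        address == p or address.startswith(p + ".") or address.startswith(p + "[")
        for p in prefixes
    )


def plan_problems(plan: dict, prefixes: tuple) -> list:
    """List planned changes that are out of scope, or that delete something in scope."""
    problems = []
    for rc in plan.get("resource_changes", []):
        address = rc["address"]
        actions = rc["change"]["actions"]
        if actions in (["no-op"], ["read"]):
            continue  # nothing will change for this resource
        if not in_scope(address, prefixes):
            problems.append(f"out of scope: {address} {actions}")
        elif "delete" in actions:
            # ["delete", "create"] or ["create", "delete"] means a replacement.
            problems.append(f"destroys or replaces: {address} {actions}")
    return problems


def check_plan_file(path: str, prefixes: tuple) -> list:
    """Load the JSON written by terraform show -json and check it."""
    with open(path) as f:
        return plan_problems(json.load(f), prefixes)
```

Calling check_plan_file("plan.json", ("module.archive_nodes",)) with that hypothetical module name should return an empty list before anyone runs apply; any entry is a reason to stop and narrow the plan.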
Verification boundary
After boot, the first checks were intentionally basic:
```sh
docker ps
df -h /data
curl -s http://127.0.0.1:8545
```
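The df -h /data check is easy to script for a watchdog or cron job. A minimal sketch, assuming the same /data mount; the 80% warning threshold is an assumed value.

```python
import shutil


def disk_usage_fraction(path: str = "/data") -> float:
    """Used share of the filesystem holding `path`, computed like df's Use%.

    df divides by used + available (not total), so blocks reserved for
    root do not make the disk look emptier than it is.
    """
    usage = shutil.disk_usage(path)
    denominator = usage.used + usage.free
    return usage.used / denominator if denominator else 0.0


def disk_status(path: str = "/data", warn_at: float = 0.8) -> str:
    """Return "warn" once usage reaches `warn_at` (an assumed threshold), else "ok"."""
    return "warn" if disk_usage_fraction(path) >= warn_at else "ok"
```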
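For EVM nodes, the useful RPC checks were these JSON-RPC request bodies. The trace_block call asks for an old block (0xf4240 is block 1,000,000), so it exercises historical data rather than only the chain head:

```json
{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}
{"jsonrpc":"2.0","method":"eth_syncing","params":[],"id":1}
{"jsonrpc":"2.0","method":"trace_block","params":["0xf4240"],"id":1}
```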
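The same request bodies can be sent and summarized from a script. This sketch assumes the endpoint is reachable at http://127.0.0.1:8545 from where it runs (directly or through the forwarding path described earlier) and that the trace namespace is enabled, since trace_block is not part of the standard eth namespace. The helper names are hypothetical.

```python
import json
import urllib.request

RPC_URL = "http://127.0.0.1:8545"  # assumption: local or forwarded RPC endpoint


def rpc_call(method: str, params: list, url: str = RPC_URL, timeout: float = 10.0):
    """POST one JSON-RPC 2.0 request and return its result, raising on an error reply."""
    body = json.dumps({"jsonrpc": "2.0", "method": method, "params": params, "id": 1}).encode()
    request = urllib.request.Request(url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        reply = json.load(response)
    if "error" in reply:
        raise RuntimeError(f"{method} failed: {reply['error']}")
    return reply.get("result")


def describe_syncing(result) -> str:
    """eth_syncing returns false when idle, otherwise an object with hex block numbers."""
    if result is False:
        # Can mean caught up; compare eth_blockNumber with a reference to be sure.
        return "not syncing"
    current = int(result["currentBlock"], 16)
    highest = int(result["highestBlock"], 16)
    return f"syncing: {current} of {highest}"


def run_checks(url: str = RPC_URL) -> None:
    """Print the three checks; trace_block needs the trace namespace enabled."""
    print("head:", int(rpc_call("eth_blockNumber", [], url), 16))
    print(describe_syncing(rpc_call("eth_syncing", [], url)))
    traces = rpc_call("trace_block", [hex(1_000_000)], url)  # hex(1_000_000) == "0xf4240"
    print("traces in block 1000000:", len(traces or []))
```

On a node that has not yet imported block 1,000,000, the trace call is expected to fail or return nothing, which says more about sync progress than about archive configuration.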
For Substrate-style nodes, the first useful signals were block height movement, peer count, and whether both parachain and relay chain logs were moving.
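Two of those Substrate signals map onto standard node RPC methods: system_health (peer count) and chain_getHeader (hex block number). The sketch assumes both were sampled a short time apart from the same endpoint and only evaluates the samples; fetching them is an ordinary JSON-RPC POST. The one-peer minimum is an assumed default. Checking the relay chain side the same way requires its RPC to be exposed, which is deployment-specific.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    height: int
    peers: int


def to_sample(health: dict, header: dict) -> Sample:
    """Combine a system_health result and a chain_getHeader result."""
    return Sample(height=int(header["number"], 16), peers=health["peers"])


def progress_warnings(before: Sample, after: Sample, min_peers: int = 1) -> list:
    """Empty list: height moved between samples and the node still has peers."""
    warnings = []
    if after.height <= before.height:
        warnings.append(f"height stuck at {after.height}")
    if after.peers < min_peers:
        warnings.append(f"only {after.peers} peers")
    return warnings
```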
The final state was not “all nodes are synced”. The honest state was “the nodes are created, clients are running, history mode is configured where needed, and sync is progressing”. For archive nodes, that distinction matters. A node can be correctly configured for archive mode and still need days to reach the current head.
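“Days to reach the current head” can be estimated from two height samples, as long as the estimate accounts for the head moving too. A rough sketch under stated assumptions: a fixed block interval (about 12 s on Ethereum mainnet; other chains differ) and a sync rate that holds between samples. Archive sync speed varies a lot by stage, so treat the answer as an order of magnitude.

```python
from typing import Optional


def catchup_eta_seconds(h1: int, t1: float, h2: int, t2: float,
                        head: int, block_time_s: float) -> Optional[float]:
    """Seconds until the node reaches a head that keeps advancing.

    h1/h2 are local heights sampled at times t1/t2 (seconds); `head` is the
    network head at t2. Returns None if the node is not gaining on the head.
    """
    if t2 <= t1:
        raise ValueError("samples must be in time order")
    remaining = head - h2
    if remaining <= 0:
        return 0.0
    sync_rate = (h2 - h1) / (t2 - t1)  # blocks imported per second
    chain_rate = 1.0 / block_time_s    # blocks produced per second
    gain = sync_rate - chain_rate
    if gain <= 0:
        return None
    return remaining / gain
```

Dividing the result by 86400 gives days, which is the unit that matters for archive nodes.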
The operating takeaway
The safest part of this rollout was the narrowness. New nodes got new instances, new disks, and their own security groups. Existing hosts were kept out of the change. Once the nodes were running, any later tuning could be done per chain based on CPU, memory, disk I/O, and sync behavior instead of guessing up front.