
Sizing and Isolating Archive/RPC Nodes on AWS

A field note on adding several archive/RPC nodes with narrow Terraform plans, separate security groups, explicit disk sizing, and private RPC boundaries.

Jun 23, 2026
Tags: archive-node, Terraform, AWS, Erigon, monitoring

The goal was simple: add archive/RPC capacity without letting Terraform touch unrelated running hosts. That boundary drove most of the work. The safer pattern is to enable only the intended node definitions, give each node its own security group, size the data volumes explicitly, and keep any already-tuned high-spec node as an exception rather than a default template.

Terms used here

| Term | Meaning |
| --- | --- |
| Archive node | A node that keeps enough historical state for old blocks, receipts, traces, or balances to be queried. |
| RPC | The HTTP API used by applications and tools to query node state. On EVM chains this is usually JSON-RPC. |
| P2P | The peer-to-peer network port a node uses to discover and exchange data with other nodes. |
| Security group | AWS firewall rules attached to an instance or network interface. |
| EBS | AWS block storage attached to an EC2 instance. The data volume is where the node keeps its chain database. |
| gp3 | An EBS volume type where size, IOPS, and throughput can be provisioned separately. |
| Terraform plan | The preview of what Terraform will create, update, or replace before applying infrastructure changes. |

What changed

The node profiles looked like this:

| Profile | Client direction | Instance class | Data volume |
| --- | --- | --- | --- |
| Large EVM archive node | Erigon archive node | r7i.2xlarge | 4096 GiB |
| Medium EVM archive node | Erigon plus consensus client where required | r7i.2xlarge | 2048 GiB |
| Substrate-style RPC node | Chain client with archive/RPC workload | r7i.2xlarge | 4096 GiB |
| Additional Substrate-style RPC node | Same client family, separate network profile | r7i.2xlarge | 4096 GiB |

These profiles kept gp3 data volumes with provisioned I/O. The disk sizes were deliberately above the current published archive-size references, because blockchain databases do not fail gracefully when the disk becomes tight. The spare room is part of the operating budget, not wasted space.
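The headroom reasoning can be written down as a small calculation. This is a minimal sketch: the 2500 GiB reference size and the 30% minimum are hypothetical numbers chosen for illustration, not figures from this rollout.

```python
def headroom_ratio(volume_gib: float, reference_gib: float) -> float:
    """Fraction of the volume still free once the database reaches the reference size."""
    if volume_gib <= 0:
        raise ValueError("volume_gib must be positive")
    return (volume_gib - reference_gib) / volume_gib


def has_enough_headroom(volume_gib: float, reference_gib: float,
                        minimum: float = 0.30) -> bool:
    """True when the free fraction meets the (assumed) minimum headroom."""
    return headroom_ratio(volume_gib, reference_gib) >= minimum


# Hypothetical example: a 4096 GiB volume against a 2500 GiB published reference.
print(has_enough_headroom(4096, 2500))
```

The threshold is a policy choice per chain. What matters is that the check is made against the published reference before the volume is created, not after the database starts filling it.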

An already-tuned node should stay on its larger setup unless there is a separate reason to change it. That node had shown different pressure from hardfork handling, client behavior, and storage throughput. Using the most expensive host as the default baseline would have raised the cost of every new node before there was evidence that any of them needed it.

Network boundary

The security group change mattered as much as the instance change. Each new node got its own security group. P2P stayed public where the chain requires public peer discovery, while RPC and metrics stayed private.

The rule I used was:

| Traffic | Boundary |
| --- | --- |
| P2P | Open to the internet on the chain-specific P2P port. |
| RPC | Available only from trusted private network ranges or explicitly approved sources. |
| Metrics | Scraped by the monitor host, not exposed publicly. |
| SSH | Managed separately from application ports. |
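The boundary rule can also be expressed as a check over a list of ingress rules. This is a sketch under assumptions: the `IngressRule` shape is hypothetical, the ports and CIDR ranges are placeholders, and the check treats every purpose other than P2P as private, which is stricter than the rule above is for SSH.

```python
from dataclasses import dataclass

# CIDR blocks that mean "anyone on the internet".
PUBLIC_CIDRS = {"0.0.0.0/0", "::/0"}


@dataclass(frozen=True)
class IngressRule:
    purpose: str  # "p2p", "rpc", "metrics", or "ssh"
    port: int
    cidr: str


def public_violations(rules):
    """Return the rules that are open to the internet but are not P2P."""
    return [r for r in rules if r.cidr in PUBLIC_CIDRS and r.purpose != "p2p"]


# Placeholder values for illustration only.
rules = [
    IngressRule("p2p", 30303, "0.0.0.0/0"),
    IngressRule("rpc", 8545, "10.0.0.0/16"),
    IngressRule("metrics", 6060, "0.0.0.0/0"),  # would be flagged
]
for bad in public_violations(rules):
    print(f"public exposure: {bad.purpose} on port {bad.port}")
```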

For some EVM nodes, the process itself can bind RPC and metrics to 127.0.0.1. That is still compatible with access from another EC2 instance if the request is routed through an approved local proxy, tunnel, or host-level forwarding path. Opening the security group is not enough when the service is bound to loopback, and binding the service to every interface is rarely the right first move.
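The interaction between the bind address and the security group is easy to get wrong, so here it is as a tiny decision function. It is only a model of the reasoning, with a hypothetical function name, and it ignores proxies and tunnels on purpose.

```python
import ipaddress


def reachable_directly(bind_address: str, security_group_allows: bool) -> bool:
    """Can another instance reach the service without a proxy or tunnel?

    A loopback bind is never reachable from outside the host, whatever the
    security group says. Any other bind still needs the security group to allow it.
    """
    addr = ipaddress.ip_address(bind_address)
    if addr.is_loopback:
        return False
    return security_group_allows


print(reachable_directly("127.0.0.1", True))   # loopback bind, open security group
print(reachable_directly("0.0.0.0", False))    # wide bind, closed security group
```

Both calls describe a service that cannot be reached directly, but for different reasons. Only the first is fixed by a local proxy or tunnel rather than by a firewall change.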

Client versions

The Erigon-based nodes were pinned to upstream Erigon v3.4.4. Archive mode remained explicit with --prune.mode=archive.

For Substrate-style nodes, the client image was pinned instead of using a floating latest tag. Floating client versions are convenient until a restart pulls a different binary than the one that was tested. For archive/RPC infrastructure, version drift should be a deliberate change.
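A pinned-image rule is simple enough to lint before deployment. The sketch below is an assumption about how such a check could look, and the image names in the example are hypothetical. It accepts an explicit tag or a digest and rejects a missing tag or `latest`.

```python
def is_pinned(image: str) -> bool:
    """True when the image reference carries a digest or an explicit, non-floating tag."""
    if "@sha256:" in image:
        return True
    # Look only at the last path segment so a registry port is not mistaken for a tag.
    last_segment = image.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return False  # no tag: Docker resolves this to "latest"
    tag = last_segment.rsplit(":", 1)[1]
    return tag not in {"", "latest"}


# Hypothetical image references for illustration.
for ref in ["example/node:v1.2.3", "example/node:latest", "registry.local:5000/node"]:
    print(ref, is_pinned(ref))
```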

Some EVM networks need one extra piece: a consensus client. An Erigon execution client alone is not enough for those post-merge networks. The execution and consensus clients need to share JWT authentication locally. A file-based JWT is safer than putting the secret directly into a container command line, because command-line arguments are easy to expose through process and container inspection.
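For the file-based JWT, the secret is conventionally 32 random bytes written as hex. A minimal sketch of creating it with owner-only permissions follows. The path is a hypothetical example, and the code assumes a Linux host.

```python
import os
import secrets


def write_jwt_secret(path: str) -> None:
    """Create a 32-byte hex secret readable and writable by the owner only."""
    secret_hex = secrets.token_hex(32)  # 64 hex characters
    # O_EXCL refuses to overwrite an existing secret; 0o600 keeps other users out.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(secret_hex)


if __name__ == "__main__":
    # Hypothetical location; both clients would mount or read this same file.
    write_jwt_secret("/data/jwt.hex")
```

Both clients are then pointed at the same file path, for example through Erigon's `--authrpc.jwtsecret` flag, so the secret itself never appears in a command line.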

Keeping the plan small

The Terraform side was checked with a targeted plan before applying. The expected shape was:

| Expected | Avoided |
| --- | --- |
| Intended EC2 nodes | Replacing out-of-scope nodes |
| Data volumes for the intended nodes | Modifying unrelated databases or application resources |
| Node-specific security groups | Reusing a broad open security group |
| Monitor rules where needed | Accidental global network changes |

That review matters more than the exact command. The plan has to say the same thing as the change request. If it wants to replace a running node that was out of scope, the right answer is to stop and reduce the plan, not to apply and clean up later.
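One way to make that review mechanical is to read the plan as JSON and list every change outside the intended scope. This sketch assumes the plan was saved with `terraform plan -out=...` and rendered with `terraform show -json`. The resource addresses and the allowed prefix are hypothetical.

```python
import json


def out_of_scope_changes(plan_json: str, allowed_prefixes):
    """Return (address, actions) for every real change outside the allowed prefixes."""
    plan = json.loads(plan_json)
    findings = []
    for resource in plan.get("resource_changes", []):
        actions = resource["change"]["actions"]
        if actions in (["no-op"], ["read"]):
            continue  # nothing would be created, updated, or destroyed
        address = resource["address"]
        if not any(address.startswith(prefix) for prefix in allowed_prefixes):
            findings.append((address, actions))
    return findings


# Hypothetical plan fragment: one intended create, one unintended replacement.
example = json.dumps({"resource_changes": [
    {"address": 'module.archive_node["evm-large"].aws_instance.this',
     "change": {"actions": ["create"]}},
    {"address": "aws_instance.legacy_node",
     "change": {"actions": ["delete", "create"]}},
]})
for address, actions in out_of_scope_changes(example, ["module.archive_node["]):
    print(f"STOP: {address} would be changed: {actions}")
```

An empty result does not replace reading the plan, but a non-empty one is a clear signal to stop before applying.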

Verification boundary

After boot, the first checks were intentionally basic:

docker ps
df -h /data
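curl -s -X POST -H 'Content-Type: application/json' --data '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' http://127.0.0.1:8545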

For EVM nodes, the useful RPC checks were:

{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}
{"jsonrpc":"2.0","method":"eth_syncing","params":[],"id":1}
{"jsonrpc":"2.0","method":"trace_block","params":["0xf4240"],"id":1}

For Substrate-style nodes, the first useful signals were block height movement, peer count, and whether both parachain and relaychain logs were moving.
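Those signals only mean something as a trend, so they need at least two samples. A minimal sketch, assuming the best block height and peer count have already been collected by some other means (the sampling itself is not shown and the tuple shape is hypothetical):

```python
def is_progressing(samples):
    """samples: (best_block_height, peer_count) tuples, oldest first.

    True when the node kept at least one peer in every sample and the
    height at the end is higher than at the start.
    """
    if len(samples) < 2:
        return False
    heights = [height for height, _ in samples]
    has_peers = all(peers > 0 for _, peers in samples)
    return has_peers and heights[-1] > heights[0]


# Hypothetical samples taken a few minutes apart.
print(is_progressing([(100, 5), (130, 6), (171, 6)]))
```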

The final state was not “all nodes are synced”. The honest state was “the nodes are created, clients are running, history mode is configured where needed, and sync is progressing”. For archive nodes, that distinction matters. A node can be correctly configured for archive mode and still need days to reach the current head.
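The difference between "configured" and "synced" shows up in the `eth_syncing` result, which is either `false` or an object with hex block fields. The sketch below interprets that result. It assumes the reply carries `currentBlock` and `highestBlock`, as the standard JSON-RPC response does, and the function name is my own.

```python
def sync_progress(result):
    """Interpret an eth_syncing result: False, or a dict with hex block fields."""
    if result is False:
        return {"syncing": False, "behind": None}
    current = int(result["currentBlock"], 16)
    highest = int(result["highestBlock"], 16)
    return {
        "syncing": True,
        "current": current,
        "highest": highest,
        "behind": max(highest - current, 0),
    }


# Hypothetical reply from a node that is still far from the head.
print(sync_progress({"currentBlock": "0xf4240", "highestBlock": "0x1312d00"}))
```

A `false` result on its own is weak evidence. A node with no peers can report it too, so it is worth reading together with the block number and peer count.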

The operating takeaway

The safest part of this rollout was the narrowness. New nodes got new instances, new disks, and their own security groups. Existing hosts were kept out of the change. Once the nodes were running, any later tuning could be done per chain based on CPU, memory, disk I/O, and sync behavior instead of guessing up front.