Sizing and Isolating Archive/RPC Nodes on AWS

The goal was simple: add archive/RPC capacity without letting Terraform touch unrelated running hosts. That boundary drove most of the work. The safer pattern is to enable only the intended node definitions, give each node its own security group, size the data volumes explicitly, and keep any already-tuned high-spec node as an exception rather than a default template.

Terms used here

Term	Meaning
Archive node	A node that keeps enough historical state for old blocks, receipts, traces, or balances to be queried.
RPC	Remote procedure call. Here it means the API that applications and tools use to query node state. On EVM chains this is usually JSON-RPC over HTTP.
P2P	The peer-to-peer network port a node uses to discover and exchange data with other nodes.
Security group	AWS firewall rules attached to an instance or network interface.
EBS	AWS block storage attached to an EC2 instance. The data volume is where the node keeps its chain database.
gp3	An EBS volume type where size, IOPS, and throughput can be provisioned separately.
Terraform plan	The preview of what Terraform will create, update, or replace before applying infrastructure changes.

What changed

The node profiles looked like this:

Profile	Client direction	Instance class	Data volume
Large EVM archive node	Erigon archive node	`r7i.2xlarge`	`4096 GiB`
Medium EVM archive node	Erigon plus consensus client where required	`r7i.2xlarge`	`2048 GiB`
Substrate-style RPC node	Chain client with archive/RPC workload	`r7i.2xlarge`	`4096 GiB`
Additional Substrate-style RPC node	Same client family, separate network profile	`r7i.2xlarge`	`4096 GiB`

These profiles kept gp3 data volumes with provisioned I/O. The disk sizes were deliberately above the current published archive-size references, because blockchain databases do not fail gracefully when the disk becomes tight. The spare room is part of the operating budget, not wasted space.
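The headroom argument can be turned into a rough number. The sketch below estimates how many days remain before a data volume crosses an alert threshold at a steady growth rate. The function name, the 85% threshold, and the sample usage and growth figures are hypothetical placeholders, not measurements from these nodes.

```python
def days_until_threshold(volume_gib: float, used_gib: float,
                         growth_gib_per_day: float,
                         alert_fraction: float = 0.85) -> float:
    """Days until usage crosses alert_fraction of the volume at a steady growth rate."""
    if growth_gib_per_day <= 0:
        raise ValueError("growth_gib_per_day must be positive")
    threshold_gib = volume_gib * alert_fraction
    remaining_gib = threshold_gib - used_gib
    # A volume already past the threshold has zero days left, not a negative count.
    return max(remaining_gib, 0.0) / growth_gib_per_day


if __name__ == "__main__":
    # Hypothetical inputs: a 4096 GiB volume, 2500 GiB used, growing 5 GiB per day.
    print(round(days_until_threshold(4096, 2500, 5.0), 1))
```

Real chain databases do not grow linearly, so a figure like this is only a prompt to look at the disk before it becomes tight.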

An already-tuned node should stay on its larger setup unless there is a separate reason to change it. Such a node has usually earned that setup by showing pressure the new nodes have not shown yet: hardfork handling, client behavior, and storage throughput. Using the most expensive host as the default baseline would have raised every new node’s cost before there was evidence that any of them needed it.

Network boundary

The security group change mattered as much as the instance change. Each new node got its own security group. P2P stayed public where the chain requires public peer discovery, while RPC and metrics stayed private.

The rule I used was:

Traffic	Boundary
P2P	Open to the internet on the chain-specific P2P port.
RPC	Available only from trusted private network ranges or explicitly approved sources.
Metrics	Scraped by the monitor host, not exposed publicly.
SSH	Managed separately from application ports.
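The rule table above can be checked mechanically before anything is applied. This is a minimal sketch, assuming ingress rules are available as simple port and CIDR pairs. The rule shape, the P2P port 30303, and the metrics port 6060 are assumptions for illustration; RFC 1918 ranges stand in for "trusted private network ranges", which in a real setup would be narrower.

```python
import ipaddress

# RFC 1918 private IPv4 ranges, used here as a stand-in for trusted ranges.
PRIVATE_RANGES = [ipaddress.ip_network(cidr) for cidr in
                  ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def is_private_cidr(cidr: str) -> bool:
    """True if the CIDR is an IPv4 range fully inside an RFC 1918 block."""
    network = ipaddress.ip_network(cidr, strict=False)
    if network.version != 4:
        return False
    return any(network.subnet_of(private) for private in PRIVATE_RANGES)


def violations(rules, p2p_ports):
    """Return (port, cidr) pairs that expose a non-P2P port outside private ranges."""
    problems = []
    for rule in rules:
        if rule["port"] in p2p_ports:
            continue  # P2P is allowed to be public
        if not is_private_cidr(rule["cidr"]):
            problems.append((rule["port"], rule["cidr"]))
    return problems


if __name__ == "__main__":
    # Hypothetical rule set: public P2P, private RPC, and one mistake on metrics.
    sample = [
        {"port": 30303, "cidr": "0.0.0.0/0"},
        {"port": 8545, "cidr": "10.0.0.0/16"},
        {"port": 6060, "cidr": "0.0.0.0/0"},
    ]
    print(violations(sample, p2p_ports={30303}))
```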

For some EVM nodes, the process itself can bind RPC and metrics to `127.0.0.1`. Another EC2 instance can still reach such a service, but only through an approved local proxy, tunnel, or host-level forwarding path. Opening the security group is not enough when the service is bound to loopback, and binding the service to every interface is rarely the right first move.

Client versions

The Erigon-based nodes were pinned to upstream Erigon v3.4.4. Archive mode remained explicit with `--prune.mode=archive`.

For Substrate-style nodes, the client image was pinned instead of using a floating latest tag. Floating client versions are convenient until a restart pulls a different binary than the one that was tested. For archive/RPC infrastructure, version drift should be a deliberate change.
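A pinning rule like this is easy to enforce in a pre-deploy check. The sketch below treats an image reference as pinned when it carries a digest or an explicit tag other than `latest`. The function name and the sample image names are hypothetical, and the parsing is deliberately simple rather than a full implementation of the image reference grammar.

```python
def is_pinned(image_ref: str) -> bool:
    """True if the reference has a digest or an explicit tag other than 'latest'."""
    if "@sha256:" in image_ref:
        return True  # digest references cannot drift
    # Only the last path segment can carry a tag; earlier colons belong to a registry port.
    last_segment = image_ref.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return False  # no tag means the runtime falls back to 'latest'
    tag = last_segment.rsplit(":", 1)[1]
    return tag not in ("", "latest")


if __name__ == "__main__":
    # Hypothetical image names, for illustration only.
    for ref in ("example/chain-node:v1.2.3", "example/chain-node", "example/chain-node:latest"):
        print(ref, is_pinned(ref))
```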

Some EVM networks need one extra piece: a consensus client. An Erigon execution client alone is not enough for those post-merge networks. The execution and consensus clients need to share JWT authentication locally. A file-based JWT is safer than putting the secret directly into a container command line, because command-line arguments are easy to expose through process and container inspection.
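One way to produce the file-based secret is sketched below. It assumes the usual convention of a 32-byte secret stored as hex text and writes it with owner-only permissions. The function name and the example path are hypothetical; both clients would then be pointed at the same file through each client's own JWT secret option.

```python
import os
import secrets


def write_jwt_secret(path: str) -> None:
    """Create a 32-byte hex-encoded secret file readable only by its owner."""
    # O_EXCL refuses to overwrite an existing secret; 0o600 keeps it owner-only.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(secrets.token_hex(32))


if __name__ == "__main__":
    # Hypothetical location; use whatever path both containers can mount read-only.
    write_jwt_secret("jwt.hex")
```

Refusing to overwrite is a deliberate choice here: silently rotating the secret under one running client would break its connection to the other.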

Keeping the plan small

The Terraform side was checked with a targeted plan before applying. The expected shape was:

Expected	Avoided
Intended EC2 nodes	Replacing out-of-scope nodes
Data volumes for the intended nodes	Modifying unrelated databases or application resources
Node-specific security groups	Reusing a broad open security group
Monitor rules where needed	Accidental global network changes

That review matters more than the exact command. The plan has to say the same thing as the change request. If it wants to replace a running node that was out of scope, the right answer is to stop and reduce the plan, not to apply and clean up later.
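The comparison between plan and change request can be partly automated. The sketch below reads the JSON form of a saved plan (as produced by `terraform plan -out=plan.out` followed by `terraform show -json plan.out`) and flags any change that is outside an allowed set of address prefixes, or that deletes or replaces a resource. The function name and the example prefixes are assumptions; the `resource_changes`, `address`, and `change.actions` fields follow Terraform's JSON plan representation.

```python
import json


def out_of_scope_changes(plan: dict, allowed_prefixes: tuple) -> list:
    """Return (address, actions) for changes that are out of scope or destructive."""
    findings = []
    for resource in plan.get("resource_changes", []):
        actions = resource["change"]["actions"]
        if actions in (["no-op"], ["read"]):
            continue  # nothing will be modified
        address = resource["address"]
        in_scope = address.startswith(allowed_prefixes)
        destructive = "delete" in actions  # covers both delete and replace
        if not in_scope or destructive:
            findings.append((address, actions))
    return findings


if __name__ == "__main__":
    # Hypothetical file name and prefixes; adjust to the real module layout.
    with open("plan.json") as handle:
        plan = json.load(handle)
    for address, actions in out_of_scope_changes(plan, ("module.archive_nodes.",)):
        print("STOP:", address, actions)
```

An empty result does not replace reading the plan. It only catches the obvious case where Terraform wants to touch something the change request never mentioned.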

Verification boundary

After boot, the first checks were intentionally basic:

docker ps
df -h /data
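curl -s -X POST -H 'Content-Type: application/json' --data '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' http://127.0.0.1:8545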

For EVM nodes, the useful RPC checks were:

{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}
{"jsonrpc":"2.0","method":"eth_syncing","params":[],"id":1}
{"jsonrpc":"2.0","method":"trace_block","params":["0xf4240"],"id":1}
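The request bodies above can be sent from a small script using only the standard library. This is a minimal sketch, assuming the node answers JSON-RPC over HTTP on the loopback address used earlier. The helper names are hypothetical, and the summary wording is only one way to read the replies.

```python
import json
import urllib.request


def rpc_payload(method: str, params=None, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 request body."""
    return {"jsonrpc": "2.0", "method": method, "params": params or [], "id": request_id}


def call(url: str, method: str, params=None, timeout: float = 10.0):
    """POST one JSON-RPC request and return its result, raising on an RPC error."""
    body = json.dumps(rpc_payload(method, params)).encode()
    request = urllib.request.Request(url, data=body,
                                     headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        reply = json.load(response)
    if "error" in reply:
        raise RuntimeError(reply["error"])
    return reply["result"]


def summarize(block_number_hex: str, syncing) -> str:
    """Turn eth_blockNumber and eth_syncing results into one status line."""
    head = int(block_number_hex, 16)
    state = "not syncing" if syncing is False else "syncing"
    return f"head={head} state={state}"


if __name__ == "__main__":
    url = "http://127.0.0.1:8545"
    print(summarize(call(url, "eth_blockNumber"), call(url, "eth_syncing")))
    # A trace for an old block succeeds only if that history is actually available.
    call(url, "trace_block", ["0xf4240"])
```

A "not syncing" reply is not proof that the node has reached the chain head, so the head number still needs comparing against a trusted reference.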

For Substrate-style nodes, the first useful signals were block height movement, peer count, and whether both parachain and relaychain logs were moving.
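Two of those signals, peer count and height movement, can be read from the standard Substrate RPC methods `system_health` and `chain_getHeader`. The sketch below only interprets their results. The function names and sample values are hypothetical, and it says nothing about the parachain and relaychain logs, which still need a direct look.

```python
def health_summary(health: dict) -> str:
    """Summarize a system_health result (peers, isSyncing, shouldHavePeers)."""
    peers = health["peers"]
    if health.get("shouldHavePeers", True) and peers == 0:
        return "no peers: check the P2P port and bootnodes"
    state = "syncing" if health["isSyncing"] else "not syncing"
    return f"peers={peers} state={state}"


def height_advanced(earlier_header: dict, later_header: dict) -> bool:
    """Compare two chain_getHeader results taken some time apart."""
    # Header numbers are hex strings such as "0x1a2b".
    return int(later_header["number"], 16) > int(earlier_header["number"], 16)


if __name__ == "__main__":
    # Hypothetical sample values, shaped like real RPC results.
    print(health_summary({"peers": 12, "isSyncing": True, "shouldHavePeers": True}))
    print(height_advanced({"number": "0x100"}, {"number": "0x180"}))
```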

The final state was not “all nodes are synced”. The honest state was “the nodes are created, clients are running, history mode is configured where needed, and sync is progressing”. For archive nodes, that distinction matters. A node can be correctly configured for archive mode and still need days to reach the current head.

The operating takeaway

The safest part of this rollout was the narrowness. New nodes got new instances, new disks, and their own security groups. Existing hosts were kept out of the change. Once the nodes were running, any later tuning could be done per chain based on CPU, memory, disk I/O, and sync behavior instead of guessing up front.