cloud-sre

Fixing a Polygon Erigon Archive Node Stuck at Block 87218600

A field note on diagnosing a deterministic gas-used mismatch at Polygon's Chicago hardfork activation block and fixing it by upgrading 0xPolygon/erigon from v3.5.0 to v3.6.0.

Jun 18, 2026
Tags: Polygon, Erigon, archive-node, hardfork, troubleshooting

A Polygon Mainnet Erigon archive node stopped making progress at block 87218600. The first instinct in archive-node incidents is to suspect disk, CPU, memory, peers, or a corrupted snapshot. In this case, the failure was somewhere else.

The failure was deterministic and happened exactly at Polygon’s Chicago hardfork activation block. The fix was to upgrade the running 0xPolygon/erigon build from v3.5.0 to v3.6.0, then restart the container with the same /data directory.

Background

A blockchain node does more than download data: it re-executes every block and checks that its computed result matches the block header accepted by the network. That is why a node can have enough CPU, memory, and disk I/O and still stop at one block if its software does not understand the rules active at that block.

Polygon, like other EVM chains, changes execution rules through hardforks. A hardfork is a scheduled protocol upgrade. At a specific block height, new rules become active: gas pricing may change, precompile behavior may change, or chain-specific system transactions may be handled differently. If the client binary does not include those new rules, it may calculate a different result from the network and reject the block as invalid.

In this incident, 87218600 was the Chicago hardfork activation block for Polygon Mainnet. The running node was still using an older v3.5.0 build. That version could replay blocks before the activation point, but it failed when execution reached the new rule boundary.

Terms used in this post

Term | Plain meaning
Archive node | A node that keeps historical state so old blocks, receipts, traces, and contract state can be queried. It needs much more storage than a pruned node.
Erigon | An Ethereum client implementation. This node uses Polygon’s Erigon fork, 0xPolygon/erigon, so it can follow Polygon Bor mainnet rules.
Bor mainnet | Polygon PoS mainnet’s execution chain. In Erigon, it is selected with --chain=bor-mainnet.
Hardfork | A scheduled protocol rule change at a specific block. Every node must run software that knows the new rules before reaching that block.
Chicago hardfork | The Polygon hardfork that activates at Mainnet block 87218600, according to the 0xPolygon/erigon v3.6.0 release notes.
Gas used | The total gas the block execution consumed. The node recalculates it and compares it with the value in the block header.
StateSync | Polygon-specific state synchronization events. These can affect execution validation around Polygon forks.
Heimdall | Polygon’s consensus/checkpoint layer. Erigon can talk to a remote Heimdall endpoint using --bor.heimdall=....
Fork choice | The process of deciding which chain head is valid. If execution says a block is invalid, fork choice cannot advance to it.
Datadir | The directory where Erigon stores databases and snapshots. Here it is /data.

The symptom

The node repeatedly unwound to the last valid tip and crashed with a gas-used mismatch:

gas used mismatch block=87218600 header=79467913 execution=62712118
Execution failed block=87218600
err="invalid block, txnIdx=186, gas used by execution: 62712118, in header: 79467913"
pos sync failed: unexpected bad block at finalized waypoint

The useful clue was the block number: 87218600.

That block is not random. The 0xPolygon/erigon v3.6.0 release notes list Polygon Mainnet Chicago activation at block 87218600 and tell validators, RPC providers, node operators, and infrastructure partners to upgrade before the hardfork activation [1].

The practical interpretation is:

  • If the node fails once at a random block, it may be data, peers, or resources.
  • If it fails repeatedly at the exact hardfork activation block, the client version becomes the first suspect.
  • Deleting data is risky and usually unnecessary until the running binary version has been verified.
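That rule of thumb can be checked mechanically. The following is a minimal sketch, not part of the original incident tooling: the function name failing_blocks is hypothetical, and it assumes the log lines keep the "gas used mismatch block=N" shape quoted above.

```shell
#!/bin/sh
# Hypothetical helper: read Erigon logs on stdin and print each failing
# block number followed by how many times it appeared, most frequent first.
# Assumes the "gas used mismatch block=<n>" log format quoted in this post.
failing_blocks() {
  grep -o 'gas used mismatch block=[0-9][0-9]*' \
    | sed 's/.*block=//' \
    | sort | uniq -c | sort -rn \
    | awk '{ print $2, $1 }'
}
```

Fed with something like `docker logs --since 24h polygon-erigon 2>&1 | failing_blocks`, a single block number with a high count points at a deterministic failure, while many different block numbers point back at data, peers, or resources. The `2>&1` is there because docker logs relays the container's stderr separately from its stdout.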

What was ruled out

The node was already running on archive-node class infrastructure:

Layer | Status
Storage | 15 TB-class data volume, upgraded to high-IOPS EBS
Memory | 128 GB-class host after earlier tuning
P2P | Peers recovered after restart
Heimdall | Remote Heimdall scraper kept advancing
Snapshot data | Existing /data was readable and execution could replay up to the failing block

After restart, the node could download and execute blocks again:

GoodPeers eth68=3
inserting fetched blocks start=87214502 end=87216293 blocks=1792

That made a local resource problem less likely. The process was not randomly failing under load; it was replaying to the same hardfork boundary and failing there.

The GitHub match

There were already open issues in 0xPolygon/erigon with the same class of failure:

  • Issue #143 reports the same gas mismatch pattern on bor-mainnet with v3.5.0-230b11a7, and points at StateSync gas handling after Polygon forks [2].
  • Issue #133 reports pos sync failed: unexpected bad block at finalized waypoint on the same code line [3].
  • Issue #100 shows an earlier version failing with the same gas used by execution ... in header ... pattern [4].

Those issues did not prove the final fix by themselves, but they changed the investigation direction. The failure looked like a client/fork-rule mismatch, not a disk or peer problem.

The decisive version check

The container was still running the old build:

Build info git_tag=v3.5.0-dirty git_commit=230b11a713...

After cloning v3.6.0, the source tree existed, but the container had not yet been rebuilt or restarted with the new image. That mistake matters: having the source on disk does nothing if Docker is still running the old image.

The correct check was:

docker logs --tail 100 polygon-erigon | grep 'Build info'
docker images | grep polygon-erigon

Before the fix, only the local-v3.5.0 image was present.
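The same check can be scripted so that it fails loudly when the running build is not the expected release. This is a sketch under assumptions: the helper names running_git_tag and tag_matches are hypothetical, and the parsing assumes the "Build info git_tag=..." log format shown above.

```shell
#!/bin/sh
# Hypothetical helper: print the git_tag from the most recent "Build info"
# line on stdin, e.g. "v3.5.0-dirty". Prints nothing if no such line exists.
running_git_tag() {
  sed -n 's/.*Build info.*git_tag=\([^ ]*\).*/\1/p' | tail -n 1
}

# Hypothetical helper: succeed only if the running tag starts with the
# expected release. The "-dirty" suffix of a local build is tolerated.
tag_matches() {
  expected=$1
  actual=$2
  case $actual in
    "$expected"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

A possible use is `tag_matches v3.6.0 "$(docker logs polygon-erigon 2>&1 | running_git_tag)"`. As a cross-check that does not depend on log retention, `docker inspect --format '{{.Config.Image}}' polygon-erigon` prints the image the container was created from.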

The fix

Build the v3.6.0 image:

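# Stop the running v3.5.0 container before rebuilding.
docker stop polygon-erigon
# Start from a clean checkout of the v3.6.0 tag.
rm -rf /opt/erigon-v3.6.0
git clone --branch v3.6.0 --single-branch https://github.com/0xPolygon/erigon.git /opt/erigon-v3.6.0
cd /opt/erigon-v3.6.0
# Build a locally tagged image from the repository's Dockerfile.
DOCKER_BUILDKIT=1 docker build -t polygon-erigon:local-v3.6.0 .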

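Then remove the old container so that a new one can be started from the v3.6.0 image with the same /data mount:

# Disable the restart policy so Docker does not bring the old container back.
docker update --restart=no polygon-erigon
docker rm -f polygon-erigon
# Create the network if it does not exist yet; ignore the error if it does.
docker network create erigon-net 2>/dev/null || true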

The runtime change was the image:

polygon-erigon:local-v3.6.0

The archive data was not deleted. The node kept using:

--chain=bor-mainnet
--datadir=/data
--prune.mode=archive
--db.size.limit=12TB
--db.pagesize=16KB
--bor.heimdall=https://heimdall-api.polygon.technology

For RPC safety, JSON-RPC and metrics were bound to localhost at the Docker publish layer:

-p 127.0.0.1:8545:8545
-p 127.0.0.1:6060:6060

That kept local validation working without reopening public RPC.
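The full start command is not reproduced here. Assembled from the image, flags, and port bindings listed above, it could look roughly like the sketch below. Several parts are assumptions: HOST_DATA_DIR is a hypothetical host path, the image is assumed to use erigon as its entrypoint, and any RPC or metrics listen flags used by the previous container are not shown in this post and would need to be carried over. The function only prints the command so it can be reviewed before anything runs.

```shell
#!/bin/sh
# Sketch only: print the docker run command instead of executing it.
# HOST_DATA_DIR is a hypothetical host path; the post only states that the
# same /data directory was mounted into the new container.
HOST_DATA_DIR=${HOST_DATA_DIR:-/mnt/erigon-data}
IMAGE=polygon-erigon:local-v3.6.0

print_run_cmd() {
  # Unquoted heredoc: variables expand, and "\\" emits a literal backslash
  # so the printed command keeps its line continuations.
  cat <<EOF
docker run -d --name polygon-erigon \\
  --network erigon-net \\
  -v ${HOST_DATA_DIR}:/data \\
  -p 127.0.0.1:8545:8545 \\
  -p 127.0.0.1:6060:6060 \\
  ${IMAGE} \\
  --chain=bor-mainnet \\
  --datadir=/data \\
  --prune.mode=archive \\
  --db.size.limit=12TB \\
  --db.pagesize=16KB \\
  --bor.heimdall=https://heimdall-api.polygon.technology
EOF
}
```

Calling print_run_cmd and comparing the output with the previous container's settings before running it keeps the image tag as the only intended difference.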

Validation

The new container showed the expected build:

Build info git_tag=v3.6.0-dirty git_commit=231d67e50b...
Initialised chain configuration ... Chicago: 87218600

At first, the node had a few temporary peer warnings:

can't use any peers to download blocks
No GoodPeers

Those were transient. The node later found peers and resumed block insertion:

GoodPeers eth68=3
inserting fetched blocks start=87214502 end=87216293 blocks=1792

The real validation was crossing the failing block:

blk=87218538
blk=87218647
blk=87218727
blk=87218801

That confirmed v3.6.0 had crossed 87218600. There was no repeat of:

gas used mismatch block=87218600
unexpected bad block
Execution failed
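The crossing check can also be expressed as a small filter rather than read by eye. This is a minimal sketch: the function name crossed_block is hypothetical, and it assumes progress lines carry a "blk=<n>" field as in the excerpt above.

```shell
#!/bin/sh
# Hypothetical helper: read logs on stdin, find the highest "blk=<n>" value,
# and succeed only if it is strictly above the target block.
crossed_block() {
  target=$1
  max=$(grep -o 'blk=[0-9][0-9]*' | sed 's/blk=//' | sort -n | tail -n 1)
  # Fail when no progress line was seen at all.
  [ -n "$max" ] && [ "$max" -gt "$target" ]
}
```

For example, `docker logs --since 15m polygon-erigon 2>&1 | crossed_block 87218600` exits with status 0 once execution has moved past the activation block.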

Runbook

When a Polygon Erigon node hits this class of failure:

  1. Check the failing block number.
  2. Check the running binary version. The source directory alone is not enough.
  3. Compare the failing block with Polygon hardfork activation blocks.
  4. Search upstream issues for the exact error pattern.
  5. Upgrade to the release that contains the hardfork rules.
  6. Reuse the existing datadir unless upstream explicitly says the database format changed.
  7. Validate by crossing the exact failing block.

The two most useful commands are:

docker logs --tail 100 polygon-erigon | grep 'Build info'

and:

docker logs --since 15m polygon-erigon | \
  grep -E 'gas used mismatch|unexpected bad block|Execution failed|polygon.sync.*crashed'

The first proves what binary is really running. The second proves whether the old failure is still present.
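For cron jobs or monitoring, the second command is easier to use when it returns an exit status. The wrapper below is a sketch with a hypothetical name, no_known_failures, and it reuses the same pattern list.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) only when stdin contains none of the
# failure patterns from this incident. Matching lines are printed so the
# caller can see what was found.
no_known_failures() {
  if grep -E 'gas used mismatch|unexpected bad block|Execution failed|polygon.sync.*crashed'; then
    return 1
  fi
  return 0
}
```

A possible invocation is `docker logs --since 15m polygon-erigon 2>&1 | no_known_failures`, which exits non-zero as soon as the old failure shows up again.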

Why this mattered

This incident looked like a performance problem at first because the archive node had already gone through storage and instance tuning. But performance tuning cannot fix a hardfork rule mismatch.

The operational lesson is simple: when a blockchain node fails deterministically at one block, especially a known fork activation block, treat the client version as a first-class suspect.

References

  1. 0xPolygon/erigon v3.6.0 release notes
  2. 0xPolygon/erigon issue #143: Block 76879430 - same gas mismatch pattern
  3. 0xPolygon/erigon issue #133: pos sync failed: unexpected bad block at finalized waypoint
  4. 0xPolygon/erigon issue #100: gas used by execution mismatch
  5. Polygon documentation: Erigon archive node