Testing Polygon Erigon v3.6.1-beta After the v3.6.0 Upgrade

The first upgrade fixed the obvious failure. v3.5.0 could not cross Polygon Mainnet block 87218600, the Chicago hardfork activation block. Moving to 0xPolygon/erigon v3.6.0 added the Chicago rules and the node crossed that block.

The second problem was less clean. After v3.6.0, the node still showed weak peer behavior and repeated post-hardfork sync warnings. It was not a security group problem: public P2P was reachable, JSON-RPC stayed closed to the internet, and local RPC still worked. The safer next move was a controlled test of v3.6.1-beta, not deleting /data and starting over.

Terms used here

Term	Meaning
Archive node	A node that keeps historical state so old receipts, traces, and contract state can be queried. It needs much more disk than a pruned node.
Hardfork	A scheduled protocol rule change at a specific block. A node must run software that knows the new rules before it reaches that block.
Chicago hardfork	The Polygon hardfork that, according to the `0xPolygon/erigon v3.6.0` release notes, activates on Mainnet at block `87218600`.
Peer	Another node connected through P2P. Erigon needs peers to download blocks and exchange network data.
GoodPeers	Erigon’s view of peers that can currently serve the data it needs. A low number can be network-related, version-related, or validation-related.
Fork choice	The process that decides which chain head is valid. If execution marks a block invalid, fork choice cannot advance to it.
Datadir	The directory where Erigon stores its databases and snapshots. On this node it is `/data`.
Beta release	A pre-release build. It can contain important fixes, but it should be tested with a rollback path.

What v3.6.0 fixed

The previous failure was deterministic:

gas used by execution: 62712118, in header: 79467913, headerNum=87218600
pos sync failed: unexpected bad block at finalized waypoint

The block number was the key. The v3.6.0 release notes say the release includes the changes required for the Chicago hardfork and list Polygon Mainnet activation at block 87218600 ^[1].
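One way to check that a bad-block error lines up with a hardfork is to pull the block number out of the log line and compare it with the activation block. This is a minimal sketch that assumes the log format shown above. The names failing_block and CHICAGO_BLOCK are illustrative and are not part of Erigon.

```shell
#!/bin/sh
# Assumption: activation block copied from the v3.6.0 release notes.
CHICAGO_BLOCK=87218600

# Print the block number from any "headerNum=<n>" field read on stdin.
failing_block() {
  sed -n 's/.*headerNum=\([0-9][0-9]*\).*/\1/p'
}

# Sample line taken from the failure shown above.
line='gas used by execution: 62712118, in header: 79467913, headerNum=87218600'
blk="$(printf '%s\n' "$line" | failing_block)"

if [ "$blk" -eq "$CHICAGO_BLOCK" ]; then
  echo "block $blk is the Chicago activation block: suspect the client version"
else
  echo "block $blk is not the Chicago activation block"
fi
```

If the failing block matches an activation block exactly, a client upgrade is a more likely fix than a database repair.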

After rebuilding and starting polygon-erigon:local-v3.6.0, the node printed:

Build info git_tag=v3.6.0-dirty git_commit=231d67e50b...
Initialised chain configuration ... Chicago: 87218600

It then crossed the failing block:

blk=87218538
blk=87218647
blk=87218801

That made v3.6.0 the right fix for the Chicago rule mismatch. But it did not mean the node was finished catching up.

What still looked wrong

After the v3.6.0 upgrade, the node could move forward, but the peer layer still looked weak. The logs sometimes showed:

[sync] can't use any peers to download blocks, will try again in a bit
[p2p] No GoodPeers

Those messages can be misleading. They do not always mean the EC2 security group or Docker port mapping is wrong. In this case, the public P2P port was reachable from outside:

nc -vz <node-public-ip> 30303

Docker was also publishing the right ports:

30303/tcp -> 0.0.0.0:30303
30303/udp -> 0.0.0.0:30303
42069/udp -> 0.0.0.0:42069
8545/tcp -> 127.0.0.1:8545
6060/tcp -> 127.0.0.1:6060

That port layout is intentional. P2P stays public. JSON-RPC and metrics stay local.

The open upstream issue #154 reports the same class of symptom on v3.6.0: "No GoodPeers" after the upgrade, with "can't use any peers to download blocks" in the log ^[2]. That issue by itself does not prove that every node has the same failure mode. It does make one thing clear: if P2P reachability checks pass, the next suspect is the client version and its post-hardfork sync behavior.

Why the datadir remained intact

Deleting chaindata, heimdall, or polygon-bridge is expensive on an archive node. It turns a version problem into a long restore problem.

The local checks pointed the other way:

The node could open /data/chaindata.
It could download checkpoint ranges.
It could insert fetched blocks.
It could enter the Execution stage.
It had already crossed 87218600.

The better test was to change only the client build and keep the same /data. That gives a clean comparison: same disk, same snapshot, same instance, same network boundary, newer Erigon.

Why v3.6.1-beta was worth testing

v3.6.1-beta is marked as a pre-release, so it is not the default conservative choice. The reason to test it here was the changelog.

The release notes describe it as a maintenance release with bug fixes, including:

execution and P2P fixes around ssTxs encoding and missing hardfork blocks
post-v3.6.0 backports
deterministic state sync
a P2P forkid change for Polygon-specific forks

Those items match the area where the node was still suspicious: post-Chicago execution, peer compatibility, and sync validation ^[3].

The startup command

The runtime shape stayed the same. Only the image moved from polygon-erigon:local-v3.6.0 to polygon-erigon:local-v3.6.1-beta.

EXT_IP="$(curl -sS http://169.254.169.254/latest/meta-data/public-ipv4)"

sudo docker run -d \
  --name polygon-erigon \
  --restart unless-stopped \
  --network erigon-net \
  --log-driver=json-file \
  --log-opt max-size=100m \
  --log-opt max-file=5 \
  -v /data:/data \
  -p 127.0.0.1:8545:8545 \
  -p 30303:30303 \
  -p 30303:30303/udp \
  -p 42069:42069/udp \
  -p 127.0.0.1:6060:6060 \
  polygon-erigon:local-v3.6.1-beta \
  --chain=bor-mainnet \
  --datadir=/data \
  --prune.mode=archive \
  --db.size.limit=12TB \
  --db.pagesize=16KB \
  --metrics \
  --metrics.addr=0.0.0.0 \
  --metrics.port=6060 \
  --http \
  --ws \
  --http.addr=0.0.0.0 \
  --http.vhosts='*' \
  --http.port=8545 \
  --http.api=web3,net,eth,trace \
  --private.api.addr=127.0.0.1:9090 \
  --torrent.port=42069 \
  --bor.heimdall=https://heimdall-api.polygon.technology \
  --rpc.batch.concurrency=16 \
  --rpc.batch.limit=5000 \
  --rpc.returndata.limit=5000000 \
  --maxpeers=200 \
  --log.dir.path=/data/logs \
  --log.dir.prefix=erigon \
  --log.dir.verbosity=info \
  --verbosity=3 \
  --nat=extip:${EXT_IP}

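The --nat=extip:${EXT_IP} value matters. If the advertised public IP is stale, peers may not be able to connect back correctly. The shell expands ${EXT_IP} once, when the container is created, so a container restarted by --restart unless-stopped keeps advertising the old address even if the instance's public IP has changed.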
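If the metadata request fails, EXT_IP ends up empty or holds an error page, and the node advertises a useless address. The sketch below guards against that. It assumes the instance may enforce IMDSv2, which requires a session token. The helper names fetch_public_ip and is_ipv4 are hypothetical. Only the metadata endpoints and headers are real EC2 interfaces.

```shell
#!/bin/sh
# Hypothetical helpers; only the metadata URLs and headers are EC2 APIs.
IMDS=http://169.254.169.254/latest

# Fetch the public IPv4 address using an IMDSv2 session token.
# -f makes curl fail on HTTP errors instead of printing the error body.
fetch_public_ip() {
  token="$(curl -fsS --max-time 3 -X PUT "$IMDS/api/token" \
    -H 'X-aws-ec2-metadata-token-ttl-seconds: 60')" || return 1
  curl -fsS --max-time 3 -H "X-aws-ec2-metadata-token: $token" \
    "$IMDS/meta-data/public-ipv4"
}

# Succeed only if $1 looks like a dotted-quad IPv4 address.
# This is a shape check; it does not validate each octet's range.
is_ipv4() {
  printf '%s\n' "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'
}

# Usage on the instance (not executed here):
#   EXT_IP="$(fetch_public_ip)"
#   is_ipv4 "$EXT_IP" || { echo "bad EXT_IP: '$EXT_IP'" >&2; exit 1; }
```

Failing before docker run is cheaper than discovering a bad --nat value from the peer count an hour later.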

Verification after the switch

First, confirm the running build:

sudo docker logs --tail 100 polygon-erigon 2>&1 | grep -E 'Build info|Public IP|Starting Erigon'

The expected signs are:

Build info git_tag=v3.6.1-beta-dirty
[torrent] Public IP ip=<node-public-ip>
Starting Erigon on Bor Mainnet...
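The build check can be scripted so that a stale image is caught right away. This sketch assumes the "Build info git_tag=..." format shown above. The functions running_tag and tag_matches are illustrative names.

```shell
#!/bin/sh
# Print the git_tag value from the last "Build info" line on stdin.
running_tag() {
  sed -n 's/.*Build info.*git_tag=\([^ ]*\).*/\1/p' | tail -n 1
}

# Succeed if tag $1 is the expected version $2, with or without "-dirty".
tag_matches() {
  case "$1" in
    "$2"|"$2-dirty") return 0 ;;
    *) return 1 ;;
  esac
}

# Sample input; on the node, pipe in:
#   sudo docker logs polygon-erigon 2>&1
tag="$(printf '%s\n' 'Build info git_tag=v3.6.1-beta-dirty' | running_tag)"

if tag_matches "$tag" v3.6.1-beta; then
  echo "running expected build: $tag"
else
  echo "unexpected build: $tag"
fi
```

The "-dirty" suffix is accepted because a locally built image reports it when the source tree had uncommitted changes.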

Then confirm ports:

sudo docker port polygon-erigon

The expected shape is:

8545/tcp -> 127.0.0.1:8545
6060/tcp -> 127.0.0.1:6060
30303/tcp -> 0.0.0.0:30303
30303/udp -> 0.0.0.0:30303
42069/udp -> 0.0.0.0:42069
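A small filter can flag the one mistake that matters here: an RPC or metrics port published on a public interface. The sketch assumes the "port/proto -> host:port" layout printed by docker port. The function name exposed_local_ports is hypothetical.

```shell
#!/bin/sh
# Read `docker port` output on stdin. Arguments are the container ports
# that must stay local. Print every binding of those ports that is not
# bound to 127.0.0.1.
exposed_local_ports() {
  awk -v ports="$*" '
    BEGIN { n = split(ports, p, " "); for (i = 1; i <= n; i++) want[p[i]] = 1 }
    {
      split($1, cp, "/")   # "8545/tcp" -> cp[1] = "8545"
      if ((cp[1] in want) && $3 !~ /^127\.0\.0\.1:/) print
    }'
}

# Sample with a deliberately wrong RPC binding; on the node, pipe in:
#   sudo docker port polygon-erigon
sample='8545/tcp -> 0.0.0.0:8545
6060/tcp -> 127.0.0.1:6060
30303/tcp -> 0.0.0.0:30303'

printf '%s\n' "$sample" | exposed_local_ports 8545 6060
```

Empty output means the listed ports are local-only. With the sample above, the filter prints the 8545 line.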

Finally, watch sync:

sudo docker logs -f polygon-erigon 2>&1 | \
  grep -E 'GoodPeers|bad block|fork choice|Execution|inserting fetched blocks'

The useful post-upgrade log looked like this:

inserting fetched blocks start=87269286 end=87271077 blocks=1792
inserting fetched blocks start=87271078 end=87274917 blocks=3840
update fork choice block=87274917
GoodPeers eth69=3 eth68=3
[4/6 Execution] blk=87269385 blks=52 blk/s=2.6

Peer count was still not high, but the node was moving. That is the important distinction. A low peer count is a watch item. A low peer count plus no block insertion, no execution progress, and repeated bad-block errors is a stop condition.
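To track the peer count as a number instead of reading it by eye, the GoodPeers line can be summed. This sketch assumes the "GoodPeers ethNN=<n>" format from the sample log above, and good_peers is an illustrative name. Other Erigon builds may format the line differently.

```shell
#!/bin/sh
# Print the total peer count from the last GoodPeers line on stdin.
# A "No GoodPeers" line counts as zero. Prints nothing if neither appears.
good_peers() {
  awk '
    /No GoodPeers/ { last = 0; seen = 1; next }
    /GoodPeers/ {
      total = 0
      for (i = 1; i <= NF; i++)
        if ($i ~ /^eth[0-9]+=[0-9]+$/) { split($i, kv, "="); total += kv[2] }
      last = total; seen = 1
    }
    END { if (seen) print last }'
}

# Sample line from the log above; prints 6.
printf '%s\n' 'GoodPeers eth69=3 eth68=3' | good_peers
```

The number alone is only a watch item. It becomes meaningful when read next to block insertion and execution progress.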

Rollback boundary

The rollback plan stayed simple:

sudo docker stop polygon-erigon
sudo docker rm polygon-erigon

Then start the same command with:

polygon-erigon:local-v3.6.0

The rollback conditions for v3.6.1-beta were:

repeated "bad block" errors on fork choice update
repeated "unexpected bad block" errors
no Execution progress over multiple samples
no block insertion after P2P reachability was already confirmed
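The "no Execution progress over multiple samples" condition can be checked from the logs. This is a rough sketch that assumes the "[4/6 Execution] blk=<n>" format shown earlier. The function names are hypothetical, and the second sample line is invented for illustration.

```shell
#!/bin/sh
# Print the block height from each "Execution] blk=<n>" line on stdin.
exec_heights() {
  sed -n 's/.*Execution] blk=\([0-9][0-9]*\).*/\1/p'
}

# Succeed if the heights on stdin (one per line) end higher than they
# started. Fails on fewer than two samples.
is_progressing() {
  awk 'NR == 1 { first = $1 } { last = $1 }
       END { exit !(NR > 1 && last > first) }'
}

# First line is from the log above; the second is a made-up later sample.
# On the node, pipe in: sudo docker logs --since 10m polygon-erigon 2>&1
samples='[4/6 Execution] blk=87269385 blks=52 blk/s=2.6
[4/6 Execution] blk=87269437 blks=52 blk/s=2.6'

if printf '%s\n' "$samples" | exec_heights | is_progressing; then
  echo "execution is progressing"
else
  echo "execution looks stalled: check the rollback conditions"
fi
```

A single stalled window is not enough to roll back. The condition is meant to be checked over several windows, after P2P reachability has been confirmed.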

In this run, the early signal was better than that. The node kept downloading, inserting, updating fork choice, and executing.

Reusable upgrade rule

The move from v3.6.0 to v3.6.1-beta was not a blind upgrade. It was a narrow test after the stable Chicago release fixed one failure but left enough P2P and post-hardfork sync symptoms to justify trying the maintenance beta.

The reusable upgrade rule is:

keep /data when the database can still be opened and execution is progressing
change one variable at a time
keep public RPC closed
verify the actual running image, not the source directory on disk
treat beta builds as reversible operational tests, not as permanent assumptions

References

[1] 0xPolygon. “Erigon v3.6.0 release notes.”

[2] 0xPolygon. “No GoodPeers after upgrading to v3.6.0.” GitHub issue #154.

[3] 0xPolygon. “Erigon v3.6.1-beta release notes.”

[4] Polygon Labs. “Run an Erigon archive node.”