
Growing heap up to oom crash on Ethereum mainnet. v1.10.0 #5877

Closed
SpeakinTelnet opened this issue Aug 12, 2023 · 5 comments
Labels
meta-bug Issues that identify a bug and require a fix.

Comments


SpeakinTelnet commented Aug 12, 2023

Describe the bug

I have an issue where the heap grows until it hits the limit of available memory and then crashes Lodestar. I'm running the beacon node only on Ethereum mainnet.

Lodestar (currently v1.10.0/stable/7546292) is installed from source using `yarn install` followed by `yarn build`, and I'm using the following systemd service:

[Unit]
Description=Lodestar Beacon Node (Ethereum consensus client)
Wants=network-online.target
After=network-online.target

[Install]
WantedBy=multi-user.target

[Service]

User=cryptobro
Group=cryptobro
WorkingDirectory=/opt/lodestar
TimeoutSec=1200
Restart=always

ExecStart=/opt/lodestar/lodestar beacon \
  --dataDir=/var/opt/lodestar/chain-data \
  --network.maxPeers 60 \
  --metrics=true \
  --metrics.port=8009 \
  --checkpointSyncUrl=https://beaconstate-mainnet.chainsafe.io \
  --network=mainnet \
  --execution.urls=http://10.1.10.91:8551 \
  --jwt-secret=/var/opt/lodestar/jwt.hex

Here's how the heap usage looks before and after growing `max-old-space-size` to 6144 MiB:

[screenshot: heap usage before and after the change]

Here's the trace I get when the node crashes:

Aug 10 03:51:00 ethereum-cl lodestar[1274]: [1274:0x5928970] 41466040 ms: Mark-sweep 3972.7 (4124.8) -> 3952.4 (4104.8) MB, 1682.5 / 0.0 ms  (average mu = 0.117, current mu = 0.038) allocation failure; scavenge might not succeed
Aug 10 03:51:00 ethereum-cl lodestar[1274]: [1274:0x5928970] 41467710 ms: Mark-sweep 3965.8 (4118.2) -> 3952.4 (4104.8) MB, 1606.2 / 0.0 ms  (average mu = 0.081, current mu = 0.039) allocation failure; scavenge might not succeed
Aug 10 03:51:00 ethereum-cl lodestar[1274]: <--- JS stacktrace --->
Aug 10 03:51:00 ethereum-cl lodestar[1274]: FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
Aug 10 03:51:00 ethereum-cl lodestar[1274]:  1: 0xb7a940 node::Abort() [node]
Aug 10 03:51:00 ethereum-cl lodestar[1274]:  2: 0xa8e823  [node]
Aug 10 03:51:00 ethereum-cl lodestar[1274]:  3: 0xd5c940 v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [node]
Aug 10 03:51:00 ethereum-cl lodestar[1274]:  4: 0xd5cce7 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [node]
Aug 10 03:51:00 ethereum-cl lodestar[1274]:  5: 0xf3a3e5  [node]
Aug 10 03:51:00 ethereum-cl lodestar[1274]:  6: 0xf3b2e8 v8::internal::Heap::RecomputeLimits(v8::internal::GarbageCollector) [node]
Aug 10 03:51:00 ethereum-cl lodestar[1274]:  7: 0xf4b7f3  [node]
Aug 10 03:51:00 ethereum-cl lodestar[1274]:  8: 0xf4c668 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [node]
Aug 10 03:51:00 ethereum-cl lodestar[1274]:  9: 0xf26fce v8::internal::HeapAllocator::AllocateRawWithLightRetrySlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [node]
Aug 10 03:51:00 ethereum-cl lodestar[1274]: 10: 0xf28397 v8::internal::HeapAllocator::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [node]
Aug 10 03:51:00 ethereum-cl lodestar[1274]: 11: 0xf088e0 v8::internal::Factory::AllocateRaw(int, v8::internal::AllocationType, v8::internal::AllocationAlignment) [node]
Aug 10 03:51:00 ethereum-cl lodestar[1274]: 12: 0xeffeac v8::internal::FactoryBase<v8::internal::Factory>::AllocateRawArray(int, v8::internal::AllocationType) [node]
Aug 10 03:51:00 ethereum-cl lodestar[1274]: 13: 0xf00025 v8::internal::FactoryBase<v8::internal::Factory>::NewFixedArrayWithFiller(v8::internal::Handle<v8::internal::Map>, int, v8::internal::Handle<v8::internal::Oddball>, v8::internal::AllocationType) [node]
Aug 10 03:51:00 ethereum-cl lodestar[1274]: 14: 0x10aff62  [node]
Aug 10 03:51:00 ethereum-cl lodestar[1274]: 15: 0x10b96f4  [node]
Aug 10 03:51:00 ethereum-cl lodestar[1274]: 16: 0x10e5ca8  [node]
Aug 10 03:51:00 ethereum-cl lodestar[1274]: 17: 0x119e81d v8::internal::JSArray::SetLength(v8::internal::Handle<v8::internal::JSArray>, unsigned int) [node]
Aug 10 03:51:00 ethereum-cl lodestar[1274]: 18: 0x10e7465 v8::internal::ArrayConstructInitializeElements(v8::internal::Handle<v8::internal::JSArray>, v8::internal::Arguments<(v8::internal::ArgumentsType)1>*) [node]
Aug 10 03:51:00 ethereum-cl lodestar[1274]: 19: 0x12be7ea v8::internal::Runtime_NewArray(int, unsigned long*, v8::internal::Isolate*) [node]
Aug 10 03:51:00 ethereum-cl lodestar[1274]: 20: 0x16fb6b9  [node]
Aug 10 03:51:01 ethereum-cl lodestar[1273]: /opt/lodestar/lodestar: line 7:  1274 Aborted                 node --trace-deprecation --max-old-space-size=4096 ./packages/cli/bin/lodestar.js "$@"
Aug 10 03:51:01 ethereum-cl systemd[1]: lodestar-beacon.service: Main process exited, code=exited, status=134/n/a
Aug 10 03:51:01 ethereum-cl systemd[1]: lodestar-beacon.service: Failed with result 'exit-code'.
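The Mark-sweep lines at the top of the trace show the used heap pinned just under the 4096 MiB ceiling (e.g. 3972.7 MB used against 4124.8 MB committed). As a reading aid, a small parsing sketch (field layout assumed from the lines above, not from any V8 spec) that pulls those figures out of such a line:

```shell
# Extract "used (committed) -> after-GC" heap sizes (in MB) from a V8
# Mark-sweep log line of the shape seen in the trace above.
parse_marksweep() {
  # Expected shape: "... Mark-sweep 3972.7 (4124.8) -> 3952.4 (4104.8) MB, ..."
  echo "$1" | sed -n 's/.*Mark-sweep \([0-9.]*\) (\([0-9.]*\)) -> \([0-9.]*\).*/used=\1MB committed=\2MB after_gc=\3MB/p'
}

parse_marksweep "41466040 ms: Mark-sweep 3972.7 (4124.8) -> 3952.4 (4104.8) MB, 1682.5 / 0.0 ms"
# -> used=3972.7MB committed=4124.8MB after_gc=3952.4MB
```

The "after-GC" number barely dropping below the "used" number across consecutive collections is what V8 reports as an ineffective mark-compact right before the fatal error.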

I'm running a similar configuration on Gnosis, with validators, without this issue.

Expected behavior

For the Lodestar heap usage to be stable.

Operating system

Ubuntu 22.04.2 LTS

Lodestar version or commit hash

v1.10.0/stable/7546292

SpeakinTelnet added the meta-bug (Issues that identify a bug and require a fix.) label Aug 12, 2023
nflaig (Member) commented Aug 12, 2023

Thanks for reporting @SpeakinTelnet. This is a known issue (#5851) in Lodestar v1.10 with older Node.js versions.

You have to update Node.js to a version >=18.17.0 or >=20.1.0. I would recommend installing the latest release of Node 20.
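As a quick self-check against the floors in this comment, a sketch of a version gate (`node_is_fixed` is a hypothetical helper; 19.x is treated as unfixed here since it is not covered by either range):

```shell
# Returns 0 if the given Node.js version string is at or above the fixed
# releases mentioned above: >=18.17.0 on the 18.x line, >=20.1.0 on 20.x,
# or any later major.
node_is_fixed() {
  ver=${1#v}              # strip the leading "v" from e.g. "v18.16.1"
  major=${ver%%.*}
  case "$major" in
    18) [ "$(printf '%s\n' 18.17.0 "$ver" | sort -V | head -n1)" = 18.17.0 ] ;;
    19) return 1 ;;       # assumption: 19.x not covered by either range
    20) [ "$(printf '%s\n' 20.1.0 "$ver" | sort -V | head -n1)" = 20.1.0 ] ;;
    *)  [ "$major" -gt 20 ] ;;
  esac
}

node_is_fixed v18.16.1 || echo "v18.16.1: upgrade needed"
node_is_fixed v20.5.1 && echo "v20.5.1: ok"
```

Feeding it `"$(node --version)"` on the affected host would tell you whether the runtime predates the fix.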

SpeakinTelnet (Author) commented

Thanks @nflaig! Looks like I missed that issue in my search. I've updated to the latest Node.js version and everything has been stable since. Closing this issue as fixed.

nflaig (Member) commented Aug 12, 2023

We should probably re-open the other issue for better visibility, as it is not really resolved until we know all users have upgraded their Node version.

Thanks again for reporting 👍

czepluch commented Aug 31, 2023

I've experienced this three times today while syncing Nethermind from scratch with Lodestar. I'm on a Dappnode with Lodestar v1.10.0 upstream.

nflaig (Member) commented Aug 31, 2023

@czepluch On Dappnode this issue should not be happening, as the correct Node version is used there. The memory leak would also take at least a day before it would crash Lodestar due to OOM.

From what you mentioned, it might be that Nethermind is taking a lot of memory due to the sync and the OOM is not related to Lodestar.

It would help if you could provide more details:

  • Is it Lodestar that is crashing, or your server?
  • Any Lodestar error / debug logs would help as well
  • Metrics, if you are tracking those via DMS on Dappnode
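For the first bullet, one rough way to tell the two failure modes apart is by log signature: a V8 heap OOM prints the `JavaScript heap out of memory` fatal error seen earlier in this thread, while the kernel OOM killer logs `Out of memory: Killed process …` (visible via `journalctl -k`). A triage sketch, with `classify_oom` as a hypothetical helper:

```shell
# Classify a log line as a V8 heap OOM inside the Node process, a kernel
# OOM-killer event taking down the server, or neither.
classify_oom() {
  case "$1" in
    *"JavaScript heap out of memory"*) echo "lodestar-heap-oom" ;;
    *"Out of memory: Killed process"*) echo "kernel-oom-killer" ;;
    *) echo "unknown" ;;
  esac
}

# The trace earlier in this thread matches the first signature:
classify_oom "FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory"
# -> lodestar-heap-oom
```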

It would be best to open a new issue or just ask in the #lodestar-help channel on Discord.
