
Growing heap up to oom crash on Ethereum mainnet. v1.10.0 #5877

Closed
SpeakinTelnet opened this issue Aug 12, 2023 · 5 comments
Labels
meta-bug Issues that identify a bug and require a fix.

Comments


SpeakinTelnet commented Aug 12, 2023

Describe the bug

I have an issue where the heap grows until it hits the limit of available memory and then crashes Lodestar. I'm running the beacon node only on Ethereum mainnet.

Lodestar (currently v1.10.0/stable/7546292) is installed from source using `yarn install` followed by `yarn build`, and I'm using the following systemd service:

[Unit]
Description=Lodestar Beacon Node (Ethereum consensus client)
Wants=network-online.target
After=network-online.target

[Install]
WantedBy=multi-user.target

[Service]

User=cryptobro
Group=cryptobro
WorkingDirectory=/opt/lodestar
TimeoutSec=1200
Restart=always

ExecStart=/opt/lodestar/lodestar beacon \
  --dataDir=/var/opt/lodestar/chain-data \
  --network.maxPeers 60 \
  --metrics=true \
  --metrics.port=8009 \
  --checkpointSyncUrl=https://beaconstate-mainnet.chainsafe.io \
  --network=mainnet \
  --execution.urls=http://10.1.10.91:8551 \
  --jwt-secret=/var/opt/lodestar/jwt.hex

Here's how the heap usage looks before and after growing `max-old-space-size` to 6144 MiB:

[screenshot: heap usage before and after the change]

Here's the trace I get when the node crashes:

Aug 10 03:51:00 ethereum-cl lodestar[1274]: [1274:0x5928970] 41466040 ms: Mark-sweep 3972.7 (4124.8) -> 3952.4 (4104.8) MB, 1682.5 / 0.0 ms  (average mu = 0.117, current mu = 0.038) allocation failure; scavenge might not succeed
Aug 10 03:51:00 ethereum-cl lodestar[1274]: [1274:0x5928970] 41467710 ms: Mark-sweep 3965.8 (4118.2) -> 3952.4 (4104.8) MB, 1606.2 / 0.0 ms  (average mu = 0.081, current mu = 0.039) allocation failure; scavenge might not succeed
Aug 10 03:51:00 ethereum-cl lodestar[1274]: <--- JS stacktrace --->
Aug 10 03:51:00 ethereum-cl lodestar[1274]: FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
Aug 10 03:51:00 ethereum-cl lodestar[1274]:  1: 0xb7a940 node::Abort() [node]
Aug 10 03:51:00 ethereum-cl lodestar[1274]:  2: 0xa8e823  [node]
Aug 10 03:51:00 ethereum-cl lodestar[1274]:  3: 0xd5c940 v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [node]
Aug 10 03:51:00 ethereum-cl lodestar[1274]:  4: 0xd5cce7 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [node]
Aug 10 03:51:00 ethereum-cl lodestar[1274]:  5: 0xf3a3e5  [node]
Aug 10 03:51:00 ethereum-cl lodestar[1274]:  6: 0xf3b2e8 v8::internal::Heap::RecomputeLimits(v8::internal::GarbageCollector) [node]
Aug 10 03:51:00 ethereum-cl lodestar[1274]:  7: 0xf4b7f3  [node]
Aug 10 03:51:00 ethereum-cl lodestar[1274]:  8: 0xf4c668 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [node]
Aug 10 03:51:00 ethereum-cl lodestar[1274]:  9: 0xf26fce v8::internal::HeapAllocator::AllocateRawWithLightRetrySlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [node]
Aug 10 03:51:00 ethereum-cl lodestar[1274]: 10: 0xf28397 v8::internal::HeapAllocator::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [node]
Aug 10 03:51:00 ethereum-cl lodestar[1274]: 11: 0xf088e0 v8::internal::Factory::AllocateRaw(int, v8::internal::AllocationType, v8::internal::AllocationAlignment) [node]
Aug 10 03:51:00 ethereum-cl lodestar[1274]: 12: 0xeffeac v8::internal::FactoryBase<v8::internal::Factory>::AllocateRawArray(int, v8::internal::AllocationType) [node]
Aug 10 03:51:00 ethereum-cl lodestar[1274]: 13: 0xf00025 v8::internal::FactoryBase<v8::internal::Factory>::NewFixedArrayWithFiller(v8::internal::Handle<v8::internal::Map>, int, v8::internal::Handle<v8::internal::Oddball>, v8::internal::AllocationType) [node]
Aug 10 03:51:00 ethereum-cl lodestar[1274]: 14: 0x10aff62  [node]
Aug 10 03:51:00 ethereum-cl lodestar[1274]: 15: 0x10b96f4  [node]
Aug 10 03:51:00 ethereum-cl lodestar[1274]: 16: 0x10e5ca8  [node]
Aug 10 03:51:00 ethereum-cl lodestar[1274]: 17: 0x119e81d v8::internal::JSArray::SetLength(v8::internal::Handle<v8::internal::JSArray>, unsigned int) [node]
Aug 10 03:51:00 ethereum-cl lodestar[1274]: 18: 0x10e7465 v8::internal::ArrayConstructInitializeElements(v8::internal::Handle<v8::internal::JSArray>, v8::internal::Arguments<(v8::internal::ArgumentsType)1>*) [node]
Aug 10 03:51:00 ethereum-cl lodestar[1274]: 19: 0x12be7ea v8::internal::Runtime_NewArray(int, unsigned long*, v8::internal::Isolate*) [node]
Aug 10 03:51:00 ethereum-cl lodestar[1274]: 20: 0x16fb6b9  [node]
Aug 10 03:51:01 ethereum-cl lodestar[1273]: /opt/lodestar/lodestar: line 7:  1274 Aborted                 node --trace-deprecation --max-old-space-size=4096 ./packages/cli/bin/lodestar.js "$@"
Aug 10 03:51:01 ethereum-cl systemd[1]: lodestar-beacon.service: Main process exited, code=exited, status=134/n/a
Aug 10 03:51:01 ethereum-cl systemd[1]: lodestar-beacon.service: Failed with result 'exit-code'.
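The Mark-sweep lines at the top of the trace show the used heap pinned just under the 4096 MiB ceiling (e.g. 3972.7 MB used against 4124.8 MB committed). As a reading aid, a small parsing sketch (field layout assumed from the lines above, not from any V8 spec) that pulls those figures out of such a line:

```shell
# Extract "used (committed) -> after-GC" heap sizes (in MB) from a V8
# Mark-sweep log line of the shape seen in the trace above.
parse_marksweep() {
  # Expected shape: "... Mark-sweep 3972.7 (4124.8) -> 3952.4 (4104.8) MB, ..."
  echo "$1" | sed -n 's/.*Mark-sweep \([0-9.]*\) (\([0-9.]*\)) -> \([0-9.]*\).*/used=\1MB committed=\2MB after_gc=\3MB/p'
}

parse_marksweep "41466040 ms: Mark-sweep 3972.7 (4124.8) -> 3952.4 (4104.8) MB, 1682.5 / 0.0 ms"
# -> used=3972.7MB committed=4124.8MB after_gc=3952.4MB
```

The "after-GC" number barely dropping below the "used" number across consecutive collections is what V8 reports as an ineffective mark-compact right before the fatal error.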

I'm running a similar configuration on Gnosis, with validators, without this issue.

Expected behavior

For the Lodestar heap usage to be stable.

Operating system

Ubuntu 22.04.2 LTS

Lodestar version or commit hash

v1.10.0/stable/7546292

SpeakinTelnet added the meta-bug (Issues that identify a bug and require a fix.) label Aug 12, 2023
nflaig (Member) commented Aug 12, 2023

Thanks for reporting @SpeakinTelnet. This is a known issue (#5851) in Lodestar v1.10 with older Node.js versions.

You have to update Node.js to a version >=18.17.0 or >=20.1.0. I would recommend installing the latest release of Node 20.
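As a quick self-check against the floors in this comment, a sketch of a version gate (`node_is_fixed` is a hypothetical helper; 19.x is treated as unfixed here since it is not covered by either range):

```shell
# Returns 0 if the given Node.js version string is at or above the fixed
# releases mentioned above: >=18.17.0 on the 18.x line, >=20.1.0 on 20.x,
# or any later major.
node_is_fixed() {
  ver=${1#v}              # strip the leading "v" from e.g. "v18.16.1"
  major=${ver%%.*}
  case "$major" in
    18) [ "$(printf '%s\n' 18.17.0 "$ver" | sort -V | head -n1)" = 18.17.0 ] ;;
    19) return 1 ;;       # assumption: 19.x not covered by either range
    20) [ "$(printf '%s\n' 20.1.0 "$ver" | sort -V | head -n1)" = 20.1.0 ] ;;
    *)  [ "$major" -gt 20 ] ;;
  esac
}

node_is_fixed v18.16.1 || echo "v18.16.1: upgrade needed"
node_is_fixed v20.5.1 && echo "v20.5.1: ok"
```

Feeding it `"$(node --version)"` on the affected host would tell you whether the runtime predates the fix.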

SpeakinTelnet (Author) commented

Thanks @nflaig! Looks like I missed that issue in my search. I've updated to the latest Node.js version and everything has been stable since. Closing this issue as fixed.

nflaig (Member) commented Aug 12, 2023

We should probably re-open the other issue for better visibility, as it is not really resolved until we know all users have upgraded their Node version.

Thanks again for reporting 👍

czepluch commented Aug 31, 2023

I've experienced this three times today while syncing Nethermind from scratch with Lodestar. I'm on a Dappnode with Lodestar v1.10.0 upstream.

nflaig (Member) commented Aug 31, 2023

@czepluch On Dappnode this issue should not be happening, as the correct Node version is used there. The memory leak would also take at least a day before it would crash Lodestar due to OOM.

From what you mentioned, it might be that Nethermind is taking a lot of memory due to the sync and the OOM is not related to Lodestar.

It would help if you could provide more details:

  • Is it Lodestar that is crashing, or your server?
  • Any Lodestar error / debug logs would help as well
  • Metrics, if you are tracking those via DMS on Dappnode
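For the first bullet, one rough way to tell the two failure modes apart is by log signature: a V8 heap OOM prints the `JavaScript heap out of memory` fatal error seen earlier in this thread, while the kernel OOM killer logs `Out of memory: Killed process …` (visible via `journalctl -k`). A triage sketch, with `classify_oom` as a hypothetical helper:

```shell
# Classify a log line as a V8 heap OOM inside the Node process, a kernel
# OOM-killer event taking down the server, or neither.
classify_oom() {
  case "$1" in
    *"JavaScript heap out of memory"*) echo "lodestar-heap-oom" ;;
    *"Out of memory: Killed process"*) echo "kernel-oom-killer" ;;
    *) echo "unknown" ;;
  esac
}

# The trace earlier in this thread matches the first signature:
classify_oom "FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory"
# -> lodestar-heap-oom
```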

It would be best to open a new issue or just ask in the #lodestar-help channel on Discord.
