Binaries compiled with MDBX exhibit memory corruption #6277

Open
michaelsproul opened this issue Aug 19, 2024 · 5 comments
Labels
bug (Something isn't working), database, slasher, v6.0.0 (New major release for hierarchical state diffs)

Comments

@michaelsproul
Member

michaelsproul commented Aug 19, 2024

Description

We've found a very weird issue on one of our nodes where binaries compiled with MDBX corrupt their memory during execution. The corruption occurs even when the code paths that use MDBX (i.e. the slasher) are inactive.

The nature of the corruption is described here:

Initially we identified UB in the LMDB bindings and worked around it in this PR:

Steps to resolve (MDBX)

  1. Delete MDBX from Lighthouse, retaining LMDB as the default for now (if it is untainted), alongside Redb, or
  2. Move to using the Reth bindings for MDBX (assuming they are free of UB).

Steps to resolve (LMDB)

This section was written assuming LMDB was buggy, which it still could be:

Option 1 is that we remove LMDB from Lighthouse completely, or at least from our compiled binaries. The downside is that it's the default slasher DB, so this will require everyone with a slasher to delete their slasher DB and start again (not really a big deal). The other difficulty is choosing whether to go with Redb or MDBX as the new default (I would prefer Redb).

Option 2 is to try to fix the UB in the bindings. I worry that this will be very involved and for little long-term benefit. There are no alternative LMDB bindings we can use because the bug exists in the original lmdb and its fork lmdb-rkv. There are no LMDB bindings that are being maintained (see https://crates.io/search?q=lmdb).

michaelsproul added the bug, database, slasher, and v6.0.0 labels on Aug 19, 2024
@dapplion
Collaborator

Let's move on to Option 1 and not waste time

@michaelsproul
Member Author

Sounds good. I don't think we have the bandwidth to maintain LMDB bindings solo, and there's no real point pouring tonnes of resources into a dead-end DB.

Documenting here so we have something to reference in the release notes when we delete it!

michaelsproul changed the title from "Binaries compiled with LMDB exhibit memory corruption" to "Binaries compiled with MDBX exhibit memory corruption" on Aug 22, 2024
@michaelsproul
Member Author

Ok so it turns out it's actually the binaries compiled with MDBX that are corrupting their memory.

I think this means we need to either remove just MDBX, or remove MDBX and LMDB. Or we could try using the Reth bindings for MDBX.

@michaelsproul
Member Author

That same node is crashing with LMDB again, but is also throwing I/O errors. Going to investigate the hardware failure angle some more.
