Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MDB_CORRUPTED when multiprocessing #269

Open
NightTsarina opened this issue Nov 6, 2020 · 41 comments
Open

MDB_CORRUPTED when multiprocessing #269

NightTsarina opened this issue Nov 6, 2020 · 41 comments

Comments

@NightTsarina
Copy link

Affected Operating Systems

Linux Debian testing (bullseye)

Affected py-lmdb Version

1.0.0

py-lmdb Installation Method

apt

Using bundled or distribution-provided LMDB library?

Distribution

Distribution name and LMDB library version

(0, 9, 24)

Machine "free -m" output

$ free -m
              total        used        free      shared  buff/cache   available
Mem:           7670         191         846           0        6632        7181
Swap:          2047          71        1976

Other important machine info

All pretty standard, not using containers, and cgroups as might be configured by systemd/apache2. Filesystem is ext4.

Describe Your Problem

Hi,

I know this might be an upstream issue, but it is difficult to tell. It is a difficult-to-reproduce scenario, but the outcome is very serious: once this happened, my application stopped working, and found no way of fixing the database except by re-loading all the data from scratch (mdb_dump/mdb_load did not seem to handle it properly and give no debugging details).

So my application is built around py-lmdb, I have my own Environment and Transaction wrapping classes, that allow me to ensure only one Environment instance is created, and to keep track of all the opened transactions. Those classes are also able to detect MDB_MAP_FULL and MDB_MAP_RESIZED, and when either of those happen, they wait for all transactions to finish, close the environment, and reopen it (with an increased map size in the first case, and with the detected map size in the second case).

This was working like a charm with multiple threads, until I started also using multiple processes. A few times, either process will fill the DB, close and resize it; which will trigger the other process detecting the map being increased under its feet, and reopening the DB. But then a sequence of events happened that ended with the database corrupted.

This is an extract of the logs of both processes (process 1 is an WSGI application, process 2 is a CLI application), in which I see:

  • When reopening a resized DB, the new map_size has not been properly persisted, although I always commit a write transaction after opening the DB.
  • Just before the corruption happened, I see that the number of pages in use exceeds the map_size.
  • The apache process dies at one point with a bus error.

Process 1

[2020-11-05 21:28:17,519] ** ..Database reopened with map_size=4912832512 (used pages > map_size!!!) in other process..

[2020-11-05 21:28:17,587] INFO [req:0x19984d8841c644e0] [Item:block/7b936a8a] Item added to 10000 users (0 ignored).

[2020-11-05 21:28:18,456] ** ..MDB_MAP_FULL received in other process..
[2020-11-05 21:28:18,469] ** ..Database reopened with map_size=7369248768 in other process..]
[2020-11-05 21:28:18.725] ** ..Apache/mod-wsgi process (this process) dies with BUS ERROR..

[2020-11-05 21:28:23,512] INFO [req:] LMDB database 'myapp' opened: path=/srv/myapp-dev/db/myapp, capacity=4912840704 B, used=4912840704 B
[2020-11-05 21:28:36,238] ERROR [req:] Database error executing 'begin': mdb_txn_begin: MDB_MAP_RESIZED: Database contents grew beyond environment mapsize

[2020-11-05 21:28:36,240] INFO [req:] LMDB database 'myapp' opened: path=/srv/myapp-dev/db/myapp, capacity=4912840704 B, used=4914143232 B
** (used pages > map_size!!!) 

[2020-11-05 21:28:39,165] ERROR [req:0x796ee9c6b2ca9d5] Database error executing 'put': mdb_put: MDB_CORRUPTED: Located page was wrong type

Process 2

[2020-11-05 21:27:55,107] INFO [req:] [Item:block/403574cc] Item added to 5 users (9995 ignored).

[2020-11-05 21:27:55,688] ** ..MDB_MAP_FULL received in other process..
[2020-11-05 21:27:56,199] ** ..Database reopened with map_size=7369248768 in other process..

[2020-11-05 21:27:56,215] INFO [req:] [Item:block/403574cc] Item added to 247 users (9753 ignored).

[2020-11-05 21:28:17,422] ERROR [req:] Database error executing 'begin': mdb_txn_begin: MDB_MAP_RESIZED: Database contents grew beyond environment mapsize

[2020-11-05 21:28:17,519] INFO [req:] LMDB database 'myapp' opened: path=/srv/myapp-dev/db/myapp, capacity=4912832512 B, used=4912840704 B
** (used pages > map_size!!!) 

[2020-11-05 21:28:18,061] INFO [req:] [Item:block/403574cc] Item added to 510 users (9490 ignored).

[2020-11-05 21:28:18,456] ERROR [req:] Database error executing 'put': mdb_put: MDB_MAP_FULL: Environment mapsize limit reached
[2020-11-05 21:28:18,469] INFO [req:] Resizing LMDB database 'myapp' to 7369248768 bytes.

[2020-11-05 21:28:18,469] INFO [req:] LMDB database 'myapp' opened: path=/srv/myapp-dev/db/myapp, capacity=7369248768 B, used=4912840704 B

[2020-11-05 21:28:18,580] INFO [req:] [Item:block/403574cc] Item added to 510 users (9490 ignored).

[2020-11-05 21:28:18.725] ** ..Other process dies with BUS ERROR..

[2020-11-05 21:28:18,825] INFO [req:] [Audit#27099] AuditLog entry d71236e4 created: [apply block/ffe76a64 uids:510]

[2020-11-05 21:28:23,512]  ** ..Database reopened with map_size=4912840704 in other process..
[2020-11-05 21:28:36,238]  ** ..MDB_MAP_RESIZED received in other process..
[2020-11-05 21:28:23,240]  ** ..Database reopened with map_size=4912840704 (used pages > map_size!!!) in other process..

[2020-11-05 21:28:36,360] ERROR [req:] Database error executing 'put': mdb_put: MDB_CORRUPTED: Located page was wrong type

So, any clues? Is there some extra precaution I should be taking when doing these complicated dances? Or is this just some obscure bug i stepped on? Any help would be appreciated.

@NightTsarina
Copy link
Author

NightTsarina commented Nov 6, 2020

One clarification: when i said number of pages used, I meant the last_pgno value from info, which is reported in the logs as "used".

@jnwatson
Copy link
Owner

jnwatson commented Nov 9, 2020

The only time where I've seen this sort of thing happen is when file locking was accidentally bypassed, for example by sharing a file between docker containers, or opening the database with lock=False.

The general concept of what you're doing (multiple processes) is definitely something I've successfully done.

I'd be happy to track this to ground, but I need something reproducible. I know that is easier said than done. I've had success with using logging statements on every database action, and then writing a program that can replay those actions. Good luck.

@NightTsarina
Copy link
Author

Hi Nic, thanks for your answer!

I am not doing anything bizarre with the DB or the environment. I am using lock=True, the only non-default environment options I am using are buffers=True (I feed the data directly into pickle, so the raw memoryview objects never leaves the class, and I save memory copies), and readahead=False, and writemap=True, which I understand should be safe too.

The bus error crashes are easily reproduced here. At least one condition that seems to trigger them is when one process opens the database and uses the map_size as reported by the library, which seems to always be as big as the last page used. I always call set_mapsize after opening the environment, and if no resizing was needed, the code just passes the value as reported by Environment.info().

The first write operation immediately hits a MDB_MAP_FULL error and it is at that point where the other process (which had a bigger map size configured) crashes. So this does not seem to be related to the database resizing, as I that happens after one process crashes, but more with the initial map_size configuration. I suspect there is a bug in the map_size reporting that makes this happen often, but I don't understand why it crashes at all.

For completeness, I have tried not using buffers, and these bus error crashes are easily reproducible.

On the other hand, the full DB corruption is quite difficult to reproduce, as it seems to depend on the timings and writing patterns of each process. I will try to make a reproducible test, but I am not sure how feasible it is..

@jnwatson
Copy link
Owner

This whole area seems vaguely familiar. I did run across an issue on Windows once upon a time where a failure in set_mapsize left the process in an unsafe state.

It would be useful to get a stack trace of the crash. You can get that my running your process under gdb or enabling core dumps (e.g. ulimit -S -c unlimited ) and running gdb on the dumped file.

@NightTsarina
Copy link
Author

This is the top of the stack trace when the Bus error happens, sadly, it does not show much about the critical code path:

$ gdb /usr/bin/python3 core 
GNU gdb (Debian 9.2-1) 9.2

[...]

Program terminated with signal SIGBUS, Bus error.
#0  0x00007ff620e89507 in ?? () from /usr/lib/x86_64-linux-gnu/liblmdb.so.0
(gdb) bt
#0  0x00007ff620e89507 in ?? () from /usr/lib/x86_64-linux-gnu/liblmdb.so.0
#1  0x00007ff620e89753 in ?? () from /usr/lib/x86_64-linux-gnu/liblmdb.so.0
#2  0x00007ff620e8b38c in ?? () from /usr/lib/x86_64-linux-gnu/liblmdb.so.0
#3  0x00007ff620e8e495 in mdb_cursor_put () from /usr/lib/x86_64-linux-gnu/liblmdb.so.0
#4  0x00007ff620e912eb in mdb_put () from /usr/lib/x86_64-linux-gnu/liblmdb.so.0
#5  0x00007ff620ea9a08 in ?? () from /usr/lib/python3/dist-packages/lmdb/cpython.cpython-38-x86_64-linux-gnu.so
#6  0x000000000052108c in cfunction_call_varargs (kwargs={'dupdata': True, 'overwrite': True, 'db': <_Database at remote 0x7ff620e2d760>}, 
    args=(b'\x01\x01\xf0\xe6\xc5\xea\xda+]v\x80\xf5\x1a^+\x86{,', b'\x80\x04\x95\n\x00\x00\x00\x00\x00\x00\x00NN\x86\x94N\x86\x94\x85\x94.'), 
    func=<built-in method put of Transaction object at remote 0x7ff620e73e30>) at ../Objects/call.c:742
#7  PyCFunction_Call (func=<built-in method put of Transaction object at remote 0x7ff620e73e30>, 
    args=(b'\x01\x01\xf0\xe6\xc5\xea\xda+]v\x80\xf5\x1a^+\x86{,', b'\x80\x04\x95\n\x00\x00\x00\x00\x00\x00\x00NN\x86\x94N\x86\x94\x85\x94.'), 
    kwargs={'dupdata': True, 'overwrite': True, 'db': <_Database at remote 0x7ff620e2d760>}) at ../Objects/call.c:772
#8  0x0000000000510513 in _PyObject_MakeTpCall (callable=<built-in method put of Transaction object at remote 0x7ff620e73e30>, args=<optimized out>, 
    nargs=<optimized out>, keywords=<optimized out>) at ../Include/internal/pycore_pyerrors.h:13
#9  0x000000000050af9a in _PyObject_Vectorcall (kwnames=('dupdata', 'overwrite', 'db'), nargsf=<optimized out>, args=<optimized out>, 
    callable=<built-in method put of Transaction object at remote 0x7ff620e73e30>) at ../Include/cpython/abstract.h:125
#10 _PyObject_Vectorcall (kwnames=('dupdata', 'overwrite', 'db'), nargsf=<optimized out>, args=<optimized out>, 
    callable=<built-in method put of Transaction object at remote 0x7ff620e73e30>) at ../Include/cpython/abstract.h:115
#11 call_function (kwnames=('dupdata', 'overwrite', 'db'), oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=<optimized out>)
    at ../Python/ceval.c:4963
#12 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at ../Python/ceval.c:3515
#13 0x0000000000503df5 in PyEval_EvalFrameEx (throwflag=0, 
    f=Frame 0x15d5260, for file /home/tina/repos/2monos/colander/build/lib/tina/lmdb.py, line 594, in put (self=<_Transaction(...(truncated))
    at ../Python/ceval.c:741

@jnwatson
Copy link
Owner

That's useful. The next step is installing py-lmdb from source and doing it again.

  1. cd to a tmp directory
  2. git clone https://github.com/jnwatson/py-lmdb.git
  3. cd py-lmdb
  4. git checkout py-lmdb_1.0.0 so you get the same version that you're already running.
  5. (Optional) make/activate a virtualenv. At least pip uninstall lmdb so you don't get the other version.
  6. python setup.py build develop
  7. Rerun your test.

That should get me line numbers at least.

@NightTsarina
Copy link
Author

Thanks for the info, I am quite rusty with C and extensions these days! :)

After a few attempts, I managed to get a better stacktrace. I also compiled with -O0 so nothing is optimised out. This is the Bus error crash:

Reading symbols from python3...
Reading symbols from /usr/lib/debug/.build-id/56/3871a87b6f0981eaf50ac9268b1b0c131bca03.debug...
[New LWP 227574]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `python3 ./build/scripts-3.8/colander load --item={"blocks":[{"selector":{"offer'.
Program terminated with signal SIGBUS, Bus error.
#0  0x00007fc4446beb06 in mdb_page_search_root (mc=0x7ffe96992e10, key=0x0, flags=4) at build/lib/mdb.c:5485
5485		while (IS_BRANCH(mp)) {
(gdb) bt
#0  0x00007fc4446beb06 in mdb_page_search_root (mc=0x7ffe96992e10, key=0x0, flags=4) at build/lib/mdb.c:5485
#1  0x00007fc4446bf03e in mdb_page_search (mc=0x7ffe96992e10, key=0x0, flags=4) at build/lib/mdb.c:5664
#2  0x00007fc4446c0bf0 in mdb_cursor_first (mc=0x7ffe96992e10, key=0x7ffe96992bf0, data=0x0) at build/lib/mdb.c:6236
#3  0x00007fc4446c09cf in mdb_cursor_set (mc=0x7ffe96992c80, key=0x7ffe96993050, data=0x7ffe96992bf0, op=MDB_SET, exactp=0x7ffe96992be0)
    at build/lib/mdb.c:6178
#4  0x00007fc4446c1e02 in mdb_cursor_put (mc=0x7ffe96992c80, key=0x7ffe96993050, data=0x7ffe96993060, flags=32) at build/lib/mdb.c:6614
#5  0x00007fc4446cb27d in mdb_put (txn=0x23f6dd0, dbi=5, key=0x7ffe96993050, data=0x7ffe96993060, flags=32) at build/lib/mdb.c:9022
#6  0x00007fc4446b7b6e in trans_put (self=0x7fc44469e030, 
    args=(b'\x01\x00\x00\x00\x00\x00\x00\x00\x05\xb8\xca\xec4DE\xbb\xb6h;\xcb\xd2\xf5M\x1b\xf4f\xb4Q', b'\x01}H\xc4\x82\x83h\xaa)Al\xf5QH_\nv'), 
    kwds={'dupdata': False, 'overwrite': True, 'db': <_Database at remote 0x7fc4438ed3a0>}) at lmdb/cpython.c:3352
#7  0x000000000052108c in cfunction_call_varargs (kwargs={'dupdata': False, 'overwrite': True, 'db': <_Database at remote 0x7fc4438ed3a0>}, 
    args=(b'\x01\x00\x00\x00\x00\x00\x00\x00\x05\xb8\xca\xec4DE\xbb\xb6h;\xcb\xd2\xf5M\x1b\xf4f\xb4Q', b'\x01}H\xc4\x82\x83h\xaa)Al\xf5QH_\nv'), 
    func=<built-in method put of Transaction object at remote 0x7fc44469e030>) at ../Objects/call.c:742
#8  PyCFunction_Call (func=<built-in method put of Transaction object at remote 0x7fc44469e030>, 
    args=(b'\x01\x00\x00\x00\x00\x00\x00\x00\x05\xb8\xca\xec4DE\xbb\xb6h;\xcb\xd2\xf5M\x1b\xf4f\xb4Q', b'\x01}H\xc4\x82\x83h\xaa)Al\xf5QH_\nv'), 
    kwargs={'dupdata': False, 'overwrite': True, 'db': <_Database at remote 0x7fc4438ed3a0>}) at ../Objects/call.c:772
#9  0x0000000000510513 in _PyObject_MakeTpCall (callable=<built-in method put of Transaction object at remote 0x7fc44469e030>, args=<optimized out>, 
    nargs=<optimized out>, keywords=<optimized out>) at ../Objects/call.c:159
#10 0x000000000050af9a in _PyObject_Vectorcall (kwnames=('dupdata', 'overwrite', 'db'), nargsf=<optimized out>, args=<optimized out>, 
    callable=<built-in method put of Transaction object at remote 0x7fc44469e030>) at ../Include/cpython/abstract.h:125
#11 _PyObject_Vectorcall (kwnames=('dupdata', 'overwrite', 'db'), nargsf=<optimized out>, args=<optimized out>, 
    callable=<built-in method put of Transaction object at remote 0x7fc44469e030>) at ../Include/cpython/abstract.h:115
#12 call_function (kwnames=('dupdata', 'overwrite', 'db'), oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=<optimized out>)
    at ../Python/ceval.c:4963
#13 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at ../Python/ceval.c:3515

@NightTsarina
Copy link
Author

One thing I noticed while testing this, is that there is definitely a bug in the persistance of map_size changes. Here is a minimal reproducible PoC, using the just compiled library:

$ python3 -c '
import lmdb;

e = lmdb.Environment("db/test", subdir=False, create=True, readahead=False, writemap=True,)
print(e.info())

e.set_mapsize(e.info()["map_size"] * 2)
print(e.info())
with e.begin(write=True) as txn:
  txn.put(b"foo",b"bar")
e.close()

e = lmdb.Environment("db/test", subdir=False, create=True, readahead=False, writemap=True,)
print(e.info())                                                       

import subprocess
subprocess.run(["ls", "-l", "db/"])'

{'map_addr': 0, 'map_size': 10485760, 'last_pgno': 7, 'last_txnid': 6, 'max_readers': 126, 'num_readers': 1}
{'map_addr': 0, 'map_size': 20971520, 'last_pgno': 7, 'last_txnid': 6, 'max_readers': 126, 'num_readers': 1}
{'map_addr': 0, 'map_size': 10485760, 'last_pgno': 7, 'last_txnid': 7, 'max_readers': 126, 'num_readers': 1}
total 40
-rw-r--r-- 1 tina tina 10485760 Nov 11 04:17 test
-rw-r--r-- 1 tina tina     8192 Nov 11 04:17 test-lock

@jnwatson
Copy link
Owner

I'm not 100% convinced either way that mapsize not sticking is a problem. Ultimately, the size of the file is a function of the OS mmap function. If there's no write to the latter part it might be OK? At the same time, I do agree it looks fishy.

This whole area is problematic. See #96. Also see: https://github.com/jnwatson/py-lmdb/blob/master/lib/mdb.c#L4056. Do you happen to be calling set_mapsize while there's an open transaction in the same process?

A real bug would be if there's a problem when calling set_mapsize when another process has a transaction. It certainly isn't reasonable for users to have external inter-process locking to increase map size.

@jnwatson
Copy link
Owner

Back to your call trace, the error looks pretty basic. It is looking for the root of the btree and crashing, which probably means there's a disconnect between what's being mapped in and where the root is.

I'm definitely interested in getting to the root of this problem. I'm going to work tonight on seeing if I can exploit what you found about the map size decreasing to cause a crash between two processes. If I launch a second process after the map size increase, but before the decrease, open a write transaction in the second process, then let the first process environment close, maybe that will crash?

@NightTsarina
Copy link
Author

I'm not 100% convinced either way that mapsize not sticking is a problem. Ultimately, the size of the file is a function of the OS mmap function. If there's no write to the latter part it might be OK? At the same time, I do agree it looks fishy.

The size of the file is not an issue, but the fact that the mapsize reported is just the file size means that I end up with different processes having different ideas of the mapsize, if one process starts after the second has done a resize.

One scenario that I am not sure how is handled is: if this second process tries to write, and the mapsize is exactly the file size, it will trigger a MDB_MAP_FULL immediately and resize; but at the same time the first process was operating with a different map_size that might be smaller or bigger than the new mapsize of the second process, what happens there?

This whole area is problematic. See #96. Also see: https://github.com/jnwatson/py-lmdb/blob/master/lib/mdb.c#L4056. Do you happen to be calling set_mapsize while there's an open transaction in the same process?

No, I implemented a locking mechanism similar to the one suggested in #96. I keep a count of active transactions, when I detect a MDB_MAP_FULL, I use a barrier to wait for all transactions to finish, then close the database, open again and resize, and only then new transactions can start.

A real bug would be if there's a problem when calling set_mapsize when another process has a transaction. It certainly isn't reasonable for users to have external inter-process locking to increase map size.

I was thinking about that while trying to find a workaround for this.. It would be very difficult to do that properly. The bus error crashes do seem to be caused by something like that though, but I am not sure yet if it is because the other process has a transaction, or if it also happens while idling.

@NightTsarina
Copy link
Author

Back to your call trace, the error looks pretty basic. It is looking for the root of the btree and crashing, which probably means there's a disconnect between what's being mapped in and where the root is.

I'm definitely interested in getting to the root of this problem.

I really appreciate your help! This is driving me nuts, as I have a whole thing written on top of LMDB, and now I can't find a way to ensure it will not lose all the data. Please, let me know if I can help in any way.

I'm going to work tonight on seeing if I can exploit what you found about the map size decreasing to cause a crash between two processes. If I launch a second process after the map size increase, but before the decrease, open a write transaction in the second process, then let the first process environment close, maybe that will crash?

One thing to note is that in these latter tests, I did not see the mapsize decreasing, it looked more like it not being persisted and the file size being the only indication of it. OTOH, the logs from the big crash with data corruption DO indicate that somehow the mapsize was decreased below the last page used.

@NightTsarina
Copy link
Author

I have spent some more time on this today looking at the C code.. Two things struck me:

  1. If I understand it correctly, the mapsize should be getting persisted after a transaction by saving the mm_mapsize field of the meta struct .
  2. The fact that I am getting the file size as map size, makes me believe this code is being executed, which I can only explain if the wrong meta block is being read... but then I would have lost the last transaction!
  3. More worryingly, if the mapsize was set in that line, and the other process commits just in the instant before mdb_env_map is called, then this other line might explain the DB corruption?

@jnwatson
Copy link
Owner

Well, number 3 can only happen if you have write_map set.

I was unable to repro anything.

Looking at some of my code that does the same thing, there is a value that you can pass as a parameter to set_mapsize that is guaranteed to be safe: 0. You might try that.

I'm afraid all that is left is printf debugging. You can comment a line out here and you'll get a bunch of stuff printed out that might help figure out the order of operations.

@NightTsarina
Copy link
Author

Well, number 3 can only happen if you have write_map set.

Yes, I have it set.. Is that a bad idea? From the docs and considering how I am using it, it seemed safe.

More importantly, unless I am missing something, there is a race condition there where you could be truncating the file that has just been extended by another process. It is a bit difficult to follow for me, but from what I see mdb_env_set_mapsize does not get an exclusive lock before calling mdb_env_map.

In any case, it does not change the behaviour I see with the short test I wrote 2 days ago, but it does seem to prevent the bus errors.

I was unable to repro anything.

Not even the bus error? Sorry, I thought you had already. I will work on getting a repro test.

You were able to reproduce the mapsize shrinking after reopen, right?

Looking at some of my code that does the same thing, there is a value that you can pass as a parameter to set_mapsize that is guaranteed to be safe: 0. You might try that.

Oh, I was not expecting that, and I see that the mapsize gets corrected after doing it... Now, isn't this a bug?

I'm afraid all that is left is printf debugging. You can comment a line out here and you'll get a bunch of stuff printed out that might help figure out the order of operations.

Thanks for the pointer, I see now why defining constants in CFLAGS didn't work :)

BTW, I forgot to show what I am doing in case it helps.. This is the wrapper class I wrote: https://gist.github.com/NightTsarina/bd42736531a52c2843ee3e6d0d8b0ae2

@jnwatson
Copy link
Owner

Writemap should be safe, but turning it off does help characterize the problem.

I don't think the use of ftruncate was expected to make the file smaller. That's still the best bet of what's going wrong. I was able to repro the mapsize shrinking. I was unable to repro the bus error.

See http://www.lmdb.tech/doc/group__mdb.html#gaa2506ec8dab3d969b0e609cd82e619e5 for details on the upstream set_mapsize API. "This function may be called with a size of zero to adopt the new size." I need to document this in the Python API. The other details of that call might be important too.

I'll take a look at your code later.

@jnwatson
Copy link
Owner

One more question? You're not sharing the lmdb file across different docker containers or from inside outside a container, right? I don't believe file locking works correctly across a container boundary.

@NightTsarina
Copy link
Author

OK, I wiil try to get you a way to reproduce the bus error.

About the API, I had read it, but did not expect having to call it with zero just after opening the DB! In any case, I still need to wait for all transactions to finish if I want to use it after a MDB_MAP_RESIZED, so it does not change much what I need to do.

And no, no containers are in use at all.. I think they do not affect file locking unless you are using some bizarre fuse tricks, but it is not the case here.

@jnwatson
Copy link
Owner

IIRC, the problem with file locking was on the Mac between the host and the container (running different OS's). I look forward the the repro.

@NightTsarina
Copy link
Author

Sorry for the delay, I finally have a script to reproduce this problem very consistently. It took me way longer to manage errors, dead processes and whatnot than actually reproducing the problem :)

This is the script: https://gist.github.com/NightTsarina/e6bb8734bc15f333ccdcd6ae7d31893f

I have run this multiple times on my laptop running Debian unstable with linux 5.9.0-1-amd64; I have tried compiling the lib directly from your repo and -O0 -DMDB_DEBUG=1 and also directly with the Debian packaged version. I also tried both ext3 and tmpfs filesystems, with similar results. I get crashes in a matter of seconds. In fact, I am scared at how easily I get MDB to report a corrupted DB!

I made the script so it can take two parameters: whether to use writemap=True and whether to call set_mapsize(0) after opening the environment. Without the former, I have not yet reproduced the crashes or corruptions, but occasionally the library would report a map_size smaller than the used pages, and then immediately get an MDB_MAP_RESIZED when trying to write any data.

On the other hand, if I set writemap=True and don't call set_mapsize(0), I get bus errors and the ocassional MDB_MAP_CORRUPTED. When both parameters are true, I don't get bus errors, but plenty of corruption errors.

  • use_writemap=True, set_mapsize0=False
$ python3 test/crashtest.py 1 0
[INFO/Process-1] child process calling self.run()
[INFO/Process-1] Process started
[INFO/Process-1] LMDB stats: map_size=10485760, used_size=8192
[INFO/Process-2] child process calling self.run()
[INFO/Process-2] Process started
[INFO/Process-2] LMDB stats: map_size=10485760, used_size=8192
[INFO/Process-2] ** MDB_MAP_FULL
[INFO/Process-1] ** MDB_MAP_FULL
[INFO/Process-2] LMDB stats: map_size=10485760, used_size=10481664
[INFO/Process-2] Resizing to 20971520.
[INFO/Process-1] LMDB stats: map_size=10485760, used_size=10481664
[INFO/Process-1] Resizing to 20971520.
[INFO/Process-2] ** MDB_MAP_FULL
[INFO/Process-2] LMDB stats: map_size=20942848, used_size=20942848
[INFO/Process-2] Resizing to 41885696.
[INFO/Process-1] ** MDB_MAP_RESIZED
[INFO/Process-1] LMDB stats: map_size=21008384, used_size=21008384
[INFO/Process-1] ** MDB_MAP_RESIZED
[INFO/Process-1] LMDB stats: map_size=21086208, used_size=21086208
[ERROR/Process-1] Process raised an exception
Traceback (most recent call last):
  File "test/crashtest.py", line 69, in task
    txn.put(key, value)
lmdb.CorruptedError: mdb_put: MDB_CORRUPTED: Located page was wrong type
[INFO/Process-1] Process finished
[INFO/Process-1] process shutting down
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "test/crashtest.py", line 69, in task
    txn.put(key, value)
lmdb.CorruptedError: mdb_put: MDB_CORRUPTED: Located page was wrong type
[INFO/Process-1] process exiting with exitcode 1
[INFO/MainProcess] Process Process-1 died with status 1.
[INFO/MainProcess] process shutting down
  • use_writemap=True, set_mapsize0=True
$ python3 test/crashtest.py 1 1
[INFO/Process-1] child process calling self.run()
[INFO/Process-1] Process started
[INFO/Process-1] LMDB stats: map_size=10485760, used_size=8192
[INFO/Process-2] child process calling self.run()
[INFO/Process-2] Process started
[INFO/Process-2] LMDB stats: map_size=10485760, used_size=8192
[INFO/Process-2] ** MDB_MAP_FULL
[INFO/Process-2] LMDB stats: map_size=10485760, used_size=10457088
[INFO/Process-2] Resizing to 20971520.
[INFO/Process-1] ** MDB_MAP_RESIZED
[INFO/Process-1] LMDB stats: map_size=10530816, used_size=10530816
[INFO/MainProcess] Process Process-2 died with status -7.
[INFO/MainProcess] process shutting down
$ python3 test/crashtest.py 1 1
[INFO/Process-1] child process calling self.run()
[INFO/Process-1] Process started
[INFO/Process-1] LMDB stats: map_size=10485760, used_size=8192
[INFO/Process-2] child process calling self.run()
[INFO/Process-2] Process started
[INFO/Process-2] LMDB stats: map_size=10485760, used_size=8192
[INFO/Process-2] ** MDB_MAP_FULL
[INFO/Process-2] LMDB stats: map_size=10485760, used_size=10481664
[INFO/Process-1] ** MDB_MAP_FULL
[INFO/Process-2] Resizing to 20971520.
[INFO/Process-1] LMDB stats: map_size=10485760, used_size=10481664
[INFO/Process-1] Resizing to 20971520.
[INFO/Process-2] ** MDB_MAP_FULL
[INFO/Process-2] LMDB stats: map_size=20922368, used_size=20922368
[INFO/Process-2] Resizing to 41844736.
[INFO/Process-1] ** MDB_MAP_RESIZED
[INFO/Process-1] LMDB stats: map_size=21061632, used_size=21061632
[INFO/Process-2] ** MDB_MAP_FULL
[INFO/Process-2] LMDB stats: map_size=41816064, used_size=41816064
[INFO/Process-2] Resizing to 83632128.
[ERROR/Process-1] Process raised an exception
Traceback (most recent call last):
  File "test/crashtest.py", line 69, in task
    txn.put(key, value)
lmdb.CorruptedError: mdb_put: MDB_CORRUPTED: Located page was wrong type
[INFO/Process-1] Process finished
[ERROR/Process-2] Process raised an exception
Traceback (most recent call last):
  File "test/crashtest.py", line 69, in task
    txn.put(key, value)
lmdb.MapFullError: mdb_put: MDB_MAP_FULL: Environment mapsize limit reached

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test/crashtest.py", line 76, in task
    dbenv = dbopen(dbpath, use_writemap, set_mapsize0, resize=True)
  File "test/crashtest.py", line 27, in dbopen
    txn.put(b'_resized', b'1')
lmdb.CorruptedError: mdb_put: MDB_CORRUPTED: Located page was wrong type
[INFO/Process-1] process shutting down
[INFO/Process-2] Process finished
Process Process-1:
[INFO/Process-2] process shutting down
Process Process-2:
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "test/crashtest.py", line 69, in task
    txn.put(key, value)
lmdb.CorruptedError: mdb_put: MDB_CORRUPTED: Located page was wrong type
Traceback (most recent call last):
  File "test/crashtest.py", line 69, in task
    txn.put(key, value)
lmdb.MapFullError: mdb_put: MDB_MAP_FULL: Environment mapsize limit reached

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "test/crashtest.py", line 76, in task
    dbenv = dbopen(dbpath, use_writemap, set_mapsize0, resize=True)
  File "test/crashtest.py", line 27, in dbopen
    txn.put(b'_resized', b'1')
lmdb.CorruptedError: mdb_put: MDB_CORRUPTED: Located page was wrong type
[INFO/Process-2] process exiting with exitcode 1
[INFO/Process-1] process exiting with exitcode 1
[INFO/MainProcess] Process Process-1 died with status 1.
[INFO/MainProcess] process shutting down

When running under tmpfs, something different happens: I don't seem to be able to corrupt the DB (but still plenty of bus errors), and many times I get MDB_MAP_FULL errors while trying to open the environment, which seems like another race condition:

[INFO/Process-1] child process calling self.run()
[INFO/Process-1] Process started
[INFO/Process-1] LMDB stats: map_size=10485760, used_size=8192
[INFO/Process-2] child process calling self.run()
[INFO/Process-2] Process started
[INFO/Process-2] LMDB stats: map_size=10485760, used_size=122880
[INFO/Process-2] ** MDB_MAP_FULL
[INFO/Process-1] ** MDB_MAP_FULL
[INFO/Process-2] LMDB stats: map_size=10485760, used_size=10481664
[INFO/Process-2] Resizing to 20971520.
[INFO/Process-1] LMDB stats: map_size=10485760, used_size=10481664
[INFO/Process-1] Resizing to 20971520.
[INFO/Process-1] ** MDB_MAP_FULL
[INFO/Process-2] ** MDB_MAP_FULL
[INFO/Process-2] LMDB stats: map_size=20967424, used_size=20967424
[INFO/Process-2] Resizing to 41934848.
[INFO/Process-1] LMDB stats: map_size=20967424, used_size=20967424
[INFO/Process-1] Resizing to 41934848.
[INFO/Process-2] ** MDB_MAP_FULL
[INFO/Process-1] ** MDB_MAP_FULL
[INFO/Process-1] LMDB stats: map_size=41922560, used_size=41922560
[INFO/Process-1] Resizing to 83845120.
[ERROR/Process-2] Process raised an exception
Traceback (most recent call last):
  File "test/crashtest.py", line 73, in task
    txn.put(key, value)
lmdb.MapFullError: mdb_put: MDB_MAP_FULL: Environment mapsize limit reached

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test/crashtest.py", line 80, in task
    dbenv = dbopen(dbpath, use_writemap, set_mapsize0, resize=True)
  File "test/crashtest.py", line 18, in dbopen
    dbenv = lmdb.Environment(
lmdb.MapResizedError: mdb_txn_begin: MDB_MAP_RESIZED: Database contents grew beyond environment mapsize
[INFO/Process-2] Process finished
[INFO/Process-2] process shutting down
Process Process-2:
Traceback (most recent call last):
  File "test/crashtest.py", line 73, in task
    txn.put(key, value)
lmdb.MapFullError: mdb_put: MDB_MAP_FULL: Environment mapsize limit reached

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "test/crashtest.py", line 80, in task
    dbenv = dbopen(dbpath, use_writemap, set_mapsize0, resize=True)
  File "test/crashtest.py", line 18, in dbopen
    dbenv = lmdb.Environment(
lmdb.MapResizedError: mdb_txn_begin: MDB_MAP_RESIZED: Database contents grew beyond environment mapsize
[INFO/Process-2] process exiting with exitcode 1
[INFO/MainProcess] Process Process-1 died with status -7.
[INFO/MainProcess] process shutting down

@jnwatson
Copy link
Owner

This gives me a lot to work with. On the bright size, running it with 0 1 seems to be reliable. I'll investigate everything else.

@jnwatson
Copy link
Owner

With lots of debug prints, I can confirm the fruncate issue you identified. One process is nuking the last part of the database file out from under the other. I'm still on it.

@jnwatson
Copy link
Owner

Thank you for your patience. It is clear to me now that you've known the problem for a while.

@NightTsarina
Copy link
Author

No, thank you for taking the time to investigate this 🥰

@jnwatson
Copy link
Owner

jnwatson commented Nov 17, 2020

New minimal repro:

import lmdb
import subprocess
import multiprocessing

dbpath = sys.argv[1] + '.lmdb'

subprocess.run(["rm", "-r", dbpath])

MB = 1024 * 1024

e = lmdb.Environment(dbpath, writemap=True, create=True, map_size=4 * MB)

print("*0", e.info())

def newproc():
    e = lmdb.Environment(dbpath, writemap=True, create=True, map_size=2 * MB)
    print("*1", e.info())
    with e.begin(write=True) as txn:
        txn.put(b"xxx", b"bar")
    e.close()

p = multiprocessing.Process(target=newproc)
p.start()
p.join()

print("*2", e.info())
with e.begin(write=True) as txn:
    for i in range(70000):
        txn.put(b"foo" + i.to_bytes(4, 'big'), b"bar")

e.close()

@jnwatson
Copy link
Owner

OK. I have at least one problem characterized. Ultimately, the meta page (with the map size) is being saved and read correctly. The problem is that the in-memory value ignores it if the new value is not before the end of the current data (see https://github.com/jnwatson/py-lmdb/blob/master/lib/mdb.c#L4416). It then fruncates the file to the smaller value.

The problem is that the first process then uses its cached in-memory (in env) version, so it will happily try to write past the end of the file.

This can happen any time you open an environment or call set_mapsize on an environment with write_map=True. The problem might just be with the ftruncate. If that never reduces the file size, (or we're the only reader) it should be safe.

This doesn't explain why it still fails when map_size=0, but I have a feeling the problem is in the same area. There is an exclusive lock taken when an environment is open, so I don't see where a race might occur. It is definitely possible for two processes to have a different idea of what the map size is.

As long as the smaller one doesn't end up writing the meta page with the smaller value, we don't have a loss of data. I will continue to explore this direction. The question is whether it is possible for the smaller-value process to successfully complete a transaction (without getting a RESIZED or FULL error). If that happens, and the larger-value process subsequently quits without writing, the DB might be truncated at next start.

This is clearly an upstream problem. I'll need to put together a repro in C.

@NightTsarina
Copy link
Author

Hi,

OK. I have at least one problem characterized. Ultimately, the meta page (with the map size) is being saved and read correctly. The problem is that the in-memory value ignores it if the new value is not before the end of the current data (see https://github.com/jnwatson/py-lmdb/blob/master/lib/mdb.c#L4416). It then fruncates the file to the smaller value.

Ah, i see, that makes sense. It should probably be checking if the on-disk mapsize is bigger than the configured one.

The problem is that the first process then uses its cached in-memory (in env) version, so it will happily try to write past the end of the file.

This can happen any time you open an environment or call set_mapsize on an environment with write_map=True. The problem might just be with the ftruncate. If that never reduces the file size, (or we're the only reader) it should be safe.

Wouldn't it also be a problem when the second process writes a transaction and persists the smaller map_size?

This doesn't explain why it still fails when map_size=0, but I have a feeling the problem is in the same area. There is an exclusive lock taken when an environment is open, so I don't see where a race might occur. It is definitely possible for two processes to have a different idea of what the map size is.

The issue with map_size=0 seems to be that the persisted map_size is only read if you call set_mapsize(0), but should always be read during mdb_env_open2 according to https://github.com/jnwatson/py-lmdb/blob/master/lib/mdb.c#L4405. Maybe the problem is that there is a default value for map_size somewhere making this code path never be executed, and so the initial map_size is always the last page or the default value (whatever is bigger)?

As long as the smaller one doesn't end up writing the meta page with the smaller value, we don't have a loss of data. I will continue to explore this direction. The question is whether it is possible for the smaller-value process to successfully complete a transaction (without getting a RESIZED or FULL error). If that happens, and the larger-value process subsequently quits without writing, the DB might be truncated at next start.

This is clearly an upstream problem. I'll need to put together a repro in C.

Yeah, it seems so. Please let me know if I can help somehow!

@jnwatson
Copy link
Owner

Wouldn't it also be a problem when the second process writes a transaction and persists the smaller map_size?

Yes. I need to confirm whether this is possible.

The issue with map_size=0 seems to be that the persisted map_size is only read if you call set_mapsize(0), but should always be read during mdb_env_open2 according to https://github.com/jnwatson/py-lmdb/blob/master/lib/mdb.c#L4405. Maybe the problem is that there is a default value for map_size somewhere making this code path never be executed, and so the initial map_size is always the last page or the default value (whatever is bigger)?

The meta page is always read, but then chooses to ignore it, preferring to minimize the file size to what's actually used. I need to confirm that it is a failure to mmap, then ftruncate, then write to the old end of the file (the behavior is actually undefined in POSIX).

There is a default value (https://github.com/jnwatson/py-lmdb/blob/master/lib/mdb.c#L615), but it is only applied when a database is first created.

@jnwatson
Copy link
Owner

I spoke too soon. You are absolutely on the right path regarding the default mapsize. This is actually a bug in something I control. py-lmdb provides a default value (that's the same value as the default value for lmdb when creating a database). This might be completely fixed if I change that default value to 0.

@jnwatson
Copy link
Owner

jnwatson commented Nov 18, 2020

And that fixes it completely. All four combinations of your repro pass once I changed the default in py-lmdb.

What we have here is two bugs, one in py-lmdb and one in lmdb. The first bug is that a non-zero default value of map_size on Environment is inappropriate. Passing 0 is generally the right answer most of the time.

That bug triggers a second bug in the underlying lmdb where opening a database with write_map=True and explicitly specifying a map_size too small will ftruncate the file out from underneath another open process.

@jnwatson
Copy link
Owner

Upstream bug report: https://bugs.openldap.org/show_bug.cgi?id=9397

@NightTsarina
Copy link
Author

Hi Nic,
Sorry for the radio silence, I had been super busy with other projects.. I saw the upstream bug report, but they don't seem to be interested in fixing this issue at all, isn't it? Between this and the issues I am having with the single writer model, I am starting to regret choosing LMDB as a backend, tbh... 😞

OTOH, I had completely missed the message when you say removing the default value avoids this bug completely! If I understand your code correctly, it would be enough to pass map_size=0 when opening the environment to avoid that completely? (I rather keep using the Debian package if there is no other changes in the code). Right now this project is running without write_map=True, and so its performance is significantly worse than before, I am eager to re-enable it

@jnwatson
Copy link
Owner

Making map_size=0 significantly reduces the window where a problem could happen, but it doesn't eliminate it. The patch I'll submit upstream will narrow that further, but can't eliminate it without explicit locking.

My immediate plans are to implement the external locking in py-lmdb since it doesn't seem like upstream is amenable.

@skupr-anaconda
Copy link

skupr-anaconda commented Apr 26, 2021

The same error is when py-lmdb 1.2.1 builds on linux_ppc64le platform (py3.6-py3.9):

## Package Plan ##

  environment location: /root/worker/containers/00000a7rlau/tmp/build/80754af9/python-lmdb_1619391259780/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_pl


The following NEW packages will be INSTALLED:

    _libgcc_mutex:         0.1-main                                                                                            
    attrs:                 20.3.0-pyhd3eb1b0_0                                                                                 
    ca-certificates:       2021.4.13-h6ffa863_1                                                                                
    iniconfig:             1.1.1-pyhd3eb1b0_0                                                                                  
    ld_impl_linux-ppc64le: 2.33.1-h0f24833_7                                                                                   
    libffi:                3.3-he6710b0_2                                                                                      
    libgcc-ng:             8.2.0-h822a55f_1                                                                                    
    libstdcxx-ng:          8.2.0-h822a55f_1                                                                                    
    more-itertools:        8.7.0-pyhd3eb1b0_0                                                                                  
    ncurses:               6.2-he6710b0_1                                                                                      
    openssl:               1.1.1k-h140841e_0                                                                                   
    packaging:             20.9-pyhd3eb1b0_0                                                                                   
    pluggy:                0.13.1-py38h6ffa863_0                                                                               
    py:                    1.10.0-pyhd3eb1b0_0                                                                                 
    pyparsing:             2.4.7-pyhd3eb1b0_0                                                                                  
    pytest:                6.2.3-py38h6ffa863_2                                                                                
    python:                3.8.8-h836d2c2_5                                                                                    
    python-lmdb:           1.2.1-py38h29c3540_1  file:///root/worker/containers/00000a7rlau/tmp/build/80754af9/output-artifacts
    readline:              8.1-h140841e_0                                                                                      
    six:                   1.15.0-py38h6ffa863_0                                                                               
    sqlite:                3.35.4-hd7247d8_0                                                                                   
    tk:                    8.6.10-hbc83047_0                                                                                   
    toml:                  0.10.2-pyhd3eb1b0_0                                                                                 
    xz:                    5.2.5-h7b6447c_0                                                                                    
    zlib:                  1.2.11-h7b6447c_3                                                                                   

Preparing transaction: ...working... done
Verifying transaction: ...working... done
Executing transaction: ...working... done
export PREFIX=/root/worker/containers/00000a7rlau/tmp/build/80754af9/python-lmdb_1619391259780/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_pl
export SRC_DIR=/root/worker/containers/00000a7rlau/tmp/build/80754af9/python-lmdb_1619391259780/test_tmp
import: 'lmdb'
import: 'lmdb'
+ py.test tests
============================= test session starts ==============================
platform linux -- Python 3.8.8, pytest-6.2.3, py-1.10.0, pluggy-0.13.1
rootdir: $SRC_DIR
collected 190 items

tests/crash_test.py ..................F                                  [ 10%]
tests/cursor_test.py .............................                       [ 25%]
tests/env_test.py ...................................................... [ 53%]
..............                                                           [ 61%]
tests/getmulti_test.py ....                                              [ 63%]
tests/iteration_test.py ..................                               [ 72%]
tests/package_test.py .                                                  [ 73%]
tests/tool_test.py ..                                                    [ 74%]
tests/txn_test.py .................................................      [100%]

=================================== FAILURES ===================================
_______________________ MapResizeTest.test_opendb_resize _______________________

self = <crash_test.MapResizeTest testMethod=test_opendb_resize>

    def test_opendb_resize(self):
        '''
        Test that we correctly handle a MDB_MAP_RESIZED in env.open_db.
    
        Would seg fault in cffi implementation
        '''
        mpctx = multiprocessing.get_context('spawn')
        path, env = testlib.temp_env(max_dbs=10, map_size=32000)
        env.close()
        env = lmdb.open(path, max_dbs=10, map_size=32000, readonly=True)
        proc = mpctx.Process(target=self.do_resize, args=(path,))
        proc.start()
        proc.join(5)
        assert proc.exitcode is not None
>       self.assertRaises(lmdb.MapResizedError, env.open_db, b'foo')
E       lmdb.ReadonlyError: mdb_dbi_open: Permission denied

tests/crash_test.py:316: ReadonlyError
----------------------------- Captured stderr call -----------------------------
Process SpawnProcess-1:
Traceback (most recent call last):
  File "$PREFIX/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "$PREFIX/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "$SRC_DIR/tests/crash_test.py", line 296, in do_resize
    env.open_db(b'foo')
lmdb.MapFullError: mdb_dbi_open: MDB_MAP_FULL: Environment mapsize limit reached
=========================== short test summary info ============================
FAILED tests/crash_test.py::MapResizeTest::test_opendb_resize - lmdb.Readonly...
======================== 1 failed, 189 passed in 9.78s =========================
Tests failed for python-lmdb-1.2.1-py38h29c3540_1.tar.bz2 - moving package to /root/worker/containers/00000a7rlau/tmp/build/80754af9/broken
WARNING:conda_build.build:Tests failed for python-lmdb-1.2.1-py38h29c3540_1.tar.bz2 - moving package to /root/worker/containers/00000a7rlau/tmp/build/80754af9/broken
WARNING conda_build.build:tests_failed(2955): Tests failed for python-lmdb-1.2.1-py38h29c3540_1.tar.bz2 - moving package to /root/worker/containers/00000a7rlau/tmp/build/80754af9/broken
TESTS FAILED: python-lmdb-1.2.1-py38h29c3540_1.tar.bz2

@jnwatson
Copy link
Owner

This is not actually the same issue. Please open a new issue.

@venu
Copy link

venu commented Jul 8, 2021

Hi

Apologies in advance for making use of this thread! I think its same one but different configuration so I am not sure..

Here is the script which can reproduce the Bus errors quite easily (lmdb==1.2.1).
https://gist.github.com/venu/bfaf4f2f4d604171206211a3d6cc4d42

Summary:
I am following single writer model (N readers). With initial small map size & autoresize it when MDB_MAP_FULL and handle MDB_MAP_RESIZED case in readers. Writer is crashing with bus errors.

config = {
        "meminit": False,
        "readahead": False,
        "map_size": 2 ** 10,
        "metasync": False,
        "sync": False,
        "map_async": True,
        "writemap": True,
    }

Crash (ubuntu 18.04/python 3.7.5/lmdb 1.2.1)

(gdb) backtrace
#0  mdb_page_alloc (num=num@entry=3, mp=mp@entry=0x7fffffffce00, mc=<optimized out>) at build/lib/mdb.c:2310
#1  0x00007ffff5e04994 in mdb_page_new (mc=mc@entry=0x7fffffffcfe0, flags=flags@entry=4, num=3, mp=mp@entry=0x7fffffffce60)
  at build/lib/mdb.c:7193
#2  0x00007ffff5e04d99 in mdb_node_add (mc=mc@entry=0x7fffffffcfe0, indx=<optimized out>, key=key@entry=0x7fffffffd3a0, data=0x7fffffffd3b0, 
  pgno=pgno@entry=0, flags=0) at build/lib/mdb.c:7335
#3  0x00007ffff5e082a9 in mdb_cursor_put (mc=0x7fffffffcfe0, key=0x7fffffffd3a0, data=0x7fffffffd3b0, flags=<optimized out>)
  at build/lib/mdb.c:6960
#4  0x00007ffff5e0a85b in mdb_put (txn=0xbefda0, dbi=1, key=key@entry=0x7fffffffd3a0, data=data@entry=0x7fffffffd3b0, flags=flags@entry=0)
  at build/lib/mdb.c:9045
#5  0x00007ffff5dfdee0 in trans_put (self=0x7ffff5dbbd50, args=<optimized out>, kwds=<optimized out>) at lmdb/cpython.c:3554
#6  0x00000000004d75b4 in _PyMethodDescr_FastCallKeywords ()
#7  0x0000000000552eb8 in _PyEval_EvalFrameDefault ()
#8  0x00000000005d977c in _PyFunction_FastCallKeywords ()
#9  0x000000000054efdc in _PyEval_EvalFrameDefault ()
#10 0x00000000005d5e47 in ?? ()
#11 0x000000000054ee40 in _PyEval_EvalFrameDefault ()
#12 0x000000000054c522 in _PyEval_EvalCodeWithName ()
#13 0x000000000054e933 in PyEval_EvalCode ()
#14 0x00000000006305a2 in ?? ()
#15 0x0000000000630657 in PyRun_FileExFlags ()
#16 0x00000000006312cf in PyRun_SimpleFileExFlags ()
#17 0x0000000000654232 in ?? ()
#18 0x000000000065458e in _Py_UnixMain ()
#19 0x00007ffff7a03bf7 in __libc_start_main (main=0x4bbbf0 <main>, argc=3, argv=0x7fffffffdcb8, init=<optimized out>, fini=<optimized out>, 
  rtld_fini=<optimized out>, stack_end=0x7fffffffdca8) at ../csu/libc-start.c:310
#20 0x00000000005e0cca in _start ()

With write_map=False things looks fine so far (have to test throughly)!

I am looking forward to hear your thoughts on this.. Thank you in advance!

@jnwatson
Copy link
Owner

jnwatson commented Jul 8, 2021

See my comment here: #301 (comment)

Essentially, multiprocessing is being too smart and recycling processes that still have open environment handles.

You're relying on the garbage collector to close the DB. Try an explicit del store at the end of your writer and reader methods.

@venu
Copy link

venu commented Jul 9, 2021

Thank you Nic for your quick reply and apolgoies for the delay (Different timezones)

I have looked into your suggestion and I have tried following things

  • Closing connection after writing/reading is done
  • Also calling sync(True) before close
  • Tried subprocess instead of multiprocessing

I can confirm the issue still persists.. And this is happening when new process open environment while writer doing resizing.. For now, we have decided to go with big mapsize (1TB) to avoid frequent map resizing.
[EDIT]
Here is the latest code
https://gist.github.com/venu/f4ebf1e457c8355e329c8ad6b69a3397

We have also few questions about below configuration. Sorry for these (if there is any channel for communication with you, please let me know, I will approach via that)

Plan: We want to use async writing to make requests faster and also avoid writing for every request and reduce IOPS.

config = {
        "meminit": False,
        "readahead": False,
        "map_size": 2 ** 10,
        "metasync": False,
        "sync": False,
        "map_async": True,
        "writemap": True,
    }

I read the document and I have following questions
https://lmdb.readthedocs.io/en/release/#environment-class

Note that sync=False, writemap=True leaves the system with no hint for when to write transactions to disk, unless sync() is called. map_async=True, writemap=True may be preferable.

With above configuration,

  1. Do we have to call sync() method (in writemap=True mode)? if so, do I have to pass force=True to sync()? (I am not seeing no bif diff with and without call in writemap=True mode in my quick tests)
  2. Is there any issue we call sync() only during writer server shutdown (I undestand we will loose data or db corrupt if OS crash)?
  3. I am pressuming once we write to memory map, we leave the job of flushing data to disk to OS and we have nothing to do with sync, but wanted to confrm whether my understanding is correct.
  4. Does sync=False has any effect in this configuration with writemap=True?

Thank you in advance!

@jnwatson
Copy link
Owner

sync=False is not compatible with multiple processes accessing the data simultaneously.

@venu
Copy link

venu commented Jul 15, 2021

Okay, Thank you!

@katadonutskuri
Copy link

Is this the same/related to the bug filed here? https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=244493

Seems like there was a recent fix for it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants