Process crash when iteratively writing 'batches' of values via cursor in transaction #155
Comments
Interesting. From a quick glance this code looks correct. I'll take a deeper look at this today or tomorrow. What kind of exception is being thrown? |
No dotnet exceptions thrown, just crashes the process. When running as an xunit test it shows the following: "The active test run was aborted. Reason: Test host process crashed." |
Lemme try to repro. Btw, what OS is this happening on? |
Confirmed a repro. My test console application dies after 2039 iterations of calling append() with no observable exception. |
Wow, that's exactly where mine was happening. I apologize for not including the OS details and version: Windows 11, .NET 6, LightningDB 0.14.1. Thanks for spinning up a repro sample. I guess this is a good test case to include in the test suite. I guess I'm not crazy or doing something wrong ;) I wonder if this happens on other OSes. What I'm working on may end up running in a Linux VM at some point, but it's way too early in my prototype for that. |
No, it's definitely not just you. There is a crash happening deep in the call stack, internal to LMDB. In fact it is faulting in ntdll, probably due to some type of resource exhaustion. I can see the following error in Windows Event Viewer
But what is very weird is that the program crashes, yet I can't observe the exception in VS2022. And when I use WinDBG, the program does not crash and completes 5000 iterations successfully. I am still trying to determine what exactly the issue is. Historically, LMDB is extremely reliable as long as it is used correctly, so I would bet this is a problem with an invalid config. But I'm still trying to figure out what exactly... |
Observations
|
My first attempt at using the LightningDB.net I had it writing one item at a time and didn't get good throughput. So I looked in the unit tests to see if there was a bulk way to get data in and saw the cursor approach so I tried that and got better throughput in my unit tests and thought I was good to go. Then when I wired this into my prototype project it went to hell once I ramped up the number of items in my simulation. I have an Akka actor sending the chunks of items to insert over to the writer actor. Poof, went up in smoke. How does the transaction work if you don't commit? Is there a better way to batch stuff without using cursors? |
@4deeptech Yep, you only need to use cursors for some of the fancier features like assigning multiple values to a single key (which can be pretty useful for advanced scenarios). To perform a batch insert simply create a transaction and then call On the up side, avoiding the use of cursors fixes this issue and I can make unbounded writes as long as the map size is adequate. I am still trying to pin down the current issue; it is related to disposal of the database handle. I'm wondering if there is a leak somewhere. |
If you do not commit, the data written during the transaction is discarded and released when the transaction is disposed. |
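For anyone landing here later, a minimal sketch of the cursor-free batch write described in the two comments above. It assumes the transaction-level `Put(db, key, value)` from the LightningDB package (0.14.x line) and that it returns an `MDBResultCode` the same way `Commit()` does in the repro code; the `WriteBatch` helper, the temp path, and the item names are made up for illustration and are not from the thread.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using LightningDB;

// Assumption: a throwaway directory under the system temp path.
var path = Path.Combine(Path.GetTempPath(), "lightningdb-batch-sketch");
if (Directory.Exists(path))
    Directory.Delete(path, true);

using var environment = new LightningEnvironment(path, new EnvironmentConfiguration
{
    MapSize = 100_000_000,
    MaxDatabases = 2
});
environment.Open();

// Open the database handle once and reuse it for every batch.
using var db = OpenDatabase(environment, "testDb");

// Same shape as the original report: many small batches, one transaction each.
for (int batch = 0; batch < 3000; batch++)
{
    var items = Enumerable.Range(0, 15).Select(n => $"batch-{batch}-item-{n}").ToList();
    WriteBatch(environment, db, items);
}
Console.WriteLine("All batches committed");

// Hypothetical helper: writes one batch through the transaction, no cursor involved.
static void WriteBatch(LightningEnvironment environment, LightningDatabase db, IEnumerable<string> items)
{
    using var tx = environment.BeginTransaction();
    foreach (var item in items)
    {
        var bytes = Encoding.UTF8.GetBytes(item);
        // Throwing here leaves the scope without Commit(), so the whole batch is discarded.
        if (tx.Put(db, bytes, bytes) != MDBResultCode.Success)
            throw new Exception($"Put failed for {item}");
    }
    if (tx.Commit() != MDBResultCode.Success)
        throw new Exception("Batch commit failed");
}

static LightningDatabase OpenDatabase(LightningEnvironment environment, string dbName)
{
    using var tx = environment.BeginTransaction();
    var db = tx.OpenDatabase(dbName, new DatabaseConfiguration { Flags = DatabaseOpenFlags.Create });
    tx.Commit(); // ensure the database is created
    return db;
}
```

Either the whole batch lands at `Commit()` or none of it does, which is the behaviour described for uncommitted transactions.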
After some fiddling, I managed to get WinDBG to hook the process mid crash. It looks like a genuine native LMDB bug involving a corrupt memory heap. I'll work on minimizing a repro and reporting this upstream.
@4deeptech Another side note: you don't actually need to re-open the database handle each iteration. You can safely open a handle once per application and then reuse it for all transactions. This has a positive impact on performance. Each time you open a DB handle in LMDB, it performs a linear scan of all the databases in the environment. If your environment has lots of databases (dozens or more...) the time taken to open a DB starts to add up. There are some other optimization opportunities I see in your example code pertaining to string operations and memory allocations, but I don't know how much of a performance win they would be compared to the IO operation. Let me know if you'd like some example code, I'd be happy to help. |
@4deeptech
using System.Text;
using LightningDB;

//this is a single file formatted C# program
//execution begins here
Console.WriteLine("Hello, World!");
using var environment = CreateFreshEnvironment("C:/dev/temp");
environment.Open();
using var db = OpenDatabase(environment, "testDb");

//this loop fails after 2039 iterations
for (int i = 0; i < 5000; i++) {
    ReproCoreIteration(environment, db);
    Console.WriteLine($"Completed iteration {i}");
}

static void ReproCoreIteration(LightningEnvironment environment, LightningDatabase db)
{
    using var tx = environment.BeginTransaction(); //auto-disposed at end of scope
    using var cursor = tx.CreateCursor(db); //auto-disposed at end of scope
    var guid = Guid.NewGuid().ToString();
    var guidBytes = Encoding.UTF8.GetBytes(guid);
    _ = cursor.Put(
        guidBytes,
        guidBytes,
        CursorPutOptions.None
    );
    var trxResult = tx.Commit();
    if (trxResult != MDBResultCode.Success)
        throw new Exception("LightningDB append commit exception");
    //program crashes here during transaction dispose after exactly 2039 iterations
    //Note : Commit() call is successful, failure happens after commit when transaction is disposed
}

static LightningDatabase OpenDatabase(LightningEnvironment environment, string dbName)
{
    using var tx = environment.BeginTransaction();
    var db = tx.OpenDatabase(dbName, new DatabaseConfiguration { Flags = DatabaseOpenFlags.Create });
    tx.Commit(); //ensure database is created
    return db;
}

static LightningEnvironment CreateFreshEnvironment(string path)
{
    if (Directory.Exists(path))
    {
        Directory.Delete(path, true);
        Console.WriteLine("Cleaned up previous directory");
    }
    return new LightningEnvironment(path, new EnvironmentConfiguration {
        MapSize = 100_000_000,
        MaxDatabases = 2
    });
}

Project
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net7.0</TargetFramework> <!-- other runtimes should also repro -->
    <RootNamespace>lightningnet_155_minimal</RootNamespace>
    <ImplicitUsings>enable</ImplicitUsings>
    <Nullable>enable</Nullable>
  </PropertyGroup>
  <ItemGroup>
    <PackageReference Include="lightningdb" Version="0.14.1" />
  </ItemGroup>
</Project> |
... D:\bootstrap_projects\LightningDBErrorRepro\LightningDBErrorRepro\bin\Debug\net6.0\LightningDBErrorRepro.exe (process 3652) exited with code -1073740940. |
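A side note on that exit code: reinterpreted as an unsigned 32-bit NTSTATUS value it is 0xC0000374, which is `STATUS_HEAP_CORRUPTION`, consistent with the corrupt-heap diagnosis from WinDBG above. A small sketch of the conversion (the variable names are just for illustration):

```csharp
using System;

// Exit code reported when the repro process died.
const int exitCode = -1073740940;

// NTSTATUS values are unsigned 32-bit, so reinterpret the signed exit code.
uint status = unchecked((uint)exitCode);
Console.WriteLine($"0x{status:X8}"); // prints 0xC0000374

// STATUS_HEAP_CORRUPTION as defined in the Windows SDK (ntstatus.h).
const uint StatusHeapCorruption = 0xC0000374;
Console.WriteLine(status == StatusHeapCorruption
    ? "STATUS_HEAP_CORRUPTION"
    : "some other NTSTATUS");
```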
I'm going to try updating the version of LMDB we depend on, but I am not hopeful about there being a fix already made since I have tried searching their bug tracker and haven't seen one that matches this description. So this might take a while to resolve since it is an upstream fix. |
@AlgorithmsAreCool Sounds good. Thanks for verifying the issue and suggesting a workable alternative! |
Hi! I have done some research on this issue and I think I know the root of the problem. First of all, I set up Windows Error Reporting (the documentation says this doesn't work for .NET applications, but that's not true) to get a crash dump. This looks like a cursor being freed twice. Then I altered
private long freed;
private void Dispose(bool disposing)
{
    if (_handle == IntPtr.Zero)
        return;
    if (!disposing)
        throw new InvalidOperationException("The LightningCursor was not disposed and cannot be reliably dealt with from the finalizer");
    Interlocked.Increment(ref freed);
    mdb_cursor_close(_handle);
    _handle = IntPtr.Zero;
    Transaction.Disposing -= Dispose;
    GC.SuppressFinalize(this);
}
and collected a dump again. The dump showed that the fatal call
Another interesting thing is that if I dispose the cursor before committing the transaction, the application does not crash.
using System.Text;
using LightningDB;
Console.WriteLine("Hello, World!");
using var environment = CreateFreshEnvironment("C:/dev/temp");
environment.Open();
using var db = OpenDatabase(environment, "testDb");

//this loop fails after 2039 iterations
for (int i = 0; i < 5000; i++) {
    ReproCoreIteration(environment, db);
    Console.WriteLine($"Completed iteration {i}");
}

static void ReproCoreIteration(LightningEnvironment environment, LightningDatabase db)
{
    using var tx = environment.BeginTransaction(); //auto-disposed at end of scope
    using (var cursor = tx.CreateCursor(db)) // <-------------------------------------------------------
    { // <----------------------------------------------------------------------------------------------
        var guid = Guid.NewGuid().ToString();
        var guidBytes = Encoding.UTF8.GetBytes(guid);
        _ = cursor.Put(
            guidBytes,
            guidBytes,
            CursorPutOptions.None
        );
    } // <----------------------------------------------------------------------------------------------
    var trxResult = tx.Commit();
    if (trxResult != MDBResultCode.Success)
        throw new Exception("LightningDB append commit exception");
    //program crashes here during transaction dispose after exactly 2039 iterations
    //Note : Commit() call is successful, failure happens after commit when transaction is disposed
}

static LightningDatabase OpenDatabase(LightningEnvironment environment, string dbName)
{
    using var tx = environment.BeginTransaction();
    var db = tx.OpenDatabase(dbName, new DatabaseConfiguration { Flags = DatabaseOpenFlags.Create });
    tx.Commit(); //ensure database is created
    return db;
}

static LightningEnvironment CreateFreshEnvironment(string path)
{
    if (Directory.Exists(path))
    {
        Directory.Delete(path, true);
        Console.WriteLine("Cleaned up previous directory");
    }
    return new LightningEnvironment(path, new EnvironmentConfiguration {
        MapSize = 100_000_000,
        MaxDatabases = 2
    });
}

Finally, I found that the lmdb documentation on
Therefore, when we use
I do not know where the magic number 2039 (suspiciously close to 2048 ;) ) comes from. Perhaps it is down to the internal specifics of the Windows allocator. Could you guys try disposing the cursor before committing and check whether the problem still reproduces? |
To confirm your hypothesis: 1- I disposed the cursor before committing the transaction. It worked for the 5000 iterations.
private void Dispose(bool disposing)
{
    if (_handle == IntPtr.Zero)
        return;
    if (!disposing)
        throw new InvalidOperationException("The LightningCursor was not disposed and cannot be reliably dealt with from the finalizer");
    // If the transaction is committed the cursor is already closed
    if (Transaction.State != LightningTransactionState.Commited)
    {
        mdb_cursor_close(_handle);
    }
    _handle = IntPtr.Zero;
    Transaction.Disposing -= Dispose;
    GC.SuppressFinalize(this);
}
So I agree with your hypothesis, and this fix should solve the problem. Looking at the C code, committing the transaction already closes the cursor, so the C# wrapper should not do that again. |
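To make the failure mode concrete without touching native code, here is a small self-contained simulation. It is only a sketch: `FakeWriteTransaction` and `FakeCursor` are hypothetical stand-ins, not LightningDB or LMDB types, and the "native layer" throws on a second close where a real allocator would corrupt its heap. It assumes, as the comments above conclude, that committing a write transaction closes its open cursors.

```csharp
using System;
using System.Collections.Generic;

// Unguarded wrapper: commit frees the cursor, then Dispose frees it again.
Console.WriteLine(Run(guarded: false)); // prints "double close detected"
// Guarded wrapper: Dispose skips the close once the transaction has committed.
Console.WriteLine(Run(guarded: true));  // prints "ok"

static string Run(bool guarded)
{
    var tx = new FakeWriteTransaction();
    var cursor = new FakeCursor(tx, guarded);
    try
    {
        tx.Commit();      // the simulated native layer closes the cursor here
        cursor.Dispose(); // the wrapper must not close it a second time
        return "ok";
    }
    catch (InvalidOperationException)
    {
        return "double close detected";
    }
}

// Hypothetical stand-in for a native write transaction that owns its cursors.
sealed class FakeWriteTransaction
{
    private readonly HashSet<int> _liveCursors = new();
    private int _nextHandle;
    public bool Committed { get; private set; }

    public int OpenCursor()
    {
        _liveCursors.Add(++_nextHandle);
        return _nextHandle;
    }

    public void CloseCursor(int handle)
    {
        // Closing a handle that is no longer live models the double free.
        if (!_liveCursors.Remove(handle))
            throw new InvalidOperationException($"cursor {handle} closed twice");
    }

    public void Commit()
    {
        _liveCursors.Clear(); // ending the transaction closes every open cursor
        Committed = true;
    }
}

// Hypothetical managed wrapper, with and without the state check.
sealed class FakeCursor : IDisposable
{
    private readonly FakeWriteTransaction _tx;
    private readonly bool _guarded;
    private int _handle;

    public FakeCursor(FakeWriteTransaction tx, bool guarded)
    {
        _tx = tx;
        _guarded = guarded;
        _handle = tx.OpenCursor();
    }

    public void Dispose()
    {
        if (_handle == 0)
            return;
        if (!_guarded || !_tx.Committed)
            _tx.CloseCursor(_handle);
        _handle = 0;
    }
}
```

Disposing the cursor before `Commit()` also works in this model, because the close removes the handle from the live set before the transaction ends.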
NB: I noticed a huge performance degradation when disposing the cursor before committing the transaction, in case that matters in your use case. |
This is interesting, I'll re-check my test code. I thought I tested the cursor disposal when I was chasing this last week. Perhaps I didn't. |
Anybody up for submitting a PR? I think @sebastienros's approach seems reasonable. Although, I think the intention was BTW, love all the activity. |
@CoreyKaylor I think my code is correct, Lightning.NET/src/LightningDB/LightningTransaction.cs Lines 299 to 301 in 4db175a
So we need to call I checked other places and I think this could also happen when the Database is dropped and then calls
Which will then close the cursor for a connection https://github.com/LMDB/lmdb/blob/mdb.master/libraries/liblmdb/mdb.c#L11113 And I don't think the State is changed so it would also invoke From what I see in other libraries, they call |
Yes, you're right. I read the comment on the line above wrong in relation to the code right below. That's probably a reasonable suggestion for this scenario to use a flag on the transaction. |
Please take a look at PR #159. I believe this should solve the problem using @sebastienros's suggestion of a conditional check. It also coincidentally stabilized one of the flaky tests that I couldn't figure out. |
Published version 0.15.0 that includes this change. |
This code enumerates and PUTS via cursor
If I call this Append method in a loop 3K times with about 15 messages in the enumeration, it crashes the process. However, if I call it once with, say, 50K items, it works fine. As a sanity check I ran the loop 2K times and that succeeded.
This is my first time using LightningDB/lmdb so I may be doing something wrong. There is a key/value containing a version number that is maintained/set after each 'batch' written in the same transaction. Note that the size of the DB is the same in both scenarios and the looping is all done in the same thread like this:
vs