Add a busy_handler_timeout setter #443
@suwyn Do you happen to have a better alternative name for this? I know that you have been thinking about this problem space for a while, so I imagine you might have some language around distinguishing a Ruby |
Oh boy, naming is hard! Maybe |
I like that. I'll switch to that |
@fractaledmind Also consider we'll want to configure this from the database config file so adding it as an option would be beneficial. Here is an example from #426 Default of |
Also noting for posterity why this is needed. The underlying issue with the interpreter is documented here. From my experiments, there is a lot of speed to be gained if there were a way to fix the underlying issue.
This will be set in Rails the same way that |
Ignore any CI failures for "native packaging" with "head", that's only because the HEAD version rolled over to 3.4.dev (you should be able to fix with a rebase now that #447 has been merged) |
…e GVL between connection retries, but also errors after the timeout passes
Don't use a modulo and pre-compute the timeout_deadline
9179ede to 17c9db8
@djmb @rosa: I was just reading thru the SolidQueue source as I work on integrating it into a project I have at work. I came across this commit that patches Rails' SQLite3 adapter to improve concurrency. Reading the commit description, it sounds like you have had similar experiences as I have working to get SQLite to handle concurrency in a reasonable and resilient way. I note that in your comment and patch, you lean on the
After working with the I believe your perspective could help ensure that both this PR and the larger plan for Rails' SQLite support are as solid as possible. |
@flavorjones @byroot: Added two simple tests focusing on the GVL blocking difference between |
lib/sqlite3/database.rb
# while SQLite sleeps and retries.
def busy_handler_timeout=( milliseconds )
  timeout_seconds = milliseconds.fdiv(1000)
  timeout_deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout_seconds
@fractaledmind Looking at the code again, I believe there is a bug here.
This sets a clock-based deadline at the time busy_handler_timeout is set rather than when the busy handler is first invoked. That effectively fixes one hard deadline (timeout_deadline) for all later invocations of the busy handler.
Consider the following (I didn't test it, but I believe this should show the issue, if I am reading the code correctly):
busy_handler_timeout_db = SQLite3::Database.open( "test.db" )
busy_handler_timeout_db.busy_handler_timeout = 1000
sleep 1.5
# timeout_deadline should be expired now since that was set
# at the point in time when we set busy_handler_timeout to 1000.
# Any invocations of the busy handler will now throw busy errors
I think you're right. Great catch. Will fix when I'm back at my laptop
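A minimal sketch of one way to address the issue described above: start the clock when the handler is first invoked for a given lock attempt (count is zero) rather than when the timeout is configured. The class name DeadlineBusyHandler and the injectable clock and sleeper arguments are hypothetical, introduced only so the logic can be run without the gem or real waiting; this is not the gem's actual code.

```ruby
# Hypothetical stand-in for the handler logic; not the gem's actual code.
class DeadlineBusyHandler
  RETRY_INTERVAL = 0.001 # seconds to sleep between retries

  # clock: callable returning monotonic seconds; sleeper: callable that waits.
  # Both are injectable only so the sketch can be exercised without real time.
  def initialize(milliseconds,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) },
                 sleeper: ->(seconds) { sleep(seconds) })
    @timeout_seconds = milliseconds.fdiv(1000)
    @clock = clock
    @sleeper = sleeper
    @deadline = nil
  end

  # `count` is the number of prior invocations for the same locking event.
  # Returns false to give up (the caller then sees a busy error), true to retry.
  def call(count)
    now = @clock.call
    if count.zero?
      # Start the clock on the first invocation, not at configuration time.
      @deadline = now + @timeout_seconds
    elsif now > @deadline
      return false
    else
      @sleeper.call(RETRY_INTERVAL)
    end
    true
  end
end
```

With this shape, the `sleep 1.5` scenario above no longer expires the deadline early, because the deadline is recomputed each time a new locking event begins. The deadline still lives in an instance variable, so it is shared by anything that uses the same handler object.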
Looks better @fractaledmind
It still has me wondering whether it's thread safe, though, since it uses an instance variable. If two threads used the same connection, you'd have an issue. I did some experimenting in Rails and the connection pool appears to be thread safe (as it states in the docs).
Experimenting with the gem on its own, I wasn't quickly able to cause any issue. SQLite doesn't have a way to sleep, so I stopped short of writing a long-running UDF.
I think the ideal solution is to either bake the busy handler into the C code or pass the time of first invocation into the handler from C, as it does with count. I'd obviously strongly prefer the former, but I have no idea what that would take, and it is outside the scope of this PR.
So, short of using the count to estimate the elapsed time (e.g. the (count * RETRY_INTERVAL) > timeout_seconds approach) instead of reading the clock, we'll just have to accept that a race condition probably exists.
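The count-based alternative mentioned above could look roughly like the following sketch. It keeps no state between invocations, so there is nothing to race on. The helper name count_based_busy_handler and the sleeper argument are hypothetical, and RETRY_INTERVAL is assumed to be 1 ms to match the sleep used in the patch.

```ruby
# Hypothetical constant and helper names, for illustration only.
RETRY_INTERVAL = 0.001 # seconds slept per retry (assumed value)

# Builds a stateless handler: elapsed time is estimated from `count`
# (how many times the handler has already run for this lock attempt)
# instead of being read from a clock, so no deadline is stored anywhere.
def count_based_busy_handler(milliseconds, sleeper: ->(seconds) { sleep(seconds) })
  timeout_seconds = milliseconds.fdiv(1000)
  lambda do |count|
    # Give up once the estimated wait exceeds the requested timeout.
    next false if (count * RETRY_INTERVAL) > timeout_seconds

    sleeper.call(RETRY_INTERVAL)
    true
  end
end
```

The trade-off is accuracy: the estimate only counts the time spent sleeping, so the real elapsed time before giving up will generally be somewhat longer than the requested timeout.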
@suwyn: You should join this Discord server where we talk a lot about the SQLite ecosystem in Ruby: https://discord.gg/ehdDh5C4
That would allow us to connect more easily and even pair on this. I'd love some help on writing useful tests. I have a long running UDF already written:
WITH RECURSIVE r(i) AS (
  VALUES(0)
  UNION ALL
  SELECT i FROM r
  LIMIT 10000000
)
SELECT i FROM r ORDER BY i LIMIT 1;
What I don't have are resilient and expansive tests. Want to work on getting this over the line together?
Hi @fractaledmind! The issue with using retries without a backoff was that we would still get I've just tried it out with no sleep and to avoid the BusyExceptions, I need to set the retries to about 1,000,000, at which point it seems to hang. I don't know exactly what the mechanism is there though - whether the thread holding the lock is actually blocked or if it is being starved.
We only used It seems likely we will still need a sleep in the handler to avoid the issues we've been seeing, but it would be good to know why. I'll see if I can debug what's going on in Solid Queue myself. It's tricky because the I/O of logging anything from the handler seems to have the same effect as a sleep and the BusyExceptions go away. If you want to dig about yourself, the patch for sleeping is to allow the tests to pass, you can run them with |
elsif now > @timeout_deadline
  next false
else
  sleep(0.001)
Is this sleep necessary? I think the VM is interruptible on any method call so the calls to clock_gettime etc should do the trick (I think).
@tenderlove it's not about being interruptible, it's about forcibly releasing the GVL. The assumption is that the client that is currently holding the SQLite3 write lock might be another thread in the same process, hence we should try to switch back to it rather than busy loop for 100ms until the thread scheduler quantum is reached.
And even if it's not in that process, yielding the GVL allows other unrelated threads to proceed.
NB: for the "in same process" case a shared Ruby mutex would be way more efficient, but we'd need some way to tell that two clients are pointing at the same database, hence should share a single Mutex.
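To illustrate the shared-mutex idea in the NB above, here is a rough sketch of a per-process registry that hands out one Mutex per database path. The module name WriteLockRegistry is hypothetical, and using File.expand_path as the identity of a database is an assumption: symlinks and hard links to the same file would get different mutexes, which is exactly the "some way to tell" problem the comment raises.

```ruby
# Hypothetical registry: one Mutex per database file, shared by every
# client in this process that asks for the same path.
module WriteLockRegistry
  @guard = Mutex.new # protects the @locks hash itself
  @locks = {}

  # Assumption: the expanded path is enough to identify "the same database".
  # Symlinks and hard links would defeat this.
  def self.mutex_for(path)
    key = File.expand_path(path)
    @guard.synchronize { @locks[key] ||= Mutex.new }
  end
end

# Usage sketch: writers in the same process queue on the Ruby mutex
# instead of retrying against SQLite's file lock.
WriteLockRegistry.mutex_for("test.db").synchronize do
  # perform the write here
end
```

This only coordinates threads within a single process; writers in other processes would still rely on SQLite's own locking and the busy handler.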
The assumption is that the client that is currently holding the SQLite3 write lock might be another thread in the same process, hence we should try to switch back to it
I see. In that case, wouldn't Thread.pass also do the trick?
It would, yes, but then the risk is that there's no other ready thread, causing the process to essentially busy loop, which would pin one CPU at 100% and may not be desirable. A very short sleep actually makes sense IMO.
Sounds good to me!
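The difference between the two yielding strategies discussed above can be sketched without SQLite. Here a background thread stands in for "another thread holding the write lock", and a waiter retries until the lock is released. The helpers wait_until and run_with are hypothetical names; the 50 ms hold time and 1 s timeout are arbitrary.

```ruby
# Hypothetical helper: retries until `ready` returns true or the deadline
# passes, calling `yielder` between attempts to give other threads a turn.
def wait_until(timeout_seconds, yielder:, &ready)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout_seconds
  attempts = 0
  until ready.call
    return [false, attempts] if Process.clock_gettime(Process::CLOCK_MONOTONIC) > deadline

    attempts += 1
    yielder.call
  end
  [true, attempts]
end

# Runs one waiter against a stand-in "lock holder" that releases after 50 ms.
def run_with(yielder)
  released = false
  holder = Thread.new do
    sleep(0.05)
    released = true
  end
  result = wait_until(1.0, yielder: yielder) { released }
  holder.join
  result
end

# Short sleep: the waiting thread is parked between attempts.
slept_ok, slept_attempts = run_with(-> { sleep(0.001) })

# Thread.pass: also lets other threads run, but when no other thread is
# ready (the holder is itself sleeping) the waiter spins on the CPU.
passed_ok, passed_attempts = run_with(-> { Thread.pass })

puts "sleep: acquired=#{slept_ok} attempts=#{slept_attempts}"
puts "pass:  acquired=#{passed_ok} attempts=#{passed_attempts}"
```

Both variants eventually see the release; the Thread.pass variant will typically report far more attempts, which is the busy looping the comment above warns about.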
I'm fine with the patch if @byroot is OK with it. I had one question wrt the sleep (but including the sleep is fine if it's necessary).
@fractaledmind this is causing failures in CI on Windows; see e.g. https://github.com/sparklemotion/sqlite3-ruby/actions/runs/7403622873/job/20143757496. Can you please take a look? Or let me know if you'd prefer me to revert.
I have been trying to debug, but can't reproduce. I am now trying to research a more direct way to test "holds GVL" vs "releases GVL" for these two. I opened a PR with a fix that grounds the tests in the same kind of setup that was used in the other tests, which I am confident is more deterministic: #456 On a related note, is there a way to ensure local tests are running with current head version of SQLite? It seems my tests run against my default macOS version of SQLite. |
By default this should not happen. Try:
  bundle exec rake clean clobber
  bundle exec rake compile
  bundle exec rake test
If you're still using the system libraries, open a new issue and attach
|
…ndler-timeout-for-now Revert pull request #443 from fractaledmind/lock-wait-timeout
One of the largest pain-points when using SQLite in a Rails app is the limited concurrency support. While SQLite does support only one writer, from a Rails app's point-of-view this does not mean that that app simply cannot run in, for example, Puma clustered mode. As I have detailed, the primary issues are that the GVL isn't released while the SQLite busy_timeout C function is running and transactions are run in DEFERRED mode.
We can solve the first of these by providing a Ruby busy_handler callback that will release the GVL between retries. This allows multiple threads to naturally coordinate and "queue up" to acquire SQLite's single write lock.
I initially proposed providing this in the Rails codebase (see: rails/rails#50370); however, it was suggested that this functionality more naturally belongs in this codebase. So, I would love to start brainstorming the interface for this. Once built and released, we can then update Rails to use this instead of the busy_timeout.
Jean Boussier suggested lock_wait_timeout as a name. I'm opening this PR to get the ball rolling. I'm open to naming suggestions as well as test suggestions.