Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve perf by compiling with no locking and allowing users to configure it #59

Closed
headius opened this issue Oct 16, 2015 · 29 comments
Closed
Assignees
Milestone

Comments

@headius
Copy link
Contributor

headius commented Oct 16, 2015

We were investigating a concurrent performance issue with JRuby at jruby/jruby#3398 and after a few hours of investigation figured out that sqlite-jdbc is compiling sqlite with the default configuration of always threadsafe. Unfortunately it seems like that default mode does not treat separate database connections (to separate databases) as their own mutexes. As a result, parallel performance did not scale properly even when inserting into completely separate databases.

The fix, which is trivial, was to compile sqlite with SQLITE_THREADSAFE=0.

diff --git a/Makefile b/Makefile
index de1f1d3..31cd2d5 100644
--- a/Makefile
+++ b/Makefile
@@ -62,6 +62,7 @@ $(SQLITE_OUT)/sqlite3.o : $(SQLITE_UNPACKED)
            -DSQLITE_ENABLE_FTS3_PARENTHESIS \
            -DSQLITE_ENABLE_RTREE \
            -DSQLITE_ENABLE_STAT2 \
+           -DSQLITE_THREADSAFE=0 \
            $(SQLITE_FLAGS) \
            $(SQLITE_OUT)/sqlite3.c

Obviously this is not sufficient for the library. The full fix would be to build sqlite this way but when opening connections default to a thread-safe setup. As it turns out, sqlite can upgrade connections to be more threadsafe, but it can't downgrade them. The current configuration sets them by default to serialize all accesses.

The full doco for this is here: https://www.sqlite.org/threadsafe.html

Basically, sqlite-jdbc needs to configure the library to be thread-unsafe as a minimum, and then by default specify thread-safety when opening the connection. Users can then specify configuration parameters to open the connection as thread-unsafe (single thread access) giving them better concurrency if inserting into isolated databases.

I don't have a full patch for sqlite-jdbc, but I wanted to start the dialog now. We have a large number of JRuby users hitting sqlite via your library in heavily concurrent applications, and this would give them a better chance of having good scaling characteristics.

@xerial
Copy link
Owner

xerial commented Oct 16, 2015

That's a good catch. OK. I'll create a non-thread safe version only for Win/Mac/Linux (x86, x86_64). We need to wait a contribution for the other platforms.

@xerial
Copy link
Owner

xerial commented Oct 16, 2015

Uploaded a snapshot jar: https://oss.sonatype.org/content/repositories/snapshots/org/xerial/sqlite-jdbc/3.9.0-SNAPSHOT/, which is built with SQLITE_THREADSAFE=0

We also need to add a configuration method, which calls sqlite3_config(flags) function in sqlite3 to select thread-safe mode.

@xerial
Copy link
Owner

xerial commented Oct 16, 2015

@headius By the way, did you try -DSQLITE_THREADSAFE=2 (multi-thread) mode? If we use SQLITE_THREADSAFE=0 (single thread) at compile-time, we cannot make it thread-safe with sqlite3_config.

If -DSQLITE_THREADSAFE=2 works efficiently, I would like to use this option.

@headius
Copy link
Contributor Author

headius commented Oct 16, 2015

@xerial I will give it a try on the benchmark from jruby/jruby#3398.

@headius
Copy link
Contributor Author

headius commented Oct 16, 2015

@xerial Unfortunately, SQLITE_THREADSAFE=2 seems to have the same issues as =1 and the default mode (which I believe is also =1). It doesn't seem right that sqlite blocks all threads regardless of what database they're connecting to, but that seems to be the case.

@xerial
Copy link
Owner

xerial commented Oct 16, 2015

Thanks for testing. Then we should build multiple-versions of native libraries with different compilation options. The default should use a thread safe library, and by some configuration we need to be able to use non thread-safe version.

A possible approach is using a JVM system property (e.g., -Dsqlite-jdbc.threadsafe=false), then load native library built with -DSQLITE_THREADSAFE=0. Do you think does it work for jruby?

@headius
Copy link
Contributor Author

headius commented Oct 18, 2015

A property would work but the problem with properties is making sure they get set early enough. A typical JRuby user just runs a Ruby script, which loads all the libraries needed. Those libraries would need a way to set the property such that it's picked up by the driver.

Can we make it a parameter of the connection?

@xerial
Copy link
Owner

xerial commented Oct 20, 2015

Once the native code is loaded by JVM, we have no way to unload it. So the connection parameter would work only for the first time, and in the second time we need to use the same mode (thread-safe or unsafe) with the first connection, but this behavior might be acceptable since an application should not mix multiple running modes.

@headius
Copy link
Contributor Author

headius commented Oct 20, 2015

@xerial Ahh yes, I understand. I have another suggestion that may be too complex to support...two libraries.

If we had two jni libraries with different symbols and different thread-safe modes, it would simply be a matter of calling the correct one. So, two JNI interfaces, two C files, two native libraries.

I can understand if that's too much change to introduce. The property will work ok, but we'll have to document it well (somewhere in JRuby or related libraries) so people aren't constantly reporting performance bugs to us.

@headius
Copy link
Contributor Author

headius commented Oct 20, 2015

@xerial I think the property will be fine. My current mystery is why other Ruby implementations do not seem to have this threading bottleneck with the sqlite3-ruby gem.

@headius
Copy link
Contributor Author

headius commented Oct 20, 2015

I have sent an email to the sqlite-users list to ask about threadsafety and why we might be seeing serialized performance regardless of mode or database independence.

@xerial
Copy link
Owner

xerial commented Oct 20, 2015

I'm reluctant to provide two types of API (because it increases the maintenance cost), rather I would choose to prepare two types of packages: sqlite-jdbd.jar and sqlite-jdbc-singlethread.jar.

@headius
Copy link
Contributor Author

headius commented Oct 22, 2015

@xerial the two package approach is actually might be the best. All jars in a typical JRuby application get loaded dynamically at runtime rather than statically on a classpath.

@xerial xerial self-assigned this Oct 26, 2015
@xerial
Copy link
Owner

xerial commented Oct 26, 2015

OK. I'll tweak packaging scripts so that we can prepare two types of binaries (normal jar and single-thread one).

@xerial xerial modified the milestone: 3.9.1 Oct 26, 2015
@headius
Copy link
Contributor Author

headius commented Nov 23, 2015

28 days later... :-)

Anything I can do to help you with this?

@patcheng
Copy link
Contributor

patcheng commented Dec 5, 2015

Alternatively, you could change SQLITE_DEFAULT_MEMSTATUS
https://www.sqlite.org/compile.html#default_memstatus

Inside sqlite.c, there is a comment saying:

** SQLite holds the [SQLITE_MUTEX_STATIC_MASTER] mutex when it invokes
** the xInit method, so the xInit method need not be threadsafe.  The
** xShutdown method is only called from [sqlite3_shutdown()] so it does
** not need to be threadsafe either.  For all other methods, SQLite
** holds the [SQLITE_MUTEX_STATIC_MEM] mutex as long as the
** [SQLITE_CONFIG_MEMSTATUS] configuration option is turned on (which
** it is by default) and so the methods are automatically serialized.
** However, if [SQLITE_CONFIG_MEMSTATUS] is disabled, then the other
** methods must be threadsafe or else make their own arrangements for
** serialization.

@patcheng
Copy link
Contributor

patcheng commented Dec 5, 2015

@headius
using this test https://gist.github.com/patcheng/324f50fd360bc2e3486a (based to the file from JRuby's Issue. Make it closer to I think the version you were using.

I am using stock JRuby 9.0.4.0 and sqlite-jdbc from master.

        -DSQLITE_DEFAULT_MEMSTATUS=1 \
        -DSQLITE_THREADSAFE=1 \
insert rows              5.410000   0.140000   5.550000 (  1.732364)
insert rows              4.560000   0.100000   4.660000 (  1.577796)
insert rows              1.290000   0.050000   1.340000 (  1.198922)
insert rows              1.170000   0.050000   1.220000 (  1.170728)
insert rows              1.220000   0.050000   1.270000 (  1.201040)
insert rows              1.160000   0.050000   1.210000 (  1.193106)

        -DSQLITE_DEFAULT_MEMSTATUS=1 \
        -DSQLITE_THREADSAFE=0 \
                             user     system      total        real
insert rows              5.570000   0.160000   5.730000 (  1.753490)
insert rows              3.950000   0.090000   4.040000 (  1.377097)
insert rows              1.230000   0.050000   1.280000 (  1.156456)
insert rows              1.140000   0.050000   1.190000 (  1.142759)
insert rows              1.140000   0.050000   1.190000 (  1.130870)
insert rows              1.140000   0.040000   1.180000 (  1.145773)

        -DSQLITE_DEFAULT_MEMSTATUS=0 \
        -DSQLITE_THREADSAFE=0 \

                             user     system      total        real
insert rows              5.080000   0.120000   5.200000 (  1.570296)
insert rows              3.990000   0.110000   4.100000 (  1.380693)
insert rows              1.230000   0.050000   1.280000 (  1.074259)
insert rows              1.260000   0.050000   1.310000 (  1.027860)
insert rows              1.130000   0.060000   1.190000 (  1.083451)
insert rows              1.080000   0.050000   1.130000 (  1.062150)


        -DSQLITE_DEFAULT_MEMSTATUS=0 \
        -DSQLITE_THREADSAFE=2 \
                             user     system      total        real
insert rows              4.900000   0.130000   5.030000 (  1.569652)
insert rows              3.870000   0.090000   3.960000 (  1.441974)
insert rows              1.250000   0.040000   1.290000 (  1.086575)
insert rows              1.050000   0.050000   1.100000 (  1.069246)
insert rows              1.260000   0.060000   1.320000 (  1.091227)
insert rows              1.050000   0.040000   1.090000 (  1.090663)

@patcheng
Copy link
Contributor

patcheng commented Dec 5, 2015

For whatever reason, on my machine, the JRuby test didn't show as much difference between all of them.

I did a basic test: https://gist.github.com/patcheng/28580eed030361dc0f90

This show better results:

        -DSQLITE_DEFAULT_MEMSTATUS=1 \
        -DSQLITE_THREADSAFE=0 \
Threads: 1 Time: 676 ms
Threads: 4 Time: 675 ms
Threads: 8 Time: 1309 ms


        -DSQLITE_DEFAULT_MEMSTATUS=1 \
        -DSQLITE_THREADSAFE=1 \
Threads: 1 Time: 782 ms
Threads: 4 Time: 19141 ms
Threads: 8 Time: 38252 ms

        -DSQLITE_DEFAULT_MEMSTATUS=1 \
        -DSQLITE_THREADSAFE=2 \
Threads: 1 Time: 798 ms
Threads: 4 Time: 17880 ms
Threads: 8 Time: 34578 ms


        -DSQLITE_DEFAULT_MEMSTATUS=0 \
        -DSQLITE_THREADSAFE=2 \
Threads: 1 Time: 667 ms
Threads: 4 Time: 571 ms
Threads: 8 Time: 1198 ms

        -DSQLITE_DEFAULT_MEMSTATUS=0 \
        -DSQLITE_THREADSAFE=1 \
Threads: 1 Time: 672 ms
Threads: 4 Time: 639 ms
Threads: 8 Time: 1246 ms

patcheng added a commit to patcheng/sqlite-jdbc that referenced this issue Dec 5, 2015
xerial#59

disabled memstatus and make the library thread-safe
@headius
Copy link
Contributor Author

headius commented Jan 8, 2016

How many cores do you run on?

@mkauf
Copy link
Contributor

mkauf commented Jan 8, 2016

I think that commit a6a559e is bogus. If multiple connections are created to the same database file, the database may get corrupted if SQLite has been compiled with -DSQLITE_THREADSAFE=0.

I propose to use -DSQLITE_THREADSAFE=2, it is faster than the default and it is safe to use within sqlite-jdbc, because the native methods in NativeDB.java are declared using synchronized.

@patcheng
Copy link
Contributor

patcheng commented Jan 8, 2016

@headius My tests were performed on a MacBook Pro with Core i7 2.8 Hz. 4 real cores, and 4 hyperthreaded cores.

@mkauf I think it's safer to stick with -DSQLITE_THREADSAFE=1. With -DSQLITE_THREADSAFE=2, accessing the same database connection from multiple threads would cause an explicit error (as oppose to slower). I ran into that when I was doing an naive implementation for the test.

When compiling SQLITE_THREADSAFE using 1 and 2 , the behavior can be modified at run-time by specifying SQLITE_OPEN_FULLMUTEX flag in sqlite3_open_v2(). see https://www.sqlite.org/threadsafe.html

@headius
Copy link
Contributor Author

headius commented Mar 3, 2016

What's the status of this? It looks like either mode 1 or 2 will work with SQLITE_DEFAULT_MEMSTATUS=0, but I don't know what that flag does. If we can go with a single binary, that's obviously best, so it's probably ok to go with SQLITE_THREADSAFE=1 and memstatus=0?

fpuga pushed a commit to cartolab/sqlite-jdbc that referenced this issue May 5, 2016
@headius
Copy link
Contributor Author

headius commented Jun 10, 2016

@xerial We are still getting reports of slow performance on JRuby, and would really like to help you get this out. It sounds like @patcheng has come up with the right magic for safe speed and configurable behavior at runtime. What can I do to help you get a release out with these changes?

@headius
Copy link
Contributor Author

headius commented Jun 10, 2016

@patcheng Can you turn your changes into a PR? I'd like to see the full patch and have some of our users test it out.

@headius
Copy link
Contributor Author

headius commented Jun 10, 2016

@patcheng Nevermind, I realized it was trivial :-)

diff --git a/Makefile b/Makefile
index de1f1d3..df440a4 100644
--- a/Makefile
+++ b/Makefile
@@ -62,6 +62,8 @@ $(SQLITE_OUT)/sqlite3.o : $(SQLITE_UNPACKED)
            -DSQLITE_ENABLE_FTS3_PARENTHESIS \
            -DSQLITE_ENABLE_RTREE \
            -DSQLITE_ENABLE_STAT2 \
+           -DSQLITE_THREADSAFE=1 \
+           -DSQLITE_DEFAULT_MEMSTATUS=0 \
            $(SQLITE_FLAGS) \
            $(SQLITE_OUT)/sqlite3.c

@chuckremes Maybe you could build the driver with this patch and see how perf looks for you? I'd like to put this one to bed.

@chuckremes
Copy link

@headius I will try to do so today.

All I can say is "wow" when looking at all the followup you have performed on this issue since Oct 2015. Thank you very much.

@chuckremes
Copy link

Testing this patched version in a production app right now. I'll let you know what kind of performance increase I see. Last run with un-patched driver took 8 hours and 50 minutes. A good chunk of that was DB access.

@chuckremes
Copy link

Surprisingly, the latest run just finished. It took 59m 56s to complete. So this change basically shaved 8 hours off of a 9 hour run.

@headius
Copy link
Contributor Author

headius commented Jun 10, 2016

So this change basically shaved 8 hours off of a 9 hour run.

Well I'm convinced! How about you, @xerial?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants