Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reduce CGO calls when scanning rows #1291

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

charlievieth
Copy link
Contributor

@charlievieth charlievieth commented Nov 2, 2024

This commit improves the performance of queries by at least 15% by only collecting the information needed to convert sqlite3 values to Go values and by batching CGO calls when scanning rows (the performance improvement scales with the number of columns being scanned).

This commit adds a new coltype field to the SQLiteRows struct which stores the declared column type (either data/time or boolean) and the sqlite3 datatype. Previously, this library would fetch the string representation of each column, which is inefficient and rarely needed since the non-standard SQLiteRows.DeclTypes method is rarely called.

It also changes the benchmark suite to use an in-memory database since we do not want the file system interfering with benchmark results.

goos: darwin
goarch: arm64
pkg: github.com/mattn/go-sqlite3
cpu: Apple M1 Max
                             │ base.10.txt  │             new.10.txt              │
                             │    sec/op    │   sec/op     vs base                │
CustomFunctions-10              3.318µ ± 2%   3.115µ ± 2%   -6.10% (p=0.000 n=10)
Suite/BenchmarkExec-10          1.236µ ± 1%   1.240µ ± 2%        ~ (p=0.617 n=10)
Suite/BenchmarkQuery-10         4.004µ ± 7%   3.363µ ± 2%  -16.02% (p=0.000 n=10)
Suite/BenchmarkParams-10        4.241µ ± 1%   3.758µ ± 2%  -11.40% (p=0.000 n=10)
Suite/BenchmarkStmt-10          2.830µ ± 0%   2.378µ ± 2%  -15.97% (p=0.000 n=10)
Suite/BenchmarkRows-10          126.3µ ± 1%   101.3µ ± 1%  -19.79% (p=0.000 n=10)
Suite/BenchmarkStmtRows-10      124.9µ ± 1%   100.5µ ± 2%  -19.56% (p=0.000 n=10)
Suite/BenchmarkStmt10Cols-10   10.130µ ± 0%   7.042µ ± 1%  -30.48% (p=0.000 n=10)
geomean                         8.655µ        7.328µ       -15.33%

                             │ base.10.txt  │               new.10.txt               │
                             │     B/op     │     B/op      vs base                  │
CustomFunctions-10               568.0 ± 0%     576.0 ± 0%   +1.41% (p=0.000 n=10)
Suite/BenchmarkExec-10           128.0 ± 0%     128.0 ± 0%        ~ (p=1.000 n=10) ¹
Suite/BenchmarkQuery-10          688.0 ± 0%     648.0 ± 0%   -5.81% (p=0.000 n=10)
Suite/BenchmarkParams-10       1.078Ki ± 0%   1.031Ki ± 0%   -4.35% (p=0.000 n=10)
Suite/BenchmarkStmt-10           920.0 ± 0%     872.0 ± 0%   -5.22% (p=0.000 n=10)
Suite/BenchmarkRows-10         9.305Ki ± 0%   9.188Ki ± 0%   -1.26% (p=0.000 n=10)
Suite/BenchmarkStmtRows-10     9.289Ki ± 0%   9.164Ki ± 0%   -1.35% (p=0.000 n=10)
Suite/BenchmarkStmt10Cols-10     992.0 ± 0%     696.0 ± 0%  -29.84% (p=0.000 n=10)
geomean                        1.181Ki        1.106Ki        -6.35%
¹ all samples are equal

                             │ base.10.txt │              new.10.txt              │
                             │  allocs/op  │ allocs/op   vs base                  │
CustomFunctions-10              18.00 ± 0%   18.00 ± 0%        ~ (p=1.000 n=10) ¹
Suite/BenchmarkExec-10          7.000 ± 0%   7.000 ± 0%        ~ (p=1.000 n=10) ¹
Suite/BenchmarkQuery-10         23.00 ± 0%   23.00 ± 0%        ~ (p=1.000 n=10) ¹
Suite/BenchmarkParams-10        27.00 ± 0%   27.00 ± 0%        ~ (p=1.000 n=10) ¹
Suite/BenchmarkStmt-10          25.00 ± 0%   25.00 ± 0%        ~ (p=1.000 n=10) ¹
Suite/BenchmarkRows-10          525.0 ± 0%   519.0 ± 0%   -1.14% (p=0.000 n=10)
Suite/BenchmarkStmtRows-10      524.0 ± 0%   518.0 ± 0%   -1.15% (p=0.000 n=10)
Suite/BenchmarkStmt10Cols-10    39.00 ± 0%   19.00 ± 0%  -51.28% (p=0.000 n=10)
geomean                         46.26        42.17        -8.86%
¹ all samples are equal

@charlievieth charlievieth changed the title reduce CGO calls when scanning rows Improve query scanning performance by 20% Nov 2, 2024
@charlievieth charlievieth changed the title Improve query scanning performance by 20% reduce CGO calls when scanning rows and improve query performance Nov 2, 2024
@charlievieth charlievieth changed the title reduce CGO calls when scanning rows and improve query performance improve query performance by +15% by reducing CGO calls Nov 2, 2024
@charlievieth charlievieth changed the title improve query performance by +15% by reducing CGO calls reduce CGO calls when scanning rows Nov 2, 2024
@charlievieth
Copy link
Contributor Author

charlievieth commented Nov 2, 2024

Commit aa590cf addresses an issue with icu4c on macos that was caused by a recent change to the formula's name and alias: Homebrew/homebrew-core#169239 (comment)

Edit: this PR now includes #1294 since it fixes the CIFuzz and macOS tests.

@codecov-commenter
Copy link

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 78.94737% with 4 lines in your changes missing coverage. Please review.

Project coverage is 47.14%. Comparing base (18cdded) to head (fd9d096).
Report is 3 commits behind head on master.

Files with missing lines Patch % Lines
sqlite3.go 78.94% 2 Missing and 2 partials ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1291      +/-   ##
==========================================
- Coverage   47.16%   47.14%   -0.02%     
==========================================
  Files          12       12              
  Lines        1533     1542       +9     
==========================================
+ Hits          723      727       +4     
- Misses        669      672       +3     
- Partials      141      143       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@charlievieth charlievieth force-pushed the cev/col-perf branch 2 times, most recently from c65c946 to fb4206b Compare November 5, 2024 18:09
@charlievieth
Copy link
Contributor Author

Note: this PR includes #1294 since it fixes the CIFuzz and macOS tests.

@charlievieth
Copy link
Contributor Author

@mattn or @rittneje would either of you have some spare cycles to review this PR?

sqlite3.go Outdated Show resolved Hide resolved
// _sqlite3_column_decltypes stores the declared column type in the typs array.
// This function must always be called before _sqlite3_column_types since it
// overwrites the datatype.
static void _sqlite3_column_decltypes(sqlite3_stmt* stmt, uint8_t *typs, int ntyps) {
Copy link
Collaborator

@rittneje rittneje Nov 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it somewhat confusing that this ends up using one integer to contain both the derived declared type and also the real column type.

How much savings would there be relative to today if instead of doing this, we just added a function like static int _sqlite3_column_derived_decltype(sqlite3_stmt* stmt,int iCol)? Then that could return values from your "enum" (0, GO_SQLITE3_DECL_BOOL, or GO_SQLITE3_DECL_DATE), and would not need any bitmasking.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it somewhat confusing that this ends up using one integer to contain both the derived declared type and also the real column type.

Yeah, it's a bit tricky. I originally used two slices to store the derived declared type and real type separately, but switching to using one slice and a the bit mask since it saves an alloc.

How much savings would there be relative to today if instead of doing this, we just added a function like static int _sqlite3_column_derived_decltype(sqlite3_stmt* stmt,int iCol)? Then that could return values from your "enum" (0, GO_SQLITE3_DECL_BOOL, or GO_SQLITE3_DECL_DATE), and would not need any bitmasking.

Most of the performance improvement here comes from batching the CGO calls needed to get the column's declared and real type (basically, the performance improvement comes from reducing the number of CGO calls - see BenchmarkStmt10Cols). So going back to having to call a CGO function for each column would negate most of the performance improvements here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To follow up on my above comment, BenchmarkStmt10Cols shows how batching CGO calls instead of performing them for each column improves performance (here we see a 30% perf improvement, which roughly scales with the number of columns being scanned).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How much savings would there be relative to today if instead of doing this, we just added a function like static int _sqlite3_column_derived_decltype(sqlite3_stmt* stmt,int iCol)? Then that could return values from your "enum" (0, GO_SQLITE3_DECL_BOOL, or GO_SQLITE3_DECL_DATE), and would not need any bitmasking.

Just to clarify, my above comments assume that you're suggesting we call this new function in the per-column loop in bind (that is we call it for each column instead of batching the calls). If that is the case then it would negate most of the performance gains here. If you meant something else could you elaborate (just want to make sure we're on the same page here)?

This commit improves the performance of queries by at least 20% by only
collecting the information needed to convert sqlite3 values to Go values
and by batching CGO calls when scanning rows (the performance
improvement scales with the number of columns being scanned).

This commit adds a new coltype field to the SQLiteRows struct which
stores the declared column type (either data/time or boolean) and the
sqlite3 datatype. Previously, this library would fetch the string
representation of each column, which is inefficient and rarely needed
since the non-standard SQLiteRows.DeclTypes method is rarely called.

It also changes the benchmark suite to use an in-memory database since
we do not want the file system interfering with benchmark results.

goos: darwin
goarch: arm64
pkg: github.com/mattn/go-sqlite3
cpu: Apple M1 Max
                             │ base.10.txt  │             new.10.txt              │
                             │    sec/op    │   sec/op     vs base                │
CustomFunctions-10              3.318µ ± 2%   3.115µ ± 2%   -6.10% (p=0.000 n=10)
Suite/BenchmarkExec-10          1.236µ ± 1%   1.240µ ± 2%        ~ (p=0.617 n=10)
Suite/BenchmarkQuery-10         4.004µ ± 7%   3.363µ ± 2%  -16.02% (p=0.000 n=10)
Suite/BenchmarkParams-10        4.241µ ± 1%   3.758µ ± 2%  -11.40% (p=0.000 n=10)
Suite/BenchmarkStmt-10          2.830µ ± 0%   2.378µ ± 2%  -15.97% (p=0.000 n=10)
Suite/BenchmarkRows-10          126.3µ ± 1%   101.3µ ± 1%  -19.79% (p=0.000 n=10)
Suite/BenchmarkStmtRows-10      124.9µ ± 1%   100.5µ ± 2%  -19.56% (p=0.000 n=10)
Suite/BenchmarkStmt10Cols-10   10.130µ ± 0%   7.042µ ± 1%  -30.48% (p=0.000 n=10)
geomean                         8.655µ        7.328µ       -15.33%

                             │ base.10.txt  │               new.10.txt               │
                             │     B/op     │     B/op      vs base                  │
CustomFunctions-10               568.0 ± 0%     576.0 ± 0%   +1.41% (p=0.000 n=10)
Suite/BenchmarkExec-10           128.0 ± 0%     128.0 ± 0%        ~ (p=1.000 n=10) ¹
Suite/BenchmarkQuery-10          688.0 ± 0%     648.0 ± 0%   -5.81% (p=0.000 n=10)
Suite/BenchmarkParams-10       1.078Ki ± 0%   1.031Ki ± 0%   -4.35% (p=0.000 n=10)
Suite/BenchmarkStmt-10           920.0 ± 0%     872.0 ± 0%   -5.22% (p=0.000 n=10)
Suite/BenchmarkRows-10         9.305Ki ± 0%   9.188Ki ± 0%   -1.26% (p=0.000 n=10)
Suite/BenchmarkStmtRows-10     9.289Ki ± 0%   9.164Ki ± 0%   -1.35% (p=0.000 n=10)
Suite/BenchmarkStmt10Cols-10     992.0 ± 0%     696.0 ± 0%  -29.84% (p=0.000 n=10)
geomean                        1.181Ki        1.106Ki        -6.35%
¹ all samples are equal

                             │ base.10.txt │              new.10.txt              │
                             │  allocs/op  │ allocs/op   vs base                  │
CustomFunctions-10              18.00 ± 0%   18.00 ± 0%        ~ (p=1.000 n=10) ¹
Suite/BenchmarkExec-10          7.000 ± 0%   7.000 ± 0%        ~ (p=1.000 n=10) ¹
Suite/BenchmarkQuery-10         23.00 ± 0%   23.00 ± 0%        ~ (p=1.000 n=10) ¹
Suite/BenchmarkParams-10        27.00 ± 0%   27.00 ± 0%        ~ (p=1.000 n=10) ¹
Suite/BenchmarkStmt-10          25.00 ± 0%   25.00 ± 0%        ~ (p=1.000 n=10) ¹
Suite/BenchmarkRows-10          525.0 ± 0%   519.0 ± 0%   -1.14% (p=0.000 n=10)
Suite/BenchmarkStmtRows-10      524.0 ± 0%   518.0 ± 0%   -1.15% (p=0.000 n=10)
Suite/BenchmarkStmt10Cols-10    39.00 ± 0%   19.00 ± 0%  -51.28% (p=0.000 n=10)
geomean                         46.26        42.17        -8.86%
¹ all samples are equal
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants