-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: avoid string to byte conversion copies on insert path #101693
Conversation
9d9327a
to
d463da3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice reduction in allocations! Maybe this should be backported to 23.1.
Reviewed 1 of 1 files at r1, 2 of 2 files at r2, 1 of 1 files at r3, 1 of 1 files at r4, 1 of 1 files at r5, 3 of 3 files at r6, 1 of 1 files at r7, 1 of 1 files at r8, 2 of 2 files at r9, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @rafiss)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
admittedly not well versed in the unsafe Go string/bytes stuff, but the changes look good to me except maybe the builtins
if err != nil { | ||
return nil, err | ||
} | ||
return tree.NewDBytes(tree.DBytes(res)), nil | ||
return tree.NewDBytes(tree.DBytes(encoding.UnsafeConvertBytesToString(res))), nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, does this get re-used if someone uses another builtin that modifies bytes directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm DBytes are immutable based on comments here:
Most mutating consumers of DBytes do this:
infoBytes := []byte(tree.MustBeDBytes(row[1]))
Which will make a copy.
Adding @yuzefovich to double check my math on DBytes safety, my concern is that maybe the vectorized engine constructs DBytes directly from coldata.Bytes memory. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice reduction!
to double check my math on DBytes safety, my concern is that maybe the vectorized engine constructs DBytes directly from coldata.Bytes memory.
I don't think we have any issues there. We currently always make a copy when going from vectorized coldata.Bytes
to DBytes
datums. For example, over in vec_to_datum_gen.go
we make a copy by converting []byte
(that we get from the Bytes
vector) to DBytes
(which is a string
).
Reviewed 1 of 1 files at r1, 2 of 2 files at r2, 1 of 1 files at r3, 1 of 1 files at r4, 1 of 1 files at r5, 3 of 3 files at r6, 1 of 1 files at r7, 1 of 1 files at r8, 2 of 2 files at r9, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @cucaroach, @nvanbenschoten, @otan, and @rafiss)
-- commits
line 54 at r6:
nit: there is another place where we can do the same - in two spots in execgen/overloads_bin.go
after which you'll need to run ./dev gen execgen
to regenerate the code. Or perhaps leave a TODO for that.
pkg/sql/sem/builtins/builtins.go
line 1171 at r9 (raw file):
Previously, cucaroach (Tommy Reilly) wrote…
I'm DBytes are immutable based on comments here:
Most mutating consumers of DBytes do this:
infoBytes := []byte(tree.MustBeDBytes(row[1]))
Which will make a copy.
Yeah, this should be safe.
pkg/sql/sem/tree/datum.go
line 381 at r3 (raw file):
return nil, MakeParseError(s, types.Bytes, err) } return NewDBytes(DBytes(encoding.UnsafeConvertBytesToString(res))), nil
This function is defined in a later commit, so the build should fail on the third commit. This should be fixed. (Broken builds are bad because they make bisecting more annoying.)
pkg/bench/bench_test.go
line 308 at r1 (raw file):
} // runBenchmarkInsert benchmarks inserting count rows into a table.
nit: s/runBenchmarkInsert/runBenchmarkInsertLarge/
, also mention how it differs from runBenchmarkInsert
.
This benchmark shows the reducation in allocations by the copy avoidance changes. Together the changes result in: ``` name old time/op new time/op delta SQL/Cockroach/InsertLarge/count=1000-10 18.7ms ± 4% 19.1ms ±26% ~ (p=0.780 n=9+10) name old alloc/op new alloc/op delta SQL/Cockroach/InsertLarge/count=1000-10 49.7MB ± 4% 39.7MB ±10% -20.19% (p=0.000 n=9+9) name old allocs/op new allocs/op delta SQL/Cockroach/InsertLarge/count=1000-10 39.0k ± 5% 29.7k ± 8% -23.83% (p=0.000 n=10+9) ``` Release note: None Epic: None
Facilitate some copy avoidance by using []byte instead of string. Only copy when necessary in some cases. Release note: None Epic: None
Make public for use elsewhere. Release note: None Epic: None
Epic: None Release note: None
DBytes is a string under the covers but we usually operate on them with []byte APIs, avoid copies in these cases. Release note: None Epic: None
Cosmetic change to use the new UnsafeConvertBytesToString. Release note: None Epic: None
Release note: None Epic: None
Release note: None Epic: None
Most parse routines don't retain the input string pointer so we can use a pointer to the input bytes in those cases. Release note: None Epic: None
All we do here is write the bytes to a stream so we don't need a copy. Release note: None Epic: None
We copy the DBytes into a []byte and then copy the []byte into a string. Now we get a reference to the string and return that if appropriate or iterator over string bytes directly if not. This is all predicated on DBytes being immutable. Release note: None Epic: None
Epic: None Release note: None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @msirek, @nvanbenschoten, @otan, @rafiss, and @yuzefovich)
pkg/sql/sem/tree/datum.go
line 381 at r3 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
This function is defined in a later commit, so the build should fail on the third commit. This should be fixed. (Broken builds are bad because they make bisecting more annoying.)
Fixed
pkg/bench/bench_test.go
line 308 at r1 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit:
s/runBenchmarkInsert/runBenchmarkInsertLarge/
, also mention how it differs fromrunBenchmarkInsert
.
👍
Thanks for comments @yuzefovich, I addressed them and added a new commit to get the json.AsText callers. Will hold for @rafiss sign off. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the changes looked safe to me!
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @msirek, @nvanbenschoten, @otan, and @yuzefovich)
TFTRs! |
Build succeeded: |
Originated from: #91930
Incorporate those changes and add a benchmark and add some more changes based on benchmark results.
Broken into several commits:
json: avoid copying json.AsText where possible
Epic: None
Release note: None
pgwire: avoid some string to byte copies
Most parse routines don't retain the input string pointer so we can
use a pointer to the input bytes in those cases.
Release note: None
Epic: None
builtins: avoid some []byte to string copies in decode
Release note: None
Epic: None
valueside: remove string to []byte copies
Release note: None
Epic: None
colenc: use new UnsafeConvertBytesToString
Cosmetic change to use the new UnsafeConvertBytesToString.
Release note: None
Epic: None
opt: avoid allocation in DBytes interning
DBytes is a string under the covers but we usually operate on them
with []byte APIs, avoid copies in these cases.
Release note: None
Epic: None
tree: allow access to DBytes/DString/DEncodedKey raw bytes
Epic: None
Release note: None
encoding: unsafeString -> UnsafeConvertBytesToString
Make public for use elsewhere.
Release note: None
Epic: None
lex: avoid some string to byte slice copies
Facilitate some copy avoidance by using []byte instead of string.
Only copy when necessary in some cases.
Release note: None
Epic: None
bench: add a large insert benchmark
This benchmark shows the reducation in allocations by the copy avoidance
changes. Together the changes result in:
Release note: None
Epic: None