Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: avoid string to byte conversion copies on insert path #101693

Merged
merged 12 commits into from
May 31, 2023

Conversation

cucaroach
Copy link
Contributor

@cucaroach cucaroach commented Apr 17, 2023

Originated from: #91930

Incorporate those changes and add a benchmark and add some more changes based on benchmark results.

Broken into several commits:

json: avoid copying json.AsText where possible

Epic: None
Release note: None

pgwire: avoid some string to byte copies

Most parse routines don't retain the input string pointer so we can
use a pointer to the input bytes in those cases.

Release note: None
Epic: None

builtins: avoid some []byte to string copies in decode

Release note: None
Epic: None

valueside: remove string to []byte copies

Release note: None
Epic: None

colenc: use new UnsafeConvertBytesToString

Cosmetic change to use the new UnsafeConvertBytesToString.

Release note: None
Epic: None

opt: avoid allocation in DBytes interning

DBytes is a string under the covers but we usually operate on them
with []byte APIs, avoid copies in these cases.

Release note: None
Epic: None

tree: allow access to DBytes/DString/DEncodedKey raw bytes

Epic: None
Release note: None

encoding: unsafeString -> UnsafeConvertBytesToString

Make public for use elsewhere.

Release note: None
Epic: None

lex: avoid some string to byte slice copies

Facilitate some copy avoidance by using []byte instead of string.
Only copy when necessary in some cases.

Release note: None
Epic: None

bench: add a large insert benchmark

This benchmark shows the reducation in allocations by the copy avoidance
changes. Together the changes result in:

name                                     old time/op    new time/op    delta
SQL/Cockroach/InsertLarge/count=1000-10    18.7ms ± 4%    19.1ms ±26%     ~     (p=0.780 n=9+10)

name                                     old alloc/op   new alloc/op   delta
SQL/Cockroach/InsertLarge/count=1000-10    49.7MB ± 4%    39.7MB ±10%  -20.19%  (p=0.000 n=9+9)

name                                     old allocs/op  new allocs/op  delta
SQL/Cockroach/InsertLarge/count=1000-10     39.0k ± 5%     29.7k ± 8%  -23.83%  (p=0.000 n=10+9)

Release note: None
Epic: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@cucaroach cucaroach changed the title avoid copy sql: avoid string to byte conversion copies on insert path Apr 17, 2023
@cucaroach cucaroach force-pushed the avoid-copy branch 2 times, most recently from 9d9327a to d463da3 Compare April 17, 2023 23:07
@cucaroach cucaroach marked this pull request as ready for review April 18, 2023 00:12
@cucaroach cucaroach requested a review from a team April 18, 2023 00:12
@cucaroach cucaroach requested review from a team as code owners April 18, 2023 00:12
@cucaroach cucaroach requested review from msirek and rafiss April 18, 2023 00:12
Copy link
Contributor

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice reduction in allocations! Maybe this should be backported to 23.1.
:lgtm:

Reviewed 1 of 1 files at r1, 2 of 2 files at r2, 1 of 1 files at r3, 1 of 1 files at r4, 1 of 1 files at r5, 3 of 3 files at r6, 1 of 1 files at r7, 1 of 1 files at r8, 2 of 2 files at r9, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @rafiss)

@cucaroach
Copy link
Contributor Author

cucaroach commented Apr 18, 2023

Thanks @msirek! I'm gonna wait for @rafiss @otan review as well for the pgwire and lex bits.

Copy link
Contributor

@otan otan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

admittedly not well versed in the unsafe Go string/bytes stuff, but the changes look good to me except maybe the builtins

if err != nil {
return nil, err
}
return tree.NewDBytes(tree.DBytes(res)), nil
return tree.NewDBytes(tree.DBytes(encoding.UnsafeConvertBytesToString(res))), nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, does this get re-used if someone uses another builtin that modifies bytes directly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm DBytes are immutable based on comments here:

https://github.com/cucaroach/cockroach/blob/d463da3957254cbcd3a0f9d57f1c327c1a51948b/pkg/sql/sem/tree/datum.go#L1503

Most mutating consumers of DBytes do this:

	infoBytes := []byte(tree.MustBeDBytes(row[1]))

Which will make a copy.

@cucaroach cucaroach requested a review from yuzefovich April 24, 2023 12:57
@cucaroach
Copy link
Contributor Author

Adding @yuzefovich to double check my math on DBytes safety, my concern is that maybe the vectorized engine constructs DBytes directly from coldata.Bytes memory.

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice reduction!

to double check my math on DBytes safety, my concern is that maybe the vectorized engine constructs DBytes directly from coldata.Bytes memory.

I don't think we have any issues there. We currently always make a copy when going from vectorized coldata.Bytes to DBytes datums. For example, over in vec_to_datum_gen.go we make a copy by converting []byte (that we get from the Bytes vector) to DBytes (which is a string).

Reviewed 1 of 1 files at r1, 2 of 2 files at r2, 1 of 1 files at r3, 1 of 1 files at r4, 1 of 1 files at r5, 3 of 3 files at r6, 1 of 1 files at r7, 1 of 1 files at r8, 2 of 2 files at r9, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @cucaroach, @nvanbenschoten, @otan, and @rafiss)


-- commits line 54 at r6:
nit: there is another place where we can do the same - in two spots in execgen/overloads_bin.go after which you'll need to run ./dev gen execgen to regenerate the code. Or perhaps leave a TODO for that.


pkg/sql/sem/builtins/builtins.go line 1171 at r9 (raw file):

Previously, cucaroach (Tommy Reilly) wrote…

I'm DBytes are immutable based on comments here:

https://github.com/cucaroach/cockroach/blob/d463da3957254cbcd3a0f9d57f1c327c1a51948b/pkg/sql/sem/tree/datum.go#L1503

Most mutating consumers of DBytes do this:

	infoBytes := []byte(tree.MustBeDBytes(row[1]))

Which will make a copy.

Yeah, this should be safe.


pkg/sql/sem/tree/datum.go line 381 at r3 (raw file):

		return nil, MakeParseError(s, types.Bytes, err)
	}
	return NewDBytes(DBytes(encoding.UnsafeConvertBytesToString(res))), nil

This function is defined in a later commit, so the build should fail on the third commit. This should be fixed. (Broken builds are bad because they make bisecting more annoying.)


pkg/bench/bench_test.go line 308 at r1 (raw file):

}

// runBenchmarkInsert benchmarks inserting count rows into a table.

nit: s/runBenchmarkInsert/runBenchmarkInsertLarge/, also mention how it differs from runBenchmarkInsert.

cucaroach added 7 commits May 3, 2023 12:53
This benchmark shows the reducation in allocations by the copy avoidance
changes.  Together the changes result in:

```
name                                     old time/op    new time/op    delta
SQL/Cockroach/InsertLarge/count=1000-10    18.7ms ± 4%    19.1ms ±26%     ~     (p=0.780 n=9+10)

name                                     old alloc/op   new alloc/op   delta
SQL/Cockroach/InsertLarge/count=1000-10    49.7MB ± 4%    39.7MB ±10%  -20.19%  (p=0.000 n=9+9)

name                                     old allocs/op  new allocs/op  delta
SQL/Cockroach/InsertLarge/count=1000-10     39.0k ± 5%     29.7k ± 8%  -23.83%  (p=0.000 n=10+9)
```

Release note: None
Epic: None
Facilitate some copy avoidance by using []byte instead of string.
Only copy when necessary in some cases.

Release note: None
Epic: None
Make public for use elsewhere.

Release note: None
Epic: None
DBytes is a string under the covers but we usually operate on them
with []byte APIs, avoid copies in these cases.

Release note: None
Epic: None
Cosmetic change to use the new UnsafeConvertBytesToString.

Release note: None
Epic: None
Release note: None
Epic: None
cucaroach added 5 commits May 3, 2023 12:53
Most parse routines don't retain the input string pointer so we can
use a pointer to the input bytes in those cases.

Release note: None
Epic: None
All we do here is write the bytes to a stream so we don't need a copy.

Release note: None
Epic: None
We copy the DBytes into a []byte and then copy the []byte into a string.
Now we get a reference to the string and return that if appropriate or
iterator over string bytes directly if not.  This is all predicated
on DBytes being immutable.

Release note: None
Epic: None
Epic: None
Release note: None
Copy link
Contributor Author

@cucaroach cucaroach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @msirek, @nvanbenschoten, @otan, @rafiss, and @yuzefovich)


pkg/sql/sem/tree/datum.go line 381 at r3 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

This function is defined in a later commit, so the build should fail on the third commit. This should be fixed. (Broken builds are bad because they make bisecting more annoying.)

Fixed


pkg/bench/bench_test.go line 308 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: s/runBenchmarkInsert/runBenchmarkInsertLarge/, also mention how it differs from runBenchmarkInsert.

👍

@cucaroach
Copy link
Contributor Author

Thanks for comments @yuzefovich, I addressed them and added a new commit to get the json.AsText callers. Will hold for @rafiss sign off.

@otan otan removed their request for review May 4, 2023 18:53
Copy link
Collaborator

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the changes looked safe to me!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @msirek, @nvanbenschoten, @otan, and @yuzefovich)

@cucaroach
Copy link
Contributor Author

TFTRs!
bors r+

@craig
Copy link
Contributor

craig bot commented May 31, 2023

Build succeeded:

@craig craig bot merged commit 774684d into cockroachdb:master May 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants