Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

workload: fix --splits regression introduced in #35349 #37401

Merged
merged 1 commit into from
May 8, 2019

Conversation

danhhz
Copy link
Contributor

@danhhz danhhz commented May 8, 2019

workload's Table schemas are SQL schemas, but #35349 switched the
initial data to be returned as a coldata.Batch, which has a more limited
set of types. (Or, in the case of simple workloads that return a
[]interface{}, it's roundtripped through coldata.Batch by the Tuples
helper.) Notably, this means a SQL STRING column is represented the same
as a BYTES column (ditto UUID, etc).

This caused a regression in splits, which received some []byte data for
a column tried to hand it to SPLIT as a SQL BYTES datum. This didn't
work for the UUID column in tpcc's history table nor the VARCHAR in
ycsb's usertable. Happily, a STRING works for both of these. It also
seems to work for BYTES columns, so it seems like the ambiguity is fine
in this case. When/if someone wants to add a workload that splits a
BYTES primary key column containing non-utf8 data, we'll may need to
revisit.

A more principled fix would be to get the fidelity back by parsing the
SQL schema, which in fact we do in importccl.makeDatumFromColOffset.
However, at the moment, this hack works and avoids the complexity and
the undesirable pkg/sql/parser dep.

Closes #37383
Closes #37382
Closes #37381
Closes #37380
Closes #37379
Closes #37378
Closes #37377
Closes #37393

Release note: None

@danhhz danhhz requested a review from tbg May 8, 2019 16:27
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but I'd be straight-up lying if I claimed that I really knew what was going on here. I assume you'll be removing the hack not too far down the road, looking forward to more details on this then.

@danhhz
Copy link
Contributor Author

danhhz commented May 8, 2019

No plans to remove the hack. It would introduce quite a bit of complexity that we don't need until someone tries to use a BYTES column in a primary key of some workload.

@tbg
Copy link
Member

tbg commented May 8, 2019

What will happen when someone does use a bytes column? Will it fail in a hard-to-diagnose way? I'm fine punting this work, but it'd be good to know that the failure mode will be somewhat actionable (will they be told they're in unchartered territory, or at least be able to find your message, basically)

If you have the time, add a bit of color to the commit message before you merge because there's very little context here -- I know something fails around splits, I assume it's because a string portion of a primary key got printed as fmt.Sprintf(X'%x', x), and I assume that that threw off the workload split code which then panicked?

workload's Table schemas are SQL schemas, but cockroachdb#35349 switched the
initial data to be returned as a coldata.Batch, which has a more limited
set of types. (Or, in the case of simple workloads that return a
[]interface{}, it's roundtripped through coldata.Batch by the `Tuples`
helper.) Notably, this means a SQL STRING column is represented the same
as a BYTES column (ditto UUID, etc).

This caused a regression in splits, which received some []byte data for
a column tried to hand it to SPLIT as a SQL BYTES datum. This didn't
work for the UUID column in tpcc's history table nor the VARCHAR in
ycsb's usertable. Happily, a STRING works for both of these. It also
seems to work for BYTES columns, so it seems like the ambiguity is fine
in this case. When/if someone wants to add a workload that splits a
BYTES primary key column containing non-utf8 data, we'll may need to
revisit.

A more principled fix would be to get the fidelity back by parsing the
SQL schema, which in fact we do in `importccl.makeDatumFromColOffset`.
However, at the moment, this hack works and avoids the complexity and
the undesirable pkg/sql/parser dep.

Closes cockroachdb#37383
Closes cockroachdb#37382
Closes cockroachdb#37381
Closes cockroachdb#37380
Closes cockroachdb#37379
Closes cockroachdb#37378
Closes cockroachdb#37377
Closes cockroachdb#37393

Release note: None
@danhhz
Copy link
Contributor Author

danhhz commented May 8, 2019

Thanks for pushing me on this, I just took another swing at it that I think is better. RFAL

Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, appreciate the second pass!

@danhhz
Copy link
Contributor Author

danhhz commented May 8, 2019

TFTR!

bors r=tbg

craig bot pushed a commit that referenced this pull request May 8, 2019
37401: workload: fix --splits regression introduced in #35349 r=tbg a=danhhz

workload's Table schemas are SQL schemas, but #35349 switched the
initial data to be returned as a coldata.Batch, which has a more limited
set of types. (Or, in the case of simple workloads that return a
[]interface{}, it's roundtripped through coldata.Batch by the `Tuples`
helper.) Notably, this means a SQL STRING column is represented the same
as a BYTES column (ditto UUID, etc).

This caused a regression in splits, which received some []byte data for
a column tried to hand it to SPLIT as a SQL BYTES datum. This didn't
work for the UUID column in tpcc's history table nor the VARCHAR in
ycsb's usertable. Happily, a STRING works for both of these. It also
seems to work for BYTES columns, so it seems like the ambiguity is fine
in this case. When/if someone wants to add a workload that splits a
BYTES primary key column containing non-utf8 data, we'll may need to
revisit.

A more principled fix would be to get the fidelity back by parsing the
SQL schema, which in fact we do in `importccl.makeDatumFromColOffset`.
However, at the moment, this hack works and avoids the complexity and
the undesirable pkg/sql/parser dep.

Closes #37383
Closes #37382
Closes #37381
Closes #37380
Closes #37379
Closes #37378
Closes #37377
Closes #37393

Release note: None

Co-authored-by: Daniel Harrison <[email protected]>
@craig
Copy link
Contributor

craig bot commented May 8, 2019

Build succeeded

@craig craig bot merged commit b0607c0 into cockroachdb:master May 8, 2019
@danhhz danhhz deleted the workload branch May 8, 2019 19:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants