workload: fix --splits regression introduced in #35349 #37401

danhhz · 2019-05-08T16:27:50Z

workload's Table schemas are SQL schemas, but #35349 switched the
initial data to be returned as a coldata.Batch, which has a more limited
set of types. (Or, in the case of simple workloads that return a
[]interface{}, it's roundtripped through coldata.Batch by the Tuples
helper.) Notably, this means a SQL STRING column is represented the same
as a BYTES column (ditto UUID, etc).

This caused a regression in splits, which received some []byte data for
a column tried to hand it to SPLIT as a SQL BYTES datum. This didn't
work for the UUID column in tpcc's history table nor the VARCHAR in
ycsb's usertable. Happily, a STRING works for both of these. It also
seems to work for BYTES columns, so it seems like the ambiguity is fine
in this case. When/if someone wants to add a workload that splits a
BYTES primary key column containing non-utf8 data, we'll may need to
revisit.

A more principled fix would be to get the fidelity back by parsing the
SQL schema, which in fact we do in importccl.makeDatumFromColOffset.
However, at the moment, this hack works and avoids the complexity and
the undesirable pkg/sql/parser dep.

Closes #37383
Closes #37382
Closes #37381
Closes #37380
Closes #37379
Closes #37378
Closes #37377
Closes #37393

Release note: None

cockroach-teamcity · 2019-05-08T16:27:58Z

This change is

tbg

LGTM but I'd be straight-up lying if I claimed that I really knew what was going on here. I assume you'll be removing the hack not too far down the road, looking forward to more details on this then.

danhhz · 2019-05-08T16:36:54Z

No plans to remove the hack. It would introduce quite a bit of complexity that we don't need until someone tries to use a BYTES column in a primary key of some workload.

tbg · 2019-05-08T16:49:06Z

What will happen when someone does use a bytes column? Will it fail in a hard-to-diagnose way? I'm fine punting this work, but it'd be good to know that the failure mode will be somewhat actionable (will they be told they're in unchartered territory, or at least be able to find your message, basically)

If you have the time, add a bit of color to the commit message before you merge because there's very little context here -- I know something fails around splits, I assume it's because a string portion of a primary key got printed as fmt.Sprintf(X'%x', x), and I assume that that threw off the workload split code which then panicked?

workload's Table schemas are SQL schemas, but cockroachdb#35349 switched the initial data to be returned as a coldata.Batch, which has a more limited set of types. (Or, in the case of simple workloads that return a []interface{}, it's roundtripped through coldata.Batch by the `Tuples` helper.) Notably, this means a SQL STRING column is represented the same as a BYTES column (ditto UUID, etc). This caused a regression in splits, which received some []byte data for a column tried to hand it to SPLIT as a SQL BYTES datum. This didn't work for the UUID column in tpcc's history table nor the VARCHAR in ycsb's usertable. Happily, a STRING works for both of these. It also seems to work for BYTES columns, so it seems like the ambiguity is fine in this case. When/if someone wants to add a workload that splits a BYTES primary key column containing non-utf8 data, we'll may need to revisit. A more principled fix would be to get the fidelity back by parsing the SQL schema, which in fact we do in `importccl.makeDatumFromColOffset`. However, at the moment, this hack works and avoids the complexity and the undesirable pkg/sql/parser dep. Closes cockroachdb#37383 Closes cockroachdb#37382 Closes cockroachdb#37381 Closes cockroachdb#37380 Closes cockroachdb#37379 Closes cockroachdb#37378 Closes cockroachdb#37377 Closes cockroachdb#37393 Release note: None

danhhz · 2019-05-08T18:05:31Z

Thanks for pushing me on this, I just took another swing at it that I think is better. RFAL

tbg

Thanks, appreciate the second pass!

danhhz · 2019-05-08T18:50:20Z

TFTR!

bors r=tbg

37401: workload: fix --splits regression introduced in #35349 r=tbg a=danhhz workload's Table schemas are SQL schemas, but #35349 switched the initial data to be returned as a coldata.Batch, which has a more limited set of types. (Or, in the case of simple workloads that return a []interface{}, it's roundtripped through coldata.Batch by the `Tuples` helper.) Notably, this means a SQL STRING column is represented the same as a BYTES column (ditto UUID, etc). This caused a regression in splits, which received some []byte data for a column tried to hand it to SPLIT as a SQL BYTES datum. This didn't work for the UUID column in tpcc's history table nor the VARCHAR in ycsb's usertable. Happily, a STRING works for both of these. It also seems to work for BYTES columns, so it seems like the ambiguity is fine in this case. When/if someone wants to add a workload that splits a BYTES primary key column containing non-utf8 data, we'll may need to revisit. A more principled fix would be to get the fidelity back by parsing the SQL schema, which in fact we do in `importccl.makeDatumFromColOffset`. However, at the moment, this hack works and avoids the complexity and the undesirable pkg/sql/parser dep. Closes #37383 Closes #37382 Closes #37381 Closes #37380 Closes #37379 Closes #37378 Closes #37377 Closes #37393 Release note: None Co-authored-by: Daniel Harrison <[email protected]>

craig · 2019-05-08T19:09:24Z

Build succeeded

GitHub CI (Cockroach)

danhhz requested a review from tbg May 8, 2019 16:27

tbg approved these changes May 8, 2019

View reviewed changes

This was referenced May 8, 2019

roachtest: tpccbench/nodes=6/cpu=16/multi-az failed [#37866] #37250

Closed

roachtest: tpccbench/nodes=9/cpu=4/chaos/partition failed #36720

Closed

roachtest: tpccbench/nodes=3/cpu=4 failed [skipped] #36697

Closed

danhhz force-pushed the workload branch from afa640f to b0607c0 Compare May 8, 2019 18:02

tbg approved these changes May 8, 2019

View reviewed changes

craig bot merged commit b0607c0 into cockroachdb:master May 8, 2019

danhhz deleted the workload branch May 8, 2019 19:21

yuzefovich mentioned this pull request May 9, 2019

panic: unsupported type when running tpc-c workload #37414

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

workload: fix --splits regression introduced in #35349 #37401

workload: fix --splits regression introduced in #35349 #37401

danhhz commented May 8, 2019 •

edited

Loading

cockroach-teamcity commented May 8, 2019

tbg left a comment

danhhz commented May 8, 2019

tbg commented May 8, 2019

danhhz commented May 8, 2019

tbg left a comment

danhhz commented May 8, 2019

craig bot commented May 8, 2019

workload: fix --splits regression introduced in #35349 #37401

workload: fix --splits regression introduced in #35349 #37401

Conversation

danhhz commented May 8, 2019 • edited Loading

cockroach-teamcity commented May 8, 2019

tbg left a comment

Choose a reason for hiding this comment

danhhz commented May 8, 2019

tbg commented May 8, 2019

danhhz commented May 8, 2019

tbg left a comment

Choose a reason for hiding this comment

danhhz commented May 8, 2019

craig bot commented May 8, 2019

Build succeeded

danhhz commented May 8, 2019 •

edited

Loading