Fix CAST(tinyint/smallint/integer/bigint as varbinary) for Spark #9819

Closed
wants to merge 1 commit into from

Collaborator

@rui-mo rui-mo commented May 15, 2024

Fixes #9820.
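
For readers without the linked issue at hand: Spark's CAST from an integral type to BINARY is generally understood to produce the value's bytes in big-endian order, one byte per byte of the input type. The sketch below illustrates that encoding only. `toBigEndianBytes` is a hypothetical helper written for this note, not a Velox or Spark function, and it does not reproduce the code in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <type_traits>

// Hypothetical helper (not part of Velox): encodes an integral value as
// big-endian bytes, most significant byte first.
template <typename T>
std::string toBigEndianBytes(T value) {
  static_assert(std::is_integral_v<T>, "integral input expected");
  using U = std::make_unsigned_t<T>;
  // Work on the unsigned representation so that shifts are well defined
  // for negative inputs.
  const U bits = static_cast<U>(value);
  std::string out(sizeof(T), '\0');
  for (std::size_t i = 0; i < sizeof(T); ++i) {
    const std::size_t shift = 8 * (sizeof(T) - 1 - i);
    out[i] = static_cast<char>((bits >> shift) & 0xFF);
  }
  return out;
}
```

Under this encoding, `toBigEndianBytes<int32_t>(18)` yields the four bytes 00 00 00 12, and the output length always equals the width of the input type.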

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label May 15, 2024

netlify bot commented May 15, 2024

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 77eda91
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/6650838991df410008b730b8

@rui-mo rui-mo changed the title Fix cast(int as varbinary) Fix CAST(int as varbinary) May 15, 2024
@rui-mo
Collaborator Author

rui-mo commented May 15, 2024

@PHILO-HE Would you like to review this? Thanks.

velox/expression/CastExpr.cpp Outdated
velox/expression/PrestoCastHooks.cpp Outdated
@rui-mo rui-mo force-pushed the wip_to_binary branch 2 times, most recently from acab0aa to 59045ff Compare May 15, 2024 11:28
velox/expression/CastExpr.cpp Outdated
velox/functions/sparksql/specialforms/SparkCastHooks.h Outdated
@jinchengchenghh
Contributor

Do you need to update the documentation?

@rui-mo rui-mo force-pushed the wip_to_binary branch 2 times, most recently from 1d4bbb0 to 2dae8eb Compare May 23, 2024 07:04
@rui-mo
Collaborator Author

rui-mo commented May 23, 2024

@jinchengchenghh Updated the documentation. Thanks.

@rui-mo rui-mo force-pushed the wip_to_binary branch 2 times, most recently from a6a213a to 6d1ecd0 Compare May 23, 2024 12:34
@rui-mo
Collaborator Author

rui-mo commented May 23, 2024

@mbasmanova Could you help review this change? Thanks!

Contributor

@mbasmanova mbasmanova left a comment

@rui-mo Looks great. Do we have Fuzzer coverage for this type of CAST?

@@ -184,3 +184,22 @@ Valid example
SELECT cast(' -3E+2' as decimal(12, 2)); -- -300.00
SELECT cast('-3E+2 ' as decimal(12, 2)); -- -300.00
SELECT cast(' -3E+2 ' as decimal(12, 2)); -- -300.00

Cast to Varbinary
---------------
Contributor

nit: please extend the underline to cover the full length of the title.

const BaseVector& input) {
VELOX_USER_CHECK(
hooks_->canCastIntToBinary(),
"Cannot cast {} to VARBINARY.",
Contributor

It seems too late for this type of check. Shouldn't we fail much earlier when compiling the expression and creating an instance of CastExpr?

CC: @kagamiori

Collaborator Author

@rui-mo rui-mo May 24, 2024

Moved this check to constructSpecialForm, which is called by compileRewrittenExpression.

result = specialForm->constructSpecialForm(
resultType, std::move(compiledInputs), trackCpuUsage, config);
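
The exchange above is about when an unsupported cast should be rejected: once, while the expression is compiled, rather than every time it is evaluated. A minimal sketch of that idea follows. Every name in it (`Kind`, `Dialect`, `CastNode`, `compileCast`) is a hypothetical stand-in and not a Velox class or function, and the rule that only the Spark dialect accepts integral-to-VARBINARY casts is an assumption drawn from this thread.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-ins; none of these are Velox types.
enum class Kind { kTinyint, kSmallint, kInteger, kBigint, kVarchar, kVarbinary };
enum class Dialect { kPresto, kSpark };

inline bool isIntegral(Kind kind) {
  return kind == Kind::kTinyint || kind == Kind::kSmallint ||
      kind == Kind::kInteger || kind == Kind::kBigint;
}

// A compiled cast. By the time one exists, the type pair has been validated.
struct CastNode {
  Kind from;
  Kind to;
};

// Runs once per expression at compile time, so an unsupported cast fails
// before any rows are processed.
inline CastNode compileCast(Dialect dialect, Kind from, Kind to) {
  // Assumption for this sketch: only Spark allows integral -> VARBINARY.
  if (dialect != Dialect::kSpark && isIntegral(from) &&
      to == Kind::kVarbinary) {
    throw std::invalid_argument("Cannot cast integral type to VARBINARY.");
  }
  return CastNode{from, to};
}
```

With the check in the construction path, the per-batch evaluation code can assume the type pair is valid and does not need to repeat it.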

Comment on lines 871 to 873
VectorPtr result;
context.ensureWritable(rows, VARBINARY(), result);
(*result).clearNulls(rows);
Contributor

This can be simplified to

auto result = BaseVector::create<FlatVector<StringView>>(VARBINARY(), rows.end(), context.pool());

Collaborator Author

Thanks. Updated.

@mbasmanova mbasmanova added the ready-to-merge PR that have been reviewed and are ready for merging. PRs with this tag notify the Velox Meta oncall label May 24, 2024
@mbasmanova mbasmanova changed the title Fix CAST(int as varbinary) Fix CAST(tinyint/smallint/integer/bigint as varbinary) for Spark May 24, 2024
@facebook-github-bot
Contributor

@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@rui-mo rui-mo force-pushed the wip_to_binary branch 2 times, most recently from 956cfbd to 9d05813 Compare May 24, 2024 06:50
Collaborator Author

@rui-mo rui-mo left a comment

Do we have Fuzzer coverage for this type of CAST?

@mbasmanova We need to add the signatures below to getSignaturesForCast for this type of CAST.

for (auto fromType : {"tinyint", "smallint", "integer", "bigint"}) {
signatures.push_back(makeCastSignature(fromType, "varbinary"));
}

std::vector<facebook::velox::exec::FunctionSignaturePtr>
getSignaturesForCast() {

However, this modification would cause the Presto fuzzer test to fail because of this check: #9819 (comment). I wonder if I can follow up on the TODO below in a separate PR to provide custom cast signatures for Presto and Spark. Thanks.

"cast",
/// TODO: Add supported Cast signatures to CastTypedExpr and expose
/// them to fuzzer instead of hard-coding signatures here.
getSignaturesForCast(),

@mbasmanova
Contributor

@rui-mo

I wonder if I can follow up on the TODO below in a separate PR to provide custom cast signatures for Presto and Spark.

That would be great. Thanks.
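
One possible shape for the follow-up agreed on here is a signature list that depends on the dialect, so the Spark fuzzer can exercise integral-to-varbinary casts while the Presto fuzzer never generates them. The sketch below is illustrative only: `CastSignature`, `CastDialect`, `commonCastSignatures`, `castSignaturesFor`, and `supportsCast` are hypothetical names, the common signatures are placeholders, and nothing here reflects the actual fuzzer API.

```cpp
#include <algorithm>
#include <cassert>
#include <initializer_list>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a cast signature: (from type, to type).
using CastSignature = std::pair<std::string, std::string>;
enum class CastDialect { kPresto, kSpark };

// Signatures assumed to be valid in both dialects (placeholder entries).
inline std::vector<CastSignature> commonCastSignatures() {
  return {{"integer", "bigint"}, {"bigint", "varchar"}};
}

// Returns the common signatures plus any dialect-specific additions.
inline std::vector<CastSignature> castSignaturesFor(CastDialect dialect) {
  auto signatures = commonCastSignatures();
  if (dialect == CastDialect::kSpark) {
    // Same loop as proposed in the comment above, applied to Spark only.
    for (const char* fromType : {"tinyint", "smallint", "integer", "bigint"}) {
      signatures.emplace_back(fromType, "varbinary");
    }
  }
  return signatures;
}

// True if the given list contains a cast from `from` to `to`.
inline bool supportsCast(
    const std::vector<CastSignature>& signatures,
    const std::string& from,
    const std::string& to) {
  return std::find(signatures.begin(), signatures.end(),
                   CastSignature{from, to}) != signatures.end();
}
```

Keeping the dialect-specific entries in one place means a cast that only one engine supports cannot leak into the other engine's fuzzer run.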

Comment on lines 866 to 868
VectorPtr result = BaseVector::create<FlatVector<StringView>>(
VARBINARY(), rows.end(), context.pool());
auto flatResult = result->asFlatVector<StringView>();
Contributor

BaseVector::create<FlatVector<StringView>> returns FlatVectorPtr<StringView>. If you want VectorPtr, you can just call BaseVector::create

Collaborator Author

Thanks for the pointer. Updated.

@@ -980,6 +1029,15 @@ ExprPtr CastCallToSpecialForm::constructSpecialForm(
std::vector<ExprPtr>&& compiledChildren,
bool trackCpuUsage,
const core::QueryConfig& config) {
const auto inputKind = compiledChildren[0]->type()->kind();
Contributor

Shouldn't this happen after checking that compiledChildren.size() == 1 on L1041?

Might be safer to do that in CastExpr's constructor.

Collaborator Author

Thanks for the catch.

Might be safer to do that in CastExpr's constructor.

SparkCastExpr extends CastExpr, so the check would impact Spark as well if it's in the constructor of CastExpr.

Contributor

Got it. I thought you had an API in the CastHooks that tells whether this cast is supported or not. I see that you removed that API now.
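
The ordering concern raised in this thread (reading `compiledChildren[0]` before confirming there is exactly one child) can be sketched in isolation as follows. `Child` and `castInputType` are hypothetical names introduced for illustration and are not part of Velox.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a compiled child expression.
struct Child {
  std::string typeName;
};

// Validates the argument count first, then reads the single input's type.
// Indexing before the size check would be undefined behavior on an empty
// vector.
inline std::string castInputType(const std::vector<Child>& children) {
  if (children.size() != 1) {
    throw std::invalid_argument(
        "CAST expects exactly one argument, received " +
        std::to_string(children.size()));
  }
  return children[0].typeName;
}
```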

@facebook-github-bot
Contributor

@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@mbasmanova
Contributor

@rui-mo I'm seeing "Conbench performance report — Found 2 regressions". Would you take a look?

CC: @assignUser @kgpai

@facebook-github-bot
Contributor

@mbasmanova merged this pull request in 9446f67.


Conbench analyzed the 1 benchmark run on commit 9446f67b.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

@rui-mo
Collaborator Author

rui-mo commented May 27, 2024

@mbasmanova Thanks for your review. It appears that no regression is mentioned in the most recent report. Please kindly contact me if I'm missing something.

@mbasmanova
Contributor

@rui-mo Perf testing can be flaky at times. This must be one of these cases. CC: @assignUser

@assignUser
Collaborator

@mbasmanova @rui-mo Yeah, it looks like the times were very close together when this was committed, so false positives can happen. But while checking this out I noticed that a PR a day later clearly made the results worse: https://velox-conbench.voltrondata.run/compare/benchmark-results/0665116962307f828000737782ce0ccc...06651355ec2377e8800073bd529feb0a/

Joe-Abraham pushed a commit to Joe-Abraham/velox that referenced this pull request Jun 7, 2024
…ebookincubator#9819)

Summary:
Fixes facebookincubator#9820.

Pull Request resolved: facebookincubator#9819

Reviewed By: amitkdutta

Differential Revision: D57756701

Pulled By: mbasmanova

fbshipit-source-id: 954440314175b4eb9a9dcba553909d448babd935
Successfully merging this pull request may close these issues.

Casting integer to binary type produces inconsistent result with Spark
6 participants