Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support integral cast projection pushdown in redshift #22951

Merged

Conversation

krvikash
Copy link
Contributor

@krvikash krvikash commented Aug 6, 2024

Description

This PR adds support for cast projection pushdown for the Oracle connector. This uses the framework introduced in #22203 for cast projection pushdown.
Currently, this PR only adds support for the SMALLINT, INT, BIGINT type cast pushdown.

The below Source type can be cast pushdown to SMALLINT, INT, BIGINT type

  • Boolean
  • SmallInt
  • Integer
  • Bigint
  • Decimal

Other source types not listed above are not supported for cast pushdown.

Release notes

(X) Release notes are required, with the following suggested text:

# Redshift
* Improve performance for queries casting columns to smallint or to integer or to bigint. ({issue}`22951`)

Comment on lines 615 to 636
case Types.TINYINT:
// TODO what will be the impact here after enabling this
// -- Redshift doesn't support tinyint
return Optional.of(tinyintColumnMapping());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC this will be just a dead code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If Redshift doesn't support tinyint - how would the cast would look like ?

if (isDecimalType(sourceType) && isIntegralType(targetType)) {
return "CAST(ROUND(%s) AS %s)".formatted(expression, castType);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some doubts if the trino behaviour is a correct one.
Do you know the internals of the trino behaviour?
Maybe at least add a comment here to match trino behaviour? Not sure if it's helpful (kind of obvious if comment exists, and not that obvious if not)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Trino cast from double to any other type we use DoubleOperators and here is one example where casting from double to long roundup the value.

return DoubleMath.roundToLong(value, HALF_UP);

Anyway, we will not support Float/Double value cast to other types, since for NaN and Infinity values we are getting behavior deference between with and without pushdown.

@krvikash krvikash force-pushed the krvikash/redshift-cast-pushdown branch 3 times, most recently from 55d618d to 1761934 Compare August 13, 2024 08:29
@krvikash
Copy link
Contributor Author

Hi @ebyhr | @Praveen2112 Could you please run this PR with secrets?

@ebyhr
Copy link
Member

ebyhr commented Aug 13, 2024

@krvikash Sure, but did you miss adding TestRedshiftCastPushdown to cloud-tests profile in Redshift pom.xml?

@krvikash krvikash force-pushed the krvikash/redshift-cast-pushdown branch from 1761934 to 4c2701c Compare August 13, 2024 13:11
@krvikash
Copy link
Contributor Author

Thanks @ebyhr for reminding me. Added test in cloud-tests profile

@ebyhr
Copy link
Member

ebyhr commented Aug 13, 2024

/test-with-secrets sha=4c2701c4942bf52adc08044fef48887489eb020b

Copy link

github-actions bot commented Aug 13, 2024

The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/10370817481

Copy link

github-actions bot commented Sep 3, 2024

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

@github-actions github-actions bot added the stale label Sep 3, 2024
@krvikash
Copy link
Contributor Author

krvikash commented Sep 4, 2024

Once #22728 is merged, I will continue working on this PR.

@krvikash krvikash self-assigned this Sep 4, 2024
@github-actions github-actions bot removed the stale label Sep 4, 2024
@krvikash krvikash force-pushed the krvikash/redshift-cast-pushdown branch from 4c2701c to ef7405b Compare September 26, 2024 10:37
@krvikash
Copy link
Contributor Author

(rebased with master)

@krvikash krvikash force-pushed the krvikash/redshift-cast-pushdown branch from ef7405b to 1d0542e Compare September 26, 2024 11:50
@krvikash
Copy link
Contributor Author

(fix error prone check)

@krvikash krvikash force-pushed the krvikash/redshift-cast-pushdown branch from 1d0542e to 92d9563 Compare October 1, 2024 06:46
@krvikash
Copy link
Contributor Author

krvikash commented Oct 1, 2024

Hi @ebyhr Could you please run this PR against secrets?

@ebyhr
Copy link
Member

ebyhr commented Oct 1, 2024

/test-with-secrets sha=92d956338dfa3d357483d8297fe06a4b2e6d94b5

@krvikash krvikash changed the title Support cast projection pushdown in redshift Support integral cast projection pushdown in redshift Oct 7, 2024
@krvikash krvikash force-pushed the krvikash/redshift-cast-pushdown branch from 9aebd47 to f6cb700 Compare October 7, 2024 12:58
@krvikash
Copy link
Contributor Author

krvikash commented Oct 7, 2024

(uncommented the test)

@krvikash
Copy link
Contributor Author

krvikash commented Oct 8, 2024

The PR is ready for another round of review.

@krvikash krvikash force-pushed the krvikash/redshift-cast-pushdown branch from f6cb700 to 186e231 Compare October 10, 2024 20:19
@krvikash
Copy link
Contributor Author

krvikash commented Oct 10, 2024

Thanks @mayankvadariya for the review.

Refactored the test cases.

@krvikash krvikash force-pushed the krvikash/redshift-cast-pushdown branch from 186e231 to a80b450 Compare October 11, 2024 12:40
@krvikash
Copy link
Contributor Author

(rebased with master)

return "CAST(%s AS %s)".formatted(expression, castType);
}

private boolean isIntegralType(Type type)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this check redundant ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's redundant as of now since with this PR we are only supporting integral type cast. This is to ensure that in future if we are supporting other types (like varchar) then we should not fall under CAST(ROUND(%s) AS %s).

case Types.TINYINT:
// -- Redshift doesn't support tinyint, but requires
// tinyint data type mapping for CAST pushdown
return Optional.of(tinyintColumnMapping());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We map that column as smallint and read them as tinyint so things would fail at the time of reading ? What if the expression is hidden within the query like some sort of a filter and this column doesn't need to be read by the underlying system.

public void setupTable()
{
left = closeAfterClass(CastDataTypeTestTable.create(3)
.addColumn("id", "int", asList(11, 12, 13))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a follow-up I think it would be nice if we could capture the cast equivalent data so it would be easy to know on how cast would end up

@krvikash krvikash force-pushed the krvikash/redshift-cast-pushdown branch from a80b450 to d93fc2d Compare October 15, 2024 09:23
@krvikash
Copy link
Contributor Author

Thanks @Praveen2112 for the review. ACs.

new InvalidCastTestCase("c_decimal_16", "bigint", "Cannot cast '9223372036854775808.49' to BIGINT", "(?s).*Value out of range for 8 bytes(?s).*"),

// No pushdown for double datatype to integral types
new InvalidCastTestCase("c_infinity_1", "tinyint", "Out of range for tinyint: Infinity"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does Trino handles it out without this pushdown enabled ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It uses DoubleOperators#castToTinyint for casting

public static long castToTinyint(@SqlType(StandardTypes.DOUBLE) double value)
{
if (Double.isNaN(value)) {
throw new TrinoException(INVALID_CAST_ARGUMENT, "Cannot cast double NaN to tinyint");
}
try {
return SignedBytes.checkedCast((long) MathFunctions.round(value));
}
catch (IllegalArgumentException e) {
throw new TrinoException(NUMERIC_VALUE_OUT_OF_RANGE, "Out of range for tinyint: " + value, e);
}
}

protected List<InvalidCastTestCase> invalidCast()
{
return ImmutableList.of(
new InvalidCastTestCase("c_varchar_decimal", "integer"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we capture the error message on Trino and redshift side ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These data types in invalidCast does not get pushdown, So can't capture the error message here. Trino Error message (.*)Cannot cast (.*) to (.*) is already mentioned inside InvalidCastTestCase. Should be explicitly mention error message here?

@Praveen2112
Copy link
Member

@krvikash Lets update the PR description as well

@krvikash krvikash force-pushed the krvikash/redshift-cast-pushdown branch from d93fc2d to f37d54c Compare October 15, 2024 10:17
Copy link
Member

@Praveen2112 Praveen2112 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

% comment

@krvikash krvikash force-pushed the krvikash/redshift-cast-pushdown branch from f37d54c to b5c69fa Compare October 15, 2024 13:12
@krvikash
Copy link
Contributor Author

@Praveen2112 can you please run this PR against secrets?

@Praveen2112
Copy link
Member

/test-with-secrets sha=b5c69fafda2e99316c2a501e786d08978b6bad9b

Copy link

The CI workflow run with tests that require additional secrets has been started: https://github.com/trinodb/trino/actions/runs/11348864763

@krvikash
Copy link
Contributor Author

The CI workflow run with tests that require additional secrets has been started: https://github.com/trinodb/trino/actions/runs/11348864763

Passed

@Praveen2112 Praveen2112 merged commit e4607a3 into trinodb:master Oct 16, 2024
61 checks passed
@github-actions github-actions bot added this to the 462 milestone Oct 16, 2024
@krvikash krvikash deleted the krvikash/redshift-cast-pushdown branch October 16, 2024 05:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

7 participants