Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for Redshift fault tolerant execution #16860

Merged
merged 1 commit into from
Apr 19, 2023

Conversation

grantatspothero
Copy link
Contributor

@grantatspothero grantatspothero commented Apr 3, 2023

Description

Adds support for fault tolerant execution to Redshift connector

Additional context and related issues

These tests can be prone to timing out based on the region that the GHA runner is spun up in.

May reduce the FTE test suite size just for Redshift if this is a problem. Will need to run the tests multiple times in order to verify.

Release notes

(x) Release notes are required, with the following suggested text:

# Redshift
* Add support for [fault-tolerant execution](/admin/fault-tolerant-execution). ({issue}`16860 `)

@ebyhr
Copy link
Member

ebyhr commented Apr 11, 2023

/test-with-secrets sha=b7f9275330dca8a702d7434a1389822440d64b71

@github-actions
Copy link

The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/4672620920

@findinpath
Copy link
Contributor

The only thing I find that should be watched for is that the tests take currently take 20 minutes. We should watch every now and then that the duration of the tests doesn't become a problem.

https://github.com/trinodb/trino/actions/runs/4683487128/jobs/8300162603?pr=16968

@@ -707,6 +709,30 @@ jobs:
-Dtest.redshift.aws.region="${AWS_REGION}" \
-Dtest.redshift.aws.access-key="${AWS_ACCESS_KEY_ID}" \
-Dtest.redshift.aws.secret-key="${AWS_SECRET_ACCESS_KEY}"
- name: Fault Tolerant Redshift Tests
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

optional: The non FTE tests for Redshift are called "Cloud Redsfhift tests" - maybe call these "Cloud Fault Tolerant Redshift Tests"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need separate step definition for fault-tolerant tests.
How does Fault Tolerant Redshift Tests differ from Cloud Redshift Tests above.
Can you just modify Cloud Redshift Tests to pass proper -pl :... based on the test matrix?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want to use different redshift clusters for each set of tests.

The redshift tests are prone to timing out as @findinpath mentioned and adding more tests against the same cluster will make that more likely to happen.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

each single CI job uses a single cluster. So having multiple tests run in single profile only increases the wall time of the tests. It doesn't affect flakiness at all.

FTE tests are peculiar though because they use a semaphore to bypass the global concurrency limits. If they cause issues the concurrency can be limited (see the ctor of BaseFailureRecoveryTest iirc). Multiple clusters per PR adds non-negligible cost increase.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adding more tests against the same cluster will make that more likely to happen.

By Can you just modify Cloud Redshift Tests to pass proper -pl :... based on the test matrix? I did not mean to share the CI job to run fault tolerant and not fault tolerant tests.
But to instantiate two versions of "Cloud Redshift Tests"; one for normal tests; one for FTE.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha, yes that is possible.

Will set the step name dynamically based on the profile to disambiguate between the fte tests/cloud tests.

Copy link
Member

@anusudarsan anusudarsan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM % @losipiuk 's comment to move to redshift module

@ebyhr
Copy link
Member

ebyhr commented Apr 17, 2023

Updated the mirror PR #16968.

@ebyhr
Copy link
Member

ebyhr commented Apr 17, 2023

The CI status is fake green due to invalid ci.yml. https://github.com/trinodb/trino/actions/runs/4726044056

[Invalid workflow file: .github/workflows/ci.yml#L610](https://github.com/trinodb/trino/actions/runs/4726044056/workflow)
The workflow is not valid. .github/workflows/ci.yml (Line: 610, Col: 13): Unexpected symbol: 'fte-tests''. Located at position 595 within expression: matrix.modules != 'plugin/trino-singlestore' && ! (contains(matrix.modules, 'trino-delta-lake') && contains(matrix.profile, 'cloud-tests')) && ! (contains(matrix.modules, 'trino-delta-lake') && contains(matrix.profile, 'gcs-tests')) && ! (contains(matrix.modules, 'trino-iceberg') && contains(matrix.profile, 'cloud-tests')) && ! (contains(matrix.modules, 'trino-bigquery') && [...]

@grantatspothero
Copy link
Contributor Author

The CI status is fake green due to invalid ci.yml. https://github.com/trinodb/trino/actions/runs/4726044056

[Invalid workflow file: .github/workflows/ci.yml#L610](https://github.com/trinodb/trino/actions/runs/4726044056/workflow)
The workflow is not valid. .github/workflows/ci.yml (Line: 610, Col: 13): Unexpected symbol: 'fte-tests''. Located at position 595 within expression: matrix.modules != 'plugin/trino-singlestore' && ! (contains(matrix.modules, 'trino-delta-lake') && contains(matrix.profile, 'cloud-tests')) && ! (contains(matrix.modules, 'trino-delta-lake') && contains(matrix.profile, 'gcs-tests')) && ! (contains(matrix.modules, 'trino-iceberg') && contains(matrix.profile, 'cloud-tests')) && ! (contains(matrix.modules, 'trino-bigquery') && [...]

doh, typo in quoting the string. fixed.

@ebyhr
Copy link
Member

ebyhr commented Apr 18, 2023

Updated the mirror PR #16968.

@losipiuk
Copy link
Member

Awesome. Thanks.

@losipiuk losipiuk merged commit 3bd49d7 into trinodb:master Apr 19, 2023
@github-actions github-actions bot added this to the 414 milestone Apr 19, 2023
This was referenced Apr 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

6 participants