Fix flaky E2E test checking in-cluster shardawareness #1038

Closed
zimnx wants to merge 3 commits from mz/1028-flake-shardawareness

@zimnx (Collaborator) commented Sep 20, 2022

Both on CI and in local environments, traffic between the test binary and
the Scylla Pods is NATed. The previous implementation gave false positives
because we are low on resources (we use just 2 shards), so the probability of
hitting the expected shard was high even with random assignment.
It wasn't spotted because the driver only prints a message about the wrong
shard assignment to stdout, and no error is reported anywhere.

gocql doesn't expose any way of checking whether the shard-aware ports are
used successfully, hence I implemented a simple driver within the test that
sends the initial packet and reads which shard the connection was established to.

It also checks in-cluster connectivity, which is how clients actually
connect, instead of connecting from outside the cluster as in the previous approach.

Fixes #1028

Requires:

  • Client-node communication via PodIPs

@zimnx zimnx added the kind/flake Categorizes issue or PR as related to a flaky test. label Sep 20, 2022
@zimnx zimnx added this to the v1.8 milestone Sep 20, 2022
@zimnx zimnx requested a review from tnozicka September 20, 2022 15:25
@zimnx zimnx force-pushed the mz/1028-flake-shardawareness branch 4 times, most recently from 3045a05 to 11b44c9 Compare September 20, 2022 15:37
@zimnx zimnx marked this pull request as draft September 20, 2022 17:38
@zimnx zimnx force-pushed the mz/1028-flake-shardawareness branch 8 times, most recently from 5b741a1 to 93c0b50 Compare September 26, 2022 11:04
@zimnx force-pushed the mz/1028-flake-shardawareness branch from 93c0b50 to 6b87711 September 26, 2022 11:10
@zimnx force-pushed the mz/1028-flake-shardawareness branch from 6b87711 to f2e59ef September 29, 2022 09:50
@tnozicka removed this from the v1.8 milestone Aug 17, 2023
@scylla-operator-bot (Contributor)

@zimnx: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name             Commit   Required  Rerun command
ci/prow/verify        f2e59ef  true      /test verify
ci/prow/docs          f2e59ef  true      /test docs
ci/prow/verify-deps   f2e59ef  true      /test verify-deps
ci/prow/images        f2e59ef  true      /test images
ci/prow/build         f2e59ef  true      /test build
ci/prow/e2e-gke       f2e59ef  true      /test e2e-gke

@zimnx closed this Aug 18, 2023
Labels
kind/flake Categorizes issue or PR as related to a flaky test.
Development

Successfully merging this pull request may close these issues.

Flake: ScyllaCluster should allow to build connection pool using shard aware ports