Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make cassandra.partition-size-for-batch-select work expectedly #21940

Merged

Conversation

abicky
Copy link
Contributor

@abicky abicky commented May 12, 2024

Description

This PR fixes cassandra.partition-size-for-batch-select behavior that uses one more partition than expected when executing CQLs.

Additional context and related issues

Steps to reproduce

Firstly, prepare a Cassandra table and records:

$ cqlsh
Connected to Test Cluster at 127.0.0.1:9042
[cqlsh 6.1.0 | Cassandra 4.1.3 | CQL spec 3.4.6 | Native protocol v5]
Use HELP for help.
cqlsh> CREATE TABLE test_cassandra_split_manager_keyspace.single_partition_key_column_table (partition_key int, clustering_key text, PRIMARY KEY(partition_key, clustering_key));
cqlsh> INSERT INTO test_cassandra_split_manager_keyspace.single_partition_key_column_table (partition_key, clustering_key) VALUES (0, '0');
cqlsh> INSERT INTO test_cassandra_split_manager_keyspace.single_partition_key_column_table (partition_key, clustering_key) VALUES (1, '1');
cqlsh> INSERT INTO test_cassandra_split_manager_keyspace.single_partition_key_column_table (partition_key, clustering_key) VALUES (2, '2');

Then, launch trino-esrver-dev with the following changes:

diff --git a/testing/trino-server-dev/etc/catalog/cassandra.properties b/testing/trino-server-dev/etc/catalog/cassandra.properties
index d62b8a86cd4..1fb8208b60f 100644
--- a/testing/trino-server-dev/etc/catalog/cassandra.properties
+++ b/testing/trino-server-dev/etc/catalog/cassandra.properties
@@ -3,3 +3,4 @@ connector.name=cassandra
 cassandra.contact-points=localhost
 cassandra.allow-drop-table=true
 cassandra.load-policy.dc-aware.local-dc=datacenter1
+cassandra.partition-size-for-batch-select=2
diff --git a/testing/trino-server-dev/etc/log.properties b/testing/trino-server-dev/etc/log.properties
index b615d661c74..8bec119a802 100644
--- a/testing/trino-server-dev/etc/log.properties
+++ b/testing/trino-server-dev/etc/log.properties
@@ -6,6 +6,7 @@
 #
 
 io.trino=INFO
+io.trino.plugin.cassandra.CassandraSession=DEBUG
 
 # show classpath for plugins
 io.trino.server.PluginManager=DEBUG

After the server launches, you can reproduce the bug by executing the following Trino query:

select * from cassandra.test_cassandra_split_manager_keyspace.single_partition_key_column_table where partition_key in (0, 1, 2);

Here is an output from the server, and, as you can see in the log, the CQL Trino executed includes three partitions even though we set cassandra.partition-size-for-batch-select to 2.

2024-05-12T21:33:20.427+0900	DEBUG	Query-20240512_123318_00000_vupj4-174	io.trino.plugin.cassandra.CassandraSession	Execute cql for partition keys with IN clauses: SELECT DISTINCT partition_key FROM test_cassandra_split_manager_keyspace.single_partition_key_column_table WHERE partition_key IN (0,1,2)
2024-05-12T21:33:21.003+0900	DEBUG	SplitRunner-20240512_123318_00000_vupj4.0.0.0-1-201	io.trino.plugin.cassandra.CassandraSession	Execute cql: SELECT partition_key,clustering_key FROM test_cassandra_split_manager_keyspace.single_partition_key_column_table WHERE "partition_key" in (0,1,2)

The expected output should be as follows (I got the log when I checked the behavior of this PR):

2024-05-12T21:45:20.869+0900	DEBUG	Query-20240512_124519_00000_gkm5v-172	io.trino.plugin.cassandra.CassandraSession	Execute cql for partition keys with IN clauses: SELECT DISTINCT partition_key FROM test_cassandra_split_manager_keyspace.single_partition_key_column_table WHERE partition_key IN (0,1,2)
2024-05-12T21:45:21.441+0900	DEBUG	SplitRunner-20240512_124519_00000_gkm5v.0.0.0-1-201	io.trino.plugin.cassandra.CassandraSession	Execute cql: SELECT partition_key,clustering_key FROM test_cassandra_split_manager_keyspace.single_partition_key_column_table WHERE "partition_key" in (0,1)
2024-05-12T21:45:21.441+0900	DEBUG	SplitRunner-20240512_124519_00000_gkm5v.0.0.0-2-202	io.trino.plugin.cassandra.CassandraSession	Execute cql: SELECT partition_key,clustering_key FROM test_cassandra_split_manager_keyspace.single_partition_key_column_table WHERE "partition_key" in (2)

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

# Cassandra connector
* Fix `cassandra.partition-size-for-batch-select` behavior that uses one more partition than expected when executing CQLs (#21940)

Copy link

cla-bot bot commented May 12, 2024

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

@abicky
Copy link
Contributor Author

abicky commented May 12, 2024

I submitted the signed CLA at 10:23 UTC on May 12.

@ebyhr ebyhr requested review from ebyhr and mayankvadariya May 12, 2024 23:37
Copy link

cla-bot bot commented May 13, 2024

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

@abicky
Copy link
Contributor Author

abicky commented May 17, 2024

@ebyhr @mayankvadariya Thank you for reviewing this pull request! Is there anything else I have to do? I'm not sure what I can do to make verification/cla-signed pass because I've already submitted the CLA.

@ebyhr
Copy link
Member

ebyhr commented May 17, 2024

Registering CLA is a manual process and usually happens per 2 weeks.

@martint Can you handle @abicky's CLA registration?

Copy link

cla-bot bot commented May 29, 2024

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

@abicky abicky force-pushed the fix-cassandra-partition-size-for-batch-select branch from a1510b6 to 8bbe238 Compare May 29, 2024 13:44
Copy link

cla-bot bot commented May 29, 2024

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

@abicky
Copy link
Contributor Author

abicky commented May 29, 2024

I've rebased the branch onto the master branch and resolved the conflict with the PR #22017 (CassandraType#getTrinoType is no longer available).

@ebyhr
Copy link
Member

ebyhr commented Jun 3, 2024

@cla-bot check

@cla-bot cla-bot bot added the cla-signed label Jun 3, 2024
Copy link

cla-bot bot commented Jun 3, 2024

The cla-bot has been summoned, and re-checked this pull request!

This commit fixes the bug where a single select for a single
partition key column table can include one more partition than
expected.
@ebyhr ebyhr force-pushed the fix-cassandra-partition-size-for-batch-select branch from 8bbe238 to 2230310 Compare June 3, 2024 12:17
@ebyhr ebyhr merged commit 9978120 into trinodb:master Jun 3, 2024
18 checks passed
@github-actions github-actions bot added this to the 450 milestone Jun 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

3 participants