Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Batch Lookup With VIndex Functions in Planner #9042

Closed
mattlord opened this issue Oct 20, 2021 · 0 comments · Fixed by #9426
Closed

Support Batch Lookup With VIndex Functions in Planner #9042

mattlord opened this issue Oct 20, 2021 · 0 comments · Fixed by #9426
Assignees

Comments

@mattlord
Copy link
Contributor

mattlord commented Oct 20, 2021

Feature Description

Vitess provides a nice method to know where a row would land in the keyspace by allowing you to query the vindex function and pass it an id value to see which shard the row would land on:

mysql> select * from customer.hash where id = 5;
+------+-------------+-------------+-----------+------------------+-------+
| id   | keyspace_id | range_start | range_end | hex_keyspace_id  | shard |
+------+-------------+-------------+-----------+------------------+-------+
| 5    | p�<�
             �z    |             | �         | 70bb023c810ca87a | -80   |
+------+-------------+-------------+-----------+------------------+-------+
1 row in set (0.00 sec)

However, you must perform this check one id at a time. This makes it less convenient when you are planing to do large bulk inserts / data loads:

mysql> select id, hex_keyspace_id, shard from customer.hash where id IN (5,9999);
ERROR 1105 (HY000): unsupported: where clause for vindex function must be of the form id = <val> (not equality)

mysql> select id, hex_keyspace_id, shard from customer.hash where id = 5;
+------+------------------+-------+
| id   | hex_keyspace_id  | shard |
+------+------------------+-------+
| 5    | 70bb023c810ca87a | -80   |
+------+------------------+-------+
1 row in set (0.00 sec)

mysql> select id, hex_keyspace_id, shard from customer.hash where id = 9999;
+------+------------------+-------+
| id   | hex_keyspace_id  | shard |
+------+------------------+-------+
| 9999 | 9a49bef394017d2d | 80-   |
+------+------------------+-------+
1 row in set (0.00 sec)

Use Case(s)

It's common to have bulk data copy/load pipelines (ETL or other asynchronous data pipelines). In this pipeline you will often perform transformations to support loading this data in the most efficient way knowing the details of the target database. With Vitess, it would be nice to check the landing keyspace_id/shard for rows in bulk in order to more efficiently organize data in the pipeline so that it's grouped by shard in order to load the data as quickly as possible in Vitess with limited impact/load on the database (e.g. generating multi-row INSERTs for each shard).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant