-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enhanced shard targeting #3666
Comments
Being able to invoke a vindex in a Future possibility: someone someone may want to be able to provide a list of shards or keyspace IDs in a |
In general I like this proposal, but I have a question about whether we are being overly generalized here... Specifically we have concrete use cases for targeting a specific shard, for forcing a query to scatter to all shards, and for targeting a specific shard based on keyspace ID. What I'm having a harder time envisioning is why you would ever want to target a subset of the shards, i.e. when would one use Given that I think we should consider not supporting partial key ranges and restricting the feature to force routing to either one or all shards. |
I was talking to @demmer IRL about this. We were thinking that having ranges that do not cover a full shard is what feels scary. @sougou what do you think if we put validations to not allow the following?
Only something like this will be allowed: |
The specific use case for partial shard targeting comes from apps that want to pull a lot of data, but in parallel. This would work well for map-reduces, or message subscribers that would like to subscribe to a subset of the total shards. I share the discomfort of partial shard targeting. The only reason I thought we should allow it was to be consistent in our inconsistency: a keyspace id targeting targets an entire shard and not just the keyspace id. So, if that's the only concern, we can instead enforce full shard matching. We actually have that logic already in the code. |
I'm good with that too. |
There is a possible conflict here: let's say we're splitting from -80 to -40 and 40-80. If I specify -80, do I mean the keyrange -80 (that can then be converted to -40 and 40-80) or the shard -80? (that exists as a standalone shard). So I don't think we can be smart about it, is has to be a different notation. I'd also recommend going with exact shard matching, for the boundaries. This is easily compatible with horizontal resharding, and would prevent mistakes, I think. |
* The following is the first towards enhanced shard targeting as described in github vitessio#3666. * The following adds support for targeting a keyrange using the following syntaxt: keyspace[range]@tablet_type. * TODO: key range targeting within a statement. * Renames handleRange to shardRangeExec * Be consistent on how stats are calculated Address PR comments * Be more DRY and consolidate some code repitition. * Rename variables for consistency. * Oops - update to not use result, err var and just rss Signed-off-by: Rafael Chacon <[email protected]>
Current status for this feature:
|
Signed-off-by: Adam Saponara <[email protected]>
Support `USE keyspace[keyspace_id]` (#3666)
Any movement on this? |
Target a shard within a statement looks very interesting 💡 It will solve a lot of problems, especially if you are migrating from a monolithic database and you have heavy background jobs, that process data without taking into account any sharding keys, so instead of walk through the all shards we can speed up the process and run parallel jobs for each shard, you just need to know the shard number and then all queries will be addressed to the shard you needed, like: This idea is incredible and don't need to use |
In a sharded system like Vitess, it's sometimes necessary to bypass the SQL engine (V3) layer of VTGate to directly target specific shards. There are a few use cases that drive this need:
SET
, etc.Currently, there is some amount of shard targeting supported through the
USE
statement, likekeyspace:-80@master
.In the above construct
-80
is expected to match a shard name. However, there are now two new requirements:select * from `keyspace:-80`.t
The
keyspace:-80
syntax clearly signifies that-80
is a shard name. In order to avoid confusion, we should not overload that to signify keyrange. Instead, let's introduce the following proposed syntax:keyspace[-80]
for keyrange-80
keyspace[20-40]
for keyrange20-40
keyspace[11]
for targeting the shard that has keyspace id0x11
This is similar to the go slice operation except that we use
-
instead of:
. The one difference between targeting a keyrange from a specific shard is that the keyrange does not need to cover a full shard. For example, if you had shards:-20-40-60...
, then a keyrange of10-30
would cover shards-20
and20-40
. This is in line with keyspace id targeting, because it only guarantees that the shard containing the keyspace id will be targeted. But if the query matched rows outside of that keyspace id, those would still be returned.Note that this type of targeting does not give you the SQL correctness of the V3 engine. For example, if you had a join statement that targeted multiple shards, it would be blindly passed through to each shard, and the results would be combined and returned as is. This is also why the feature is powerful, because it lets you do things that cannot be done with traditional SQL statements.
Finally, this type of targeting can be extended to the
FROM
clause of SQL statements. The V3 engine will pull the associated constructs out into a separate route, and perform a pass-through query. If such constructs are involved in a join or other post-processing operations, then those things will be performed at the vtgate level.The syntax will be odd, but there is no escape:
select a from `keyspace[20-40]`.t
.Of course, tablet type targeting like
@master
will be disallowed in the from clause. That would just be too preposterous :).The text was updated successfully, but these errors were encountered: