Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhanced shard targeting #3666

Open
sougou opened this issue Feb 20, 2018 · 10 comments
Open

Enhanced shard targeting #3666

sougou opened this issue Feb 20, 2018 · 10 comments

Comments

@sougou
Copy link
Contributor

sougou commented Feb 20, 2018

In a sharded system like Vitess, it's sometimes necessary to bypass the SQL engine (V3) layer of VTGate to directly target specific shards. There are a few use cases that drive this need:

  • Pull data from the information_schema of a specific shard.
  • Perform specific DBA functions using SET, etc.
  • Execute statements that are otherwise not expressible using standard SQL, like: delete 100 rows from every shard.

Currently, there is some amount of shard targeting supported through the USE statement, like keyspace:-80@master.

In the above construct -80 is expected to match a shard name. However, there are now two new requirements:

  • Target a set of shards
  • Target a shard based on a specified keyspace id
  • Target a shard within a statemement, like select * from `keyspace:-80`.t

The keyspace:-80 syntax clearly signifies that -80 is a shard name. In order to avoid confusion, we should not overload that to signify keyrange. Instead, let's introduce the following proposed syntax:

  • keyspace[-80] for keyrange -80
  • keyspace[20-40] for keyrange 20-40
  • keyspace[11] for targeting the shard that has keyspace id 0x11

This is similar to the go slice operation except that we use - instead of :. The one difference between targeting a keyrange from a specific shard is that the keyrange does not need to cover a full shard. For example, if you had shards: -20-40-60..., then a keyrange of 10-30 would cover shards -20 and 20-40. This is in line with keyspace id targeting, because it only guarantees that the shard containing the keyspace id will be targeted. But if the query matched rows outside of that keyspace id, those would still be returned.

Note that this type of targeting does not give you the SQL correctness of the V3 engine. For example, if you had a join statement that targeted multiple shards, it would be blindly passed through to each shard, and the results would be combined and returned as is. This is also why the feature is powerful, because it lets you do things that cannot be done with traditional SQL statements.

Finally, this type of targeting can be extended to the FROM clause of SQL statements. The V3 engine will pull the associated constructs out into a separate route, and perform a pass-through query. If such constructs are involved in a join or other post-processing operations, then those things will be performed at the vtgate level.

The syntax will be odd, but there is no escape: select a from `keyspace[20-40]`.t.

Of course, tablet type targeting like @master will be disallowed in the from clause. That would just be too preposterous :).

@sougou
Copy link
Contributor Author

sougou commented Feb 20, 2018

@demmer @dweitzman

@dweitzman
Copy link
Member

Being able to invoke a vindex in a use statement would make this a little simpler for clients since they wouldn't need to duplicate hashing logic from the vindex (strawman syntax: ```use keyspace:binary_md5:20@replica;`)

Future possibility: someone someone may want to be able to provide a list of shards or keyspace IDs in a use statement instead of just one. It might be worth reflecting on what the syntax for that might look like if/when that day comes.

@rafael rafael self-assigned this Feb 20, 2018
@demmer
Copy link
Member

demmer commented Feb 20, 2018

In general I like this proposal, but I have a question about whether we are being overly generalized here...

Specifically we have concrete use cases for targeting a specific shard, for forcing a query to scatter to all shards, and for targeting a specific shard based on keyspace ID.

What I'm having a harder time envisioning is why you would ever want to target a subset of the shards, i.e. when would one use select a from `keyspace[20-40]`.t, especially since (as you point out) that's not even necessarily going to be a covering range.

Given that I think we should consider not supporting partial key ranges and restricting the feature to force routing to either one or all shards.

@rafael
Copy link
Member

rafael commented Feb 20, 2018

I was talking to @demmer IRL about this. We were thinking that having ranges that do not cover a full shard is what feels scary. @sougou what do you think if we put validations to not allow the following?

 10-20-40-50-60..., then a keyrange of 10-30 would cover shards -20 and 20-50

Only something like this will be allowed: 10-50 (so the ranges covers full shards (10-20, 40-50)).

@sougou
Copy link
Contributor Author

sougou commented Feb 20, 2018

The specific use case for partial shard targeting comes from apps that want to pull a lot of data, but in parallel. This would work well for map-reduces, or message subscribers that would like to subscribe to a subset of the total shards.

I share the discomfort of partial shard targeting. The only reason I thought we should allow it was to be consistent in our inconsistency: a keyspace id targeting targets an entire shard and not just the keyspace id.

So, if that's the only concern, we can instead enforce full shard matching. We actually have that logic already in the code.

@demmer
Copy link
Member

demmer commented Feb 20, 2018

I'm good with that too.

@alainjobart
Copy link
Contributor

There is a possible conflict here: let's say we're splitting from -80 to -40 and 40-80. If I specify -80, do I mean the keyrange -80 (that can then be converted to -40 and 40-80) or the shard -80? (that exists as a standalone shard). So I don't think we can be smart about it, is has to be a different notation.

I'd also recommend going with exact shard matching, for the boundaries. This is easily compatible with horizontal resharding, and would prevent mistakes, I think.

rafael added a commit to tinyspeck/vitess that referenced this issue Mar 5, 2018
* The following is the first towards enhanced shard targeting as described in
github vitessio#3666.
* The following adds support for targeting a keyrange using the following
syntaxt: keyspace[range]@tablet_type.
* TODO: key range targeting within a statement.
* Renames handleRange to shardRangeExec
* Be consistent on how stats are calculated

Address PR comments

* Be more DRY and consolidate some code repitition.
* Rename variables for consistency.
* Oops - update to not use result, err var and just rss

Signed-off-by: Rafael Chacon <[email protected]>
@rafael
Copy link
Member

rafael commented Mar 22, 2018

Current status for this feature:

  • Support for use keyspace:-80
  • Support for use keyspace[-]
  • Support for use keyspace[11]
  • Support for select * from `keyspace:-80`.t
  • Support for select * from `keyspace:[-]`.t
  • Support for select * from `keyspace[11]`.t
  • Support for delete from `keyspace:-80`.t
  • Support for delete from `keyspace:[-]`.t
  • Support for delete from `keyspace[11]`.t
  • Support for update from `keyspace:-80`.t
  • Support for update from `keyspace:[-]`.t
  • Support for update from `keyspace[11]`.t

adsr added a commit to adsr/vitess that referenced this issue Nov 29, 2018
demmer added a commit that referenced this issue Dec 14, 2018
Support `USE keyspace[keyspace_id]` (#3666)
@derekperkins
Copy link
Member

Any movement on this?

@psyxoz
Copy link

psyxoz commented Apr 8, 2021

Target a shard within a statement looks very interesting 💡 It will solve a lot of problems, especially if you are migrating from a monolithic database and you have heavy background jobs, that process data without taking into account any sharding keys, so instead of walk through the all shards we can speed up the process and run parallel jobs for each shard, you just need to know the shard number and then all queries will be addressed to the shard you needed, like: select * from 'keyspace[11]'.t ❤️

This idea is incredible and don't need to use use 'keyspace:-80', which is incompatible with Rails and connections pool, if you share the connection on the whole app.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants