RFC: Owned Primary Vindexes #5375

Closed
sougou opened this issue Oct 29, 2019 · 2 comments

sougou commented Oct 29, 2019

This proposal presents a way to allow for owned lookup vindexes to also be the Primary Vindex.

Problem statement

An owned lookup vindex is a vindex for which a mapping is created from the input column value(s) to a keyspace id at the time of insert. This necessarily means that the keyspace id has to be known before this mapping can be created.

This also means that an owned vindex cannot be the primary vindex. This is because the primary vindex is the one that determines the keyspace id of a row based on the input value. The essential thinking is: if one can compute the keyspace id for a row, then there is no need to store a mapping for it.

There are, however, cases where these seemingly contradictory requirements can co-exist. Here are two use cases:

  1. I want to generate a random keyspace id at the time of row insertion, but I want to remember it for later.
  2. I want to use multiple column values to compute a keyspace id, but would like to save a mapping from just one column to the computed keyspace id. This way, the lookup can be used when only the mapped column is provided in the where clause. This is an upcoming use case for those who wish to geo-partition their data by region.

In the above cases, the vindex is capable of generating a keyspace id, and at the same time, it needs to save that data so that it can be used later when Map is called.

Solution

In order to support the new use cases, the following changes can be made:

  1. Extend the Vindex API so that a vindex can export a MapNew function. This function will generate a keyspace ID.
  2. Allow owned lookup vindexes that support the MapNew function to be the primary vindex.
  3. If a MapNew function exists for the primary vindex, the insert will call it. Otherwise, it will use the regular Map function.
  4. If the vindex is owned, then the regular code for an owned vindex is executed, even if it’s the primary vindex. This works because MapNew will already have returned the keyspace id for the row.
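
The four steps above could be sketched roughly as follows, using the first use case (a random keyspace id that is remembered for later). This is only an illustration under stated assumptions: `Vindex`, `NewMapper`, `Owned`, `randomLookup` and `insertRow` are hypothetical, simplified stand-ins and not the actual Vitess API, and an in-memory map stands in for the lookup table.

```go
package main

import (
	"bytes"
	"crypto/rand"
	"errors"
	"fmt"
)

// Vindex is a simplified stand-in for the real Vindex API (assumption).
type Vindex interface {
	// Map resolves an input value to a keyspace id.
	Map(value string) ([]byte, error)
}

// NewMapper is the hypothetical optional extension from step 1:
// a vindex that can generate a keyspace id for a brand-new row.
type NewMapper interface {
	MapNew(value string) ([]byte, error)
}

// Owned is a stand-in for the "owned" part of a lookup vindex:
// it persists the value -> keyspace id mapping at insert time.
type Owned interface {
	Create(value string, ksid []byte) error
}

// randomLookup: MapNew generates a random keyspace id, Create remembers
// it, and Map looks it up later.
type randomLookup struct {
	table map[string][]byte // in-memory stand-in for the lookup table
}

func (v *randomLookup) MapNew(string) ([]byte, error) {
	ksid := make([]byte, 8)
	if _, err := rand.Read(ksid); err != nil {
		return nil, err
	}
	return ksid, nil
}

func (v *randomLookup) Create(value string, ksid []byte) error {
	v.table[value] = ksid
	return nil
}

func (v *randomLookup) Map(value string) ([]byte, error) {
	ksid, ok := v.table[value]
	if !ok {
		return nil, errors.New("no mapping for " + value)
	}
	return ksid, nil
}

// insertRow follows steps 3 and 4: prefer MapNew when the primary vindex
// exports it, then run the owned-vindex path to store the mapping.
func insertRow(primary Vindex, value string) ([]byte, error) {
	var ksid []byte
	var err error
	if nm, ok := primary.(NewMapper); ok {
		ksid, err = nm.MapNew(value)
	} else {
		ksid, err = primary.Map(value)
	}
	if err != nil {
		return nil, err
	}
	if owned, ok := primary.(Owned); ok {
		if err := owned.Create(value, ksid); err != nil {
			return nil, err
		}
	}
	return ksid, nil
}

func main() {
	v := &randomLookup{table: map[string][]byte{}}
	ksid, err := insertRow(v, "user-1")
	if err != nil {
		panic(err)
	}
	found, err := v.Map("user-1")
	if err != nil {
		panic(err)
	}
	// The later lookup returns the id that was generated at insert time.
	fmt.Println(bytes.Equal(ksid, found))
}
```

Keeping MapNew separate from Create means the owned-vindex code path is reused unchanged, which is the composability argument made in the next paragraph.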

Why not a MapAndCreate function instead?
A MapAndCreate will end up duplicating the work performed by an owned Vindex for the Create part. Having MapNew only generate the keyspace id keeps the functionality more orthogonal and composable.

Why do we need a separate MapNew function instead of just reusing the Map function?
This is to address the first use case. In the first use case, the MapNew function will generate a random keyspace id, whereas the Map function will perform the lookup.

Resharding

We currently don’t support resharding through a lookup vindex. This is because a vttablet is not able to read from a lookup table that may be distributed across different keyspaces and shards. Also, performing a lookup for each vreplication row may be a performance bottleneck.

For the first use case of random keyspace id generation, there is no recourse: looking up the keyspace id may be the only way to reshard. However, this is currently only a hypothetical use case, so there’s no need to solve this problem immediately.

For the second use case, which addresses the geo-partitioning problem, Map and MapNew have the same implementation. The only difference is that MapNew takes multiple columns as input, whereas Map takes only one column.

We have always wanted to extend Map to accept multiple columns as input. We now have the opportunity to do so. In such cases, resharding will be able to use Map with all column values as input, thereby avoiding the need to read from lookup tables.

The VTGate itself can continue to just send the first column's value for the vindex. However, VReplication can use all column values to allow for Map to return efficiently.
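
As a minimal sketch of the geo-partitioning case, a multi-column Map can compute the keyspace id from the row's own columns, so a resharding stream that has the full row never needs the lookup table. The function name `mapRegionID`, the `regionCodes` table, the 8-byte layout and the use of SHA-256 are all assumptions made for illustration; they are not how any Vitess vindex is actually implemented.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// regionCodes is a made-up region -> prefix byte table (assumption).
var regionCodes = map[string]byte{"us": 0x00, "eu": 0x40, "ap": 0x80}

// mapRegionID is a hypothetical multi-column Map: the region picks the
// leading byte of the keyspace id, and a hash of the id fills the rest.
func mapRegionID(region, id string) ([]byte, error) {
	prefix, ok := regionCodes[region]
	if !ok {
		return nil, fmt.Errorf("unknown region %q", region)
	}
	sum := sha256.Sum256([]byte(id))
	return append([]byte{prefix}, sum[:7]...), nil
}

func main() {
	// With every column of the row available, the keyspace id is computed
	// directly; no lookup table is read.
	ksid, err := mapRegionID("eu", "order-42")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%x\n", ksid)
}
```

A query that supplies only the id cannot call this function, which is why the one-column mapping to the computed keyspace id still has to be saved somewhere.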

sougou commented Oct 31, 2019

The following concerns were raised by @deepthi:

  • MapNew is not a good name.
  • If we extend Map to accept multi-column inputs, we could break custom vindexes that others have developed.

Taking these into consideration, the following trade-offs can be made:

  • Create a new optional API for vindexes that want to support multi-column mapping:
type MultiColumn interface {
	VerifyMulti(cursor VCursor, rowsColValues [][]sqltypes.Value, ksids [][]byte) ([]bool, error)
	MapMulti(cursor VCursor, rowsColValues [][]sqltypes.Value) ([]key.Destination, error)
}
  • Punt on a separate MapNew: the insert can just invoke Map or MapMulti for now. This makes the exotic use case of generating random keyspace ids harder to implement, but the simplicity of the API seems worth it. We can always implement this special API in the future if the need arises.
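
One way to read the optional-interface idea is that callers detect MapMulti with a type assertion, so existing custom vindexes that only implement Map keep working. The sketch below assumes simplified stand-ins: `Value` and `Destination` replace `sqltypes.Value` and `key.Destination`, the `VCursor` argument is omitted, and `resolve`, `regionVindex` and `legacyVindex` are hypothetical names. Falling back to the first column of each row is also an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// Value and Destination are simplified stand-ins for sqltypes.Value and
// key.Destination (assumptions, not the Vitess types).
type Value string
type Destination string

// SingleColumn mirrors the shape of the existing single-column Map.
type SingleColumn interface {
	Map(ids []Value) ([]Destination, error)
}

// MultiColumn mirrors the optional interface proposed above.
type MultiColumn interface {
	MapMulti(rowsColValues [][]Value) ([]Destination, error)
}

// resolve uses MapMulti when the vindex offers it, and otherwise falls
// back to Map with the first column of each row.
func resolve(vindex interface{}, rows [][]Value) ([]Destination, error) {
	if mc, ok := vindex.(MultiColumn); ok {
		return mc.MapMulti(rows)
	}
	sc, ok := vindex.(SingleColumn)
	if !ok {
		return nil, errors.New("vindex supports neither Map nor MapMulti")
	}
	first := make([]Value, 0, len(rows))
	for _, row := range rows {
		if len(row) == 0 {
			return nil, errors.New("row has no column values")
		}
		first = append(first, row[0])
	}
	return sc.Map(first)
}

// regionVindex is a toy multi-column vindex: destination = "region/id".
type regionVindex struct{}

func (regionVindex) MapMulti(rows [][]Value) ([]Destination, error) {
	out := make([]Destination, 0, len(rows))
	for _, row := range rows {
		if len(row) != 2 {
			return nil, fmt.Errorf("want 2 columns, got %d", len(row))
		}
		out = append(out, Destination(string(row[0])+"/"+string(row[1])))
	}
	return out, nil
}

// legacyVindex is a toy custom vindex that only knows single-column Map.
type legacyVindex struct{}

func (legacyVindex) Map(ids []Value) ([]Destination, error) {
	out := make([]Destination, 0, len(ids))
	for _, id := range ids {
		out = append(out, Destination("hash/"+string(id)))
	}
	return out, nil
}

func main() {
	rows := [][]Value{{"eu", "42"}}
	multi, _ := resolve(regionVindex{}, rows)
	single, _ := resolve(legacyVindex{}, rows)
	fmt.Println(multi, single)
}
```

Because the new methods live in a separate interface, nothing changes for a vindex that does not implement them, which addresses the compatibility concern raised above.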

sougou commented Dec 17, 2019

An update on this. It turns out that our understanding was convoluted. The above requirements can be more flexibly met by two different vindexes: the first one would be a multi-column primary vindex, and the second one would be a normal owned lookup vindex.

So, we only needed to add support for multi-column primary vindexes.

This change was implemented in #5509.

sougou closed this as completed Dec 17, 2019