Introduce Concatenated Fixed-width Composite or CFC vindex #7537
Conversation
Reviewers, please see also the discussion on #7332.
Force-pushed from 1908d6c to 4cf9b98.
Regarding the license header, you can use this one:
// Cost returns the cost as 1.
func (vind *CFC) Cost() int {
	return 1
}
Did you consider other vindex costs when you arrived at this value?
I'm not so familiar with the numeric system used to infer the vindex cost. Do you have suggestions for how to calculate the cost?
The cost can be equal to the number of columns involved.
Can you change the cost to the sum of the costs of the hash functions involved for each column?
That is, the number of columns * the cost of the hash function.
When the offsets parameter is set, we know the number of columns, but if it isn't set, we have no idea how many columns are involved. What do you suggest in this case?
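For illustration, a minimal sketch of the cost scheme being discussed, assuming the vindex keeps its parsed offsets in an offsets slice; the field name and struct shape here are illustrative, not the actual CFC implementation:

package vindexes

// CFC is a hypothetical minimal shape, for illustration only.
type CFC struct {
	// offsets holds the byte offsets of the fixed-width components,
	// populated only when the optional offsets parameter is configured.
	offsets []int
}

// Cost scales with the number of components when offsets are configured,
// and falls back to 1 when the component count is unknown.
func (vind *CFC) Cost() int {
	if len(vind.offsets) == 0 {
		return 1
	}
	return len(vind.offsets)
}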
I had only minor nitpicks to contribute, apart from maybe wanting to avoid the type casting.
// to a `key.DestinationKeyRange`. Note that the prefix to key range mapping is
// only active in 'LIKE' expression. When a column with CFC defined appears in
// other expressions, e.g. =, !=, IN etc, it behaves exactly as other
// functional unique vindexes.
Since internally keyspace ID is a byte array, it's not immediately clear what would be on the righthand side of the comparison operator. It could be a byte sequence, or a hexadecimal string, or some other representation.
I see from your tests that you use hexadecimal. That makes sense, since that's how they are represented in public-facing documentation, and also in the in_keyrange() function, and also in shard names.
You should make that explicit in the documentation here, and anywhere else relevant.
EDIT: Reading the code more, I see you are not using hexadecimal (but that wasn't immediately clear, because all of your test cases only used the characters [a-fA-F0-9]). You are taking whatever is on the right-hand side and interpreting it as raw bytes.
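To make that distinction concrete, here is a small sketch (not tied to the vindex code) of how the same query literal differs when treated as raw bytes versus as a hexadecimal string:

package main

import (
	"encoding/hex"
	"fmt"
)

func main() {
	rhs := "0a1b" // the literal as written on the right-hand side of the comparison

	// Interpreted as raw bytes: the four ASCII bytes '0', 'a', '1', 'b'.
	raw := []byte(rhs)

	// Interpreted as a hex string: the two bytes 0x0a and 0x1b.
	decoded, err := hex.DecodeString(rhs)
	if err != nil {
		panic(err)
	}

	fmt.Printf("raw:     %x (len %d)\n", raw, len(raw))
	fmt.Printf("decoded: %x (len %d)\n", decoded, len(decoded))
	// raw:     30613162 (len 4)
	// decoded: 0a1b (len 2)
}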
// we don't use the full hashed value because it's very long.
// keyrange resolution is done via comparing []byte so longer
// keyspace ids have performance impact.
If you use hashing functions that are already in use for vindexes, then that shouldn't be an issue, right? If they were already too long, then they would already be causing issues.
Maybe smaller, more performant hash functions are needed just for cases where you are concatenating more than 3 values together, but for the common use-cases of 2 or 3 values, maybe an ordinary vindex hash function, in its entirety, would do?
An md5 hash is 16 bytes. If there are only 2 or 3 components in the composite index, you are right that it doesn't matter much. However, the interface here allows arbitrarily concatenated composite indexes, so this comment is about the generic case. Specifically, it keeps the hashed value (of the concatenation) at most the same size as the input value, so there is no added performance cost in bytes.Compare if people choose to use many components in the composite index.
func md5hash(in []byte) []byte {
	n := len(in)
	out := vMD5Hash(in)
	if n < len(out) {
		return out[:n]
	}
	return out
}
Based on your current implementation, this means that your unhashed data is coupled to the hashed data size. I cannot choose to take an 8-byte value, and hash it to a 4-byte value.
Unless you do all the hashing yourself, and leave hash and offsets defined, which I think is perfectly fine. I'm more pointing out that I think this is going to limit the utility of the hashing.
If I take my 8-byte value, choose 4 of the bytes, and then hash that, the composite hash might not be distributed as well, if I'm wrong about the distribution of those 4 bytes.
Though, the approach you're taking now has similar issues. If the hashing function returns an N-byte array, and you only take the first M-bytes of that array, it might not be distributed very well. If you really want an M-byte array instead, you should choose a function that is designed to return M-bytes, while still acting as a good hash function.
All of this makes me question whether doing automatic hashing is worth it. Or if the client should compute their own hashes, that way they can make their own determination how good of a hash they need.
The other limiting requirement here is the fact that the inputs must be fixed-width. With the client defining the function that produces the column value, they can support variable-width component functions, e.g. identity(c1) concat identity(c2), or reverse(c1) concat reverse(c2). You don't have that kind of flexibility with this hashing system. Which isn't necessarily a bad thing. But again, I'm wondering how much these hashes would end up being used, versus manual value mapping in the client.
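To make the earlier point about choosing "a function that is designed to return M-bytes" concrete, here is a hedged sketch comparing a truncated 16-byte md5 digest with a hash whose native output is already 4 bytes (FNV-1a 32-bit, picked purely as an example, not something the PR uses):

package main

import (
	"crypto/md5"
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

func main() {
	in := []byte("some-column-value")

	// Option A: compute a 16-byte md5 digest and keep only the first 4 bytes.
	sum := md5.Sum(in)
	truncated := sum[:4]

	// Option B: use a hash function whose native output is already 4 bytes.
	h := fnv.New32a()
	h.Write(in)
	native := make([]byte, 4)
	binary.BigEndian.PutUint32(native, h.Sum32())

	fmt.Printf("md5[:4]: %x\n", truncated)
	fmt.Printf("fnv32a:  %x\n", native)
}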
I think it would help to see some real-world examples of how you are planning on using this. The RFC and tests only contain toy schemas. What would it look like in a production schema that wanted to achieve soft colocation using this CFC concept?
Thanks for the comments. I debated with myself whether to support hashing at all, for reasons very similar to what you said. In the end I decided to add it as a convenience so that the application doesn't have to do it. At the end of the day there are only a limited number of shards; regardless of how we hash, it's just a best effort to distribute the keyspace ids. When the components in a composite index have a narrow value range, e.g. one column may be an enum, adding a parameter to the vschema is arguably more convenient than changing application code to produce a hashed concatenated column.
A more concrete example: an inventory system. A simplified data model: there is a 'product', within which there are many items. The mental model is that there could be a couple of orders of magnitude more items than products. Items check in and check out of the inventory for sales/returns/restocking etc. We want a per-product view of its inventory status, e.g. how many items are in stock within an hour block, the average time in the warehouse over the last 7 days, the return rate, etc.
product {
product_id
product_name
msrp
}
item {
item_id
product_id
serial_no
price
check_in_at
check_out_at
status
}
Let's assume status changes or item creation have very high throughput. We can shard by item id and create a global secondary index on product id, normalizing into the index all the columns needed by the query. But then inserting one item requires two inserts: one into the item table, the other into the global secondary index table on another shard. It has its pros and cons. The CFC vindex provides another option. We can have the item table look like:
item {
item_id
product_id
serial_no
price
check_in_at
check_out_at
status
keyid // first 2 bytes of product_id + item_id
}
If we define a CFC vindex on keyid, then given a product_id we can use

select count(*) from items where keyid like 'xx%' and product_id = 'xxxxxx' and check_in_at > t0 and check_in_at <= t1 and check_out_at is null

to find out how many items exist within (t0, t1) for the product whose id is 'xxxxxx'. This query only hits a subset of shards instead of all shards (i.e. 'xx123' and 'xx456' may very well belong to two different shards). The write throughput of inserting/updating an item is spread across more than one shard, so its qps is not bottlenecked by a single shard.
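For illustration, a hedged sketch of how an application might populate such a keyid column client-side; the 2-byte prefix width and the use of md5 on product_id are assumptions made for this example, not something the vindex mandates:

package main

import (
	"crypto/md5"
	"fmt"
)

// buildKeyID builds the "first 2 bytes of product_id + item_id" layout
// described above: items of the same product share a 2-byte prefix, so a
// prefix LIKE query on keyid only fans out to the shards covering that
// prefix. Hashing product_id first (an assumption here) spreads products
// more evenly when their ids are not already uniformly distributed.
func buildKeyID(productID, itemID string) []byte {
	sum := md5.Sum([]byte(productID))
	keyid := make([]byte, 0, 2+len(itemID))
	keyid = append(keyid, sum[:2]...)
	keyid = append(keyid, itemID...)
	return keyid
}

func main() {
	fmt.Printf("%x\n", buildKeyID("xxxxxx", "item-42"))
}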
Force-pushed from a5d9995 to dfc99ea.
Force-pushed from 61aaf7c to 8d270a5.
Does anyone know how to fix this test, which I didn't touch? @systay @harshit-gangal
@systay @harshit-gangal PTAL
I see there are still a few comments which have not been addressed. I have some additional comments as well. Also, for clarity, you can mark a comment as resolved once you have replied or changed the code accordingly.
@harshit-gangal PTAL
LGTM.
Just have one comment.
@wangmeng99 Could you add documentation regarding the CFC vindex here: https://vitess.io/docs/reference/features/vindexes/
Description
The purpose of this vindex is to shard rows based on a prefix of the sharding key. Imagine the sharding key is defined as (s1, s2, ... sN); a prefix of this key is (s1, s2, ... sj) (j <= N). This vindex puts rows with the same prefix on the same group of shards instead of scattering them across all the shards. The benefit of doing so is that prefix queries only fan out to a subset of shards instead of all of them. Specifically, this vindex maps the full key, i.e. (s1, s2, ... sN), to a key.DestinationKeyspaceID, and a prefix of it, i.e. (s1, s2, ... sj) (j < N), to a key.DestinationKeyRange. Note that the prefix to key range mapping is only active in 'LIKE' expressions. When a column with CFC defined appears in other expressions, e.g. =, !=, IN etc., it behaves exactly like other functional unique vindexes.
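As a rough sketch of the LIKE-prefix behavior described above (the range computation shown is an assumption of this example, not the actual CFC code): a prefix p covers the keyspace-id range [p, next(p)), where next(p) increments the last byte of p after dropping any trailing 0xff bytes.

package main

import "fmt"

// keyRangeFromPrefix returns the [start, end) keyspace-id range covered by a
// prefix. end is the prefix with its last byte incremented; trailing 0xff
// bytes are dropped first so the increment carries correctly. An all-0xff
// prefix maps to an open-ended range (nil end).
func keyRangeFromPrefix(prefix []byte) (start, end []byte) {
	start = append([]byte(nil), prefix...)
	end = append([]byte(nil), prefix...)
	for len(end) > 0 && end[len(end)-1] == 0xff {
		end = end[:len(end)-1]
	}
	if len(end) == 0 {
		return start, nil // prefix covers everything up to the max keyspace id
	}
	end[len(end)-1]++
	return start, end
}

func main() {
	s, e := keyRangeFromPrefix([]byte{0x0a, 0xff})
	fmt.Printf("start=%x end=%x\n", s, e) // start=0aff end=0b
}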
Related Issue(s)
Checklist
Deployment Notes
Impacted Areas in Vitess
Components that this PR will affect: