-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor reindexing code to only reindex specific tokenizers. #2883
Conversation
Currently during reindexing, the whole index is being deleted and rebuilt, even if there are tokenizers that did not change between schemas. This change allows reindexing to happen only for tokenizers that need it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 4 files reviewed, 9 unresolved discussions (waiting on @golangcibot, @martinmr, and @manishrjain)
posting/index.go, line 81 at r1 (raw file):
// convenience as manually getting the tokenizers in every test that need them // would add a lot of extra code. func indexTokens(attr, lang string, src types.Val) ([]string, error) {
Put this code in the _test.go
file.
posting/index.go, line 91 at r1 (raw file):
// original uid -> value edge. // TODO - See if we need to pass op as argument as t should already have Op. func (txn *Txn) addIndexMutationsForTokenizers(ctx context.Context, tokenizers []tok.Tokenizer,
Revert naming change.
posting/index.go, line 92 at r1 (raw file):
// TODO - See if we need to pass op as argument as t should already have Op. func (txn *Txn) addIndexMutationsForTokenizers(ctx context.Context, tokenizers []tok.Tokenizer, t *pb.DirectedEdge, p types.Val, op pb.DirectedEdge_Op) error {
These args are going out of hands. Let's create a struct, which can house these, so we can cut down the args to 2 (ctx and obj). Also, no need to rename the functions.
posting/index.go, line 96 at r1 (raw file):
uid := t.Entity x.AssertTrue(uid != 0) tokens, err := indexTokensForTokenizers(attr, t.GetLang(), tokenizers, p)
This can take the obj as described above.
posting/index.go, line 124 at r1 (raw file):
// Get the complete list of tokenizers from the schema. tokenizers := schema.State().Tokenizer(t.Attr) return txn.addIndexMutationsForTokenizers(ctx, tokenizers, t, p, op)
If the struct doesn't have tokenizers specified (empty), then you can automatically do this. No need to create a helper func.
posting/index.go, line 598 at r1 (raw file):
} type indexRebuildInfo struct {
This seems like it could be housed in IndexRebuild struct?
posting/index.go, line 604 at r1 (raw file):
} func needsIndexRebuild(old *pb.SchemaUpdate, current *pb.SchemaUpdate) indexRebuildInfo {
Make this a func of IndexRebuild. Call it calculateTokenizers
or calculateActions
or something.
posting/index.go, line 677 at r1 (raw file):
} } return indexRebuildInfo{
Can define this above. So, you don't need to define tokenizersToDelete, etc. Also, make([]string, 0)
has no benefit over var x []string
, latter is natural way to write it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 4 files reviewed, 12 unresolved discussions (waiting on @golangcibot, @manishrjain, and @gitlw)
posting/index.go, line 81 at r1 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Put this code in the
_test.go
file.
Done.
posting/index.go, line 81 at r1 (raw file):
Previously, gitlw (Lucas Wang) wrote…
It seems this is only being inside the index_test.go file, can we move this function into that test file?
Done.
posting/index.go, line 91 at r1 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Revert naming change.
Done.
posting/index.go, line 92 at r1 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
These args are going out of hands. Let's create a struct, which can house these, so we can cut down the args to 2 (ctx and obj). Also, no need to rename the functions.
Done.
posting/index.go, line 96 at r1 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
This can take the obj as described above.
Done.
posting/index.go, line 120 at r1 (raw file):
Previously, gitlw (Lucas Wang) wrote…
To better distinguish this method from the following one, maybe it can be renamed to addIndexMutationsForAllTokenizers?
I ended up removing this special case so there is only one method called addIndexMutations. See comment below.
posting/index.go, line 124 at r1 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
If the struct doesn't have tokenizers specified (empty), then you can automatically do this. No need to create a helper func.
I don't think this is a good idea since we might pass an empty list of tokenizers by mistake and end up rebuilding the entire index. But I agree that the special function is not great so I've removed it. Now the callers are responsible for setting the list of tokenizers.
posting/index.go, line 437 at r1 (raw file):
Previously, gitlw (Lucas Wang) wrote…
rename to deleteIndexForAllTokenizers?
Renamed to deleteEntireIndex which is a bit shorter.
posting/index.go, line 598 at r1 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
This seems like it could be housed in IndexRebuild struct?
Not really. This struct is only being used for normal indices (not count or reverse) so I think it would be confusing to put it there. Also, it's generated in a different place than the IndexRebuild struct.
posting/index.go, line 604 at r1 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Make this a func of IndexRebuild. Call it
calculateTokenizers
orcalculateActions
or something.
I made all the need... functions a function of IndexRebuild. I kept the name for consistency since it's still determining whether the rebuild needs to happen at all and sometimes it does nothing with the tokenizers so that would be confusing.
posting/index.go, line 664 at r1 (raw file):
Previously, gitlw (Lucas Wang) wrote…
If we change the prevTokens and currTokens from map[string]bool to map[string]struct{} (since they are essentially sets), then we can reuse the x.Diff method to get the differences.
Done.
posting/index.go, line 677 at r1 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Can define this above. So, you don't need to define tokenizersToDelete, etc. Also,
make([]string, 0)
has no benefit overvar x []string
, latter is natural way to write it.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 4 files at r1, 1 of 2 files at r2, 2 of 2 files at r3.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @golangcibot, @martinmr, and @gitlw)
posting/index.go, line 124 at r1 (raw file):
Previously, martinmr (Martin Martinez Rivera) wrote…
I don't think this is a good idea since we might pass an empty list of tokenizers by mistake and end up rebuilding the entire index. But I agree that the special function is not great so I've removed it. Now the callers are responsible for setting the list of tokenizers.
Rebuilding the entire index is the current behavior. If that's the worst case scenario, that's not a bad tradeoff to achieve code simplicity.
posting/index.go, line 437 at r1 (raw file):
Previously, martinmr (Martin Martinez Rivera) wrote…
Renamed to deleteEntireIndex which is a bit shorter.
I renamed it to deleteAllTokens and below one to deleteTokensFor(attr, tokenizer)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @golangcibot, @manishrjain, and @gitlw)
posting/index.go, line 124 at r1 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Rebuilding the entire index is the current behavior. If that's the worst case scenario, that's not a bad tradeoff to achieve code simplicity.
We still would need to pass the list of tokenizers in the arguments (even if it's nil) so I don't think this would reduce the complexity by that much. Right now this function is only called a couple of times so having the caller get the tokenizers is not that much trouble.
posting/index.go, line 437 at r1 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
I renamed it to deleteAllTokens and below one to
deleteTokensFor(attr, tokenizer)
Thanks.
Currently during reindexing, the whole index is being deleted and rebuilt, even if there are tokenizers that did not change between schemas. This change allows reindexing to happen only for tokenizers that need it.
…odeinc#2883) Currently during reindexing, the whole index is being deleted and rebuilt, even if there are tokenizers that did not change between schemas. This change allows reindexing to happen only for tokenizers that need it.
Currently during reindexing, the whole index is being deleted and
rebuilt, even if there are tokenizers that did not change between
schemas. This change allows reindexing to happen only for tokenizers
that need it.
This change is