
Refactor reindexing code to only reindex specific tokenizers. #2883

Merged: 6 commits into master on Jan 15, 2019

Conversation

@martinmr (Contributor) commented Jan 9, 2019

Currently during reindexing, the whole index is being deleted and
rebuilt, even if there are tokenizers that did not change between
schemas. This change allows reindexing to happen only for tokenizers
that need it.



@martinmr requested a review from manishrjain on January 9, 2019 01:44
posting/index.go: outdated review thread (resolved)
@manishrjain (Contributor) left a comment


Reviewable status: 0 of 4 files reviewed, 9 unresolved discussions (waiting on @golangcibot, @martinmr, and @manishrjain)


posting/index.go, line 81 at r1 (raw file):

// convenience as manually getting the tokenizers in every test that need them
// would add a lot of extra code.
func indexTokens(attr, lang string, src types.Val) ([]string, error) {

Put this code in the _test.go file.


posting/index.go, line 91 at r1 (raw file):

// original uid -> value edge.
// TODO - See if we need to pass op as argument as t should already have Op.
func (txn *Txn) addIndexMutationsForTokenizers(ctx context.Context, tokenizers []tok.Tokenizer,

Revert naming change.


posting/index.go, line 92 at r1 (raw file):

// TODO - See if we need to pass op as argument as t should already have Op.
func (txn *Txn) addIndexMutationsForTokenizers(ctx context.Context, tokenizers []tok.Tokenizer,
	t *pb.DirectedEdge, p types.Val, op pb.DirectedEdge_Op) error {

These args are getting out of hand. Let's create a struct to house them so we can cut the args down to two (ctx and obj). Also, no need to rename the functions.
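
For illustration, a minimal sketch (in Go) of the kind of struct this comment asks for, built from the types already visible in the quoted signature; the struct name, field names, and method body are illustrative, not necessarily what the PR ended up with:

// Illustrative sketch only: bundle the long argument list into one struct so
// callers pass (ctx, info) instead of five separate parameters.
type indexMutationInfo struct {
	tokenizers []tok.Tokenizer    // tokenizers to generate index tokens with
	edge       *pb.DirectedEdge   // the original uid -> value edge
	val        types.Val          // the value being indexed
	op         pb.DirectedEdge_Op // the mutation operation
}

func (txn *Txn) addIndexMutations(ctx context.Context, info *indexMutationInfo) error {
	// ...same body as before, reading its inputs from info...
	return nil
}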


posting/index.go, line 96 at r1 (raw file):

	uid := t.Entity
	x.AssertTrue(uid != 0)
	tokens, err := indexTokensForTokenizers(attr, t.GetLang(), tokenizers, p)

This can take the obj as described above.


posting/index.go, line 124 at r1 (raw file):

	// Get the complete list of tokenizers from the schema.
	tokenizers := schema.State().Tokenizer(t.Attr)
	return txn.addIndexMutationsForTokenizers(ctx, tokenizers, t, p, op)

If the struct doesn't have tokenizers specified (empty), then you can automatically do this. No need to create a helper func.
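
In code, the suggested fallback is roughly the sketch below, reusing the hypothetical info struct from above; only the schema.State().Tokenizer call comes from the quoted diff:

// If the caller did not specify tokenizers, fall back to the complete list
// from the schema, which matches the current behavior.
if len(info.tokenizers) == 0 {
	info.tokenizers = schema.State().Tokenizer(info.edge.Attr)
}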


posting/index.go, line 598 at r1 (raw file):

}

type indexRebuildInfo struct {

This seems like it could be housed in IndexRebuild struct?


posting/index.go, line 604 at r1 (raw file):

}

func needsIndexRebuild(old *pb.SchemaUpdate, current *pb.SchemaUpdate) indexRebuildInfo {

Make this a func of IndexRebuild. Call it calculateTokenizers or calculateActions or something.
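
Sketched in Go, the suggestion amounts to something like the following; the receiver field names (OldSchema, CurrentSchema) are assumptions about what IndexRebuild holds, not taken from the diff:

// Hedged sketch: turn the free function into a method on IndexRebuild.
func (rb *IndexRebuild) calculateTokenizers() indexRebuildInfo {
	// Delegate to the existing comparison so the logic itself stays unchanged.
	return needsIndexRebuild(rb.OldSchema, rb.CurrentSchema)
}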


posting/index.go, line 677 at r1 (raw file):

		}
	}
	return indexRebuildInfo{

You can define this above, so you don't need to define tokenizersToDelete, etc. Also, make([]string, 0) has no benefit over var x []string; the latter is the natural way to write it.
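
On the slice point, both forms behave identically under append; the var form is simply the idiomatic way to declare a slice that starts empty. A self-contained example:

package main

import "fmt"

func main() {
	allocated := make([]string, 0) // empty but non-nil; the allocation buys nothing here
	var idiomatic []string         // nil slice; append allocates when first used

	allocated = append(allocated, "term")
	idiomatic = append(idiomatic, "term")
	fmt.Println(allocated, idiomatic) // [term] [term]
}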

posting/index.go: 4 outdated review threads (resolved)
@martinmr (Contributor, Author) left a comment


Reviewable status: 0 of 4 files reviewed, 12 unresolved discussions (waiting on @golangcibot, @manishrjain, and @gitlw)


posting/index.go, line 81 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Put this code in the _test.go file.

Done.


posting/index.go, line 81 at r1 (raw file):

Previously, gitlw (Lucas Wang) wrote…

It seems this is only being used inside the index_test.go file; can we move this function into that test file?

Done.


posting/index.go, line 91 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Revert naming change.

Done.


posting/index.go, line 92 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

These args are going out of hands. Let's create a struct, which can house these, so we can cut down the args to 2 (ctx and obj). Also, no need to rename the functions.

Done.


posting/index.go, line 96 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

This can take the obj as described above.

Done.


posting/index.go, line 120 at r1 (raw file):

Previously, gitlw (Lucas Wang) wrote…

To better distinguish this method from the following one, maybe it can be renamed to addIndexMutationsForAllTokenizers?

I ended up removing this special case so there is only one method called addIndexMutations. See comment below.


posting/index.go, line 124 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

If the struct doesn't have tokenizers specified (empty), then you can automatically do this. No need to create a helper func.

I don't think this is a good idea since we might pass an empty list of tokenizers by mistake and end up rebuilding the entire index. But I agree that the special function is not great so I've removed it. Now the callers are responsible for setting the list of tokenizers.


posting/index.go, line 437 at r1 (raw file):

Previously, gitlw (Lucas Wang) wrote…

rename to deleteIndexForAllTokenizers?

Renamed to deleteEntireIndex, which is a bit shorter.


posting/index.go, line 598 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

This seems like it could be housed in IndexRebuild struct?

Not really. This struct is only being used for normal indices (not count or reverse) so I think it would be confusing to put it there. Also, it's generated in a different place than the IndexRebuild struct.


posting/index.go, line 604 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Make this a func of IndexRebuild. Call it calculateTokenizers or calculateActions or something.

I made all the need... functions methods of IndexRebuild. I kept the name for consistency, since it's still determining whether the rebuild needs to happen at all, and sometimes it does nothing with the tokenizers, so that name would be confusing.


posting/index.go, line 664 at r1 (raw file):

Previously, gitlw (Lucas Wang) wrote…

If we change the prevTokens and currTokens from map[string]bool to map[string]struct{} (since they are essentially sets), then we can reuse the x.Diff method to get the differences.

Done.
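
For readers unfamiliar with the pattern, here is a self-contained sketch of treating the tokenizer names as sets; the diff helper below is an illustrative stand-in for the repository's x.Diff, whose exact signature is not shown in this thread:

package main

import "fmt"

// diff is a stand-in for x.Diff: it returns the keys present in a but not in b.
func diff(a, b map[string]struct{}) []string {
	var out []string
	for k := range a {
		if _, ok := b[k]; !ok {
			out = append(out, k)
		}
	}
	return out
}

func main() {
	prevTokens := map[string]struct{}{"term": {}, "exact": {}}
	currTokens := map[string]struct{}{"term": {}, "fulltext": {}}

	fmt.Println(diff(prevTokens, currTokens)) // tokenizers to delete: [exact]
	fmt.Println(diff(currTokens, prevTokens)) // tokenizers to rebuild: [fulltext]
}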


posting/index.go, line 677 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Can define this above. So, you don't need to define tokenizersToDelete, etc. Also, make([]string, 0) has no benefit over var x []string, latter is natural way to write it.

Done.

@manishrjain (Contributor) left a comment


:lgtm:

Reviewed 2 of 4 files at r1, 1 of 2 files at r2, 2 of 2 files at r3.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @golangcibot, @martinmr, and @gitlw)


posting/index.go, line 124 at r1 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

I don't think this is a good idea since we might pass an empty list of tokenizers by mistake and end up rebuilding the entire index. But I agree that the special function is not great so I've removed it. Now the callers are responsible for setting the list of tokenizers.

Rebuilding the entire index is the current behavior. If that's the worst case scenario, that's not a bad tradeoff to achieve code simplicity.


posting/index.go, line 437 at r1 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

Renamed to deleteEntireIndex which is a bit shorter.

I renamed it to deleteAllTokens and the one below to deleteTokensFor(attr, tokenizer).

@martinmr (Contributor, Author) left a comment


Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @golangcibot, @manishrjain, and @gitlw)


posting/index.go, line 124 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Rebuilding the entire index is the current behavior. If that's the worst case scenario, that's not a bad tradeoff to achieve code simplicity.

We would still need to pass the list of tokenizers in the arguments (even if it's nil), so I don't think this would reduce the complexity by much. Right now this function is only called a couple of times, so having the caller get the tokenizers is not much trouble.


posting/index.go, line 437 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

I renamed it to deleteAllTokens and below one to deleteTokensFor(attr, tokenizer)

Thanks.

@martinmr merged commit de73f40 into master on Jan 15, 2019
@martinmr deleted the martinmr/reindexing-by-tokenizer branch on January 15, 2019 22:27
martinmr added a commit that referenced this pull request Jan 29, 2019
dna2github pushed a commit to dna2fork/dgraph that referenced this pull request Jul 19, 2019
Labels: none yet
4 participants