-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prevent alphas from asking zero to serve tablets during queries. #3091
Conversation
This change makes queries read-only in the sense that they won't ask zero to serve any predicates. Same change that was reverted due to a bug detected by running "dgraph increment". The bug has been fixed by asking zero what group is serving a particular tablet without requesting to serve that tablet. This preserves both correctness and the read-only property of queries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 12 of 12 files at r1.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @martinmr)
dgraph/cmd/zero/zero.go, line 578 at r1 (raw file):
// Serves returns the tablet currently serving the predicate or nil if the predicate // is not being served by any alpha. func (s *Server) Serves(
Instead of adding a new endpoint, just modify Tablet to add a new field in there, called bool read_only
, which can indicate to Zero in the ShouldServe
endpoint.
worker/groups.go, line 328 at r1 (raw file):
// BelongsToReadOnly acts like BelongsTo except it does not ask zero to serve // the tablet for key if no group is currently serving it. func (g *groupi) BelongsToReadOnly(key string) uint32 {
Let's make this return (uint32, error), so an error can be returned back properly by the caller. The other funcs in this file should ideally also return errros, to make our error handling better. There's a diff between not being able to communicate to zero, and considering a tablet as unserved -- and an error can help identify that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @martinmr)
dgraph/cmd/zero/zero.go, line 579 at r1 (raw file):
// is not being served by any alpha. func (s *Server) Serves( ctx context.Context, tablet *pb.Tablet) (*pb.Tablet, error) {
I'd suggest changing the 2nd parameter from tablet to "predicate string"
query/query.go, line 173 at r1 (raw file):
ReadTs uint64 Attr string UnknownAttr bool
I'd suggesting having a field of type error to indicate something wrong has happened, which then can be used for handling other types of errors for early termination.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 4 of 16 files reviewed, 4 unresolved discussions (waiting on @gitlw and @manishrjain)
dgraph/cmd/zero/zero.go, line 578 at r1 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Instead of adding a new endpoint, just modify Tablet to add a new field in there, called
bool read_only
, which can indicate to Zero in theShouldServe
endpoint.
Done.
dgraph/cmd/zero/zero.go, line 579 at r1 (raw file):
Previously, gitlw (Lucas Wang) wrote…
I'd suggest changing the 2nd parameter from tablet to "predicate string"
Removed this method in favor of adding a read_only parameter to tablet so this is not needed anymore.
query/query.go, line 173 at r1 (raw file):
Previously, gitlw (Lucas Wang) wrote…
I'd suggesting having a field of type error to indicate something wrong has happened, which then can be used for handling other types of errors for early termination.
I am not sure how I would use this field. I modified the methods in groups.go to return errors so any issue while trying to determining group membership should make the query terminate early.
worker/groups.go, line 328 at r1 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Let's make this return (uint32, error), so an error can be returned back properly by the caller. The other funcs in this file should ideally also return errros, to make our error handling better. There's a diff between not being able to communicate to zero, and considering a tablet as unserved -- and an error can help identify that.
Done. BelongsTo and ServesTablet (both versions of each method) now return an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you verify 3 important edge cases I mentioned in the comments?
Reviewed 14 of 14 files at r2.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @gitlw and @martinmr)
worker/draft.go, line 188 at r2 (raw file):
return err } else if !servesTablet { return fmt.Errorf("group 1 should always serve reserved predicate %s",
I'm seeing this error sometimes.
Can you check 3 things for me:
- Do we do something to ensure in Zero that reserved predicates are assigned group 1?
- That /moveTablet endpoint would fail for them?
- That Zero does not accidentally move them during auto-rebalancing?
Do these before merging this change.
worker/groups.go, line 163 at r2 (raw file):
if tablet, err := g.Tablet(pred); err != nil { failed = true glog.V(1).Infof("Error while getting tablet for pred %s: %v", pred, err)
glog.Errorf
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 6 of 16 files reviewed, 4 unresolved discussions (waiting on @gitlw and @manishrjain)
worker/draft.go, line 188 at r2 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
I'm seeing this error sometimes.
Can you check 3 things for me:
- Do we do something to ensure in Zero that reserved predicates are assigned group 1?
- That /moveTablet endpoint would fail for them?
- That Zero does not accidentally move them during auto-rebalancing?
Do these before merging this change.
- This is already being done in ShouldServe.
2 and 3 were not happening. I opened Ensure reserved predicates cannot be moved. #3137 to fix these issues.
worker/groups.go, line 163 at r2 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
glog.Errorf
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 6 of 16 files reviewed, 4 unresolved discussions (waiting on @gitlw and @manishrjain)
worker/draft.go, line 188 at r2 (raw file):
Previously, martinmr (Martin Martinez Rivera) wrote…
- This is already being done in ShouldServe.
2 and 3 were not happening. I opened Ensure reserved predicates cannot be moved. #3137 to fix these issues.
Done. Merged #3137 so we should be good to go now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 10 of 10 files at r3, 1 of 1 files at r4.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @gitlw, @manishrjain, and @martinmr)
worker/groups.go, line 163 at r4 (raw file):
if tablet, err := g.Tablet(pred); err != nil { failed = true glog.Errorf("Error while getting tablet for pred %s: %v", pred, err)
I'd use pred %q
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 16 of 17 files reviewed, 4 unresolved discussions (waiting on @gitlw and @manishrjain)
worker/groups.go, line 163 at r4 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
I'd use
pred %q
.
Done.
…aph-io#3091) This change makes queries read-only in the sense that they won't ask zero to serve any predicates. Same change that was reverted due to a bug detected by running "dgraph increment". The bug has been fixed by asking zero what group is serving a particular tablet without requesting to serve that tablet. This preserves both correctness and the read-only property of queries.
This change makes queries read-only in the sense that they won't ask
zero to serve any predicates.
Same change that was reverted due to a bug detected by running "dgraph
increment". The bug has been fixed by asking zero what group is serving
a particular tablet without requesting to serve that tablet. This
preserves both correctness and the read-only property of queries.
I verified "dgraph increment" is working as expected with this change.
This change is