Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delete tablets which don't belong #3051

Merged
merged 8 commits into from
Feb 21, 2019
Merged

Delete tablets which don't belong #3051

merged 8 commits into from
Feb 21, 2019

Conversation

manishrjain
Copy link
Contributor

@manishrjain manishrjain commented Feb 20, 2019

Tablet deletion:

  • With recent changes, when a predicate is moved from one group to another, the tablet was left behind in the source group.
  • This PR adds a way for Zero to instruct the source group to delete the predicate after successfully transferring ownership to the destination group.
  • This PR also removes cleanupTablets method, so the source group does not make the decision of deleting tablets.
  • Rollup operation now does the size calculation per tablet and sends it out to Zero.
  • We no longer do a periodic calculateTabletSize anymore.
  • When Zero receives a tablet size update, after applying the proposal, it determines which tablets need to be deleted, and sends an instruction to Alpha leader accordingly.

Potential issue: Considering the job of tablet deletion happens just after leader rollup, it is possible that the followers would still be doing their rollups when the tablet deletion happens. In that case, a race condition might happen where the deleted tablet could resurface because of roll-up operation (the key is read, then clean predicate deletes it, then the key is written back). In the future, we could cancel a roll-up operation if we determine it could cause a race condition like this.

This change is Reviewable

@manishrjain manishrjain marked this pull request as ready for review February 21, 2019 01:36
Copy link
Contributor

@martinmr martinmr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Some comments but nothing major. Feel free to merge after looking at them.

Reviewed 5 of 5 files at r1.
Reviewable status: 4 of 5 files reviewed, 6 unresolved discussions (waiting on @manishrjain and @martinmr)


dgraph/cmd/zero/tablet.go, line 155 at r1 (raw file):

	span.Annotate(nil, msg)

	in.DestGid = 0 // Indicates deletion for the predicate.

// Indicates deletion of predicate in the source group.


dgraph/cmd/zero/zero.go, line 644 at r1 (raw file):

	pl := s.Leader(gid)
	if pl == nil {
		return x.Errorf("Unable to reach leader of group: %d.", gid)

Remove period at the end.


worker/draft.go, line 856 at r1 (raw file):

			val, _ = m.LoadOrStore(pk.Attr, sz)
		}
		sz := val.(*int64)

is sz supposed to stand for size? If so, I think renaming it to size is clearer.


worker/draft.go, line 903 at r1 (raw file):

				pred := key.(string)
				space := val.(*int64)
				sz := atomic.LoadInt64(space)

Same here regarding sz vs size.


worker/groups.go, line 140 at r1 (raw file):

	glog.Infof("Server is ready")

	gr.closer = y.NewCloser(3) // Match CLOSER:1 in this file.

Is this change supposed to be here?


worker/predicate_move.go, line 182 at r1 (raw file):

	}
	if in.DestGid == 0 {
		glog.Infof("Got instructed to delete tablet: %v", in.Predicate)

"Was instructed"

@manishrjain manishrjain changed the title Code to delete predicates which don't belong. Delete tablets which don't belong Feb 21, 2019
@manishrjain manishrjain merged commit 5b8c880 into master Feb 21, 2019
@manishrjain manishrjain deleted the mrjn/delete-tables branch February 21, 2019 22:49
manishrjain added a commit that referenced this pull request Feb 21, 2019
Tablet deletion:

- With recent changes, when a predicate is moved from one group to another, the tablet was left behind in the source group.
- This PR adds a way for Zero to instruct the source group to delete the predicate after successfully transferring ownership to the destination group.
- This PR also removes `cleanupTablets` method, so the source group does not make the decision of deleting tablets.
- Rollup operation now does the size calculation per tablet and sends it out to Zero.
- We no longer do a periodic `calculateTabletSize` anymore.
- When Zero receives a tablet size update, after applying the proposal, it determines which tablets need to be deleted, and sends an instruction to Alpha leader accordingly.

Potential issue: Considering the job of tablet deletion happens just after leader rollup, it is possible that the followers would still be doing their rollups when the tablet deletion happens. In that case, a race condition might happen where the deleted tablet could resurface because of roll-up operation (the key is read, then clean predicate deletes it, then the key is written back). In the future, we could cancel a roll-up operation if we determine it could cause a race condition like this.

Changes:
* Code to delete predicates which don't belong.
* Test and fix the predicate delete logic.
* Remove cleanup tablets. This is now coordinated by Zero.
* Use rollup to calculate the tablet sizes. That way, we don't iterate over the data periodically.
* Address Martin's comments
dna2github pushed a commit to dna2fork/dgraph that referenced this pull request Jul 19, 2019
Tablet deletion:

- With recent changes, when a predicate is moved from one group to another, the tablet was left behind in the source group.
- This PR adds a way for Zero to instruct the source group to delete the predicate after successfully transferring ownership to the destination group.
- This PR also removes `cleanupTablets` method, so the source group does not make the decision of deleting tablets.
- Rollup operation now does the size calculation per tablet and sends it out to Zero.
- We no longer do a periodic `calculateTabletSize` anymore.
- When Zero receives a tablet size update, after applying the proposal, it determines which tablets need to be deleted, and sends an instruction to Alpha leader accordingly.

Potential issue: Considering the job of tablet deletion happens just after leader rollup, it is possible that the followers would still be doing their rollups when the tablet deletion happens. In that case, a race condition might happen where the deleted tablet could resurface because of roll-up operation (the key is read, then clean predicate deletes it, then the key is written back). In the future, we could cancel a roll-up operation if we determine it could cause a race condition like this.

Changes:
* Code to delete predicates which don't belong.
* Test and fix the predicate delete logic.
* Remove cleanup tablets. This is now coordinated by Zero.
* Use rollup to calculate the tablet sizes. That way, we don't iterate over the data periodically.
* Address Martin's comments
@rstorr
Copy link

rstorr commented Oct 2, 2020

@CodeLingoBot capture error messages shouldn't end in a period

@codelingo
Copy link

codelingo bot commented Oct 2, 2020

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants