Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: can't recover after a upgrade failed #120

Merged
merged 28 commits into from
Oct 19, 2018

Conversation

xiaojingchen
Copy link
Contributor

@xiaojingchen xiaojingchen commented Oct 13, 2018

this pr fixes #119 issue.
In order to fix the issue and make codes more reasonable, this pr rewrited upgrade feature and changed some unit test case, so the code has changed a lot, but in principle it should only affect the upgrade.

btw the pr should be merge after pr #112
This PR also fixes #134
@weekface @tennix @onlymellb @gregwebs PTAL

@xiaojingchen
Copy link
Contributor Author

/run-e2e-tests

Copy link
Contributor

@weekface weekface left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is no need to forceUpgrade:

https://github.com/pingcap/tidb-operator/pull/120/files#diff-1f146ae94e5d78830633639d134919cdR44

The gracefulUpgrade can cover this all the situations.

return controller.RequeueErrorf("tidbcluster: [%s/%s]'s pd upgraded pod: [%s] are not ready", ns, tcName, podName)
}
continue
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove else

if tc.Status.PD.StatefulSet.CurrentReplicas == 0 {
return controller.RequeueErrorf("tidbcluster: [%s/%s]'s pd doesn't have old version pod to upgrade", ns, tcName)
if !templateEqual(newSet.Spec.Template, oldSet.Spec.Template) {
return nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Set Partition to the max

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And add a comment to indicate the reason why return nil.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the Partition default is max

if !tc.PDAllPodsStarted() {
return controller.RequeueErrorf("tidbcluster: [%s/%s]'s pd pods are not all created", ns, tcName)
}
setUpgradePartition(newSet, *oldSet.Spec.UpdateStrategy.RollingUpdate.Partition)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return err
}
return controller.RequeueErrorf("tidbcluster: [%s/%s]'s pd member: [%s] is transferring leader to pd member: [%s]", ns, tcName, upgradePodName, targetName)
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove else

upgradePodName := pdPodName(tcName, ordinal)
if tc.Status.PD.Leader.Name == upgradePodName {
var targetName string
if ordinal == *newSet.Spec.Replicas-1 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

define *newSet.Spec.Replicas-1 as a var:

lastOrdinal = *newSet.Spec.Replicas-1

@xiaojingchen
Copy link
Contributor Author

/run-e2e-tests

@xiaojingchen
Copy link
Contributor Author

/run-e2e-tests

Copy link
Contributor

@weekface weekface left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

podName := tikvPodName(tcName, i-1)
setUpgradePartition(newSet, *oldSet.Spec.UpdateStrategy.RollingUpdate.Partition)
for i := tc.Status.TiKV.StatefulSet.Replicas - 1; i >= 0; i-- {
store := tku.getStoreByOrdinal(tc, i)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The getStoreByOrdinal method should not range over tc.Status.TiKV.TombstoneStores . The reason is: #124

@xiaojingchen
Copy link
Contributor Author

/run-e2e-tests

@@ -84,11 +84,16 @@ func (tku *tikvUpgrader) Upgrade(tc *v1alpha1.TidbCluster, oldSet *apps.Stateful
return controller.RequeueErrorf("tidbcluster: [%s/%s]'s tikv pod: [%s] have not label: %s", ns, tcName, podName, apps.ControllerRevisionHashLabelKey)
}

if store == nil {
setUpgradePartition(newSet, i)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why set partition to i?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should return an error if store == nil

weekface
weekface previously approved these changes Oct 18, 2018
Copy link
Contributor

@weekface weekface left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@xiaojingchen
Copy link
Contributor Author

/run-e2e-tests

targetName = pdPodName(tcName, *newSet.Spec.Replicas-1)
revision, exist := pod.Labels[apps.ControllerRevisionHashLabelKey]
if !exist {
return controller.RequeueErrorf("tidbcluster: [%s/%s]'s pd pod: [%s] have not label: %s", ns, tcName, podName, apps.ControllerRevisionHashLabelKey)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/have not/has no/

return nil
}

func (pu *pdUpgrader) pdStatefulSetIsUpgrading(set *apps.StatefulSet, tc *v1alpha1.TidbCluster) (bool, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is unused and duplicated with the method in pd_member_manager.go

@@ -106,7 +129,7 @@ func TestPDUpgraderUpgrade(t *testing.T) {
changePods: nil,
transferLeaderErr: false,
errExpectFn: func(g *GomegaWithT, err error) {
g.Expect(err.Error()).To(Equal("tidbcluster: [default/upgrader]'s pd upgraded pods are not all ready"))
g.Expect(err.Error()).To(Equal(fmt.Sprintf("tidbcluster: [default/upgrader]'s pd upgraded pod: [%s] are not ready", pdPodName(upgradeTcName, 2))))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/are/is/

}
revision, exist := pod.Labels[apps.ControllerRevisionHashLabelKey]
if !exist {
return controller.RequeueErrorf("tidbcluster: [%s/%s]'s tidb pod: [%s] have not label: %s", ns, tcName, podName, apps.ControllerRevisionHashLabelKey)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/have not/has no/

return controller.RequeueErrorf("tidbcluster: [%s/%s]'s upgraded tikv pods are not all running", ns, tcName)
revision, exist := pod.Labels[apps.ControllerRevisionHashLabelKey]
if !exist {
return controller.RequeueErrorf("tidbcluster: [%s/%s]'s tikv pod: [%s] have not label: %s", ns, tcName, podName, apps.ControllerRevisionHashLabelKey)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/have not/has no/

return controller.RequeueErrorf("tidbcluster: [%s/%s]'s tikv pod: [%s] have not label: %s", ns, tcName, podName, apps.ControllerRevisionHashLabelKey)
}

if store == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this to after line 76 to avoid retrieving pod when store is nil.

}
}

return nil
return controller.RequeueErrorf("tidbcluster: [%s/%s] have not find store status of tikv pod: [%s]", ns, tcName, upgradePodName)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"no store status found for tikv pod"

@xiaojingchen
Copy link
Contributor Author

/run-e2e-tests

1 similar comment
@xiaojingchen
Copy link
Contributor Author

/run-e2e-tests

@xiaojingchen
Copy link
Contributor Author

/run-e2e-tests

@weekface weekface mentioned this pull request Oct 19, 2018
@xiaojingchen
Copy link
Contributor Author

/run-e2e-tests

Copy link
Member

@tennix tennix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tennix
Copy link
Member

tennix commented Oct 19, 2018

/run-e2e-tests

@xiaojingchen
Copy link
Contributor Author

/run-e2e-tests

@weekface weekface merged commit 71b0ecd into pingcap:master Oct 19, 2018
queenliuxx pushed a commit to queenliuxx/tidb-operator that referenced this pull request Dec 19, 2018
* tidb graceful upgrade

* fix upgrade state sync bug
yahonda pushed a commit that referenced this pull request Dec 27, 2021
Co-authored-by: Lilian Lee <[email protected]>
Co-authored-by: pingcap-github-bot <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

panic during startup can not upgrade after last upgrade failed
4 participants