e2e: TestBackupAndRestore failed to restore seed etcd member #1825
Comments
Actually the e2e-testing logs show that the BackupAndRestoreTest test failed.
I see.
Some further debugging suggests the restore operator was gone for a period:
Actually it might not be related to the restore operator restarting. The restart was possibly caused by disruptive_test, so this is a flake that we need to reproduce.
Flake-testing Jenkins job:
Reproducible.
When reproducing the bug, I found that the issue comes from DNS resolution. Digging further, I found that the pod spec isn't right -- it doesn't have the check-dns init container! Digging into the code shows that this is somehow overridden in addRecoveryToPod(): etcd-operator/pkg/util/k8sutil/k8sutil.go, lines 212 to 214 in 27bf4f8.
Let's fix this first and also verify whether it fixes the bug.
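For reference, a minimal sketch of the suspected bug and the obvious fix, assuming the override assigns the pod's init containers wholesale; the signature is simplified and the surrounding logic is illustrative, not the operator's actual code:

```go
// Sketch of the suspected bug in addRecoveryToPod(): assigning the
// recovery init container list outright drops whatever init containers
// (e.g. check-dns) the pod spec already has. Field names follow the
// Kubernetes core/v1 API; everything else is illustrative.
package k8sutil

import v1 "k8s.io/api/core/v1"

func addRecoveryToPod(pod *v1.Pod, recovery v1.Container) {
	// Buggy version: overwrites the existing init containers,
	// so check-dns disappears from the restored member's pod:
	// pod.Spec.InitContainers = []v1.Container{recovery}

	// Fixed version: keep the existing init containers (check-dns)
	// and append the recovery container alongside them.
	pod.Spec.InitContainers = append(pod.Spec.InitContainers, recovery)
}
```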
Found the root cause. Very weird:
Actually I suspect the
The other type of frequent failure happens when scaling up from 1 to 2 members:
Regarding the restore failure, here is the analysis. Logs for the failed restore:
In snapshot_command.go's VerifyBootstrap(): note that the compare behavior is different in etcd 3.3:
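For context, a hedged sketch of the kind of check VerifyBootstrap() performs, assuming it validates the restoring member's name and peer URL against the --initial-cluster flag; the helper below is hypothetical and only illustrates why a stricter comparison in etcd 3.3 could reject a configuration that an earlier, looser comparison accepted:

```go
package main

import (
	"fmt"
	"strings"
)

// verifyBootstrap is a hypothetical stand-in for snapshot_command.go's
// VerifyBootstrap(): it checks that the restoring member's name appears
// in --initial-cluster and that its advertised peer URL matches. An
// exact string compare (stricter, as suggested for etcd 3.3) fails in
// cases a more lenient compare would let through.
func verifyBootstrap(name, peerURL, initialCluster string) error {
	for _, ent := range strings.Split(initialCluster, ",") {
		kv := strings.SplitN(ent, "=", 2)
		if len(kv) != 2 {
			return fmt.Errorf("malformed --initial-cluster entry %q", ent)
		}
		if kv[0] == name {
			if kv[1] != peerURL { // exact compare; the stricter behavior
				return fmt.Errorf("peer URL mismatch: %q != %q", kv[1], peerURL)
			}
			return nil
		}
	}
	return fmt.Errorf("member %q not found in initial cluster", name)
}

func main() {
	err := verifyBootstrap("etcd-0",
		"http://etcd-0.etcd.default.svc:2380",
		"etcd-0=http://etcd-0.etcd.default.svc:2380")
	fmt.Println(err) // <nil>
}
```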
Fixed: #1875 (comment)
jenkins job: https://jenkins-etcd.prod.coreos.systems/view/operator/job/etcd-operator-e2eslow-pr/1138/console
log:
This means the e2e-testing pod's status.phase is not Succeeded.
But the e2e-testing pod's log seems fine:
https://jenkins-etcd.prod.coreos.systems/view/operator/job/etcd-operator-e2eslow-pr/lastSuccessfulBuild/artifact/_output/logs/e2e-testing.e2e-testing.log
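For reference, a minimal client-go sketch of the kind of gate that would produce this symptom, assuming the test runner polls the pod's status.phase until it terminates and fails the job on anything other than Succeeded; the namespace and pod name below are placeholders, and the client-go calls use the current API:

```go
package main

import (
	"context"
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// waitForPodSucceeded polls the pod's status.phase. The job fails when
// the phase ends up Failed (or the poll times out) even if the pod's
// own log output looks clean, matching the symptom described above.
func waitForPodSucceeded(cs *kubernetes.Clientset, ns, name string) error {
	for i := 0; i < 60; i++ {
		pod, err := cs.CoreV1().Pods(ns).Get(context.TODO(), name, metav1.GetOptions{})
		if err != nil {
			return err
		}
		switch pod.Status.Phase {
		case corev1.PodSucceeded:
			return nil
		case corev1.PodFailed:
			return fmt.Errorf("pod %s/%s failed", ns, name)
		}
		time.Sleep(10 * time.Second)
	}
	return fmt.Errorf("timed out waiting for pod %s/%s", ns, name)
}

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	cs, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}
	// "e2e-testing" namespace/pod names are placeholders.
	if err := waitForPodSucceeded(cs, "e2e-testing", "e2e-testing"); err != nil {
		panic(err)
	}
}
```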