🐛 Use namespace and name to match PVB to Pod restore #3051
Here is the sample workload, from Bridget, that gives us a repro: https://gist.github.com/zubron/dce63c10b5aea026f988b8a14d2934c5#file-app-yaml
Thanks for fixing this, @ashish-amarnath! Just a couple of minor suggestions on test names, but otherwise this looks good. It also looks like some other restore tests need to be updated following this change.
Could I get a summary of what the issue was? Were PVBs being assigned to the wrong pod?
@nrb I wrote up my findings for this issue in these two comments: #3027 (comment) and #3027 (comment)
To summarize the bug and the fix: PVBs were previously matched to a pod by pod name only. The fix here is to use both the pod name and its namespace to find the PodVolumeBackups that pertain to the pod. I'm looking at the CI failures now.
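As a rough illustration of the matching described in this comment, here is a minimal Go sketch. The types `PodRef` and `PodVolumeBackup` and the function `volumeBackupsForPod` are simplified, hypothetical stand-ins written for this example; they are not Velero's actual types or signatures.

```go
package main

import "fmt"

// PodRef is a hypothetical, simplified reference to the pod a backup was taken for.
type PodRef struct {
	Namespace string
	Name      string
}

// PodVolumeBackup is a hypothetical, simplified stand-in for a PVB.
type PodVolumeBackup struct {
	Pod        PodRef
	Volume     string
	SnapshotID string
}

// volumeBackupsForPod returns a map of volume name to snapshot ID for the
// PVBs that belong to the given pod. A PVB is selected only when BOTH the
// pod name and the pod namespace match; comparing the name alone would also
// select PVBs of a same-named pod in another namespace.
func volumeBackupsForPod(pvbs []PodVolumeBackup, podNamespace, podName string) map[string]string {
	volumes := make(map[string]string)
	for _, pvb := range pvbs {
		if pvb.Pod.Name != podName || pvb.Pod.Namespace != podNamespace {
			continue
		}
		volumes[pvb.Volume] = pvb.SnapshotID
	}
	return volumes
}

func main() {
	// Two pods share the name "app-0" but live in different namespaces.
	pvbs := []PodVolumeBackup{
		{Pod: PodRef{Namespace: "ns-a", Name: "app-0"}, Volume: "data", SnapshotID: "snap-a"},
		{Pod: PodRef{Namespace: "ns-b", Name: "app-0"}, Volume: "data", SnapshotID: "snap-b"},
	}
	fmt.Println(volumeBackupsForPod(pvbs, "ns-a", "app-0"))
	fmt.Println(volumeBackupsForPod(pvbs, "ns-b", "app-0"))
}
```

With a name-only comparison, both PVBs above would match either pod and the later one would overwrite the earlier entry for the `data` volume; the combined check keeps them apart.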
Signed-off-by: Ashish Amarnath <[email protected]>
Thanks!
In vmware-tanzu#3051, we introduced an additional check to ensure that a PVB matched a particular pod by checking both the name and the namespace of the pod. This caused an issue when using a namespace mapping on restore.

In the case where a namespace mapping is being used, the check for whether a PVB matches a particular pod will fail as the PVB was created for the original pod namespace and is not aware of the new namespace mapping being used. This resulted in PVRs not being created for pods that were being restored into new namespaces. The restic init containers were being created to wait on the volume restore, however this would cause the restored pods to block indefinitely as they would be waiting for a volume restore that was not scheduled.

To fix this, we use the original namespace of the pod from the backup to match the PVB to the pod being restored, not the new namespace where the pod is being restored into.

Fixes vmware-tanzu#3467.

Signed-off-by: Bridget McErlean <[email protected]>
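The interaction between the namespace mapping and the PVB match described in this commit message can be sketched as follows. This is a minimal illustration under stated assumptions: `PodVolumeBackup`, `targetNamespace`, and `matchingVolumes` are hypothetical names invented for this example, and the namespaces `prod` and `staging` are made up; none of this is Velero's actual code.

```go
package main

import "fmt"

// PodVolumeBackup is a hypothetical, simplified stand-in for a PVB. It records
// the namespace and name of the pod as they were at backup time.
type PodVolumeBackup struct {
	PodNamespace string
	PodName      string
	Volume       string
}

// targetNamespace applies a restore namespace mapping (original -> new).
// Namespaces without a mapping entry are restored under their original name.
func targetNamespace(original string, mapping map[string]string) string {
	if target, ok := mapping[original]; ok {
		return target
	}
	return original
}

// matchingVolumes returns the volumes of all PVBs whose recorded pod
// namespace and name equal the given values.
func matchingVolumes(pvbs []PodVolumeBackup, namespace, name string) []string {
	var volumes []string
	for _, pvb := range pvbs {
		if pvb.PodNamespace == namespace && pvb.PodName == name {
			volumes = append(volumes, pvb.Volume)
		}
	}
	return volumes
}

func main() {
	pvbs := []PodVolumeBackup{{PodNamespace: "prod", PodName: "db-0", Volume: "data"}}
	mapping := map[string]string{"prod": "staging"}

	original := "prod"
	target := targetNamespace(original, mapping)

	// Matching on the mapped namespace finds nothing, so no volume restore
	// would be scheduled and the restored pod would wait indefinitely.
	fmt.Println("matched with target namespace:", len(matchingVolumes(pvbs, target, "db-0")))

	// Matching on the namespace recorded in the backup finds the PVB.
	fmt.Println("matched with original namespace:", len(matchingVolumes(pvbs, original, "db-0")))
}
```

The pod is still created in the mapped namespace; only the lookup of its PVBs uses the namespace from the backup.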
* Use pod namespace from backup when matching PVBs

  In #3051, we introduced an additional check to ensure that a PVB matched a particular pod by checking both the name and the namespace of the pod. This caused an issue when using a namespace mapping on restore.

  In the case where a namespace mapping is being used, the check for whether a PVB matches a particular pod will fail as the PVB was created for the original pod namespace and is not aware of the new namespace mapping being used. This resulted in PVRs not being created for pods that were being restored into new namespaces. The restic init containers were being created to wait on the volume restore, however this would cause the restored pods to block indefinitely as they would be waiting for a volume restore that was not scheduled.

  To fix this, we use the original namespace of the pod from the backup to match the PVB to the pod being restored, not the new namespace where the pod is being restored into.

  Fixes #3467.

  Signed-off-by: Bridget McErlean <[email protected]>

* Explain why the namespace mapping can't be used

  Signed-off-by: Bridget McErlean <[email protected]>
* Restore CAPI cluster objects in a better order

  Restoring CAPI workload clusters without this ordering caused the capi-controller-manager code to panic, resulting in an unhealthy cluster state. This can be worked around (https://community.pivotal.io/s/article/5000e00001pJyN41611954332537?language=en_US), but we provide the inclusion of these resources as a default in order to provide a better out-of-the-box experience.

  Signed-off-by: Nolan Brubaker <[email protected]>

* Add changelog

  Signed-off-by: Nolan Brubaker <[email protected]>

* Use pod namespace from backup when matching PVBs (#3475)

* Use pod namespace from backup when matching PVBs

  In #3051, we introduced an additional check to ensure that a PVB matched a particular pod by checking both the name and the namespace of the pod. This caused an issue when using a namespace mapping on restore.

  In the case where a namespace mapping is being used, the check for whether a PVB matches a particular pod will fail as the PVB was created for the original pod namespace and is not aware of the new namespace mapping being used. This resulted in PVRs not being created for pods that were being restored into new namespaces. The restic init containers were being created to wait on the volume restore, however this would cause the restored pods to block indefinitely as they would be waiting for a volume restore that was not scheduled.

  To fix this, we use the original namespace of the pod from the backup to match the PVB to the pod being restored, not the new namespace where the pod is being restored into.

  Fixes #3467.

  Signed-off-by: Bridget McErlean <[email protected]>

* Explain why the namespace mapping can't be used

  Signed-off-by: Bridget McErlean <[email protected]>

* Allow Dockerfiles to be configurable (#3634)

  For internal builds of Velero, we need to be able to specify an alternative Dockerfile which uses an alternative image registry to pull the base images from.

  This change adapts our Makefile such that both the main Dockerfile and build image Dockerfile can be overridden. We have some special handling for the build image to only build when the Dockerfile has changed. In this case, we check whether a custom Dockerfile has been provided, and always rebuild in that case. For custom build image Dockerfiles, use a fixed tag rather than the one based on commit SHA of the original file.

  Signed-off-by: Bridget McErlean <[email protected]>

* Combine CRD install verification into 1 job, and update k8s versions (#3448)

* Validate CRDs against latest Kubernetes versions

  Add Kubernetes v1.19 and v1.20 series images, and consolidate the job into a single file to reduce repetition.

  Signed-off-by: Nolan Brubaker <[email protected]>

* Ignore job if the changes are only site/design

  Signed-off-by: Nolan Brubaker <[email protected]>

* Fix codespell error

  Signed-off-by: Nolan Brubaker <[email protected]>

* Cache Velero binary for reuse on workers

  This will cache the Velero binary based on the PR number and a SHA256 of the generated binary. This way, the runners testing each version of Kubernetes do not need to build it independently.

  Signed-off-by: Nolan Brubaker <[email protected]>

* Fix GitHub event access

  Signed-off-by: Nolan Brubaker <[email protected]>

* Wrap output path in quotes

  Signed-off-by: Nolan Brubaker <[email protected]>

* Move code checkout to build step

  Signed-off-by: Nolan Brubaker <[email protected]>

* Also cache go modules

  Signed-off-by: Nolan Brubaker <[email protected]>

* Fix syntax issues

  Signed-off-by: Nolan Brubaker <[email protected]>

* Download cached binary on each node

  Signed-off-by: Nolan Brubaker <[email protected]>

* Use cached go modules on main CI

  Signed-off-by: Nolan Brubaker <[email protected]>

* Add changelog for v1.5.4

  Signed-off-by: Bridget McErlean <[email protected]>

Co-authored-by: Nolan Brubaker <[email protected]>
Signed-off-by: Ashish Amarnath [email protected]
Fixes: #3027
Fixes: #2691
Fixes: #2956
@zubron Thanks for the debugging and repro example.