The "managedFields" isn't restored #5701

Closed
ywk253100 opened this issue Dec 16, 2022 · 7 comments

@ywk253100
Contributor

When restoring data from a backup, Velero doesn't restore the managedFields that existed in the original data.

managedFields records the history of which controller/tool is responsible for each field/list entry in the object, and this history is used to determine the result of the next operation when using server-side apply patches, as described in https://kubernetes.io/docs/reference/using-api/server-side-apply/
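
As a rough illustration (the object and field names here are just an example), an object created with kubectl server-side apply carries an entry with kubectl as the field manager, roughly like this:

kubectl get cm example -o yaml --show-managed-fields
...
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:data:
        f:some_key: {}
    manager: kubectl
    operation: Apply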

Not only does Velero intentionally drop this valuable information, but after restoring an object Velero itself becomes the manager of all fields, preventing any other controller from deleting fields or removing list entries.

In order for the system to work properly when using server-side apply after a restore, Velero needs to restore the managedFields that existed in the original objects (removing any trace of Velero itself).

Vote on this issue!

This is an invitation to the Velero community to vote on issues; you can see the project's top-voted issues listed here.
Use the "reaction smiley face" at the top right of this comment to vote.

  • 👍 for "I would like to see this bug fixed as soon as possible"
  • 👎 for "There are more important bugs to focus on right now"
@reasonerjt
Contributor

@ywk253100
Thanks for the write-up.
We should also double-check Velero's current choice to clear most of a resource's metadata.

Maybe we should keep most of the fields by default and only clean up the ones that cause problems.
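
As a rough sketch of that idea (the split below is an assumption for illustration, not a description of Velero's current behavior), the metadata that must be cleared on restore is mostly cluster-assigned state, while managedFields is a candidate to carry over from the backup:

metadata:
  creationTimestamp: ...   # cluster-assigned, regenerated on restore
  resourceVersion: ...     # cluster-assigned, regenerated on restore
  uid: ...                 # cluster-assigned, regenerated on restore
  managedFields: [...]     # ownership history, candidate to preserve from the backup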

@ywk253100 ywk253100 modified the milestones: 1.10.1, v1.11 Jan 4, 2023
@reasonerjt
Contributor

#2416 was also trying to address the problem with managedFields

@ywk253100
Contributor Author

Not restoring managedFields causes an issue when patching objects with server-side apply:

  1. Create a configmap (an example manifest is sketched after these steps):
kubectl apply -f configmap.yaml --server-side
  2. Back up the configmap.
  3. Delete the configmap and restore it.
  4. After the restore, the manager becomes velero-server:
kubectl get cm cm3 -o yaml --show-managed-fields
apiVersion: v1
data:
  player_initial_lives: "4"
  ui_properties_file_name: user-interface.properties
kind: ConfigMap
metadata:
  creationTimestamp: "2023-01-17T08:39:35Z"
  labels:
    velero.io/backup-name: my-backup-02
    velero.io/restore-name: my-restore-01
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:data:
        .: {}
        f:player_initial_lives: {}
        f:ui_properties_file_name: {}
      f:metadata:
        f:labels:
          .: {}
          f:velero.io/backup-name: {}
          f:velero.io/restore-name: {}
    manager: velero-server
    operation: Update
    time: "2023-01-17T08:39:35Z"
  name: cm3
  namespace: default
  resourceVersion: "28857226"
  uid: c5a2dac4-c3b3-4925-908f-5cd415901f3e
  5. Modify the configmap and apply it again; the apply fails with a conflict error:
kubectl apply -f workload/configmap.yaml --server-side
error: Apply failed with 1 conflict: conflict with "velero-server" using v1: .data.player_initial_lives
Please review the fields above--they currently have other managers. Here
are the ways you can resolve this warning:
* If you intend to manage all of these fields, please re-run the apply
  command with the `--force-conflicts` flag.
* If you do not intend to manage all of the fields, please edit your
  manifest to remove references to the fields that should keep their
  current managers.
* You may co-own fields by updating your manifest to match the existing
  value; in this case, you'll become the manager if the other manager(s)
  stop managing the field (remove it from their configuration).
See https://kubernetes.io/docs/reference/using-api/server-side-apply/#conflicts
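
For completeness, a configmap.yaml matching the output above could look like the following (reconstructed from the data shown, so the exact values are only an example):

apiVersion: v1
kind: ConfigMap
metadata:
  name: cm3
  namespace: default
data:
  player_initial_lives: "4"
  ui_properties_file_name: user-interface.properties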

@sbueringer

It would be nice to get this fixed. This can lead to significant issues with Cluster API (we use server-side apply very extensively).

@ywk253100
Contributor Author

@sbueringer The plan is to fix this in v1.10, which should be consumed by the TKG H release.
We didn't see any issues during testing after restoration (upgrade/scale/delete of WL clusters) in the TKG G release. And per my understanding, the recommended way for controllers is to always "force" conflicts, so they should not be impacted by this issue. Correct me if I'm wrong.

@sbueringer

sbueringer commented Feb 13, 2023

The plan is to fix this in v1.10, which should be consumed by the TKG H release.

Sounds good!

And per my understanding, the recommended way for controllers is to always "force" conflicts, so they should not be impacted by this issue.

This is correct and Cluster API uses force conflict everywhere.

The issue is a bit more nuanced.

Consider the following example:

  • A ClusterClass-based cluster has been created
    • Just as an example, we have similar effects with legacy clusters
  • The Cluster API controller creates MachineDeployments for the cluster
    • The Cluster API controller now owns ~ all fields of the MachineDeployment
  • Velero backup
  • Reset mgmt cluster
  • Velero restore
  • Initially, "velero-server" owns ~ all fields of the MachineDeployment
  • After a reconcile of the Cluster API controller, the Cluster API controller (field manager "capi-topology") now also owns ~ all fields of the MachineDeployment
  • Important to note: the fields are now owned by "velero-server" and "capi-topology"
  • Now let's assume a user wants to unset an annotation on the MD or an optional field like "nodeDrainTimeout"
  • The user removes the corresponding config from Cluster.spec.topology
  • The Cluster API controller reconciles and doesn't set the annotation or the optional field anymore
  • Usually at this point the annotation / field would be dropped, but because they are also owned by "velero-server" they are just kept.

So, tl;dr: Velero backup/restore leads to co-ownership of all fields by "velero-server" and "capi-topology". It's still possible for the Cluster API controller to change fields (that's the part where force conflicts is relevant), but it is impossible to unset fields (including labels/annotations).
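
To make the co-ownership concrete, the restored MachineDeployment ends up with two managedFields entries claiming overlapping fields, conceptually something like this (abridged and purely illustrative):

managedFields:
- manager: velero-server
  operation: Update
  fieldsV1:
    f:metadata:
      f:annotations:
        f:example-annotation: {}
- manager: capi-topology
  operation: Apply
  fieldsV1:
    f:metadata:
      f:annotations:
        f:example-annotation: {}

When capi-topology stops applying the annotation, server-side apply only drops capi-topology's claim; the annotation itself stays because velero-server still owns it.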

If a user wants to correct this, they essentially have to either clean up the managed fields manually or overwrite all fields with different values so that, on the next reconcile, the Cluster API controller can get sole ownership of the fields.
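
One way to do that manual cleanup (this approach is described in the server-side apply documentation linked above; the resource and name here are just an example) is to strip the managedFields with a non-apply patch, after which the next apply from the controller re-establishes clean ownership:

kubectl patch machinedeployment my-md --type=merge -p '{"metadata":{"managedFields":[{}]}}'

Overwriting managedFields with a list containing a single empty entry removes all existing entries; setting it to an empty list would not.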

@ywk253100
Contributor Author

Fixed by #5853
