Skip to content
This repository has been archived by the owner on Jul 11, 2023. It is now read-only.

Remove crd conversion webhook #5065

Merged

Conversation

nshankar13
Copy link
Contributor

@nshankar13 nshankar13 commented Sep 1, 2022

Signed-off-by: nshankar13 [email protected]

Description: Resolves issue: #4967

As discussed in the issue, the crd conversion webhook currently does not act differently than how k8s would update CRDs with the conversion strategy set to None. Typically, the conversionwebhook is necessary for schema changes between CRDs on upgrades (for instance, if the field foo was renamed to the field bar). However, we can handle these scenarios separately down the road if the need arises.

Testing done: Tested upgrading from v1.0.0 to a v1.2.1-test version built from this branch and the upgrades succeeded.

OSM successfully upgraded mesh [osm] in namespace [osm-system]

Affected area:

Functional Area
New Functionality [ ]
CI System [ ]
CLI Tool [ ]
Certificate Management [ ]
Control Plane [X]
Demo [ ]
Documentation [ ]
Egress [ ]
Ingress [ ]
Install [ ]
Networking [ ]
Observability [ ]
Performance [ ]
SMI Policy [ ]
Security [ ]
Sidecar Injection [ ]
Tests [ ]
Upgrade [X]
Other [ ]

Please answer the following questions with yes/no.

  1. Does this change contain code from or inspired by another project?

    • Did you notify the maintainers and provide attribution?
  2. Is this a breaking change?

  3. Has documentation corresponding to this change been updated in the osm-docs repo (if applicable)?

if err := crdconversion.NewConversionWebhook(ctx, crdClient, certManager, osmNamespace); err != nil {
events.GenericEventRecorder().FatalEvent(err, events.InitializationError, fmt.Sprintf("Error creating crd conversion webhook: %s", err))
}

version.SetMetric()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you remove the reconciler as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@steeling looks like we also add the reconcile label to the mutatingwebhook and validatingwebhook, should we handle the reconciler removal in a follow up PR? Or will removing the conversionwebhook while keeping the reconciler introduce bugs?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should delete it for all. it's broken on upgrades in that an old reconciler will stomp over any changes we try to make

@shashankram shashankram added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 1, 2022
Copy link
Member

@shashankram shashankram left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on a fix for the upgrade issue.

I want to ensure we address a couple of things before we make a decision on completely removing the CRD conversion framework.

  1. Does this actually fix the bug. If yes, kindly provide a repro for the issue and add a e2e test to prove this is the correct fix.
  2. Have a proposal on how CRD schema changes are to be handled. Completely removing this will only push this problem down the road for someone else that needs to update the CRD schema to worry about. I don't believe this is the correct approach to solving this problem.
  3. Instead of deleting the framework, add a feature flag to optionally turn it on, as it may be desirable to re-integrate this in the future. We do not know of how we can seamlessly handle CRD conversions when the API schema changes, so I am uncomfortable trading 1 bug fix without a concrete proposal to address future needs, which will arise.

Overall, I'd like to be more cautious on making such changes without a definitive way to repro and verify the fix while addressing future extensibility.

@steeling
Copy link
Contributor

steeling commented Sep 1, 2022

Thanks for working on a fix for the upgrade issue.

I want to ensure we address a couple of things before we make a decision on completely removing the CRD conversion framework.

  1. Does this actually fix the bug. If yes, kindly provide a repro for the issue and add a e2e test to prove this is the correct fix.

I have a repro for this, happy to demo it, although adding a test for this would be difficult (but not impossible). I think we should move forward with the fix now, and can circle back to an E2E test in the future.

For those wondering the steps are simple:

  1. Deploy OSM v1.0.0
  2. Scale the bootstrap pod to 0
  3. Upgrade to v1.2.0
  4. The new bootstrap pod fails to initialize
  5. Attempt to scale the old bootstrap pod back up
  6. Both the old and new bootstrap pod cannot start, and the upgrade fails.
  7. Rollout restart the injector and controller, and now all components are down and nothing can turn on.
  1. Have a proposal on how CRD schema changes are to be handled. Completely removing this will only push this problem down the road for someone else that needs to update the CRD schema to worry about. I don't believe this is the correct approach to solving this problem.

There's a proposal here with the full description

  1. Instead of deleting the framework, add a feature flag to optionally turn it on, as it may be desirable to re-integrate this in the future. We do not know of how we can seamlessly handle CRD conversions when the API schema changes, so I am uncomfortable trading 1 bug fix without a concrete proposal to address future needs, which will arise.

I disagree with adding a feature flag for this. Version control makes it trivial to bring it back if we desire, by simply reverting this PR. In the proposal I demonstrate how and why we won't need a conversion webhook, and that there are a lot of (unsafe) misconceptions about how the webhook works.

Overall, I'd like to be more cautious on making such changes without a definitive way to repro and verify the fix while addressing future extensibility.

@shashankram
Copy link
Member

Thanks for working on a fix for the upgrade issue.
I want to ensure we address a couple of things before we make a decision on completely removing the CRD conversion framework.

  1. Does this actually fix the bug. If yes, kindly provide a repro for the issue and add a e2e test to prove this is the correct fix.

I have a repro for this, happy to demo it, although adding a test for this would be difficult (but not impossible). I think we should move forward with the fix now, and can circle back to an E2E test in the future.

For those wondering the steps are simple:

  1. Deploy OSM v1.0.0
  2. Scale the bootstrap pod to 0
  3. Upgrade to v1.2.0
  4. The new bootstrap pod fails to initialize
  5. Attempt to scale the old bootstrap pod back up
  6. Both the old and new bootstrap pod cannot start, and the upgrade fails.
  7. Rollout restart the injector and controller, and now all components are down and nothing can turn on.
  1. Have a proposal on how CRD schema changes are to be handled. Completely removing this will only push this problem down the road for someone else that needs to update the CRD schema to worry about. I don't believe this is the correct approach to solving this problem.

There's a proposal here with the full description

  1. Instead of deleting the framework, add a feature flag to optionally turn it on, as it may be desirable to re-integrate this in the future. We do not know of how we can seamlessly handle CRD conversions when the API schema changes, so I am uncomfortable trading 1 bug fix without a concrete proposal to address future needs, which will arise.

I disagree with adding a feature flag for this. Version control makes it trivial to bring it back if we desire, by simply reverting this PR. In the proposal I demonstrate how and why we won't need a conversion webhook, and that there are a lot of (unsafe) misconceptions about how the webhook works.

Overall, I'd like to be more cautious on making such changes without a definitive way to repro and verify the fix while addressing future extensibility.

@steeling Thanks for sharing your proposal, looking at it now.

Also please share a repro script and link it to the bug if possible.

@shashankram shashankram self-requested a review September 1, 2022 21:20
@shashankram shashankram dismissed their stale review September 1, 2022 21:21

Unblocking this based on offline discussions.

@nshankar13 nshankar13 force-pushed the remove_conversion_webhook branch 5 times, most recently from 9f4c9ea to 9e4ab55 Compare September 2, 2022 05:43
@codecov-commenter
Copy link

codecov-commenter commented Sep 2, 2022

Codecov Report

Merging #5065 (8b06752) into main (ab69461) will increase coverage by 0.16%.
The diff coverage is 0.00%.

@@            Coverage Diff             @@
##             main    #5065      +/-   ##
==========================================
+ Coverage   68.83%   69.00%   +0.16%     
==========================================
  Files         214      204      -10     
  Lines       16025    15745     -280     
==========================================
- Hits        11031    10865     -166     
+ Misses       4934     4821     -113     
+ Partials       60       59       -1     
Flag Coverage Δ
unittests 69.00% <0.00%> (+0.16%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
cmd/osm-bootstrap/osm-bootstrap.go 45.36% <0.00%> (-0.89%) ⬇️
pkg/metricsstore/metricsstore.go 92.67% <ø> (-0.23%) ⬇️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@nshankar13 nshankar13 force-pushed the remove_conversion_webhook branch from 9e4ab55 to 278514c Compare September 2, 2022 06:07
dockerfiles/Dockerfile.osm-crds Outdated Show resolved Hide resolved
dockerfiles/Dockerfile.osm-crds Outdated Show resolved Hide resolved
@nshankar13 nshankar13 force-pushed the remove_conversion_webhook branch from 278514c to 37c6c3a Compare September 2, 2022 17:35
charts/osm/templates/osm-deployment.yaml Outdated Show resolved Hide resolved
cmd/osm-bootstrap/crds/patch_crds.sh Outdated Show resolved Hide resolved
cmd/osm-bootstrap/osm-bootstrap.go Outdated Show resolved Hide resolved
cmd/osm-bootstrap/osm-bootstrap.go Outdated Show resolved Hide resolved
cmd/osm-bootstrap/osm-bootstrap.go Outdated Show resolved Hide resolved
cmd/osm-bootstrap/osm-bootstrap.go Outdated Show resolved Hide resolved
cmd/osm-bootstrap/osm-bootstrap.go Outdated Show resolved Hide resolved
cmd/osm-bootstrap/osm-bootstrap.go Outdated Show resolved Hide resolved
cmd/osm-bootstrap/osm-bootstrap.go Show resolved Hide resolved
@nshankar13 nshankar13 force-pushed the remove_conversion_webhook branch 2 times, most recently from 30c9774 to 8b06752 Compare September 2, 2022 18:11
@nshankar13 nshankar13 force-pushed the remove_conversion_webhook branch from 8b06752 to c5ea9ed Compare September 2, 2022 18:31
@steeling steeling merged commit 68e99eb into openservicemesh:main Sep 2, 2022
nshankar13 added a commit to nshankar13/osm that referenced this pull request Sep 2, 2022
Remove crd conversion webhook (openservicemesh#5065)

Signed-off-by: nshankar13 <[email protected]>
nshankar13 added a commit to nshankar13/osm that referenced this pull request Sep 2, 2022
Remove crd conversion webhook (openservicemesh#5065)

Signed-off-by: nshankar13 <[email protected]>

Add health handler

Signed-off-by: nshankar13 <[email protected]>
nshankar13 added a commit to nshankar13/osm that referenced this pull request Sep 2, 2022
Remove crd conversion webhook (openservicemesh#5065)

Signed-off-by: nshankar13 <[email protected]>

Add health handler

Signed-off-by: nshankar13 <[email protected]>
keithmattix pushed a commit that referenced this pull request Sep 6, 2022
* [backport] cherry-pick 68e99eb to release-v1.1

Remove crd conversion webhook (#5065)

Signed-off-by: nshankar13 <[email protected]>

Add health handler

Signed-off-by: nshankar13 <[email protected]>

* health handler

Signed-off-by: nshankar13 <[email protected]>

* pin linter to v1.46.0

Signed-off-by: nshankar13 <[email protected]>

Signed-off-by: nshankar13 <[email protected]>
jsturtevant added a commit to jsturtevant/osm that referenced this pull request Sep 7, 2022
Signed-off-by: James Sturtevant <[email protected]>
keithmattix pushed a commit that referenced this pull request Sep 8, 2022
Signed-off-by: James Sturtevant <[email protected]>
whitneygriffith pushed a commit to whitneygriffith/osm that referenced this pull request Sep 12, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants