Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider defaulting 'force' to false with Helm v3 #3798

Closed
TBBle opened this issue Mar 6, 2020 · 14 comments · Fixed by #4513
Closed

Consider defaulting 'force' to false with Helm v3 #3798

TBBle opened this issue Mar 6, 2020 · 14 comments · Fixed by #4513

Comments

@TBBle
Copy link
Contributor

TBBle commented Mar 6, 2020

Expected behavior

skaffold dev successfully updates a Helm v3-deployed cluster when code is changed on-disk.

Actual behavior

skaffold dev fails to update a Helm v3-deployed cluster when code is changed on-disk, because it passes --force to Helm, which leads to the following failures for services with dynamic (i.e. empty in the YAML) clusterIP values:

Error: UPGRADE FAILED: failed to replace object: Service "project-skaffold-service" is invalid: spec.clusterIP: Invalid value: "": field is immutable

skaffold is bouncing off helm/helm#7082, the mega-ticket around the difference between --force in Helm v2 and Helm v3. See particularly helm/helm#7082 (comment) and a bunch of other references to 'immutable'.

Given the change in behaviour of helm --force, and until something like helm/helm#7082 introduces helm --recreate or similar (the equivalent of helm --force under Helm v2) I'd suggest either defaulting skaffold --force to off for Helm v3, or ignoring it until helm/helm#7082 is resolved.

Information

(Mostly redacted to remove comments and local project details)

apiVersion: skaffold/v1
kind: Config
build:
  tagPolicy:
    gitCommit:
      variant: Tags
  artifacts:
    - image: project-repo.registry.wargaming.net/service/skaffold
      context: ../service
      docker:
        dockerfile: Dockerfile
deploy:
  helm:
    releases:
      - name: project-skaffold
        chartPath: install/helm/project
        # We commit our subcharts, including requirements.lock and external charts.
        # This flag lets skaffold monitor the charts/ directory for updates.
        skipBuildDependencies: true
        imageStrategy:
          helm:
            explicitRegistry: true
        values:
          service.image: project-repo.registry.wargaming.net/service/skaffold

Steps to reproduce the behavior

  1. Working with our local project, but it should work in any Helm project that contains Services
  2. skaffold dev
  3. Modify the code, so that skaffold redeploys
  4. Failure observed.
  5. Control-C skaffold
  6. Repeat at step 1

Successful flow:

  1. Working with our local project, but it should work in any Helm project that contains Services
  2. skaffold dev --force=false
  3. Modify the code, so that skaffold redeploys
  4. Redeploy successful, work continues.
@balopat
Copy link
Contributor

balopat commented Mar 25, 2020

Thanks for opening @TBBle and for the detailed analysis!

I came to the same conclusion on #3864:

Workaround
skaffold dev --force=false

Next steps

  • We should figure out how to handle this more gracefully on our end - the change in --force between Helm v2/v3 should be reflected
  • we should follow proposal: handle immutable resources on update helm/helm#7082 and other conversations around this so that we can align with the right semantics on Skaffold's end

@GavinOsborn
Copy link

A quick +1 from me. We've run into identical issues that @TBBle reports.

This becomes even more frustrating if; like us; you use Cloud Code for VSCode as I'm not aware of a way to use the --force=false workaround described.

@apapia
Copy link

apapia commented Apr 30, 2020

Same issue here. Is there away to workaround with Cloud Code for Intellij to use --force=false?

@apapia
Copy link

apapia commented May 4, 2020

In case someone else runs into this, I found you can set the flag via the SKAFFOLD_FORCE env var and that the Cloud Code for Intellij plugin allows you to configure env vars in the deployment configuration.

@joberdick
Copy link

This does feel wrong to default to force. Can we atleast get a way to over-ride via skaffold.yaml?

I dont want to have to tell all of our dev teams to add the --force=false command. If we have it exposed in the skaffold.yaml that we can let folks copy, its less of pain on the end user.

@tejal29
Copy link
Member

tejal29 commented Jun 15, 2020

@TBBle You add the --force flag right now to the helm config flags.install config like this.

deploy:
  helm:
    releases:
      - name: skaffold-helm
    flags:
      install: ["--force"]

Would this work? We can definitely move this config to deploy section and add a more verbose Force config key.

@TBBle
Copy link
Contributor Author

TBBle commented Jun 15, 2020

No, other way round: skaffold is adding --force implicitly, and it doesn't work with Helm v3. So it should be off be default.

If anyone wants to use it they could pass --force on the command line like we have to pass --force=false now. It still won't work for most non-trivial charts charts, but the option already exists in case someone finds a use for it, or has a chart with no immutable fields.

It's not totally clear why skaffold set this in the first place; it was a pain point with Helm v2 too, but not a blocker.

I don't really want to have to put force: false in every skaffold.yaml I have, and have to carry those options (and relevant comments) forever.

@nkubala
Copy link
Contributor

nkubala commented Jul 21, 2020

hey all, I'm looking into this one now. based on the timeline for EOLing Helm 2, I'm inclined to simply change the default here, but I'm trying to understand and repro the issue before I do.

the flag was originally added in #1568, which was addressing an issue where deployments were being overwritten everytime with skaffold deploy - a valid issue, but this was only happening with kubectl, NOT with helm. helm might have done the same thing, but I think the right way to fix this would have been to have skaffold deploy always issue a helm install instead of a helm upgrade if it found an existing release. so, this PR may have fixed one issue but caused another.

looking at how things are now, it seems like with the release of helm 3 this did indeed start breaking people, based on the tracking issue on the helm repo linked in here. i've tried to repro this with no success though - the only way I can get skaffold to give me the Invalid value: "": field is immutable error is by explicitly setting clusterIP: "" in my service. setting clusterIP: None or just excluding it from the YAML altogether causes skaffold to work just fine in skaffold dev. @TBBle @GavinOsborn do you all set clusterIP: "" in your service yaml?

additionally, when I set it to the empty string, changing the default value of force in skaffold actually doesn't fix the issue - I get failures either way. it's possible I have a bad repro here, but maybe someone can chime in or give me a sample service yaml to play with?

➜  helm version
version.BuildInfo{Version:"v3.1.1", GitCommit:"afe70585407b420d0097d07b21c47dc511525ac8", GitTreeState:"clean", GoVersion:"go1.13.8"}
➜  skaffold version
v1.12.0-80-g82c503c98-dirty

@andrewhertog
Copy link
Contributor

@nkubala I'm running the following:

❯ helm version
version.BuildInfo{Version:"v3.2.4", GitCommit:"0ad800ef43d3b826f31a5ad8dfbb4fe05d143688", GitTreeState:"dirty", GoVersion:"go1.14.3"}
❯ skaffold version
v1.12.0

In troubleshooting this with some of my colleagues, we discovered that the issue was in helm 3.2 not in helm 3.1.1.
To reproduce, try with the bitnami/postgres helm chart and the default values. Run skaffold dev and then once its running and loaded, force a chart reload by changing one of the chart values.
The error only shows on a helm upgrade --force where the clusterIP value is set on cluster, but not set in the helm chart due to how the helm patching works for services.
helm/helm#6378 (comment) has a good description of the issue, but unfortunately the solution given there doesn't really solve the issue. From further discussion in the thread, its the patching that occurs when --force is provided that currently causes the issue.

While my PR (#4513 ) hopefully resolves this issue, its also written with the thought that we should never be assuming that extra flags are required. To me this doesn't seem like a good precedent to set, and easily leads to confusion. It makes more sense to be able to add additional flags.
For instance with the helm upgrade example, for skaffold dev there is no way to disable the --false flag, due to the assumption that it is always required on dev. I also don't think that the --force flag being true/set should be the default in any case (my personal opinion)

@GavinOsborn
Copy link

Hey @nkubala,
I definitely don't set clusterIP in our services, no.
I'd be happy to try and put together a simple repro tonight.

For the absence of doubt, you are triggering a reload in your testing?
skaffold dev worked fine for me on the first build and deploy, it was rebuilds triggered by a change that caused this error.

@TBBle
Copy link
Contributor Author

TBBle commented Jul 22, 2020

I'm not setting explicit clusterIP, but it shouldn't be needed to reproduce this. The forced-update via PUT requires reading back the current version of the resource, modifying it, and the writing it, so the clusterIP: "" ends up there because it's there on the live version.

See helm/helm#7082 (comment) and kubernetes/kubernetes#11237.

My initial report was with Helm 3.0.2. I'm not sure why Helm 3.1.1 wouldn't have had the same issue. It's possible that at some point, helm upgrade --force was skipping unchanged objects, in which case you'd have to change the Service itself to trigger the error. I don't think that was the case in my initial repro, I believe it was just rebuilding the container image, but that was a while ago.

@ifitzpat
Copy link

I've tried different versions of helm (3.0.3, 3.1.3, and 3.2.4) and all of them has the same issue. As above, using --force functions as a workaround.

@nkubala
Copy link
Contributor

nkubala commented Jul 22, 2020

ok, did some experimentation. after trying everything to repro on v3.1.1, I couldn't do it, so I upgraded to v3.2.4...and I immediately see the issue. no idea what changed there, but it seems like they may have inadvertently fixed the issue for one patch release and reverted it. so it's safe to say it still exists today.

I was also able to confirm that removing --force does fix the issue just like everyone here has already said 😄 so I think we should just get @andrewhertog's PR merged and be done with it for now.

I completely agree that we shouldn't be requiring explicit flags to run skaffold. this is something I've heard called the "happy path" by someone around here and it really resonates with me - you should be able to run skaffold dev on your project and it should just work. in most casess, when we're not supporting this in skaffold, it's likely because we don't fully understand the use case; we're not trying to make developers lives harder :) so thanks for sticking with us on this issue, but if there are other places where we're requiring you to set flags in order to to run and you think you're running a standard workflow, please comment on the issue and we'll try our best to understand and accomodate you.

@nkubala nkubala self-assigned this Jul 22, 2020
@GavinOsborn
Copy link

I was worried this could get bogged down in unfortunate back compat conversations.

Great outcome 👏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

10 participants