Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use k8s cluster default imagePullPolicy #50

Open
arno01 opened this issue Aug 25, 2021 · 15 comments
Open

use k8s cluster default imagePullPolicy #50

arno01 opened this issue Aug 25, 2021 · 15 comments
Labels
repo/provider Akash provider-services repo issues

Comments

@arno01
Copy link

arno01 commented Aug 25, 2021

Reproducer

  1. build docker image "test:123" which would just "echo 123" and "docker push" it to the registry;
  2. create the deployment (send-manifest);
  3. get the logs from the provider (provider lease-logs);
  4. close the deployment
  5. repeat first 4 steps except that do some modification to the image in step 1, say "echo 555", but keep the same image tag "test:123".

Update: this is mostly about the image pull policy for the images with the :latest and untagged images as seen from below discussion. Other tags, such as test:123 should stay test:123 1:1 (immutable).

Expected behavior:

Provider should pull the new image if it has :latest tag or untagged.

Actual behavior:

The provider will not pull the new image, it will start the old image.

Workaround:

There is no workaround to a changed default K8s behavior.

Provider in question:

I haven't tested different providers.

"equinix-metal-ams1","akash","mn2-ng","https://provider.provider-0.prod.ams1.akash.pub:8443","akash14c4ng96vdle6tae8r4hc2w4ujwrsh3x9tuudk0"
@arno01
Copy link
Author

arno01 commented Aug 25, 2021

cc @dmikey

@boz
Copy link

boz commented Aug 25, 2021

@arno01, this is expected behavior. You need to use a different tag to deploy updates. Images are heavily cached, pushing changes to a single tag will cause both versions to be running very easily, even if we forced a pull every time.

@boz boz closed this as completed Aug 25, 2021
@arno01
Copy link
Author

arno01 commented Aug 29, 2021

@arno01, this is expected behavior. You need to use a different tag to deploy updates. Images are heavily cached, pushing changes to a single tag will cause both versions to be running very easily, even if we forced a pull every time.

@boz this should be mentioned in the Akash Docs then, since I've already seen two cases where people would not figure why their image is not working at one provider but does at the other one. 1 2

And then this breaks the default Kubernetes behavior for the :latest image tag, since it should always be re-pulling images of this tag:

  • omit the imagePullPolicy and use :latest as the tag for the image to use; Kubernetes will set the policy to Always.

Source: https://kubernetes.io/docs/concepts/containers/images/#updating-images

I would actually be more inclined to let people specify in their deployment manifests the imagePullPolicy in order to get more flexible about this.

And then, most of people are going to be using same tags, e.g. nginx:stable, nginx:mainline, nginx:latest, nginx:1.21 (can replace nginx practically with any image).
The issue I see with that (the providers are heavily caching the images) is that the security issues in the libraries of the images (nor the apps themselves in some cases, i.e. nginx:stable, nginx:1, nginx:latest) will not get pushed as people re-deploy their deployments should they get alerted by CVE reports.

@boz
Copy link

boz commented Aug 29, 2021

it is best practice to use a version tag. we can add that to the docs.

@boz
Copy link

boz commented Aug 30, 2021

It says as much right in your source

Caution:
You should avoid using the latest tag when deploying containers in production, as it is harder to track which version of the image is running and more difficult to roll back to a working version.

Instead, specify a meaningful tag such as v1.42.0.

@arno01
Copy link
Author

arno01 commented Aug 31, 2021

@boz

It says as much right in your source

Caution:
You should avoid using the latest tag when deploying containers in production, as it is harder to track which version of the image is running and more difficult to roll back to a working version.

Instead, specify a meaningful tag such as v1.42.0.

I agree with that recommendation, however the concern I've raised in this issue has nothing to do with that recommendation.

This is a recommendation for running the production containers which many of people do not.
Many (probably even the majority) are going to be using no tag which defaults to :latest. (People are usually modifying their images then pushing them again and over again, with :latest tag.)
(Edit: And this is not only about the :latest tag. See Update 1 below)

As I'm typing this, there is a 3rd occurrence I'm seeing people are struggling because of this and I'm having to explain them the Akash providers are heavily caching the images, they should use a new tag for every new image update.

My point is not to lobby for using the :latest tag but rather to fix the imagePullPolicy which is not currently set to Always for the untagged images (or tagged with :latest tag) on the Akash providers.
I suggest to stick with the default imagePullPolicy
otherwise there are more and more people will be hitting this issue, there is no doubt about it.

Update 1:
And then, the same applies to the other than :latest image tags.
There are plenty of projects leverage CI/CD for constantly building and re-pushing the same tags, when that happens, the libraries are getting updated (i.e. openssl library in the nginx container).
In this case, the image tag stays always same, just as I've already described above https://github.com/ovrclk/akash/issues/1354#issuecomment-907866119
Which leads to security issues when Akash provider caches these images and is not re-pulling them.

Basically this deviation from the defaults is not just a nuisance but rather a security issue in not having to re-pull the images. (simplest example: nginx:stable won't get re-pulled, therefore, users won't get security updates on the redeployment nor other people who make their first nginx:stable deployment will likely to receive a stale cached copy of nginx:stable [which may turn out to not be as :stable as it sounds ;)])

@zjor
Copy link

zjor commented Oct 16, 2021

100% agree with @arno01
I wish :latest worked out of the box

@arno01 arno01 reopened this Jun 11, 2022
@tidrolpolelsef
Copy link

My recommendation back when mainnet launched was to block :latest, :stable etc. The labels don't actually mean anything unfortunately. For example, there is no guaranteed that :latest is in fact the latest.

@andy108369
Copy link
Contributor

andy108369 commented Jul 14, 2022

For example, there is no guaranteed that :latest is in fact the latest.

It should guarantee that for the moment someone deploys the app, which isn't working that way as of now because of imagePullPolicy: IfNotPresent.
It won't mean it's the :latest some prolonged time from the moment someone deployed the app.

This is how K8s works and have always been working by default everywhere for the :latest tag, I've mentioned that above

I'd prefer sticking with the widely accepted defaults (for imagePullPolicy) rather than inventing something new.

And should we want :latest to actually mean they will always be :latest, we should probably look into implementing something like https://keel.sh which would automatically take care updating the images whenever there is a new behind the :latest tag. But there should be a toggle for that, i.e. services.<name>.image_auto_update: true for example.

As a compromise, the imagePullPolicy could be set via the SDL i.e. services.<name>.imagePullPolicy: always for example.

@andy108369
Copy link
Contributor

andy108369 commented Jul 14, 2022

To get the default image pull policy, we should only remove this line https://github.com/ovrclk/akash/blob/v0.16.4/provider/cluster/kube/builder/workload.go#L56

From the doc

When you first create a Deployment, StatefulSet, Pod, or other object that includes a Pod template, then by default the pull policy of all containers in that pod will be set to IfNotPresent if it is not explicitly specified. This policy causes the kubelet to skip pulling an image if it already exists.

So K8s sets PullIfNotPresent by default, just like we do, except that when we do it, we ignore these special conditions:

Default image pull policy
When you (or a controller) submit a new Pod to the API server, your cluster sets the imagePullPolicy field when specific conditions are met:

  • if you omit the imagePullPolicy field, and the tag for the container image is :latest, imagePullPolicy is automatically set to Always;
  • if you omit the imagePullPolicy field, and you don't specify the tag for the container image, imagePullPolicy is automatically set to Always;
  • if you omit the imagePullPolicy field, and you specify the tag for the container image that isn't :latest, the imagePullPolicy is automatically set to IfNotPresent.

https://kubernetes.io/docs/concepts/containers/images/#imagepullpolicy-defaulting

As @tidrolpolelsef said, it's one of those weird config options where 'unset' has its own unique meaning :-)

@andy108369 andy108369 changed the title provider would not pull the new image when the tag is same use k8s cluster default imagePullPolicy Jul 15, 2022
@andy108369
Copy link
Contributor

PR akash-network/provider#54

@andy108369
Copy link
Contributor

andy108369 commented Nov 28, 2022

FWIW, I've tested this further, apparently the HEAD requests are used for obtaining the image reference and they aren’t triggering the Docker Hub rate limiter upon Pod restart even with the imagePullPolicy: Always, as long as image reference haven't been updated on the remote. I.e., this triggers the rate limiter only when it actually pulls the new image, not just queries (HEAD) for the new one.

https://asciinema.org/a/541059

The Docker Hub API isn't too restrictive (100 pulls per 6 hours per IP address). Source

@anilmurty
Copy link

Wanted to update this thread after my discussion with @boz today:
Pros of supporting/ allowing users to use the "latest" tag in SDLs (and pulling a fresh uncached image when used or no tag specified):

  1. Allows autoupdating of container images in cases like presearch
  2. Allows us to potentially build a CI/CD style pipeline where new artifacts get deployed automatically to akash
  3. Allows easier integration with solutions like Fleek.co
  4. Is consistent with Kubernetes default behavior

For these reasons we are going to work to allow this. The only concern/ downside is making sure we document this behavior clearly and note that a running pod will not be updated unless the pod is terminated (forcing a new pod to be spun up, which will result in the "latest" latest image to be pulled down) - which I believe is how solutions like the presearch "autopdater" handle this anyway (their autoupdater is a fork of https://github.com/containrrr/watchtower)

@andy108369
Copy link
Contributor

andy108369 commented Nov 28, 2022

forcing a new pod to be spun up, which will result in the "latest" latest image to be pulled down

And this (the pod restart) can be easily triggered from within the pod itself or the outside (through lease-shell, then kill <child process of PID 1> inside the pod).
Either way, this can also be scheduled, etc

@vpavlin
Copy link

vpavlin commented Aug 19, 2024

Completely agree - this isn't just about latest, but any tag (test, dev, stage, rolling..) - I should have the freedom to be able to deploy from a tag and have deployed what image hash the tag is actually pointing to.

+100 for imagePullPolicy: always:)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
repo/provider Akash provider-services repo issues
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants