Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The latest Docker images are not pulled on tutor k8s #662

Closed
lpm0073 opened this issue May 12, 2022 · 10 comments
Closed

The latest Docker images are not pulled on tutor k8s #662

lpm0073 opened this issue May 12, 2022 · 10 comments

Comments

@lpm0073
Copy link

lpm0073 commented May 12, 2022

Bug description
When deploying to a Kubernetes cluster using v12.2.2, and with the mfe plugin having been previously enabled and the image subsequently rebuilt, the latest version of the mfe image is not deployed into the cluster. I'm using the following commands to enable the mfe plugin:

      pip install tutor-mfe
      tutor plugins enable mfe
      tutor config save --set MFE_DOCKER_IMAGE=765796256872.dkr.ecr.us-east-1.amazonaws.com/mfe:latest
      
      # more configuration stuff happens here
      # ......

      tutor k8s start
      tutor k8s init

However, the mfe pod seems to be ignored by tutor on subsequent redeployments regardless of the state of the pod relative to whatever image contains the "latest" tag. This behavior is different from that of the openedx image using the same command patterns, which will detect changes to the state of the image and will automatically replace the pod with the latest version of the image.

How to reproduce

Given the following in my AWS ECR repository, if the Kubernetes cluster is presently running a pod pulled from 13.1.5-202205120033 (the 2nd version of the image from the top) and I redeploy using the command above then I'm expecting that tutor will detect that the sha of the latest image is different from that inside the running pod and it will therefore replace the pod using the new, latest image.
Screen Shot 2022-05-12 at 6 54 16

Environment

Additional context

@regisb
Copy link
Contributor

regisb commented May 12, 2022

Hey @lpm0073,
This does not look like an issue with Tutor, but more like with Kubernetes. I suggest you close this issue and move the conversation to the forum: https://discuss.overhang.io/

Same for the following two issues from the ecommerce plugin:
overhangio/tutor-ecommerce#25
overhangio/tutor-ecommerce#26

And the mandatory reminder :) https://docs.tutor.overhang.io/troubleshooting.html

@lpm0073
Copy link
Author

lpm0073 commented May 15, 2022

checking in on this. I'm still researching, and have yet to find a suitable solution. big picture it seems that the pull policy that tutor creates for k8s needs to be modified so that deployment manifests specify to always check for changes to the sha of the latest tag. however, i'm uncertain of potential side affects of this -- hence the ongoing research.

@regisb
Copy link
Contributor

regisb commented May 16, 2022

Closing this for now.

@regisb regisb closed this as completed May 16, 2022
@lpm0073
Copy link
Author

lpm0073 commented May 31, 2022

this is still an open issue.

following some additional research, and consistent with your comments in your email from 12-may, Kubernetes does not re-pull a container as part of deployment. that being the case, tutor really should explicitly force the containers to be replaced bc otherwise a deployment (or more precisely, a re-deployment) is knowingly incomplete.

as a work-around i'm going to publish this Github Action -- https://github.com/lpm0073/aws-eks-delete-deployment -- that will delete a deployment in a namespace in a cluster, if it exists. We'll get the desired effect, for now, by calling this action immediately prior to deploying with tutor.

@regisb regisb changed the title tutor k8s init does not deploy latest build of mfe The latest Docker images are not pulled on tutor k8s Jun 7, 2022
@regisb
Copy link
Contributor

regisb commented Jun 7, 2022

To clarify: in your case, the latest images are not pulled because the imagePullPolicy is set to IfNotPresent (the default). The recommended solution is usually not to use "latest" version tags but instead to tag every image differently: https://kubernetes.io/docs/concepts/containers/images/#image-pull-policy

That being said, I agree that not automatically pulling the images might be confusing for Open edX operators. Should we change the imagePullPolicy to "Always" for Tutor-managed containers? (openedx, forum, etc.) I'd like to ask others about this: @fghaas @angonz what do you think?

@fghaas
Copy link
Contributor

fghaas commented Jun 7, 2022

I for one think the current imagePullPolicy makes sense. What we've been doing is every time we create a new custom image, we bump the tag (by appending -0, -1, -2 etc. to <something>_DOCKER_IMAGE). I think this is a good practice to follow, because it also enables easy rollback to a prior image if anything goes wrong with the most recently built one. You just reset the image reference to the prior one, and you're on your way.

@angonz
Copy link
Contributor

angonz commented Jun 7, 2022

I agree that using latest might lead to unexpected results, mostly when there are many image versions. Using explicit versions is much better and gives more control of what you deploy. @lpm0073 is there any reason why still using latest? Do you think that versioning images would fix this problem?

@regisb
Copy link
Contributor

regisb commented Jun 7, 2022

@fghaas @angonz I agree that using pinned versions is "better" from a platform management perspective. But that's not the question; what I would like to know is whether setting imagePullPolicy: Always would affect your current workflow in any way?

@fghaas
Copy link
Contributor

fghaas commented Jun 7, 2022

Well, it would certainly increase deployment times. The openedx image in particular is huge. So having to pull it every time you want to, say, ramp up capacity in a pinch with additional LMS or CMS containers would definitely slow things down (also, consider that people are getting interested in pod autoscaling; see #677).

And, it encourages bad practices.

And if we follow along the lines of several recent PRs (#622, #675), why should Tutor even concern itself with changing the pull policy when a plugin can always use kustomize to change it?

@regisb
Copy link
Contributor

regisb commented Jun 7, 2022

Well, it would certainly increase deployment times.

OK then I agree this is not acceptable. @lpm0073 I suggest that you follow @fghaas recommendation to tag every image with a different version.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants