Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use default kubernetes imagePullPolicy #587

Merged
merged 3 commits into from
Nov 27, 2024

Conversation

yongfengdu
Copy link
Collaborator

@yongfengdu yongfengdu commented Nov 19, 2024

For latest tag, it should use Always.
For tags other than latest, it will use IfNotPresent.
https://kubernetes.io/docs/concepts/containers/images/#imagepullpolicy-defaulting

Description

The summary of the proposed changes as long as the relevant motivation and context.

Issues

#597

Type of change

List the type of change like below. Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds new functionality)
  • Breaking change (fix or feature that would break existing design and interface)

Dependencies

List the newly introduced 3rd party dependency if exists.

Tests

Describe the tests that you ran to verify your changes.

@yongfengdu yongfengdu added this to the v1.2 milestone Nov 19, 2024
@yongfengdu
Copy link
Collaborator Author

Target for 1.2-dev.
We won't fix this for 1.1 release because IfNotPresent is proper policy for released version.

@yongfengdu yongfengdu marked this pull request as draft November 21, 2024 01:51
@eero-t
Copy link
Contributor

eero-t commented Nov 21, 2024

Target for 1.2-dev. We won't fix this for 1.1 release because IfNotPresent is proper policy for released version.

That is the default when:

  • policy is not specified
  • tag is something else than latest
  • tag has not been updated to/from latest without re-deploying the pod
    • policy for pod is set when it's created and not modified after that, unless it's explicitly set/changed

@yongfengdu yongfengdu marked this pull request as ready for review November 25, 2024 03:10
Copy link
Collaborator

@mkbhanda mkbhanda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM .. but a question .. is it better to leave blank or specify "always" in image pull policy?

@lianhao
Copy link
Collaborator

lianhao commented Nov 25, 2024

agreed with @mkbhanda , I think we can make it looks nicer by:

in values.yaml, leave the image.pullPolicy as a comment with a little bit explanation, which gives the user some hints that it's configurable, i.e.

image:
  repostitory: opea/abc
  tag: "latest"
  # Uncomment the following line to set desired image pull policy if needed, as one of Always, IfNotPresent, Never. 
  # pullPolicy: ""

In the corresponding deployment.yaml, change the image policy settings as optional:

   ... ...
   {{- if .Values.image.pullPolicy }}
   imagePullPolicy: {{ .Values.image.pullPolicy }}
   {{- end }}

@eero-t
Copy link
Contributor

eero-t commented Nov 25, 2024

Looking at the CI errors...

  • audioqna/speecht5 tests: CI is using stale speecht5-gaudi image version (same as without the ImagePullPolicy fix):
  • charts-validate test: Helm repo missing from CI:
    • Error: no repository definition for [https://zilliztech.github.io/milvus-helm/.](https://zilliztech.github.io/milvus-helm/)
    • => Fix dependency, or add the missing repos via helm repo add
  • GMC test:
    • Response check failed, please check the logs in artifacts
    • => CI checks response before it has fully completed?
  • ChatQnA, TGI & vLLM tests:
    • Shard 0 failed to start
    • RuntimeError: synStatus=8 [Device not found] Device acquire failed
    • => CI Gaudi k8s device plugin error; pod resource request is satisfied, but device is missing?

@yongfengdu
Copy link
Collaborator Author

yongfengdu commented Nov 26, 2024

The helm lint fail is due to "helm dep build", refer to this:
helm/helm#8036
Will need to add the repo first as @eero-t mentioned, #611

@yongfengdu
Copy link
Collaborator Author

Checked the GMC CI chatqna.log, the output file is good with "[DONE]" ending. That's more like an issue from client side, or the "router" sevice of GMC.

  • GMC test:

    • Response check failed, please check the logs in artifacts
    • => CI checks response before it has fully completed?

Gaudi device conflict, rerun should solve this.

  • ChatQnA, TGI & vLLM tests:

    • Shard 0 failed to start
    • RuntimeError: synStatus=8 [Device not found] Device acquire failed
    • => CI Gaudi k8s device plugin error; pod resource request is satisfied, but device is missing?

@eero-t
Copy link
Contributor

eero-t commented Nov 26, 2024

Ok, the other CI issues are now fixed. The 2 remaining ones (audioqna + speecht5 on Gaudi) are both due to CI using stale opea/speecht5-gaudi:latest image with this issue:

 Traceback (most recent call last):
  File "/home/user/comps/tts/speecht5/dependency/speecht5_server.py", line 15, in <module>
    from comps import CustomLogger
  File "/home/user/comps/__init__.py", line 7, in <module>
    from comps.cores.proto.docarray import (
  File "/home/user/comps/cores/proto/docarray.py", line 10, in <module>
    from pydantic import Field, conint, conlist, field_validator
ImportError: cannot import name 'field_validator' from 'pydantic' (/usr/local/lib/python3.10/dist-packages/pydantic/__init__.cpython-310-x86_64-linux-gnu.so)

* Use kubernetes default imagePullPolicy.
Always for latest image and IfNotPresent for others.
* Use 0-latest as dev version for helm charts

Signed-off-by: Dolpher Du <[email protected]>
Signed-off-by: Dolpher Du <[email protected]>
Copy link
Collaborator

@lianhao lianhao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The speecht5-gaudi issue is reported at opea-project/GenAIComps#948, which is not related to this change.

@lianhao
Copy link
Collaborator

lianhao commented Nov 27, 2024

We'll let the PR merge in first even with issue opea-project/GenAIComps#948 to avoid huge CI tests cases involved in this PR, and a bunch of other PRs are pending on this one

@lianhao lianhao merged commit 0f21681 into opea-project:main Nov 27, 2024
64 of 66 checks passed
@yongfengdu yongfengdu deleted the pullpolicy branch November 27, 2024 06:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants