kubernetes integration #3

Open: wants to merge 1 commit into master

@boniek83 commented Apr 13, 2022

This is an implementation of per-pod GPU monitoring from #1.

The example in example/kubernetes assumes that rdc already contains the rdc_prometheus_py patch (of course, you can just ADD a prepatched rdc_prometheus_py in the Dockerfile if you want to test it right now).

You need to build the container image, push it to your container image repository, and modify a few things in the rdc.yaml file: the location of both container images, the nodeSelector (to match the label of the worker nodes that contain AMD GPUs), and the podresources-api volume location (in my case it was on the host machine).
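
For anyone wondering why the podresources-api volume matters: the kubelet's pod resources API reports which device IDs are assigned to each container, and that is what makes per-pod attribution possible. Below is a rough sketch of that mapping step, not the code from this PR. It assumes the `List` response has already been converted to a plain dict that follows the field names of the v1 pod resources proto (`pod_resources`, `containers`, `devices`, `resource_name`, `device_ids`). The resource name `amd.com/gpu`, the function name `map_gpus_to_pods`, and the sample data are assumptions for illustration only.

```python
from typing import Dict, Tuple

# Assumption: the resource name under which AMD GPUs are advertised.
AMD_GPU_RESOURCE = "amd.com/gpu"


def map_gpus_to_pods(
    response: dict, resource_name: str = AMD_GPU_RESOURCE
) -> Dict[str, Tuple[str, str, str]]:
    """Return {device_id: (namespace, pod, container)} for one resource type.

    `response` is a dict shaped like a pod resources List response.
    """
    owners: Dict[str, Tuple[str, str, str]] = {}
    for pod in response.get("pod_resources", []):
        for container in pod.get("containers", []):
            for device in container.get("devices", []):
                # Skip devices of other resource types (NICs, other vendors).
                if device.get("resource_name") != resource_name:
                    continue
                for device_id in device.get("device_ids", []):
                    owners[device_id] = (
                        pod["namespace"],
                        pod["name"],
                        container["name"],
                    )
    return owners


# Hypothetical, hand-written sample; real device IDs will look different.
sample_response = {
    "pod_resources": [
        {
            "name": "trainer-0",
            "namespace": "ml",
            "containers": [
                {
                    "name": "main",
                    "devices": [
                        {"resource_name": "amd.com/gpu", "device_ids": ["gpu-a", "gpu-b"]},
                        {"resource_name": "example.com/nic", "device_ids": ["nic-0"]},
                    ],
                }
            ],
        },
        {"name": "web-0", "namespace": "default", "containers": [{"name": "nginx"}]},
    ]
}

gpu_owners = map_gpus_to_pods(sample_response)
```

With a mapping like this, an exporter could attach namespace, pod, and container labels to each GPU metric it already collects.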

Rdc and rdc_prometheus.py don't have to run inside Kubernetes for this to work; running them there just makes for an easier example.

Tested and working in production on Kubernetes 1.21. Example output:
https://gist.github.com/boniek83/7eaefe7f46edad1ef28046118c354c17
