kubernetes integration #3

boniek83 · 2022-04-13T14:27:32Z

This is an implementation of the per-pod GPU monitoring requested in #1.
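Per-pod attribution generally depends on asking the kubelet which device IDs it assigned to which container. The join is sketched below under several assumptions. The kubelet PodResources `List` response is taken to be already fetched and converted to plain dicts; the real API is gRPC over the kubelet's pod-resources socket. The AMD device plugin is assumed to advertise GPUs under the resource name `amd.com/gpu`. The helper name `map_gpus_to_pods` and the sample device IDs are hypothetical, and this is not the code from this PR.

```python
from typing import Dict, List, Tuple

# Assumed resource name advertised by the AMD GPU device plugin.
AMD_GPU_RESOURCE = "amd.com/gpu"


def map_gpus_to_pods(pod_resources: List[dict]) -> Dict[str, Tuple[str, str, str]]:
    """Return {device_id: (namespace, pod, container)} for AMD GPUs.

    `pod_resources` mirrors the shape of a PodResources List response
    (pod_resources[].containers[].devices[]) converted to dicts.
    Hypothetical helper for illustration only.
    """
    mapping: Dict[str, Tuple[str, str, str]] = {}
    for pod in pod_resources:
        for container in pod.get("containers", []):
            for device in container.get("devices", []):
                if device.get("resource_name") != AMD_GPU_RESOURCE:
                    continue  # ignore devices from other plugins
                for device_id in device.get("device_ids", []):
                    mapping[device_id] = (pod["namespace"], pod["name"], container["name"])
    return mapping


# Example input; the device ID format is made up for illustration.
sample = [
    {
        "name": "trainer-0",
        "namespace": "ml",
        "containers": [
            {
                "name": "main",
                "devices": [{"resource_name": "amd.com/gpu", "device_ids": ["0000:03:00.0"]}],
            }
        ],
    }
]
print(map_gpus_to_pods(sample))
```

An exporter can then attach the returned namespace, pod and container names as labels on each GPU's metrics; GPUs missing from the map are simply not allocated to any pod.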

The example in example/kubernetes assumes that RDC already includes the rdc_prometheus_py patch. If you want to test it right now, you can instead just ADD a prepatched rdc_prometheus_py to the Dockerfile.

You need to build the container image, push it to your container image registry, and adjust the following in rdc.yaml:

- the locations of both container images;
- nodeSelector, so that it matches the label on the worker nodes that have AMD GPUs;
- the volume location for the podresources API (in my case it was a path on the host machine).
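The rdc.yaml adjustments described above can be sketched as a small function over the parsed manifest. This is a minimal sketch, assuming rdc.yaml is a DaemonSet-style object with two containers hypothetically named `rdc` and `rdc-prometheus` and a volume hypothetically named `pod-resources`; match these names to the real file. `/var/lib/kubelet/pod-resources` is the kubelet's usual pod-resources directory, but your cluster may differ. The function name, registry and node label are illustrative only.

```python
import copy

# Usual kubelet directory holding the PodResources gRPC socket (kubelet.sock).
POD_RESOURCES_DIR = "/var/lib/kubelet/pod-resources"


def customize_rdc_manifest(manifest: dict, rdc_image: str, exporter_image: str,
                           node_labels: dict, pod_resources_dir: str = POD_RESOURCES_DIR) -> dict:
    """Return a copy of the manifest with the site-specific fields set.

    Container and volume names are assumptions; this helper is hypothetical.
    """
    out = copy.deepcopy(manifest)  # leave the caller's manifest untouched
    pod_spec = out["spec"]["template"]["spec"]
    images = {"rdc": rdc_image, "rdc-prometheus": exporter_image}
    for container in pod_spec["containers"]:
        if container["name"] in images:
            container["image"] = images[container["name"]]
    # Schedule only onto worker nodes that carry the AMD GPU label.
    pod_spec["nodeSelector"] = dict(node_labels)
    # Point the podresources volume at the host directory.
    for volume in pod_spec.get("volumes", []):
        if volume["name"] == "pod-resources":
            volume["hostPath"] = {"path": pod_resources_dir}
    return out


base = {"spec": {"template": {"spec": {
    "containers": [{"name": "rdc", "image": "rdc:old"},
                   {"name": "rdc-prometheus", "image": "exporter:old"}],
    "volumes": [{"name": "pod-resources", "hostPath": {"path": "/tmp/old"}}],
}}}}
patched = customize_rdc_manifest(
    base,
    "registry.example.com/rdc:latest",
    "registry.example.com/rdc-prometheus:latest",
    {"example.com/amd-gpu": "true"},  # hypothetical node label
)
print(patched["spec"]["template"]["spec"]["nodeSelector"])
```

Working on a deep copy keeps the original template reusable for several clusters with different registries or node labels.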

RDC and rdc_prometheus.py do not have to run inside Kubernetes for this to work; running them there just makes the example easier to set up.

Tested and running in production on Kubernetes 1.21. Example output:
https://gist.github.com/boniek83/7eaefe7f46edad1ef28046118c354c17
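The exact output is in the gist above and is not reproduced here. Purely to illustrate what a per-pod labelled sample looks like in the Prometheus text exposition format, the sketch below formats one line with label-value escaping (backslash, double quote and newline, as the format requires). The metric name `rdc_gpu_util` and the label names are hypothetical and may not match what rdc_prometheus.py actually emits.

```python
def escape_label_value(value: str) -> str:
    # Prometheus text format: escape backslash first, then quote and newline.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def format_sample(metric: str, labels: dict, value: float) -> str:
    # Sort labels so the output is deterministic.
    label_str = ",".join(
        f'{name}="{escape_label_value(str(val))}"' for name, val in sorted(labels.items())
    )
    return f"{metric}{{{label_str}}} {value}"


print(format_sample("rdc_gpu_util", {"gpu_index": "0", "namespace": "ml", "pod": "trainer-0"}, 87.0))
# rdc_gpu_util{gpu_index="0",namespace="ml",pod="trainer-0"} 87.0
```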

Commit: 09c2851 kubernetes integration
