[PoC] Run GPU workload in Gardener cluster and provide a concept for enabling GPU in Kyma Runtime #18771
Comments
Progress update: I was able to build and run the NVIDIA drivers using a fork of https://github.com/gardenlinux/gardenlinux-nvidia-installer.
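A quick way to confirm that a built driver is actually running on a node is to look for the file the NVIDIA kernel module exposes at `/proc/driver/nvidia/version` while it is loaded. The minimal Python sketch below assumes that path. The helper names are hypothetical and are not part of the gardenlinux-nvidia-installer fork.

```python
"""Sketch: check whether the NVIDIA kernel driver is loaded on a node.

Assumption: the loaded NVIDIA kernel module exposes its version under
/proc/driver/nvidia/version. Function names here are hypothetical.
"""
from typing import Optional

NVIDIA_PROC_VERSION = "/proc/driver/nvidia/version"


def driver_version_line(path: str = NVIDIA_PROC_VERSION) -> Optional[str]:
    # Return the first line of the version file, or None when the file is
    # absent (driver not loaded) or cannot be read.
    try:
        with open(path, encoding="utf-8") as f:
            return f.readline().strip()
    except OSError:
        return None


def driver_loaded(path: str = NVIDIA_PROC_VERSION) -> bool:
    # The driver counts as loaded if its version file can be read.
    return driver_version_line(path) is not None


if __name__ == "__main__":
    print("NVIDIA driver loaded:", driver_loaded())
```

A check like this could serve as a readiness condition before GPU workloads are scheduled on the node.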
License analysis: The drivers are not distributed with Garden Linux because of the NVIDIA license. The statements in the license clearly say that … Given that, I would rather avoid distributing the driver in Docker images. We can protect the images with a pull secret, but our users have access to that image pull secret, so we cannot fully control who can access and download the image. Even then, that approach would only be suitable for our own teams; we cannot redistribute the drivers to external customers.

Recommendation: I suggest building a Kyma module that downloads, compiles, and installs the driver when needed. The DaemonSet can be based on a Garden Linux Docker image that contains all the kernel header files required for compilation.
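The "when needed" part of the recommendation can be sketched as an idempotency check. A compiled kernel module only matches the kernel it was built against, so the installer can skip compilation if it has already built the configured driver version for the running kernel. The marker format `<kernel-release>/<driver-version>`, the function names, and the driver version string are all hypothetical.

```python
"""Sketch: decide whether the driver must be (re)compiled on a node.

Assumption: the installer records what it last built in a marker string
of the form "<kernel-release>/<driver-version>". The marker format and the
function names are hypothetical.
"""
import platform
from typing import Optional


def marker_for(kernel_release: str, driver_version: str) -> str:
    # A kernel module is only valid for the kernel it was compiled against,
    # so the marker includes the exact kernel release.
    return f"{kernel_release}/{driver_version}"


def needs_install(kernel_release: str, driver_version: str,
                  installed_marker: Optional[str]) -> bool:
    # No marker means nothing has been built on this node yet.
    if installed_marker is None:
        return True
    # Rebuild when the kernel or the configured driver version changed.
    return installed_marker.strip() != marker_for(kernel_release, driver_version)


if __name__ == "__main__":
    kernel = platform.release()  # same value as `uname -r` on Linux
    # Hypothetical driver version; use whatever the module is configured for.
    print(needs_install(kernel, "550.54.14", None))  # True: nothing built yet
```

Keying on the kernel release rather than only on the OS version keeps the check correct if a kernel update ships without a visible OS version bump.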
Another idea from @a-thaler:
Based on the outcomes, we agreed to establish a new tutorial-like sample application, https://github.com/kyma-project/gpu-driver, that can be deployed manually.
On an OS upgrade, the application restarts and automatically applies the proper driver for the new OS version. In the mid-term, the tool will evolve into a Kyma module.
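Detecting an OS upgrade after the restart can be sketched as comparing the current OS version with the version the driver was last applied for. This Python sketch assumes the version comes from the `VERSION_ID` key in `/etc/os-release` and that the last applied version is persisted somewhere (storage not shown). The function names and the sample version numbers are illustrative only.

```python
"""Sketch: detect an OS upgrade by comparing the current OS version with
the version the driver was last applied for.

Assumptions: the OS version is read from the VERSION_ID key of
/etc/os-release; where the last applied version is stored is not shown.
"""
from typing import Dict, Optional


def _unquote(value: str) -> str:
    # Strip one pair of matching quotes; shell-style escapes inside the
    # value are not handled in this sketch.
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
        return value[1:-1]
    return value


def parse_os_release(text: str) -> Dict[str, str]:
    # /etc/os-release consists of KEY=value lines; blank lines and
    # comments starting with '#' are skipped.
    result: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        result[key.strip()] = _unquote(value.strip())
    return result


def os_upgraded(os_release_text: str, applied_version: Optional[str]) -> bool:
    # Treat a missing VERSION_ID, or a version different from the one the
    # driver was applied for, as "the driver must be (re)applied".
    current = parse_os_release(os_release_text).get("VERSION_ID")
    return current is None or current != applied_version
```

Usage: on startup, the application would call `os_upgraded(open("/etc/os-release").read(), last_applied)` and only fetch and apply a new driver when it returns `True`.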
Challenges:
- During the upgrade phase, multiple DaemonSets must be running, and the DaemonSets must keep long-running pods. This is not needed and is a security concern.

Proposal: shift to a simple operator which …
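The operator proposal is cut off above, so the following is only an assumption about one way such an operator could avoid long-running pods. The operator compares each node's reported OS image and kernel version with the installs it has already completed, then schedules a short-lived installer Job only for nodes that still need one. The node fields mirror the Kubernetes `NodeSystemInfo` fields `osImage` and `kernelVersion`. Everything else, including the names, is hypothetical.

```python
"""Sketch of a reconcile step for a hypothetical driver operator.

Instead of one long-running DaemonSet pod per node (and several DaemonSets
side by side during an upgrade), the operator decides per node whether a
one-shot installer Job is needed. Only osImage/kernelVersion correspond to
real Kubernetes NodeSystemInfo fields; the rest is illustrative.
"""
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class NodeInfo:
    name: str
    os_image: str        # status.nodeInfo.osImage
    kernel_version: str  # status.nodeInfo.kernelVersion


# (node name, OS image, kernel version) of a completed driver install
InstallKey = Tuple[str, str, str]


def key_of(node: NodeInfo) -> InstallKey:
    return (node.name, node.os_image, node.kernel_version)


def nodes_needing_install(nodes: Iterable[NodeInfo],
                          completed: Set[InstallKey]) -> List[str]:
    # A node needs a (new) installer Job when no install has completed for
    # its current OS image and kernel, for example right after an OS upgrade.
    return sorted(n.name for n in nodes if key_of(n) not in completed)
```

Because the installer runs as a Job that terminates after the driver is applied, no pod has to stay running between upgrades.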
Users want to run their applications on GPUs. To execute code that requires a GPU, the proper drivers must be installed on the node. Investigate what is needed and propose a concept for automating this process. These are the aspects to cover: