updating AMD bootc image #767
yevgeny-shnaidman
commented
Aug 26, 2024
- using the out-of-tree (OOT) driver and firmware instead of the in-tree ones
- moving to DKMS and ROCm version 6.1.2
1) using OOT driver and firmware instead of in-tree 2) moving to DKMS and ROCM 6.1.2 version Signed-off-by: Yevgeny Shnaidman <[email protected]>
0f7a8a9 to 062841f
/cc @fabiendupont
The multi-stage build has too many stages. During the installation of the `amdgpu-dkms` package, the modules are built and installed in `/lib/modules/${KERNEL_VERSION}`. If the package is installed in the `driver-toolkit` image, the extra dependencies are very limited.

This change removes the `source` stage and installs the `amdgpu-dkms` package on top of `driver-toolkit`. The `amdgpu-dkms` package installs the modules in `/lib/modules/${KERNEL_VERSION}/extra`, and these are the only modules in that folder. The `amdgpu-dkms-firmware` package is installed as a dependency of `amdgpu-dkms` and installs the firmware files in `/lib/firmware/updates/amdgpu`. So, this change removes the in-tree `amdgpu` modules and firmware, then copies the ones generated by DKMS in the `builder` stage.

The change also moves the repository definitions to the `repos.d` folder and adds the AMD public key to verify the signatures of the AMD RPMs.

Users call a wrapper script named `ilab` that hides the `instructlab` container image and the command line options. This change copies the file from `nvidia-bootc` and adjusts the logic. The main change is that the `/dev/kfd` and `/dev/dri` devices are passed to the container, instead of `nvidia.com/gpu=all`. The `ilab` wrapper is copied into the `amd-bootc` image. The Makefile is also modified to reflect these changes.

Signed-off-by: Fabien Dupont <[email protected]>
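For readers who have not seen the wrapper, the device-passing change described above might look roughly like the sketch below. This is not the `ilab` script from the PR: the function names `amd_device_flags` and `ilab_command`, the `CONTAINER_IMAGE` variable, and the choice to invoke `ilab` inside the container are assumptions made for illustration. The image default is taken from the `INSTRUCTLAB_IMAGE` argument in the Containerfile under review. The sketch only prints the `podman` command rather than running it.

```shell
# Hypothetical sketch of the AMD variant of the wrapper logic; not the PR's script.

# Emit the podman flags that expose the AMD GPU to the container.
# /dev/kfd is the compute interface and /dev/dri holds the render nodes;
# these replace the nvidia.com/gpu=all device used by the NVIDIA wrapper.
amd_device_flags() {
    printf '%s' "--device /dev/kfd --device /dev/dri"
}

# Print the full podman command line for the given ilab arguments.
# CONTAINER_IMAGE is an assumed override variable, not one defined by the PR.
ilab_command() {
    image="${CONTAINER_IMAGE:-quay.io/ai-lab/instructlab-amd:latest}"
    echo "podman run --rm -it $(amd_device_flags) ${image} ilab $*"
}
```

Calling `ilab_command --help` prints the command a real wrapper would execute, which makes it easy to compare against the NVIDIA version before wiring it into the image.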
Instead of commenting here, I submitted a pull request in @yevgeny-shnaidman's repository: yevgeny-shnaidman#1.
…vers Remove source stage for amd-bootc
LGTM
ARG INSTRUCTLAB_IMAGE="quay.io/ai-lab/instructlab-amd:latest"
ARG BASEIMAGE="quay.io/centos-bootc/centos-bootc:stream9"
ARG DRIVER_TOOLKIT_IMAGE="quay.io/ai-lab/nvidia-builder:latest"
Does this make sense? Using nvidia-builder on amd-bootc?
This is a driver-toolkit image, which contains all the packages needed for kernel module compilation. So it is tied to a kernel version and does not contain any packages that might be specific to NVIDIA. The name is unfortunate, but this is what we have right now.
We have discussed moving the bootc images out of the ai-lab-recipes repository. One of the changes would be to have a quay.io/<org>/driver-toolkit:<kernel-version> image. The driver toolkit images could be hosted in the quay.io/fedora and quay.io/centos orgs.
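To make the proposed naming concrete, here is a small sketch of how a build script could derive such a reference. The helper name `driver_toolkit_image` is hypothetical, and the scheme itself is only a proposal in this thread, so no such images are assumed to exist yet.

```shell
# Hypothetical helper: compose a driver-toolkit image reference following the
# proposed quay.io/<org>/driver-toolkit:<kernel-version> scheme.
# $1 = quay.io organization (e.g. centos or fedora)
# $2 = kernel version; defaults to the running kernel reported by uname -r
driver_toolkit_image() {
    org="$1"
    kernel_version="${2:-$(uname -r)}"
    echo "quay.io/${org}/driver-toolkit:${kernel_version}"
}
```

The result could then be passed to the build as `--build-arg DRIVER_TOOLKIT_IMAGE="$(driver_toolkit_image centos)"`, replacing the current `nvidia-builder` default.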
/lgtm