NAS-129905 / 24.10 / Mark GPU as critical if it has devices which are in an iommu group which has a critical device #13976
Problem
To pass a GPU through to a VM, every IOMMU group that contains one of the GPU's devices must be isolated, together with all other devices in those groups, including devices that are not GPU-related.
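The rule above can be sketched as follows. On Linux, IOMMU group membership is exposed under `/sys/kernel/iommu_groups/<N>/devices/`, with one entry per PCI address. The helper names `read_iommu_groups` and `devices_to_isolate` are hypothetical and are not the middleware's actual API. This is a minimal illustration of which devices end up being isolated:

```python
import os


def read_iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map each IOMMU group number (as a string) to the set of PCI addresses in it.

    Each group is a directory <root>/<N>/devices/ containing one entry
    per PCI address, e.g. 0000:01:00.0.
    """
    groups = {}
    for group in os.listdir(root):
        devices_dir = os.path.join(root, group, "devices")
        if os.path.isdir(devices_dir):
            groups[group] = set(os.listdir(devices_dir))
    return groups


def devices_to_isolate(gpu_devices, groups):
    """Return every PCI address that shares an IOMMU group with any GPU device.

    Passthrough isolates whole groups, so non-GPU devices in those groups
    are included as well.
    """
    gpu = set(gpu_devices)
    to_isolate = set()
    for members in groups.values():
        if members & gpu:
            to_isolate |= members
    return to_isolate
```

For example, with `groups = {"1": {"0000:01:00.0", "0000:01:00.1"}, "2": {"0000:00:14.0"}}`, calling `devices_to_isolate(["0000:01:00.0"], groups)` returns both addresses in group 1, even though only one was requested.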
We already had validation that prevented isolating a GPU if any of its own devices were critical to the system (e.g. CPU, memory). However, that check misses the following scenario:
A GPU has a device that sits in an IOMMU group which also contains a critical device. Because none of the GPU's own devices is critical, the GPU can still be configured for passthrough, and the attempt to start the VM then results in a crash.
Solution
Mark a GPU as critical when any of its devices shares an IOMMU group with a critical device, so that such GPUs cannot be isolated in the first place. In addition, a critical reason is now reported that explains why the GPU has been marked as critical.
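A minimal sketch of the extended check, assuming the set of critical PCI addresses is already known and group membership is available as a mapping like the one in `/sys/kernel/iommu_groups`. The function name `gpu_critical_reason` and the wording of the reasons are hypothetical; this illustrates the idea and is not the code from this PR:

```python
def gpu_critical_reason(gpu_devices, groups, critical_devices):
    """Return None if the GPU may be isolated, otherwise a human-readable reason.

    gpu_devices: PCI addresses belonging to the GPU
    groups: IOMMU group number -> set of PCI addresses in that group
    critical_devices: PCI addresses the system cannot give up
    """
    gpu = set(gpu_devices)

    # Case covered by the previous validation: a GPU device is itself critical.
    direct = sorted(gpu & critical_devices)
    if direct:
        return f"GPU device(s) {', '.join(direct)} are critical to the system"

    # New case: a GPU device shares an IOMMU group with a critical device,
    # so isolating the group would also take that critical device away.
    for group, members in sorted(groups.items()):
        if not members & gpu:
            continue
        shared = sorted((members - gpu) & critical_devices)
        if shared:
            return (f"IOMMU group {group} is shared with critical "
                    f"device(s) {', '.join(shared)}")
    return None
```

Returning a reason string rather than a bare boolean lets the caller both block isolation and show the user why the GPU was marked as critical.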