Nvidia kernel drivers too old to detect 4080 Super card #42

Open
Smokowski opened this issue Sep 2, 2024 · 7 comments

Comments

@Smokowski

Hello, I have a system which I have added as a GPU provider on Golem.
The problem is that, since the NVIDIA driver used by Golem is too old, my 4070 Super is only reported as "NVIDIA Graphics Device" instead of the proper model name.
This will prevent me from getting any customers, since they will not know what I am offering.
Please update the driver or implement some other fix so that my GPU name is detected properly.

PC Specs:
CPU : Intel i5 13500
GPU: Nvidia RTX 4070 Super
RAM: 64 GB DDR5 ( 4 x 16GB )

My Provider:
Node ID: 0xc2931f4abf619901fff5818b7d86fc17a8677f09

@marmarek
Collaborator

marmarek commented Sep 2, 2024

The multi-gpu work includes golemfactory/golem-nvidia-kernel#6 which bumps nvidia driver version to 535 (from 525). I'm not sure if that's enough for your card.

@Smokowski
Author

It seems that 535 is on the older side, as the RTX 4080 Super appears to rely on a newer driver, at least according to one of our users. Is it possible to bump the driver further to make sure it supports newer cards?

@fepitre
Contributor

fepitre commented Sep 2, 2024

Could 550 work?

@Smokowski
Author

Yes, I would assume that 550 should pick up even the newest 4080 Super cards.

@fepitre
Contributor

fepitre commented Sep 3, 2024

I think we can bump the version, but it also needs to be done in ya-runtime-vm-nvidia.

@cryptobench
Member

cryptobench commented Sep 23, 2024

@fepitre @marmarek @prekucki the community member is still unhappy and, of course, is not receiving any tasks, since nobody knows what they would be renting. He created a post on the VFIO subreddit to understand what was happening and to confirm that the error is not on his side.

https://www.reddit.com/r/VFIO/comments/1fn81ut/set_pass_through_pci_device_name/

It doesn't work like you think it does.
PCI devices have a Vendor ID and a Device ID. The OS has a PCI ID database that tells it what those mean, so when you use lspci, you get that nice NVIDIA Corporation and product name. Whatever OS your guest is using doesn't have that database updated, so it just tells you the Device ID (2702) that you would get with lspci -nn. It is aware that the Vendor ID is NVIDIA Corporation, though, which is why you see it (because it has been in use for... decades?).
Basically, your guest OS may be too old to have a PCI database definition containing the GeForce 4080, so you need to update it. This is completely unrelated to passthrough itself and would also happen on bare metal.
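
The lookup that the Reddit reply describes can be sketched in a few lines of Python. This is only an illustration of the mechanism, not how lspci or the Golem runtime is actually implemented. The helper names `parse_pci_ids` and `describe` are hypothetical, the two inline databases are tiny hand-written excerpts in the `pci.ids` text format, and the device name attached to ID 2702 is assumed for the sake of the example.

```python
# Minimal sketch of a pci.ids-style lookup (helper names are hypothetical).
# Vendor lines start at column 0, device lines start with one tab,
# subsystem lines start with two tabs and are ignored here.

def parse_pci_ids(text):
    """Parse pci.ids-formatted text into {vendor_id: {"name", "devices"}}."""
    vendors = {}
    current = None
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("C "):
            break  # device-class section follows the vendor list
        if line.startswith("\t\t"):
            continue  # subsystem entries are not needed for this sketch
        if line.startswith("\t"):
            if current is not None:
                dev_id, _, name = line.strip().partition("  ")
                current["devices"][dev_id.lower()] = name.strip()
            continue
        ven_id, _, name = line.partition("  ")
        current = {"name": name.strip(), "devices": {}}
        vendors[ven_id.lower()] = current
    return vendors


def describe(vendors, vendor_id, device_id):
    """Return a readable name, falling back to the raw ID when unknown."""
    vendor = vendors.get(vendor_id.lower())
    if vendor is None:
        return f"Vendor {vendor_id} Device {device_id}"
    name = vendor["devices"].get(device_id.lower())
    if name is None:
        # Vendor is known, device is not: only the numeric ID can be shown.
        return f"{vendor['name']} Device {device_id}"
    return f"{vendor['name']} {name}"


# Illustrative excerpts only. The 2702 entry name is an assumption.
OLD_DB = "10de  NVIDIA Corporation\n\t2684  AD102 [GeForce RTX 4090]\n"
NEW_DB = OLD_DB + "\t2702  AD103 [GeForce RTX 4080 SUPER]\n"

print(describe(parse_pci_ids(OLD_DB), "10de", "2702"))
print(describe(parse_pci_ids(NEW_DB), "10de", "2702"))
```

With the older excerpt, the vendor resolves but the device falls back to its numeric ID; with the newer excerpt, the full product name is returned. The hardware is identical in both cases, and only the database contents differ.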


@dankasak

dankasak commented Oct 3, 2024

Hi there. Just checking ... is this project in active development? I'm also interested in participating, but I see this old driver would be a blocker for me too.
