Update kernel version reported by uname -r #11117

Open
thundergolfer opened this issue Nov 4, 2024 · 9 comments
Labels
type: enhancement New feature or request

Comments

@thundergolfer
Contributor

Description

Currently gVisor reports the kernel version as 4.4.0 and this version hasn't been updated in 5+ years: ebe8001
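For reference, the value in question is the kernel release string that `uname -r` prints. Programs see the same string through ordinary APIs. The minimal standard-library sketch below reads it from Python. Under gVisor this is where the 4.4.0 value shows up. On a regular Linux host it is the host kernel's release. The helper name `reported_kernel_release` is just for illustration.

```python
import os
import platform


def reported_kernel_release() -> str:
    """Return the kernel release string, i.e. what `uname -r` prints."""
    if hasattr(os, "uname"):
        # POSIX only: the same uname(2) data that `uname -r` reads.
        return os.uname().release
    # Fallback for platforms without os.uname (e.g. Windows).
    return platform.release()


if __name__ == "__main__":
    print(reported_kernel_release())
```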

I'm wondering about a couple things:

  1. Could this version number be bumped higher?
  2. Could a comment be added next to this value explaining why it is set to this version and what the policy is for updating it?

Is this feature related to a specific bug?

No specific bug, but a user thought we were running a very old (2016) Linux kernel and that prompted me to look into this 🙂.

Do you have a specific solution in mind?

No response

thundergolfer added the type: enhancement label on Nov 4, 2024
@ayushr2
Collaborator

ayushr2 commented Nov 4, 2024

The last bump (3.11.10 -> 4.4.0) happened six years ago: 5a0be6f.

I don't think we have any explicit policy around it yet. I agree we should probably bump it to some 6.x version now.

@thundergolfer
Contributor Author

Thanks, sounds good to me!

@EtiennePerot
Contributor

Bumping this version isn't just a matter of increasing the version counter. As with the last bump (5a0be6f), it also requires adding stub handlers for the system call numbers that were added since 4.4. This is so that error messages are more descriptive, so that the auto-generated syscall table stays in sync with the kernel version gVisor reports, and so that the telemetry for unimplemented syscalls (the counter metric that tracks calls to unimplemented syscalls) works for those system calls. Patches welcome :)
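To make the stub idea concrete, here is a toy model in Python. It assumes nothing about gVisor's actual implementation, which is Go code organized differently. The `SyscallTable` class, its methods, and the counter keys are all hypothetical. The model is a table keyed by syscall number. Syscalls that are known by name but not implemented get a stub that fails with `ENOSYS` and increments a per-name counter. A number with no entry at all can only be reported as a raw number. The example numbers are x86-64 ones: `getpid` (39), `io_uring_setup` (425, added in Linux 5.1), and `pidfd_open` (434, added in Linux 5.3).

```python
import errno
from collections import Counter
from typing import Callable, Dict

# Toy model only: not gVisor's real syscall table code.


class SyscallTable:
    """Maps syscall numbers to handlers and counts calls to unimplemented ones."""

    def __init__(self) -> None:
        self.handlers: Dict[int, Callable[[], int]] = {}
        self.names: Dict[int, str] = {}
        self.unimplemented_calls: Counter = Counter()

    def implement(self, nr: int, name: str, fn: Callable[[], int]) -> None:
        self.names[nr] = name
        self.handlers[nr] = fn

    def stub(self, nr: int, name: str) -> None:
        """Register a syscall that is known by number and name but not implemented."""
        self.names[nr] = name

        def handler() -> int:
            # Counted under its real name, so telemetry shows which missing syscall was hit.
            self.unimplemented_calls[name] += 1
            return -errno.ENOSYS

        self.handlers[nr] = handler

    def dispatch(self, nr: int) -> int:
        handler = self.handlers.get(nr)
        if handler is None:
            # No table entry: only the raw number is available for errors and metrics.
            self.unimplemented_calls[f"unknown({nr})"] += 1
            return -errno.ENOSYS
        return handler()


if __name__ == "__main__":
    table = SyscallTable()
    table.implement(39, "getpid", lambda: 1234)  # fake pid for the demo
    table.stub(425, "io_uring_setup")
    table.stub(434, "pidfd_open")
    print(table.dispatch(39), table.dispatch(434), table.dispatch(999))
    print(dict(table.unimplemented_calls))
```

The point of registering stubs by name is visible in the counter. A call to 434 is recorded as `pidfd_open`, while a call to an unregistered number is only recorded as `unknown(999)`.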

@Ammar-Alnagar

I am having an issue where my training script hangs because of this bug:
WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher

Is there a way to upgrade the kernel myself, or a way to bypass it?

@EtiennePerot
Contributor

@Ammar-Alnagar, it is unlikely that the application is hanging specifically because of the reported kernel version.

My guess from reading this message is that it's warning the user about some specific Linux bug that got fixed in version 5.5.0. But gVisor isn't Linux. Changing gVisor's reported version to 5.5.0 will not change the behavior of the gVisor kernel. So I suggest filing a separate bug with more details about the application you are running, along with gVisor logs, so that we can investigate why the script is hanging.
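A warning like the one quoted above typically comes from a check that reads the kernel release string and compares it with a fixed minimum. A check of this kind does not look at how the kernel actually behaves. Below is a rough standard-library sketch of such a check. It is an illustration only, not the library's actual code, and the function names are hypothetical. Fed gVisor's reported 4.4.0, it fires no matter what gVisor implements.

```python
import platform
import re
from typing import Optional, Tuple

# Threshold taken from the warning quoted above.
MIN_VERSION: Tuple[int, int, int] = (5, 5, 0)


def parse_release(release: str) -> Optional[Tuple[int, int, int]]:
    """Pull the first X.Y.Z triple out of a release string like '6.8.0-45-generic'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", release)
    if match is None:
        return None
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def kernel_below_minimum(release: str, minimum: Tuple[int, int, int] = MIN_VERSION) -> bool:
    """True if the release parses and is older than `minimum`.

    Tuples compare element by element, so (5, 10, 0) > (5, 5, 0),
    unlike the plain strings "5.10.0" and "5.5.0".
    """
    version = parse_release(release)
    return version is not None and version < minimum


if __name__ == "__main__":
    release = platform.release()
    if platform.system() == "Linux" and kernel_below_minimum(release):
        print(f"Kernel release {release} is below the 5.5.0 threshold")
```

For example, `kernel_below_minimum("4.4.0")` returns `True`, and `kernel_below_minimum("5.10.0")` returns `False`. Comparing integer tuples rather than raw strings avoids the trap where `"5.10.0" < "5.5.0"` is true lexicographically.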

@Ammar-Alnagar

Ammar-Alnagar commented Jan 13, 2025

@EtiennePerot Oh, OK, thanks for the response.

@Ammar-Alnagar

@EtiennePerot, I just checked, and my issue is directly related to the gVisor kernel version: the library detects the gVisor kernel and just hangs there.
I don't really know why, but that's what the "accelerate" library team said when I emailed them.
Is there perhaps a workaround for me?

@EtiennePerot
Contributor

EtiennePerot commented Jan 13, 2025

This warning appears to be coming from a Python library:
https://github.com/huggingface/accelerate/blob/f0b030554cbcd01c5541c449e92066715f21a99e/src/accelerate/utils/other.py#L320-L335

So you could always just monkeypatch it out.

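import accelerate.utils
# Replace accelerate's kernel-version check with a no-op (the check only emits the warning).
# Code that already holds its own reference (e.g. via `from accelerate.utils import check_os_kernel`) is not affected.
accelerate.utils.check_os_kernel = lambda: None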

If the program still hangs, then you can be certain that the problem isn't the kernel version check, and therefore it is a different bug from this one.
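One thing to keep in mind with this kind of monkeypatch is how Python binds names. Replacing an attribute on a module only affects code that looks the name up through that module at call time. A module that ran `from pkg import func` earlier keeps its own reference to the original function. The self-contained demo below shows the difference. It uses two hypothetical in-memory modules, `fakelib` and `fakelib_state`, which do not reflect accelerate's real layout.

```python
import types

# Hypothetical library module that defines the check.
fakelib = types.ModuleType("fakelib")


def check_os_kernel() -> str:
    return "warning"


fakelib.check_os_kernel = check_os_kernel

# Hypothetical caller module; this assignment is what `from fakelib import check_os_kernel` does.
state = types.ModuleType("fakelib_state")
state.check_os_kernel = fakelib.check_os_kernel


def run_via_attribute():
    # Looks the function up on fakelib every time it is called.
    return fakelib.check_os_kernel()


def run_via_from_import():
    # Uses the reference captured when the caller module was set up.
    return state.check_os_kernel()


# Monkeypatch only the attribute on fakelib.
fakelib.check_os_kernel = lambda: None

print(run_via_attribute())    # None: sees the patch
print(run_via_from_import())  # warning: still the original function
```

If the warning still appears after patching, the caller probably holds its own reference. Patching the name in the module that actually calls it (here `state.check_os_kernel`) covers that path too.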

@Ammar-Alnagar

I will be sure to try it. Thanks a lot for the help!

Projects
None yet
Development

No branches or pull requests

4 participants