Linux: failed to determine memory area for node: open /sys/devices/system/node/node0/memory_failure/state: no such file or directory #341

Closed
matildeY opened this issue May 3, 2023 · 13 comments

Comments

@matildeY

matildeY commented May 3, 2023

Getting the topology nodes fails with this error:
failed to determine memory area for node: open /sys/devices/system/node/node0/memory_failure/state: no such file or directory

1- Recent Linux kernels add a new directory called "memory_failure". It matches the pattern defined in the function memoryTotalPhysicalBytesFromPath, which makes that function fail.

2- Would it be possible to call "memTotalPhysicalBytes" instead of "memoryTotalPhysicalBytesFromPath" in the function "AreaForNode"? "memTotalPhysicalBytes" includes a fallback to the syslog approach.
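A minimal Go sketch of why a bare `memory*` pattern is too broad. It assumes the lookup behaves like the standard library's `filepath.Match` with that pattern (ghw's exact code may differ), and the directory listing is hypothetical. `naiveMemoryBlocks` is an illustrative helper, not a ghw function.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// naiveMemoryBlocks returns the names that a simple "memory*" glob would
// treat as memory block directories. The pattern is an assumption that
// mirrors the behaviour described in this issue, not ghw's actual code.
func naiveMemoryBlocks(names []string) []string {
	var out []string
	for _, name := range names {
		// filepath.Match only errors on a malformed pattern, and
		// "memory*" is well formed, so the error is ignored here.
		if ok, _ := filepath.Match("memory*", name); ok {
			out = append(out, name)
		}
	}
	return out
}

func main() {
	// Hypothetical listing of /sys/devices/system/node/node0 on a newer kernel.
	entries := []string{"cpu0", "meminfo", "memory0", "memory63", "memory_failure"}
	fmt.Println(naiveMemoryBlocks(entries)) // prints [memory0 memory63 memory_failure]
}
```

Because `memory_failure` is returned alongside the real blocks, any code that then opens `<entry>/state` for each match fails on that entry, which is the error reported above.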

@jaypipes
Owner

jaypipes commented May 3, 2023

Hi @matildeY! Thanks for submitting this bug report. Do you think that the fix for a related bug will resolve this issue? I'm planning on cutting a ghw release with that fix today (was waiting on a couple other PRs to merge first). If you could pull the latest main branch and check on your system whether the issue still appears that would be super helpful. Thank you!

@jaypipes jaypipes added the bug label May 3, 2023
@matildeY
Author

matildeY commented May 3, 2023

Hi, thanks for your answer. Before submitting this bug I tried the main branch, and the error is still present.

The memoryTotalUsableBytesFromPath call is OK and does not return an error.

@jaypipes
Owner

jaypipes commented May 3, 2023

OK, thank you for verifying that @matildeY! I will push a fix ASAP.

@jaypipes jaypipes added the memory label May 3, 2023
jaypipes added a commit that referenced this issue May 6, 2023
The `/sys/devices/system/memory` and `/sys/devices/system/node/nodeX`
subdirectories contain one or more subdirectories that begin with the
word "memory" and end in a 0-based cell/block index for that memory.
e.g. `/sys/devices/system/node/node0/memory63` is a directory containing
information files about the 64th memory block in NUMA node 0.

Previously, code that gathered total physical memory by looking at these
subdirectories was using a simple glob on `memory*` to determine those
memory block subdirectories. However, in some recent Linux kernels a
`/sys/devices/system/memory/memory_failure` file causes that simple glob
to backfire. This patch replaces the simple glob with a read of the
`/sys/devices/system/memory` or `/sys/devices/system/node/nodeX`
directory and regex matches on `memory\d$` to determine if the
subdirectory is a memory block one.

Fixes Issue #341

Signed-off-by: Jay Pipes <[email protected]>
@asiffer

asiffer commented Jun 8, 2023

Hi! I got the same warning when calling ghw.GPU():
WARNING: failed to determine memory area for node: open /sys/devices/system/node/node0/memory_failure/state: no such file or directory
(it is hidden when I use the ghw.WithDisableWarnings() option)

@jaypipes
Copy link
Owner

jaypipes commented Jun 8, 2023

Hi @asiffer! What version of ghw are you using? I believe I fixed that issue but have not yet cut a new release of ghw that includes it.

@asiffer

asiffer commented Jun 8, 2023

Thanks @jaypipes for your quick response!
I'm using v0.10.0, according to my go.mod file.

@jaypipes
Owner

jaypipes commented Jun 8, 2023

@asiffer I just cut a new release: https://github.com/jaypipes/ghw/releases/tag/v0.11.0

Wait a bit for the Go module proxy to pick up the new release, then update your go.mod. You should be good to go after that! :)

@asiffer

asiffer commented Jun 8, 2023

Awesome 🤩 Thanks a lot!

@ffromani
Collaborator

@asiffer @matildeY
Hi! Could you please confirm that version v0.11.0 fixes your issues, so we can close this bug? Thanks!

@asiffer

asiffer commented Jun 20, 2023

Yes, the latest version fixed it 👍🏻 Thanks again

@matildeY
Author

Yes, it's OK for me. Thanks!

jaypipes added a commit to jaypipes/k8s-resource-topology-exporter that referenced this issue Mar 4, 2024
This commit uplifts the github.com/jaypipes/ghw dependency to the
v0.12.0 release (released July 2023). Included in the v0.11.0 release
was a fix for detecting memory areas in hardware with NUMA architecture
and Linux kernels > 5.19.0 as well as ARM platforms.

We are running resource-topology-exporter in our environments and ran
into the aforementioned bug in ghw 0.9.0, thus this commit to uplift to
a more modern ghw release.

Related jaypipes/ghw#341

Signed-off-by: Jay Pipes <[email protected]>
ffromani pushed a commit to k8stopologyawareschedwg/resource-topology-exporter that referenced this issue Mar 5, 2024
(cherry picked from commit ea4bb57)
@EnumuratedDev

Doesn't seem to be fixed. I am still getting this warning with v0.12.0.

@ffromani
Collaborator

Hi, could you please give more details? For example, which kernel version are you running?
