"unable to handle kernel paging request" at kmem_cache_alloc, and hung processes #7987

Closed
vthriller opened this issue Oct 4, 2018 · 4 comments
Labels
Status: Stale No recent activity for issue

Comments

@vthriller

vthriller commented Oct 4, 2018

System information

| Type | Version/Name |
| --- | --- |
| Distribution Name | Gentoo |
| Distribution Version | |
| Linux Kernel | gentoo-sources-4.9.95 |
| Architecture | x86_64 |
| ZFS Version | 0.7.9-r0-gentoo |
| SPL Version | 0.7.9-r0-gentoo |

Describe the problem you're observing

  • noticed unusual CPU usage statistics (lots of iowait)
  • ran dmesg and found a bunch of oopses
  • ran a few commands such as ls on one of the ZFS mountpoints, only to see them hang in D state indefinitely
  • ran htop and found the following in D state:
    • txg_sync
    • khugepaged
    • dbuf_evict
    • and a couple of the aforementioned userspace processes (nothing that I hadn't started myself while poking around)
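For anyone who wants to check for D-state processes without htop, here is a minimal sketch that reads the state field from /proc/<pid>/stat. The function names `parse_stat_line` and `d_state_processes` are made up for illustration. It assumes a Linux /proc layout and treats only the single-letter state `D` (uninterruptible sleep) as a match.

```python
import os


def parse_stat_line(line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    lparen = line.index("(")
    rparen = line.rindex(")")
    pid = int(line[:lparen].strip())
    comm = line[lparen + 1:rparen]
    state = line[rparen + 1:].split()[0]
    return pid, comm, state


def d_state_processes(proc_root="/proc"):
    """List (pid, comm) for every process currently in state D."""
    if not os.path.isdir(proc_root):
        return []
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a PID directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_line(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile, or the file was unreadable
        if state == "D":
            found.append((pid, comm))
    return sorted(found)


if __name__ == "__main__":
    for pid, comm in d_state_processes():
        print(pid, comm)
```

Kernel threads such as txg_sync show up here as well, since they have /proc entries like any other task.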

Describe how to reproduce the problem

No idea what triggered this: I was away when CPU usage jumped (according to the monitoring system), and no cron jobs were scheduled around that time either.

I have a slight suspicion that it might have something to do with zram-backed swap, so I'm currently swapping it off to a disk-backed swap, although I doubt that it might affect anything at this point.

Include any warning/errors/backtraces from the system logs

Again, this is what dmesg shows at the moment.

@vthriller vthriller changed the title "unable to handle kernel paging request" at kmem_cache_alloc and hung processes "unable to handle kernel paging request" at kmem_cache_alloc, and hung processes Oct 4, 2018
@vthriller
Author

> I have a slight suspicion that it might have something to do with zram-backed swap, so I'm currently swapping it off to a disk-backed swap, although I doubt that it might affect anything at this point.

Well, the swapoff processes stalled relatively quickly and are not killable, and swapon --show has been showing the exact same values for well over an hour now. No new kernel log messages though.
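A minimal sketch for watching the same numbers programmatically, assuming the usual /proc/swaps layout (a header line, then whitespace-separated filename, type, size, used and priority, with sizes in KiB). The `parse_swaps` name and the `is_zram` flag are illustrative. The zram check is only a guess based on the device path.

```python
def parse_swaps(text):
    """Parse the contents of /proc/swaps into a list of dicts."""
    entries = []
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 5:
            continue  # blank or malformed line
        name, kind, size_kib, used_kib, priority = fields[:5]
        entries.append({
            "name": name,
            "type": kind,
            "size_kib": int(size_kib),
            "used_kib": int(used_kib),
            "priority": int(priority),
            # Assumption: zram swap devices are named /dev/zram<N>.
            "is_zram": name.startswith("/dev/zram"),
        })
    return entries


# Usage on a live system:
#     with open("/proc/swaps") as f:
#         for entry in parse_swaps(f.read()):
#             print(entry)
```

Sampling this twice a few minutes apart and comparing `used_kib` would show whether a swapoff is making any progress at all.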

Unfortunately this kernel has CONFIG_CRASH_DUMP unset, so I'm going to leave this issue as it is and force-reboot the system after 136 days of uptime.
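To check an option like CONFIG_CRASH_DUMP before the next incident, something along these lines should work. It is a sketch under assumptions: /proc/config.gz only exists when the kernel was built with CONFIG_IKCONFIG_PROC, and the /boot/config-<release> fallback is a distribution convention that may not apply here. `config_value` and `read_kernel_config` are hypothetical helper names.

```python
import gzip
import os
import platform


def config_value(config_text, option):
    """Return the option's value ('y', 'm', ...) or None if unset or absent."""
    prefix = option + "="
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            return line[len(prefix):]
        if line == "# %s is not set" % option:
            return None
    return None


def read_kernel_config():
    """Return the running kernel's config text, or None if it cannot be found."""
    if os.path.exists("/proc/config.gz"):
        with gzip.open("/proc/config.gz", "rt") as f:
            return f.read()
    # Assumption: some distributions install the config next to the kernel.
    path = "/boot/config-" + platform.release()
    if os.path.exists(path):
        with open(path) as f:
            return f.read()
    return None


# Usage:
#     text = read_kernel_config()
#     if text is not None:
#         print(config_value(text, "CONFIG_CRASH_DUMP"))
```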

Finally, here are the traces for all blocked processes (sysrq-w).
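For reference, the same dump can be requested without a keyboard by writing the command character to /proc/sysrq-trigger as root. The output then lands in the kernel log (dmesg). A minimal sketch follows. The `trigger_sysrq` helper is made up for illustration, and its `trigger_path` parameter exists only so the function can be tried against an ordinary file.

```python
def trigger_sysrq(key="w", trigger_path="/proc/sysrq-trigger"):
    """Write a single SysRq command character to the trigger file.

    'w' asks the kernel to dump tasks in uninterruptible (blocked) state.
    Returns the number of characters written.
    """
    if len(key) != 1:
        raise ValueError("SysRq command must be a single character")
    with open(trigger_path, "w") as f:
        return f.write(key)


# Usage (as root, on the affected machine):
#     trigger_sysrq("w")
# then read the resulting traces from dmesg.
```

Other SysRq characters reboot or crash the machine, so keeping the default at 'w' seems the safer choice.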

(#4319 and #6880 are the closest issues I was able to find, but I'm not sure how relevant they really are.)

@vthriller
Author

> I have a slight suspicion that it might have something to do with zram-backed swap

Well, 156 days of uptime later I got the same thing without zram block devices.

@vthriller
Author

Well, backtraces didn't change that much from the last time, except for missing ARC functions in the middle of the stack.

> try reproducing it with a new version

Thanks, I'm already planning an upgrade for both kernel and ZoL.

@stale

stale bot commented Aug 24, 2020

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Status: Stale No recent activity for issue label Aug 24, 2020
@stale stale bot closed this as completed Nov 25, 2020