"kernel NULL pointer dereference" with sustained network traffic #341
Comments
BTW, I'm using firmware from cset 320084a. I'm now building the latest kernel (cset b6a2703) to confirm the issue is still present. Edit:
|
Here's my cmdline.txt:
And here's my config.txt:
|
Back with some rather good news: it seems that 245f716 with I'll test some more with As a side note, I would very much like to have a defconfig that enables only the basic RPi-related drivers, with nothing fancy: just support for USB, smsc95xx, i2c, sdhci. In short, only what is strictly required to get all of the RPi hardware up-n-running, and no more. Naively, I expected Feel free to close this issue if you want. Thanks! :-) |
bcmrpi_quick_defconfig is a no-modules build that does include all the built-in hardware, so it sounds like what you want. |
OK, I've found the one option that causes this issue. If I initially disabled CMA as I don't need it, and need as much memory as possible on the ARM side. But I'm happy to leave CMA enabled for now, if it is required. |
Interesting, but I can't see a reason why CMA should be relevant to this panic. |
OK, indeed, I was misled by another spurious error. My bad, sorry. In fact, the kernel panic occurs with I still have some testing to do, but so far, Edit: Confirmed, only removing |
After further testing, But, another .config based on I'm still trying to narrow down the exact option (or subset of options) that causes the issue... (Damn, I've spent my whole weekend on this...). |
Can you post both the crashing config and the non-crashing config? Pastebin or similar would be preferred due to the usual size. You can force concurrency issues if you enable stuff like lock debugging. Mainly as a result of USB driver interactions and badly timed interrupts. |
Here's the crashing .config: http://code.bulix.org/vv3a58-84120?raw Note that they are not related. This crashing one was built up from scratch, while the working one is a trimmed-down bcmrpi_defconfig. I'll look at enabling/disabling the debugging stuff, and see if I can find the one(s) that exhibit the problem. Thanks for the hint! :-) Edit: pasted the correct working .config (the previous one was a defconfig). |
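When comparing a crashing .config against a working one, it can help to list only the symbols whose values differ. The following is a minimal sketch, not something used in this thread: the function names parse_config and diff_configs are made up for illustration, and it assumes the usual "CONFIG_FOO=y" and "# CONFIG_FOO is not set" line formats. The kernel tree's scripts/diffconfig serves a similar purpose.

```python
def parse_config(text):
    """Map each CONFIG_ symbol in a kernel .config to its value.

    Symbols written as '# CONFIG_FOO is not set' are recorded as 'n'.
    """
    suffix = " is not set"
    opts = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            # Drop the leading '# ' and the trailing ' is not set'.
            opts[line[2:-len(suffix)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            opts[name] = value
    return opts


def diff_configs(crashing_text, working_text):
    """Return {symbol: (crashing_value, working_value)} for differing symbols.

    A symbol missing from one file is treated as 'n' (an assumption that
    matches how unset options usually behave, but not a guarantee).
    """
    a = parse_config(crashing_text)
    b = parse_config(working_text)
    return {
        name: (a.get(name, "n"), b.get(name, "n"))
        for name in sorted(set(a) | set(b))
        if a.get(name, "n") != b.get(name, "n")
    }
```

Reading both files with open(path).read() and printing the result of diff_configs gives a short list of candidate options to toggle one at a time.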
Back with some more news: |
Can you retry again with both the master and BRANCH=next firmware (provided by rpi-update)? See http://www.raspberrypi.org/forum/viewtopic.php?f=28&t=70437 for info. |
EDIT: forgot to add some context. I've tried raspbian and archlinux
Results:
but in both cases there are a lot of NYET + some callbacks missed.
Overall: massive improvement, at least the system does not crash. I would like to test the full thing, but Bluetooth seems to have some issues in 3.10 which are fixed in 3.13 (I connect a PS3 controller, but /dev/input/js0 is deleted shortly after it is created). How do I try a FIQ firmware for 3.13?
[ 73.618205] Transfer to device 4 endpoint 0x2 failed - FIQ reported NYET. Data may have been lost.
Here is dwc in dmesg:
[ 1.403936] dwc_otg: version 3.00a 10-AUG-2012 (platform bus)
pi@raspberrypi ~ $ /opt/vc/bin/vcgencmd version |
Hello! Thanks for the feedback. I'm currently using the firmware from cset a0eb067 (latest on branch master) and kernel 3.12.7 cset 7b3d622, and I have no problem with this combination. It is rock-solid in my experience. I'm not in a position to test further for now, but I'll try to test your suggested firmware later in the weekend. Cheers, |
Hello again! So, here's the feedback I promised. Using the latest firmware (cset a0eb067 on branch master), and latest csets from those branches:
I was able to stress the network without any oops on the RPi, with even more http sessions in parallel (up to ~500). So, for all intents, I consider this bug to be resolved, as I can't reproduce it. You may close it if you also consider this to be resolved on your side. Thank you! :-) Cheers, |
I finally managed to run everything. I recompiled kernel 3.13.y-next with FIQ (using bitmask 0x7), and I am now able to run internal audio and Bluetooth at the same time. Other than a lot of NYET and "dwc_otg_hcd_handle_hc_fsm: 38 callbacks suppressed" messages, which seem to be harmless, it is all working. So far, this fixes my original issue (internal audio + Bluetooth). |
These fixes will be merged in due course. Thanks for re-testing. |
Add a test case which replaces an active ingress qdisc while keeping the miniq in-tact during the transition period to the new clsact qdisc.

  # ./vmtest.sh -- ./test_progs -t tc_link
  [...]
  ./test_progs -t tc_link
  [ 3.412871] bpf_testmod: loading out-of-tree module taints kernel.
  [ 3.413343] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel
  #332 tc_links_after:OK
  #333 tc_links_append:OK
  #334 tc_links_basic:OK
  #335 tc_links_before:OK
  #336 tc_links_chain_classic:OK
  #337 tc_links_chain_mixed:OK
  #338 tc_links_dev_chain0:OK
  #339 tc_links_dev_cleanup:OK
  #340 tc_links_dev_mixed:OK
  #341 tc_links_ingress:OK
  #342 tc_links_invalid:OK
  #343 tc_links_prepend:OK
  #344 tc_links_replace:OK
  #345 tc_links_revision:OK
  Summary: 14/0 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Martin KaFai Lau <[email protected]>
Using kernel rpi-3.6.y as of cset 245f716, I get repeatable kernel panics due to the kernel being "Unable to handle kernel NULL pointer dereference at virtual address 0000000d" :
To reproduce this, I simply run busybox's httpd applet to serve my /boot directory:
Then, from another machine on the same LAN segment, I fire a lot of concurrent downloads of my zImage file:
Then, sooner or later (rather sooner than later, in fact), I get the above kernel panic.
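The exact commands used for the reproduction are not preserved in this copy of the report. As a rough illustration of the client side only, the sketch below fires many concurrent HTTP downloads of one file. It is not the reporter's original command: the function names fetch_size and stress, the default session counts, and any URL passed in are assumptions made for this example.

```python
import concurrent.futures
import urllib.request


def fetch_size(url, timeout=30.0):
    """Download url completely and return the number of bytes received."""
    total = 0
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        while True:
            chunk = resp.read(64 * 1024)
            if not chunk:
                break
            total += len(chunk)
    return total


def stress(url, sessions=100, workers=50, fetch=fetch_size):
    """Run `sessions` downloads of `url`, at most `workers` at a time.

    Returns (ok, failed): a list of byte counts for completed downloads
    and a list of the exceptions raised by failed ones.
    """
    ok, failed = [], []
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(fetch, url) for _ in range(sessions)]
        for fut in concurrent.futures.as_completed(futures):
            try:
                ok.append(fut.result())
            except Exception as exc:  # record the failure and keep going
                failed.append(exc)
    return ok, failed
```

Calling stress() with the URL of the zImage served by the Pi (for example a hypothetical http://192.168.1.10/zImage) and a few hundred sessions approximates the load described above. A sudden run of failures from a previously responsive board is a hint that it has gone down.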
Here is my kernel's .config: http://code.bulix.org/vv3a58-84120?raw