Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
surface3-spi: workaround: disable DMA mode to avoid crash by default
On Arch Linux kernel at least after 4.19, touch input is broken after suspend (s2idle). kern :err : [ +0.203408] Surface3-spi spi-MSHW0037:00: SPI transfer timed out On recent stable Arch Linux kernel (at least after 5.1), touch input will crash after the first touch. kern :err : [ +0.203592] Surface3-spi spi-MSHW0037:00: SPI transfer timed out kern :err : [ +0.000173] spi_master spi1: failed to transfer one message from queue I found on an affected system (Arch Linux kernel, etc.), the touchscreen driver uses DMA mode by default. Then, we found some kernels with different kernel config (5.1 kernel config from Jakeday [1] or Chromium OS kernel chromeos-4.19 [2]) will use PIO mode by default and no such issues there. So, this commit disables DMA mode on the touchscreen driver side as a quick workaround to avoid touch input crash. We may need to properly set up DMA mode to use the touchscreen driver with DMA mode. You can still switch DMA/PIO mode if you want: switch to DMA mode (maybe broken) echo 1 | sudo tee /sys/module/surface3_spi/parameters/use_dma back to PIO mode echo 0 | sudo tee /sys/module/surface3_spi/parameters/use_dma Link to issue: jakeday/linux-surface#596 References: [1] https://github.com/jakeday/linux-surface/blob/master/configs/5.1/config [2] https://chromium.googlesource.com/chromiumos/third_party/kernel/+/refs/heads/chromeos-4.19 Tested on Arch Linux 5.4.1 with Surface 3, which will use DMA by default. This commit made the driver use PIO by default and no touch input crash. Also tested on chromeos-4.19 4.19.90 with Surface 3, which will use PIO by default even without this commit. After this commit, it still uses PIO and confirmed no functional changes regarding touch input. More details: We can confirm which mode the touchscreen driver uses; first, enable debug output: echo "file drivers/spi/spi-pxa2xx.c +p" | sudo tee /sys/kernel/debug/dynamic_debug/control echo "file drivers/input/touchscreen/surface3_spi.c +p" | sudo tee /sys/kernel/debug/dynamic_debug/control Then, try to make a touch input and see dmesg log (On Arch Linux kernel, uses DMA) kern :debug : [ +0.006383] Surface3-spi spi-MSHW0037:00: 7692307 Hz actual, DMA kern :debug : [ +0.000495] Surface3-spi spi-MSHW0037:00: surface3_spi_irq_handler received -> ff ff ff ff a5 5a e7 7e 01 d2 00 80 01 03 03 18 00 e4 01 00 04 1a 04 1a e3 0c e3 0c b0 00 c5 00 00 00 00 00 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff (On the kernels I referenced above, uses PIO) kern :debug : [ +0.009260] Surface3-spi spi-MSHW0037:00: 7692307 Hz actual, PIO kern :debug : [ +0.001105] Surface3-spi spi-MSHW0037:00: surface3_spi_irq_handler received -> ff ff ff ff a5 5a e7 7e 01 d2 00 80 01 03 03 24 00 e4 01 00 58 0b 58 0b 83 12 83 12 26 01 95 01 00 00 00 00 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
- Loading branch information
66c8401
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which DMA driver is in use? dw_dmac ?
66c8401
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, maybe, according to the following output:
66c8401
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does anybody try this touchscreen (w/o above mentioned patch!) on recent (v5.5-rc6) kernels? There were few changes to LPSS power handling between v4.19 and v5.5.
66c8401
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the info! I tried v5.5-rc6 with this Arch Linux 5.4.1-arch1 based config (and w/o the above patch), but still reproducible.
I'll try to report this issue (maybe Bugzilla?) later.
66c8401
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kernel BugZilla works for me. Thanks!
66c8401
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Oh, I saw you committed surface3-wmi driver before!)
OK, I'll tell you when I submitted a bugreport to Bugzilla.
By the way, Interestingly, I can fix the touch input crash with DMA mode by reloading only
spi_pxa2xx_platform
module after (and only after) the crash(with causing
kernel NULL pointer dereference
).log: reloading only `spi_pxa2xx_platform` after touch input crash (plaintext)
log with color (html)
reloading-only-spi_pxa2xx_platform-after-touch-input-crash.html.gz
When I reload
spi_pxa2xx_platform
module withsurface3_spi
like this (after touch input crash), I can avoid thekernel NULL pointer dereference
,but the touch input crash is still reproducible after this reload.
66c8401
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kernel crash happens due to error in the error handling while removal process is ongoing.
I'm wondering if below patch helps:
66c8401
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tried with the patch (plus debug print).
This time, caused another kernel NULL pointer dereference:
log: reload-with-patch-after-touch-input-crash
log with color (html)
reload-with-patch-after-touch-input-crash.html.gz
Anyway, this dereference will happen only after the touch input crash happened. It may become no problem after fixing the touch input crash.
66c8401
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, they are two different problems (I know about crash one, but wouldn't be able to reproduce). It's good you may reproduce both.
I'll try to come up with the patch to fix this one, so, we will have one problem less to fix.
66c8401
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed an attempt to fix crash (as far as I understood its root cause)
https://github.com/andy-shev/linux/tree/topic/spi/reload
P.S. there is a last patch for that.
66c8401
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your work.
By the way, what distro are you using on your Surface 3? Maybe Fedora?
I may need to check on the distro, too.
66c8401
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and seems that you're working on DMI matching, too.
Have you ever heard of corrupted DMI table?
Some owners of Surface 3 (including me) are affected the corruption and have to add the
corrupted DMI info for quirks: #29
66c8401
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does that part work now as expected (disregard DMA issue)?
I'm using Buildroot in kexec mode.
I even published a branch (https://github.com/andy-shev/buildroot/tree/intel) what we are using in our team.
It would be hard for mere users.
Nope. But I have to monitor from now on since you mentioned it.
I commented there my opinion.
66c8401
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do I need to apply only this commit (andy-shev/linux@e7b3040)?
I applied the commit. This time, breaks
spi_pxa2xx_platform
module unloading even when the touch input crash is not happening:log: WIP-spi-pxa2xx-Try-to-avoid-race-at-shutdown
WIP-spi-pxa2xx-Try-to-avoid-race-at-shutdown.html.gz (log with color)
Understood.
66c8401
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for testing! I have updated it.
P.S. Meanwhile I'm trying to enable necessary options in kernel to have some possibilities to test on MS Surface 3 (Unfortunately due to circumstances it has no touchscreen, it had been broken in order to connect device to be available remotely, Perhaps I may use something else for SPI test)
66c8401
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(sorry that I can't test the updated patch yet! I can test that in this weekend.
and I'll try to submit bugreport to bugzilla in this weekend, too.)
66c8401
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for waiting!
I submittied a bugreport: https://bugzilla.kernel.org/show_bug.cgi?id=206403
I also tried your new commit: andy-shev/linux@5c1b31e, still causing the kernel NULL pointer dereference:
WIP-spi-pxa2xx-Try-to-avoid-race-at-shutdown-5c1b31e55026eb7987809d5f345cf4e23358c85f
WIP-spi-pxa2xx-Try-to-avoid-race-at-shutdown-5c1b31e55026eb7987809d5f345cf4e23358c85f.html.gz