IO slows down everything else #48

Open
denysvitali opened this issue Mar 19, 2024 · 8 comments

@denysvitali

The SPX (Surface Pro X), running the latest kernel (Linux surface-pro-x 6.6.8-1), becomes extremely slow when a lot of IO is being performed.

The device becomes very unresponsive during read- or write-intensive operations such as:

  1. Network download
  2. Package update / System update
  3. Copying files between media

After a brief analysis, I was able to narrow down the cause: IO seems to make the whole system hang.
To reproduce the issue, start an SSD benchmark (e.g. via the gnome-disks application) and watch the CPU and IO usage.

When such a test is performed, the disk is quite efficient (140 MB/s read), but the whole system starts lagging:
[image]

During testing, glxgears hangs (or is extremely slow), even though the CPU is not under pressure; only I/O is very high.
This results in a very slow system (everything freezes).

The queue scheduler currently used is none:

$ cat /sys/block/nvme0n1/queue/scheduler
[none] mq-deadline kyber bfq

Switching to a different scheduler makes things even worse.
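For anyone who wants to repeat the scheduler check, here is a minimal sketch. The helper name `active_scheduler` is made up for illustration; it only relies on the kernel marking the active scheduler with square brackets, as in the output above. The commented-out commands assume the device is `nvme0n1`, as in this report.

```shell
#!/bin/sh
# Print the active scheduler from a sysfs scheduler line.
# The kernel marks the active one with [brackets].
active_scheduler() {
    # $1: contents of /sys/block/<dev>/queue/scheduler
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Demo with the line quoted in this report.
active_scheduler '[none] mq-deadline kyber bfq'   # prints: none

# On a live system (device name assumed from this report):
#   active_scheduler "$(cat /sys/block/nvme0n1/queue/scheduler)"
# Switching needs root and does not persist across reboots:
#   echo mq-deadline | sudo tee /sys/block/nvme0n1/queue/scheduler
```

Re-running the benchmark after each switch would show whether a given scheduler changes the stalls.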

Does anyone have any clue about what's going on here? I don't think IO should cause a whole-system freeze.

@jglathe

jglathe commented Mar 19, 2024

Could it be an issue similar to the one with sc8280xp? We need to have arm64.nopauth as a boot parameter to avoid performance issues. I actually observed similar behaviour on the WDK without it. And @jhovold has said a bit about it in his X13s presentation. Complete cmdline parameters: pd_ignore_unused clk_ignore_unused arm64.nopauth efi=noruntime
For reference, I just dd'ed the internal NVMe to a new one (via a USB-C NVMe enclosure), and it gave a ~600 MB/s transfer rate, with no stalling of other USB usage.

@denysvitali
Author

So, arm64.nopauth didn't really change anything on my side - but it was a good pointer (pun intended) because it led me to the openSUSE page mentioning it.

After adding arm64.nopauth iommu.passthrough=0 iommu.strict=0 to the kernel command line, I managed to partially solve the problem and get 1.1+ GB/s of transfer with the SSD:
[image]

The SSD operations do not seem to block the whole system anymore. Thanks!
I'll be curious to test whether this affects the Wi-Fi too (#45) - but I don't have my super-fast Wi-Fi network right now, so I can't really test.
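To confirm that the parameters above actually reached the running kernel, something like the following sketch could be used. The helper names `has_param` and `report_params` are made up for illustration, and the parameter list is just the one from this comment; the only system interface assumed is `/proc/cmdline`.

```shell
#!/bin/sh
# Return 0 if the kernel command line in $1 contains the exact token $2.
has_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Parameters mentioned in this comment; adjust the list as needed.
PARAMS="arm64.nopauth iommu.passthrough=0 iommu.strict=0"

# Print one line per parameter, saying whether it is in the given cmdline.
report_params() {
    for p in $PARAMS; do
        if has_param "$1" "$p"; then
            echo "$p: present"
        else
            echo "$p: missing"
        fi
    done
}

# On a live system:
#   report_params "$(cat /proc/cmdline)"
```

The match is on whole space-separated tokens, so `iommu.strict=0` is not confused with a longer value that merely starts with the same text.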

@jhovold

jhovold commented Mar 19, 2024

Yeah, arm64.nopauth has nothing to do with performance and is only needed to work around a bug in the Lenovo firmware which prevents the X13s from booting.

The iommu parameters were also only used as a workaround until the underlying issue, which turned out to be a display driver bug, was fixed.

Please make sure to check out my wiki at:

https://github.com/jhovold/linux/wiki/X13s

for the current mainline status for the X13s (and sc8280xp). It should always be more up to date compared to the distribution wikis that use it as a source.

@jglathe

jglathe commented Mar 19, 2024

So it looks like the surface-pro-x kernel has issues similar to those the X13s one had a while back. It may be worth comparing the patches.

@qzed
Member

qzed commented Mar 19, 2024

6.6 is kind of old by now, so hopefully things work better with 6.8. I tried updating the patches yesterday but a simple rebase broke the display... so I need a bit more time for that (hopefully I'll have that by next week though). And I'll also have a look at which of the sc8280xp/lenovo patches could be helpful.

@denysvitali
Author

Is there any plan to upstream our patches, so that we can run linux mainline in the future?

@qzed
Member

qzed commented Mar 19, 2024

@denysvitali As soon as I find the time for it... I need to debug a (likely) uefisecapp crash first though... (that's already upstream but we think it seems to be a bit picky about using any memory for DMA with the trust-zone). And I think there's still a bit of clean-up required. But yeah, I hope that we can get all of this upstream eventually.

@qzed
Member

qzed commented Mar 28, 2024

I have some preliminary patches for v6.8 at https://github.com/linux-surface/kernel/tree/spx/v6.8 if anyone wants to try those.
