Avoid polling and other inefficiencies #9

Open
danielzgtg opened this issue Mar 8, 2021 · 3 comments

Comments

@danielzgtg

danielzgtg commented Mar 8, 2021

I found that the userspace drivers are wasting a lot of CPU (~3%) while waiting, even when the touchscreen is completely idle. The only way userspace learns of changes is by polling get_device_ready and get_doorbell every millisecond, which wastes power. Decreasing the polling frequency isn't a solution because that would only introduce latency.

There are many better alternatives to polling:

  • Have get_device_ready and get_doorbell block until new data is ready
  • Support readiness notification for nonblocking I/O via select, poll, and epoll
  • Have those be blocking character devices
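As a rough illustration of the select/poll/epoll option, the sketch below shows what the userspace side could look like if the device file reported readiness. This is not the current driver interface: a `socket.socketpair()` stands in for the device, and `wait_for_data` is a hypothetical helper name.

```python
import selectors
import socket


def wait_for_data(sel, timeout=None):
    """Sleep until a registered file object is readable; return those that are.

    The process uses no CPU while blocked here, unlike a 1 ms polling loop.
    """
    return [key.fileobj for key, _events in sel.select(timeout)]


# Stand-in for the device: one end plays the driver, the other userspace.
# (Assumption for illustration only; the real device exposes no such interface.)
driver_end, user_end = socket.socketpair()

sel = selectors.DefaultSelector()
sel.register(user_end, selectors.EVENT_READ)

# Nothing written yet: a zero timeout returns immediately with no events.
idle = wait_for_data(sel, timeout=0)

# The "driver" signals new data; the waiter wakes up without ever polling.
driver_end.send(b"\x01")
ready = wait_for_data(sel, timeout=1.0)
payload = ready[0].recv(1) if ready else b""

sel.close()
driver_end.close()
user_end.close()
```

With `timeout=None` the call blocks indefinitely, which is the behaviour an idle touchscreen would want.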

Additionally, I'm not sure how efficient copying the buffers to userspace is. Performance might improve if we allowed userspace to mmap the 16 buffers instead of reading from them.

@qzed
Member

qzed commented Mar 8, 2021

That's a bit tricky. IIRC we can't do anything other than poll the hardware by checking the doorbell, due to how it used to work before Intel restructured their firmware (@StollD can give you more details here, that's a part that I'm still not quite familiar with). The polling can happen either in user-space (AFAIK that was chosen because it's easier to play around with) or in the kernel driver (which might be better for performance and user-friendliness, as that would essentially allow emulation of poll()-style interfaces).

I agree that there are usually many better alternatives than polling, but unfortunately if the hardware doesn't give you a clue (i.e. some sort of interrupt) that there's new data available, polling is the only option. Although that 3% seems a bit much, for me iptsd isn't noticeable when idle (read 0.0% in htop, it's ~3% when I touch something).

@StollD can probably tell you more about all the wonderful details of the IPTS/ME hardware.

Regarding mmapping: I think there always has to be one copy, either DMA buffer to mmapped buffer or DMA buffer to read buffer. I don't think that makes much difference. It might save a small bit of validation overhead for the kernel-to-user copy of the read buffer, but I wouldn't expect much.

@StollD
Member

StollD commented Mar 9, 2021

> That's a bit tricky. IIRC we can't do anything other than polling the hardware by checking the doorbell due to how it used to work before Intel restructured their firmware (@StollD can give you more details here, that's a part that I'm still not quite familiar with).

Yeah, pretty much. The doorbell is basically a leftover of how IPTS worked with GuC submission. The doorbell is just a u32, but incrementing it triggers an interrupt in the GuC firmware, so that it can schedule the new data to be processed.

Since IPTS only gives us the doorbell, all we can do is poll. Either in the kernel or outside of it. And because I'd like to keep the driver as simple as possible, I moved the responsibility for polling out of it.
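A minimal sketch of what polling a doorbell counter involves: compare the current value against the last one seen and work out which buffers are new. The mapping from doorbell value to buffer index (value modulo 16) and the one-increment-per-buffer behaviour are assumptions for illustration, and `pending_buffers` is a hypothetical helper.

```python
DOORBELL_MODULUS = 1 << 32  # the doorbell is a u32, so it wraps at 2**32
NUM_BUFFERS = 16            # buffer count mentioned earlier in this thread


def pending_buffers(last_seen, current):
    """Return the buffer indices filled since the doorbell was last read.

    Assumes (not confirmed in this thread) that the doorbell increments once
    per filled buffer and that buffer index == doorbell value % NUM_BUFFERS.
    """
    # Modular subtraction keeps the count correct across u32 wraparound.
    count = (current - last_seen) % DOORBELL_MODULUS
    # More than NUM_BUFFERS increments means older buffers were overwritten;
    # only the most recent NUM_BUFFERS can still be read.
    count = min(count, NUM_BUFFERS)
    start = current - count
    return [(start + i) % NUM_BUFFERS for i in range(count)]
```

Because 2**32 is a multiple of 16, the index sequence stays continuous when the counter wraps, so no special case is needed there.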

iptsd tries to mitigate it a bit by using a high polling frequency only if it received data in the last 5 seconds. If no data comes in for longer than that it will lower the frequency and poll less.
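The backoff described above could be sketched roughly as follows. This is not iptsd's actual code: the 5 second window comes from the comment, the 1 ms fast interval from the top of this issue, and the slow interval is an assumed placeholder. `poll_interval` and `poll_loop` are hypothetical names, and the doorbell reader, clock, and sleep function are passed in so the logic can be exercised without hardware.

```python
ACTIVE_WINDOW_S = 5.0    # stay fast for 5 s after the last data
FAST_INTERVAL_S = 0.001  # 1 ms, the polling rate mentioned in this issue
SLOW_INTERVAL_S = 0.2    # assumed idle rate; the real value is not given here


def poll_interval(now, last_data_time):
    """Pick the sleep time between polls based on how recently data arrived."""
    if last_data_time is not None and now - last_data_time <= ACTIVE_WINDOW_S:
        return FAST_INTERVAL_S
    return SLOW_INTERVAL_S


def poll_loop(read_doorbell, handle, clock, sleep, iterations):
    """Poll the doorbell `iterations` times, backing off while idle."""
    last_doorbell = read_doorbell()
    last_data_time = None
    for _ in range(iterations):
        current = read_doorbell()
        if current != last_doorbell:
            # New data: process it and restart the fast-polling window.
            handle(current)
            last_doorbell = current
            last_data_time = clock()
        sleep(poll_interval(clock(), last_data_time))
```

In real use `clock` would be something like `time.monotonic` and `sleep` would be `time.sleep`; the trade-off is that the first touch after an idle period can be delayed by up to one slow interval.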

I wanted to look into mmap in the past but haven't found time to play around with it yet.

@danielzgtg
Author

> iptsd tries to mitigate it a bit by using a high polling frequency only if it received data in the last 5 seconds

Oh, I didn't notice that as I only looked through ipts-dbg.

> read 0.0% in htop, it's ~3% when I touch something

iptsd was 3% a few months ago. I just checked today in htop and it's down to 0% like you said.

I just implemented a backoff in my own driver and it's also down to 0% now. Unfortunately, it's not as optimized, so it still goes up to 12% when touched.

> If no data comes in for longer than that it will lower the frequency and poll less

I'm guessing that excludes the type 3 size 64 messages we get every second?
