Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Core] Use a mutex guard for split shared memory #16647

Merged
merged 1 commit into from
Apr 19, 2022

Conversation

KarlK90
Copy link
Member

@KarlK90 KarlK90 commented Mar 14, 2022

Description

...on platforms that support it, which is only ChibiOS at this point. Further more the wrong usage of the API would result in lockups on the RP2040 QMK port. With these changes ChibiOS no longer halts with a critical system error when enabling the state check.

The problem:

The shared memory on the ChibiOS platform can be accessed concurrently on the secondary device. This happens when the serial transaction thread reacts to transactions requests by the main side of the keyboard. This can lead to race conditions and corrupted transactions.

The solution:

The access to the split shared memory is made exclusive at any given time by guarding it with a mutex.

The previous solution using a critical section didn't really work out because serial functions like send and receive would disable and enable interrupts on their own, thus leaving the system with enabled interrupts after the first transaction. The same was true for slave transaction handlers for e.g. sync timers, those functions effectively broke the critical section as well.

Tested succesfully with RP2040 and STM32F411 splits using the usart and bitbang driver

Types of Changes

  • Core
  • Bugfix
  • New feature
  • Enhancement/optimization
  • Keyboard (addition or update)
  • Keymap/layout/userspace (addition or update)
  • Documentation

Issues Fixed or Closed by This PR

Checklist

  • My code follows the code style of this project: C, Python
  • I have read the PR Checklist document and have made the appropriate changes.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • I have tested the changes and verified that they work and don't break anything (as well as I can manage).

@github-actions github-actions bot added the core label Mar 14, 2022
...on platforms that support it, which is only ChibiOS at this point.

The problem:

The shared memory on the ChibiOS platform can be accessed concurrently
on the secondary device. This happens when the serial transaction thread,
reacts to transactions requests by the main side of the keyboard. This
can lead to race conditions and corrupted transactions.

The solution:

The access to the split shared memory is made exclusive at any given
time by guarding it with a mutex.

The previous solution using a critical section didn't really work out
because serial functions like send and receive would disable and enable
interrupts on their own, thus leaving the system with enabled interrupts
after the first transaction. The same was true for slave transaction
handlers for e.g. sync timers, those functions effectively broke the
critical section as well.
@drashna
Copy link
Member

drashna commented Mar 16, 2022

Seems to work well on the blackpill f411 and the f303. Or at least, not cause any apparent issuse.

However, this removes the atomic option for avr, without implementing it elsewhere?

@KarlK90
Copy link
Member Author

KarlK90 commented Mar 17, 2022

However, this removes the atomic option for avr, without implementing it elsewhere?

This is true, but I don't see a good clean way to implement this feature on AVR with the lack of a Mutex primitive.

On the other hand with the removal of the critical sections aka. ATOMIC_BLOCK nothing really changes from the status quo on the AVR platform though. This is because of the serial driver and timer implementations that are called in the handlers did leave the critical section unlocked before the handler finished anyways.

So the before the call order was:

  1. enter critical section (disable interrupts)
  2. enter split handler
  3. call timer or serial function in the handler
  4. enter critical section again in the serial or timer function (disable already disabled interrupts again)
  5. do timer or serial stuff
  6. leave critical section in the serial or timer function (enable interrupts again). Here starts the area where race conditions could occur, because now any read /write access to the shared memory could be interrupted by the receiving thread which in turn writes to the shared memory while a write or read hasn't been finished.
  7. leave split handler
  8. leave critical section (enable already enabled interrupts again)

So not having the code at all for AVR makes it a little bit worse in theory.

@daskygit
Copy link
Member

leave critical section in the serial or timer function (enable interrupts again). Here starts the area where race conditions could occur, because now any read /write access to the shared memory could be interrupted by the receiving thread which in turn writes to the shared memory while a write or read hasn't been finished.

I don't think it's actually enabling interrupts when leaving the timer stuff at least. I had to manually enable interrupts for the timer to update. daskygit@e4ad2bd?w=1 (this fixes i2c split usb enumeration when only using a single side). The few people that have tested it didn't report any issues.

I'm assuming the changes above will also allow i2c splits to work correctly.

@drashna
Copy link
Member

drashna commented Mar 21, 2022

Been using this for a few days, and it seems to cause debouncing issues on my tractyl manuform (blackpill f411). Not sure if that's a bug caused by this code, or because it's handling it properly, basically (especially since most but not all of the issues appear to be on the secondary half).

@KarlK90
Copy link
Member Author

KarlK90 commented Mar 21, 2022

Been using this for a few days, and it seems to cause debouncing issues on my tractyl manuform (blackpill f411). Not sure if that's a bug caused by this code, or because it's handling it properly, basically (especially since most but not all of the issues appear to be on the secondary half).

Hmmn, I'll think about a way to measure unusual long locks of the mutex. With debouncing issues you mean that keys get stuck or that input is not registered?

@drashna
Copy link
Member

drashna commented Mar 21, 2022

Specifically, they get registered repeatedly, like a bad joint type behavior. But removing this code change (and just this change) fixes the issue.

@KarlK90
Copy link
Member Author

KarlK90 commented Mar 21, 2022

Do you have link to your current firmware build? I don't have the pwm sensors but a F411 test bed, I'll try to reproduce and investigate the issues.

@drashna
Copy link
Member

drashna commented Mar 21, 2022

Yup: https://github.com/drashna/qmk_firmware/tree/mutex_test/keyboards/handwired/tractyl_manuform/5x6_right/f411

Re-added the code change to the branch I was using, and split it off.

Copy link
Member

@drashna drashna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issues were related to my board and all the code that is running on it. Some perf tweaks (matrix scan rate ~300 -> 1500), and the issue goes away.

@KarlK90
Copy link
Member Author

KarlK90 commented Mar 21, 2022

Issues were related to my board and all the code that is running on it. Some perf tweaks (matrix scan rate ~300 -> 1500), and the issue goes away.

Thanks for the feedback 🙂

@drashna drashna requested a review from a team March 21, 2022 20:28
@tzarc tzarc merged commit 7712a28 into qmk:develop Apr 19, 2022
0xcharly pushed a commit to Bastardkb/bastardkb-qmk that referenced this pull request Jul 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants