Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARMv8M: Ensure SysTick timer is disabled when modifying SysTick regs #660

Closed
wants to merge 2 commits into from

Conversation

paulbartell
Copy link
Contributor

Description

The ARMv8M port currently modifies SysTick registers while the SysTick timer is enabled. This can lead to a number of unexpected side effects depending on the quality of the hardware or simulator implementation.

In particular, qemu fixed a bug that this use triggers in commit 77cd997. This change set addresses the bug present in qemu versions prior to v7.0.

Repro / Test Steps

Attempt to start the kernel on qemu-system-arm v6.2.0 ( the version currently available for ubuntu 22.04 ).

qemu-system-arm -machine mps2-an505 -m 16M -cpu cortex-m33 -nographic -semihosting -monitor null --semihosting-config enable=on,target=native -kernel mps2_an505_cm33_blink_demo.elf -d guest_errors

Note the following warning printed to stderr when the kernel attempts to set the clock source of the SysTick timer and start the SysTick timer with one register write:

Timer with period zero, disabling

Note that SysTick_Handler is never called in this situations.

Apply the patch in this PR and verify that the kernel now runs as expected.

Checklist:

  • I have tested my changes. No regression in existing tests.
  • [ N/A ] I have modified and/or added unit-tests to cover the code changes in this Pull Request.

Related Issue

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

…ick regs.

Works around bug fixed by qemu commit 77cd997.
@paulbartell paulbartell requested a review from a team as a code owner April 7, 2023 00:11
@paulbartell paulbartell changed the title RMv8M: Ensure SysTick timer is disabled when modifying SysTick regs ARMv8M: Ensure SysTick timer is disabled when modifying SysTick regs Apr 7, 2023
@codecov
Copy link

codecov bot commented Apr 7, 2023

Codecov Report

Patch and project coverage have no change.

Comparison is base (1b8a424) 94.46% compared to head (897e9a5) 94.46%.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #660   +/-   ##
=======================================
  Coverage   94.46%   94.46%           
=======================================
  Files           6        6           
  Lines        2422     2422           
  Branches      594      594           
=======================================
  Hits         2288     2288           
  Misses         85       85           
  Partials       49       49           
Flag Coverage Δ
unittests 94.46% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

n9wxu
n9wxu previously approved these changes Apr 7, 2023
@@ -393,7 +393,7 @@ PRIVILEGED_DATA static volatile uint32_t ulCriticalNesting = 0xaaaaaaaaUL;
#if ( configUSE_TICKLESS_IDLE == 1 )
__attribute__( ( weak ) ) void vPortSuppressTicksAndSleep( TickType_t xExpectedIdleTime )
{
uint32_t ulReloadValue, ulCompleteTickPeriods, ulCompletedSysTickDecrements, ulSysTickDecrementsLeft;
uint32_t ulCtrlRegValue, ulReloadValue, ulCompleteTickPeriods, ulCompletedSysTickDecrements, ulSysTickDecrementsLeft;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variable names should be written out in full, not abbreviated (as per the other variables declared here).

@@ -422,7 +422,7 @@ PRIVILEGED_DATA static volatile uint32_t ulCriticalNesting = 0xaaaaaaaaUL;
* is accounted for as best it can be, but using the tickless mode will
* inevitably result in some tiny drift of the time maintained by the
* kernel with respect to calendar time. */
portNVIC_SYSTICK_CTRL_REG = ( portNVIC_SYSTICK_CLK_BIT_CONFIG | portNVIC_SYSTICK_INT_BIT );
portNVIC_SYSTICK_CTRL_REG &= ~( portNVIC_SYSTICK_ENABLE_BIT );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original code:

  • writes 0 to the ENABLE bit, disabling the timer, which is the desired behaviour.
  • writes 0 to the systick interrupt enable bit, preventing a systick interrupt from becoming pended should the systick value count down to zero.
  • writes 0 to the COUNTFLAG bit, which leaves the COUNTFLAG unchanged, which is necessary as it is checked later to determine if the systick value counted to zero while its interrupt was disabled.
  • writes 0 to the CLKSOURCE bit, the effect of which is implementation defined, so should be reconsidered.

The new code:

  • performs a read-modify-write operation, so reads the COUNTFLAG, potentially clearing it, which I belive introduced a bug as the COUNTFLAG is checked later in the code.

Therefore I think the original code is (more) correct, and the updated code is incorrect.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RichardBarry The remaining bits of SYSTICK_CTRL are labeled RES0. According to the arm glossary it is implementation-defined whether or not writes or zero or one to these bits have any side effects.

Given this, the read-modify-write operations better comply with ARM's recommendations in the Armv8-M Architecture Reference Manual and Glossary.

* ulCtrlRegValue to ensure portNVIC_SYSTICK_COUNT_FLAG_BIT is taken
* into account later on.
*/
ulCtrlRegValue = portNVIC_SYSTICK_CTRL_REG;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a race condition here? Reading portNVIC_DYDTICK_CTRL_REG clears the portNVIC_SYSTICK_COUNT_FLAG_BIT, which might have otherwise been set between this line executing and the second line down where the saved value is written back to the control register.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The race condition is addressed on line 511.

* from portNVIC_SYSTICK_LOAD_REG immediately instead of at the next
* cycle of the other clock. Then portNVIC_SYSTICK_LOAD_REG is ready
* to receive the standard value immediately. */
* cycle of the external clock. Then portNVIC_SYSTICK_LOAD_REG is ready
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I belive the other clock can also be internal - if a peripheral time is considered internal - but I may be wrong.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would you prefer the term "implementation-defined clock?"

Copy link
Contributor Author

@paulbartell paulbartell Apr 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ARM documentation refers to this as the "IMPLEMENTATION DEFINED external reference clock."

portNVIC_SYSTICK_CTRL_REG = portNVIC_SYSTICK_CLK_BIT_CONFIG | portNVIC_SYSTICK_INT_BIT | portNVIC_SYSTICK_ENABLE_BIT;

/* Enable SysTick */
ulCtrlRegValue |= ( portNVIC_SYSTICK_ENABLE_BIT | portNVIC_SYSTICK_INT_BIT );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code sequence needs considering carefully as it's potentially using an outdated ulCtrlRegValue value.

portable/ARMv8M/non_secure/port.c Show resolved Hide resolved
@jefftenney
Copy link
Contributor

The original code manages subtle race conditions (as noted by @RichardBarry). Since the issue addressed by this PR is limited to QEMU versions prior to 7.0.0, would it be better to minimize the code changes? See this commit for what I believe to be the minimum change required. (Untested.)

For reference here is a link to the QEMU bug description and fix.

@paulbartell
Copy link
Contributor Author

@jefftenney That would address the immediate issue for cases where Tickless Idle is disabled. As I mentioned in another comment, clearing RES0 bits can result in undefined behavior, so the current setup which clobbers all of the RES0 bits in SYSTICK_CTRL could result in such undefined behavior.

portNVIC_SYSTICK_CTRL_REG = ulCtrlRegValue;

/* Update ulCtrlRegValue */
ulCtrlRegValue |= portNVIC_SYSTICK_CTRL_REG;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The value of the register is re-read to prevent a race condition while portNVIC_SYSTICK_ENABLE_BIT is still set.

@jefftenney
Copy link
Contributor

clearing RES0 bits can result in undefined behavior

That isn't right. The Arm Glossary and the ARMv8-M Architecture Reference Manual both agree that writes to RES0 bits have no (material) effect. Either they are hardwired to zero and ignore writes (case 1 below) or the value of the bit has no effect on the operation of the core (case 2 below).

image

In the case of SysTick's CSR register, RMW operations require caution because reading the register actually clears the COUNTFLAG bit. And of course RMW operations are slower.

@paulbartell
Copy link
Contributor Author

@jefftenney Perhaps my terminology is not exact enough.

Quoting from the ARM glossary R:

RES0

A reserved bit or field with Should-Be-Zero-or-Preserved (SBZP) behavior. 

Used for fields in register descriptions. 

Also used for fields in architecturally defined data structures that are held in memory, for example in translation table descriptors. 

RES0 is not used in descriptions of instruction encodings. 

Within the architecture, there are some cases where a register bit or bitfield can be RES0 in some defined architectural context, but can have different defined behavior in a different architectural context. 

For any RES0 bit or field, software must not rely on the bit or field reading as 0, and must use an SBZP policy to write to the bit or field.

And ARM glossary S:

SBZP, Should Be Zero or Preserved

The Large Physical Address Extension modifies the definition of SBZP for register bits that are reallocated by the extension, and as a result are SBZP in some but not all contexts. 

The generic definition of SBZP given here applies only to bits that are not affected by this modification. 

Hardware must ignore writes to the field. 

When writing this field, software must either write all 1s to this field, or, if the register is being restored from a previously read state, this field must be written previously read value. 

If this is not done, then the result is UNPREDICTABLE.

This description can apply to a single bit that should be written as its preserved value or as 0, or to a field that should be written as its preserved value or as all 0s.

So I guess we could write 1s to all RES0 bits and then be in compliance with the spec?

@jefftenney
Copy link
Contributor

@paulbartell No, you wouldn't "blindly" write all 1s to a SBZP field. You can only write 1's if you are preserving what you read. For example in an RMW of some other field in the same register. SBZP is a write policy that says you can always write 0s, and you can also write 1s if you are simply preserving what you read.

However I do see the source of confusion. There is a copy/paste error in the SBZP glossary entry. The author must have copied from SBOP and forgot to make a critical change. Here is the offending sentence in the SBOP entry:

When writing this field, software must either write all 1s to this field, or, if the register is being restored from a previously read state, this field must be written with the previously read value.

It should have been modified as follows for SBZP:

When writing this field, software must either write all 0s to this field, or, if the register is being restored from a previously read state, this field must be written with the previously read value.

Note that the final statement was copy/pasted correctly, from this for SBOP:

This description can apply to a single bit that should be written as its preserved value or as 1, or to a field that should be written as its preserved value or as all 1s.

to this for SBZP:

This description can apply to a single bit that should be written as its preserved value or as 0, or to a field that should be written as its preserved value or as all 0s.

So in the end, the code is safe to write zeros to RES0 fields.

@paulbartell
Copy link
Contributor Author

@jefftenney Thanks so much for the clarification. This makes much more sense now.

I'll rework this PR to minimize the number of changes.

@paulbartell paulbartell marked this pull request as draft April 25, 2023 18:32
@Mancent Mancent linked an issue May 21, 2023 that may be closed by this pull request
@jefftenney
Copy link
Contributor

@paulbartell Would it be helpful for me to open a new PR with the workaround for the qemu bug? I have tested it against paulbartell/add-generic-arm-mps-project, with and without tickless, and it works. The workaround is required for anyone wanting to use QEMU for ARMv8-M while staying with Ubuntu LTS 22.04 (so, "stuck" with QEMU 6.2). Plus the workaround is zero risk for all the normal silicon targets. It would save you the hassle of reworking this PR, but no worries either way.

@aggarg
Copy link
Member

aggarg commented Jul 26, 2023

As per my understanding, the bug in QEMU triggers when you try to change the clock source and enable the SysTick at the same time (i.e. change CLKSOURCE and set ENABLE to 1 in "SysTick Control and Status Register" at the same time).

The following are the places where we do so -

I guess you are suggesting that we assume that portNVIC_SYSTICK_CLK_BIT and portNVIC_SYSTICK_CLK_BIT_CONFIG are same (i.e. configSYSTICK_CLOCK_HZ is not defined). This should be a reasonable assumption as we are just providing a workaround for a bug in older versions of QEMU.

I would suggest to use portNVIC_SYSTICK_CLK_BIT_CONFIG and update the comment a bit -

/* Stop and reset the SysTick.
 *
 * QEMU versions older than 7.0.0 contain a bug which causes an error if
 * we try to change the clock source and enable the SysTick at the same
 * time (i.e. change CLKSOURCE and set ENABLE to 1 in "SysTick Control and
 * Status Register" at the same time). We set the CLKSOURCE here so that
 * when we enable SysTick below, the CLKSOURCE remains the same and does
 * not need to change. This workaround avoids triggering the bug in QEMU
 * versions older than 7.0.0.
 *
 * Please use core clock for clocking SysTick (i.e. do not define
 * configSYSTICK_CLOCK_HZ) if you are using QEMU version older than 7.0.0 to
 * avoid triggering the above mentioned QEMU bug in tickless idle code.
 */
portNVIC_SYSTICK_CTRL_REG = portNVIC_SYSTICK_CLK_BIT_CONFIG;

@jefftenney
Copy link
Contributor

I like the suggestion to use portNVIC_SYSTICK_CLK_BIT_CONFIG instead of portNVIC_SYSTICK_CLK_BIT in the workaround. Makes it easier to skim over quickly without worrying about a possible error in the code. Plus no need to explain in the comments why the temporary setting doesn't impact anything if it doesn't match the configured setting.

But no we don't need to assume portNVIC_SYSTICK_CLK_BIT_CONFIG and portNVIC_SYSTICK_CLK_BIT are equal on QEMU. And I also should clarify the QEMU bug. There's no risk of the QEMU bug being triggered in the tickless-idle code.

I'll raise a new PR. I'll explain how I understand it and we can discuss there on a clean slate.

@aggarg
Copy link
Member

aggarg commented Jul 27, 2023

Thanks @jefftenney for clarifying. This was the bit that I was missing - It's important to note that the bug is not triggered by merely changing the CLKSOURCE bit while simultaneously enabling SysTick. Instead, the bug is triggered by changing from an invalid clock to a valid clock while simultaneously enabling SysTick.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

> - [ ] ```
5 participants