v2.6: Backport latest fixes #16934

krish2718 · 2024-08-20T15:02:48Z

No description provided.

[SHEL-2947] QoS null frame based legacy power save support. Signed-off-by: Ajay Parida <[email protected]>

[SHEL-2947] Change the naming of power save mode to mechanism (text change). Signed-off-by: Ajay Parida <[email protected]>

FMAC relies on these callbacks to perform an RPU recovery, i.e., to cold boot the device cleanly. This is achieved by bringing the interface down and then up, which properly cleans up the driver, performs a cold boot, and notifies the applications either through NET_IF events (for scan only) or WPA_S events (for full Wi-Fi). Implements SHEL-2726. Signed-off-by: Chaitanya Tata <[email protected]>
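
A minimal sketch of the down/up recovery sequence, assuming hypothetical hooks: `struct rpu_recovery_ops`, `rpu_recover()` and the simulated device below are illustrative stand-ins for the driver's interface down/up path (in Zephyr this is roughly what `net_if_down()` and `net_if_up()` lead to). They are not the nRF70 driver API. The delay models the propagation delay that gives applications time to clean up, and a failed teardown is tolerated because a stuck RPU is expected to fail it.

```c
#include <stdio.h>

/* Hypothetical hooks; names are illustrative, not the nRF70 driver API. */
struct rpu_recovery_ops {
	int (*iface_down)(void *ctx);   /* tear down the VIF, power off the RPU */
	int (*iface_up)(void *ctx);     /* cold boot the RPU, re-create the VIF */
	void (*delay_ms)(void *ctx, unsigned int ms);
};

int rpu_recover(const struct rpu_recovery_ops *ops, void *ctx,
		unsigned int propagation_delay_ms)
{
	int ret = ops->iface_down(ctx);

	if (ret != 0) {
		/* A stuck RPU is expected to fail teardown; continue anyway so
		 * that device removal cuts power to the RPU. */
		fprintf(stderr, "iface down failed (%d), continuing\n", ret);
	}

	/* Let applications see the down event and clean up first. */
	ops->delay_ms(ctx, propagation_delay_ms);

	/* Cold boot: bring the interface back up. */
	return ops->iface_up(ctx);
}

/* Simulated device, used only to exercise the sketch. */
struct sim_dev {
	int powered;
	int down_ret;           /* value sim_down() returns */
	unsigned int waited_ms; /* total simulated delay */
	int boots;              /* number of cold boots */
};

static int sim_down(void *ctx)
{
	struct sim_dev *d = ctx;

	d->powered = 0;
	return d->down_ret;
}

static int sim_up(void *ctx)
{
	struct sim_dev *d = ctx;

	d->powered = 1;
	d->boots++;
	return 0;
}

static void sim_delay(void *ctx, unsigned int ms)
{
	((struct sim_dev *)ctx)->waited_ms += ms;
}

static const struct rpu_recovery_ops sim_ops = {
	.iface_down = sim_down,
	.iface_up = sim_up,
	.delay_ms = sim_delay,
};
```

Keeping the sequence behind function pointers is only a convenience for the sketch; it lets the same ordering (down, wait, up) be checked against a simulated device.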

In case we get multiple watchdog interrupts in succession, we need to serialize the recovery. Implements SHEL-2726. Signed-off-by: Chaitanya Tata <[email protected]>

In some scenarios, especially while debugging nRF70, this feature should be disabled, so provide a feature flag and mark it as experimental. Implements SHEL-2726. Signed-off-by: Chaitanya Tata <[email protected]>

This delay ensures that the applications have enough time to perform any cleanup and be prepared once the RPU is powered on again. Implements SHEL-2726. Signed-off-by: Chaitanya Tata <[email protected]>

This is necessary as recovery involves calling down and up in rapid succession. Implements SHEL-2726. Signed-off-by: Chaitanya Tata <[email protected]>

This helps us verify the recovery mechanism. Implements SHEL-2726. Signed-off-by: Chaitanya Tata <[email protected]>

During watchdog (or any) interrupt processing, RPU accesses are made and they assert the wakeup_now flag, which prevents RPU recovery from triggering. New algorithm to distinguish false from true recovery: check the time difference between the last de-assert and the assert, and if it exceeds the minimum time needed for the RPU to enter sleep, note the timestamp. When a watchdog interrupt is received, this timestamp is used to check whether, during the last window, the host has given the RPU a chance to attempt sleep; if yes, attempt recovery, otherwise ignore the watchdog. Also, add a Kconfig for the 10 s active time that triggers recovery; this needs to be passed to the FW (once we have enough patch memory). Also, add a Kconfig for the minimum time needed for the RPU to attempt sleep in the positive case. Also, add a new _ms API for timestamp fetching; this avoids precision loss when converting between ms and us, and makes the code more readable by avoiding *1000 and /1000. Signed-off-by: Chaitanya Tata <[email protected]>
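
The detection rule above can be sketched as follows, assuming all timestamps come from a monotonic millisecond clock. `struct rpu_ps_tracker` and its functions are hypothetical; `min_sleep_ms` and `active_window_ms` stand in for the two Kconfig values the commit adds, whose real names are not given here. The sketch records the assert time as the noted timestamp, which is one reading of the description.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tracker; field names are illustrative only. */
struct rpu_ps_tracker {
	uint64_t last_deassert_ms;     /* host last released wakeup_now */
	uint64_t last_sleep_chance_ms; /* last idle gap long enough to sleep */
	bool have_sleep_chance;
	uint32_t min_sleep_ms;         /* min idle time for the RPU to sleep */
	uint32_t active_window_ms;     /* RPU active time that fires the watchdog */
};

void ps_on_wakeup_deassert(struct rpu_ps_tracker *t, uint64_t now_ms)
{
	t->last_deassert_ms = now_ms;
}

void ps_on_wakeup_assert(struct rpu_ps_tracker *t, uint64_t now_ms)
{
	/* The gap since the last de-assert let the RPU attempt sleep. */
	if (now_ms - t->last_deassert_ms >= t->min_sleep_ms) {
		t->last_sleep_chance_ms = now_ms;
		t->have_sleep_chance = true;
	}
}

/* True: the host gave the RPU a chance to sleep within the last window,
 * so the watchdog is genuine and recovery should run. False: the host
 * kept the RPU awake, so the watchdog is ignored. */
bool ps_watchdog_is_genuine(const struct rpu_ps_tracker *t, uint64_t now_ms)
{
	return t->have_sleep_chance &&
	       now_ms - t->last_sleep_chance_ms <= t->active_window_ms;
}
```

Working in milliseconds throughout mirrors the commit's motivation for a dedicated _ms timestamp API: no *1000 or /1000 conversions appear in the comparisons.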

In case the RPU is stuck and needs a recovery, failures in interface down should be ignored, as they are expected, and we should proceed with device removal, which in turn removes power to the RPU. TODO: This works for a single VIF, but needs more thought for multi-VIF. Signed-off-by: Chaitanya Tata <[email protected]>

Before proceeding with RPU bringup, do a sanity check by reading a known signature to make sure the Host-RPU comms are operational. Signed-off-by: Chaitanya Tata <[email protected]>
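
A minimal sketch of such a signature check, assuming a generic 32-bit bus read hook. `rpu_comms_sanity_check()`, `rpu_read32_fn` and the simulated bus are hypothetical, and the register address and expected value used with them are placeholders, not the real nRF70 signature.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical bus read hook; not the nRF70 BAL API. */
typedef uint32_t (*rpu_read32_fn)(void *bus, uint32_t addr);

/* Read a register with a known value before RPU bring-up. A mismatch
 * usually means Host-RPU comms are broken (wiring, power, wrong build),
 * so bring-up should stop here instead of failing later. */
int rpu_comms_sanity_check(rpu_read32_fn read32, void *bus,
			   uint32_t sig_addr, uint32_t expected_sig)
{
	uint32_t sig = read32(bus, sig_addr);

	return sig == expected_sig ? 0 : -EIO;
}

/* Simulated bus for illustration: returns a fixed word for any address. */
struct sim_bus {
	uint32_t value;
};

static uint32_t sim_read32(void *bus, uint32_t addr)
{
	(void)addr;
	return ((struct sim_bus *)bus)->value;
}
```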

These are helpful for debugging RPU recovery only. Signed-off-by: Chaitanya Tata <[email protected]>

For the interface down to propagate and clean up, more time is needed. Using the shell, 10 ms was working due to human delay, but programmatically a higher delay is needed. Signed-off-by: Chaitanya Tata <[email protected]>

The RPU only provides per-wiphy (RPU) extended capabilities, so remove storing of per-VIF extended capabilities. Also, there is a memory leak here when doing interface down and up. Fixes SHEL-2738. Signed-off-by: Chaitanya Tata <[email protected]>

The extended capabilities are not freed causing a leak on interface down and up. Fixes SHEL-2738. Signed-off-by: Chaitanya Tata <[email protected]>

During recovery, we might get further watchdog interrupts causing multiple recovery requests; ignore them if a recovery is already in progress. Signed-off-by: Chaitanya Tata <[email protected]>

This is to avoid successive recoveries in case we get successive watchdog interrupts from the RPU. Signed-off-by: Chaitanya Tata <[email protected]>
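
The serialization, the "ignore if already in progress" rule and the quiet period can be sketched together with C11 atomics. `struct recovery_guard`, `recovery_try_begin()` and `recovery_end()` are hypothetical names; the driver itself may use a Zephyr mutex or other primitives, and the quiet period length is an assumed parameter.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical recovery guard; field names are illustrative. */
struct recovery_guard {
	atomic_bool in_progress;
	_Atomic uint64_t last_end_ms; /* when the previous recovery finished */
	atomic_bool has_run;          /* at least one recovery has completed */
	uint32_t quiet_period_ms;
};

/* Returns true if the caller now owns the recovery, false if the request
 * should be ignored (recovery already running, or quiet period active). */
bool recovery_try_begin(struct recovery_guard *g, uint64_t now_ms)
{
	bool expected = false;

	if (atomic_load(&g->has_run) &&
	    now_ms - atomic_load(&g->last_end_ms) < g->quiet_period_ms) {
		return false;
	}

	/* Only one watchdog handler can move in_progress from false to true. */
	return atomic_compare_exchange_strong(&g->in_progress, &expected, true);
}

void recovery_end(struct recovery_guard *g, uint64_t now_ms)
{
	atomic_store(&g->last_end_ms, now_ms);
	atomic_store(&g->has_run, true);
	atomic_store(&g->in_progress, false);
}
```

The compare-and-swap makes the "already in progress" check and the claim a single step, so two watchdog interrupts arriving back to back cannot both start a recovery.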

Check for RPU context as well. To fix this properly we need more fixes to be backported, but this should suffice for now. Signed-off-by: Chaitanya Tata <[email protected]>

In case the RPU is stuck in consecutive recoveries over a time period, it is not recoverable through RPU recovery, and the only thing left to do is to trigger a system reboot. This feature is disabled by default, so applications can either do their own implementation or enable this feature in the driver along with configurable retries and window period. Signed-off-by: Chaitanya Tata <[email protected]>
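
One way to express "too many recoveries within a window" is a fixed window with a counter, sketched below. `struct recovery_escalation` and `recovery_should_reboot()` are hypothetical; `max_retries` and `window_ms` stand in for the configurable retries and window period, whose actual Kconfig names are not given here. A sliding window over stored timestamps would be stricter; the fixed window is chosen only for brevity.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical escalation policy; not the driver's Kconfig names. */
struct recovery_escalation {
	uint32_t max_retries;     /* recoveries tolerated inside one window */
	uint32_t window_ms;       /* length of the observation window */
	uint64_t window_start_ms;
	uint32_t count;
};

/* Record a recovery at now_ms. Returns true if RPU recovery should be
 * abandoned and the system rebooted instead. */
bool recovery_should_reboot(struct recovery_escalation *e, uint64_t now_ms)
{
	if (e->count == 0 || now_ms - e->window_start_ms > e->window_ms) {
		/* First recovery, or the previous window expired: start afresh. */
		e->window_start_ms = now_ms;
		e->count = 0;
	}

	e->count++;
	return e->count > e->max_retries;
}
```

On Zephyr, the caller could then reboot with `sys_reboot(SYS_REBOOT_COLD)` from `<zephyr/sys/reboot.h>`; an application implementing its own policy would call its own handler at that point instead.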

Though this is a no-op for now, it would lead to a crash if BAL de-init is called, which will happen in the upcoming commits. Signed-off-by: Chaitanya Tata <[email protected]>

BAL de-init was never called, so these were not caught. In upcoming commits BAL de-init will be used, so assign the ops here to avoid crashes. Signed-off-by: Chaitanya Tata <[email protected]>

This can lead to a crash in case driver initialization fails, e.g. when flashing the wrong build (5340 on 7002), or if this API is called too early, before the driver is initialized. Fixes SHEL-2576. Signed-off-by: Chaitanya Tata <[email protected]>

The QSPI dev context has its own structure, so the QSPI dev ops need to be extracted from the context. This had been implemented improperly, but as it has not been used to date, it had not caused any problems. Signed-off-by: Chaitanya Tata <[email protected]>

Fix RPU recovery protection to solve build failures when RPU recovery is disabled. As recovery is primarily based on power management, add a Kconfig dependency to enforce this; this simplifies the macros that protect the code. Signed-off-by: Chaitanya Tata <[email protected]>

During interface down, if TX has pending buffers in either the TXQ or the Pending_Q, they are not freed; instead, only the queue itself is freed. Fix by traversing the queue and freeing all members. Signed-off-by: Chaitanya Tata <[email protected]>
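
The fix pattern (drain every member, then free the container) can be sketched on a hypothetical singly linked queue. `struct tx_queue`, `tx_queue_push()` and `tx_queue_destroy()` are illustrative only; the driver uses its own queue utilities and buffer types.

```c
#include <stdlib.h>

/* Hypothetical TX queue; not the driver's TXQ/Pending_Q types. */
struct tx_buf {
	struct tx_buf *next;
	void *frame; /* network buffer owned by the queue */
};

struct tx_queue {
	struct tx_buf *head;
	size_t len;
};

/* Pushes at the head for brevity; order does not matter for teardown. */
int tx_queue_push(struct tx_queue *q, void *frame)
{
	struct tx_buf *b = malloc(sizeof(*b));

	if (!b) {
		return -1;
	}
	b->frame = frame;
	b->next = q->head;
	q->head = b;
	q->len++;
	return 0;
}

/* Free every queued buffer, then the queue itself. Freeing only the
 * queue (the original bug) leaks all pending buffers. Returns the number
 * of buffers released. */
size_t tx_queue_destroy(struct tx_queue *q, void (*free_frame)(void *frame))
{
	size_t freed = 0;
	struct tx_buf *b = q->head;

	while (b) {
		struct tx_buf *next = b->next; /* save before freeing b */

		free_frame(b->frame);
		free(b);
		b = next;
		freed++;
	}
	free(q);
	return freed;
}
```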

This library should be used by samples to manage Wi-Fi usage dynamically. Signed-off-by: Chaitanya Tata <[email protected]>

Use Wi-Fi ready library to manage Wi-Fi. Signed-off-by: Chaitanya Tata <[email protected]>

These are very frequent, so a separate debug option is added for debugging the host RPU recovery logic. Signed-off-by: Chaitanya Tata <[email protected]>

This is useful to understand the reason for comms triggers between the host and the RPU. Signed-off-by: Chaitanya Tata <[email protected]>

Mention the RPU recovery feature. There are no docs yet, so there are no links. Signed-off-by: Chaitanya Tata <[email protected]>

With this offload, the host does not need to manage RX buffers for management frames. This saves Host-RPU comms, giving the RPU more chances to sleep, which is essential to test RPU recovery. Signed-off-by: Chaitanya Tata <[email protected]>

NordicBuilder · 2024-08-20T15:03:23Z

The following west manifest projects have been modified in this Pull Request:

Name	Old Revision	New Revision	Diff
nrfxlib	nrfconnect/sdk-nrfxlib@`4bd894a`	nrfconnect/sdk-nrfxlib@`3cb1a19` (v2.6-branch)	nrfconnect/sdk-nrfxlib@4bd894a..3cb1a19

Note: This message is automatically posted and updated by the Manifest GitHub Action.

NordicBuilder · 2024-08-20T15:36:02Z

You can find the documentation preview for this PR at this link. It will be updated about 10 minutes after the documentation build succeeds.

Note: This comment is automatically posted by the Documentation Publishing GitHub Action.

NordicBuilder · 2024-08-21T14:24:21Z

Test specification

CI/Jenkins/NRF

Integration Platforms

CI/Jenkins/integration

Test Module	File based changes	Manually selected	West overwrite
test-fw-nrfconnect-boot	X
test-fw-nrfconnect-chip	X
test-sdk-wifi	X

Detailed information of selected test modules

Note: This message is automatically posted and updated by the CI

In crowded environments, the RPU is active for more than 10 s due to too many retries, and this triggers a false RPU recovery. To avoid this, increase the default to 50 s to handle corner cases. As this only impacts the case where recovery is triggered, a higher timeout has no impact in normal cases. Signed-off-by: Chaitanya Tata <[email protected]>

To handle an interoperability issue with a few APs, add a feature to send keepalive frames periodically so that the AP does not disconnect the STA. This is disabled by default to avoid unnecessary power consumption, as the issue is only seen with a few old APs. Signed-off-by: Chaitanya Tata <[email protected]>

Pull latest fixes backported to 2.6 branch. Signed-off-by: Chaitanya Tata <[email protected]>

ajayparida and others added 30 commits August 20, 2024 19:20

drivers: wifi: Option for Qos NULL frame based power save

cd24570

[SHEL-2947] QoS null frame based legacy power save support. Signed-off-by: Ajay Parida <[email protected]>

drivers: wifi: Change mode to mechanism

6676187

[SHEL-2947] Change the naming of power save mode to mechanism (text change). Signed-off-by: Ajay Parida <[email protected]>

drivers: wifi: Implement locking for recovery

4f82b1d

In case we get multiple watchdog interrupts in succession, we need to serialize the recovery. Implements SHEL-2726. Signed-off-by: Chaitanya Tata <[email protected]>

drivers: wifi: Add a feature flag for RPU recovery

9379fec

In some scenarios, especially while debugging nRF70, this feature should be disabled, so provide a feature flag and mark it as experimental. Implements SHEL-2726. Signed-off-by: Chaitanya Tata <[email protected]>

drivers: wifi: Add configuration option for propagation delay

d852a65

This delay ensures that the applications have enough time to perform any cleanup and be prepared once the RPU is powered on again. Implements SHEL-2726. Signed-off-by: Chaitanya Tata <[email protected]>

drivers: wifi: Increase net management events

f038e13

This is necessary as recovery involves calling down and up in rapid succession. Implements SHEL-2726. Signed-off-by: Chaitanya Tata <[email protected]>

drivers: wifi: Add a test command for RPU recovery

4771c8a

This helps us verify the recovery mechanism. Implements SHEL-2726. Signed-off-by: Chaitanya Tata <[email protected]>

drivers: wifi: Add a sanity check for RPU comms

29881b9

Before proceeding with RPU bringup, do a sanity check by reading a known signature to make sure the Host-RPU comms are operational. Signed-off-by: Chaitanya Tata <[email protected]>

drivers: wifi: Add support for separate debugs for RPU recovery

58a3b1f

These are helpful for debugging RPU recovery only. Signed-off-by: Chaitanya Tata <[email protected]>

drivers: wifi: Increase the propagation delay

fd768b6

For the interface down to propagate and clean up, more time is needed. Using the shell, 10 ms was working due to human delay, but programmatically a higher delay is needed. Signed-off-by: Chaitanya Tata <[email protected]>

drivers: wifi: Fix memory leak

9a06523

The extended capabilities are not freed causing a leak on interface down and up. Fixes SHEL-2738. Signed-off-by: Chaitanya Tata <[email protected]>

drivers: wifi: Ignore parallel recovery requests

8112f34

During recovery, we might get further watchdog interrupts causing multiple recovery requests; ignore them if a recovery is already in progress. Signed-off-by: Chaitanya Tata <[email protected]>

drivers: wifi: Add quiet period for RPU recovery

cb8df06

This is to avoid successive recoveries in case we get successive watchdog interrupts from the RPU. Signed-off-by: Chaitanya Tata <[email protected]>

drivers: wifi: Fix the NULL check

ad8d230

Check for RPU context as well. To fix this properly we need more fixes to be backported, but this should suffice for now. Signed-off-by: Chaitanya Tata <[email protected]>

drivers: wifi: Add de-initialization for SPI

865f59e

Though this is a no-op for now, it would lead to a crash if BAL de-init is called, which will happen in the upcoming commits. Signed-off-by: Chaitanya Tata <[email protected]>

drivers: wifi: Assign de-initialization implementation ops

7da1e05

BAL de-init was never called, so these were not caught. In upcoming commits BAL de-init will be used, so assign the ops here to avoid crashes. Signed-off-by: Chaitanya Tata <[email protected]>

drivers: wifi: Fix NULL check for FMAC context

af4a8ed

This can lead to a crash in case driver initialization fails, e.g. when flashing the wrong build (5340 on 7002), or if this API is called too early, before the driver is initialized. Fixes SHEL-2576. Signed-off-by: Chaitanya Tata <[email protected]>

drivers: wifi: Fix TX buffers leak

c0ef6ab

During interface down, if TX has pending buffers in either the TXQ or the Pending_Q, they are not freed; instead, only the queue itself is freed. Fix by traversing the queue and freeing all members. Signed-off-by: Chaitanya Tata <[email protected]>

net: lib: Add Wi-Fi ready lib

ed65987

This library should be used by samples to manage Wi-Fi usage dynamically. Signed-off-by: Chaitanya Tata <[email protected]>

samples: wifi: sta: Use Wi-Fi ready lib

6896ec4

Use Wi-Fi ready library to manage Wi-Fi. Signed-off-by: Chaitanya Tata <[email protected]>

drivers: wifi: Add PS state debugs

f993249

These are very frequent, so a separate debug option is added for debugging the host RPU recovery logic. Signed-off-by: Chaitanya Tata <[email protected]>

drivers: wifi: Add command and event logging

66be9cf

This is useful to understand the reason for comms triggers between the host and the RPU. Signed-off-by: Chaitanya Tata <[email protected]>

doc: changelog: Add an entry for RPU recovery

d77bb28

Mention the RPU recovery feature. There are no docs yet, so there are no links. Signed-off-by: Chaitanya Tata <[email protected]>

krish2718 requested a review from krga2022 August 20, 2024 15:02

github-actions bot added the doc-required (PR must not be merged without tech writer approval) and manifest labels Aug 20, 2024

NordicBuilder added the manifest-nrfxlib and DNM labels Aug 20, 2024

krish2718 marked this pull request as ready for review August 21, 2024 14:15

krish2718 requested review from rlubos, D-Triveni, bama-nordic, sachinthegreen, rado17, carlescufi and tejlmand as code owners August 21, 2024 14:15

sachinthegreen approved these changes Aug 23, 2024

krish2718 added 3 commits August 28, 2024 15:49

manifest: nrfxlib: Pull latest fixes

6e8d788

Pull latest fixes backported to 2.6 branch. Signed-off-by: Chaitanya Tata <[email protected]>

krish2718 force-pushed the pull_latest_wifi_fixes branch from 569d12a to 6e8d788 Compare August 28, 2024 11:27

NordicBuilder removed the DNM label Aug 28, 2024

krish2718 requested a review from udaynordic August 28, 2024 11:28

udaynordic approved these changes Aug 28, 2024

bama-nordic approved these changes Aug 29, 2024

carlescufi merged commit 49fec46 into nrfconnect:v2.6-branch Aug 29, 2024
14 of 17 checks passed
