Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash if shutting down after sending CommissioningComplete but before getting response #22293

Closed
bzbarsky-apple opened this issue Aug 31, 2022 · 0 comments · Fixed by #22294
Closed
Labels

Comments

@bzbarsky-apple
Copy link
Contributor

Problem

Before #21256 AutoCommissioner used the operational proxy if it existed at all. This could happen even if it was disconnected, as long as it had been connected at some point in the past.

This was accidentally changed to "use the operational proxy only if it's connected" in #21256.

This can lead to a crash, as described in #22268 (comment), if shutdown happens after the operational proxy is connected but before we get a response to CommissioningComplete. In that case, we will evict our CASE session, which will error out the CommissioningComplete command we sent and try to clean up, but it will select the (now dangling!) mCommissioneeDeviceProxy instead of correctly selecting mOperationalDeviceProxy, because the mOperationalDeviceProxy no longer has a session at that point.

Proposed Solution

The fix is to check for an "initialized" (in the sense that it has a valid peer node id) mOperationalDeviceProxy instead of checking for a connected one. This matches the semantics of the check we used to have before #21256. This should be very safe and fix this crash.

bzbarsky-apple added a commit to bzbarsky-apple/connectedhomeip that referenced this issue Aug 31, 2022
Before project-chip#21256
AutoCommissioner used the operational proxy if it existed at all.  This could
happen even if it was disconnected, as long as it had been connected at some
point in the past.

This was accidentally changed to "use the operational proxy only if it's
connected" in project-chip#21256.

This can lead to a crash, as described in
project-chip#22268 (comment),
if shutdown happens after the operational proxy is connected but before we get
a response to CommissioningComplete.  In that case, we will evict our CASE
session, which will error out the CommissioningComplete command we sent and try
to clean up, but it will select the (now dangling!) mCommissioneeDeviceProxy
instead of correctly selecting mOperationalDeviceProxy, because the
mOperationalDeviceProxy no longer has a session at that point.

The fix is to check for an "initialized" (in the sense that it has a valid peer
node id) mOperationalDeviceProxy instead of checking for a connected one.  This
matches the semantics of the check we used to have before
project-chip#21256.

Fixes project-chip#22293
bzbarsky-apple added a commit that referenced this issue Aug 31, 2022
Before #21256
AutoCommissioner used the operational proxy if it existed at all.  This could
happen even if it was disconnected, as long as it had been connected at some
point in the past.

This was accidentally changed to "use the operational proxy only if it's
connected" in #21256.

This can lead to a crash, as described in
#22268 (comment),
if shutdown happens after the operational proxy is connected but before we get
a response to CommissioningComplete.  In that case, we will evict our CASE
session, which will error out the CommissioningComplete command we sent and try
to clean up, but it will select the (now dangling!) mCommissioneeDeviceProxy
instead of correctly selecting mOperationalDeviceProxy, because the
mOperationalDeviceProxy no longer has a session at that point.

The fix is to check for an "initialized" (in the sense that it has a valid peer
node id) mOperationalDeviceProxy instead of checking for a connected one.  This
matches the semantics of the check we used to have before
#21256.

Fixes #22293
isiu-apple pushed a commit to isiu-apple/connectedhomeip that referenced this issue Sep 16, 2022
Before project-chip#21256
AutoCommissioner used the operational proxy if it existed at all.  This could
happen even if it was disconnected, as long as it had been connected at some
point in the past.

This was accidentally changed to "use the operational proxy only if it's
connected" in project-chip#21256.

This can lead to a crash, as described in
project-chip#22268 (comment),
if shutdown happens after the operational proxy is connected but before we get
a response to CommissioningComplete.  In that case, we will evict our CASE
session, which will error out the CommissioningComplete command we sent and try
to clean up, but it will select the (now dangling!) mCommissioneeDeviceProxy
instead of correctly selecting mOperationalDeviceProxy, because the
mOperationalDeviceProxy no longer has a session at that point.

The fix is to check for an "initialized" (in the sense that it has a valid peer
node id) mOperationalDeviceProxy instead of checking for a connected one.  This
matches the semantics of the check we used to have before
project-chip#21256.

Fixes project-chip#22293
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant