Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TC-SU-2.3 OTA-R unable to start new CASE session with OTA-P #18484

Closed
jrhees-cae opened this issue May 16, 2022 · 3 comments · Fixed by #18642
Closed

TC-SU-2.3 OTA-R unable to start new CASE session with OTA-P #18484

jrhees-cae opened this issue May 16, 2022 · 3 comments · Fixed by #18642
Assignees
Labels

Comments

@jrhees-cae
Copy link
Contributor

Problem

An intended fix for timing out a CASE session in SWU was here: #15939
However, it appears the fix is not effective. Even though we see the following in the DUT (OTA-R) logs:

I: 1237047 [DMG]Time out! failed to receive invoke command response from Exchange: 4289i
E: 1237048 [SWU]Received QueryImage failure response: 32
E: 1237048 [SWU]CASE session may be invalid, tear down session

... subsequent requests to the OTA-P from the OTA-R are still attempting to use the same session ID.

<what's wrong or missing, please include any applicable:

  • expected behavior
    If OTA-P is restarted, OTA-R should time out its session and use a new session for subsequent requests.

  • actual behavior
    Subsequent requests (even after a timeout) are continuing to attempt to use the same session.

  • steps to reproduce

  1. OTA-R makes a request to OTA-P (successful)
  2. Restart OTA provider
  3. OTA-R makes a request to OTA-P (unsuccessful)
  4. OTA-R eventually times out and prints in the log: [SWU]CASE session may be invalid, tear down session
  5. OTA-R makes a request to OTA-P (unsuccessful)
  6. ...
@isiu-apple isiu-apple added the ota label May 16, 2022
@isiu-apple isiu-apple self-assigned this May 17, 2022
@isiu-apple
Copy link
Contributor

isiu-apple commented May 17, 2022

Discussed with @mrjerryjohns who mentioned that #16202 is pertinent to this issue.

@isiu-apple
Copy link
Contributor

isiu-apple commented May 19, 2022

Traced the code in debugger and found that when OTA-R fails to communicate with OTA-P when it is killed, OTA-R ends up calling:

DefaultOTARequestor::DisconnectFromProvider()

Which calls: mCASESessionManager->ReleaseSession(fabricInfo->GetPeerIdForNode(mProviderLocation.Value().providerNodeID));

And eventually successfully releases the session object in: CASESessionManager::ReleaseSession(OperationalDeviceProxy * session)

However, when OTA-R performs another query image when OTA-P has been restarted back up again OTA-R thinks there is a valid case session and attempts to use it. We see this in the log:

CHIP: [CTL] Found an existing secure session to [07E64AC5BF927255-000000000000ABCD]!

So in DefaultOTARequestor::ConnectToProvider(), OTA-R tries to find a valid session:

mCASESessionManager->FindOrEstablishSession(fabricInfo->GetPeerIdForNode(mProviderLocation.Value().providerNodeID),
                                                    &mOnConnectedCallback, &mOnConnectionFailureCallback);

Which ends up in this case:

case State::NeedsAddress:
        isConnected = AttachToExistingSecureSession();

And it finds a valid session here:

auto sessionHandle =
        mInitParams.sessionManager->FindSecureSessionForNode(peerNodeId, MakeOptional(Transport::SecureSession::Type::kCASE));

@isiu-apple
Copy link
Contributor

isiu-apple commented May 19, 2022

From @mrjerryjohns: CASESessionManager is responsible for managing OperationalDeviceProxy instances, which are responsible for managing all the state machinery needed to do operational discovery + session establishment there after. To initiate an interaction to a peer, you're expected to retrieve an OperationalDeviceProxy instance for that peer, and retrieve the embedded SecureSessionHandle to then interact with it.
CASESessionManager exists to manage the allocation/de-allocation of proxy instances, as well as ensuring that there is only one such instance to a given Node.
All of this is separate from the actual pool of SecureSession instances that actually contain the state for a given established, secure session.
OperationalDeviceProxy has within it, a weak-reference to the actual SecureSession .
All ReleaseSession does is just de-allocate the OperationalDeviceProxy instance, but not the actual underlying SecureSession since we just have a weak-reference to it.
When you attempt to re-establish CASE by calling FindOrEstablishSession, it allocates a new OperationalDeviceProxy instance, but as part of its initialization, it attempts to find an existing SecureSession (if one exists) to the specified peer. If one does exist (which in this case it does), it will attach to that. This is not what you want.
What you need to do instead is call SessionManager:: ExpireAllPairings(const ScopedNodeId &node). That correctly expires all sessions associated with that Node, and ensures that all OperationalDeviceProxy instances currently referencing that will lose their reference, and correctly re-establish CASE. It will also correctly expire any server-side CASE sessions as well.

Actually, it would be preferable to call OperationalDeviceProxy::Disconnect() - that ensures it only evicts the session that you have conclusive evidence for indicating a lack of response from the peer.
Calling ExpireAllPairings will over-eagerly evict all sessions to that peer, even if some were newer and perhaps, functional.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants