Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OnFirstMessageDeliveryFailed no longer performs address lookup after 21256 #21639

Closed
tehampson opened this issue Aug 4, 2022 · 6 comments · Fixed by #21768
Closed

OnFirstMessageDeliveryFailed no longer performs address lookup after 21256 #21639

tehampson opened this issue Aug 4, 2022 · 6 comments · Fixed by #21768
Assignees
Labels

Comments

@tehampson
Copy link
Contributor

Problem

After #21256 OnFirstMessageDeliveryFailed no longer performs address lookup on established sessions. This is because previously OperationalDeviceProxy would stick throughout the lifetime of the establish connection and it would receive the event and perform the address lookup. With OperationalSessionSetup now being ephemeral and only living for the lifetime of the session setup, it is no longer is around when OnFirstMessageDeliveryFailed event is dispatched.

Proposed Solution

Here are 3 potential solutions:

  1. OnFirstMessageDeliveryFailed is removed from SessionDelegate. ReliableMessageMgr now notifies the SessionManager about first message delivery failure

    • Pros:
      • Should be one of the easier solutions to implement.
    • Cons:
      • (small) Arguably SessionManager should only manage active connection, having it perform an address lookup is somewhat outside the scope of it's role and is starting to creep
      • (small) The Address lookup code is still needed in OperationalSessionSetup so need to figure out a clean spot for this code to live in one spot functionally be used by both OperationalSessionSetup and SessionManager.
  2. Create a completely new SessionDelegate derived object that will perform address lookup

    • Pros:
      • Likely the cleanest implementation
    • Cons:
      • Currently I personally don't see where this fits nicely
      • Likely will need to allocate this object for every established session, which might not be desirable.
  3. Remove OnFirstMessageDeliveryFailed entirely. The spec only mentions that we should do this, so it is not a requirement.

    • For this solution:
      • Some argue that the chances address lookup will actually do anything that allows a session to recover is very low. Most DHCP renewals end up with the device getting the same address.
    • Against this Solution:
      • Some argue that established sessions are open on a magnitude of days, while the chance is low, this is a convenience feature that maybe help in some particular environments.
@tehampson
Copy link
Contributor Author

Solution 4 (which I still have not looked into so I don't have pros/cons) UpdateDevice could be made to do the lookup

@tehampson tehampson self-assigned this Aug 4, 2022
@bzbarsky-apple
Copy link
Contributor

Just to be clear, a spec SHOULD means "it's expected that any good implementation will do this; not doing it must be grounded in very good reasons".

So yes, it's technically spec-compliant to not do it, but not doing it is also second-guessing the intent of the spec and needs to be very firmly grounded. I don't think we have such firm grounding here, and should be implementing this, esp. since it is working right now and we are breaking it.

In terms of where to do it... We could add some way to run the OperationalSessionSetup state machine partway against an existing session, which is how this used to be done (as in, create a new OperationalSessionSetup with the existing session, then run it through only the first few states). Or we could have a small class for just doing this; that does involve a separate pool and whatnot.....

@mrjerryjohns
Copy link
Contributor

My vote would be for 1. I think it has the least number of changes.

@tehampson
Copy link
Contributor Author

At this point I am just documenting things at they come. I haven't decided on which approach yet.

The issue with reusing OperationalSessionSetup seems to be that ReliableMessageMgr doesn't have access to CASESessionManager and doing so will create a circular build dependency. Arguably we could do a delegate or callback type thing. Based on how I am looking at it, that plumbing would take a little bit of effort.

Originally when I first wrote solution 1, I had the idea that the lookup would happen entirely within SessionManager, but the issue with that is we would only be able to do 1 look up at a time, which makes that a show stopper as I think a requirement would be that we could handle more than 1 parallel look up at a time.

Right now I am investigating if the ExchangeContext which already has it's own SessionHolderWithDelegate could handle the address look up itself. On the off chance someone else knows why that cannot happen feel free to chime in

@bzbarsky-apple
Copy link
Contributor

The only things I can think of there are:

  1. There might still be circular dep issues.
  2. It would likely involve making ExchangeContext bigger, right?

@tehampson
Copy link
Contributor Author

tehampson commented Aug 9, 2022

There might still be circular dep issues.

I don't think there is (but I will double check)

It would likely involve making ExchangeContext bigger, right?

Yes, there might also be a couple other drawbacks listed below


After spending most of yesterday looking into the possible solutions here are the two that I think are the most feasible. I switched to letters as to not confuse them with the numbered solutions mentioned earlier in this issue.

Solution A:

  • OnFirstMessageDeliveryFailed is removed from SessionDelegate and becomes it's own delegate. For now lets call it MessageDelieveryFailedDelegate which will have just the one function that will need to be overridden OnFirstMessageDeliveryFailed.
  • CASESessionManager will implement MessageDelieveryFailedDelegate
  • During CASESessionManager::Init it will register it's MessageDelieveryFailedDelegate with ReliableMessageMgr. We have access to ReliableMessageMgr, and from what I see, initialization of CASESessionManager happens after initialization of the things we need to get ReliableMessageMgr from already provided CASESessionManagerConfig argument.
  • Pros:
    • We are able to reuse most of OperationalSessionSetup to do the lookup for us.
  • Cons:
    • At this point we will not have the exact Session that requires a lookup. Arguably updating all active Sessions that match the ScopedNodeID should be updated anyways so this might actually be beneficial, but listed as a con right now since the behaviour is different than what we used to have.
    • Extra care will need to be taken in case we Allocate a OperationalSessionSetup specifically for the lookup in the corner case where, some other entity does FindOrEstablishSession during the lookup.
      • It's not that it cannot be done, just something to be aware of, and I think I have a solution for this already involving adding a state to the OperationalSessionSetup state machine (but there might be something else that comes to me as I work on it).

Solution B:

  • ExchangeContext which already has it's own SessionHolderWithDelegate and can just implement OnFirstMessageDeliveryFailed
  • Pros:
    • Overall logic should be more simple then Solution A since Solution A likely involves changing the OperationalSessionSetup state machine (IMO changes to state machine does make it harder for someone unfamiliar to the code to take a look at it for the first time to understand what is going on).
  • Cons:
    • It is possible there are multiple ExchangeContext that will hit OnFirstMessageDeliveryFailed for the same Session, creating multiple DNS lookups.
      • This can be mitigated against by also adding something to Session to identify if there is an existing lookup already happening.
    • ExchangeContext will need to add AddressResolve::NodeLookupHandle
    • The DNS lookup time to complete might be longer then the life of the ExchangeContext since that is controlled by the application.

I think it is likely apparent in my write-up I have biased towards Solution A, but wanted to post this in case anyone disagrees or has something that I overlooked. Both of these solutions allow for parallel lookup on different Sessions

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants