Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Redesign AMQP RequestResponseChannel Recovery #33378

Closed
anuchandy opened this issue Feb 7, 2023 · 2 comments
Closed

Redesign AMQP RequestResponseChannel Recovery #33378

anuchandy opened this issue Feb 7, 2023 · 2 comments
Assignees
Labels
amqp Label for tracking issues related to AMQP Azure.Core azure-core Event Hubs Service Bus

Comments

@anuchandy
Copy link
Member

anuchandy commented Feb 7, 2023

  1. Implement RequestResponseChannelCache by following the ReactorConnectionCache design Reference.
  2. RequestResponseChannelCache "recovery-support" terminates when 1. terminated is true (as a result of closing cache) 2. Or when the hosting connection is disposed.
  3. RequestResponseChannelClosedException seems to be the right one to emit if there is a cache access attempt post the "recovery-support" termination.
  4. Awaiting for the activation before loading to cache (note: activation await is a prerequisite for every sendWithAck)
  5. Delete our legacy AmqpChannelProcessor type (completion of 33224 and cache implementation means we no longer use it).
  6. Remove witRetry applied to request-link, response-link activation in sendWithAck, instead use timeout. Why? Suppose there is a race between sendWithAck and RequestResponseChannel disposal due to an error in one of the links. In that case, resubscribing to the endpoint of the same link object after a backoff (from withRetry) will get the same error (via replay) for all recurring retries in the retry cycle. It will delay/block the consumer/producer recovery for a time equivalent to the complete retry cycle duration.
  7. (Nice to have) sendWithAck has an entry check to see if RequestResponseChannel is disposed. We set this atomic flag in the close-async. The close-async is invoked post terminating the unconfirmed sends. Each such termination is a call to a downstream subscriber. An invocation of sendWithAck (new one or as a result of a retry from a just terminated unconfirmed send) while still iterating the unconfirmed send won't see the disposed flag as set yet. We can improve the entry check as in (isDisposed || hasError || pendingLinkTerminations <= 1).
  8. Review the threading (hopping, type of threads- parallel, Qpid-event-loop) associated with all the routes that lead to cache refresh. And compare it with how those routes were refreshing the legacy AmqpChannelProcessor's channel.
  9. Sketch any relevant digrams for library developer reference.
  10. ...[ identify and add]
@anuchandy anuchandy added Event Hubs Service Bus amqp Label for tracking issues related to AMQP labels Feb 7, 2023
@anuchandy anuchandy self-assigned this Feb 7, 2023
@anuchandy anuchandy added the Azure.Core azure-core label Feb 7, 2023
@anuchandy
Copy link
Member Author

anuchandy commented Feb 14, 2023

Outlining design being coded -

  1. RequestResponseChannelCache::Ctr to take ReactorConnection.
  2. The source or provider chain for cache resource (inside RRChannelCache) flow as -
    • Mono.defer - check "recovery-support" is not terminated, then calls connection.createSession (awaits connection to active, session to active).
    • New up RequestResponseChannel.
  3. RequestResponseChannelCache to implement dispose(), which marks terminated flag and delegates to RequestResponseChannel closeAsync.
  4. ReactorConnection::getOrCreateCBSNode to create CBS RequestResponseChannelCache and pass it to ClaimsBasedSecurityChannel
  5. ReactorConnection::getOrCreateCBSNode is synchronized; the API is in a hot path, so reduce current synchronization there via DCL. [Changes legacy design choice]
  6. When connection closes ClaimsBasedSecurityChannel (CBSC), CBSC to close owned CBS RequestResponseChannelCache.
  7. ReactorConnection owns only ClaimsBasedSecurityChannel, not its backing CBS RequestResponseChannelCache. ReactorConnection.close -> ClaimsBasedSecurityChannel.close -> CBS RequestResponseChannelCache.close [Changes legacy design choice].
  8. Today, ReactorConnection closeAsync does the following
    cbsCloseOperation = cbsChannelProcessor.flatMap(channel -> channel.closeAsync()); which may lead to an unnecessary subscription chain execution attempting to create a CBS session only to find that the connection endpoint is in an errored state. With the new design, this logic is no longer exists i.e. no such chain triggering.

@anuchandy
Copy link
Member Author

Implemented in the PR #34854

@conniey conniey closed this as completed Jan 24, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Apr 23, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
amqp Label for tracking issues related to AMQP Azure.Core azure-core Event Hubs Service Bus
Projects
Status: Done
Development

No branches or pull requests

2 participants