Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable manually triggering keylist updates during connection #1851

Merged

Conversation

dbluhm
Copy link
Contributor

@dbluhm dbluhm commented Jul 1, 2022

This PR is a significant refactor of routing for both Mediation and Multi-tenancy, impacting Connections, Out-of-band, and DID Exchange managers, with the goal of eliminating a race condition possible when using mediation. Additionally, this PR also simplifies routing setup significantly for both mediation and multi-tenancy.

Mediation Race Condition during Connection Formation

During connection formation, the current implementation of mediation support triggers a keylist update message to be sent just prior to sending the connection request or response (or a DID Exchange request or response) without waiting for confirmation from the mediator that the keylist update was received and processed successfully. This can result in a scenario where the connection request or response is received and responded to before the mediator has processed the keylist update. Typically, the keylist update process is quick and, especially when using the default HTTP transport mechanisms, the chances of things getting out of order are very slim. However, when introducing outbound queuing or another transport, this race condition has proven to be frequent.

Mediation Client as Invitee

image

Mediation Client as Inviter

image

Solution

To prevent this race condition, this PR adds the /mediation/update-keylist/{conn-id} Admin endpoint that is called after receive-invitation or receive-request and before accept-invitation or accept-request. This endpoint creates connection keys and sends those keys in a keylist update to the specified mediator. When combined with the changes in #1769, the controller can then wait for the keylist update response received webhook, match the thread_id to the id of the keylist update returned from /mediation/update-keylist/{conn-id}, and then proceed with the next step in the connection.

image

image

This should look familiar; this is more or less how the old establish-inbound endpoint worked. However, a key difference and the next major addition of this PR is the new RouteManager abstract base class. Two concretions are included in this PR: one for standard Coordinate Mediation v1 route management and one for Multi-tenancy + Mediation, which handles base wallet route creation and routing through a base wallet mediator and/or a sub wallet mediator.

Using the RouteManager to abstract away routing details, the Connection, OOB, and DIDExchange managers have each been significantly simplified (IMHO 🙂).

Some additional notes:

  • This is a backwards compatible change. If /mediation/update-keylist/{conn-id} is not used, forming connections through a mediator should behave exactly as before.
  • This PR also corrects a slight misbehavior where both the base wallet's and sub wallet's mediator were informed of new connection keys being formed, even though the base wallet mediator was the only mediator in that chain that would have seen those keys.

dbluhm added 28 commits May 25, 2022 16:21
Signed-off-by: Daniel Bluhm <[email protected]>
Signed-off-by: Daniel Bluhm <[email protected]>
Signed-off-by: Daniel Bluhm <[email protected]>
Signed-off-by: Daniel Bluhm <[email protected]>
@swcurran
Copy link
Contributor

swcurran commented Jul 1, 2022

Best PR ever submitted -- great job! @andrewwhitehead @shaangill025 @TimoGlastra and others, please review.

I'll try to contribute by testing this PR using the existing AATH tests -- ACA-Py to ACA-Py.

@dbluhm -- we frequently see a problem with the first credential offer after a connection is established. Could this race condition be a cause for that problem?

Thanks!!

@dbluhm
Copy link
Contributor Author

dbluhm commented Jul 1, 2022

@swcurran if the connection made it all the way to active/completed state, I would not expect this particular race condition to be the cause of a lost credential offer right after connection establishment 🤔

The way this one plays out, it's usually exhibited by the connection never fully advancing to active/completed

in update-keylist for connection

Signed-off-by: Daniel Bluhm <[email protected]>
@dbluhm
Copy link
Contributor Author

dbluhm commented Jul 6, 2022

@TimoGlastra @andrewwhitehead I'm interested to hear your thoughts on these changes: sicpa-dlab/aries-cloudagent-python@feature/manual-keylist-update...dbluhm:aries-cloudagent-python:feature/route-manager-provider

On that branch, I implemented Timo's suggestion to simply obtain the correct instance of the RouteManager from the injection context. I completely agree that it makes for a cleaner solution. The one thing I've done that I'm curious to hear opinions about is I've added the current profile to the injection context as well. This makes calling the route manager cleaner by not having to pass the profile to every method that needs it; however, to this point, that's generally exactly how we've handled things.

Curious to hear your thoughts. If we like it, I'm happy to merge into this PR. If not, I'll adjust the route manager methods to accept a profile.

@dbluhm
Copy link
Contributor Author

dbluhm commented Jul 6, 2022

As an aside, I intend to follow up this PR with a refactor on the locations of a lot of these components. I expect that to be tedious and verbose so I've not done any of that with this PR to make these additions clearer.

That refactor may include:

  • Moving route manager base (and provider) to aries_cloudagent/routing or similar
  • Moving mediation record and route record to aries_cloudagent/routing or similar
  • Split default mediator storage and retrieval to a separate component from the coordinate medation v1 manager
  • Move associated tests

These changes would make it so the route manager is not strictly dependent on the coordinate mediation v1 protocol or the vestiges of the old routing v1 protocol. It also moves records used by multiple components to a common location, which better reflects their common usage, IMO.

@dbluhm
Copy link
Contributor Author

dbluhm commented Jul 6, 2022

The one thing I've done that I'm curious to hear opinions about is I've added the current profile to the injection context as well. This makes calling the route manager cleaner by not having to pass the profile to every method that needs it; however, to this point, that's generally exactly how we've handled things.

I've just discovered the profile level bind_providers call; I'll see if I can bind the RouteManagerProvider there instead of adding the profile to the injection context 🙂

@dbluhm
Copy link
Contributor Author

dbluhm commented Jul 6, 2022

I opted to just accept profiles on the methods. I obviously considered a number of alternatives but I think this one requires the least change and has the fewest implications for everything outside of these changes.

@dbluhm dbluhm requested a review from TimoGlastra July 6, 2022 19:37
@codecov-commenter
Copy link

codecov-commenter commented Jul 6, 2022

Codecov Report

Merging #1851 (9b870be) into main (cb584c0) will decrease coverage by 0.01%.
The diff coverage is 99.60%.

@@            Coverage Diff             @@
##             main    #1851      +/-   ##
==========================================
- Coverage   93.80%   93.79%   -0.02%     
==========================================
  Files         536      538       +2     
  Lines       33966    34015      +49     
==========================================
+ Hits        31863    31904      +41     
- Misses       2103     2111       +8     

@dbluhm dbluhm force-pushed the feature/manual-keylist-update branch from 67c3cd0 to d1ea9dc Compare July 6, 2022 19:56
@dbluhm dbluhm force-pushed the feature/manual-keylist-update branch from d1ea9dc to 822fdfb Compare July 6, 2022 20:13
@dbluhm dbluhm requested a review from andrewwhitehead July 20, 2022 14:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants