-
Notifications
You must be signed in to change notification settings - Fork 516
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: did-rotate #2816
feat: did-rotate #2816
Conversation
Signed-off-by: Daniel Bluhm <[email protected]>
Signed-off-by: Daniel Bluhm <[email protected]>
Signed-off-by: Daniel Bluhm <[email protected]>
Signed-off-by: Daniel Bluhm <[email protected]>
From record_keys_for_public_did. Peer DIDs are resolvable but not necessarily public. Signed-off-by: Daniel Bluhm <[email protected]>
Signed-off-by: Daniel Bluhm <[email protected]>
Signed-off-by: Daniel Bluhm <[email protected]>
Signed-off-by: Daniel Bluhm <[email protected]>
Signed-off-by: Daniel Bluhm <[email protected]>
Signed-off-by: Akiff Manji <[email protected]>
Signed-off-by: Akiff Manji <[email protected]>
Signed-off-by: Akiff Manji <[email protected]>
Signed-off-by: Akiff Manji <[email protected]>
Signed-off-by: Stephen Curran <[email protected]>
Tagging @dbluhm for review, please. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty good! Some comments below. A general comment: we need integration tests for this protocol.
Signed-off-by: Akiff Manji <[email protected]>
Signed-off-by: Akiff Manji <[email protected]>
Signed-off-by: Akiff Manji <[email protected]>
Signed-off-by: Akiff Manji <[email protected]>
Signed-off-by: Akiff Manji <[email protected]>
Signed-off-by: Akiff Manji <[email protected]>
Signed-off-by: Akiff Manji <[email protected]>
""" | ||
# TODO it would be better to include the DIDs of the connection in the | ||
# target cache key This solution only works when using whole cluster | ||
# caching or have only a single instance with local caching |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should discuss this point; if we don't make this change now, there can be unexpected behavior in clusters. Additionally, once the rotation is completed, there is no way to handle any potentially out of order messages. If we did make this change, the cached value for the previous DIDs would be around until the cache expires them. There would still be issues in clustered environments without whole cluster caching, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we can live with this for now, I think this PR is in a good state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TBH I'm not entirely familiar with how the different caching mechanisms work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'm feeling like the right option right now is probably to log an issue to address handling potentially out of order messages received after a rotation. It's an unlikely edge case and I think it will be more valuable to keep this PR moving for now.
Signed-off-by: Akiff Manji <[email protected]>
Quality Gate passedIssues Measures |
Not sure why this PR is getting so many random integration test fails. Not the same as the ones as I fixed recently. Might try an look at these other ones at some point. Re-running again. |
@jamshale Yeah the tests seem so finicky. |
They seem to get worse sometimes and are better sometimes. The last fail was the one I thought I had fixed. Hadn't seen it in a while. Not sure if adding more retries might help or not. |
Taken from #2494 (c/o @dbluhm)
This PR is an early implementation of the RFC 0794 DID Rotate protocol. This was an exercise to feel out any potential issues with the protocol as defined.
This is not yet a complete implementation. Notably missing right now is a way to actually trigger a DID rotation through an Admin API call. I also need to finish wiring up the handlers to the manager. And who could forget about tests. 🙂
Overall, the protocol is solid. I'll report my thoughts on the protocol on the PR to the Aries RFCs.
An ACA-Py specific concern I discovered in this process that probably deserves some discussion: invalidating the connection target cache when the DID Rotation is committed. As currently structured, we never really expect the connection targets for a connection to change after the protocol establishing it concludes. DID Rotation shakes that expectation up.
Right now, in my changes, I've added a clear_connection_targets_cache helper method to partially address this. It's partial because it will only work for ACA-Py when run as a single instance with the default in memory cache or if you're using Indicio's redis cache plugin for whole cluster shared caching.
I think it would be better for ACA-Py to use a value including both DIDs of the connection in the cache key as a more complete solution. The problem with this and why I've not done it on a first pass is that there would need to be some restructuring of how we use the Responder classes so that the send methods expect a whole ConnRecord and not just a connection_id (or individually a their_did and my_did, I suppose) so we can extract the DIDs from it. 9 times out of 10, I would say that connection record is already available to the caller of the send method and my hypothesis is that in the remaining 1 time out of 10, it will not dramatically impact performance of critical operations (it is not common for us to know the connection_id and not also have a connection record -- maybe only a few places in the Admin API might be like this).
I'll do some more investigation on these points. Might be good to talk implications at the next ACA-PUG call.
cc @swcurran @TelegramSam
FYI @Jsyro this has ties back to updating the DIDs associated with connections that we've talked about recently.