Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add anoncreds upgrade via api endpoint #2840

Closed
wants to merge 7 commits into from

Conversation

jamshale
Copy link
Contributor

@jamshale jamshale commented Mar 15, 2024

Have added an option to demo to upgrade in multi-tenant mode (non-multi-tenant mode currently requires restart --> see below). Also added an integration test for multi-tenant mode that does object creation and revocation stuff, upgrades, and then does it again with anoncreds.

Have added a decent amount of unit tests. Some harder to test stuff hasn't been covered.

Implementation:

See https://github.com/jamshale/aries-cloudagent-python/blob/feat/2808/docs/design/UpgradeViaApi.md for basic design diagram.

  • Created an endpoint /anoncreds/wallet/upgrade in the wallet routes to trigger the upgrade. For safety it requires adding the wallet name as a parameter.
  • Added a middleware that checks the is_upgrading record in the wallet(DB) and blocks traffic if ti exists..
  • Uses a DB record to persist an upgrade and restart upgrades in the case of an agent restart.
  • The upgrade is all or nothing with a single transaction. There is a retry mechanism.
  • For subwallets, you can currently upgrade to anoncreds even if the base wallet is askar. Might want to force them to upgrade the base wallet first. A multi-tenant base wallet only changes the storage type and profile which doesn't make much difference. The subwallet will change profile types on the fly. Doesn't require agent restart.
  • For standalone agent and mutitenant admin wallets the agent will shutdown after the upgrade, and then require the wallet-type config to be changed to askar-anoncreds. Maybe there's a more streamline way to do this but I think it could be another ticket if we want to do this.
  • For the upgrade of schema's and cred def's I wasn't able to find all the required anoncred data in the existing storage and resorted to getting it from the ledger. An example is the attrNames in schema. Not sure if this exists somewhere in storage I don't know about.

I don't think the upgrade should take very long for any agents or subwallets as the DB changes are just removing and replacing records.

I changed the txn_submit interface in anoncreds to take a BaseLedger instance. I was sometimes getting injection errors because the weak_ref was expired. Not really sure why?

@jamshale jamshale changed the title Feat/2808 Add anoncreds upgrade via api endpoint Mar 15, 2024
@jamshale jamshale force-pushed the feat/2808 branch 11 times, most recently from 6a3c56b to 01dfd04 Compare March 22, 2024 18:28
@jamshale jamshale marked this pull request as ready for review March 22, 2024 18:40
@ianco
Copy link
Contributor

ianco commented Mar 26, 2024

Overall looks good. I had one minor comment (a duplicate line of code I think) and we could use some docs (at a minimum just add the description of this PR to an .md doc somewhere)

I'm going to check out the code and do some testing before I give the PR an official approval.

@jamshale
Copy link
Contributor Author

jamshale commented Apr 8, 2024

I'm back and going to address the documentation in an additional ticket #2875

@jamshale jamshale requested review from dbluhm, ianco, amanji and esune April 8, 2024 15:23
Copy link
Member

@esune esune left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good overall, I flagged a couple items that need to be looked at.
Also wondering if it would make sense (and there is a way) to test a failed upgrade and how to recover from that?

aries_cloudagent/admin/server.py Show resolved Hide resolved
aries_cloudagent/messaging/schemas/routes.py Outdated Show resolved Hide resolved
demo/features/steps/0586-sign-transaction.py Outdated Show resolved Hide resolved
@jamshale
Copy link
Contributor Author

jamshale commented Apr 8, 2024

Code looks good overall, I flagged a couple items that need to be looked at. Also wondering if it would make sense (and there is a way) to test a failed upgrade and how to recover from that?

I'm not sure of how I can force an upgrade to fail with the integration tests. I can look into making sure it's covered by unit tests.

My idea is to suggest using the wallet-type which you can get via the api to handle both endpoints when doing an upgrade. In that case if it fails then it will continue working as before. If it succeeds the controller can remove the old endpoints in a future release. This is basically how the integration test controller is implemented now. I think it will be clear once I finish the upgrade documentation.

Will address the other comments and make sure upgrade fail is covered by some type of test.

@esune
Copy link
Member

esune commented Apr 8, 2024

Code looks good overall, I flagged a couple items that need to be looked at. Also wondering if it would make sense (and there is a way) to test a failed upgrade and how to recover from that?

I'm not sure of how I can force an upgrade to fail with the integration tests. I can look into making sure it's covered by unit tests.

My idea is to suggest using the wallet-type which you can get via the api to handle both endpoints when doing an upgrade. In that case if it fails then it will continue working as before. If it succeeds the controller can remove the old endpoints in a future release. This is basically how the integration test controller is implemented now. I think it will be clear once I finish the upgrade documentation.

Will address the other comments and make sure upgrade fail is covered by some type of test.

Depending on effort, if it doesn't too much coordinating to handle a fail it would be useful. I was wondering situations such as corrupted records that would cause trouble (is it even possible?) or the service being terminated mid-way, specifically. likely easier to deal with in unit test than integration tests, I think 😅

@jamshale jamshale force-pushed the feat/2808 branch 3 times, most recently from 3d23722 to 75beafb Compare April 10, 2024 15:44
@ianco
Copy link
Contributor

ianco commented Apr 10, 2024

  • Added a middleware that checks the singleton and blocks traffic for the particular wallet if upgrade is in progress.

Will this work in a scaled environment? I.e. should a multi-instance environment scale down to a single instance before upgrading wallets?

@ianco
Copy link
Contributor

ianco commented Apr 10, 2024

I don't think the upgrade should take very long for any agents or subwallets as the DB changes are just removing and replacing records.

... as long as we're not updating records for individual credentials (OrgBook has millions of credentials in it's wallet)

@swcurran
Copy link
Contributor

swcurran commented Apr 10, 2024

As I understand it, the plan is that the controller is responsible for scaling down, then start back up and calling this endpoint during controller initialization before proceeding. So the ACA-Py instance doesn’t have to scale down as that is impossible to manage. But the controller does.

ianco
ianco previously approved these changes Apr 10, 2024
Copy link
Contributor

@ianco ianco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a couple of comments but I think (a) the upgrade status is stored in the wallet (I wasn't sure but I think it is?) and (b) it doesn't look like there are any "high volume" records to be upgraded (not sure how many rev reg's there would be but I suspect not a huge number)

aries_cloudagent/messaging/schemas/routes.py Outdated Show resolved Hide resolved
aries_cloudagent/admin/server.py Outdated Show resolved Hide resolved
@ianco
Copy link
Contributor

ianco commented Apr 10, 2024

As I understand it, the plan is that the controller is responsible for scaling down, then start back up and calling this endpoint during controller initialization before proceeding. So the ACA-Py instance doesn’t have to scale down as that is impossible to manage. But the controller does.

We need a document :-P (I didn't notice when I approved the PR but went back and checked after @swcurran 's comment) Doesn't need to be part of this PR but we will def need one.

@jamshale
Copy link
Contributor Author

I don't think the upgrade should take very long for any agents or subwallets as the DB changes are just removing and replacing records.

... as long as we're not updating records for individual credentials (OrgBook has millions of credentials in it's wallet)

Only the schema, cred_def and revocation objects are getting updated. The revocation objects could potentially take some time. I don't have much experience with this in heavy loads. The individual credentials aren't effected.

@jamshale
Copy link
Contributor Author

As I understand it, the plan is that the controller is responsible for scaling down, then start back up and calling this endpoint during controller initialization before proceeding. So the ACA-Py instance doesn’t have to scale down as that is impossible to manage. But the controller does.

We need a document :-P (I didn't notice when I approved the PR but went back and checked after @swcurran 's comment) Doesn't need to be part of this PR but we will def need one.

Yes. I'm still working on a detailed document with the upgrade process and the migration steps via #2875. Should be ready soon.

@jamshale
Copy link
Contributor Author

jamshale commented Apr 10, 2024

@ianco I'm going to create a separate document for the design of this with some diagrams so it's clear and can be referenced in the future. It will be separate from the migration guide.

@jamshale
Copy link
Contributor Author

@ianco @swcurran I'm removing the in memory singleton part of this, and instead using the wallet(DB) value in the middleware. I hadn't considered the scaled up, multi instance use case correctly.

I've already made the changes and am testing. The down side is there will be a small performance hit I wanted to avoid.

I wanted to add a DB hook to maintain the in memory singleton mechanism between instances. But after researching it I can't see this done anywhere in the project and it looks like it would be a substantial effort to implement.

@ianco
Copy link
Contributor

ianco commented Apr 12, 2024

@ianco @swcurran I'm removing the in memory singleton part of this, and instead using the wallet(DB) value in the middleware. I hadn't considered the scaled up, multi instance use case correctly.

I've already made the changes and am testing. The down side is there will be a small performance hit I wanted to avoid.

I wanted to add a DB hook to maintain the in memory singleton mechanism between instances. But after researching it I can't see this done anywhere in the project and it looks like it would be a substantial effort to implement.

This update looks good, but I'm wondering also about the performance hit. There is a possibility of using a redis cache (https://github.com/Indicio-tech/aries-acapy-cache-redis - not sure if this one has made it to the plug=ins repo) to keep the state in memory across instances (of course in that case aca-py would have to be configured with that plug-in).

Another option is to add an aca-py flag as to whether to do the "upgrade check" or not, so we can turn it on when we know the upgrade is potential, and then turn it off afterwards.

Of course the other option is to do no checks at all and rely totally on the controller to ramp down and make no requests during the upgrade.

@swcurran your thoughts?

@ianco
Copy link
Contributor

ianco commented Apr 12, 2024

Also regarding the "upgrade check" do we need to check each time we process a received message as well? I.e. if we receive a message and the respective wallet is in the middle of an upgrade should we just leave the message in the queue?

Copy link

Quality Gate Passed Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
No data about Duplication

See analysis details on SonarCloud

@jamshale
Copy link
Contributor Author

I'm going to close this PR and open a new PR to clean it up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants