Upgrade to anoncreds via api endpoint #2922
Conversation
Force-pushed from 67cc007 to 1e7008f.
Got a couple of errors while testing this with the alice/faber demo (errors on the alice side). I think the response from the upgrade should include some kind of status about what was upgraded; currently the response is status 200 with an empty body.

My testing: ran a local postgres instance, then ran alice and faber:

```
$ AGENT_PORT_OVERRIDE=8010 POSTGRES=1 ./run_demo run faber --wallet-type askar --revocation --multitenant
```
```
Traceback (most recent call last):
  ...
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  ...
```

After receiving the credential offer and sending the credential request: `Credential exchange abandoned`
Got some errors testing with the alice/faber demo; see other comments.
Love the idea of a tutorial for doing this using Alice/Faber. Is that possible/reasonable/helpful? You do need to do something first (populate the storage) and then do an upgrade. Even if it does nothing, I think it would be helpful to demonstrate.
Tried another test, upgrading faber (and not alice), and everything works fine (can issue and request proof after the upgrade). Then upgraded alice - same errors as above.
Hmm. I remember seeing that error a long time ago and thought I had fixed it. Possibly an issue with postgres. I'll look into it.
It's easy to demonstrate with alice/faber when using multitenancy (since you don't have to restart the agent). Not sure how to verify that the database is upgraded, though. For single tenancy we need to provide an option in the demo to restart the agent after it's been upgraded.
I didn't test with sqlite (which is what is used by default for the demo).
... and if it's a single-tenant upgrade we can include a message that "Agent shut down, needs restart" ...
Remind me why we need a restart in single-tenant mode, but not in multi-tenant mode? And is there any way to eliminate the requirement? Maybe we ignore the start-up option and use what the storage says?
The restart is because we need to reload the context (all the anoncreds modules, the anoncreds profile). Currently, it kind of ignores the wallet-type configuration on the restart start-up: it reloads the context and anoncreds modules. However, it seems safer to force the restart for an agent which has an active context and has already injected other classes into the profile, and so on. That being said, it might be possible to reload the context when the upgrade completion is detected, the way it's done on startup now. I didn't do it because I didn't think a restart was an issue.
Haven't been able to reproduce that error yet with the demo and postgres. Also, the integration test ...
You can do this with the ...
OK, we can jump on a call if we need to ...
@ianco I'll call you on Monday. I still can't replicate it, but I'm not sure I'm using postgres correctly. I will want the morning to figure out whether it is a problem with the actual upgrade and the blinded secret. I know I had this problem before when I was re-creating the cred defs' private and key proof data during the upgrade, but thought it was fixed after I changed the upgrade to use the existing values.
My expectation of the “upgrade” process for an operating ACA-Py instance in a Kubernetes environment is:
If the upgrade has to be done by a special instance of the code that has to restart different things, it seems like it is much harder to manage. That approach also works fine for a non-Kubernetes environment.
I think it works the way you are describing. The controller doesn't have to manage anything. The ACA-Py agents just need to restart after they detect the upgrade completed, as if they had crashed. This would be handled by any type of cloud deployment, like Kubernetes.
I tested again this morning with a few different scenarios. If I create the wallet (alice) and then immediately upgrade, there is no problem and I can issue and request proofs. If I first issue and revoke some credentials (issued 4, revoked 3) and then upgrade alice, I get the above error when I try to request a proof.
Looks like there's a small change in how the value is stored. The existing record was saved as the JSON value `{"value":{"ms":"71064680940360014629618879654672290185096545699652080871503389385314050854890"}}`, but anoncreds expects it to be saved directly as a string. Hopefully it's a trivial fix. Thanks for finding this.
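To illustrate the mismatch, here is a rough sketch of the unwrapping involved (not the actual upgrade code; the record handling is simplified):

```python
import json

# Value as stored before the upgrade (from the comment above): the link secret
# is wrapped inside a JSON object.
stored_value = '{"value":{"ms":"71064680940360014629618879654672290185096545699652080871503389385314050854890"}}'

# anoncreds expects the bare secret string stored directly, so the upgrade has
# to unwrap it rather than copying the JSON value as-is.
link_secret = json.loads(stored_value)["value"]["ms"]
assert link_secret == "71064680940360014629618879654672290185096545699652080871503389385314050854890"
```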
@ianco This has been updated. There were two issues when upgrading a multitenant holder.
Expanded the integration test to cover upgrading the holder subwallet and re-issuing a credential. |
Also added a small response payload to the upgrade endpoint. |
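For anyone following along, calling the upgrade endpoint from a controller looks roughly like the sketch below. The admin URL, endpoint path, query parameter, and response shape are illustrative assumptions; check docs/design/UpgradeViaApi.md and the API spec in this PR for the real details.

```python
import requests

ADMIN_URL = "http://localhost:8031"  # hypothetical admin endpoint for a demo agent

# Path and query parameter are assumptions for illustration only.
resp = requests.post(
    f"{ADMIN_URL}/anoncreds/wallet/upgrade",
    params={"wallet_name": "faber.agent"},
)
resp.raise_for_status()
print(resp.json())  # expect a small status payload rather than an empty 200 body
```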
Looks good!
Some success! I was able to upgrade both the alice and faber wallets according to my test scenario above (start alice and faber with the askar wallet and in multitenant mode with revocation enabled, issue and revoke a bunch of credentials, then upgrade the alice and then the faber wallets). However, all proof requests from faber to alice return false. I thought it might be related to the other reported bug (#2934), so I deleted all alice credentials from her wallet using the api, issued a new credential (not revoked) and then tried another proof request, and it still returned false. With the askar wallet the proof would still pass as long as there was at least one valid credential, regardless of how many were revoked. So I suspect the issue is in either the anoncreds-rs code or in the ACA-Py anoncreds-related code.
Yes, I think this is the same issue I reported here. I hadn't tried the other scenario you described. I thought it was some logic for getting the most valid credential, but it looks like that might not be the case. I don't think it's related to the upgrade at all, as I can replicate it with fresh anoncreds agents. I will test some other cases as well, but when I hadn't revoked any credentials, the proof verified correctly after the upgrade. It was only when any credentials had been revoked and the wallet was anoncreds that the proof always returned false.
If it's an existing issue we should probably merge this PR and then deal with it as a separate issue.
LGTM. There is an issue with validating proofs in anoncreds but that's independent of these changes.
```python
async def get_subwallet_profiles_from_storage(root_profile: Profile) -> list[Profile]:
    """Get subwallet profiles from storage."""
    subwallet_profiles = []
    base_storage_search = root_profile.inject(BaseStorageSearch)
    search_session = base_storage_search.search_records(
        type_filter=WalletRecord.RECORD_TYPE, page_size=10
    )
    while search_session._done is False:
        wallet_storage_records = await search_session.fetch()
        for wallet_storage_record in wallet_storage_records:
            wallet_record = WalletRecord.from_storage(
                wallet_storage_record.id,
                json.loads(wallet_storage_record.value),
            )
            subwallet_profiles.append(
                await MultitenantManager(root_profile).get_wallet_profile(
                    base_context=root_profile.context,
                    wallet_record=wallet_record,
                )
            )
    return subwallet_profiles
```
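For context, this helper is invoked from the conductor at start-up (see the stack trace in the next comment), roughly as in this sketch:

```python
# Sketch of the call site in the conductor's start-up upgrade check.
subwallet_profiles = await get_subwallet_profiles_from_storage(root_profile)
for profile in subwallet_profiles:
    ...  # check each subwallet for an anoncreds upgrade in progress
```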
Hi @jamshale - we're experiencing some issues with the `check_for_wallet_upgrades_in_progress` method. This method gets called upon startup, and if we have existing wallets in the DB, then we get the following stack trace:
```
Traceback (most recent call last):
  File "/home/aries/.local/lib/python3.9/site-packages/aries_cloudagent/askar/store.py", line 154, in open_store
    store = await Store.open(
  File "/home/aries/.local/lib/python3.9/site-packages/aries_askar/store.py", line 352, in open
    return Store(await bindings.store_open(uri, key_method, pass_key, profile), uri)
  File "/home/aries/.local/lib/python3.9/site-packages/aries_askar/bindings/__init__.py", line 84, in store_open
    return await invoke_async(
  File "/home/aries/.local/lib/python3.9/site-packages/aries_askar/bindings/lib.py", line 362, in invoke_async
    return await self.loaded.invoke_async(
  File "/usr/local/lib/python3.9/asyncio/futures.py", line 284, in __await__
    yield self  # This tells Task to wait for completion.
  File "/usr/local/lib/python3.9/asyncio/tasks.py", line 328, in __wakeup
    future.result()
  File "/usr/local/lib/python3.9/asyncio/futures.py", line 201, in result
    raise self._exception
aries_askar.error.AskarError: Error fetching store configuration
Caused by: error returned from database: Request returned an error: database "472436df-6126-4083-85fb-1eacc0ecfcf5" does not exist.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/aries/.local/lib/python3.9/site-packages/aries_cloudagent/commands/start.py", line 72, in init
    await startup
  File "/home/aries/.local/lib/python3.9/site-packages/aries_cloudagent/commands/start.py", line 29, in start_app
    await conductor.start()
  File "/home/aries/.local/lib/python3.9/site-packages/aries_cloudagent/core/conductor.py", line 522, in start
    await self.check_for_wallet_upgrades_in_progress()
  File "/home/aries/.local/lib/python3.9/site-packages/aries_cloudagent/core/conductor.py", line 846, in check_for_wallet_upgrades_in_progress
    subwallet_profiles = await get_subwallet_profiles_from_storage(
  File "/home/aries/.local/lib/python3.9/site-packages/aries_cloudagent/utils/profiles.py", line 54, in get_subwallet_profiles_from_storage
    await MultitenantManager(root_profile).get_wallet_profile(
  File "/home/aries/.local/lib/python3.9/site-packages/aries_cloudagent/multitenant/manager.py", line 85, in get_wallet_profile
    profile, _ = await wallet_config(context, provision=provision)
  File "/home/aries/.local/lib/python3.9/site-packages/aries_cloudagent/config/wallet.py", line 54, in wallet_config
    profile = await mgr.open(context, profile_cfg)
  File "/home/aries/.local/lib/python3.9/site-packages/aries_cloudagent/askar/profile.py", line 300, in open
    opened = await store_config.open_store(provision=False)
  File "/home/aries/.local/lib/python3.9/site-packages/aries_cloudagent/askar/store.py", line 168, in open_store
    raise ProfileError("Error opening store") from err
aries_cloudagent.core.error.ProfileError: Error opening store
```
I don't quite understand what could be going wrong, because that error:

`Request returned an error: database "472436df-6126-4083-85fb-1eacc0ecfcf5" does not exist.`

is referring to a wallet_id that does exist in the DB. Maybe it's just trying to read it incorrectly?
We're using the askar wallet_type and postgres_storage. (Edit: we're testing with the most recent nightly build.)
It starts up successfully from a clean start, with no existing wallets, but then restarting after creating wallets leads to the above error.
Does any of this look familiar to you, or do you have any idea what might be going wrong? Any help will be appreciated!
I've never seen this before. This method is simply trying to get all subwallets from the DB (wallet). I think maybe the store needs to be provisioned. The `MultitenantManager(root_profile).get_wallet_profile()` method call defaults the provision argument to False. It seems like one or more of the subwallets is in a different state than expected. Or maybe I can skip getting the subwallet if it's in this state.

Thanks for finding this with your testing. I will try to replicate and fix it.
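One possible shape for the "skip it" option mentioned above, purely as a sketch (it assumes catching the ProfileError seen in the stack trace; this is not the committed fix):

```python
import logging

from aries_cloudagent.core.error import ProfileError

LOGGER = logging.getLogger(__name__)

# Inside the loop over wallet records in get_subwallet_profiles_from_storage:
try:
    subwallet_profiles.append(
        await MultitenantManager(root_profile).get_wallet_profile(
            base_context=root_profile.context,
            wallet_record=wallet_record,
        )
    )
except ProfileError:
    # The subwallet's store could not be opened (e.g. its database no longer
    # exists); skip it rather than failing the whole start-up check.
    LOGGER.warning("Skipping subwallet %s: store could not be opened", wallet_record.wallet_id)
```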
Awesome, thanks for the feedback!

One thing that may make our situation unique is the use of Amazon RDS Proxy for the DB. I'm personally not that informed on our infra config, but our SRE mentioned that there's a quirk with bootstrapping the DB: we modify the `ACAPY_WALLET_STORAGE_CONFIG` after initialisation to point to an RDS Proxy URL. I'll need to gather more details to correctly relay the specifics... but that's one factor that may make our setup different.

It sounds like provision=True may be necessary in this case? If you think that might be it, I could try modifying that on a fork and test on our end. But only by tomorrow 👌
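To make that concrete, the kind of change being described would look something like the sketch below; the host name and settings are placeholders, not the actual configuration.

```python
import json
import os

# Hypothetical storage config pointing at an RDS Proxy endpoint after the
# initial provisioning was done against the database directly.
storage_config = {
    "url": "my-db.proxy-xxxxxxxxxxxx.us-east-1.rds.amazonaws.com:5432",
    "max_connections": 5,
}
os.environ["ACAPY_WALLET_STORAGE_CONFIG"] = json.dumps(storage_config)
```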
This was originally opened as #2840, but this PR addresses some comments and adds improvements, mostly around handling multi-instance ACA-Py scenarios.
The upgrading middleware will now use a cache during upgrades and after the upgrade has been completed. Agents that have the upgrade middleware but never upgrade will still take a very minor performance hit, due to needing to query the `is_upgrading` record.
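As a rough illustration of that behaviour, here is a sketch under assumed names (not the PR's actual middleware; `wallet_is_upgrading` stands in for the `is_upgrading` record lookup):

```python
from aiohttp import web

# Cache is only populated once an upgrade is observed, so agents that never
# upgrade still query storage on each request (the minor performance hit).
_upgrade_cache: dict[str, bool] = {}


@web.middleware
async def upgrade_middleware(request: web.Request, handler):
    """Reject admin traffic for a wallet while its upgrade is in progress."""
    wallet = request.headers.get("x-wallet-name", "base")  # assumption: how the wallet is identified
    if not _upgrade_cache.get(wallet):
        if await wallet_is_upgrading(request, wallet):  # hypothetical record lookup
            _upgrade_cache[wallet] = True
    if _upgrade_cache.get(wallet):
        raise web.HTTPServiceUnavailable(reason="Wallet upgrade in progress")
    return await handler(request)
```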
There is a sequence diagram at docs/design/UpgradeViaApi.md describing the design and flow.

In multi-instance scenarios, the instances which did not receive the upgrade request will now detect that an upgrade has been initiated from another instance and begin a polling process to check for when the upgrade completes.
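That polling behaviour amounts to something like the following sketch (`upgrade_is_complete` is a hypothetical helper standing in for the storage check; the interval is arbitrary):

```python
import asyncio


async def poll_for_upgrade_completion(profile, interval: float = 1.0) -> None:
    """Wait out an upgrade that another instance started.

    The instance keeps checking shared storage until the upgrade record
    indicates completion, then it can shut down or reload its context.
    """
    while not await upgrade_is_complete(profile):  # hypothetical helper
        await asyncio.sleep(interval)
```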
In stand-alone agents (single tenant) or multitenant admin agents, any instance that detects the upgrade has completed will still shut down and be required to start up again. This could possibly still be avoided, but I think it's best to reload the context from scratch after the upgrade. However, instead of failing after the post-upgrade restart because the `storage-type` is `askar-anoncreds` while the `wallet-type` config is still `askar`, the agent will now log a warning to change the configuration, reload the context as anoncreds, and successfully launch. I think this is important to prevent accidents where the configuration isn't updated immediately after upgrading.

Both these improvements mean the controller no longer has to be concerned about scaling down the ACA-Py instances and coordinating the `wallet-type` configuration change. The controller operator simply scales down the controller instances, upgrades the controller to handle the anoncreds endpoints, and then hits the upgrade endpoints. As long as the ACA-Py agent restarts itself after shutting down, the rest will happen automatically.

I looked into doing a block on the webhook queue, but I don't think it's necessary. The incoming webhooks will be blocked by the upgrade middleware, and outgoing messages shouldn't cause issues with agents. My understanding of the webhooks, and everything they might do during an upgrade, is limited though, so I might be off on that.
See also #2881 for the migration guide.