Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade to anoncreds via api endpoint #2922

Merged
merged 6 commits into from
May 8, 2024

Conversation

jamshale
Copy link
Contributor

@jamshale jamshale commented Apr 29, 2024

This was originally opened as #2840 but this addresses some comments and adds improvements, mostly with handling multi instance aca-py scenarios.

The upgrading middleware will now use a cache during upgrades and after the upgrade has been completed. Agents that have the upgrade middleware but never upgrade will still have a very minor performance hit, due to needing to query the is_upgrading record.

There is a sequence diagram at docs/design/UpgradeViaApi.md for describing the design and flow.

In multi instance scenarios the instances which did not receive the upgrade request will now detect a upgrade has been initiated from another instance and begin a polling process to check for when the upgrade completes.

In stand alone agents (single tenant) or mutlitenant admin agents any instance that detects the upgrade has completed will still shutdown and be required to start-up again. This could possibly still be avoided, but I think it's best to reload the context from scratch after the upgrade. However, instead of failing after the upgrade restart, due to the storage-type being askar-anoncreds and the wallet-type config being askar. It will instead get a warning to change the configuration, reload the context as anoncreds and successfully launch. I think this is important to prevent accidents where the configuration isn't getting updated immediately after upgrading.

Both these improvements mean the controller no longer has to be concerned about scaling down the aca-py instances and co-coordinating the wallet-type configuration change. The controller operator simply scales down the instances of controller, upgrades itself to handle the anoncreds endpoints and then hits the upgrade endpoints. As long as the aca-py agent restarts itself after shutting down, the rest will happen automatically.

I looked into doing a block on the webhook queue, but I don't think it's necessary. The incoming webhooks will be blocked by the upgrade middleware and outgoing messages shouldn't cause issues with agents. My understanding of the webhooks, and everything they might do during an upgrade, is limited though, so I might be off on that.

See also #2881 for the migration guide.

@jamshale jamshale force-pushed the feat/2808 branch 5 times, most recently from 67cc007 to 1e7008f Compare April 30, 2024 14:52
@jamshale jamshale marked this pull request as ready for review April 30, 2024 14:52
@jamshale jamshale changed the title Upgrade to anoncreds via api enpoint Upgrade to anoncreds via api endpoint Apr 30, 2024
@ianco
Copy link
Contributor

ianco commented May 3, 2024

Got a couple of errors while testing this with the alice/faber demo (errors on the alice side).

I think the response from the upgrade should include some kind of status re what was upgraded, currently the response is status 200 with an empty {}. For example we could return {"success":true,"rec_count":3} or whatever ...

My testing:

Ran local postgres instance:

$ docker run --name some-postgres -e POSTGRES_PASSWORD=mysecretpassword -d -p 5432:5432 postgres -c 'log_statement=all' -c 'logging_collector=on' -c 'log_destination=stderr'

Ran alice and faber:

``
$ AGENT_PORT_OVERRIDE=8010 POSTGRES=1 ./run_demo run faber --wallet-type askar --revocation
$ POSTGRES=1 ./run_demo run alice --wallet-type askar


Connect, issue and revoke several credentials.

Upgrade each wallet using the endpoint.

- Response is status 200 with body `{}`

Agents were shut down and the demo doesn't have an option to restart.

So ...  try again with multitenancy.

Restart postgres and then start alice/faber with multitenancy:

$ AGENT_PORT_OVERRIDE=8010 POSTGRES=1 ./run_demo run faber --wallet-type askar --revocation --multitenant
$ POSTGRES=1 ./run_demo run alice --wallet-type askar --multitenant


Connect, issue and revoke several credentials (issue 4 and revoke 3).

Upgrade the alice wallet using the endpoint.

- Response is status 200 with body `{}`

Try a proof request, got an error in alice:

Traceback (most recent call last):
File "/home/aries/demo/runners/support/agent.py", line 1085, in admin_request
resp.raise_for_status()
File "/home/aries/.local/lib/python3.9/site-packages/aiohttp/client_reqrep.py", line 1070, in raise_for_status
raise ClientResponseError(
aiohttp.client_exceptions.ClientResponseError: 400, message='Error creating presentation. Error converting link secret: Internal OpenSSL error: OpenSSL error.', url=URL('http://host.docker.internal:8031/present-proof-2.0/records/33bd849c-cfc5-4363-a04c-2b61390deed3/send-presentation')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/aries/demo/runners/agent_container.py", line 579, in handle_present_proof_v2_0
await self.admin_POST(
File "/home/aries/demo/runners/support/agent.py", line 1185, in admin_POST
Alice | 2024-05-03 17:51:23,684 aries_cloudagent.admin.server ERROR Handler error with exception: Error creating presentation. Error converting link secret: Internal OpenSSL error: OpenSSL error.
Alice | Traceback (most recent call last):
Alice | File "/home/aries/aries_cloudagent/anoncreds/holder.py", line 545, in create_presentation
Alice | presentation = await asyncio.get_event_loop().run_in_executor(
Alice | File "/usr/local/lib/python3.9/asyncio/futures.py", line 284, in await
Alice | yield self # This tells Task to wait for completion.
Alice | File "/usr/local/lib/python3.9/asyncio/tasks.py", line 328, in __wakeup
Alice | future.result()
Alice | File "/usr/local/lib/python3.9/asyncio/futures.py", line 201, in result
Alice | raise self._exception
Alice | File "/usr/local/lib/python3.9/concurrent/futures/thread.py", line 58, in run
Alice | result = self.fn(*self.args, **self.kwargs)
Alice | File "/home/aries/.local/lib/python3.9/site-packages/anoncreds/types.py", line 616, in create
Alice | bindings.create_presentation(
Alice | File "/home/aries/.local/lib/python3.9/site-packages/anoncreds/bindings.py", line 763, in create_presentation
Alice | do_call(
Alice | File "/home/aries/.local/lib/python3.9/site-packages/anoncreds/bindings.py", line 477, in do_call
Alice | raise get_current_error(True)
Alice | anoncreds.error.AnoncredsError: Error converting link secret: Internal OpenSSL error: OpenSSL error
Alice |
Alice | The above exception was the direct cause of the following exception:
Alice |
Alice | Traceback (most recent call last):
Alice | File "/home/aries/aries_cloudagent/protocols/present_proof/v2_0/routes.py", line 1204, in present_proof_send_presentation
Alice | pres_ex_record, pres_message = await pres_manager.create_pres(
Alice | File "/home/aries/aries_cloudagent/protocols/present_proof/v2_0/manager.py", line 275, in create_pres
Alice | pres_tuple = await pres_exch_format.handler(self._profile).create_pres(
Alice | File "/home/aries/aries_cloudagent/protocols/present_proof/v2_0/formats/indy/handler.py", line 162, in create_pres
Alice | return await self.anoncreds_handler.create_pres(
Alice | File "/home/aries/aries_cloudagent/protocols/present_proof/v2_0/formats/anoncreds/handler.py", line 164, in create_pres
Alice | indy_proof = await indy_handler.return_presentation(
Alice | File "/home/aries/aries_cloudagent/protocols/present_proof/anoncreds/pres_exch_handler.py", line 258, in return_presentation
Alice | indy_proof_json = await self.holder.create_presentation(
Alice | File "/home/aries/aries_cloudagent/anoncreds/holder.py", line 562, in create_presentation
Alice | raise AnonCredsHolderError("Error creating presentation") from err
Alice | aries_cloudagent.anoncreds.holder.AnonCredsHolderError: Error creating presentation
Alice |
Alice | The above exception was the direct cause of the following exception:
Alice |
Alice | Traceback (most recent call last):
Alice | File "/home/aries/aries_cloudagent/admin/server.py", line 188, in ready_middleware
Alice | return await handler(request)
Alice | File "/home/aries/aries_cloudagent/admin/server.py", line 259, in debug_middleware
Alice | return await handler(request)
Alice | File "/home/aries/aries_cloudagent/admin/server.py", line 430, in check_multitenant_authorization
Alice | return await handler(request)
Alice | File "/home/aries/aries_cloudagent/admin/server.py", line 495, in setup_context
Alice | return await task
Alice | File "/usr/local/lib/python3.9/asyncio/futures.py", line 284, in await
Alice | yield self # This tells Task to wait for completion.
Alice | File "/usr/local/lib/python3.9/asyncio/tasks.py", line 328, in __wakeup
Alice | future.result()
Alice | File "/usr/local/lib/python3.9/asyncio/futures.py", line 201, in result
Alice | raise self._exception
Alice | File "/usr/local/lib/python3.9/asyncio/tasks.py", line 256, in __step
Alice | result = coro.send(None)
Alice | File "/home/aries/aries_cloudagent/admin/server.py", line 222, in upgrade_middleware
Alice | return await handler(request)
Alice | File "/home/aries/.local/lib/python3.9/site-packages/aiohttp_apispec/middlewares.py", line 45, in validation_middleware
Alice | return await handler(request)
Alice | File "/home/aries/aries_cloudagent/protocols/present_proof/v2_0/routes.py", line 1222, in present_proof_send_presentation
Alice | await report_problem(
Alice | File "/home/aries/aries_cloudagent/protocols/present_proof/v2_0/init.py", line 61, in report_problem
Alice | raise http_error_class(reason=err.roll_up) from err
Alice | aiohttp.web_exceptions.HTTPBadRequest: Error creating presentation. Error converting link secret: Internal OpenSSL error: OpenSSL error.
Alice |


Upgrade the faber wallet using the endpoint.

- Response is status 200 with body `{}`

Try issuing a credential, results in an error on Alice:

15 After receiving credential offer, send credential request
Alice | Credential: state = request-sent, cred_ex_id = 520199d3-e87f-4c39-a114-1440a8e72c3e
Alice | Credential: state = credential-received, cred_ex_id = 520199d3-e87f-4c39-a114-1440a8e72c3e
Alice | 2024-05-03 17:54:55,035 aries_cloudagent.protocols.issue_credential.v2_0.formats.anoncreds.handler ERROR Error storing credential: None - Error processing received credential
Alice | 2024-05-03 17:54:55,035 aries_cloudagent.messaging.base_handler ERROR Error storing issued credential
Alice | Traceback (most recent call last):
Alice | Credential: state = abandoned, cred_ex_id = 520199d3-e87f-4c39-a114-1440a8e72c3e

Credential exchange abandoned
Alice | Problem report message: Error processing received credential. Input error [missing field link_secret_blinding_data at line 1 column 839].
Alice | File "/home/aries/aries_cloudagent/anoncreds/holder.py", line 197, in store_credential
Alice | cred_recvd = await asyncio.get_event_loop().run_in_executor(
Alice | Credential: state = done, cred_ex_id = 520199d3-e87f-4c39-a114-1440a8e72c3e
Alice | File "/usr/local/lib/python3.9/asyncio/futures.py", line 284, in await
Alice | yield self # This tells Task to wait for completion.
Alice | File "/usr/local/lib/python3.9/asyncio/tasks.py", line 328, in __wakeup
Alice | future.result()
Alice | File "/usr/local/lib/python3.9/asyncio/futures.py", line 201, in result
Alice | raise self._exception
Alice | File "/usr/local/lib/python3.9/concurrent/futures/thread.py", line 58, in run
Alice | result = self.fn(*self.args, **self.kwargs)
Alice | File "/home/aries/.local/lib/python3.9/site-packages/anoncreds/types.py", line 286, in process
Alice | cred_req_metadata = CredentialRequestMetadata.load(cred_req_metadata)
Alice | File "/home/aries/.local/lib/python3.9/site-packages/anoncreds/types.py", line 146, in load
Alice | bindings._object_from_json(
Alice | File "/home/aries/.local/lib/python3.9/site-packages/anoncreds/bindings.py", line 568, in _object_from_json
Alice | do_call(method, encode_bytes(value), byref(result))
Alice | File "/home/aries/.local/lib/python3.9/site-packages/anoncreds/bindings.py", line 477, in do_call
Alice | raise get_current_error(True)
Alice | anoncreds.error.AnoncredsError: Input error [missing field link_secret_blinding_data at line 1 column 839]
Alice |
Alice | The above exception was the direct cause of the following exception:
Alice |
Alice | Traceback (most recent call last):
Alice | File "/home/aries/aries_cloudagent/protocols/issue_credential/v2_0/handlers/cred_issue_handler.py", line 73, in handle
Alice | cred_ex_record = await cred_manager.store_credential(cred_ex_record)
Alice | File "/home/aries/aries_cloudagent/protocols/issue_credential/v2_0/manager.py", line 625, in store_credential
Alice | await cred_format.handler(self.profile).store_credential(
Alice | File "/home/aries/aries_cloudagent/protocols/issue_credential/v2_0/formats/indy/handler.py", line 500, in store_credential
Alice | return await self.anoncreds_handler.store_credential(
Alice | File "/home/aries/aries_cloudagent/protocols/issue_credential/v2_0/formats/anoncreds/handler.py", line 441, in store_credential
Alice | raise e
Alice | File "/home/aries/aries_cloudagent/protocols/issue_credential/v2_0/formats/anoncreds/handler.py", line 421, in store_credential
Alice | cred_id_stored = await holder.store_credential(
Alice | File "/home/aries/aries_cloudagent/anoncreds/holder.py", line 206, in store_credential
Alice | raise AnonCredsHolderError("Error processing received credential") from err
Alice | aries_cloudagent.anoncreds.holder.AnonCredsHolderError: Error processing received credential

Copy link
Contributor

@ianco ianco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got some errors testing using the alice/faber demo, see other comments.

@swcurran
Copy link
Contributor

swcurran commented May 3, 2024

Love the idea of a tutorial for doing this using Alice/Faber. Is that possible/reasonable/helpful? You do need to do something first (populate the storage) and then do an upgrade. Even if it does nothing, I think it would be helpful to demonstrate?

@ianco
Copy link
Contributor

ianco commented May 3, 2024

Tried another test, upgrading faber (and not alice) and everything works fine (can issue and request proof after the upgrade).

Then upgraded alice - same errors as above.

@jamshale
Copy link
Contributor Author

jamshale commented May 3, 2024

Hmm. I remember seeing that error a long time ago and thought I had fixed it. Possibly an issue with postgres. I'll look into it.

@ianco
Copy link
Contributor

ianco commented May 3, 2024

Love the idea of a tutorial for doing this using Alice/Faber. Is that possible/reasonable/helpful? You do need to do something first (populate the storage) and then do an upgrade. Even if it does nothing, I think it would be helpful to demonstrate?

It's easy to demonstrate with with alice/faber when using multitenancy (since you don't have to restart the agent). Not sure how to verify that the database is upgraded, probably the GET /multitenancy/wallet/{} endpoint?

For single tenancy we need to provide an option in the demo to restart the agent after it's been upgraded.

@ianco
Copy link
Contributor

ianco commented May 3, 2024

Hmm. I remember seeing that error a long time ago and thought I had fixed it. Possibly an issue with postgres. I'll look into it.

I didn't test with sqlite (which is what is used by default for the demo)

@ianco
Copy link
Contributor

ianco commented May 3, 2024

I think the response from the upgrade should include some kind of status re what was upgraded, currently the response is status 200 with an empty {}. For example we could return {"success":true,"rec_count":3} or whatever ...

... and if it's a single-tenant upgrade we can include a message that "Agent shut down need to restart" ...

@swcurran
Copy link
Contributor

swcurran commented May 3, 2024

Remind me why do we need a restart in single-tenant mode, but not in multi-tenant mode?

And is there any way to eliminate the requirement?

Maybe we ignore the start up option and use what the storage says?

@jamshale
Copy link
Contributor Author

jamshale commented May 3, 2024

Remind me why do we need a restart in single-tenant mode, but not in multi-tenant mode?

And is there any way to eliminate the requirement?

Maybe we ignore the start up option and use what the storage says?

The restart is because we need to reload the context (all the anoncreds modules, the anoncreds profile).

Currently, it is kind of ignoring the wallet-type configuration on the restart start-up. It's reloading the context and anoncreds modules. However, it seems safer to force the restart for an agent which has an active context. Has already injected other classes into the profile and so on.

That being said, it might be possible to reload the context like it's doing on startup now when the upgrade completion is detected. I didn't do it, because I didn't think a restart was an issue.

@jamshale
Copy link
Contributor Author

jamshale commented May 3, 2024

Haven't been able to reproduce that error yet with the demo and postgres. Also, the integration test ACA-Py Anoncreds Upgrade was supposed to be testing this. Will try and reproduce it a bit more but might need @ianco to show me.

@jamshale
Copy link
Contributor Author

jamshale commented May 3, 2024

Love the idea of a tutorial for doing this using Alice/Faber. Is that possible/reasonable/helpful? You do need to do something first (populate the storage) and then do an upgrade. Even if it does nothing, I think it would be helpful to demonstrate?

It's easy to demonstrate with with alice/faber when using multitenancy (since you don't have to restart the agent). Not sure how to verify that the database is upgraded, probably the GET /multitenancy/wallet/{} endpoint?

You can do this with the /settings endpoint as well now. The wallet.type will be askar-anoncreds after the upgrade has completed.

@ianco
Copy link
Contributor

ianco commented May 3, 2024

Haven't been able to reproduce that error yet with the demo and postgres. Also, the integration test ACA-Py Anoncreds Upgrade was supposed to be testing this. Will try and reproduce it a bit more but might need @ianco to show me.

OK we can jump on a call if we need to ...

@jamshale
Copy link
Contributor Author

jamshale commented May 3, 2024

@ianco I'll call you on Monday. I can't replicate it still, but not sure I'm using postgres correctly. I will want the morning to figure it out if it is a problem with actual upgrade and the blinded secret.

I know I had this problem before when I was re-creating the cred defs private and key proof data during the upgrade, but thought it was fixed after I changed the upgrade to use the existing values.

@swcurran
Copy link
Contributor

swcurran commented May 3, 2024

That being said, it might be possible to reload the context like it's doing on startup now when the upgrade completion is detected. I didn't do it, because I didn't think a restart was an issue.

My expectation of the “upgrade” process for an operating ACA-Py instance in a Kubernetes environment is:

  • Scale down all current Controller instances using the old CredX implementation.
  • Scale up new Controller instances with updated code to use AnonCreds-RS implementation.
    • The first thing the updated Controller code does is invoke the upgrade endpoint.
    • In practice, of course, only the first instance to start does the actual upgrade. The rest check (possibly waiting…) and discover that the upgrade is complete and carry on without doing anything.
    • As Controller instances come and go (as they do in a Kubebernetes environment), every new instance would invoke the upgrade endpoint.

If the upgrade has to be done by a special instance of the code that has to restart different things, it seems like it is much harder to manage.

That approach also works fine for a non-Kubernetes environment.

@jamshale
Copy link
Contributor Author

jamshale commented May 3, 2024

I think it works the way you are describing.

The controller doesn't have to manage anything. The acapy agents just need to restart after they detect the upgrade completed, like if they crashed. This would be managed by any type of cloud deployment, like kubernetes.

@ianco
Copy link
Contributor

ianco commented May 6, 2024

I tested again this morning, a few different scenarios.

If I create the wallet (alice) and then immediately upgrade, then there is no problem and I can issue and request proofs.

If I first issue and revoke some credentials (issued 4 revoked 3) and then upgrade alice, then I get the above error when I try to request a proof.

@jamshale
Copy link
Contributor Author

jamshale commented May 6, 2024

Looks like there's a small change in the link_secret record's formatting I never noticed.

Existing record was saved as json value

{"value":{"ms":"71064680940360014629618879654672290185096545699652080871503389385314050854890"}}

but in anoncreds it's expecting to be saved directly as a string 71064680940360014629618879654672290185096545699652080871503389385314050854890

Hoping it should be a trivial fix. Thanks for finding this.

@jamshale
Copy link
Contributor Author

jamshale commented May 7, 2024

@ianco This has been updated. There was two issues when upgrading a multitenant holder.

  1. Anoncreds expected the link_secret record to be in a slightly different format. The conversion was added to the upgrade process.
  2. When upgrading a subwallet with a credential in the cache, it was using the cache after the upgrade. Causing the new credential of the same type to go into the indycredx path. I flushed the cache during upgrade and it fixed this issue.

Expanded the integration test to cover upgrading the holder subwallet and re-issuing a credential.

@jamshale
Copy link
Contributor Author

jamshale commented May 7, 2024

Also added a small response payload to the upgrade endpoint.

@ianco
Copy link
Contributor

ianco commented May 7, 2024

@jamshale there are conflicts, I think because I merged the other PR - #2861

@jamshale
Copy link
Contributor Author

jamshale commented May 7, 2024

@jamshale there are conflicts, I think because I merged the other PR - #2861

No problem. I thought there might have been more. I'll rebase it in a bit.

@ianco
Copy link
Contributor

ianco commented May 7, 2024

Also added a small response payload to the upgrade endpoint.

Looks good!

{
  "success": true,
  "message": "Upgrade to anoncreds has been triggered for wallet alice1"
}

@ianco
Copy link
Contributor

ianco commented May 7, 2024

Some success! I was able to upgrade both of the alice and faber wallets according to my test scenario above (start alice and faber with askar wallet and in multitenant mode with revocation enabled, issue and revoke a bunch of credentials, then upgrade alice and then faber wallets). However all proof requests from faber to alice return false. I thought it might be related to the other reported bug (#2934) so I deleted all alice credentials from her wallet using the api, issued a new credential (not revoked) and then tried another proof request and it still returned false.

With the askar wallet the proof would still pass as long as there was at least one valid credential, regardless of how many were revoked. So I suspect the issue is in either the anoncreds-rs code or in the aca-py anoncreds-related code.

@jamshale
Copy link
Contributor Author

jamshale commented May 7, 2024

Some success! I was able to upgrade both of the alice and faber wallets according to my test scenario above (start alice and faber with askar wallet and in multitenant mode with revocation enabled, issue and revoke a bunch of credentials, then upgrade alice and then faber wallets). However all proof requests from faber to alice return false. I thought it might be related to the other reported bug (#2934) so I deleted all alice credentials from her wallet using the api, issued a new credential (not revoked) and then tried another proof request and it still returned false.

With the askar wallet the proof would still pass as long as there was at least one valid credential, regardless of how many were revoked. So I suspect the issue is in either the anoncreds-rs code or in the aca-py anoncreds-related code.

Yes. I think this is the same issue I reported here. I hadn't tried other scenario you described. I thought it was some logic getting the most valid credential, but looks like that might not be the case. I don't think it's related to the upgrade at all, as I can replicate it with fresh anoncreds agents.

I will test some other cases as well but when I hadn't revoked any credentials, the proof verified correctly after the upgrade. It was only when any credentials had been revoked and the wallet was anoncreds that the proof always returned false.

Copy link

sonarqubecloud bot commented May 7, 2024

Quality Gate Passed Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
No data about Duplication

See analysis details on SonarCloud

@ianco
Copy link
Contributor

ianco commented May 8, 2024

... but looks like that might not be the case. I don't think it's related to the upgrade at all, as I can replicate it with fresh anoncreds agents.

If it's an existing issue we should probably merge this PR and then deal with it as a separate issue.

Copy link
Contributor

@ianco ianco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. There is an issue with validating proofs in anoncreds but that's independent of these changes.

demo/features/upgrade.feature Show resolved Hide resolved
@jamshale jamshale merged commit 84fc383 into openwallet-foundation:main May 8, 2024
8 checks passed
Comment on lines +39 to +59
async def get_subwallet_profiles_from_storage(root_profile: Profile) -> list[Profile]:
"""Get subwallet profiles from storage."""
subwallet_profiles = []
base_storage_search = root_profile.inject(BaseStorageSearch)
search_session = base_storage_search.search_records(
type_filter=WalletRecord.RECORD_TYPE, page_size=10
)
while search_session._done is False:
wallet_storage_records = await search_session.fetch()
for wallet_storage_record in wallet_storage_records:
wallet_record = WalletRecord.from_storage(
wallet_storage_record.id,
json.loads(wallet_storage_record.value),
)
subwallet_profiles.append(
await MultitenantManager(root_profile).get_wallet_profile(
base_context=root_profile.context,
wallet_record=wallet_record,
)
)
return subwallet_profiles
Copy link
Contributor

@ff137 ff137 Jun 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @jamshale - we're experiencing some issues with the check_for_wallet_upgrades_in_progress method.
This method gets called upon startup, and if we have existing wallets in the DB, then we get the following stacktrace:

Traceback (most recent call last):
  File \"/home/aries/.local/lib/python3.9/site-packages/aries_cloudagent/askar/store.py\", line 154, in open_store
    store = await Store.open(
  File \"/home/aries/.local/lib/python3.9/site-packages/aries_askar/store.py\", line 352, in open
    return Store(await bindings.store_open(uri, key_method, pass_key, profile), uri)
  File \"/home/aries/.local/lib/python3.9/site-packages/aries_askar/bindings/__init__.py\", line 84, in store_open
    return await invoke_async(
  File \"/home/aries/.local/lib/python3.9/site-packages/aries_askar/bindings/lib.py\", line 362, in invoke_async
    return await self.loaded.invoke_async(
  File \"/usr/local/lib/python3.9/asyncio/futures.py\", line 284, in __await__
    yield self  # This tells Task to wait for completion.
  File \"/usr/local/lib/python3.9/asyncio/tasks.py\", line 328, in __wakeup
    future.result()
  File \"/usr/local/lib/python3.9/asyncio/futures.py\", line 201, in result
    raise self._exception
aries_askar.error.AskarError: Error fetching store configuration
Caused by: error returned from database: Request returned an error: database \"472436df-6126-4083-85fb-1eacc0ecfcf5\" does not exist.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File \"/home/aries/.local/lib/python3.9/site-packages/aries_cloudagent/commands/start.py\", line 72, in init
    await startup
  File \"/home/aries/.local/lib/python3.9/site-packages/aries_cloudagent/commands/start.py\", line 29, in start_app
    await conductor.start()
  File \"/home/aries/.local/lib/python3.9/site-packages/aries_cloudagent/core/conductor.py\", line 522, in start
    await self.check_for_wallet_upgrades_in_progress()
  File \"/home/aries/.local/lib/python3.9/site-packages/aries_cloudagent/core/conductor.py\", line 846, in check_for_wallet_upgrades_in_progress
    subwallet_profiles = await get_subwallet_profiles_from_storage(
  File \"/home/aries/.local/lib/python3.9/site-packages/aries_cloudagent/utils/profiles.py\", line 54, in get_subwallet_profiles_from_storage
    await MultitenantManager(root_profile).get_wallet_profile(
  File \"/home/aries/.local/lib/python3.9/site-packages/aries_cloudagent/multitenant/manager.py\", line 85, in get_wallet_profile
    profile, _ = await wallet_config(context, provision=provision)
  File \"/home/aries/.local/lib/python3.9/site-packages/aries_cloudagent/config/wallet.py\", line 54, in wallet_config
    profile = await mgr.open(context, profile_cfg)
  File \"/home/aries/.local/lib/python3.9/site-packages/aries_cloudagent/askar/profile.py\", line 300, in open
    opened = await store_config.open_store(provision=False)
  File \"/home/aries/.local/lib/python3.9/site-packages/aries_cloudagent/askar/store.py\", line 168, in open_store
    raise ProfileError(\"Error opening store\") from err
aries_cloudagent.core.error.ProfileError: Error opening store"

I don't quite understand what could be going wrong, because that error:
Request returned an error: database \"472436df-6126-4083-85fb-1eacc0ecfcf5\" does not exist.
is referring to a wallet_id that does exist in the DB. Maybe it's just trying to read it incorrectly?

We're using askar wallet_type, and postgres_storage. (edit: note, we're testing with the most recent nightly build)
It starts up successfully from a clean start, with no existing wallets, but then restarting after creating wallets leads to the above error.

Does any of this look familiar to you, or do you have any idea what might be going wrong? Any help will be appreciated!

Copy link
Contributor Author

@jamshale jamshale Jun 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've never seen this before. This method is simply trying to get all subwallets from the DB (wallet).

I think maybe the store needs to be provisioned. The MultitenantManager(root_profile).get_wallet_profile() method call has a default to the provision argument as False. It seems like one or more of the subwallets is in a different state than expected. Or maybe I can skip getting the subwallet if it's in this state.

Thanks for finding this with your testing. I will try to replicate and fix it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thanks for the feedback!

One thing that may make our situation unique is the use of Amazon RDS Proxy, for the DB.

I'm personally not so informed on our infra config, but our SRE mentioned that there's a quirk with bootstrapping the DB. We modify the ACAPY_WALLET_STORAGE_CONFIG after an initialisation, to point to an RDS Proxy URL.

I'll need to gather more details to correctly relay the specifics... but that's one factor that may make our setup different.

It sounds like provision=True may be necessary in this case? If you think that might be it, I could try modify that on a fork and test on our end. But only by tomorrow 👌

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants