-
Notifications
You must be signed in to change notification settings - Fork 516
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Askar wallet sessions result in a deadlock #1417
Comments
To bump this thread a bit :) After stress testing Aca-Py application (Askar wallet type with the underlying Postgres storage), we found some potential issues with open sessions. When accessing the wallet, sessions in several places tend to stay open for a longer period than needed. One case where this could become problematic is when a user tries to pack messages - https://github.com/hyperledger/aries-cloudagent-python/blob/984fa194cd952b760d96bca7d83a219a0c96a8fa/aries_cloudagent/wallet/askar.py#L656. Current implementation keeps the session open when retrieving the key and also when executing pack function with that key. It's similar with the unpacking process where the session is not opened only for key retrieval but for the whole process. The proposed solution above is a bit more complicated in this case. We would need to extract a lot of logic out of this module. It would basically mean a rewrite of this layer. Was there any discussion on how to separate/extract these parts? |
Hi @MaticDiba , no we haven't had any discussions yet regarding a rewrite of this module. FYI I have some other outstanding changes to fix askar deadlocks in other areas (see #1450). One thing I've found is that the deadlocks occur when running against a SQLite wallet but not against Postgres. Have you tried stress testing using a Postgres wallet? |
Correct. We have tested this using Askar wallet type and Postgres as a storage. |
Hi @MaticDiba I agree with your proposed solution, please proceed with any changes you need to make to eliminate the deadlocks. FYI I ran into one scenario where I had to create a session at a higher level and then pass through to the lower level (see this commit: b25ae4f). This was required due to some fairly complicated logic around updating revocation status - I don't think this is a good solution in general (I prefer your approach) but we may need to do this in some limited situations. |
When using Askar wallet type combined with Postgres storage and calling some endpoints at the same time, deadlocks happen. The issue is that the session (DB connection) is acquired at the start of the request and is not dropped until the end. That would be ok (not performance-wise, since this limits the number of concurrent requests to the number of max DB connections) but there are some cases where the second session is created while the first one is still active.
The issue arises if X concurrent requests acquire the first connection to the database but are unable to get the second one. In this case, all requests will be stuck waiting for an available connection (where X is at least the number of max database connections).
Example
create_request handler is an example. At the initialization, a session is created and another session is created inside get_default_mediator function, while the first session is still active.
Another example is receive_invitation handler where new session is created inside the send_reply function (when auto_accept is enabled).
Proposed solution
Due to the performance and deadlock issues, the proposed solution is to create sessions only on demand in the lower scope. The session should also be closed right after the wallet query. The mentioned approach is already used by some managers, for example in the Mediation manager
The text was updated successfully, but these errors were encountered: