Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 DIDRotate Hangup can sporadically raise 500 error #3306

Closed
ff137 opened this issue Oct 22, 2024 · 0 comments · Fixed by #3310
Closed

🐛 DIDRotate Hangup can sporadically raise 500 error #3306

ff137 opened this issue Oct 22, 2024 · 0 comments · Fixed by #3310

Comments

@ff137
Copy link
Contributor

ff137 commented Oct 22, 2024

The hangup method in the DIDRotateManager is defined as follows:

    async def hangup(self, conn: ConnRecord) -> Hangup:
        """Hangup the connection.

        Args:
            conn (ConnRecord): The connection to hangup.
        """

        hangup = Hangup()

        async with self.profile.session() as session:
            await conn.delete_record(session)

        responder = self.profile.inject(BaseResponder)
        await responder.send(hangup, connection_id=conn.connection_id)

        return hangup

As you can see, we call await conn.delete_record(session) before we submit the hangup request to the Responder. As far as I can tell, deleting the record before hanging up is the cause of the following unhandled exception:

Handler error: BaseConnectionManager.get_connection_targets
Handler error: upgrade_middleware
Handler error with exception: Record not found: connection/df4accad-96d9-41fa-b534-a6400966d270
=================
Traceback (most recent call last):
  File "/home/aries/.local/lib/python3.12/site-packages/aries_cloudagent/admin/server.py", line 141, in ready_middleware
    return await handler(request)
    ^^^^^^^^^^^^^^^^^^^^^^
...
  File "/home/aries/.local/lib/python3.12/site-packages/aries_cloudagent/protocols/did_rotate/v1_0/routes.py", line 101, in hangup
    msg = await did_rotate_mgr.hangup(conn)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/aries/.local/lib/python3.12/site-packages/aries_cloudagent/protocols/did_rotate/v1_0/manager.py", line 86, in hangup
    await responder.send(hangup, connection_id=conn.connection_id)
  File "/home/aries/.local/lib/python3.12/site-packages/aries_cloudagent/messaging/responder.py", line 104, in send
    return await self.send_outbound(
    ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/aries/.local/lib/python3.12/site-packages/aries_cloudagent/admin/server.py", line 108, in send_outbound
    return await self._send(profile, message)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/aries/.local/lib/python3.12/site-packages/aries_cloudagent/core/conductor.py", line 665, in outbound_message_router
	raise StorageNotFoundError(f"Record not found: {record_type}/{record_id}")
aries_cloudagent.storage.error.StorageNotFoundError: Record not found: connection/df4accad-96d9-41fa-b534-a6400966d270
Error handling request

This internal server error occurs sporadically / inconsistently, and so it appears to be a race condition based on how quickly the deleted record reflects in storage, after which it cannot be used in outbound_message_router to complete the hangup.

A subsequent retry of the hangup, after the 500, will then yield a 404, as record has been deleted.

The method should be made resilient so that exceptions are handled, and retrying is possible if necessary.

My guess is that await conn.delete_record(session) should just be moved after the hangup is sent to the responder, but that may be naive, as I'm not sure. I'll test a solution in the next couple days, but feedback is welcome in the meantime.

Edit: first observed with v1.0.0, but also with latest release

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant