🐛 DIDRotate Hangup can sporadically raise 500 error #3306

ff137 · 2024-10-22T11:13:48Z

The hangup method in the DIDRotateManager is defined as follows:

    async def hangup(self, conn: ConnRecord) -> Hangup:
        """Hangup the connection.

        Args:
            conn (ConnRecord): The connection to hangup.
        """

        hangup = Hangup()

        async with self.profile.session() as session:
            await conn.delete_record(session)

        responder = self.profile.inject(BaseResponder)
        await responder.send(hangup, connection_id=conn.connection_id)

        return hangup

As you can see, we call await conn.delete_record(session) before we submit the hangup request to the Responder. As far as I can tell, deleting the record before hanging up is the cause of the following unhandled exception:

Handler error: BaseConnectionManager.get_connection_targets
Handler error: upgrade_middleware
Handler error with exception: Record not found: connection/df4accad-96d9-41fa-b534-a6400966d270
=================
Traceback (most recent call last):
  File "/home/aries/.local/lib/python3.12/site-packages/aries_cloudagent/admin/server.py", line 141, in ready_middleware
    return await handler(request)
    ^^^^^^^^^^^^^^^^^^^^^^
...
  File "/home/aries/.local/lib/python3.12/site-packages/aries_cloudagent/protocols/did_rotate/v1_0/routes.py", line 101, in hangup
    msg = await did_rotate_mgr.hangup(conn)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/aries/.local/lib/python3.12/site-packages/aries_cloudagent/protocols/did_rotate/v1_0/manager.py", line 86, in hangup
    await responder.send(hangup, connection_id=conn.connection_id)
  File "/home/aries/.local/lib/python3.12/site-packages/aries_cloudagent/messaging/responder.py", line 104, in send
    return await self.send_outbound(
    ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/aries/.local/lib/python3.12/site-packages/aries_cloudagent/admin/server.py", line 108, in send_outbound
    return await self._send(profile, message)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/aries/.local/lib/python3.12/site-packages/aries_cloudagent/core/conductor.py", line 665, in outbound_message_router
	raise StorageNotFoundError(f"Record not found: {record_type}/{record_id}")
aries_cloudagent.storage.error.StorageNotFoundError: Record not found: connection/df4accad-96d9-41fa-b534-a6400966d270
Error handling request

This internal server error occurs sporadically / inconsistently, and so it appears to be a race condition based on how quickly the deleted record reflects in storage, after which it cannot be used in outbound_message_router to complete the hangup.

A subsequent retry of the hangup, after the 500, will then yield a 404, as record has been deleted.

The method should be made resilient so that exceptions are handled, and retrying is possible if necessary.

My guess is that await conn.delete_record(session) should just be moved after the hangup is sent to the responder, but that may be naive, as I'm not sure. I'll test a solution in the next couple days, but feedback is welcome in the meantime.

Edit: first observed with v1.0.0, but also with latest release

The text was updated successfully, but these errors were encountered:

This was referenced Oct 24, 2024

🐛 fix DidRotate Hangup error didx-xyz/aries-cloudapi-python#1138

Merged

🐛 Rearrange connection record deletion after hangup #3310

Merged

jamshale closed this as completed in #3310 Oct 24, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

🐛 DIDRotate Hangup can sporadically raise 500 error #3306

🐛 DIDRotate Hangup can sporadically raise 500 error #3306

ff137 commented Oct 22, 2024 •

edited

Loading

🐛 DIDRotate Hangup can sporadically raise 500 error #3306

🐛 DIDRotate Hangup can sporadically raise 500 error #3306

Comments

ff137 commented Oct 22, 2024 • edited Loading

ff137 commented Oct 22, 2024 •

edited

Loading