Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

[Spaces] Handle and log when space summary requests time out instead of dumping a traceback #10026

Closed
Ezwen opened this issue May 20, 2021 · 6 comments · Fixed by #10662
Closed
Assignees
Labels
P3 (OBSOLETE: use S- labels.) Approved backlog: not yet scheduled, will accept patches T-Task Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks.

Comments

@Ezwen
Copy link

Ezwen commented May 20, 2021

Description

I am hosting my own homeserver, while a colleague of mine is using matrix.org's server.

Yesterday, I have accepted an invite to join a Space sent by this colleague. This sort-of worked, but Element told me that Spaces were not enabled on my homeserver. I went and added experimental_features: { spaces_enabled: true } to my synapse, restarted, and the message was gone.

However, I then found out the following errors in my homeserver's logs:

mai 20 09:10:47 hallownest b3685b2aadd9[424]: 2021-05-20 09:10:47,170 - synapse.federation.federation_client - 547 - WARNING - POST-607- Failed to fetch space summary via matrix-client.matrix.org
mai 20 09:10:47 hallownest b3685b2aadd9[424]: Traceback (most recent call last):
mai 20 09:10:47 hallownest b3685b2aadd9[424]:   File "/usr/local/lib/python3.8/site-packages/synapse/http/matrixfederationclient.py", line 511, in _send_request
mai 20 09:10:47 hallownest b3685b2aadd9[424]:     response = await request_deferred
mai 20 09:10:47 hallownest b3685b2aadd9[424]:   File "/usr/local/lib/python3.8/site-packages/twisted/internet/defer.py", line 1443, in _inlineCallbacks
mai 20 09:10:47 hallownest b3685b2aadd9[424]:     result = current_context.run(result.throwExceptionIntoGenerator, g)
mai 20 09:10:47 hallownest b3685b2aadd9[424]:   File "/usr/local/lib/python3.8/site-packages/twisted/python/failure.py", line 500, in throwExceptionIntoGenerator
mai 20 09:10:47 hallownest b3685b2aadd9[424]:     return g.throw(self.type, self.value, self.tb)
mai 20 09:10:47 hallownest b3685b2aadd9[424]:   File "/usr/local/lib/python3.8/site-packages/synapse/http/federation/matrix_federation_agent.py", line 189, in request
mai 20 09:10:47 hallownest b3685b2aadd9[424]:     res = yield make_deferred_yieldable(
mai 20 09:10:47 hallownest b3685b2aadd9[424]:   File "/usr/local/lib/python3.8/site-packages/twisted/internet/defer.py", line 1443, in _inlineCallbacks
mai 20 09:10:47 hallownest b3685b2aadd9[424]:     result = current_context.run(result.throwExceptionIntoGenerator, g)
mai 20 09:10:47 hallownest b3685b2aadd9[424]:   File "/usr/local/lib/python3.8/site-packages/twisted/python/failure.py", line 500, in throwExceptionIntoGenerator
mai 20 09:10:47 hallownest b3685b2aadd9[424]:     return g.throw(self.type, self.value, self.tb)
mai 20 09:10:47 hallownest b3685b2aadd9[424]:   File "/usr/local/lib/python3.8/site-packages/synapse/http/federation/matrix_federation_agent.py", line 295, in _do_connect
mai 20 09:10:47 hallownest b3685b2aadd9[424]:     raise first_exception
mai 20 09:10:47 hallownest b3685b2aadd9[424]:   File "/usr/local/lib/python3.8/site-packages/synapse/http/federation/matrix_federation_agent.py", line 281, in _do_connect
mai 20 09:10:47 hallownest b3685b2aadd9[424]:     result = await make_deferred_yieldable(
mai 20 09:10:47 hallownest b3685b2aadd9[424]: twisted.internet.error.TimeoutError: User timeout caused connection failure.
mai 20 09:10:47 hallownest b3685b2aadd9[424]: 
mai 20 09:10:47 hallownest b3685b2aadd9[424]: The above exception was the direct cause of the following exception:
mai 20 09:10:47 hallownest b3685b2aadd9[424]: 
mai 20 09:10:47 hallownest b3685b2aadd9[424]: Traceback (most recent call last):
mai 20 09:10:47 hallownest b3685b2aadd9[424]:   File "/usr/local/lib/python3.8/site-packages/synapse/federation/federation_client.py", line 517, in _try_destination_list
mai 20 09:10:47 hallownest b3685b2aadd9[424]:     return await callback(destination)
mai 20 09:10:47 hallownest b3685b2aadd9[424]:   File "/usr/local/lib/python3.8/site-packages/synapse/federation/federation_client.py", line 1104, in send_request
mai 20 09:10:47 hallownest b3685b2aadd9[424]:     res = await self.transport_layer.get_space_summary(
mai 20 09:10:47 hallownest b3685b2aadd9[424]:   File "/usr/local/lib/python3.8/site-packages/synapse/federation/transport/client.py", line 1010, in get_space_summary
mai 20 09:10:47 hallownest b3685b2aadd9[424]:     return await self.client.post_json(
mai 20 09:10:47 hallownest b3685b2aadd9[424]:   File "/usr/local/lib/python3.8/site-packages/synapse/http/matrixfederationclient.py", line 821, in post_json
mai 20 09:10:47 hallownest b3685b2aadd9[424]:     response = await self._send_request(
mai 20 09:10:47 hallownest b3685b2aadd9[424]:   File "/usr/local/lib/python3.8/site-packages/synapse/http/matrixfederationclient.py", line 515, in _send_request
mai 20 09:10:47 hallownest b3685b2aadd9[424]:     raise RequestSendFailed(e, can_retry=True) from e
mai 20 09:10:47 hallownest b3685b2aadd9[424]: synapse.api.errors.RequestSendFailed: Failed to send request: TimeoutError: User timeout caused connection failure.
mai 20 09:10:47 hallownest b3685b2aadd9[424]: 2021-05-20 09:10:47,173 - synapse.handlers.space_summary - 297 - WARNING - POST-607- Unable to get summary of !JVMcvzCIGCAMPYpUgY:matrix.org via federation: 502: Failed to fetch space summary via any server

At the same time I realized that while the list of rooms of the space is shown correctly, the list of members of the space only shows my account, and not the account of my colleague who created the space. Not sure this is related though.

Steps to reproduce

  • from a custom homeserver, join a space created on matrix.org
  • enable the spaces feature on the custom homeserver
  • restart

Version information

  • Homeserver: my own custom homeserver (I can give the address in private if needed)

If not matrix.org:

  • Version: v1.34.0

  • Install method: docker image

  • Platform: debian 10
@erikjohnston
Copy link
Member

Hmm, that error looks like its due to matrix.org server being overloaded. Does the request eventually succeed?

At the same time I realized that while the list of rooms of the space is shown correctly, the list of members of the space only shows my account, and not the account of my colleague who created the space. Not sure this is related though.

I don't think that would be related to that stack trace.

@erikjohnston erikjohnston added the X-Needs-Info This issue is blocked awaiting information from the reporter label May 20, 2021
@Ezwen
Copy link
Author

Ezwen commented May 21, 2021

Hmm, that error looks like its due to matrix.org server being overloaded. Does the request eventually succeed?

Hard to tell. I suppose it does, since I can list the rooms of the space and join them.

In that case it's probably as you say, with matrix.org being simply overloaded. Should maybe the error be clearer in the logs then? A stack trace can be frightening :).

At the same time I realized that while the list of rooms of the space is shown correctly, the list of members of the space only shows my account, and not the account of my colleague who created the space. Not sure this is related though.

I don't think that would be related to that stack trace.

OK, I'll open another issue for that other problem then.

@callahad
Copy link
Contributor

Should maybe the error be clearer in the logs then? A stack trace can be frightening :).

Agreed :)

@callahad callahad added P3 (OBSOLETE: use S- labels.) Approved backlog: not yet scheduled, will accept patches and removed X-Needs-Info This issue is blocked awaiting information from the reporter labels May 27, 2021
@callahad callahad changed the title [Spaces] "Failed to fetch space summary via matrix-client.matrix.org" [Spaces] Handle and log when space summary requests time out instead of dumping a traceback May 27, 2021
@clokep
Copy link
Member

clokep commented May 27, 2021

I think this is in general how all of our federation client code works (it doesn't handle timeouts well)? Not 100% sure on that though.

@erikjohnston erikjohnston added the T-Task Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks. label Jul 26, 2021
@erikjohnston
Copy link
Member

It should just be a case of catching RequestSendFailed and HttpResponseException in the appropriate place in space_summary

@clokep
Copy link
Member

clokep commented Aug 19, 2021

It should just be a case of catching RequestSendFailed and HttpResponseException in the appropriate place in space_summary

I think that _try_destination_list is meant to do this for you?

This error is appearing from this line (specifically the exc_info=True):

logger.warning(
"Failed to %s via %s", description, destination, exc_info=True
)

This was added in #3638, not sure why we need the exception info here though? Probably to try to understand the cause of unknown (programming) errors?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
P3 (OBSOLETE: use S- labels.) Approved backlog: not yet scheduled, will accept patches T-Task Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks.
Projects
None yet
4 participants