Event subscription handler hangs after a while #8

Closed
Tracked by #381
snarfed opened this issue Aug 29, 2023 · 8 comments

@snarfed
Owner

snarfed commented Aug 29, 2023

The PDS hangs after it has been up for a while, at least a few hours. Not sure why yet; probably something to do with websocket subscriptions.

@snarfed
Owner Author

snarfed commented Sep 2, 2023

Couldn't find anything relevant in logs, surprisingly. Still don't know what the root cause here is, but I may experiment with multiple workers vs eventlet vs gevent, etc: https://flask-sock.readthedocs.io/en/latest/web_servers.html
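For reference, the worker models mentioned above can be compared by switching a few gunicorn settings. The sketch below is a hypothetical `gunicorn.conf.py`, not this project's actual config: the bind address and all numeric values are assumptions chosen only for illustration.

```python
# gunicorn.conf.py: hypothetical config file; every value here is illustrative.
# Load it with: gunicorn -c gunicorn.conf.py module:app

bind = "0.0.0.0:8080"  # assumed address and port

# Option 1: threaded worker. With a thread-per-connection model, each open
# websocket is expected to hold one thread for as long as it stays connected,
# so `threads` caps the number of concurrent websocket clients per worker.
workers = 1
worker_class = "gthread"
threads = 100

# Option 2: greenlet-based worker (requires the gevent or eventlet package).
# Uncomment to try it in place of the threaded worker.
# worker_class = "gevent"     # or "eventlet"
# worker_connections = 1000   # max simultaneous clients per worker
```

Only one option should be active at a time; the second is left commented out so the file stays valid as written.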

@snarfed
Owner Author

snarfed commented Mar 7, 2024

Looking at this a bit more now. Here's an example from today, two things happened:

  1. websocket request to /xrpc/com.atproto.sync.subscribeRepos that succeeded, served for 1h+10s, and was logged as HTTP 101 (Switching Protocols)
  2. another websocket request to /xrpc/com.atproto.sync.subscribeRepos that did nothing, hung for 1h, then returned HTTP 504

After that, the Bluesky relay kept making subscribeRepos requests that did the same thing as in snarfed/arroba#2.

I suspect we're not handling something in the initial HTTP 101 code path, and that's killing the serving thread or something similar.

2024-03-06 06:51:48.496 - 07:51:58.540
GET 101 /xrpc/com.atproto.sync.subscribeRepos?cursor=1864

New websocket client for com.atproto.sync.subscribeRepos
com.atproto.sync.subscribeRepos: {'cursor': 1864} None
Running method
subscribeRepos: fetching existing commits from seq 1864
Sending to com.atproto.sync.subscribeRepos websocket client: {'op': 1, 't': '#commit'} {'repo': 'did:plc:sp6aj7mvt7dw26xp6wbvbv6i', 'ops': [{'action': 'update', 'path': 'app.bsky.actor.profile/self', 'cid': CID('base32', 1, 'dag-cbor', '12202e0b1c21d9f0ebeffda410546889e3f4d22d91d2baef7996c9e920cb4c1f2340')}], 'commit': CID('base32', 1, 'dag-cbor', '1220b1ccee20ec7e5ef7aa55752834791e7214bdc7e9ed91ea26351bf72700fb6188'), 'blocks': b":\xa2eroots\x81\xd8*X%\x00\x01q\x12 \xb1\xcc\xee \xec~^\xf7\xaaUu(4y\x1er\x14\xbd\xc7\xe9\xed\x91\xea&5\x1b\xf7'\x00\xfba\x88gversion\x01\x88\x02\x01q\x...
subscribeRepos: serving new commits

2024-03-06 07:51:58.945 - 08:51:58.946
GET 504 /xrpc/com.atproto.sync.subscribeRepos?cursor=1864
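The durations quoted above (1h+10s and 1h) can be checked directly from the two timestamp ranges in the log. This is a small sketch; the `duration` helper is a hypothetical name, and it assumes both timestamps in each range fall on the same day, as they do here.

```python
from datetime import datetime

FMT = "%H:%M:%S.%f"  # time-of-day with fractional seconds, as in the log lines


def duration(start, end):
    """Return end - start as a timedelta, assuming both are on the same day."""
    return datetime.strptime(end, FMT) - datetime.strptime(start, FMT)


# GET 101: 2024-03-06 06:51:48.496 - 07:51:58.540
first = duration("06:51:48.496", "07:51:58.540")
# GET 504: 2024-03-06 07:51:58.945 - 08:51:58.946
second = duration("07:51:58.945", "08:51:58.946")

print(first.total_seconds())   # 3610.044, i.e. 1h + ~10s
print(second.total_seconds())  # 3600.001, i.e. almost exactly 1h
```

The second request lasting almost exactly one hour is consistent with it being cut off by a fixed timeout rather than ending on its own.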

@snarfed
Owner Author

snarfed commented Mar 28, 2024

miguelgrinberg/flask-sock#7 ?

snarfed transferred this issue from snarfed/arroba Mar 29, 2024
snarfed changed the title from "PDS hangs after a while" to "Event subscription handler hangs after a while" Mar 29, 2024
@snarfed
Owner Author

snarfed commented Mar 29, 2024

> I suspect we're not handling something in the initial HTTP 101 code path, and that's killing the serving thread or something similar.

Looking more and more likely. Looking at logs from the most recent time this happened, we served 20 101s and then started serving 504s, and hub.yaml runs gunicorn with --threads 20.
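The pattern of 20 successful 101s followed by 504s is what a fixed-size thread pool looks like once every thread is stuck. The sketch below is only an illustration of that failure mode using Python's standard `ThreadPoolExecutor`, not gunicorn or flask-sock themselves. The pool size of 3 stands in for `--threads 20`, and `stuck_handler` and `quick_handler` are hypothetical names.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, TimeoutError

POOL_SIZE = 3  # stand-in for gunicorn's --threads 20

release = threading.Event()  # stays unset while the handlers are "stuck"


def stuck_handler(n):
    """Simulates a websocket handler that never returns its thread."""
    release.wait()
    return n


def quick_handler():
    """Simulates a request that would finish immediately if it got a thread."""
    return "served"


pool = ThreadPoolExecutor(max_workers=POOL_SIZE)

# Fill every worker thread with a handler that never finishes.
stuck = [pool.submit(stuck_handler, i) for i in range(POOL_SIZE)]

# The next request is queued behind them and cannot start.
next_request = pool.submit(quick_handler)
try:
    next_request.result(timeout=0.5)
    outcome = "served"
except TimeoutError:
    outcome = "timed out"  # analogous to the 504s in the logs

print(outcome)  # timed out

# Once the stuck handlers let go of their threads, the queued request runs.
release.set()
after_release = next_request.result(timeout=5)
print(after_release)  # served
pool.shutdown()
```

If this is what is happening here, the number of successful connections before the first 504 should track the configured thread count, which matches the 20 observed above.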

@snarfed
Owner Author

snarfed commented Mar 29, 2024

Filed miguelgrinberg/flask-sock#78

snarfed added a commit to snarfed/arroba that referenced this issue Mar 31, 2024
… to constructor

breaking change, backward incompatible! for snarfed/lexrpc#8
@snarfed
Owner Author

snarfed commented Apr 1, 2024

snarfed/arroba@9e1a356

snarfed added a commit to snarfed/bridgy-fed that referenced this issue Apr 1, 2024
@snarfed
Owner Author

snarfed commented Apr 1, 2024

Deployed. If it stays up past ~1:30p PT tomorrow, I'll declare victory.

@snarfed
Owner Author

snarfed commented Apr 5, 2024

Confirmed fixed!

@snarfed snarfed closed this as completed Apr 5, 2024