Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(basichost): Emit Connected events asynchronously #2027

Closed
wants to merge 1 commit into from

Conversation

Wondertan
Copy link
Contributor

@Wondertan Wondertan commented Jan 28, 2023

This is more of a proposal at this point; for additional context, please look at #2026.

I am aware of the discussions in libp2p/go-libp2p-swarm#295 and libp2p/go-libp2p-swarm#104, but I still see that emitting specifically EvtConnectednessChanged and specifically on a new connection is reasonable enough, so let me explain:

  • I really like how @Kubuxu was able to phrase this in here: "because one misbehaving consuming system affects the performance of the whole application"
    • This is precisely what happened to me and my team in celestia-node. One hidden creeper bug exploded in our code. It's a rare case that requires a unique network setup. This bug prevented] reading out the events, which completely stops accepting new streams, halting the whole application. There were literally no other manifestations of the bug besides the client's inability to open any new streams to the buggy server, including the vital identify one, which was a source of our doubts initially, also considering @marten-seemann 's recent reasonable complaints about its complexity.
    • On the other hand, it's improbable that we would find this bug so soon if events were sent asynchronously. However, this will be fixed by feat(eventbus): warn applications whenever events aren't read #2026.
  • Why specifically Connected? Because Disconnected is already spawned in a new routine in here.
    • This PR spawns a routine specifically for event submission, which is a new canonical way to handle libp2p events, but registering a malicious/buggy Notifies.Connected is still possible so that the fix might be moved directly to the swamp code in here.
  • As per send notifications synchronously go-libp2p-swarm#295, the motivation for synchronous notifications was performance. The Connected event seems to be the most prevailing in the libp2p world, and launching them async again may cancel all the original goals. So here we have a classic tradeoff between performance and API safety/hygiene, and it's up to project maintainers to decide what's the best.
    • From one point of view feat(eventbus): warn applications whenever events aren't read #2026 should solve the issue. In case of malicious/buggy callback deployed, the application devs depend on node operators to report the issue if it happens in runtime, so interactively, the issue is fixed just by that PR, keeping performance gains.
    • On the other hand, by sacrificing a bit of performance, the system will continue operating without halts, althoug still expecting node operators to report the issue.

@marten-seemann
Copy link
Contributor

Thank you for opening this PR and discussing the tradeoffs.

I'd disagree with the reasoning though. I understand that the bug you encountered was annoying, and we should do something to make this easier to detect in the future, but I don't think slowing down the system by spawning Go routines is the right thing to do. Notifications should be really cheap to send and to consume, and they're not if we're spawning a new Go routine for every notification. And yes, this means that ideally, we'd also get rid of the Go routine for the Disconnected notification.

So what can we do? 2 things:

  1. Introduce a log message when the queue fills up (feat(eventbus): warn applications whenever events aren't read #2026).
  2. Introduce monitoring of the eventbus (eventbus: expose metrics #2020).

@Wondertan
Copy link
Contributor Author

Ok, performance is a good choice, and I am all for it, so closing this.

Honestly, I expected the following statement 🙂 . Happy to submit a patch for it as well.

And yes, this means that ideally, we'd also get rid of the Go routine for the Disconnected notification.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants