support waiting until a gossipsub topic has enough peers #454

Closed
mvdan opened this issue Sep 23, 2021 · 9 comments · Fixed by #452

Comments

@mvdan
Contributor

mvdan commented Sep 23, 2021

Right now, most tests that build a gossipsub mesh use sleeps to wait for the mesh to build:

// wait for heartbeats to build mesh
time.Sleep(time.Second * 2)

The indexer-reference-provider codebase has the same problem:

https://github.com/filecoin-project/indexer-reference-provider/blob/3ec1151ca682a437c6e38727aa00b1087e5a0276/core/engine/engine_test.go#L186-L188

It seems like it would be enough to wait for enough peers to be connected before publishing advertisements.
Right now, the following options are available:

  1. Sleep. This is what all the tests do, but one has to use a very long sleep to reduce the chances of flakes. Even with multi-second sleeps, the CI over at indexer-reference-provider still ran into failures at times.

  2. Poll PubSubRouter.EnoughPeers until it returns true. Would work, but just like any other form of polling, it's a tradeoff between slowness and overhead.

  3. Poll Topic.ListPeers until its length reaches a minimum. Same polling problems as 2.

  4. Use WithReadiness when publishing. Unfortunately, this option only works when WithDiscovery is used, and the indexer doesn't use that. The option seems to be a no-op in those cases.

I think we should support some way to do this without sleeping or polling, and without requiring WithDiscovery. Here are two proposed API additions:

  5. Modify WithReadiness so that it works in all configurations. Without WithDiscovery, it would simply make Topic.Publish block until enough peers are connected (or until Publish's ctx is cancelled).

  6. Add another method, such as func (*Topic) WaitEnoughPeers(context.Context, int) bool, which would block until enough peers are connected or ctx is cancelled.

5 seems preferable to 6, for the sake of making existing APIs more consistently useful and not adding redundant APIs. However, I'm not familiar enough with WithReadiness to tell if it's the best approach internally.

@vyzo
Collaborator

vyzo commented Sep 23, 2021 via email

@mvdan
Contributor Author

mvdan commented Sep 23, 2021

Sounds good, thanks. I'll update the existing PR to do 5 and let you know.

@mvdan
Contributor Author

mvdan commented Sep 24, 2021

I'm not sure how I can implement 5 in a way that doesn't poll.

With func WithReadiness(ready RouterReady) PubOpt, I need to supply a func(rt PubSubRouter, topic string) (bool, error). There's MinTopicSize, but that uses EnoughPeers, so that would get us back to polling at intervals, I think.
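A rough sketch of the shapes involved, using local stand-in types rather than the real library ones: router, routerReady, minTopicSize, and fakeRouter are illustrative names, and the (topic, suggested) parameter list of EnoughPeers is an assumption here. The point is that the readiness check is a one-shot predicate, so whoever holds it has to call it again to notice a change.

```go
package main

import "fmt"

// router is a stand-in for the part of PubSubRouter used here.
// The EnoughPeers parameter list is assumed for this sketch.
type router interface {
	EnoughPeers(topic string, suggested int) bool
}

// routerReady mirrors the func(rt PubSubRouter, topic string) (bool, error)
// shape quoted above.
type routerReady func(rt router, topic string) (bool, error)

// minTopicSize returns a readiness check that asks the router whether
// the topic currently has at least size peers. It answers once per call;
// it cannot notify the caller when the answer changes.
func minTopicSize(size int) routerReady {
	return func(rt router, topic string) (bool, error) {
		return rt.EnoughPeers(topic, size), nil
	}
}

// fakeRouter reports a fixed peer count per topic.
type fakeRouter map[string]int

func (f fakeRouter) EnoughPeers(topic string, suggested int) bool {
	return f[topic] >= suggested
}

func main() {
	rt := fakeRouter{"ads": 2}
	ready, err := minTopicSize(3)(rt, "ads")
	fmt.Println(ready, err) // prints: false <nil>
}
```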

I see Topic has its own events like PeerJoin and PeerLeave, but if all I have is PubSubRouter and the topic string, it's not clear to me how to then make use of the Topic events.

I might just implement this with polling PubSubRouter.EnoughPeers for now, to get the API right, but it's certainly not the right internal implementation. Any hints appreciated, because I'm not sure how to do that.

@vyzo
Collaborator

vyzo commented Sep 24, 2021 via email

@vyzo
Collaborator

vyzo commented Sep 24, 2021 via email

@mvdan
Contributor Author

mvdan commented Sep 24, 2021

Hmm, I can't really see how I can obtain a *Topic from a PubSubRouter. *PubSub does have all the topics, but RouterReady doesn't have a *PubSub.

I've pushed a polling version of MinTopicSize at #452, and switched a few tests over to it. Notice the TODO added to one of the tests, because I think I'm getting confused there.

@vyzo
Collaborator

vyzo commented Sep 24, 2021 via email

@vyzo
Collaborator

vyzo commented Sep 24, 2021 via email

@vyzo
Collaborator

vyzo commented Sep 24, 2021

Yeah, it's definitely not thread-safe.

So the way to go here would be to extend the signature of the option to receive the pubsub object as well, so that you can use it.

Then, if we end up polling (again, perfectly acceptable for an initial implementation), use this contraption to call EnoughPeers in a safe context:

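// Run EnoughPeers on the pubsub event loop, where router state is safe to read.
res := make(chan bool, 1)
ps.eval <- func() { res <- ps.rt.EnoughPeers(topic, suggested) }
enough := <-res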
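For readers unfamiliar with the pattern, here is a self-contained sketch of why sending a closure over an eval channel makes the call safe. The pubsub type below is a hypothetical miniature, not the library's: a single goroutine owns the state and runs every submitted closure in turn, so closures never race with each other.

```go
package main

import "fmt"

// pubsub is a hypothetical miniature: one goroutine owns peers.
type pubsub struct {
	eval  chan func()
	peers map[string]int // only touched from the loop goroutine
}

func newPubsub() *pubsub {
	ps := &pubsub{eval: make(chan func()), peers: map[string]int{}}
	go ps.loop()
	return ps
}

// loop runs submitted closures one at a time, so they never race on peers.
func (ps *pubsub) loop() {
	for fn := range ps.eval {
		fn()
	}
}

// addPeer mutates state by hopping onto the loop goroutine.
func (ps *pubsub) addPeer(topic string) {
	done := make(chan struct{})
	ps.eval <- func() { ps.peers[topic]++; close(done) }
	<-done
}

// enoughPeers reads state the same way and hands the answer back
// through a buffered channel, as in the snippet above.
func (ps *pubsub) enoughPeers(topic string, suggested int) bool {
	res := make(chan bool, 1)
	ps.eval <- func() { res <- ps.peers[topic] >= suggested }
	return <-res
}

func main() {
	ps := newPubsub()
	ps.addPeer("ads")
	fmt.Println(ps.enoughPeers("ads", 1), ps.enoughPeers("ads", 2)) // prints: true false
}
```

The buffer of one on res means the closure never blocks the loop, even if the caller has stopped waiting for the answer.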

@BigLep BigLep linked a pull request Oct 24, 2021 that will close this issue
@vyzo vyzo closed this as completed in #452 Oct 29, 2021
vyzo pushed a commit that referenced this issue Oct 29, 2021
That is, when MinTopicSize is used but not WithDiscovery,
Publish will keep waiting until MinTopicSize's condition is met.

At the moment, this is done by polling every 200ms.
In the future, the mechanism could be optimized to be event-based.
A TODO is left for that purpose.

Fixes #454.