Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use Fx to start and stop the host, swarm, autorelay and quicreuse #2118

Merged
merged 16 commits into from
Mar 21, 2024
Merged

Conversation

marten-seemann
Copy link
Contributor

@marten-seemann marten-seemann commented Feb 20, 2023

Fixes #2117. Part of #1993. Part of #2514.

config/config.go Show resolved Hide resolved
if err != nil {
return nil, err
}
lifecycle.Append(fx.StartStopHook(ar.Start, ar.Close))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

autorelay depends on Identify to start first, how can we enforce that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is guaranteed because we construct the host first, then autorelay (since the autorelay constructor depends on the host). Fx executes startup hooks in the order they were added, so this determines that host.Start is called before autorelay.Start. Now I admit that this is pretty subtle.

The correct solution is to remove the IDService() from the host, and instead pass a reference to identify to the autorelay constructor. That would make the dependency tree more obvious. Let's do that in a follow-up PR?

if err != nil {
return nil, err
}
lifecycle.Append(fx.StopHook(sw.Close))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we closing the swarm twice?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned elsewhere, we are but that's okay since the swarm close is guarded to only close once. The closing twice is useful because each close is logically for a different reason.

  1. Here we are closing because we created the swarm.
  2. Below, in decorate we are closing to ensure we close before the quicreuse conn manager.

By closing twice we prevent these different procedures from being tangled.

p2p/host/autorelay/autorelay_test.go Outdated Show resolved Hide resolved
p2p/transport/quic/virtuallistener.go Outdated Show resolved Hide resolved
This was referenced Apr 5, 2023
@marten-seemann marten-seemann mentioned this pull request Jun 3, 2023
29 tasks
@marten-seemann marten-seemann mentioned this pull request Jul 14, 2023
21 tasks
@marten-seemann marten-seemann mentioned this pull request Aug 15, 2023
23 tasks
Comment on lines +12 to +15
type closableBasicHost struct {
*fx.App
*basichost.BasicHost
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can have just one type which uses host.Host interface instead of a separate type for closableBasicHost and closableRoutedHost

type closableHost struct {
	*fx.App
	host.Host
}

config/config.go Outdated
// should probably fail if listening on *any* addr fails.
return sw.Listen(cfg.ListenAddrs...)
},
OnStop: func(context.Context) error { return sw.Close() },
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this call sw.Close twice? One for this and one for the previous option where we do,

	lifecycle.Append(fx.StopHook(sw.Close))

I think this is fine though, swarm.Close is guarded by a sync.Once

config/config.go Outdated Show resolved Hide resolved
config/config.go Show resolved Hide resolved
p2p/test/transport/rcmgr_test.go Outdated Show resolved Hide resolved
@sukunrt
Copy link
Member

sukunrt commented Sep 27, 2023

Thanks for removing AutoRelayHost 🎉

Copy link
Collaborator

@MarcoPolo MarcoPolo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to add a couple more tests. At least:

  1. A test that asserts the connmanager is closed after we close the host/swarm.
  2. A test that asserts their are no dangling goroutines after starting and stopping a host.

if err != nil {
return nil, err
}
lifecycle.Append(fx.StopHook(sw.Close))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned elsewhere, we are but that's okay since the swarm close is guarded to only close once. The closing twice is useful because each close is logically for a different reason.

  1. Here we are closing because we created the swarm.
  2. Below, in decorate we are closing to ensure we close before the quicreuse conn manager.

By closing twice we prevent these different procedures from being tangled.

To do this, I've had to move a few leaky tests into a separate package.
I've filed a bug for the AutoNAT issue (#2743) but the "error on
startup" issue is going to require some pretty invasive changes (we need
to construct _then_ start).
@Stebalien
Copy link
Member

I've added a dangling goroutine test which should be sufficient. It implicitly asserts that the connection manager is closed as the connection manager launches a background goroutine. Ideally we'd have some way to also assert that there are no scheduled timers, but I don't think we can.

When doing that, I noticed two issues:

  1. AutoNAT fails to close its peerstore #2743.
  2. We don't shut anything down if we fail to start. But, IMO, that's not something to solve in this patch.

(I'm trying to push this forward because I don't want it to develop a bunch of merge conflicts)

@MarcoPolo
Copy link
Collaborator

Thanks @Stebalien! I didn't know about uber/goleak. Looks helpful. I'll try to fix the current leaks.

@Stebalien
Copy link
Member

Hm. That goroutine should have been excluded...

@Stebalien
Copy link
Member

Oh, different nat goroutine.

@Stebalien
Copy link
Member

I'd add github.com/jackpal/go-nat-pmp.(*Client).GetExternalAddress to the exclusion list.

@MarcoPolo
Copy link
Collaborator

A test that asserts the connmanager is closed after we close the host/swarm.

This is effectively covered by checking for dangling goroutines.

@MarcoPolo MarcoPolo linked an issue Mar 21, 2024 that may be closed by this pull request
@MarcoPolo MarcoPolo merged commit 9d149fa into master Mar 21, 2024
11 checks passed
@MarcoPolo MarcoPolo mentioned this pull request Mar 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

libp2p.Host leaks goroutines quicreuse: is not closed when closing the host
4 participants