
Computed default ResourceMgr limits account for ConnMgr HighWater and are sufficiently high #9545

Closed
3 tasks
Tracked by #9442
BigLep opened this issue Jan 14, 2023 · 4 comments · Fixed by #9555

BigLep (Contributor) commented Jan 14, 2023

Done Criteria

The computed default resource manager limits account for how the connection manager is configured and are sufficiently high to allow Kubo to operate. Specifically, this means these limits:

  • Swarm.ResourceMgr.System.ConnsInbound
  • Swarm.ResourceMgr.System.StreamsInbound

must be (~20%?) greater than the max of:

  • 2 * Swarm.ConnMgr.HighWater
  • 800

Swarm.ResourceMgr.System limits are hard system limits and we're ensuring that System.ConnsInbound and System.StreamsInbound are greater than 800 and two times the connection manager's high water mark.

Any user-supplied override limits are applied on top of this.
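The computation above can be sketched in Go. This is a hypothetical helper, not Kubo's actual code: the names `computedConnsInbound` and `minConnsInbound`, and the exact 20% headroom factor, are illustrative.

```go
package main

import "fmt"

// minConnsInbound is the anecdotal floor described in the done criteria.
const minConnsInbound = 800

// computedConnsInbound returns max(2*HighWater, 800) plus ~20% headroom.
func computedConnsInbound(connMgrHighWater int) int {
	base := 2 * connMgrHighWater
	if base < minConnsInbound {
		base = minConnsInbound
	}
	// Add ~20% on top of the floor so the hard limit stays comfortably
	// above the connection manager's soft limit.
	return base + base/5
}

func main() {
	fmt.Println(computedConnsInbound(96))   // default HighWater 96 -> floor 800 -> 960
	fmt.Println(computedConnsInbound(1000)) // 2*1000 = 2000 -> 2400
}
```

User-supplied overrides would then be applied after this computed default is derived.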

Why Important

This is a footgun that causes user confusion and makes debugging problems harder for maintainers.

Swarm.ConnMgr has been present in Kubo for years and users have adjusted this value for their use cases. It is a soft limit though, and having hard limits from Swarm.ResourceMgr that are under the soft values of Swarm.ConnMgr causes unexpected behavior.

In addition, during Kubo 0.17 and the 0.18 RC phase, unexpected failures have occurred when the overall ResourceMgr.System.ConnsInbound hard limits have been set too low. The Computed Default Limits for the ResourceMgr shouldn't trigger this.

Notes

  1. 800 as a lower-bound for Swarm.ResourceMgr.ConnsInbound is based on anecdotal observations from maintainers on how many connections Kubo needs to operate "normally". Enabling the Swarm.ResourceMgr by default (starting in 0.17) was done as a DoS defense mechanism. It isn't intended to intrude and cause hard-to-debug issues for "normal" operations.
  2. We set Swarm.ResourceMgr.System.StreamsInbound in addition to Swarm.ResourceMgr.System.ConnsInbound to help ensure at least one stream can be opened per connection. (That said, it is possible for one connection to use up all the available streams within Swarm.ResourceMgr.System.StreamsInbound. The libp2p resource manager doesn't currently provide a way to express "limit each connection to X streams".)
  3. We're not worrying about Swarm.ResourceMgr.System.Conns or Swarm.ResourceMgr.System.ConnsOutbound because we already set them to "infinite" by default.
  4. There is a backlog item in go-libp2p to better connect the connection manager and resource manager: Reconcile Conn manager's hi/lo watermarks and the resource manager's limits libp2p/go-libp2p#1640 . To control the experience, we're solving this at the Kubo level with this issue.
Jorropo (Contributor) commented Jan 16, 2023

Docs update: https://github.com/ipfs/kubo/blob/master/docs/libp2p-resource-management.md

I don't think this is worth changing here either; it describes the current high-level strategy and then links to rcmgr_defaults.go:

With the Swarm.ResourceMgr.MaxMemory and Swarm.ResourceMgr.MaxFileDescriptors inputs defined, resource manager limits are created at the system, transient, and peer scopes. Other scopes are ignored (by being set to "~infinity").

The reason these scopes are chosen is because:

  • system - This gives us the coarse-grained control we want so we can reason about the system as a whole.
    It is the backstop, and allows us to reason about resource consumption more easily
    since we don't have to think about the interaction of many other scopes.
  • transient - Limiting connections that are in the process of being established provides backpressure so not too much work queues up.
  • peer - The peer scope doesn't protect us against intentional DoS attacks.
    It's just as easy for an attacker to send 100 requests/second with 1 peerId as 10 requests/second with 10 peers.
    We are reliant on the system scope for protection here in the malicious case.
    The reason for having a peer scope is to protect against unintentional DoS attacks
    (e.g., a bug in a peer which is causing it to "misbehave").
    In the unintentional case, we want to make sure a "misbehaving" node doesn't consume more resources than necessary.

Within these scopes, limits are just set on memory, file descriptors (FD), inbound connections, and inbound streams. Limits are set based on the Swarm.ResourceMgr.MaxMemory and Swarm.ResourceMgr.MaxFileDescriptors inputs above.
We trust this node to behave properly and thus don't limit outbound connection/stream limits.
We apply any limits that libp2p has for its protocols/services
since we assume libp2p knows best here.

Source: core/node/libp2p/rcmgr_defaults.go
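For reference, the two inputs mentioned in the excerpt live under `Swarm.ResourceMgr` in Kubo's JSON config. A minimal illustrative fragment (the values here are examples, not recommendations):

```json
{
  "Swarm": {
    "ResourceMgr": {
      "MaxMemory": "4GB",
      "MaxFileDescriptors": 4096
    }
  }
}
```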

Jorropo added a commit to Jorropo/go-ipfs that referenced this issue Jan 16, 2023
lidel (Member) commented Jan 16, 2023

@Jorropo fair enough, though the section below in the FAQ still needs to be updated:

How does the resource manager (ResourceMgr) relate to the connection manager (ConnMgr)?

[..]
If Swarm.ConnMgr.HighWater is greater than Swarm.ResourceMgr.Limits.System.ConnsInbound, existing low-priority idle connections can prevent new high-priority connections from being established. The ResourceMgr doesn't know that the new connection is high priority and simply blocks it because of the limit it's enforcing.

(Replace the last paragraph with info on how Kubo autoscales the ResourceMgr to be at least 2x ConnMgr.HighWater (and at least DefaultResourceMgrMinConnsInbound), and refuses to start when manually set ResourceMgr.Limits clash with ConnMgr.HighWater.)
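The refuse-to-start behavior described above could look roughly like the sketch below. This is a hypothetical illustration, not Kubo's actual implementation; the function name `validateLimits` and the error text are invented.

```go
package main

import "fmt"

// validateLimits fails fast when a manually set ConnsInbound limit is
// below what the connection manager's HighWater implies (2x HighWater).
func validateLimits(connsInbound, highWater int) error {
	if connsInbound < 2*highWater {
		return fmt.Errorf(
			"Swarm.ResourceMgr System.ConnsInbound (%d) is below 2x Swarm.ConnMgr.HighWater (%d); raise the limit or lower HighWater",
			connsInbound, highWater)
	}
	return nil
}

func main() {
	// A limit of 400 clashes with HighWater 300 (needs at least 600).
	if err := validateLimits(400, 300); err != nil {
		fmt.Println("refusing to start:", err)
	}
	// A limit of 800 is fine for HighWater 300.
	fmt.Println(validateLimits(800, 300)) // <nil>
}
```

Doing this check at startup surfaces the misconfiguration immediately, instead of letting the node run with a hard limit silently undercutting the soft one.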

thedmdim commented Mar 5, 2023

Hello, I got an exception related to this topic (#9695), can somebody help?
