fix: bitswap lock contention under high load #817
Conversation
Codecov Report
Attention: Patch coverage is
@@ Coverage Diff @@
## main #817 +/- ##
==========================================
- Coverage 60.49% 60.48% -0.01%
==========================================
Files 244 244
Lines 31079 31100 +21
==========================================
+ Hits 18800 18810 +10
- Misses 10603 10615 +12
+ Partials 1676 1675 -1
For posterity, the staging box looks really promising: during the window when this was deployed to box 02, it was in significantly better shape than kubo 0.32.1 (box 01):
HTTP success rate is higher too:
EOD for me, but I'll do more tests tomorrow morning and see if any questions arise. Some quick ones inline.
LGTM, this is such an improvement that we should ship it as a patch release next week.
For posterity: based on our (Shipyard) staging tests, the impact on high-load providers is significant.
Below is a sample from an HTTP gateway processing ~80 requests per second (mirrored organic cache-miss traffic from ipfs.io). "01" is the latest Kubo (0.33.0) without this fix, and "02" is with this fix (0.33.1):
Summary
Fix runaway goroutine creation under high load. Under high load, goroutines are created faster than they can complete, and the more goroutines exist, the slower they complete. This creates a positive feedback cycle that ends in OOM. The fix dynamically adjusts message send scheduling to avoid the runaway condition.
Description of Lock Contention under High Load
The peermanager acquires the peermanager mutex, does peermanager stuff, and then acquires the messagequeue mutex for each peer to put wants/cancels on that peer's message queue. Nothing is blocked indefinitely, but all session goroutines wait on the peermanager mutex.
The messagequeue event loop for each peer is always running in a separate goroutine, waking up every time new data is added to the message queue. The messagequeue acquires the messagequeue mutex to check the amount of pending work and send a message if there is enough work.
The frequent lock/unlock of each messagequeue mutex delays each session goroutine from adding items to messagequeues, as they wait to acquire each peer's messagequeue mutex to enqueue a message. These delays cause the peermanager mutex to be held longer by each goroutine. When there are a sufficient number of peers and want requests, goroutines end up waiting on the peermanager mutex for a longer time, on average, than it takes for an additional request to arrive and start another goroutine. This leads to a positive feedback loop where the number of goroutines increases until their number alone is sufficient to cause OOM.
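A minimal sketch of the locking pattern described above, with illustrative type and field names rather than the actual boxo/bitswap code: every session goroutine holds the peermanager mutex while also taking each peer's messagequeue mutex, so contention on the per-peer mutexes stretches out how long the peermanager mutex is held.

```go
package sketch

import "sync"

// messageQueue and peerManager are simplified stand-ins for the real types.
type messageQueue struct {
	mu      sync.Mutex
	pending []string      // wants/cancels waiting to be sent
	wake    chan struct{} // signals the queue's event loop
}

type peerManager struct {
	mu     sync.Mutex
	queues map[string]*messageQueue // keyed by peer ID
}

// Every session goroutine funnels through here: it holds pm.mu while also
// taking each peer's queue mutex. If mq.mu is frequently locked by the
// queue's event loop, pm.mu is held longer and waiters pile up.
func (pm *peerManager) sendWants(peers []string, wants []string) {
	pm.mu.Lock()
	defer pm.mu.Unlock()
	for _, p := range peers {
		mq := pm.queues[p]
		mq.mu.Lock()
		mq.pending = append(mq.pending, wants...)
		mq.mu.Unlock()
		select {
		case mq.wake <- struct{}{}: // wake the event loop on every addition
		default:
		}
	}
}
```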
How this PR Fixes this
This PR avoids waking up the messagequeue event loop on every item added to the message queue, thus avoiding the high-frequency messagequeue mutex lock/unlock. Instead, the event loop wakes up after a delay, sends the accumulated work, then goes back to sleep for another delay. During the delay, wants and cancels are accumulated. This allows the session goroutines to add items to message queues without contending with the messagequeue event loop for the messagequeue mutex.
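A hedged sketch of the delay-based event loop described above, reusing the simplified messageQueue type from the earlier sketch (again illustrative, not the actual boxo implementation): instead of waking on every enqueued item, the loop sleeps for sendDelay, then drains whatever has accumulated into a single message.

```go
import "time"

// run sleeps for sendDelay between sends, so session goroutines can enqueue
// work without racing the event loop for mq.mu on every single item.
func (mq *messageQueue) run(done <-chan struct{}, sendDelay time.Duration) {
	timer := time.NewTimer(sendDelay)
	defer timer.Stop()
	for {
		select {
		case <-done:
			return
		case <-timer.C:
			mq.mu.Lock()
			batch := mq.pending
			mq.pending = nil
			mq.mu.Unlock()
			if len(batch) > 0 {
				mq.send(batch) // send accumulated wants/cancels in one message
			}
			timer.Reset(sendDelay)
		}
	}
}

// send is a placeholder for transmitting the batched wants/cancels to the peer.
func (mq *messageQueue) send(batch []string) { /* network send elided */ }
```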
The delay dynamically adjusts, between 20ms and 1 second, based on the number of peers. The delay per peer is configurable, with a default of 1/8 millisecond (125us). A small sketch of that calculation follows.
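A sketch of the dynamic delay described above, roughly 125µs per peer clamped to [20ms, 1s]; the constant and function names here are illustrative, not taken from the PR.

```go
import "time"

const (
	delayPerPeer = 125 * time.Microsecond // configurable; default 1/8 ms
	minSendDelay = 20 * time.Millisecond
	maxSendDelay = time.Second
)

// sendDelayFor scales the message-send delay with the number of peers,
// clamped to the [minSendDelay, maxSendDelay] range.
func sendDelayFor(numPeers int) time.Duration {
	d := time.Duration(numPeers) * delayPerPeer
	if d < minSendDelay {
		return minSendDelay
	}
	if d > maxSendDelay {
		return maxSendDelay
	}
	return d
}
```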