runtime: add new mcentral implementation
Currently mcentral is implemented as a couple of linked lists of spans
protected by a lock. Unfortunately this design leads to significant lock
contention.

The span ownership model is also confusing and complicated. In-use spans
jump between being owned by multiple sources, generally some combination
of a gcSweepBuf, a concurrent sweeper, an mcentral, or an mcache.

So first, to address contention, this change replaces those linked lists
with spanSets, which have an atomic fast path. Then, we change the
ownership model: a span may be simultaneously owned only by an mcentral
and the page reclaimer. Otherwise, an mcentral (which now consists of
spanSets), a sweeper, or an mcache is the sole owner of a span at any
given time. This dramatically simplifies reasoning about span ownership
in the runtime.
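
As a rough illustration of that atomic fast path, here is a minimal,
self-contained sketch (not the runtime's mspanset.go, which uses a
growable spine of blocks guarded by a spine lock plus a packed head/tail
index; the fixed buffer below stands in for the spine):

package main

import (
	"fmt"
	"sync/atomic"
)

type span struct{ id int }

// spanSetSketch models only the lock-free fast path: push claims a slot
// by atomically bumping tail, pop claims one by CASing head forward.
type spanSetSketch struct {
	buf  [1024]atomic.Pointer[span]
	head atomic.Uint64 // next index to pop
	tail atomic.Uint64 // next index to push
}

func (s *spanSetSketch) push(sp *span) {
	i := s.tail.Add(1) - 1 // claim a slot
	s.buf[i].Store(sp)     // publish the span
}

func (s *spanSetSketch) pop() *span {
	for {
		h := s.head.Load()
		if h >= s.tail.Load() {
			return nil // empty
		}
		if !s.head.CompareAndSwap(h, h+1) {
			continue // lost the race for slot h; retry
		}
		// The pusher bumps tail before publishing, so spin until the
		// claimed slot is filled, then take sole ownership of the span.
		for {
			if sp := s.buf[h].Swap(nil); sp != nil {
				return sp
			}
		}
	}
}

func main() {
	var s spanSetSketch
	s.push(&span{id: 1})
	s.push(&span{id: 2})
	fmt.Println(s.pop().id, s.pop().id, s.pop() == nil) // 1 2 true
}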

As a result of this new ownership model, sweeping is now driven by
walking over the mcentrals rather than by a separate global list of
spans. Because that global list is gone, and the mcentrals traditionally
haven't held large object spans, large objects would have nowhere to
live. So this change also keeps large object spans in the appropriate
mcentral lists.
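
A toy model of the new sweep driver (the type shapes here are assumptions
for illustration, not the runtime's):

package main

import "fmt"

type toySpan struct{ id int }

type toyCentral struct {
	partialUnswept []*toySpan // unswept spans with free slots
	fullUnswept    []*toySpan // unswept spans with no free slots
}

// nextSpanToSweep finds sweep work by walking the centrals and draining
// their unswept sets; nil means the sweep cycle is done.
func nextSpanToSweep(centrals []*toyCentral) *toySpan {
	for _, c := range centrals {
		for _, set := range []*[]*toySpan{&c.partialUnswept, &c.fullUnswept} {
			if n := len(*set); n > 0 {
				s := (*set)[n-1]
				*set = (*set)[:n-1]
				return s
			}
		}
	}
	return nil
}

func main() {
	centrals := []*toyCentral{
		{partialUnswept: []*toySpan{{id: 1}}},
		{fullUnswept: []*toySpan{{id: 2}}}, // large object spans now live here too
	}
	for s := nextSpanToSweep(centrals); s != nil; s = nextSpanToSweep(centrals) {
		fmt.Println("sweeping span", s.id)
	}
}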

In terms of the static lock ranking, the spanSet spine locks are added
in essentially the same position as the mcentral locks, since, like the
mcentral locks, they may be acquired on both the allocation and sweep
paths.

This new implementation is turned on by default via a feature flag
called go115NewMCentralImpl.
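
The flag is an ordinary compile-time boolean constant, so the compiler
eliminates whichever implementation is unused; a sketch of the
declaration (its exact location and doc comment are assumptions):

// go115NewMCentralImpl selects the new spanSet-based mcentral over the
// legacy linked-list implementation; flip to false to fall back.
const go115NewMCentralImpl = true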

Benchmark results for 1 KiB allocation throughput (5 runs each):

name \ MiB/s  go113       go114       gotip       gotip+this-patch
AllocKiB-1    1.71k ± 1%  1.68k ± 1%  1.59k ± 2%      1.71k ± 1%
AllocKiB-2    2.46k ± 1%  2.51k ± 1%  2.54k ± 1%      2.93k ± 1%
AllocKiB-4    4.27k ± 1%  4.41k ± 2%  4.33k ± 1%      5.01k ± 2%
AllocKiB-8    4.38k ± 3%  5.24k ± 1%  5.46k ± 1%      8.23k ± 1%
AllocKiB-12   4.38k ± 3%  4.49k ± 1%  5.10k ± 1%     10.04k ± 0%
AllocKiB-16   4.31k ± 1%  4.14k ± 3%  4.22k ± 0%     10.42k ± 0%
AllocKiB-20   4.26k ± 1%  3.98k ± 1%  4.09k ± 1%     10.46k ± 3%
AllocKiB-24   4.20k ± 1%  3.97k ± 1%  4.06k ± 1%     10.74k ± 1%
AllocKiB-28   4.15k ± 0%  4.00k ± 0%  4.20k ± 0%     10.76k ± 1%

Fixes golang#37487.

Change-Id: I92d47355acacf9af2c41bf080c08a8c1638ba210
Reviewed-on: https://go-review.googlesource.com/c/go/+/221182
Run-TryBot: Michael Knyszek <[email protected]>
TryBot-Result: Gobot Gobot <[email protected]>
Reviewed-by: Austin Clements <[email protected]>
mknyszek authored and xujianhai666 committed May 21, 2020
1 parent fa6ee6b commit 7cae940
Showing 7 changed files with 620 additions and 29 deletions.
27 changes: 15 additions & 12 deletions src/runtime/lockrank.go
@@ -66,8 +66,9 @@ const (
lockRankRwmutexW
lockRankRwmutexR

lockRankMcentral
lockRankSpine
lockRankMcentral // For !go115NewMCentralImpl
lockRankSpine // For !go115NewMCentralImpl
lockRankSpanSetSpine
lockRankStackpool
lockRankStackLarge
lockRankDefer
@@ -137,12 +138,13 @@ var lockNames = []string{
lockRankRwmutexW: "rwmutexW",
lockRankRwmutexR: "rwmutexR",

lockRankMcentral: "mcentral",
lockRankSpine: "spine",
lockRankStackpool: "stackpool",
lockRankStackLarge: "stackLarge",
lockRankDefer: "defer",
lockRankSudog: "sudog",
lockRankMcentral: "mcentral",
lockRankSpine: "spine",
lockRankSpanSetSpine: "spanSetSpine",
lockRankStackpool: "stackpool",
lockRankStackLarge: "stackLarge",
lockRankDefer: "defer",
lockRankSudog: "sudog",

lockRankWbufSpans: "wbufSpans",
lockRankMheap: "mheap",
@@ -214,14 +216,15 @@ var lockPartialOrder [][]lockRank = [][]lockRank{

lockRankMcentral: {lockRankScavenge, lockRankForcegc, lockRankAssistQueue, lockRankCpuprof, lockRankSweep, lockRankSched, lockRankAllg, lockRankAllp, lockRankTimers, lockRankItab, lockRankReflectOffs, lockRankNotifyList, lockRankTraceBuf, lockRankTraceStrings, lockRankHchan},
lockRankSpine: {lockRankScavenge, lockRankCpuprof, lockRankSched, lockRankAllg, lockRankTimers, lockRankItab, lockRankReflectOffs, lockRankNotifyList, lockRankTraceBuf, lockRankTraceStrings, lockRankHchan},
lockRankStackpool: {lockRankScavenge, lockRankSweepWaiters, lockRankAssistQueue, lockRankCpuprof, lockRankSweep, lockRankSched, lockRankPollDesc, lockRankTimers, lockRankItab, lockRankReflectOffs, lockRankHchan, lockRankFin, lockRankNotifyList, lockRankTraceBuf, lockRankTraceStrings, lockRankProf, lockRankGcBitsArenas, lockRankRoot, lockRankTrace, lockRankTraceStackTab, lockRankNetpollInit, lockRankRwmutexR, lockRankMcentral, lockRankSpine},
lockRankStackLarge: {lockRankAssistQueue, lockRankSched, lockRankItab, lockRankHchan, lockRankProf, lockRankGcBitsArenas, lockRankRoot, lockRankMcentral},
lockRankSpanSetSpine: {lockRankScavenge, lockRankForcegc, lockRankAssistQueue, lockRankCpuprof, lockRankSweep, lockRankSched, lockRankAllg, lockRankAllp, lockRankTimers, lockRankItab, lockRankReflectOffs, lockRankNotifyList, lockRankTraceBuf, lockRankTraceStrings, lockRankHchan},
lockRankStackpool: {lockRankScavenge, lockRankSweepWaiters, lockRankAssistQueue, lockRankCpuprof, lockRankSweep, lockRankSched, lockRankPollDesc, lockRankTimers, lockRankItab, lockRankReflectOffs, lockRankHchan, lockRankFin, lockRankNotifyList, lockRankTraceBuf, lockRankTraceStrings, lockRankProf, lockRankGcBitsArenas, lockRankRoot, lockRankTrace, lockRankTraceStackTab, lockRankNetpollInit, lockRankRwmutexR, lockRankMcentral, lockRankSpine, lockRankSpanSetSpine},
lockRankStackLarge: {lockRankAssistQueue, lockRankSched, lockRankItab, lockRankHchan, lockRankProf, lockRankGcBitsArenas, lockRankRoot, lockRankMcentral, lockRankSpanSetSpine},
lockRankDefer: {},
lockRankSudog: {lockRankNotifyList, lockRankHchan},
lockRankWbufSpans: {lockRankScavenge, lockRankSweepWaiters, lockRankAssistQueue, lockRankSweep, lockRankSched, lockRankAllg, lockRankPollDesc, lockRankTimers, lockRankItab, lockRankReflectOffs, lockRankHchan, lockRankNotifyList, lockRankTraceStrings, lockRankMspanSpecial, lockRankProf, lockRankRoot, lockRankDefer, lockRankSudog},
lockRankMheap: {lockRankScavenge, lockRankSweepWaiters, lockRankAssistQueue, lockRankCpuprof, lockRankSweep, lockRankSched, lockRankAllg, lockRankAllp, lockRankPollDesc, lockRankTimers, lockRankItab, lockRankReflectOffs, lockRankNotifyList, lockRankTraceBuf, lockRankTraceStrings, lockRankHchan, lockRankMspanSpecial, lockRankProf, lockRankGcBitsArenas, lockRankRoot, lockRankMcentral, lockRankStackpool, lockRankStackLarge, lockRankDefer, lockRankSudog, lockRankWbufSpans},
lockRankMheap: {lockRankScavenge, lockRankSweepWaiters, lockRankAssistQueue, lockRankCpuprof, lockRankSweep, lockRankSched, lockRankAllg, lockRankAllp, lockRankPollDesc, lockRankTimers, lockRankItab, lockRankReflectOffs, lockRankNotifyList, lockRankTraceBuf, lockRankTraceStrings, lockRankHchan, lockRankMspanSpecial, lockRankProf, lockRankGcBitsArenas, lockRankRoot, lockRankMcentral, lockRankStackpool, lockRankStackLarge, lockRankDefer, lockRankSudog, lockRankWbufSpans, lockRankSpanSetSpine},
lockRankMheapSpecial: {lockRankScavenge, lockRankCpuprof, lockRankSched, lockRankAllg, lockRankAllp, lockRankTimers, lockRankItab, lockRankReflectOffs, lockRankNotifyList, lockRankTraceBuf, lockRankTraceStrings, lockRankHchan},
lockRankGlobalAlloc: {lockRankProf, lockRankSpine, lockRankMheap, lockRankMheapSpecial},
lockRankGlobalAlloc: {lockRankProf, lockRankSpine, lockRankSpanSetSpine, lockRankMheap, lockRankMheapSpecial},

lockRankGFree: {lockRankSched},

8 changes: 7 additions & 1 deletion src/runtime/malloc.go
@@ -1171,10 +1171,16 @@ func largeAlloc(size uintptr, needzero bool, noscan bool) *mspan {
// pays the debt down to npage pages.
deductSweepCredit(npages*_PageSize, npages)

s := mheap_.alloc(npages, makeSpanClass(0, noscan), needzero)
spc := makeSpanClass(0, noscan)
s := mheap_.alloc(npages, spc, needzero)
if s == nil {
throw("out of memory")
}
if go115NewMCentralImpl {
// Put the large span in the mcentral swept list so that it's
// visible to the background sweeper.
mheap_.central[spc].mcentral.fullSwept(mheap_.sweepgen).push(s)
}
s.limit = s.base() + size
heapBitsForAddr(s.base()).initSpan(s)
return s
6 changes: 5 additions & 1 deletion src/runtime/mcache.go
@@ -131,7 +131,11 @@ func (c *mcache) refill(spc spanClass) {
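// Note: s.sweepgen == mheap_.sweepgen+3 encodes "swept, then cached,
// and still cached" (per the sweepgen states documented in mheap.go).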
if s.sweepgen != mheap_.sweepgen+3 {
throw("bad sweepgen in refill")
}
atomic.Store(&s.sweepgen, mheap_.sweepgen)
if go115NewMCentralImpl {
mheap_.central[spc].mcentral.uncacheSpan(s)
} else {
atomic.Store(&s.sweepgen, mheap_.sweepgen)
}
}

// Get a new cached span from the central lists.
243 changes: 238 additions & 5 deletions src/runtime/mcentral.go
@@ -20,8 +20,31 @@ import "runtime/internal/atomic"
type mcentral struct {
lock mutex
spanclass spanClass
nonempty mSpanList // list of spans with a free object, ie a nonempty free list
empty mSpanList // list of spans with no free objects (or cached in an mcache)

// For !go115NewMCentralImpl.
nonempty mSpanList // list of spans with a free object, ie a nonempty free list
empty mSpanList // list of spans with no free objects (or cached in an mcache)

// partial and full contain two mspan sets: one of swept in-use
// spans, and one of unswept in-use spans. These two trade
// roles on each GC cycle. The unswept set is drained either by
// allocation or by the background sweeper in every GC cycle,
// so only two roles are necessary.
//
// sweepgen is increased by 2 on each GC cycle, so the swept
// spans are in partial[sweepgen/2%2] and the unswept spans are in
// partial[1-sweepgen/2%2]. Sweeping pops spans from the
// unswept set and pushes spans that are still in-use on the
// swept set. Likewise, allocating an in-use span pushes it
// on the swept set.
//
// Some parts of the sweeper can sweep arbitrary spans, and hence
// can't remove them from the unswept set, but will add the span
// to the appropriate swept list. As a result, the parts of the
// sweeper and mcentral that do consume from the unswept list may
// encounter swept spans, and these should be ignored.
partial [2]spanSet // list of spans with a free object
full [2]spanSet // list of spans with no free objects

// nmalloc is the cumulative count of objects allocated from
// this mcentral, assuming all spans in mcaches are
@@ -32,13 +55,151 @@ type mcentral struct {
// Initialize a single central free list.
func (c *mcentral) init(spc spanClass) {
c.spanclass = spc
c.nonempty.init()
c.empty.init()
lockInit(&c.lock, lockRankMcentral)
if go115NewMCentralImpl {
lockInit(&c.partial[0].spineLock, lockRankSpanSetSpine)
lockInit(&c.partial[1].spineLock, lockRankSpanSetSpine)
lockInit(&c.full[0].spineLock, lockRankSpanSetSpine)
lockInit(&c.full[1].spineLock, lockRankSpanSetSpine)
} else {
c.nonempty.init()
c.empty.init()
lockInit(&c.lock, lockRankMcentral)
}
}

// partialUnswept returns the spanSet which holds partially-filled
// unswept spans for this sweepgen.
func (c *mcentral) partialUnswept(sweepgen uint32) *spanSet {
return &c.partial[1-sweepgen/2%2]
}

// partialSwept returns the spanSet which holds partially-filled
// swept spans for this sweepgen.
func (c *mcentral) partialSwept(sweepgen uint32) *spanSet {
return &c.partial[sweepgen/2%2]
}

// fullUnswept returns the spanSet which holds unswept spans without any
// free slots for this sweepgen.
func (c *mcentral) fullUnswept(sweepgen uint32) *spanSet {
return &c.full[1-sweepgen/2%2]
}

// fullSwept returns the spanSet which holds swept spans without any
// free slots for this sweepgen.
func (c *mcentral) fullSwept(sweepgen uint32) *spanSet {
return &c.full[sweepgen/2%2]
}
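
// Example of the arithmetic above: at sweepgen 4 the swept sets are
// partial[0]/full[0] and the unswept sets are partial[1]/full[1]; at
// sweepgen 6 the roles flip, so the spans pushed onto this cycle's swept
// sets become the next cycle's unswept sets without being moved.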

// Allocate a span to use in an mcache.
func (c *mcentral) cacheSpan() *mspan {
if !go115NewMCentralImpl {
return c.oldCacheSpan()
}
// Deduct credit for this span allocation and sweep if necessary.
spanBytes := uintptr(class_to_allocnpages[c.spanclass.sizeclass()]) * _PageSize
deductSweepCredit(spanBytes, 0)

sg := mheap_.sweepgen

traceDone := false
if trace.enabled {
traceGCSweepStart()
}
var s *mspan

// Try partial swept spans first.
if s = c.partialSwept(sg).pop(); s != nil {
goto havespan
}
// Now try partial unswept spans.
for {
s = c.partialUnswept(sg).pop()
if s == nil {
break
}
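// s.sweepgen == sg-2 means the span needs sweeping this cycle; winning
// the CAS to sg-1 ("being swept") makes this goroutine its sole sweeper.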
if atomic.Load(&s.sweepgen) == sg-2 && atomic.Cas(&s.sweepgen, sg-2, sg-1) {
// We got ownership of the span, so let's sweep it and use it.
s.sweep(true)
goto havespan
}
// We failed to get ownership of the span, which means it's being or
// has been swept by an asynchronous sweeper that just couldn't remove it
// from the unswept list. That sweeper took ownership of the span and
// responsibility for either freeing it to the heap or putting it on the
// right swept list. Either way, we should just ignore it (and it's unsafe
// for us to do anything else).
}
// Now try full unswept spans, sweeping them and putting them into the
// right list if we fail to get a span.
for {
s = c.fullUnswept(sg).pop()
if s == nil {
break
}
if atomic.Load(&s.sweepgen) == sg-2 && atomic.Cas(&s.sweepgen, sg-2, sg-1) {
// We got ownership of the span, so let's sweep it.
s.sweep(true)
// Check if there's any free space.
freeIndex := s.nextFreeIndex()
if freeIndex != s.nelems {
s.freeindex = freeIndex
goto havespan
}
// Add it to the swept list, because sweeping didn't give us any free space.
c.fullSwept(sg).push(s)
}
// See comment for partial unswept spans.
}
if trace.enabled {
traceGCSweepDone()
traceDone = true
}

// We failed to get a span from the mcentral so get one from mheap.
s = c.grow()
if s == nil {
return nil
}

// At this point s is a span that should have free slots.
havespan:
if trace.enabled && !traceDone {
traceGCSweepDone()
}
n := int(s.nelems) - int(s.allocCount)
if n == 0 || s.freeindex == s.nelems || uintptr(s.allocCount) == s.nelems {
throw("span has no free objects")
}
// Assume all objects from this span will be allocated in the
// mcache. If it gets uncached, we'll adjust this.
atomic.Xadd64(&c.nmalloc, int64(n))
usedBytes := uintptr(s.allocCount) * s.elemsize
atomic.Xadd64(&memstats.heap_live, int64(spanBytes)-int64(usedBytes))
if trace.enabled {
// heap_live changed.
traceHeapAlloc()
}
if gcBlackenEnabled != 0 {
// heap_live changed.
gcController.revise()
}
freeByteBase := s.freeindex &^ (64 - 1)
whichByte := freeByteBase / 8
// Init alloc bits cache.
s.refillAllocCache(whichByte)

// Adjust the allocCache so that s.freeindex corresponds to the low bit in
// s.allocCache.
s.allocCache >>= s.freeindex % 64
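// Example: with s.freeindex == 70, freeByteBase == 64 and whichByte == 8,
// the cache covers objects 64..127, and the shift above by 70%64 == 6
// aligns bit 0 of allocCache with object 70.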

return s
}

// Allocate a span to use in an mcache.
//
// For !go115NewMCentralImpl.
func (c *mcentral) oldCacheSpan() *mspan {
// Deduct credit for this span allocation and sweep if necessary.
spanBytes := uintptr(class_to_allocnpages[c.spanclass.sizeclass()]) * _PageSize
deductSweepCredit(spanBytes, 0)
@@ -148,7 +309,77 @@ havespan:
}

// Return span from an mcache.
//
// s must have a span class corresponding to this
// mcentral and it must not be empty.
func (c *mcentral) uncacheSpan(s *mspan) {
if !go115NewMCentralImpl {
c.oldUncacheSpan(s)
return
}
if s.allocCount == 0 {
throw("uncaching span but s.allocCount == 0")
}

sg := mheap_.sweepgen
stale := s.sweepgen == sg+1

// Fix up sweepgen.
if stale {
// Span was cached before sweep began. It's our
// responsibility to sweep it.
//
// Set sweepgen to indicate it's not cached but needs
// sweeping and can't be allocated from. sweep will
// set s.sweepgen to indicate s is swept.
atomic.Store(&s.sweepgen, sg-1)
} else {
// Indicate that s is no longer cached.
atomic.Store(&s.sweepgen, sg)
}
n := int(s.nelems) - int(s.allocCount)

// Fix up statistics.
if n > 0 {
// cacheSpan updated alloc assuming all objects on s
// were going to be allocated. Adjust for any that
// weren't. We must do this before potentially
// sweeping the span.
atomic.Xadd64(&c.nmalloc, -int64(n))

if !stale {
// (*mcentral).cacheSpan conservatively counted
// unallocated slots in heap_live. Undo this.
//
// If this span was cached before sweep, then
// heap_live was totally recomputed since
// caching this span, so we don't do this for
// stale spans.
atomic.Xadd64(&memstats.heap_live, -int64(n)*int64(s.elemsize))
}
}

// Put the span in the appropriate place.
if stale {
// It's stale, so just sweep it. Sweeping will put it on
// the right list.
s.sweep(false)
} else {
if n > 0 {
// Put it back on the partial swept list.
c.partialSwept(sg).push(s)
} else {
// There's no free space and it's not stale, so put it on the
// full swept list.
c.fullSwept(sg).push(s)
}
}
}

// Return span from an mcache.
//
// For !go115NewMCentralImpl.
func (c *mcentral) oldUncacheSpan(s *mspan) {
if s.allocCount == 0 {
throw("uncaching span but s.allocCount == 0")
}
@@ -207,6 +438,8 @@ func (c *mcentral) uncacheSpan(s *mspan) {
// freeSpan reports whether s was returned to the heap.
// If preserve=true, it does not move s (the caller
// must take care of it).
//
// For !go115NewMCentralImpl.
func (c *mcentral) freeSpan(s *mspan, preserve bool, wasempty bool) bool {
if sg := mheap_.sweepgen; s.sweepgen == sg+1 || s.sweepgen == sg+3 {
throw("freeSpan given cached span")
10 changes: 9 additions & 1 deletion src/runtime/mgc.go
@@ -1320,6 +1320,7 @@ func gcStart(trigger gcTrigger) {
systemstack(func() {
finishsweep_m()
})

// clearpools before we start the GC. If we wait, the memory will not be
// reclaimed until the next GC cycle.
clearpools()
@@ -2141,6 +2142,9 @@ func gcMark(start_time int64) {

// gcSweep must be called on the system stack because it acquires the heap
// lock. See mheap for details.
//
// The world must be stopped.
//
//go:systemstack
func gcSweep(mode gcMode) {
if gcphase != _GCoff {
@@ -2150,7 +2154,7 @@ func gcSweep(mode gcMode) {
lock(&mheap_.lock)
mheap_.sweepgen += 2
mheap_.sweepdone = 0
if mheap_.sweepSpans[mheap_.sweepgen/2%2].index != 0 {
if !go115NewMCentralImpl && mheap_.sweepSpans[mheap_.sweepgen/2%2].index != 0 {
// We should have drained this list during the last
// sweep phase. We certainly need to start this phase
// with an empty swept list.
Expand All @@ -2162,6 +2166,10 @@ func gcSweep(mode gcMode) {
mheap_.reclaimCredit = 0
unlock(&mheap_.lock)

if go115NewMCentralImpl {
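// Reset the sweeper's progress marker over the mcentrals so the new
// sweep cycle starts walking them from the beginning (assumed semantics
// of centralIndex).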
sweep.centralIndex.clear()
}

if !_ConcurrentSweep || mode == gcForceBlockMode {
// Special case synchronous sweep.
// Record that no proportional sweeping has to happen.
