Skip to content

Commit

Permalink
stats: fix race condition when refreshing cache
Browse files Browse the repository at this point in the history
This commit fixes a race condition that was introduced in cockroachdb#51616.
The race condition existed because it was possible for two refreshes
to be triggered in close succession, and there was no mechanism to
prevent the first refresh from overwriting the second if it was
delayed.

We fix the race condition by introducing two boolean flags into each
cache entry: refreshing and mustRefreshAgain. If refreshing is true,
that indicates that the current statistics for this table are stale,
and we are in the process of fetching the updated stats from the database.
In the mean time, other callers can use the stale stats and do not need to
wait.

If a goroutine tries to perform a refresh when a refresh is already
in progress, it will see that refreshing=true and will set the
mustRefreshAgain flag to true before returning. When the original
goroutine that was performing the refresh returns from the database and
sees that mustRefreshAgain=true, it will trigger another refresh.

Informs cockroachdb#50863

Release note: None
  • Loading branch information
rytaft committed Jul 23, 2020
1 parent d8b8a16 commit aa309f2
Showing 1 changed file with 38 additions and 9 deletions.
47 changes: 38 additions & 9 deletions pkg/sql/stats/stats_cache.go
Original file line number Diff line number Diff line change
Expand Up @@ -65,11 +65,25 @@ type TableStatisticsCache struct {
// The cache stores *cacheEntry objects. The fields are protected by the
// cache-wide mutex.
type cacheEntry struct {
// If true, we are in the process of updating the statistics for this
// table. Other callers can wait on the waitCond until this is false.
// If mustWait is true, we do not have any statistics for this table and we
// are in the process of fetching the stats from the database. Other callers
// can wait on the waitCond until this is false.
mustWait bool
waitCond sync.Cond

// If refreshing is true, the current statistics for this table are stale,
// and we are in the process of fetching the updated stats from the database.
// In the mean time, other callers can use the stale stats and do not need to
// wait.
//
// If a goroutine tries to perform a refresh when a refresh is already
// in progress, it will see that refreshing=true and will set the
// mustRefreshAgain flag to true before returning. When the original
// goroutine that was performing the refresh returns from the database and
// sees that mustRefreshAgain=true, it will trigger another refresh.
refreshing bool
mustRefreshAgain bool

stats []*TableStatistic

// err is populated if the internal query to retrieve stats hit an error.
Expand Down Expand Up @@ -246,18 +260,33 @@ func (sc *TableStatisticsCache) refreshCacheEntry(ctx context.Context, tableID s
if !found || e.err != nil {
return
}
sc.mu.numInternalQueries++

// Don't perform a refresh if a refresh is already in progress, but let that
// goroutine know it needs to refresh again.
if e.refreshing {
e.mustRefreshAgain = true
return
}
e.refreshing = true

var stats []*TableStatistic
var err error
func() {
sc.mu.Unlock()
defer sc.mu.Lock()

stats, err = sc.getTableStatsFromDB(ctx, tableID)
}()
for {
func() {
sc.mu.numInternalQueries++
sc.mu.Unlock()
defer sc.mu.Lock()

stats, err = sc.getTableStatsFromDB(ctx, tableID)
}()
if !e.mustRefreshAgain {
break
}
e.mustRefreshAgain = false
}

e.stats, e.err = stats, err
e.refreshing = false

if err != nil {
// Don't keep the cache entry around, so that we retry the query.
Expand Down

0 comments on commit aa309f2

Please sign in to comment.