Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

admission: goschedstats sampling period is too long #111125

Closed
sumeerbhola opened this issue Sep 22, 2023 · 0 comments · Fixed by #111441
Closed

admission: goschedstats sampling period is too long #111125

sumeerbhola opened this issue Sep 22, 2023 · 0 comments · Fixed by #111441
Labels
A-admission-control C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)

Comments

@sumeerbhola
Copy link
Collaborator

sumeerbhola commented Sep 22, 2023

goschedstats uses a 250ms sampling interval when the CPU utilization is "low"

if avgValue < underloadedRunnablePerProcThreshold &&
s.localEWMA < underloadedRunnablePerProcThreshold {
// Underloaded, so switch to longer sampling period.
nextPeriod = samplePeriodLong
}

This "low" is based on 1 runnable goroutine per proc
https://github.com/cockroachdb/cockroach/blob/master/pkg/util/goschedstats/runnable.go#L72

This threshold is too high.
Here is a node running at close to 80% cpu utilization but is mostly doing sampling at a 250ms interval, due to which the slot adjustment is not fast enough, and we see slot exhaustion.

Screenshot 2023-09-22 at 1 56 05 PM Screenshot 2023-09-22 at 1 56 56 PM Screenshot 2023-09-22 at 1 57 39 PM

Jira issue: CRDB-31779

@sumeerbhola sumeerbhola added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-admission-control labels Sep 22, 2023
craig bot pushed a commit that referenced this issue Sep 29, 2023
111441: goschedstats: reduce underloadedRunnablePerProcThreshold r=aadityasondhi a=sumeerbhola

By reducing this threshold, we more often sample the stats at 1ms intervals, which is desirable for admission control slot adjustment.

Fixes #111125

Epic: none

Release note: None

111521: roachprod: fix createTenantCertBundle script r=andy-kimball a=herkolategan

The `createTenantCertBundle` function has a script in it that has erroneous "+" characters which results in an error when trying to start a secure cluster. This change removes those characters.

Epic: None
Release Note: None

Co-authored-by: sumeerbhola <[email protected]>
Co-authored-by: Herko Lategan <[email protected]>
craig bot pushed a commit that referenced this issue Sep 29, 2023
109638: sql: use the correct locking strength for FOR SHARE clause r=michae2 a=arulajmani

Previously, FOR SHARE and FOR KEY SHARE would use non-locking KV scans.
Now that the lock table supports shared locks, we can use lock.Shared as
the locking strength for KV scans. This patch does that, and in doing
so, wires up SHARED locks end to end.

By default, we turn of this functionality for serializable transactions.
Instead, it's gated behind a session setting called
`enable_shared_locking_for_serializable`.

Informs #91545

Release note (sql change): SELECT FOR SHARE and SELECT FOR UPDATE
previously did not acquire any locks. Users issuing these statements
would expect them to acquire shared locks (multiple readers allowed,
no writers though). This patch switches over the behavior to acquire
such read locks.

For serializable transactions, we default to the old behaviour, unless
the `enable_shared_locking_for_serializable` session setting is set
to true. We'll probably switch this behavior in the future, but for
now, given shared locks are a preview feature, we gate things behind
a session setting.

111441: goschedstats: reduce underloadedRunnablePerProcThreshold r=aadityasondhi a=sumeerbhola

By reducing this threshold, we more often sample the stats at 1ms intervals, which is desirable for admission control slot adjustment.

Fixes #111125

Epic: none

Release note: None

111466: ui: update default timescale to 1h r=maryliag a=maryliag

On the Metrics page, the default value was 10min, which was too small. This commit updates it to 1h to match the most selected value on the SQL Activity.

Fixes #96479

Release note: None

111521: roachprod: fix createTenantCertBundle script r=andy-kimball a=herkolategan

The `createTenantCertBundle` function has a script in it that has erroneous "+" characters which results in an error when trying to start a secure cluster. This change removes those characters.

Epic: None
Release Note: None

111526: kv/spanset: permit writes to lock table by shared lock latching configuration r=nvanbenschoten a=nvanbenschoten

Fixes #111409.
Fixes #111492.

This commit updates addLockTableSpans to account for the way that shared locking requests declare their latches. Even though they declare a "read" latch, they do so at max timestamp, which gives them sufficient isolation to write to the lock table without having to declare a write latch and be serialized with other shared lock acquisitions.

Release note: None

Co-authored-by: Arul Ajmani <[email protected]>
Co-authored-by: sumeerbhola <[email protected]>
Co-authored-by: maryliag <[email protected]>
Co-authored-by: Herko Lategan <[email protected]>
Co-authored-by: Nathan VanBenschoten <[email protected]>
@craig craig bot closed this as completed in 32adda2 Sep 29, 2023
THardy98 pushed a commit to THardy98/cockroach that referenced this issue Oct 6, 2023
By reducing this threshold, we more often sample the stats at 1ms
intervals, which is desirable for admission control slot adjustment.

Fixes cockroachdb#111125

Epic: none

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-admission-control C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant