Reduce sampling QPS threshold for the CC telemetry logging channel #70393

Closed
kevin-v-ngo opened this issue Sep 18, 2021 · 4 comments
Labels
- A-sql-logging-and-telemetry: Issues related to slow query log, SQL audit log, SQL internal logging telemetry, etc.
- C-bug: Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
- GA-blocker

Comments


kevin-v-ngo commented Sep 18, 2021

The current QPS threshold of 2000 is too high to be supported by the new logging telemetry channel. We need to lower the threshold at which sampling begins to a supported value, and also adjust the sampling frequency.

Additional context is outlined here

kevin-v-ngo added the C-bug and A-sql-logging-and-telemetry labels Sep 18, 2021
maryliag (Contributor) commented

From @knz's comment in one of the docs: I would be in favor of reducing it, so we can start sampling after e.g. 5 queries per second.
Is 5 a good value to change the default to? At least for this first iteration (before dealing with the more complex solution in #70553).
@logston


blathers-crl bot commented Sep 22, 2021

Hi @maryliag, please add branch-* labels to identify which branch(es) this release-blocker affects.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.


logston commented Sep 23, 2021

@maryliag, from the SRE/pipeline perspective, anything less than or equal to 10 log events/second (LLPS) emitted by CRDB is acceptable. So long as sampling produces at most 10 LLPS, sampling can start at 10 queries per second.
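
To make the arithmetic behind that limit concrete, here is a rough sketch (not CRDB code) of the largest sampling fraction that keeps a node at or under a given number of log events per second. The function name `sampleRateForCap` and its parameters are hypothetical and exist only for illustration.

```go
package main

// sampleRateForCap returns the largest fraction of query events that can be
// logged at the given QPS without exceeding maxEventsPerSec log events per
// second. Hypothetical helper, not part of CockroachDB.
func sampleRateForCap(qps, maxEventsPerSec float64) float64 {
	// At or below the cap, every event can be logged.
	if qps <= maxEventsPerSec {
		return 1
	}
	// Above the cap, keep only the fraction that fits under it.
	return maxEventsPerSec / qps
}
```

Under these assumptions, a node at 10 QPS or less can log every event, while a node at 2000 QPS would need to keep 10/2000 = 0.5% of its events to stay at 10 LLPS.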

xinhaoz added a commit to xinhaoz/cockroach that referenced this issue Sep 24, 2021
Resolves cockroachdb#70393

Previously, we set 2000 QPS as the default QPS threshold at which to start
sampling events for the telemetry logging channel (based on a configurable
sampling rate). This value was far too high, as each node would be emitting
up to 2000 log lines per second. Most nodes reach saturation at
approximately 2000 QPS, and so downsampling would rarely even occur.

This commit lowers the QPS threshold to 10 queries per second, which is
a more manageable number of log lines. This is merely a temporary patch
until we can introduce an adaptive sampling rate in favour of this QPS
cutoff approach.

Release note (bug fix): The QPS threshold at which sampling should occur
for events emitted to the telemetry logging channel has been lowered
from 2000 QPS to 10 QPS.
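
The QPS cutoff approach described in the commit message can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual CockroachDB implementation: the type `thresholdSampler`, its fields, and its methods are hypothetical names, and the choice to begin sampling exactly at the threshold (rather than strictly above it) is also an assumption.

```go
package main

import "math/rand"

// thresholdSampler decides whether a query event is emitted to the log.
// All names are hypothetical and do not come from the CockroachDB source.
type thresholdSampler struct {
	qpsThreshold float64    // QPS at which sampling begins (assumed inclusive)
	sampleRate   float64    // fraction of events kept while sampling, in [0, 1]
	rng          *rand.Rand // source of randomness for sampling decisions
}

func newThresholdSampler(qpsThreshold, sampleRate float64, seed int64) *thresholdSampler {
	return &thresholdSampler{
		qpsThreshold: qpsThreshold,
		sampleRate:   sampleRate,
		rng:          rand.New(rand.NewSource(seed)),
	}
}

// shouldLog reports whether an event should be logged at the current QPS.
func (s *thresholdSampler) shouldLog(currentQPS float64) bool {
	// Below the threshold, every event is logged.
	if currentQPS < s.qpsThreshold {
		return true
	}
	// At or above the threshold, keep each event with probability sampleRate.
	return s.rng.Float64() < s.sampleRate
}

// expectedLogRate returns the expected number of log lines per second
// that this policy emits at the given QPS.
func (s *thresholdSampler) expectedLogRate(qps float64) float64 {
	if qps < s.qpsThreshold {
		return qps
	}
	return qps * s.sampleRate
}
```

In this sketch, a threshold of 2000 means a node running just under 2000 QPS logs every query, which matches the problem described above. Note also that with a fixed sampling rate the expected log rate past the threshold still grows with QPS, which may be one reason an adaptive rate is preferred as the longer-term fix.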

xinhaoz commented Sep 30, 2021

Closing this issue in favour of #70553.

xinhaoz closed this as completed Sep 30, 2021