-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sqlstats: always enable tracing first time fingerprint is seen #89418
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @j82w and @yuzefovich)
-- commits
line 10 at r1:
typo? to -> so
-- commits
line 20 at r1:
typo frist -> first
pkg/sql/instrumentation.go
line 258 at r1 (raw file):
ih.withStatementTrace = cfg.TestingKnobs.WithStatementTrace var everSampled bool
nit: just personal pref but this wording seems a little awkward to me. would something like previouslySampled
or sampleExists
be too lengthy? would just sampled
suffice?
pkg/sql/crdb_internal_test.go
line 941 at r1 (raw file):
} func TestTxnContentionEventsTable(t *testing.T) {
Is this possible as a data driven test?
pkg/sql/crdb_internal_test.go
line 1038 at r1 (raw file):
require.Less(t, rowCount, 3, "node_execution_insights found 3 rows. It should only record first, "+ "but there is a chance based ons sampling to get 2 rows.")
typo: ons -> on
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall! I left a bunch of nits, but please don't be discouraged by that - over time all of these code style things will become the second nature.
Reviewed 9 of 9 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @j82w and @xinhaoz)
-- commits
line 2 at r1:
nit: use package name prefix to indicate where most of the changes are, also no capital letters in the commit message, so it should look something like sqlstats: always enable ...
.
-- commits
line 20 at r1:
nit: I don't think we've had a stable release with this bug, so I would omit the release note entirely. If you prefer to keep it, then I would explicitly mention that the bug would occur only if the plan collection setting is explicitly disabled by user. (Also, I believe we don't really document what and how we are sampling, which is yet another reason to not have a release note here.)
pkg/sql/instrumentation.go
line 258 at r1 (raw file):
Previously, xinhaoz (Xin Hao Zhang) wrote…
nit: just personal pref but this wording seems a little awkward to me. would something like
previouslySampled
orsampleExists
be too lengthy? would justsampled
suffice?
I like previouslySampled
more too.
pkg/sql/sqlstats/sslocal/sslocal_stats_collector.go
line 129 at r1 (raw file):
} // TODO: this might need to be thought through a bit more.
We should address this TODO as part of this commit. I believe the current logic makes sense, but it'd be good for @xinhaoz to confirm it.
pkg/sql/crdb_internal_test.go
line 941 at r1 (raw file):
Previously, xinhaoz (Xin Hao Zhang) wrote…
Is this possible as a data driven test?
It would probably be more involved to do it as a data-driven test, so I'd be in favor of keeping the current approach.
(If one is interested, we did add an ability for this kind of test to be done in the logic test framework - which is very similar to data driven framework - in #79010 (and maybe a couple of other related PRs from Alex Sarkesian) , and I would imagine we would have to do something similar here.)
pkg/sql/crdb_internal_test.go
line 945 at r1 (raw file):
defer log.Scope(t).Close(t) // Start the cluster. (One node is sufficient; the outliers system is currently in-memory only.)
nit: we try to keep the comments wrapped at 80 characters. If you're using GoLand, you could set a visual guide (or automatic wrapping on typing) via Preferences -> Editor -> Code Style -> Go -> Wrapping and Braces.
pkg/sql/crdb_internal_test.go
line 951 at r1 (raw file):
tc := testcluster.StartTestCluster(t, 1, args) defer tc.Stopper().Stop(ctx) conn := tc.ServerConn(0)
nit: you could wrap conn
with sqlutils.MakeSQLRunner
to avoid having to check errors and whatnot. This is just FYI, so please feel free to ignore this comment.
pkg/sql/crdb_internal_test.go
line 975 at r1 (raw file):
wgTxnStarted.Add(1) // Lock to wait for the txn to complete to avoid the test finishing before the txn is committed
nit: missing period here and below.
pkg/sql/crdb_internal_test.go
line 989 at r1 (raw file):
errTxn = tx.Commit() require.NoError(t, errTxn) wgTxnDone.Done()
nit: it's a good practice to do this in a defer
that is the first line of the new goroutine, i.e. something like
waitGroup.Add(1)
go func() {
defer waitGroup.Done()
...
}
pkg/sql/sqlstats/persistedsqlstats/datadriven_test.go
line 132 at r1 (raw file):
stubTime.setTime(tm) return stubTime.Now().String() case "should-sample-logical-plan":
nit: we probably should rename this to should-sample
.
pkg/sql/sqlstats/persistedsqlstats/datadriven_test.go
line 148 at r1 (raw file):
fingerprint = strings.Replace(fingerprint, "%", " ", -1) // test harness needs an update.
nit: this comment should be removed.
pkg/sql/sqlstats/persistedsqlstats/datadriven_test.go
line 154 at r1 (raw file):
dbName, ) return fmt.Sprintf("%t, %t", savePlanForStats, everSampled)
nit: it seems more intuitive to keep the order of variables here in sync with ShouldSample
- so either switch here or in the signature of ShouldSample
itself.
Previously, yuzefovich (Yahor Yuzefovich) wrote…
I'll remove it |
Previously, yuzefovich (Yahor Yuzefovich) wrote…
I see in the coding standard is 80, but almost every file I go has some violation. The default is 120. I wonder if this could be enforced with a linter. https://cockroachlabs.atlassian.net/wiki/spaces/CRDB/pages/181371303/Go+Golang+coding+guidelines#Line-Length |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @j82w, @xinhaoz, and @yuzefovich)
pkg/sql/sqlstats/sslocal/sslocal_stats_collector.go
line 123 at r3 (raw file):
) (previouslySampled bool, savePlanForStats bool) { sampledInFlushTarget := false savePlanForStatsInFlushTarget := true
Thanks for this rename 👍
Previously, xinhaoz (Xin Hao Zhang) wrote…
Done. |
Previously, yuzefovich (Yahor Yuzefovich) wrote…
The only concern I have is when the stats get flushed it will see all statements as new again. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 7 files at r2, 3 of 3 files at r3, all commit messages.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @j82w and @xinhaoz)
pkg/sql/sqlstats/sslocal/sslocal_stats_collector.go
line 129 at r1 (raw file):
Previously, j82w wrote…
The only concern I have is when the stats get flushed it will see all statements as new again.
Hm, I see. How frequent are these flushes? I'll defer to you and Xin Hao on whether this is concerning. I guess you could leave a TODO to address it later (just use a github username for it, i.e. something like TODO(j82w): ...
).
pkg/sql/crdb_internal_test.go
line 945 at r1 (raw file):
Previously, j82w wrote…
I see in the coding standard is 80, but almost every file I go has some violation. The default is 120. I wonder if this could be enforced with a linter. https://cockroachlabs.atlassian.net/wiki/spaces/CRDB/pages/181371303/Go+Golang+coding+guidelines#Line-Length
Yes, we definitely have many violations, still we should try to keep them at minimum. We are also more stringent about the character count for comments than for the regular code.
pkg/sql/crdb_internal_test.go
line 1019 at r3 (raw file):
// internal workflow. testutils.SucceedsWithin(t, func() error { rows, errVerify := conn.QueryContext(ctx, "SELECT "+
nit: with multiple-line single SQL statements we usually use backticks `. Feel free to ignore this nit, just FYI.
The flushes are default every 10mins, but we were already seeing the behaviour described by Jake prior to this change. |
Fixes: #89185 The first time a fingerprint is seen tracing should be enabled. This currently is broken if sql.metrics.statement_details.plan_collection.enabled is set to false. This can cause crdb_internal.transaction_contention_events to be empty because tracing was never enabled so the contention event was never recorded. To properly fix this a new value needs to be returned on ShouldSample to tell if it is the first time a fingerprint is seen. This will remove the dependency on plan_collection feature switch. Release justification: Bug fixes and low-risk updates to new functionality. Release note: none
Previously, xinhaoz (Xin Hao Zhang) wrote…
I confirmed that it currently always records the statement if you wait 10mins. In that case I will not add a todo note, because it's the current behavior. We should wait for either customer feedback or perf analysis to show it's an issue. |
bors r+ |
Build succeeded: |
89782: roachtest: limit FingerprintValidator's memory usage internally r=miretskiy,srosenberg a=renatolabs In #89332, we started optionally validating changefeed semantics in the `FingerprintValidator` by making sure that unseen updates are never observed if a `resolved` message is received. In order to keep bounds on the validator's memory usage, the `cdc/mixed-versions` test was setting a maximum number of operations at the workload level. However, a consequence of that change is that it creates the possibility for the workload to finish before the roachtest has received all the `resolved` events it expects (for instance, if draining takes longer than usual for some reason; a variety of other non-determinism is also at play). To deal with this possibility, we instead enforce a maximum number of previously seen events at the validator's level. For the `cdc/mixed-versions` test, we set a maximum of 100,000 previously seen events being stored (20MB memory footprint). This should be equivalent to roughly 50,000 bank transfers, and give enough time for the test to finish successfully. Release note: None Epic: None. 89847: sql: turn plan sampling back off by default r=j82w a=j82w The value was originally disabled by default in: #88343. The value was enabled by default: #89020 because of a bug that was fixed in #89418. This PR is disabling it by default again now that the bug has been fixed. Part Of #89847 Release note (sql change): Turn the default value of sql.metrics.statement_details.plan_collection.enabled to false. Co-authored-by: Renato Costa <[email protected]> Co-authored-by: j82w <[email protected]>
This value was enabled by default again: #89020. The reason it was enabled again was fixed in #89418. Part of #89847 Release justification: Bug fixes and low-risk updates to new functionality. Release note (sql change): Turn the default value of sql.metrics.statement_details.plan_collection.enabled to false.
This value was enabled by default again: #89020. The reason it was enabled again was fixed in #89418. Part of #89847 Release justification: Bug fixes and low-risk updates to new functionality. Release note (sql change): Turn the default value of sql.metrics.statement_details.plan_collection.enabled to false.
109785: sql: fix performance when disabling sql.metrics.statement_details r=j82w a=j82w There is a performance regression when disabling sql.metrics.statement_details.enabled. It will see that every execution is the first time the query was executed so it will enabled execution stats collection for all executions. This is a bug caused by always collecting the execution stats of the first time a statement is executed in #89418. Fixes: #106811 Release note (bug fix): Fixes a bug causing performance regression when disabling `sql.metrics.statement_details.enabled` which caused execution stats to be collected for all queries instead of the default one percent. Co-authored-by: j82w <[email protected]>
…bled There is a performance regression when disabling sql.metrics.statement_details.enabled. It will see that every execution is the first time the query was executed so it will enabled execution stats collection for all executions. This is a bug caused by always collecting the execution stats of the first time a statement is executed in #89418. Fixes: #106811 Release note (bug fix): Fixes a bug causing performance regression when disabling `sql.metrics.statement_details.enabled` which caused execution stats to be collected for all queries instead of the default one percent.
Fixes: #89185
The first time a fingerprint is seen tracing should be enabled. This currently is broken if
sql.metrics.statement_details.plan_collection.enabled is set to false. This can cause crdb_internal.transaction_contention_events to be empty because tracing was never enabled to the contention event was never recorded.
To properly fix this a new value needs to be returned on ShouldSample to tell if it is the first time a fingerprint is seen. This will remove the dependency on plan_collection feature switch.
Release justification: Bug fixes and low-risk updates to new functionality.
Release note (bug fix): Always enable tracing the frist time a fingerprint is seen.