sqlstats: always enable tracing first time fingerprint is seen #89418

j82w · 2022-10-05T17:42:38Z

The first time a fingerprint is seen tracing should be enabled. This currently is broken if
sql.metrics.statement_details.plan_collection.enabled is set to false. This can cause crdb_internal.transaction_contention_events to be empty because tracing was never enabled to the contention event was never recorded.

To properly fix this a new value needs to be returned on ShouldSample to tell if it is the first time a fingerprint is seen. This will remove the dependency on plan_collection feature switch.

Release justification: Bug fixes and low-risk updates to new functionality.

Release note (bug fix): Always enable tracing the frist time a fingerprint is seen.

cockroach-teamcity · 2022-10-05T17:42:53Z

This change is

xinhaoz

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @j82w and @yuzefovich)

-- commits line 10 at r1:
typo? to -> so

-- commits line 20 at r1:
typo frist -> first

pkg/sql/instrumentation.go line 258 at r1 (raw file):

	ih.withStatementTrace = cfg.TestingKnobs.WithStatementTrace

	var everSampled bool

nit: just personal pref but this wording seems a little awkward to me. would something like previouslySampled or sampleExists be too lengthy? would just sampled suffice?

pkg/sql/crdb_internal_test.go line 941 at r1 (raw file):

}

func TestTxnContentionEventsTable(t *testing.T) {

Is this possible as a data driven test?

pkg/sql/crdb_internal_test.go line 1038 at r1 (raw file):

	require.Less(t, rowCount, 3, "node_execution_insights found 3 rows. It should only record first, "+
		"but there is a chance based ons sampling to get 2 rows.")

typo: ons -> on

yuzefovich

Looks good overall! I left a bunch of nits, but please don't be discouraged by that - over time all of these code style things will become the second nature.

Reviewed 9 of 9 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @j82w and @xinhaoz)

-- commits line 2 at r1:
nit: use package name prefix to indicate where most of the changes are, also no capital letters in the commit message, so it should look something like sqlstats: always enable ....

-- commits line 20 at r1:
nit: I don't think we've had a stable release with this bug, so I would omit the release note entirely. If you prefer to keep it, then I would explicitly mention that the bug would occur only if the plan collection setting is explicitly disabled by user. (Also, I believe we don't really document what and how we are sampling, which is yet another reason to not have a release note here.)

pkg/sql/instrumentation.go line 258 at r1 (raw file):

Previously, xinhaoz (Xin Hao Zhang) wrote…

nit: just personal pref but this wording seems a little awkward to me. would something like previouslySampled or sampleExists be too lengthy? would just sampled suffice?

I like previouslySampled more too.

pkg/sql/sqlstats/sslocal/sslocal_stats_collector.go line 129 at r1 (raw file):

	}

	// TODO: this might need to be thought through a bit more.

We should address this TODO as part of this commit. I believe the current logic makes sense, but it'd be good for @xinhaoz to confirm it.

pkg/sql/crdb_internal_test.go line 941 at r1 (raw file):

Previously, xinhaoz (Xin Hao Zhang) wrote…

Is this possible as a data driven test?

It would probably be more involved to do it as a data-driven test, so I'd be in favor of keeping the current approach.

(If one is interested, we did add an ability for this kind of test to be done in the logic test framework - which is very similar to data driven framework - in #79010 (and maybe a couple of other related PRs from Alex Sarkesian) , and I would imagine we would have to do something similar here.)

pkg/sql/crdb_internal_test.go line 945 at r1 (raw file):

	defer log.Scope(t).Close(t)

	// Start the cluster. (One node is sufficient; the outliers system is currently in-memory only.)

nit: we try to keep the comments wrapped at 80 characters. If you're using GoLand, you could set a visual guide (or automatic wrapping on typing) via Preferences -> Editor -> Code Style -> Go -> Wrapping and Braces.

pkg/sql/crdb_internal_test.go line 951 at r1 (raw file):

	tc := testcluster.StartTestCluster(t, 1, args)
	defer tc.Stopper().Stop(ctx)
	conn := tc.ServerConn(0)

nit: you could wrap conn with sqlutils.MakeSQLRunner to avoid having to check errors and whatnot. This is just FYI, so please feel free to ignore this comment.

pkg/sql/crdb_internal_test.go line 975 at r1 (raw file):

		wgTxnStarted.Add(1)

		// Lock to wait for the txn to complete to avoid the test finishing before the txn is committed

nit: missing period here and below.

pkg/sql/crdb_internal_test.go line 989 at r1 (raw file):

			errTxn = tx.Commit()
			require.NoError(t, errTxn)
			wgTxnDone.Done()

nit: it's a good practice to do this in a defer that is the first line of the new goroutine, i.e. something like

waitGroup.Add(1)
go func() {
  defer waitGroup.Done()
  ...
}

pkg/sql/sqlstats/persistedsqlstats/datadriven_test.go line 132 at r1 (raw file):

			stubTime.setTime(tm)
			return stubTime.Now().String()
		case "should-sample-logical-plan":

nit: we probably should rename this to should-sample.

pkg/sql/sqlstats/persistedsqlstats/datadriven_test.go line 148 at r1 (raw file):

			fingerprint = strings.Replace(fingerprint, "%", " ", -1)

			// test harness needs an update.

nit: this comment should be removed.

pkg/sql/sqlstats/persistedsqlstats/datadriven_test.go line 154 at r1 (raw file):

				dbName,
			)
			return fmt.Sprintf("%t, %t", savePlanForStats, everSampled)

nit: it seems more intuitive to keep the order of variables here in sync with ShouldSample - so either switch here or in the signature of ShouldSample itself.

j82w · 2022-10-06T12:15:37Z

-- commits line 20 at r1:

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: I don't think we've had a stable release with this bug, so I would omit the release note entirely. If you prefer to keep it, then I would explicitly mention that the bug would occur only if the plan collection setting is explicitly disabled by user. (Also, I believe we don't really document what and how we are sampling, which is yet another reason to not have a release note here.)

I'll remove it

j82w · 2022-10-06T12:31:05Z

pkg/sql/crdb_internal_test.go line 945 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: we try to keep the comments wrapped at 80 characters. If you're using GoLand, you could set a visual guide (or automatic wrapping on typing) via Preferences -> Editor -> Code Style -> Go -> Wrapping and Braces.

I see in the coding standard is 80, but almost every file I go has some violation. The default is 120. I wonder if this could be enforced with a linter. https://cockroachlabs.atlassian.net/wiki/spaces/CRDB/pages/181371303/Go+Golang+coding+guidelines#Line-Length

xinhaoz

Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @j82w, @xinhaoz, and @yuzefovich)

pkg/sql/sqlstats/sslocal/sslocal_stats_collector.go line 123 at r3 (raw file):

) (previouslySampled bool, savePlanForStats bool) {
	sampledInFlushTarget := false
	savePlanForStatsInFlushTarget := true

Thanks for this rename 👍

j82w · 2022-10-06T16:35:25Z

pkg/sql/sqlstats/sslocal/sslocal_stats_collector.go line 123 at r3 (raw file):

Previously, xinhaoz (Xin Hao Zhang) wrote…

Thanks for this rename 👍

Done.

j82w · 2022-10-06T16:35:43Z

pkg/sql/sqlstats/sslocal/sslocal_stats_collector.go line 129 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

We should address this TODO as part of this commit. I believe the current logic makes sense, but it'd be good for @xinhaoz to confirm it.

The only concern I have is when the stats get flushed it will see all statements as new again.

yuzefovich

Reviewed 4 of 7 files at r2, 3 of 3 files at r3, all commit messages.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @j82w and @xinhaoz)

pkg/sql/sqlstats/sslocal/sslocal_stats_collector.go line 129 at r1 (raw file):

Previously, j82w wrote…

The only concern I have is when the stats get flushed it will see all statements as new again.

Hm, I see. How frequent are these flushes? I'll defer to you and Xin Hao on whether this is concerning. I guess you could leave a TODO to address it later (just use a github username for it, i.e. something like TODO(j82w): ...).

pkg/sql/crdb_internal_test.go line 945 at r1 (raw file):

Previously, j82w wrote…

I see in the coding standard is 80, but almost every file I go has some violation. The default is 120. I wonder if this could be enforced with a linter. https://cockroachlabs.atlassian.net/wiki/spaces/CRDB/pages/181371303/Go+Golang+coding+guidelines#Line-Length

Yes, we definitely have many violations, still we should try to keep them at minimum. We are also more stringent about the character count for comments than for the regular code.

pkg/sql/crdb_internal_test.go line 1019 at r3 (raw file):

	// internal workflow.
	testutils.SucceedsWithin(t, func() error {
		rows, errVerify := conn.QueryContext(ctx, "SELECT "+

nit: with multiple-line single SQL statements we usually use backticks `. Feel free to ignore this nit, just FYI.

xinhaoz · 2022-10-06T18:32:10Z

pkg/sql/sqlstats/sslocal/sslocal_stats_collector.go line 129 at r1 (raw file):

The only concern I have is when the stats get flushed it will see all statements as new again.

The flushes are default every 10mins, but we were already seeing the behaviour described by Jake prior to this change.

Fixes: #89185 The first time a fingerprint is seen tracing should be enabled. This currently is broken if sql.metrics.statement_details.plan_collection.enabled is set to false. This can cause crdb_internal.transaction_contention_events to be empty because tracing was never enabled so the contention event was never recorded. To properly fix this a new value needs to be returned on ShouldSample to tell if it is the first time a fingerprint is seen. This will remove the dependency on plan_collection feature switch. Release justification: Bug fixes and low-risk updates to new functionality. Release note: none

j82w · 2022-10-10T15:46:37Z

pkg/sql/sqlstats/sslocal/sslocal_stats_collector.go line 129 at r1 (raw file):

Previously, xinhaoz (Xin Hao Zhang) wrote…

The only concern I have is when the stats get flushed it will see all statements as new again.

The flushes are default every 10mins, but we were already seeing the behaviour described by Jake prior to this change.

I confirmed that it currently always records the statement if you wait 10mins. In that case I will not add a todo note, because it's the current behavior. We should wait for either customer feedback or perf analysis to show it's an issue.

j82w · 2022-10-10T15:47:04Z

bors r+

craig · 2022-10-10T17:06:57Z

Build succeeded:

Bazel Essential CI (Cockroach)

89782: roachtest: limit FingerprintValidator's memory usage internally r=miretskiy,srosenberg a=renatolabs In #89332, we started optionally validating changefeed semantics in the `FingerprintValidator` by making sure that unseen updates are never observed if a `resolved` message is received. In order to keep bounds on the validator's memory usage, the `cdc/mixed-versions` test was setting a maximum number of operations at the workload level. However, a consequence of that change is that it creates the possibility for the workload to finish before the roachtest has received all the `resolved` events it expects (for instance, if draining takes longer than usual for some reason; a variety of other non-determinism is also at play). To deal with this possibility, we instead enforce a maximum number of previously seen events at the validator's level. For the `cdc/mixed-versions` test, we set a maximum of 100,000 previously seen events being stored (20MB memory footprint). This should be equivalent to roughly 50,000 bank transfers, and give enough time for the test to finish successfully. Release note: None Epic: None. 89847: sql: turn plan sampling back off by default r=j82w a=j82w The value was originally disabled by default in: #88343. The value was enabled by default: #89020 because of a bug that was fixed in #89418. This PR is disabling it by default again now that the bug has been fixed. Part Of #89847 Release note (sql change): Turn the default value of sql.metrics.statement_details.plan_collection.enabled to false. Co-authored-by: Renato Costa <[email protected]> Co-authored-by: j82w <[email protected]>

This value was enabled by default again: #89020. The reason it was enabled again was fixed in #89418. Part of #89847 Release note (sql change): Turn the default value of sql.metrics.statement_details.plan_collection.enabled to false.

This value was enabled by default again: #89020. The reason it was enabled again was fixed in #89418. Part of #89847 Release justification: Bug fixes and low-risk updates to new functionality. Release note (sql change): Turn the default value of sql.metrics.statement_details.plan_collection.enabled to false.

109785: sql: fix performance when disabling sql.metrics.statement_details r=j82w a=j82w There is a performance regression when disabling sql.metrics.statement_details.enabled. It will see that every execution is the first time the query was executed so it will enabled execution stats collection for all executions. This is a bug caused by always collecting the execution stats of the first time a statement is executed in #89418. Fixes: #106811 Release note (bug fix): Fixes a bug causing performance regression when disabling `sql.metrics.statement_details.enabled` which caused execution stats to be collected for all queries instead of the default one percent. Co-authored-by: j82w <[email protected]>

…bled There is a performance regression when disabling sql.metrics.statement_details.enabled. It will see that every execution is the first time the query was executed so it will enabled execution stats collection for all executions. This is a bug caused by always collecting the execution stats of the first time a statement is executed in #89418. Fixes: #106811 Release note (bug fix): Fixes a bug causing performance regression when disabling `sql.metrics.statement_details.enabled` which caused execution stats to be collected for all queries instead of the default one percent.

j82w requested review from yuzefovich and a team October 5, 2022 17:42

xinhaoz reviewed Oct 5, 2022

View reviewed changes

yuzefovich reviewed Oct 6, 2022

View reviewed changes

j82w changed the title ~~Always enable tracing first time fingerprint is seen~~ sql: always enable tracing first time fingerprint is seen Oct 6, 2022

j82w changed the title ~~sql: always enable tracing first time fingerprint is seen~~ sqlstats: always enable tracing first time fingerprint is seen Oct 6, 2022

xinhaoz approved these changes Oct 6, 2022

View reviewed changes

j82w added the backport-22.2.x label Oct 6, 2022

yuzefovich reviewed Oct 6, 2022

View reviewed changes

craig bot merged commit 94e2a2b into cockroachdb:master Oct 10, 2022

blathers-crl bot mentioned this pull request Oct 10, 2022

release-22.2: sqlstats: always enable tracing first time fingerprint is seen #89671

Merged

j82w mentioned this pull request Oct 12, 2022

sql: turn plan sampling back off by default #89847

Merged

blathers-crl bot mentioned this pull request Oct 13, 2022

release-22.2: sql: turn plan sampling back off by default #89920

Merged

j82w mentioned this pull request Jul 14, 2023

Performance regression if sql.metrics.statement_details.enabled is disabled #106811

Closed

j82w mentioned this pull request Aug 30, 2023

sql: fix performance when disabling sql.metrics.statement_details #109785

Merged

blathers-crl bot mentioned this pull request Sep 1, 2023

release-23.1: sql: fix performance when disabling sql.metrics.statement_details #109881

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sqlstats: always enable tracing first time fingerprint is seen #89418

sqlstats: always enable tracing first time fingerprint is seen #89418

j82w commented Oct 5, 2022

cockroach-teamcity commented Oct 5, 2022

xinhaoz left a comment

yuzefovich left a comment

j82w commented Oct 6, 2022

j82w commented Oct 6, 2022

xinhaoz left a comment

j82w commented Oct 6, 2022

j82w commented Oct 6, 2022

yuzefovich left a comment

xinhaoz commented Oct 6, 2022

j82w commented Oct 10, 2022

j82w commented Oct 10, 2022

craig bot commented Oct 10, 2022

sqlstats: always enable tracing first time fingerprint is seen #89418

sqlstats: always enable tracing first time fingerprint is seen #89418

Conversation

j82w commented Oct 5, 2022

cockroach-teamcity commented Oct 5, 2022

xinhaoz left a comment

Choose a reason for hiding this comment

yuzefovich left a comment

Choose a reason for hiding this comment

j82w commented Oct 6, 2022

j82w commented Oct 6, 2022

xinhaoz left a comment

Choose a reason for hiding this comment

j82w commented Oct 6, 2022

j82w commented Oct 6, 2022

yuzefovich left a comment

Choose a reason for hiding this comment

xinhaoz commented Oct 6, 2022

j82w commented Oct 10, 2022

j82w commented Oct 10, 2022

craig bot commented Oct 10, 2022