Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sqlstats: always enable tracing first time fingerprint is seen #89418

Merged
merged 1 commit into from
Oct 10, 2022
Merged

sqlstats: always enable tracing first time fingerprint is seen #89418

merged 1 commit into from
Oct 10, 2022

Conversation

j82w
Copy link
Contributor

@j82w j82w commented Oct 5, 2022

Fixes: #89185

The first time a fingerprint is seen tracing should be enabled. This currently is broken if
sql.metrics.statement_details.plan_collection.enabled is set to false. This can cause crdb_internal.transaction_contention_events to be empty because tracing was never enabled to the contention event was never recorded.

To properly fix this a new value needs to be returned on ShouldSample to tell if it is the first time a fingerprint is seen. This will remove the dependency on plan_collection feature switch.

Release justification: Bug fixes and low-risk updates to new functionality.

Release note (bug fix): Always enable tracing the frist time a fingerprint is seen.

@j82w j82w requested review from yuzefovich and a team October 5, 2022 17:42
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@xinhaoz xinhaoz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @j82w and @yuzefovich)


-- commits line 10 at r1:
typo? to -> so


-- commits line 20 at r1:
typo frist -> first


pkg/sql/instrumentation.go line 258 at r1 (raw file):

	ih.withStatementTrace = cfg.TestingKnobs.WithStatementTrace

	var everSampled bool

nit: just personal pref but this wording seems a little awkward to me. would something like previouslySampled or sampleExists be too lengthy? would just sampled suffice?


pkg/sql/crdb_internal_test.go line 941 at r1 (raw file):

}

func TestTxnContentionEventsTable(t *testing.T) {

Is this possible as a data driven test?


pkg/sql/crdb_internal_test.go line 1038 at r1 (raw file):

	require.Less(t, rowCount, 3, "node_execution_insights found 3 rows. It should only record first, "+
		"but there is a chance based ons sampling to get 2 rows.")

typo: ons -> on

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall! I left a bunch of nits, but please don't be discouraged by that - over time all of these code style things will become the second nature.

Reviewed 9 of 9 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @j82w and @xinhaoz)


-- commits line 2 at r1:
nit: use package name prefix to indicate where most of the changes are, also no capital letters in the commit message, so it should look something like sqlstats: always enable ....


-- commits line 20 at r1:
nit: I don't think we've had a stable release with this bug, so I would omit the release note entirely. If you prefer to keep it, then I would explicitly mention that the bug would occur only if the plan collection setting is explicitly disabled by user. (Also, I believe we don't really document what and how we are sampling, which is yet another reason to not have a release note here.)


pkg/sql/instrumentation.go line 258 at r1 (raw file):

Previously, xinhaoz (Xin Hao Zhang) wrote…

nit: just personal pref but this wording seems a little awkward to me. would something like previouslySampled or sampleExists be too lengthy? would just sampled suffice?

I like previouslySampled more too.


pkg/sql/sqlstats/sslocal/sslocal_stats_collector.go line 129 at r1 (raw file):

	}

	// TODO: this might need to be thought through a bit more.

We should address this TODO as part of this commit. I believe the current logic makes sense, but it'd be good for @xinhaoz to confirm it.


pkg/sql/crdb_internal_test.go line 941 at r1 (raw file):

Previously, xinhaoz (Xin Hao Zhang) wrote…

Is this possible as a data driven test?

It would probably be more involved to do it as a data-driven test, so I'd be in favor of keeping the current approach.

(If one is interested, we did add an ability for this kind of test to be done in the logic test framework - which is very similar to data driven framework - in #79010 (and maybe a couple of other related PRs from Alex Sarkesian) , and I would imagine we would have to do something similar here.)


pkg/sql/crdb_internal_test.go line 945 at r1 (raw file):

	defer log.Scope(t).Close(t)

	// Start the cluster. (One node is sufficient; the outliers system is currently in-memory only.)

nit: we try to keep the comments wrapped at 80 characters. If you're using GoLand, you could set a visual guide (or automatic wrapping on typing) via Preferences -> Editor -> Code Style -> Go -> Wrapping and Braces.


pkg/sql/crdb_internal_test.go line 951 at r1 (raw file):

	tc := testcluster.StartTestCluster(t, 1, args)
	defer tc.Stopper().Stop(ctx)
	conn := tc.ServerConn(0)

nit: you could wrap conn with sqlutils.MakeSQLRunner to avoid having to check errors and whatnot. This is just FYI, so please feel free to ignore this comment.


pkg/sql/crdb_internal_test.go line 975 at r1 (raw file):

		wgTxnStarted.Add(1)

		// Lock to wait for the txn to complete to avoid the test finishing before the txn is committed

nit: missing period here and below.


pkg/sql/crdb_internal_test.go line 989 at r1 (raw file):

			errTxn = tx.Commit()
			require.NoError(t, errTxn)
			wgTxnDone.Done()

nit: it's a good practice to do this in a defer that is the first line of the new goroutine, i.e. something like

waitGroup.Add(1)
go func() {
  defer waitGroup.Done()
  ...
}

pkg/sql/sqlstats/persistedsqlstats/datadriven_test.go line 132 at r1 (raw file):

			stubTime.setTime(tm)
			return stubTime.Now().String()
		case "should-sample-logical-plan":

nit: we probably should rename this to should-sample.


pkg/sql/sqlstats/persistedsqlstats/datadriven_test.go line 148 at r1 (raw file):

			fingerprint = strings.Replace(fingerprint, "%", " ", -1)

			// test harness needs an update.

nit: this comment should be removed.


pkg/sql/sqlstats/persistedsqlstats/datadriven_test.go line 154 at r1 (raw file):

				dbName,
			)
			return fmt.Sprintf("%t, %t", savePlanForStats, everSampled)

nit: it seems more intuitive to keep the order of variables here in sync with ShouldSample - so either switch here or in the signature of ShouldSample itself.

@j82w
Copy link
Contributor Author

j82w commented Oct 6, 2022

-- commits line 20 at r1:

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: I don't think we've had a stable release with this bug, so I would omit the release note entirely. If you prefer to keep it, then I would explicitly mention that the bug would occur only if the plan collection setting is explicitly disabled by user. (Also, I believe we don't really document what and how we are sampling, which is yet another reason to not have a release note here.)

I'll remove it

@j82w
Copy link
Contributor Author

j82w commented Oct 6, 2022

pkg/sql/crdb_internal_test.go line 945 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: we try to keep the comments wrapped at 80 characters. If you're using GoLand, you could set a visual guide (or automatic wrapping on typing) via Preferences -> Editor -> Code Style -> Go -> Wrapping and Braces.

I see in the coding standard is 80, but almost every file I go has some violation. The default is 120. I wonder if this could be enforced with a linter. https://cockroachlabs.atlassian.net/wiki/spaces/CRDB/pages/181371303/Go+Golang+coding+guidelines#Line-Length

@j82w j82w changed the title Always enable tracing first time fingerprint is seen sql: always enable tracing first time fingerprint is seen Oct 6, 2022
@j82w j82w changed the title sql: always enable tracing first time fingerprint is seen sqlstats: always enable tracing first time fingerprint is seen Oct 6, 2022
Copy link
Member

@xinhaoz xinhaoz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @j82w, @xinhaoz, and @yuzefovich)


pkg/sql/sqlstats/sslocal/sslocal_stats_collector.go line 123 at r3 (raw file):

) (previouslySampled bool, savePlanForStats bool) {
	sampledInFlushTarget := false
	savePlanForStatsInFlushTarget := true

Thanks for this rename 👍

@j82w
Copy link
Contributor Author

j82w commented Oct 6, 2022

pkg/sql/sqlstats/sslocal/sslocal_stats_collector.go line 123 at r3 (raw file):

Previously, xinhaoz (Xin Hao Zhang) wrote…

Thanks for this rename 👍

Done.

@j82w
Copy link
Contributor Author

j82w commented Oct 6, 2022

pkg/sql/sqlstats/sslocal/sslocal_stats_collector.go line 129 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

We should address this TODO as part of this commit. I believe the current logic makes sense, but it'd be good for @xinhaoz to confirm it.

The only concern I have is when the stats get flushed it will see all statements as new again.

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 4 of 7 files at r2, 3 of 3 files at r3, all commit messages.
Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @j82w and @xinhaoz)


pkg/sql/sqlstats/sslocal/sslocal_stats_collector.go line 129 at r1 (raw file):

Previously, j82w wrote…

The only concern I have is when the stats get flushed it will see all statements as new again.

Hm, I see. How frequent are these flushes? I'll defer to you and Xin Hao on whether this is concerning. I guess you could leave a TODO to address it later (just use a github username for it, i.e. something like TODO(j82w): ...).


pkg/sql/crdb_internal_test.go line 945 at r1 (raw file):

Previously, j82w wrote…

I see in the coding standard is 80, but almost every file I go has some violation. The default is 120. I wonder if this could be enforced with a linter. https://cockroachlabs.atlassian.net/wiki/spaces/CRDB/pages/181371303/Go+Golang+coding+guidelines#Line-Length

Yes, we definitely have many violations, still we should try to keep them at minimum. We are also more stringent about the character count for comments than for the regular code.


pkg/sql/crdb_internal_test.go line 1019 at r3 (raw file):

	// internal workflow.
	testutils.SucceedsWithin(t, func() error {
		rows, errVerify := conn.QueryContext(ctx, "SELECT "+

nit: with multiple-line single SQL statements we usually use backticks `. Feel free to ignore this nit, just FYI.

@xinhaoz
Copy link
Member

xinhaoz commented Oct 6, 2022

pkg/sql/sqlstats/sslocal/sslocal_stats_collector.go line 129 at r1 (raw file):

The only concern I have is when the stats get flushed it will see all statements as new again.

The flushes are default every 10mins, but we were already seeing the behaviour described by Jake prior to this change.

Fixes: #89185

The first time a fingerprint is seen tracing should be enabled.
This currently is broken if
sql.metrics.statement_details.plan_collection.enabled is set to false.
This can cause crdb_internal.transaction_contention_events to be
empty because tracing was never enabled so the contention event was
never recorded.

To properly fix this a new value needs to be returned on ShouldSample
to tell if it is the first time a fingerprint is seen. This will remove
the dependency on plan_collection feature switch.

Release justification: Bug fixes and low-risk updates to new
functionality.

Release note: none
@j82w
Copy link
Contributor Author

j82w commented Oct 10, 2022

pkg/sql/sqlstats/sslocal/sslocal_stats_collector.go line 129 at r1 (raw file):

Previously, xinhaoz (Xin Hao Zhang) wrote…

The only concern I have is when the stats get flushed it will see all statements as new again.

The flushes are default every 10mins, but we were already seeing the behaviour described by Jake prior to this change.

I confirmed that it currently always records the statement if you wait 10mins. In that case I will not add a todo note, because it's the current behavior. We should wait for either customer feedback or perf analysis to show it's an issue.

@j82w
Copy link
Contributor Author

j82w commented Oct 10, 2022

bors r+

@craig
Copy link
Contributor

craig bot commented Oct 10, 2022

Build succeeded:

@craig craig bot merged commit 94e2a2b into cockroachdb:master Oct 10, 2022
craig bot pushed a commit that referenced this pull request Oct 13, 2022
89782: roachtest: limit FingerprintValidator's memory usage internally r=miretskiy,srosenberg a=renatolabs

In #89332, we started optionally validating changefeed semantics in the `FingerprintValidator` by making sure that unseen updates are never observed if a `resolved` message is received. In order to keep bounds on the validator's memory usage, the `cdc/mixed-versions` test was setting a maximum number of operations at the workload level. However, a consequence of that change is that it creates the possibility for the workload to finish before the roachtest has received all the `resolved` events it expects (for instance, if draining takes longer than usual for some reason; a variety of other non-determinism is also at play).

To deal with this possibility, we instead enforce a maximum number of previously seen events at the validator's level. For the `cdc/mixed-versions` test, we set a maximum of 100,000 previously seen events being stored (20MB memory footprint). This should be equivalent to roughly 50,000 bank transfers, and give enough time for the test to finish successfully.

Release note: None

Epic: None.

89847: sql: turn plan sampling back off by default r=j82w a=j82w

The value was originally disabled by default in: #88343. The value was enabled by default: #89020 because of a bug that was fixed in #89418. This PR is disabling it by default again now that the bug has been fixed.

Part Of #89847

Release note (sql change): Turn the default value of sql.metrics.statement_details.plan_collection.enabled to false.

Co-authored-by: Renato Costa <[email protected]>
Co-authored-by: j82w <[email protected]>
blathers-crl bot pushed a commit that referenced this pull request Oct 13, 2022
This value was enabled by default again: #89020. The reason it was
enabled again was fixed in #89418.

Part of #89847

Release note (sql change): Turn the default value of
sql.metrics.statement_details.plan_collection.enabled to false.
j82w pushed a commit that referenced this pull request Oct 18, 2022
This value was enabled by default again: #89020. The reason it was
enabled again was fixed in #89418.

Part of #89847

Release justification: Bug fixes and low-risk updates to new functionality.

Release note (sql change): Turn the default value of
sql.metrics.statement_details.plan_collection.enabled to false.
j82w pushed a commit that referenced this pull request Nov 2, 2022
This value was enabled by default again: #89020. The reason it was
enabled again was fixed in #89418.

Part of #89847

Release justification: Bug fixes and low-risk updates to new functionality.

Release note (sql change): Turn the default value of
sql.metrics.statement_details.plan_collection.enabled to false.
craig bot pushed a commit that referenced this pull request Sep 1, 2023
109785: sql: fix performance when disabling sql.metrics.statement_details r=j82w a=j82w

There is a performance regression when disabling
sql.metrics.statement_details.enabled. It will see that every execution is the first time the query was executed so it will enabled execution stats collection for all executions. This is a bug caused by always collecting the execution stats of the first time a statement is executed in #89418.

Fixes: #106811

Release note (bug fix): Fixes a bug causing performance regression when disabling `sql.metrics.statement_details.enabled` which caused execution stats to be 
collected for all queries instead of the default one percent.

Co-authored-by: j82w <[email protected]>
blathers-crl bot pushed a commit that referenced this pull request Sep 1, 2023
…bled

There is a performance regression when disabling
sql.metrics.statement_details.enabled. It will see that every execution
is the first time the query was executed so it will enabled execution
stats collection for all executions. This is a bug caused by always
collecting the execution stats of the first time a statement is
executed in #89418.

Fixes: #106811

Release note (bug fix): Fixes a bug causing performance regression
when disabling `sql.metrics.statement_details.enabled` which caused
execution stats to be collected for all queries instead of the default
one percent.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Tracing should always log first time statement is executed
4 participants