Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: Add database ID to sampled query log #14604

Open
cockroach-teamcity opened this issue Jul 21, 2022 · 1 comment
Open

sql: Add database ID to sampled query log #14604

cockroach-teamcity opened this issue Jul 21, 2022 · 1 comment

Comments

@cockroach-teamcity
Copy link
Member

cockroach-teamcity commented Jul 21, 2022

Exalate commented:

Related PR: cockroachdb/cockroach#84354
Commit: cockroachdb/cockroach@79dae84


Release note (sql change): Sampled query telemetry log includes new
database ID field.

Jira Issue: DOC-5138

@THardy98
Copy link

THardy98 commented Jul 25, 2022

This change is being reverted as per cockroachdb/cockroach#85017, for reasons outlined in this comment:

This change adds risk to the sampling since it requires a lease, along with a new knob that we'd have to educate users about in the case of an emergency: #84429

we require access to descriptors every time we attempt to emit such a log. However, if these descriptors or the leasing mechanism are unavailable, the emission of this log will indefinitely block, greatly impacting the corresponding SQL Session.

Given the edge case nature here, and given that we're about to begin emitting schema telemetry every day that we can join samples against during ETL or after, I'd prefer to see this change reverted unless we can find a way to get the database ID without having to unconditionally grab a lease, which happens if you call GetImmutableDatabaseByName.

The problem is that in some plans, like SHOW session_var or SELECT crdb_internal.some_function(), we might not already have the leased database. And requiring these to lease the database could indeed lead to extra unavailability, which we'd prefer to avoid.

Some possible ways to deal with this:

only log the DB ID if the plan we just ran touched the DB (as in ordinary SQL plans)
only log the DB ID if the DB descriptor is already cached
For the first idea, we can ask the Queries team for help (perhaps the optimizer's metadata structure caches the database ID when it's used). For the second idea, we can ask the Schema team for help (we don't have an API that checks for entries just in the cache yet, but we could build one).

In the meantime, I think we should revert this change (sorry @THardy98! you couldn't have known about all of this complexity ahead of time).

@exalate-issue-sync exalate-issue-sync bot assigned ghost Aug 9, 2022
@exalate-issue-sync exalate-issue-sync bot unassigned ghost Oct 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants