Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server: add more info to the combined statement endpoint #109040

Merged
merged 1 commit into from
Aug 21, 2023

Conversation

maryliag
Copy link
Contributor

@maryliag maryliag commented Aug 18, 2023

Previously, it was hard to know which tables were
used to populate the SQL Activity, making debug for it complicated.
This commit adds extra information to the
combinedstmts to help:

  • oldestAggregatedTsReturned
  • stmtsSourceTable
  • txnsSourceTable

Returning a value for oldestAggregatedTsReturned will also allow us to show proper warning when the older timestamp doesn't match the start date of the selected time period on the Search Criteria.

Part Of #108989

Release note: None

@maryliag maryliag requested review from koorosh and a team August 18, 2023 20:38
@maryliag maryliag requested review from a team as code owners August 18, 2023 20:38
@maryliag maryliag requested review from a team and dhartunian and removed request for a team August 18, 2023 20:38
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@maryliag maryliag removed the request for review from dhartunian August 18, 2023 20:38
@maryliag maryliag changed the title server: add more info the combined statement endpoint server: add more info to the combined statement endpoint Aug 19, 2023
Copy link
Collaborator

@koorosh koorosh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 4 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @maryliag)


pkg/server/combined_statement_stats.go line 342 at r1 (raw file):

			fmt.Sprintf(`
SELECT COALESCE(min(aggregated_ts), now())
FROM %s %s`, table, whereClause), args...)

I have concern about this statement, I ran EXPLAIN ANALYZE on this query and it used full scan on it.

EXPLAIN ANALYZE SELECT COALESCE(min(aggregated_ts), now()) FROM crdb_internal.statement_statistics WHERE true AND aggregated_ts >= now() AND aggregated_ts <= now();

result:

planning time: 9ms
  execution time: 9ms
  distribution: local
  vectorized: true

  • render
  │
  └── • group (scalar)
      │ nodes: n1
   ...
                  │
                  └── • scan
                        ....
                        table: statement_statistics@fingerprint_stats_idx  ----------------------  WARNING: the row count estimate is inaccurate, consider running 'ANALYZE statement_statistics'
                        spans: FULL SCAN (SOFT LIMIT)

Most of the indexes on system.statement\_statistics table defined with condition WHERE app\_name NOT LIKE '$ internal%' so I suggest to include this clause in WHERE condition of the statement:

EXPLAIN ANALYZE 
  SELECT COALESCE(min(aggregated_ts), now()) FROM crdb_internal.statement_statistics 
  WHERE true 
    AND aggregated_ts >= now()  
    AND aggregated_ts <= now() 
    AND app_name NOT LIKE '$ internal%' -- <- add this

which gives us:

          └── • scan
                nodes: n1
                ...
                table: statement_statistics@execution_count_idx (partial index)
                spans: [/'2023-08-18 17:31:07.875656+00' - /'2023-08-19 17:40:35.478088+00']

pkg/server/serverpb/status.proto line 1644 at r1 (raw file):

  // OlderAggregatedTsReturned is the timestamp of the oldest entry returned,
  // or NOW() if there is no data returned.
  google.protobuf.Timestamp older_aggregated_ts_returned = 8 [(gogoproto.nullable) = false, (gogoproto.stdtime) = true];

The naming of field is a bit confusing.

Copy link
Collaborator

@koorosh koorosh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @maryliag)


pkg/server/combined_statement_stats.go line 345 at r1 (raw file):

		if err != nil {
			return timeutil.Now(), err

What is the motivation to return now() instead of NULL in case of error or empty results?

Copy link
Contributor Author

@maryliag maryliag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @koorosh)


pkg/server/combined_statement_stats.go line 342 at r1 (raw file):

Previously, koorosh (Andrii Vorobiov) wrote…

I have concern about this statement, I ran EXPLAIN ANALYZE on this query and it used full scan on it.

EXPLAIN ANALYZE SELECT COALESCE(min(aggregated_ts), now()) FROM crdb_internal.statement_statistics WHERE true AND aggregated_ts >= now() AND aggregated_ts <= now();

result:

planning time: 9ms
  execution time: 9ms
  distribution: local
  vectorized: true

  • render
  │
  └── • group (scalar)
      │ nodes: n1
   ...
                  │
                  └── • scan
                        ....
                        table: statement_statistics@fingerprint_stats_idx  ----------------------  WARNING: the row count estimate is inaccurate, consider running 'ANALYZE statement_statistics'
                        spans: FULL SCAN (SOFT LIMIT)

Most of the indexes on system.statement\_statistics table defined with condition WHERE app\_name NOT LIKE '$ internal%' so I suggest to include this clause in WHERE condition of the statement:

EXPLAIN ANALYZE 
  SELECT COALESCE(min(aggregated_ts), now()) FROM crdb_internal.statement_statistics 
  WHERE true 
    AND aggregated_ts >= now()  
    AND aggregated_ts <= now() 
    AND app_name NOT LIKE '$ internal%' -- <- add this

which gives us:

          └── • scan
                nodes: n1
                ...
                table: statement_statistics@execution_count_idx (partial index)
                spans: [/'2023-08-18 17:31:07.875656+00' - /'2023-08-19 17:40:35.478088+00']

We actually have a cluster setting to decide if we're returning internal statements on this endpoint or not, so I added the changes to take that into consideration. The default value is to filter out internal, so now is doing that correctly! Nice callout!


pkg/server/combined_statement_stats.go line 345 at r1 (raw file):

Previously, koorosh (Andrii Vorobiov) wrote…

What is the motivation to return now() instead of NULL in case of error or empty results?

I get compile error if I try to just return null, since the function expects a Time. I changed to just the empty Time component, which ends up also returning NOW, but at east shows that we don't really care the value itself


pkg/server/serverpb/status.proto line 1644 at r1 (raw file):

Previously, koorosh (Andrii Vorobiov) wrote…

The naming of field is a bit confusing.

Made a slight change, let me know if is more clear now. We do select a time period, but we might not have data for that full period, so let's say we request from august 10 to august 19, but we only have data starting on aug 15, so that is the value returned here. A following PR will use this value to show a warning in the UI.

Let me know if the name is better now.

Copy link
Collaborator

@koorosh koorosh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 4 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @maryliag)


pkg/server/combined_statement_stats.go line 342 at r1 (raw file):

Previously, maryliag (Marylia Gutierrez) wrote…

We actually have a cluster setting to decide if we're returning internal statements on this endpoint or not, so I added the changes to take that into consideration. The default value is to filter out internal, so now is doing that correctly! Nice callout!

Great!


pkg/server/combined_statement_stats.go line 345 at r1 (raw file):

Previously, maryliag (Marylia Gutierrez) wrote…

I get compile error if I try to just return null, since the function expects a Time. I changed to just the empty Time component, which ends up also returning NOW, but at east shows that we don't really care the value itself

It requires following changes to make it work with null:

  1. Remove (gogoproto.nullable) = false attribute from oldest_aggregated_ts_returned field in ptoto message
google.protobuf.Timestamp oldest\_aggregated\_ts\_returned = 8 [(gogoproto.stdtime) = true];
  1. Return pointer to time.Date in getOldestDate and change returned values to either nil or address of value
getOldestDate := func(table string) (\*time.Time, error) {

My concern about returning current time is to avoid confusion when users reset stats so there's no records in tables but oldest aggregated ts would return current time which in turn would be problematic.


pkg/server/combined_statement_stats.go line 375 at r2 (raw file):

		}()

		return tree.MustBeDTimestampTZ(row[0]).Time, nil

tree.MustBeDTimestampTZ panics if it cannot retrieve the value. Should we be more tolerant and use AsDTimestampTZ that indicates where the value parsed or not?


pkg/server/serverpb/status.proto line 1644 at r1 (raw file):

Previously, maryliag (Marylia Gutierrez) wrote…

Made a slight change, let me know if is more clear now. We do select a time period, but we might not have data for that full period, so let's say we request from august 10 to august 19, but we only have data starting on aug 15, so that is the value returned here. A following PR will use this value to show a warning in the UI.

Let me know if the name is better now.

Got it! Thanks for detailed explanation it is clear now! Thank you!

Copy link
Contributor Author

@maryliag maryliag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @koorosh)


pkg/server/combined_statement_stats.go line 345 at r1 (raw file):

Previously, koorosh (Andrii Vorobiov) wrote…

It requires following changes to make it work with null:

  1. Remove (gogoproto.nullable) = false attribute from oldest_aggregated_ts_returned field in ptoto message
google.protobuf.Timestamp oldest\_aggregated\_ts\_returned = 8 [(gogoproto.stdtime) = true];
  1. Return pointer to time.Date in getOldestDate and change returned values to either nil or address of value
getOldestDate := func(table string) (\*time.Time, error) {

My concern about returning current time is to avoid confusion when users reset stats so there's no records in tables but oldest aggregated ts would return current time which in turn would be problematic.

Nice! I made the changes and now returns null when no data is returned :)
thank you for the detailed steps


pkg/server/combined_statement_stats.go line 375 at r2 (raw file):

Previously, koorosh (Andrii Vorobiov) wrote…

tree.MustBeDTimestampTZ panics if it cannot retrieve the value. Should we be more tolerant and use AsDTimestampTZ that indicates where the value parsed or not?

the value should ne null or timestamp, so I added a check for DNull first, then make the call the parse the timestamp, now it will probably return null or the timestamp.

Copy link
Collaborator

@koorosh koorosh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, great!
Also want to clarify if we need to do the same check for transaction statistics?

Reviewed 3 of 3 files at r3, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @maryliag)


pkg/server/combined_statement_stats.go line 375 at r2 (raw file):

Previously, maryliag (Marylia Gutierrez) wrote…

the value should ne null or timestamp, so I added a check for DNull first, then make the call the parse the timestamp, now it will probably return null or the timestamp.

Nice! 🎉

Previously, it was hard to know which tables were
used to populate the SQL Activity, making debug for it
complicated.
This commit adds extra information to the
`combinedstmts` to help:
- olderAggregatedTsReturned
- stmtsSourceTable
- txnsSourceTable

Returning a value for `olderAggregatedTsReturned` will
also allow us to show proper warning when the older
timestamp doesn't match the start date of the selected
time period on the Search Criteria.

Part Of cockroachdb#108989

Release note: None
Copy link
Contributor Author

@maryliag maryliag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On transaction statistics we also return statement statistics, so it will go over the same parts of the code, I just had to made an adjustement.

Examples of responses when:

Statement page with results:

"stmtsTotalRuntimeSecs": 89.46363,
  "txnsTotalRuntimeSecs": 0,
  "oldestAggregatedTsReturned": "2023-08-20T14:00:00Z",
  "stmtsSourceTable": "crdb\_internal.statement\_statistics\_persisted",
  "txnsSourceTable": ""

Statement page with no results:

"stmtsTotalRuntimeSecs": 0,
  "txnsTotalRuntimeSecs": 0,
  "oldestAggregatedTsReturned": null,
  "stmtsSourceTable": "crdb\_internal.statement\_statistics",
  "txnsSourceTable": ""

Transactions page with results:

"stmtsTotalRuntimeSecs": 90.2161,
  "txnsTotalRuntimeSecs": 47.55528,
  "oldestAggregatedTsReturned": "2023-08-20T14:00:00Z",
  "stmtsSourceTable": "crdb\_internal.statement\_statistics\_persisted",
  "txnsSourceTable": "crdb\_internal.transaction\_statistics\_persisted"

Transactions page with no results:

"stmtsTotalRuntimeSecs": 0,
  "txnsTotalRuntimeSecs": 0,
  "oldestAggregatedTsReturned": null,
  "stmtsSourceTable": "crdb\_internal.statement\_statistics",
  "txnsSourceTable": "crdb\_internal.transaction\_statistics"
```

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @koorosh)

Copy link
Collaborator

@koorosh koorosh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry if my question was misleading.
getOldestDate function is only called to get oldest timestamp for statement_activity tables but not transaction_activity tables. I'm not sure if it is necessary or not.

Reviewed all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @maryliag)

Copy link
Contributor Author

@maryliag maryliag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was on purpose, because the statements data would reach the limit and be deleted first and that is used by both statements and transactions, so having just one of them is enough.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @koorosh)

Copy link
Collaborator

@koorosh koorosh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm_strong:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @maryliag)

Copy link
Collaborator

@koorosh koorosh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @maryliag)

@maryliag
Copy link
Contributor Author

TFTR!

bors r+

@craig
Copy link
Contributor

craig bot commented Aug 21, 2023

Build succeeded:

@craig craig bot merged commit 2ac44e6 into cockroachdb:master Aug 21, 2023
@maryliag maryliag deleted the warning-old branch August 21, 2023 14:23
@maryliag
Copy link
Contributor Author

blathers backport 23.1

@blathers-crl
Copy link

blathers-crl bot commented Aug 21, 2023

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error setting reviewers, but backport branch blathers/backport-release-23.1-109040 is ready: POST https://api.github.com/repos/cockroachdb/cockroach/pulls/109132/requested_reviewers: 422 Reviews may only be requested from collaborators. One or more of the teams you specified is not a collaborator of the cockroachdb/cockroach repository. []

Backport to branch 23.1 failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants