jobsprofiler: enable requesting a job's execution details #105384

adityamaru · 2023-06-22T20:01:44Z

Similar to statement bundles, this change introduces the
infrastructure to request, collect and read the execution
details for a particular job.

Right now, the execution details only contain the
latest DistSQL (DSP) diagram for a job, but going forward this will
give us a place to dump raw files such as:

- cluster-wide job traces
- CPU profiles
- trace-driven aggregated stats
- raw payload and progress protos

Downloading some or all of these execution details will be
exposed in a future patch in all of the places where
statement bundles are available today:

- DBConsole
- CLI shell
- SQL shell

This change introduces a builtin that allows the caller
to request the collection and persistence of a job's
current execution details.

This change also introduces a new endpoint on the status
server to read the data corresponding to the execution details
persisted for a job. The next set of
PRs will add the necessary components to allow downloading
the files from the DBConsole.

Informs: #105076

Release note: None


This change adds a new component to the `Profiler` tab of the job details page that supports collecting and viewing job profiler bundles. The component has a button to collect job profiler bundles. These bundles are then listed in a sorted table with the ability to download each bundle. The above operations are backed by the infrastructure added in cockroachdb#105384.

Note, the `Profiler` tab is currently disabled for CC but this change allows for a future project to enable the collection of bundles through the CC console as well.

Informs: cockroachdb#105076

Release note (ui change): collect and download job profiler bundles from the `Profiler` tab on the job details page.

dt · 2023-06-27T21:32:46Z

Overall looks good to me. One question I had though is if we even need to request/persist/fetch the generated bundle to job_info, or if we could just have the bundle fetch endpoint generate it on the fly since it is generated from job state that is already persisted, isn't it?

pkg/sql/jobs_profiler_bundle.go

adityamaru · 2023-06-28T00:42:30Z

> if we even need to request/persist/fetch the generated bundle to job_info, or if we could just have the bundle fetch endpoint generate it on the fly since it is generated from job state that is already persisted

Not all the information in the bundle is going to be persisted to job state. For example, the active tracing spans of a job or the goroutine stacks at the time the bundle was collected. Separating the request/persist from the fetch allows us to download older bundles later if we want to see the state of the job at different points in time.

https://www.loom.com/share/4d0ff8ffe53b4e09bf8f0de1009c066e?sid=7b1bb121-b8b4-4215-8b5c-12f103f482df is a prototype of how I want the bundles to be listed. When you request a bundle it shows up in the table and can then be downloaded at any point in the future.

adityamaru · 2023-07-05T18:45:59Z

friendly ping @dt with the updated approach discussed offline

pkg/sql/jobs_profiler_bundle.go

dt · 2023-07-07T18:40:19Z

pkg/sql/jobs_profiler_bundle.go

+				return errors.Wrapf(err, "failed to compress chunk for file %s", filename)
+			}
+
+			// On listing we want the info_key of each chunk to sort after the


A couple nits:

- `%d`-formatted monotonic ints don't sort monotonically as strings (for example, "10" sorts before "9")
- unixnano isn't using the monotonic clock

I wonder if we should just use a loop counter that starts at zero and goes up, and I wonder if we should give the last chunk a well-known name so that the reader can verify they got all chunks.

I might just say use a loop counter that starts at 0 and then print them with %04d in MakeProfilerBundleChunkKey

Nice, `%04d` was what I had forgotten about. Changed to a chunk counter, and I prefix the last chunk with `_final`.
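To illustrate the sorting nit and the well-known final chunk name discussed above, here is a minimal sketch. `makeChunkKey`, the `filename#NNNN` layout and the `#_final` suffix are assumptions made for illustration only; they are not the actual `MakeProfilerBundleChunkKey` implementation or the key format used in this PR.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// finalSuffix marks the last chunk (hypothetical name). '_' (0x5F) sorts
// after the ASCII digits, so it lists after every numbered chunk.
const finalSuffix = "_final"

// makeChunkKey is a hypothetical stand-in for a chunk-key helper. format is
// the verb used to print the chunk counter: "%d" or "%04d".
func makeChunkKey(filename, format string, chunk int, last bool) string {
	if last {
		return filename + "#" + finalSuffix
	}
	return filename + "#" + fmt.Sprintf(format, chunk)
}

// chunkKeys returns the keys for n chunks in the order they are written.
func chunkKeys(filename, format string, n int) []string {
	keys := make([]string, 0, n)
	for i := 0; i < n; i++ {
		keys = append(keys, makeChunkKey(filename, format, i, i == n-1))
	}
	return keys
}

// listingMatchesWriteOrder reports whether a lexicographic listing of the
// keys would return them in the order they were written.
func listingMatchesWriteOrder(keys []string) bool {
	return sort.StringsAreSorted(keys)
}

// complete reports whether a sorted listing ends with the final chunk,
// which lets a reader detect that it did not receive every chunk.
func complete(sortedKeys []string) bool {
	return len(sortedKeys) > 0 &&
		strings.HasSuffix(sortedKeys[len(sortedKeys)-1], "#"+finalSuffix)
}

func main() {
	unpadded := chunkKeys("plan.txt", "%d", 12)
	padded := chunkKeys("plan.txt", "%04d", 12)

	fmt.Println(listingMatchesWriteOrder(unpadded)) // false: "#10" sorts before "#9"
	fmt.Println(listingMatchesWriteOrder(padded))   // true
	fmt.Println(complete(padded), complete(padded[:5])) // true false
}
```

Zero-padding to four digits only keeps the order for up to 10,000 numbered chunks; a wider pad would be needed beyond that.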

Also caught a potential txn retry bug where we were mutating data inside the closure. Now we take a copy and operate on that.
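The retry hazard mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the code from this PR: `runTxn` is a hypothetical stand-in for a transaction runner that re-invokes its closure on a retryable error, and `written` stands in for the transaction's writes, which are discarded when an attempt fails.

```go
package main

import (
	"errors"
	"fmt"
)

// errRetry simulates a retryable transaction error.
var errRetry = errors.New("retryable txn error")

// runTxn is a hypothetical transaction runner: it re-invokes fn until fn
// stops returning errRetry.
func runTxn(fn func(attempt int) error) error {
	for attempt := 0; ; attempt++ {
		if err := fn(attempt); !errors.Is(err, errRetry) {
			return err
		}
	}
}

// writeChunks splits data into chunks of at most size bytes inside a
// "transaction" whose first attempt fails after one chunk. When copyInput
// is false the closure consumes the captured variable directly (the bug);
// when true it works on a per-attempt copy (the fix).
func writeChunks(data string, size int, copyInput bool) []string {
	var written []string
	_ = runTxn(func(attempt int) error {
		written = nil // writes of a failed attempt are rolled back
		remaining := &data
		if copyInput {
			local := data // fresh copy on every attempt
			remaining = &local
		}
		for len(*remaining) > 0 {
			n := size
			if n > len(*remaining) {
				n = len(*remaining)
			}
			written = append(written, (*remaining)[:n])
			*remaining = (*remaining)[n:] // mutates data when copyInput is false
			if attempt == 0 {
				return errRetry // simulate a retry after the first chunk
			}
		}
		return nil
	})
	return written
}

func main() {
	fmt.Println(writeChunks("abcdef", 2, false)) // [cd ef]: first chunk lost on retry
	fmt.Println(writeChunks("abcdef", 2, true))  // [ab cd ef]
}
```

Because the closure may run more than once, anything it consumes from the enclosing scope has to be restored or copied at the start of each attempt.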


adityamaru · 2023-07-11T19:11:11Z

TFTR!

bors r=dt

craig · 2023-07-11T19:46:16Z

Build succeeded:

Bazel Essential CI (Cockroach)


106629: sql,server: add endpoint to list a job's execution details r=dt a=adityamaru

In #105384 we added infrastructure to request and store execution details for a job. This currently only includes the DistSQL diagram generated during a job execution. Going forward this will include several files such as traces, goroutines, profiles etc.

This change introduces an endpoint that allows listing all such files that are available for consumption. This list will be displayed on the job details page allowing the user to download any subset of the files collected during job execution.

Informs: #105076

Release note: None

Co-authored-by: adityamaru


This change teaches the job resumer to fetch and write its trace recording before finishing its tracing span. These traces will be a part of the execution detail files introduced in cockroachdb#105384. These traces will be valuable in understanding a job's execution characteristics during each resumption, even if the job has reached a terminal state.

Currently, this behaviour is opt-in and has been enabled for backups, restore, import and physical replication jobs.

Informs: cockroachdb#102794

Release note: None

105368: backupccl: add unit tests for FileSSTSink r=rhu713 a=rhu713

Backfill unit tests for the basic functionality of FileSSTSink with additional test cases involving inputs of keys with many entries in its revision history.

Epic: CRDB-27758

Release note: None

105624: jobsprofiler: dump trace recording on job completion r=dt a=adityamaru

This change teaches the job resumer to fetch and write its trace recording before finishing its tracing span. These traces will be consumed by the job profiler bundle that is being introduced in #105384. These traces will be valuable in understanding a job's execution characteristics during each resumption, even if the job has reached a terminal state.

Currently, this behaviour is opt-in and has been enabled for backups, restore, import and physical replication jobs.

Informs: #102794

Release note: None

106515: DEPS: bump across etcd-io/raft#81 and disable conf change validation r=erikgrinaker a=tbg

We don't want raft to validate conf changes, since that causes issues due to false positives (the check is above raft, but needs to be below raft to always work correctly). We are taking responsibility for carrying out only valid conf changes, as we always have.

See also etcd-io/raft#80.

Fixes #105797.

Epic: CRDB-25287

Release note (bug fix): under rare circumstances, a replication change could get stuck when proposed near lease/leadership changes (and likely under overload), and the replica circuit breakers could trip. This problem has been addressed.

Note to editors: this time it's really addressed (fingers crossed); a previous attempt with an identical release note had to be reverted.

106939: changefeedccl: fix flake in TestParquetRows r=miretskiy a=jayshrivastava

changefeedccl: fix flake in TestParquetRows

Previously, this test would flake when rows were not emitted in the exact order they were inserted/modified. This change makes the test resilient to different ordering.

Epic: None

Fixes: #106911

Release note: None

---

util/parquet: make metadata transparent in tests

Previously, users of the library would need to explicitly call `NewWriterWithReaderMetadata()` to configure the parquet writer to add metadata required to use reader utils in `pkg/util/parquet/testutils.go`. This led to a lot of unnecessary code duplication. This moves the logic to decide if metadata should be written to `NewWriter()` so callers do not need to do the extra work.

Epic: None

Release note: None

Co-authored-by: Rui Hu
Co-authored-by: adityamaru
Co-authored-by: Tobias Grieger
Co-authored-by: Jayant Shrivastava

In cockroachdb#105384 and cockroachdb#106629 we added support to collect and list files that had been collected as part of a job's execution details. These files are meant to provide improved observability into the state of a job.

This change is the first of a few that exposes these endpoints on the DBConsole job details page. This change only adds support for listing files that have been requested as part of a job's execution details. A follow-up change will add support to request these files, sort them and download them from the job details page.

This page is not available on the Cloud Console as it is meant for advanced debugging.

This change also renames the `Profiler` tab to `Advanced Debugging` as the users of this tab are going to be internal CRDB support and engineering for the time being.

Informs: cockroachdb#105076

Release note (ui change): add table in the Profiler job details page that lists all the available files describing a job's execution details

106879: jobs: add table to display execution details r=maryliag a=adityamaru

In #105384 and #106629 we added support to collect and list files that had been collected as part of a job's execution details. These files are meant to provide improved observability into the state of a job.

This change is the first of a few that exposes these endpoints on the DBConsole job details page. This change only adds support for listing files that have been requested as part of a job's execution details. A future change will add support to request these files, sort them and download them from the job details page.

This page is not available on the Cloud Console as it is meant for advanced debugging.

Informs: #105076

Release note (ui change): add table in the Profiler job details page that lists all the available files describing a job's execution details

<img width="1505" alt="Screenshot 2023-07-18 at 2 26 50 PM" src="https://github.com/cockroachdb/cockroach/assets/13837382/aebe18a6-9c25-4c9a-ad7c-a94e2e4c97ff">
<img width="1510" alt="Screenshot 2023-07-18 at 2 27 03 PM" src="https://github.com/cockroachdb/cockroach/assets/13837382/da9b3a21-8dc6-47ca-ac02-24d8bb7d09e7">

107236: sql: use txn.NewBatch instead of &kv.Batch{} r=fqazi a=rafiss

This will make these requests properly pass along the admission control headers.

informs #79212

Epic: None

Release note: None

107447: sql: fix CREATE MATERIALIZED VIEW AS schema change job description r=fqazi a=ecwall

Fixes #107445

This changes the CREATE MATERIALIZED VIEW AS schema change job description SQL syntax.

For example

```
CREATE VIEW "v" AS "SELECT t.id FROM movr.public.t";
```

becomes

```
CREATE MATERIALIZED VIEW defaultdb.public.v AS SELECT t.id FROM defaultdb.public.t WITH DATA;
```

Release note (bug fix): Fix CREATE MATERIALIZED VIEW AS schema change job description SQL syntax.

Co-authored-by: adityamaru
Co-authored-by: Rafi Shamim
Co-authored-by: Evan Wall

adityamaru requested a review from dt June 22, 2023 20:01

adityamaru requested review from a team as code owners June 22, 2023 20:01

adityamaru requested a review from a team June 22, 2023 20:01

adityamaru requested review from a team as code owners June 22, 2023 20:01

adityamaru force-pushed the bundle-one branch from 4272896 to ec46b6b Compare June 22, 2023 21:33

adityamaru requested a review from a team as a code owner June 22, 2023 21:33

adityamaru mentioned this pull request Jun 24, 2023

job: add job profiler bundles to the job details page #105481

Closed

adityamaru force-pushed the bundle-one branch 2 times, most recently from 682e9f7 to 3b49e79 Compare June 24, 2023 21:17

adityamaru mentioned this pull request Jun 27, 2023

jobsprofiler: dump trace recording on job completion #105624

Merged

dt reviewed Jun 27, 2023


adityamaru requested a review from dt June 28, 2023 00:44

adityamaru force-pushed the bundle-one branch from 3b49e79 to d9d8dc9 Compare June 28, 2023 13:27

adityamaru force-pushed the bundle-one branch 2 times, most recently from 1e70b8e to 5cde323 Compare June 28, 2023 22:01

adityamaru changed the title ~~jobsprofiler: introduce collection of job bundles~~ jobsprofiler: enable requesting a job's execution details Jun 28, 2023

adityamaru force-pushed the bundle-one branch from 5cde323 to 517258d Compare June 29, 2023 16:12

adityamaru force-pushed the bundle-one branch from 517258d to 61e684c Compare July 7, 2023 01:28

dt approved these changes Jul 7, 2023


adityamaru force-pushed the bundle-one branch from 61e684c to fbf1716 Compare July 11, 2023 15:47

adityamaru force-pushed the bundle-one branch from fbf1716 to 0558434 Compare July 11, 2023 16:17

craig bot merged commit 89d6fdd into cockroachdb:master Jul 11, 2023

adityamaru deleted the bundle-one branch July 11, 2023 19:46

adityamaru mentioned this pull request Jul 11, 2023

sql,server: add endpoint to list a job's execution details #106629

Merged

adityamaru mentioned this pull request Jul 15, 2023

jobs: add table to display execution details #106879

Merged
