jobsprofiler: enable requesting a job's execution details #105384
Conversation
This change adds a new component to the `Profiler` tab of the job details page that supports collecting and viewing job profiler bundles. The component has a button to collect a job profiler bundle; collected bundles are then listed in a sorted table with the ability to download each one. The above operations are backed by the infrastructure added in cockroachdb#105384. Note that the `Profiler` tab is currently disabled for CC, but this change allows a future project to enable the collection of bundles through the CC console as well.

Informs: cockroachdb#105076

Release note (ui change): collect and download job profiler bundles from the `Profiler` tab on the job details page.
Force-pushed from 682e9f7 to 3b49e79
Overall looks good to me. One question I had, though: do we even need to request/persist/fetch the generated bundle to job_info, or could the bundle fetch endpoint just generate it on the fly, since it is generated from job state that is already persisted?
Not all the information in the bundle is going to be persisted to job state, for example active tracing spans of a job or goroutine stacks at the time the bundle was collected. Separating the request/persist step from the fetch also allows us to download older bundles later if we want to see the state of the job at different points in time. https://www.loom.com/share/4d0ff8ffe53b4e09bf8f0de1009c066e?sid=7b1bb121-b8b4-4215-8b5c-12f103f482df is a prototype of how I want the bundles to be listed: when you request a bundle it shows up in the table, and it can then be downloaded at any point in the future.
Force-pushed from 1e70b8e to 5cde323
Friendly ping @dt with the updated approach discussed offline.
	return errors.Wrapf(err, "failed to compress chunk for file %s", filename)
}

// On listing we want the info_key of each chunk to sort after the
A couple nits:
- ints formatted with %d don't sort lexicographically in numeric order (e.g. "10" sorts before "9")
- UnixNano isn't using the monotonic clock

I wonder if we should just use a loop counter that starts at zero and goes up, and whether we should give the last chunk a well-known name so that the reader can verify they got all chunks. I might just say: use a loop counter that starts at 0 and print it with %04d.
in MakeProfilerBundleChunkKey
Nice, %04d was what I had forgotten about. Changed to a chunk counter, and I prefix the last chunk with `_final`.
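A minimal sketch of that scheme (the key format and helper below are illustrative; the real `MakeProfilerBundleChunkKey` may differ): a zero-padded loop counter keeps the chunk keys in write order when listed lexicographically, and a well-known marker on the last chunk lets a reader verify it received every chunk.

```go
package main

import (
	"fmt"
	"sort"
)

// makeChunkKey is a hypothetical stand-in for MakeProfilerBundleChunkKey: a
// zero-padded counter sorts lexicographically in write order (a plain %d
// would place chunk "10" before chunk "9"), and the last chunk gets a
// well-known marker so a reader can tell the set is complete.
func makeChunkKey(filename string, chunk int, final bool) string {
	if final {
		return fmt.Sprintf("%s_final_chunk_%04d", filename, chunk)
	}
	return fmt.Sprintf("%s_chunk_%04d", filename, chunk)
}

func main() {
	const numChunks = 12
	keys := make([]string, 0, numChunks)
	for i := 0; i < numChunks; i++ {
		keys = append(keys, makeChunkKey("distsql.html", i, i == numChunks-1))
	}
	// Sorting the keys (as a scan over the info keys would) returns the chunks
	// in the order they were written, with the final marker sorting last.
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Println(k)
	}
}
```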
Also caught a potential txn retry bug where we were mutating data
inside the closure. Now we take a copy and operate on that.
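A minimal sketch of that pattern, with a hypothetical `runTxn` helper standing in for the real transaction runner: because the closure can be re-executed on a retryable error, it must not mutate state captured from the enclosing scope; it copies the data and mutates only the copy.

```go
package main

import (
	"errors"
	"fmt"
)

// errRetryable stands in for a retryable transaction error.
var errRetryable = errors.New("retryable txn error")

// runTxn is a hypothetical stand-in for a KV transaction runner: it re-invokes
// the closure until it returns something other than a retryable error.
func runTxn(fn func() error) error {
	for {
		if err := fn(); !errors.Is(err, errRetryable) {
			return err
		}
	}
}

func main() {
	chunks := [][]byte{[]byte("a"), []byte("b")}

	attempt := 0
	_ = runTxn(func() error {
		// Copy the captured slice inside the closure; a retry re-runs the
		// closure from the top, so appending to `chunks` directly would
		// compound across attempts.
		local := append([][]byte(nil), chunks...)
		local = append(local, []byte("trailer"))

		attempt++
		if attempt == 1 {
			return errRetryable // simulate a retry after mutating only the copy
		}
		fmt.Println(len(chunks), len(local)) // 2 3: the original is untouched
		return nil
	})
}
```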
TFTR! bors r=dt
Build succeeded.
In cockroachdb#105384 we added infrastructure to request and store execution details for a job. This currently only includes the DistSQL diagram generated during a job's execution. Going forward this will include several files such as traces, goroutine dumps, profiles, etc.

This change introduces an endpoint that allows listing all such files that are available for consumption. This list will be displayed on the job details page, allowing the user to download any subset of the files collected during job execution.

Informs: cockroachdb#105076

Release note: None
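A rough sketch of what listing could look like under the hypothetical chunk-key scheme from the earlier example (not the actual job_info layout): the endpoint walks the job's info keys in sorted order and collapses per-chunk keys back into one logical file name each, which is what the job details page would display.

```go
package main

import (
	"fmt"
	"regexp"
)

// chunkSuffix matches the hypothetical "_chunk_NNNN" / "_final_chunk_NNNN"
// suffixes used when a file is persisted in multiple chunks.
var chunkSuffix = regexp.MustCompile(`(_final)?_chunk_\d{4}$`)

// listExecutionDetailFiles collapses per-chunk info keys into the logical file
// names to show on the job details page. Keys are assumed to arrive in sorted
// order, so chunks of the same file are adjacent.
func listExecutionDetailFiles(infoKeys []string) []string {
	var files []string
	for _, key := range infoKeys {
		name := chunkSuffix.ReplaceAllString(key, "")
		if len(files) == 0 || files[len(files)-1] != name {
			files = append(files, name)
		}
	}
	return files
}

func main() {
	keys := []string{
		"distsql.html_chunk_0000",
		"distsql.html_final_chunk_0001",
		"resumer-trace-1_chunk_0000",
		"resumer-trace-1_final_chunk_0001",
	}
	fmt.Println(listExecutionDetailFiles(keys)) // [distsql.html resumer-trace-1]
}
```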
106629: sql,server: add endpoint to list a job's execution details r=dt a=adityamaru

In #105384 we added infrastructure to request and store execution details for a job. This currently only includes the DistSQL diagram generated during a job's execution. Going forward this will include several files such as traces, goroutines, profiles etc. This change introduces an endpoint that allows listing all such files that are available for consumption. This list will be displayed on the job details page allowing the user to download any subset of the files collected during job execution.

Informs: #105076

Release note: None

Co-authored-by: adityamaru <[email protected]>
In cockroachdb#105384 and cockroachdb#106629 we added support to collect and list files that had been collected as part of a job's execution details. These files are meant to provide improved observability into the state of a job.

This change is the first of a few that exposes these endpoints on the DBConsole job details page. It only adds support for listing files that have been requested as part of a job's execution details. A future change will add support to request these files, sort them and download them from the job details page. This page is not available on the Cloud Console as it is meant for advanced debugging.

Informs: cockroachdb#105076

Release note (ui change): add a table to the Profiler job details page that lists all the available files describing a job's execution details
This change teaches the job resumer to fetch and write its trace recording before finishing its tracing span. These traces will be a part of the execution detail files introduced in cockroachdb#105384, and will be valuable in understanding a job's execution characteristics during each resumption, even if the job has reached a terminal state. Currently, this behaviour is opt-in and has been enabled for backup, restore, import and physical replication jobs.

Informs: cockroachdb#102794

Release note: None
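A minimal sketch of the behaviour described above; the span interface and helpers here are placeholders rather than CockroachDB's actual tracing or job_info APIs: the recording is fetched before the span is finished (a finished span has nothing left to fetch) and persisted as one of the job's execution-detail files, keyed by resumption so earlier traces are retained.

```go
package main

import (
	"context"
	"fmt"
)

// span is a placeholder for the resumer's tracing span; GetRecording stands in
// for whatever the real tracer exposes to snapshot the span's recording.
type span interface {
	GetRecording() string
	Finish()
}

// writeExecutionDetailFile is a placeholder for persisting a named file among
// the job's execution details (job_info in the real change).
func writeExecutionDetailFile(ctx context.Context, jobID int64, name string, data []byte) error {
	fmt.Printf("persisting %q (%d bytes) for job %d\n", name, len(data), jobID)
	return nil
}

// dumpTraceOnResumerCompletion captures the trace recording *before* the
// resumer's span is finished and stores it, keyed by resumption, so traces
// from earlier resumptions of the same job remain downloadable.
func dumpTraceOnResumerCompletion(ctx context.Context, jobID int64, resumption int, sp span) {
	defer sp.Finish()
	rec := sp.GetRecording()
	name := fmt.Sprintf("resumer-trace-%d", resumption)
	if err := writeExecutionDetailFile(ctx, jobID, name, []byte(rec)); err != nil {
		// Trace collection is best-effort; failing to persist it should not
		// fail the job.
		fmt.Println("failed to persist trace:", err)
	}
}

type fakeSpan struct{}

func (fakeSpan) GetRecording() string { return "<verbose trace recording>" }
func (fakeSpan) Finish()              {}

func main() {
	dumpTraceOnResumerCompletion(context.Background(), 42, 1, fakeSpan{})
}
```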
105368: backupccl: add unit tests for FileSSTSink r=rhu713 a=rhu713

Backfill unit tests for the basic functionality of FileSSTSink with additional test cases involving inputs of keys with many entries in their revision history.

Epic: CRDB-27758

Release note: None

105624: jobsprofiler: dump trace recording on job completion r=dt a=adityamaru

This change teaches the job resumer to fetch and write its trace recording before finishing its tracing span. These traces will be consumed by the job profiler bundle that is being introduced in #105384. They will be valuable in understanding a job's execution characteristics during each resumption, even if the job has reached a terminal state. Currently, this behaviour is opt-in and has been enabled for backup, restore, import and physical replication jobs.

Informs: #102794

Release note: None

106515: DEPS: bump across etcd-io/raft#81 and disable conf change validation r=erikgrinaker a=tbg

We don't want raft to validate conf changes, since that causes issues due to false positives (the check is above raft, but needs to be below raft to always work correctly). We are taking responsibility for carrying out only valid conf changes, as we always have. See also etcd-io/raft#80.

Fixes #105797.

Epic: CRDB-25287

Release note (bug fix): under rare circumstances, a replication change could get stuck when proposed near lease/leadership changes (and likely under overload), and the replica circuit breakers could trip. This problem has been addressed. Note to editors: this time it's really addressed (fingers crossed); a previous attempt with an identical release note had to be reverted.

106939: changefeedccl: fix flake in TestParquetRows r=miretskiy a=jayshrivastava

Previously, this test would flake when rows were not emitted in the exact order they were inserted/modified. This change makes the test resilient to different ordering.

Epic: None

Fixes: #106911

Release note: None

---

util/parquet: make metadata transparent in tests

Previously, users of the library would need to explicitly call `NewWriterWithReaderMetadata()` to configure the parquet writer to add the metadata required to use the reader utils in `pkg/util/parquet/testutils.go`. This led to a lot of unnecessary code duplication. This moves the logic that decides whether metadata should be written into `NewWriter()` so callers do not need to do the extra work.

Epic: None

Release note: None

Co-authored-by: Rui Hu <[email protected]>
Co-authored-by: adityamaru <[email protected]>
Co-authored-by: Tobias Grieger <[email protected]>
Co-authored-by: Jayant Shrivastava <[email protected]>
In cockroachdb#105384 and cockroachdb#106629 we added support to collect and list files that had been collected as part of a job's execution details. These files are meant to provide improved observability into the state of a job.

This change is the first of a few that exposes these endpoints on the DBConsole job details page. This change only adds support for listing files that have been requested as part of a job's execution details. A follow-up change will add support to request these files, sort them and download them from the job details page. This page is not available on the Cloud Console as it is meant for advanced debugging.

This change also renames the `Profiler` tab to `Advanced Debugging` as the users of this tab are going to be internal CRDB support and engineering for the time being.

Informs: cockroachdb#105076

Release note (ui change): add a table to the Profiler job details page that lists all the available files describing a job's execution details
106879: jobs: add table to display execution details r=maryliag a=adityamaru

In #105384 and #106629 we added support to collect and list files that had been collected as part of a job's execution details. These files are meant to provide improved observability into the state of a job.

This change is the first of a few that exposes these endpoints on the DBConsole job details page. This change only adds support for listing files that have been requested as part of a job's execution details. A future change will add support to request these files, sort them and download them from the job details page. This page is not available on the Cloud Console as it is meant for advanced debugging.

Informs: #105076

Release note (ui change): add a table to the Profiler job details page that lists all the available files describing a job's execution details

[screenshots: execution details table on the job details page]

107236: sql: use txn.NewBatch instead of &kv.Batch{} r=fqazi a=rafiss

This will make these requests properly pass along the admission control headers.

Informs #79212

Epic: None

Release note: None

107447: sql: fix CREATE MATERIALIZED VIEW AS schema change job description r=fqazi a=ecwall

Fixes #107445

This changes the CREATE MATERIALIZED VIEW AS schema change job description SQL syntax. For example

```
CREATE VIEW "v" AS "SELECT t.id FROM movr.public.t";
```

becomes

```
CREATE MATERIALIZED VIEW defaultdb.public.v AS SELECT t.id FROM defaultdb.public.t WITH DATA;
```

Release note (bug fix): Fix CREATE MATERIALIZED VIEW AS schema change job description SQL syntax.

Co-authored-by: adityamaru <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>
Co-authored-by: Evan Wall <[email protected]>
Similar to statement bundles, this change introduces the
infrastructure to request, collect and read the execution
details for a particular job.
Right now, the execution details will only contain the
latest DSP diagram for a job, but going forward this will
give us a place to dump raw files such as:
- cluster-wide job traces
- cpu profiles
- trace-driven aggregated stats
- raw payload and progress protos
Downloading some or all of these execution details will be
exposed in a future patch in all of the places where
statement bundles are today:
- DBConsole
- CLI shell
- SQL shell
This change introduces a builtin that allows the caller
to request the collection and persistence of a job's
current execution details.
This change also introduces a new endpoint on the status
server to read the data corresponding to the execution details
persisted for a job. The next set of
PRs will add the necessary components to allow downloading
the files from the DBConsole.
Informs: #105076
Release note: None