ui,jobs: improve jobs overview page in DBConsole #68179
Comments
cc @sajjadrizvi
I am still working on adding the exponential backoff, which is a prerequisite for job metrics. I expect it to take another day or two to complete. After that, I can start adding the metrics required for observability, so my ETA is the end of next week.
I am thinking about the following. We add three columns … In addition, we add a column that provides insight into each job's lifecycle. A user should be able to see that a job has transitioned from … To achieve that, we can add a repeated structure to the jobs proto that keeps track of job history: …
Each time a job runs … This structure can be used to populate a column … One question is how many log entries to show a user. I think we should show only one log entry by default and provide an option to display N entries in reverse chronological order.
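The display rule proposed above (show one entry by default, or the newest N in reverse chronological order) can be sketched as follows. This is a minimal illustration only: the `TransitionLogEntry` shape and the `latestEntries` name are hypothetical and are not taken from the jobs proto.

```typescript
// Hypothetical shape of one transition-log entry; field names are
// illustrative only, not the actual jobs proto.
interface TransitionLogEntry {
  status: string; // e.g. "running", "reverting"
  timestampMs: number; // when the transition was recorded (epoch ms)
}

// Returns the newest `n` entries, newest first. `n` defaults to 1,
// matching the proposal to show a single entry unless more are requested.
function latestEntries(
  entries: readonly TransitionLogEntry[],
  n = 1,
): TransitionLogEntry[] {
  if (n <= 0) {
    return [];
  }
  // Copy before sorting so the caller's array is not mutated.
  return entries
    .slice()
    .sort((a, b) => b.timestampMs - a.timestampMs)
    .slice(0, n);
}
```

Sorting a copy keeps the stored log in write order, so the same data can back both the single-entry default and the expanded view.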
Can you be more concrete about when these entries are actually written? I'm having a hard time understanding what happens before Resume/OnFailOrCancel and what happens after, in terms of writes to the jobs table. Also, let's somewhat separate the discussion about what values we write to system.jobs from the implied changes to crdb_internal.jobs. The latter is a function of the former, so they are related.
Currently a resumer runs in states … In … In the cases of …
That sounds good. In addition, I think it'd be nice to have the coordinator ID. It'd be good to have an invariant that when we write …
Great, that's a good suggestion.
That's precisely what we want.
Maybe not! There is an exception. We update …
Another question is how to add the transition logs to the jobs table. Off the top of my head, I am thinking of adding a column …
An alternative is to have a separate table with the job ID and four columns for each log entry. I don't think that's the best way to go.
First, when you say jobs table, please indicate which jobs table. I don't think separate rows are going to lead to a happy outcome. My guess is that the right answer is to use JSON.
OK, I meant crdb_internal.jobs. Sorry for the confusion.
Also, I meant that a row has a string that contains the list of entries. JSON seems appropriate here.
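To illustrate the JSON approach, here is a sketch that stores a job's whole log as one JSON string per row, rather than one row per entry, and decodes it back. The `ExecutionLogEntry` fields (an event kind, a status, a timestamp, and an optional error) are hypothetical placeholders loosely mirroring the "four values per log entry" idea. They are not the actual column format.

```typescript
// Hypothetical execution-log entry; the four fields loosely mirror the
// "four columns per log entry" idea, but the names are illustrative.
interface ExecutionLogEntry {
  event: "start" | "end";
  status: string;
  timestamp: string; // ISO-8601, UTC
  error?: string;
}

// Encodes all entries for one job as a single JSON string, so each row
// carries its whole log in one column.
function encodeLog(entries: readonly ExecutionLogEntry[]): string {
  return JSON.stringify(entries);
}

// Decodes the column value back into entries. A NULL or empty column
// yields an empty log; anything other than a JSON array is rejected.
function decodeLog(column: string | null): ExecutionLogEntry[] {
  if (column === null || column === "") {
    return [];
  }
  const parsed: unknown = JSON.parse(column);
  if (!Array.isArray(parsed)) {
    throw new Error("execution log column is not a JSON array");
  }
  return parsed as ExecutionLogEntry[];
}
```

Keeping the log in a single JSON value means that reading a job's history needs no join and no per-entry rows.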
This commit adds a transition_logs column to the crdb_internal.jobs table. Moreover, it adds tests to validate the correctness of values accessed through crdb_internal.jobs. Release note: None Fixes: cockroachdb#68179
Synced with Andrew: for this issue, we'll track adding the 3 metrics (LAST EXECUTION TIME, NEXT EXECUTION TIME, and EXECUTION COUNT) to the jobs overview page. Error details will be tracked in #69170. CC'ing @Annebirzin @vy-ton as FYI
68995: sql: add columns in jobs virtual table for overview in DBConsole r=ajwerner a=sajjadrizvi This commit adds new columns to the `crdb_internal.jobs` table, which show the current exponential-backoff state of a job and its execution history. Release justification: This commit adds low-risk updates to new functionality. The jobs subsystem now supports job retries with exponential backoff. We want to give users more insight into the backoff state of jobs and jobs' lifecycles through additional columns in the `crdb_internal.jobs` table. Release note (general change): The functionality to retry failed jobs with exponential backoff was introduced recently. This commit adds new columns to the `crdb_internal.jobs` table, which show the current backoff state of a job and its execution log. The execution log consists of a sequence of job start and end events and any associated errors that were encountered during each of the job's executions. Users can now query the internal jobs table to get more insight into jobs through the following columns: (a) `last_run` shows the last execution time of a job, (b) `next_run` shows the next execution time of a job based on the exponential-backoff delay, (c) `num_runs` shows the number of times the job has been executed, and (d) `execution_log` provides a set of events that are generated when a job starts and ends its execution. Relates to #68179
69044: storageccl: remove non-ReturnSST ExportRequest r=dt a=dt Release justification: bug fix in new functionality. Release note: none.
69239: roachtest: move roachtest stress CI job instructions to README r=tbg,stevendanna a=erikgrinaker Release justification: non-production code changes Release note: None
69285: roachtest: increase consistency check timeout, and ignore errors r=tbg a=erikgrinaker This bumps the consistency check timeout to 5 minutes. There are indications that a recent libpq upgrade unmasked previously ignored context cancellation errors, caused by the timeout here being too low. It also ignores errors during the consistency check, since it is best-effort anyway. Resolves #68883. Release justification: non-production code changes Release note: None
Co-authored-by: Sajjad Rizvi Co-authored-by: David Taylor Co-authored-by: Erik Grinaker
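The `next_run` column is described as the next execution time based on an exponential-backoff delay. Below is a minimal sketch of how such a value can be derived from `last_run` and `num_runs`, assuming a delay that doubles after each run and is capped. The initial delay, multiplier, and cap are illustrative placeholders, not the jobs registry's actual settings, and the function names are hypothetical.

```typescript
// Illustrative backoff parameters (assumptions, not CockroachDB defaults).
const INITIAL_DELAY_MS = 30000; // delay after the first run
const MULTIPLIER = 2; // growth factor per additional run
const MAX_DELAY_MS = 24 * 60 * 60 * 1000; // cap: one day

// Delay before the next attempt, given how many times the job has run:
// numRuns = 1 -> INITIAL_DELAY_MS, 2 -> 2x, 3 -> 4x, ..., capped at MAX_DELAY_MS.
function backoffDelayMs(numRuns: number): number {
  if (numRuns < 1) {
    return 0; // never run: eligible immediately
  }
  const delay = INITIAL_DELAY_MS * Math.pow(MULTIPLIER, numRuns - 1);
  return Math.min(delay, MAX_DELAY_MS);
}

// next_run = last_run + backoff delay (under the assumptions above).
function nextRun(lastRun: Date, numRuns: number): Date {
  return new Date(lastRun.getTime() + backoffDelayMs(numRuns));
}
```

The cap keeps `next_run` within a bounded distance of `last_run` even for jobs that have been retried many times.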
Filed https://github.com/cockroachdb/ui/issues/395 because the Tooltip component is mis-centered for inline elements (in this case, the status badge), and because the solution is more involved than I had thought. Edit: The new design has a fairly full table cell, so I centered the tooltip around the entire cell, and the bug no longer blocks this issue.
@Annebirzin, just want to re-ping you about how we should indicate that a job is running/reverting but also retrying. I like Marylia's suggestion about the hyphenated statuses, though that raises the question of what to do about the % and time-remaining visualization that would usually show on running jobs. The definition for a job that is retrying is … @ajwerner, assuming you have no objections, I'm going to modify the endpoint as Marylia suggested (#68179 (comment)) to send …
@jocrl I wonder if for … Does that make sense?
@Annebirzin That makes sense! For consistency, do you think …
@jocrl I do like the last row where they look like two separate badges. Maybe just a bit of space between them and remove the …
@ajwerner if …
… in the DBConsole Jobs Overview page Fixes cockroachdb#68179 [wip] Tests are still a work in progress; I just wanted to get people's thoughts in the meantime! Only tests are missing. This commit surfaces the status `reverting`, annotates the existing `running` and `reverting` statuses in the UI with "retrying" where applicable, and adds the "Last Execution Time (UTC)" and "Execution Count" columns to the jobs overview table in DBConsole. "Retrying" is defined as `status IN ('running', 'reverting') AND next_run > now() AND num_runs > 1`. Hovering over a retrying status shows the next execution time. The "Status" column was also moved left to become the second column. Filtering with the dropdown by `Status: Running` or `Status: Reverting` includes jobs that are also "retrying". Users can also filter by `Status: Retrying`. The `/jobs` endpoint was modified to add the `last_run`, `next_run`, and `num_runs` fields required for the UI change. Jobs with status `running` or `reverting` that are also "retrying" have their statuses sent as `retry-running` and `retry-reverting`, respectively. The endpoint was also modified to support the value `retrying` for the `status` query parameter. This commit also adds a storybook story for the jobs table, which showcases the different possible statuses in permutations of information that could be present for the `running` status. Release note (ui change): The jobs overview table in DBConsole now shows when jobs have the status "reverting", and shows the badge "retrying" when running or reverting jobs are also retrying. Hovering over the status of a "retrying" job shows the "Next execution time" in UTC. Two new columns, "Last Execution Time (UTC)" and "Execution Count", were also added to the jobs overview table in DBConsole, and the "Status" column was moved left to the second column in the table. The `status` query parameter in the `/jobs` endpoint now supports the values `reverting` and `retrying`.
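The "retrying" predicate and the status relabeling described in this commit can be sketched as below. The `JobRow` type and the function names are hypothetical. Only the predicate and the `retry-running`/`retry-reverting` values come from the commit message. The sketch takes `now` as a parameter so the logic is deterministic, and it treats a missing `next_run` as not retrying, mirroring SQL's NULL comparison semantics.

```typescript
// Hypothetical subset of a job row as returned by the /jobs endpoint.
interface JobRow {
  status: string; // "running", "reverting", "succeeded", ...
  nextRun: Date | null; // null when next_run is NULL
  numRuns: number;
}

// status IN ('running', 'reverting') AND next_run > now() AND num_runs > 1
function isRetrying(job: JobRow, now: Date): boolean {
  return (
    (job.status === "running" || job.status === "reverting") &&
    job.nextRun !== null &&
    job.nextRun.getTime() > now.getTime() &&
    job.numRuns > 1
  );
}

// Status string sent to the UI: retrying jobs get a "retry-" prefix,
// producing "retry-running" or "retry-reverting".
function displayStatus(job: JobRow, now: Date): string {
  return isRetrying(job, now) ? `retry-${job.status}` : job.status;
}
```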
… in the DBConsole Jobs Overview page Fixes cockroachdb#68179 This commit surfaces the status `reverting`, annotates the existing `running` and `reverting` statuses in the UI with "retrying" where applicable, and adds the "Last Execution Time (UTC)" and "Execution Count" columns to the jobs overview table in DBConsole. "Retrying" is defined as `status IN ('running', 'reverting') AND next_run > now() AND num_runs > 1`. Hovering over a retrying status shows the next execution time. The "Status" column was also moved left to become the second column. Filtering with the dropdown by `Status: Running` or `Status: Reverting` includes jobs that are also "retrying". Users can also filter by `Status: Retrying`. The `/jobs` endpoint was modified to add the `last_run`, `next_run`, and `num_runs` fields required for the UI change. Jobs with status `running` or `reverting` that are also "retrying" have their statuses sent as `retry-running` and `retry-reverting`, respectively. The endpoint was also modified to support the value `retrying` for the `status` query parameter. This commit also adds a storybook story for the jobs table, which showcases the different possible statuses in permutations of information that could be present for the `running` status. Release note (ui change): The jobs overview table in DBConsole now shows when jobs have the status "reverting", and shows the badge "retrying" when running or reverting jobs are also retrying. Hovering over the status of a "retrying" job shows the "Next execution time" in UTC. Two new columns, "Last Execution Time (UTC)" and "Execution Count", were also added to the jobs overview table in DBConsole, and the "Status" column was moved left to the second column in the table. The `status` query parameter in the `/jobs` endpoint now supports the values `reverting` and `retrying`.
72291: ui/db-console: surface more job metrics around reverting and retrying in the DBConsole Jobs Overview page r=jocrl a=jocrl Fixes #68179 This commit surfaces the status `reverting`, annotates the existing `running` and `reverting` statuses in the UI with "retrying" where applicable, and adds the "Last Execution Time (UTC)" and "Execution Count" columns to the jobs overview table in DBConsole. "Retrying" is defined as `status IN ('running', 'reverting') AND next_run > now() AND num_runs > 1`. Hovering over a retrying status shows the next execution time. The "Status" column was also moved left to become the second column. Filtering with the dropdown by `Status: Running` or `Status: Reverting` includes jobs that are also "retrying". Users can also filter by `Status: Retrying`. The `/jobs` endpoint was modified to add the `last_run`, `next_run`, and `num_runs` fields required for the UI change. Jobs with status `running` or `reverting` that are also "retrying" have their statuses sent as `retry-running` and `retry-reverting`, respectively. The endpoint was also modified to support the value `retrying` for the `status` query parameter. This commit also adds a storybook story for the jobs table, which showcases the different possible statuses in permutations of information that could be present for the `running` status. Release note (ui change): The jobs overview table in DBConsole now shows when jobs have the status "reverting", and shows the badge "retrying" when running or reverting jobs are also retrying. Hovering over the status of a "retrying" job shows the "Next execution time" in UTC. Two new columns, "Last Execution Time (UTC)" and "Execution Count", were also added to the jobs overview table in DBConsole, and the "Status" column was moved left to the second column in the table. The `status` query parameter in the `/jobs` endpoint now supports the values `reverting` and `retrying`.
Jobs table (screenshot): https://user-images.githubusercontent.com/91907326/141374430-bfad72de-aa2d-4cbb-98ef-62ddf5f98f4a.png Filter and hover (video): https://user-images.githubusercontent.com/91907326/141375153-2cf2641a-33a1-4bfb-a900-a187dc5579a1.mov Permutations of running jobs with present/absent combinations of time remaining, running message, or retrying (screenshot): https://user-images.githubusercontent.com/91907326/141374527-124a86c0-d10d-451f-b8dc-f745d52fe6d4.png Co-authored-by: Josephine Lee
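The filter behavior described in this change can be expressed as a small predicate over the status strings the endpoint sends. Under this behavior, a `Running` or `Reverting` filter also matches jobs that are retrying, and a `Retrying` filter matches both retrying variants. This is a sketch only: it assumes filter values and statuses are compared as plain lowercase strings, and `matchesStatusFilter` is a hypothetical name.

```typescript
// Prefix the endpoint adds to running/reverting jobs that are retrying.
const RETRY_PREFIX = "retry-";

// Returns whether a job status as sent by the /jobs endpoint (possibly
// "retry-running" or "retry-reverting") matches a status filter value.
function matchesStatusFilter(jobStatus: string, filter: string): boolean {
  const isRetry = jobStatus.indexOf(RETRY_PREFIX) === 0;
  // Underlying status with any "retry-" annotation stripped.
  const base = isRetry ? jobStatus.slice(RETRY_PREFIX.length) : jobStatus;
  if (filter === "retrying") {
    return isRetry;
  }
  // "running" matches "running" and "retry-running"; likewise "reverting".
  return base === filter;
}
```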
In #44594, the Schema team is adding a retry mechanism to the jobs infrastructure. As part of this work, we plan to add columns to `crdb_internal.jobs` to surface more job metrics. These new metrics would be helpful in the DBConsole Jobs Overview page.
FYI @ajwerner @sajjadrizvi
Epic CRDB-7912