sql: admin server gives scheduled GC job "Retrying" status #95712

ericharmeling · 2023-01-24T02:08:19Z

The query behind the jobs endpoint of the admin server assigns a retrying status to all jobs that meet the following condition:
status='running' AND next_run > now() AND num_runs > 1
or
status='reverting' AND next_run > now() AND num_runs > 1

We need to either change the logic defining the retry*Condition here, or, further down, not assign a status of running to jobs that are scheduled but have not yet started.
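To make the effect of the condition concrete, here is a minimal Python sketch of the relabeling described above. It is not the admin server's actual code: the function name `displayed_status` and the sample values are hypothetical, and only the three-part condition itself is taken from this issue.

```python
from datetime import datetime, timedelta, timezone


def displayed_status(status, next_run, num_runs, now):
    """Illustrative only: apply the condition quoted in this issue."""
    if status in ("running", "reverting") and next_run > now and num_runs > 1:
        return "retrying"
    return status


now = datetime(2023, 1, 24, 2, 0, tzinfo=timezone.utc)

# Hypothetical job that has run before and is waiting for a future run:
# it is stored as 'running', so the condition relabels it as 'retrying'.
scheduled = displayed_status("running", now + timedelta(hours=1), 5, now)

# Hypothetical job on its first run: the condition does not match.
first_run = displayed_status("running", now - timedelta(minutes=1), 1, now)

print(scheduled)
print(first_run)
```

Under these assumptions, any job stored as `running` with a future `next_run` and more than one prior run is reported as retrying, whether or not a failure occurred.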

Background thread: https://cockroachlabs.slack.com/archives/CUVEFUU3C/p1674507323054389

Jira issue: CRDB-23683


maryliag · 2023-01-24T14:33:30Z

Seems like other filters are having the same issue: https://cockroachlabs.slack.com/archives/C0159JK877C/p1674533214747149

xinhaoz · 2023-02-21T15:03:50Z

@kevin-v-ngo What should the suggested fix be here in terms of which status to show? Are we just removing the forced 'retry' status mentioned above altogether (i.e. just showing the status from the jobs table without interpretation)?

kevin-v-ngo · 2023-02-21T22:45:14Z

I think it's odd that we aren't consistent with the internal table and the SHOW command, but I understand we have this behavior because we tried to be more specific about this "running" state.

It seems like this is more of a sub-status of "running" that we're trying to convey but it caused more confusion because of this inconsistency. In the near term for this issue, let's be consistent and remove this retry status.

Medium-longer term let's track adding in the UI that this 'running' state is retrying and then reflect this in the internal table as well.
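One way to read the sub-status idea is sketched below in Python, purely as an illustration. The `JobView` type, its `retrying` flag, and both helper functions are hypothetical; how a retry would actually be detected is left open here, as it is in this thread.

```python
from dataclasses import dataclass


@dataclass
class JobView:
    status: str     # raw status, unchanged from the jobs table
    retrying: bool  # hypothetical sub-status flag; its source is out of scope


def label(job):
    """Render the raw status, annotating 'running' when the job is retrying."""
    if job.status == "running" and job.retrying:
        return "running (retrying)"
    return job.status


def matches_filter(job, wanted_status):
    """Filter on the raw status only, so results stay consistent with the table."""
    return job.status == wanted_status


job = JobView(status="running", retrying=True)
print(label(job))
print(matches_filter(job, "running"))
```

The design point is that the stored status is never rewritten: filters keep working on the raw value, and the retry information is only an annotation on top of it.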

This commit removes the 'Retrying' status from the jobs UX. Previously, we were interpolating this status from the running status. This just added confusion and incorrectness to the status of the job being displayed. The status being surfaced now aligns directly with what is shown in the `crdb_internal.jobs` table.

Some missing job statuses were also added as request options to the 'Status' dropdown, including:
- Pause Requested
- Cancel Requested
- Revert Failed

Fixes: cockroachdb#95712

Release note (ui change): Retrying is no longer a status shown in the jobs page.

97465: c2c: gather perf metrics from prometheus r=stevendanna a=msbutler

c2c roachtest performance metrics are now gathered by a prom/grafana instance running locally on the roachprod cluster. This change allows us to gather and process any metrics exposed to the crdb prom endpoint. Specifically, we now gather `capacity_used`, `replication_logical_bytes`, and `replication_sst_bytes` at various points during the c2c roachtest, allowing us to measure:
- Initial Scan Throughput: initial scan size / initial scan duration
- Workload Throughput: data ingested during workload / workload duration
- Cutover Throughput: (data ingested between cutover time and cutover cmd) / (cutover process duration)

where the size of these operations can be measured as either physical replicated bytes, logical ingested bytes, or physical ingested bytes on the source cluster.

This patch also fixes a recent bug which mislabeled src cluster throughput as initial scan throughput.

Informs #89176

Release note: None

97505: server, ui: remove interpreted jobs retrying status r=xinhaoz a=xinhaoz

This commit removes the 'Retrying' status from the jobs UX. Previously, we were interpolating this status from the running status. This just added confusion and incorrectness to the status of the job being displayed. The status being surfaced now aligns directly with what is shown in the `crdb_internal.jobs` table.

Some missing job statuses were also added as request options to the 'Status' dropdown, including:
- Pause Requested
- Cancel Requested
- Revert Failed

Fixes: #95712

Release note (ui change): Retrying is no longer a status shown in the jobs page.

Co-authored-by: Michael Butler <[email protected]>
Co-authored-by: Xin Hao Zhang <[email protected]>

ericharmeling added the C-bug (Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.) and T-sql-observability labels Jan 24, 2023

ericharmeling changed the title ~~sql,ui: scheduled GC job shows up with "Retrying" status on Jobs Page, "Retrying" filter broken~~ sql: admin server gives scheduled GC job "Retrying" status Jan 24, 2023

xinhaoz self-assigned this Feb 15, 2023

kevin-v-ngo mentioned this issue Feb 21, 2023

Surface jobs that are "retrying" in the Jobs page UI #97426

Closed

xinhaoz mentioned this issue Feb 22, 2023

server, ui: remove interpreted jobs retrying status #97505

Merged

craig bot closed this as completed in 2eca521 Feb 27, 2023

cockroach-teamcity mentioned this issue Feb 28, 2023

PR #97505 - server, ui: remove interpreted jobs retrying status cockroachdb/docs#16376

Closed
