Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: admin server gives scheduled GC job "Retrying" status #95712

Closed
ericharmeling opened this issue Jan 24, 2023 · 3 comments · Fixed by #97505
Closed

sql: admin server gives scheduled GC job "Retrying" status #95712

ericharmeling opened this issue Jan 24, 2023 · 3 comments · Fixed by #97505
Assignees
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.

Comments

@ericharmeling
Copy link
Contributor

ericharmeling commented Jan 24, 2023

The query behind the jobs endpoint of the admin server assigns a retrying status to all jobs that meet the following condition:
status='running' AND next_run > now() AND num_runs > 1
or
status='reverting' AND next_run > now() AND num_runs > 1

We need to either change the logic defining the retry*Condition here, or, further down, not assign a status of running to jobs that are scheduled but have not yet started.

Background thread: https://cockroachlabs.slack.com/archives/CUVEFUU3C/p1674507323054389

Jira issue: CRDB-23683

@ericharmeling ericharmeling added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-sql-observability labels Jan 24, 2023
@maryliag
Copy link
Contributor

Seems like other filters are having the same issue: https://cockroachlabs.slack.com/archives/C0159JK877C/p1674533214747149

@ericharmeling ericharmeling changed the title sql,ui: scheduled GC job shows up with "Retrying" status on Jobs Page, "Retrying" filter broken sql: admin server gives scheduled GC job "Retrying" status Jan 24, 2023
@xinhaoz xinhaoz self-assigned this Feb 15, 2023
@xinhaoz
Copy link
Member

xinhaoz commented Feb 21, 2023

@kevin-v-ngo What should be the suggested fix here in terms of what status to show? Are we just removing the forced 'retry' status mentioned above altogether (i.e. just show the status from the jobs page without interpretation).

@kevin-v-ngo
Copy link

I think it's odd that we aren't consistent with the internal table and SHOW command but I understand we have this behavior because we tried to be more specific into this "running" state.

It seems like this is more of a sub-status of "running" that we're trying to convey but it caused more confusion because of this inconsistency. In the near term for this issue, let's be consistent and remove this retry status.

Medium-longer term let's track adding in the UI that this 'running' state is retrying and then reflect this in the internal table as well.

xinhaoz added a commit to xinhaoz/cockroach that referenced this issue Feb 22, 2023
This commit removes the 'Retrying' status from the jobs UX.
Previously, we were interpolating this status from the running
status. This just added confusion and incorectness to the status
of the job being displayed. The status being surfaced now aligns
directly with what is shown in the `crdb_internal.jobs` table.

Some missing job statuses were also added as request options to
the 'Status' dropdown, including:
- Pause Requested
- Cancel Requested
- Revert Failed

Fixes: cockroachdb#95712

Release note (ui change): Retrying is no longer a status shown
in the jobs page.
xinhaoz added a commit to xinhaoz/cockroach that referenced this issue Feb 27, 2023
This commit removes the 'Retrying' status from the jobs UX.
Previously, we were interpolating this status from the running
status. This just added confusion and incorectness to the status
of the job being displayed. The status being surfaced now aligns
directly with what is shown in the `crdb_internal.jobs` table.

Some missing job statuses were also added as request options to
the 'Status' dropdown, including:
- Pause Requested
- Cancel Requested
- Revert Failed

Fixes: cockroachdb#95712

Release note (ui change): Retrying is no longer a status shown
in the jobs page.
craig bot pushed a commit that referenced this issue Feb 27, 2023
97465: c2c: gather perf metrics from prometheus r=stevendanna a=msbutler

c2c roachtest performance metrics are now gathered by a prom/grafana instance running locally on the roachprod cluster. This change allows us to gather and process any metrics exposed to the crdb prom endpoint. Specifically, we now gather: `capacity_used`, `replication_logical_bytes`, `replication_sst_bytes` at various points during the c2c roachtest, allowing us to measure:
- Initial Scan Throughput: initial scan size / initial scan duration
- Workload Throughput: data ingested during workload / workload duration
- Cutover Throughput: (data ingested between cutover time and cutover cmd) / (cutover process duration)

where the size of these operations can be measured as either physical replicated bytes, logical ingested bytes, or physical ingested bytes on the source cluster.

This patch also fixes a recent bug which mislabeled src cluster throughput as initial scan throughput.

Informs #89176

Release note: None

97505: server, ui: remove interpreted jobs retrying status  r=xinhaoz a=xinhaoz

This commit removes the 'Retrying' status from the jobs UX.
Previously, we were interpolating this status from the running
status. This just added confusion and incorectness to the status
of the job being displayed. The status being surfaced now aligns
directly with what is shown in the `crdb_internal.jobs` table.

Some missing job statuses were also added as request options to
the 'Status' dropdown, including:
- Pause Requested
- Cancel Requested
- Revert Failed

Fixes: #95712

Release note (ui change): Retrying is no longer a status shown
in the jobs page.

<img width="1326" alt="image" src="https://user-images.githubusercontent.com/20136951/220738075-733b0cc8-9f77-4ace-a944-3791ff159c62.png">


Co-authored-by: Michael Butler <[email protected]>
Co-authored-by: Xin Hao Zhang <[email protected]>
@craig craig bot closed this as completed in 2eca521 Feb 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants