Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

jobs: surface job error details in the DB console #69170

Closed
kevin-v-ngo opened this issue Aug 19, 2021 · 4 comments · May be fixed by #81180
Closed

jobs: surface job error details in the DB console #69170

kevin-v-ngo opened this issue Aug 19, 2021 · 4 comments · May be fixed by #81180
Labels
A-sql-console-general SQL Observability issues on the DB console spanning multiple areas. Includes Cockroach Cloud Console C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-jobs

Comments

@kevin-v-ngo
Copy link

kevin-v-ngo commented Aug 19, 2021

We should surface job error details in the console to provide observability as to why jobs are degraded or running slower than expected.

Related to the following issue: #68179

We'd potentially need an Error list per job. The list would have the following columns:

Time of error (end_ts)
Error message (error_message)

Jira issue: CRDB-9476

Epic CRDB-20388

@kevin-v-ngo kevin-v-ngo added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-observability A-sql-console-general SQL Observability issues on the DB console spanning multiple areas. Includes Cockroach Cloud Console labels Aug 19, 2021
@ajwerner ajwerner changed the title Surface job error details in the DB console jobs: surface job error details in the DB console Aug 31, 2021
@Annebirzin
Copy link

@kevin-v-ngo @maryliag I've included designs for adding job errors to the jobs detail page here: https://www.figma.com/file/c6BNEn0mY6iHG1t5NhoH3E/20.1_obsrv_db-console-jobs?node-id=2011%3A9921

Let me know any thoughts

ajwerner added a commit to ajwerner/cockroach that referenced this issue Jan 27, 2022
This commit adds the structured information about job execution to the
crdb_internal table in addition to the flattened string array. It then
leverages this information in the admin server API.

Informs cockroachdb#69170

Release note (sql change): The crdb_internal.jobs table now has a new column
`execution_events` which is a structured json form of `execution_errors`.
ajwerner added a commit to ajwerner/cockroach that referenced this issue Jan 27, 2022
This commit adds the structured information about job execution to the
crdb_internal table in addition to the flattened string array. It then
leverages this information in the admin server API.

Informs cockroachdb#69170

As an example of crdb_internal.jobs for a retrying reverting job, you'll
see:

```
> select json_array_elements(execution_events) from crdb_internal.jobs where job_id = 731484911676784641;
                                                                                                                                             json_array_elements
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  {"executionEndMicros": "1643301928919757", "executionStartMicros": "1643301927541336", "instanceId": 1, "status": "reverting", "truncatedError": "scan with start key /Table/54/1/13110: root: memory budget exceeded: 532480 bytes requested, 133939200 currently allocated, 134217728 bytes in budget"}
  {"executionEndMicros": "1643302029260612", "executionStartMicros": "1643302028024034", "instanceId": 1, "status": "reverting", "truncatedError": "scan with start key /Table/54/1/13110: root: memory budget exceeded: 532480 bytes requested, 133867520 currently allocated, 134217728 bytes in budget"}
  {"executionEndMicros": "1643302255024877", "executionStartMicros": "1643302253353781", "instanceId": 1, "status": "reverting", "truncatedError": "scan with start key /Table/54/1/13110: root: memory budget exceeded: 532480 bytes requested, 133898240 currently allocated, 134217728 bytes in budget"}
```

The setup to make this example was to run a cluster with a small
`--max-sql-memory=128MiB` and then create a table:

```sql
create table foo (i int primary key, j int default (unique_rowid()) unique, b string);
insert into foo(i, b) select i, repeat('a', 1<<12) from generate_series(1, 100000) as t(i);
set sql_safe_updates=false;
alter table foo add column bb string as (i::string || b) stored unique, drop column j; -- retries forever
```

Release note (sql change): The crdb_internal.jobs table now has a new column
`execution_events` which is a structured json form of `execution_errors`.
craig bot pushed a commit that referenced this issue Jan 29, 2022
74443: bench/tpcc: add a go benchmark for tpcc queries r=ajwerner a=ajwerner

This commit introduces a package for running go benchmarks for tpcc queries
against tpcc data. I envisioned some more advanced combinations but for now
this still seems pretty handy. @nvanbenschoten has been fighting the good fight
of micro-optimizations against comparatively simple benchmarks, I presume,
because the iteration cycle is tight and the tooling is good.

This benchmark suite seeks to also provide a tight iteration cycle. It does, by
default, load TPCC data into cockroach once per `go test` invocation, but it
also has flags to allow re-use of a previously generated store directory.
The benchmark also runs the client portion of the workload out-of-process
in order to prevent it from mucking with the `--benchmem` numbers and profiles.

In order to run it to generate a new store directory in a stable location, you
use the following incantation:

```bash
go test -v ./pkg/bench/tpcc --benchmem --bench . --store-dir=/tmp/benchtpcc --generate-store-dir
```

Subsequent runs can then be run with:

```bash
go test -v ./pkg/bench/tpcc --benchmem --bench . --store-dir=/tmp/benchtpcc
```

A `--count=10` run provided stable results, which is encouraging that this will
be a useful optimization target. Did I mention we allocate a lot?

```
name                                                                      time/op
TPCC/mix=newOrder=1-16                                                    7.75ms ± 2%
TPCC/mix=payment=1-16                                                     4.29ms ± 6%
TPCC/mix=orderStatus=1-16                                                 3.36ms ± 3%
TPCC/mix=delivery=1-16                                                    39.7ms ± 3%
TPCC/mix=stockLevel=1-16                                                  2.84ms ± 7%
TPCC/mix=newOrder=10,payment=10,orderStatus=1,delivery=1,stockLevel=1-16  7.66ms ± 5%

name                                                                      alloc/op
TPCC/mix=newOrder=1-16                                                    2.23MB ± 3%
TPCC/mix=payment=1-16                                                      746kB ± 1%
TPCC/mix=orderStatus=1-16                                                 1.83MB ± 5%
TPCC/mix=delivery=1-16                                                    8.85MB ± 1%
TPCC/mix=stockLevel=1-16                                                   602kB ± 0%
TPCC/mix=newOrder=10,payment=10,orderStatus=1,delivery=1,stockLevel=1-16  1.82MB ± 2%

name                                                                      allocs/op
TPCC/mix=newOrder=1-16                                                     15.8k ± 3%
TPCC/mix=payment=1-16                                                      5.07k ± 0%
TPCC/mix=orderStatus=1-16                                                  3.61k ± 4%
TPCC/mix=delivery=1-16                                                     84.0k ± 0%
TPCC/mix=stockLevel=1-16                                                   3.69k ± 1%
TPCC/mix=newOrder=10,payment=10,orderStatus=1,delivery=1,stockLevel=1-16   13.2k ± 1%
```

One somewhat painful thing is that the cloning the store ends up being most of
the allocations. I know there's some tricks to have a base for the heap profile
but I haven't started to make that pleasant.

Release note: None

74628: kv: re-enable time-bound iterators for RefreshRange request r=sumeerbhola a=nvanbenschoten

Closes #53348. The other two requests mentioned in that issue
(`ResolveIntentRange` and `EndTxn`) would no longer benefit from time-bound
iterators because, thanks to b5213fd, they no longer scan the MVCC keyspace.

Transaction refreshing is a form of an optimistic concurrency control validation
phase. Before a transaction can commit, if it will be committing at a timestamp
higher than its original timestamp, it issues point and ranged refresh requests
to the key spans it had previous read. The refresh requests scan a span of keys
and determine whether any new values have been written since the transaction
originally read the keys.

The use of time-bound iterators is an important optimization for ranged refresh
operations because we expect very few new writes between the time that a
transaction originally reads and the time that it refreshes.

Without this optimization, each refresh was redoing all of a transaction's reads
at their original cost. This effectively doubled the cost of reads for
transactions that had to refresh (or worse for those that refreshed multiple
times). With this optimization, refreshing a span of keys is expected to be
significantly cheaper than the original scan over that span of keys, because it
can ignore most files in the lower levels of the LSM.

RefreshRange requests were originally built to use time-bound iterators.
However, this optimization was disabled in 1eb3b2a due to concerns about
correctness. Since then, then correctness concerns have been addressed and
we have begun using time-bound iterators in a handful of places.

This commit re-enables time-bound iterators for `RefreshRange` requests. It does
so by using `MVCCIncrementalIterator`, which was enhanced to support additional
"intent policies" in 87c7f11. This commit uses the "emit" intent policy so that
`RefreshRange` will observe all values and all intents in the refresh time
window.

----

### Microbenchmarks:
```
name                                                      old time/op    new time/op    delta
RefreshRange/linear-keys/refresh_window=[95.00,99.00]-10     230ms ± 1%       0ms ± 1%   -99.99%  (p=0.000 n=9+9)
RefreshRange/linear-keys/refresh_window=[75.00,99.00]-10     185ms ± 1%       0ms ± 1%   -99.99%  (p=0.000 n=10+10)
RefreshRange/linear-keys/refresh_window=[75.00,95.00]-10     185ms ± 1%       0ms ± 1%   -99.99%  (p=0.000 n=9+10)
RefreshRange/linear-keys/refresh_window=[50.00,75.00]-10     123ms ± 1%       0ms ± 2%   -99.99%  (p=0.000 n=10+10)
RefreshRange/linear-keys/refresh_window=[50.00,95.00]-10     123ms ± 1%       0ms ± 2%   -99.99%  (p=0.000 n=10+10)
RefreshRange/linear-keys/refresh_window=[50.00,99.00]-10     123ms ± 0%       0ms ± 1%   -99.99%  (p=0.000 n=10+8)
RefreshRange/linear-keys/refresh_window=[99.00,99.00]-10     240ms ± 1%       0ms ± 1%   -99.96%  (p=0.000 n=9+10)
RefreshRange/linear-keys/refresh_window=[95.00,95.00]-10     237ms ± 1%       0ms ± 2%   -99.96%  (p=0.000 n=10+10)
RefreshRange/linear-keys/refresh_window=[75.00,75.00]-10     224ms ± 0%       0ms ± 1%   -99.95%  (p=0.000 n=9+9)
RefreshRange/linear-keys/refresh_window=[50.00,50.00]-10     207ms ± 1%       0ms ± 3%   -99.95%  (p=0.000 n=10+10)
RefreshRange/linear-keys/refresh_window=[0.00,0.00]-10       174ms ± 1%       0ms ± 1%   -99.93%  (p=0.000 n=10+10)
RefreshRange/random-keys/refresh_window=[0.00,0.00]-10       189ms ± 1%       0ms ± 1%   -99.86%  (p=0.000 n=9+9)
RefreshRange/mixed-case/refresh_window=[0.00,0.00]-10        184ms ± 0%       0ms ± 0%   -99.85%  (p=0.000 n=8+9)
RefreshRange/mixed-case/refresh_window=[95.00,95.00]-10      252ms ± 0%       1ms ± 2%   -99.70%  (p=0.000 n=8+10)
RefreshRange/random-keys/refresh_window=[0.00,50.00]-10      412µs ± 1%      13µs ± 2%   -96.83%  (p=0.000 n=10+10)
RefreshRange/mixed-case/refresh_window=[0.00,50.00]-10       413µs ± 1%      13µs ± 1%   -96.78%  (p=0.000 n=10+10)
RefreshRange/mixed-case/refresh_window=[0.00,75.00]-10       292µs ± 1%      13µs ± 1%   -95.43%  (p=0.000 n=10+10)
RefreshRange/random-keys/refresh_window=[0.00,95.00]-10      245µs ± 1%      13µs ± 2%   -94.64%  (p=0.000 n=10+10)
RefreshRange/random-keys/refresh_window=[50.00,95.00]-10     245µs ± 0%      14µs ± 1%   -94.49%  (p=0.000 n=9+10)
RefreshRange/random-keys/refresh_window=[0.00,99.00]-10      238µs ± 1%      13µs ± 1%   -94.48%  (p=0.000 n=10+10)
RefreshRange/mixed-case/refresh_window=[95.00,99.00]-10      237µs ± 1%      13µs ± 2%   -94.42%  (p=0.000 n=10+10)
RefreshRange/random-keys/refresh_window=[50.00,99.00]-10     237µs ± 2%      14µs ± 1%   -94.29%  (p=0.000 n=9+10)
RefreshRange/mixed-case/refresh_window=[50.00,75.00]-10      292µs ± 1%      17µs ± 1%   -94.07%  (p=0.000 n=10+9)
RefreshRange/linear-keys/refresh_window=[0.00,75.00]-10      225µs ± 2%      14µs ± 1%   -94.00%  (p=0.000 n=10+10)
RefreshRange/linear-keys/refresh_window=[0.00,99.00]-10      224µs ± 1%      13µs ± 1%   -93.99%  (p=0.000 n=10+8)
RefreshRange/linear-keys/refresh_window=[0.00,95.00]-10      224µs ± 1%      14µs ± 1%   -93.95%  (p=0.000 n=10+10)
RefreshRange/random-keys/refresh_window=[75.00,95.00]-10     244µs ± 0%      15µs ± 1%   -93.86%  (p=0.000 n=7+10)
RefreshRange/mixed-case/refresh_window=[0.00,99.00]-10       237µs ± 1%      15µs ± 1%   -93.82%  (p=0.000 n=10+10)
RefreshRange/linear-keys/refresh_window=[0.00,50.00]-10      224µs ± 1%      14µs ± 1%   -93.76%  (p=0.000 n=10+10)
RefreshRange/mixed-case/refresh_window=[0.00,95.00]-10       244µs ± 1%      15µs ± 1%   -93.75%  (p=0.000 n=10+10)
RefreshRange/random-keys/refresh_window=[75.00,99.00]-10     238µs ± 1%      15µs ± 1%   -93.72%  (p=0.000 n=9+10)
RefreshRange/mixed-case/refresh_window=[75.00,99.00]-10      236µs ± 1%      15µs ± 2%   -93.64%  (p=0.000 n=10+10)
RefreshRange/mixed-case/refresh_window=[50.00,99.00]-10      236µs ± 1%      15µs ± 1%   -93.63%  (p=0.000 n=10+9)
RefreshRange/mixed-case/refresh_window=[50.00,95.00]-10      244µs ± 0%      16µs ± 1%   -93.58%  (p=0.000 n=10+9)
RefreshRange/mixed-case/refresh_window=[75.00,95.00]-10      244µs ± 1%      16µs ± 0%   -93.55%  (p=0.000 n=10+8)
RefreshRange/random-keys/refresh_window=[0.00,75.00]-10      287µs ± 1%      19µs ± 1%   -93.20%  (p=0.000 n=10+10)
RefreshRange/random-keys/refresh_window=[95.00,99.00]-10     237µs ± 1%      17µs ± 1%   -92.69%  (p=0.000 n=10+10)
RefreshRange/random-keys/refresh_window=[50.00,75.00]-10     288µs ± 2%      23µs ± 1%   -91.98%  (p=0.000 n=9+10)
RefreshRange/mixed-case/refresh_window=[99.00,99.00]-10      255ms ± 1%     122ms ± 1%   -52.02%  (p=0.000 n=9+9)
RefreshRange/random-keys/refresh_window=[75.00,75.00]-10     242ms ± 1%     152ms ± 1%   -37.02%  (p=0.000 n=10+9)
RefreshRange/random-keys/refresh_window=[99.00,99.00]-10     259ms ± 0%     354ms ± 1%   +36.73%  (p=0.000 n=7+9)
RefreshRange/random-keys/refresh_window=[95.00,95.00]-10     256ms ± 1%     353ms ± 1%   +37.65%  (p=0.000 n=10+9)
RefreshRange/mixed-case/refresh_window=[75.00,75.00]-10      242ms ± 0%     398ms ± 1%   +64.38%  (p=0.000 n=9+10)
RefreshRange/mixed-case/refresh_window=[50.00,50.00]-10      227ms ± 0%     392ms ± 1%   +72.65%  (p=0.000 n=9+10)
RefreshRange/random-keys/refresh_window=[50.00,50.00]-10     229ms ± 1%     512ms ± 1%  +123.45%  (p=0.000 n=9+9)

name                                                      old alloc/op   new alloc/op   delta
RefreshRange/linear-keys/refresh_window=[99.00,99.00]-10     195MB ± 0%       0MB ± 0%  -100.00%  (p=0.000 n=9+10)
RefreshRange/linear-keys/refresh_window=[95.00,95.00]-10     188MB ± 0%       0MB ± 0%  -100.00%  (p=0.000 n=10+10)
RefreshRange/mixed-case/refresh_window=[95.00,95.00]-10      188MB ± 0%       0MB ± 1%  -100.00%  (p=0.000 n=10+10)
RefreshRange/linear-keys/refresh_window=[75.00,75.00]-10     148MB ± 0%       0MB ± 0%  -100.00%  (p=0.000 n=10+10)
RefreshRange/linear-keys/refresh_window=[50.00,50.00]-10    98.8MB ± 0%     0.0MB ± 0%  -100.00%  (p=0.000 n=10+10)
RefreshRange/mixed-case/refresh_window=[99.00,99.00]-10      195MB ± 0%       0MB ± 3%  -100.00%  (p=0.000 n=9+8)
RefreshRange/linear-keys/refresh_window=[95.00,99.00]-10     188MB ± 0%       0MB ± 0%  -100.00%  (p=0.000 n=10+10)
RefreshRange/linear-keys/refresh_window=[75.00,95.00]-10     148MB ± 0%       0MB ± 0%   -99.99%  (p=0.000 n=10+10)
RefreshRange/linear-keys/refresh_window=[75.00,99.00]-10     148MB ± 0%       0MB ± 0%   -99.99%  (p=0.000 n=9+10)
RefreshRange/linear-keys/refresh_window=[50.00,75.00]-10    99.0MB ± 0%     0.0MB ± 0%   -99.99%  (p=0.000 n=9+10)
RefreshRange/linear-keys/refresh_window=[50.00,95.00]-10    99.0MB ± 0%     0.0MB ± 0%   -99.99%  (p=0.000 n=8+9)
RefreshRange/linear-keys/refresh_window=[50.00,99.00]-10    99.0MB ± 0%     0.0MB ± 0%   -99.99%  (p=0.000 n=9+10)
RefreshRange/mixed-case/refresh_window=[75.00,75.00]-10      148MB ± 0%       0MB ±29%   -99.99%  (p=0.000 n=10+10)
RefreshRange/random-keys/refresh_window=[75.00,75.00]-10     148MB ± 0%       0MB ± 6%   -99.98%  (p=0.000 n=10+9)
RefreshRange/mixed-case/refresh_window=[50.00,50.00]-10     98.7MB ± 0%     0.0MB ± 1%   -99.98%  (p=0.000 n=10+8)
RefreshRange/random-keys/refresh_window=[99.00,99.00]-10     195MB ± 0%       0MB ± 4%   -99.97%  (p=0.000 n=9+8)
RefreshRange/random-keys/refresh_window=[95.00,95.00]-10     188MB ± 0%       0MB ± 3%   -99.97%  (p=0.000 n=10+8)
RefreshRange/random-keys/refresh_window=[50.00,50.00]-10    98.7MB ± 0%     0.1MB ±12%   -99.92%  (p=0.000 n=10+10)
RefreshRange/linear-keys/refresh_window=[0.00,0.00]-10      41.9kB ± 5%     1.1kB ± 0%   -97.27%  (p=0.000 n=9+9)
RefreshRange/linear-keys/refresh_window=[0.00,50.00]-10      208kB ± 0%       8kB ± 0%   -96.04%  (p=0.000 n=10+10)
RefreshRange/mixed-case/refresh_window=[95.00,99.00]-10      208kB ± 0%       8kB ± 0%   -96.03%  (p=0.000 n=10+10)
RefreshRange/random-keys/refresh_window=[95.00,99.00]-10     208kB ± 0%       8kB ± 0%   -96.03%  (p=0.000 n=10+9)
RefreshRange/mixed-case/refresh_window=[50.00,75.00]-10      208kB ± 0%       8kB ± 0%   -96.03%  (p=0.000 n=8+10)
RefreshRange/mixed-case/refresh_window=[0.00,75.00]-10       208kB ± 0%       8kB ± 0%   -96.03%  (p=0.000 n=10+9)
RefreshRange/mixed-case/refresh_window=[0.00,50.00]-10       207kB ± 0%       8kB ± 0%   -96.03%  (p=0.000 n=9+10)
RefreshRange/linear-keys/refresh_window=[0.00,95.00]-10      208kB ± 0%       8kB ± 0%   -96.03%  (p=0.000 n=10+10)
RefreshRange/random-keys/refresh_window=[0.00,50.00]-10      207kB ± 0%       8kB ± 0%   -96.03%  (p=0.000 n=10+10)
RefreshRange/linear-keys/refresh_window=[0.00,75.00]-10      208kB ± 0%       8kB ± 0%   -96.02%  (p=0.000 n=10+10)
RefreshRange/linear-keys/refresh_window=[0.00,99.00]-10      208kB ± 0%       8kB ± 0%   -96.02%  (p=0.000 n=10+10)
RefreshRange/random-keys/refresh_window=[0.00,99.00]-10      208kB ± 0%       8kB ± 0%   -96.02%  (p=0.000 n=10+10)
RefreshRange/random-keys/refresh_window=[50.00,99.00]-10     208kB ± 0%       8kB ± 0%   -96.02%  (p=0.000 n=10+10)
RefreshRange/random-keys/refresh_window=[50.00,95.00]-10     208kB ± 0%       8kB ± 0%   -96.02%  (p=0.000 n=9+10)
RefreshRange/random-keys/refresh_window=[75.00,99.00]-10     208kB ± 0%       8kB ± 0%   -96.02%  (p=0.000 n=10+10)
RefreshRange/random-keys/refresh_window=[0.00,95.00]-10      208kB ± 0%       8kB ± 0%   -96.02%  (p=0.000 n=10+10)
RefreshRange/random-keys/refresh_window=[75.00,95.00]-10     208kB ± 0%       8kB ± 0%   -96.02%  (p=0.000 n=10+10)
RefreshRange/random-keys/refresh_window=[0.00,75.00]-10      207kB ± 0%       8kB ± 0%   -96.02%  (p=0.000 n=10+10)
RefreshRange/random-keys/refresh_window=[50.00,75.00]-10     207kB ± 0%       8kB ± 0%   -96.02%  (p=0.000 n=9+10)
RefreshRange/mixed-case/refresh_window=[0.00,99.00]-10       208kB ± 0%       8kB ± 0%   -96.01%  (p=0.000 n=10+10)
RefreshRange/mixed-case/refresh_window=[50.00,95.00]-10      208kB ± 0%       8kB ± 0%   -96.01%  (p=0.000 n=9+10)
RefreshRange/mixed-case/refresh_window=[75.00,99.00]-10      208kB ± 0%       8kB ± 0%   -96.01%  (p=0.000 n=10+10)
RefreshRange/mixed-case/refresh_window=[0.00,95.00]-10       208kB ± 0%       8kB ± 0%   -96.01%  (p=0.000 n=10+10)
RefreshRange/mixed-case/refresh_window=[50.00,99.00]-10      208kB ± 0%       8kB ± 0%   -96.01%  (p=0.000 n=10+10)
RefreshRange/mixed-case/refresh_window=[75.00,95.00]-10      208kB ± 0%       8kB ± 0%   -96.01%  (p=0.000 n=10+10)
RefreshRange/random-keys/refresh_window=[0.00,0.00]-10      29.3kB ± 4%     1.2kB ± 0%   -95.95%  (p=0.000 n=9+10)
RefreshRange/mixed-case/refresh_window=[0.00,0.00]-10       9.07kB ±25%    1.13kB ± 0%   -87.51%  (p=0.000 n=10+10)

name                                                      old allocs/op  new allocs/op  delta
RefreshRange/linear-keys/refresh_window=[99.00,99.00]-10     18.8k ± 0%      0.0k ± 0%   -99.91%  (p=0.000 n=10+10)
RefreshRange/linear-keys/refresh_window=[95.00,95.00]-10     18.1k ± 0%      0.0k ± 0%   -99.91%  (p=0.000 n=10+10)
RefreshRange/mixed-case/refresh_window=[95.00,95.00]-10      17.5k ± 0%      0.0k ± 0%   -99.90%  (p=0.000 n=9+10)
RefreshRange/linear-keys/refresh_window=[75.00,75.00]-10     14.4k ± 0%      0.0k ± 0%   -99.88%  (p=0.000 n=10+10)
RefreshRange/linear-keys/refresh_window=[50.00,50.00]-10     9.81k ± 0%     0.02k ± 0%   -99.83%  (p=0.000 n=10+10)
RefreshRange/linear-keys/refresh_window=[95.00,99.00]-10     18.2k ± 0%      0.1k ± 0%   -99.69%  (p=0.000 n=9+10)
RefreshRange/linear-keys/refresh_window=[75.00,95.00]-10     14.4k ± 0%      0.1k ± 0%   -99.61%  (p=0.000 n=10+10)
RefreshRange/linear-keys/refresh_window=[75.00,99.00]-10     14.4k ± 0%      0.1k ± 0%   -99.61%  (p=0.000 n=10+10)
RefreshRange/mixed-case/refresh_window=[99.00,99.00]-10      18.2k ± 0%      0.1k ± 2%   -99.51%  (p=0.000 n=10+8)
RefreshRange/linear-keys/refresh_window=[50.00,95.00]-10     9.62k ± 0%     0.06k ± 0%   -99.40%  (p=0.000 n=10+10)
RefreshRange/linear-keys/refresh_window=[50.00,75.00]-10     9.62k ± 0%     0.06k ± 0%   -99.40%  (p=0.000 n=10+10)
RefreshRange/linear-keys/refresh_window=[50.00,99.00]-10     9.62k ± 0%     0.06k ± 0%   -99.40%  (p=0.000 n=9+10)
RefreshRange/mixed-case/refresh_window=[75.00,75.00]-10      13.8k ± 0%      0.2k ±28%   -98.85%  (p=0.000 n=9+10)
RefreshRange/mixed-case/refresh_window=[50.00,50.00]-10      9.25k ± 0%     0.17k ± 6%   -98.11%  (p=0.000 n=10+9)
RefreshRange/random-keys/refresh_window=[75.00,75.00]-10     14.2k ± 0%      0.3k ± 8%   -97.89%  (p=0.000 n=10+9)
RefreshRange/linear-keys/refresh_window=[0.00,0.00]-10         535 ± 3%        17 ± 0%   -96.82%  (p=0.000 n=10+10)
RefreshRange/random-keys/refresh_window=[99.00,99.00]-10     18.6k ± 0%      0.8k ± 3%   -95.56%  (p=0.000 n=10+8)
RefreshRange/random-keys/refresh_window=[95.00,95.00]-10     17.9k ± 0%      0.8k ± 2%   -95.34%  (p=0.000 n=10+8)
RefreshRange/random-keys/refresh_window=[0.00,0.00]-10         351 ± 3%        18 ± 0%   -94.88%  (p=0.000 n=9+10)
RefreshRange/random-keys/refresh_window=[50.00,50.00]-10     9.59k ± 0%     0.95k ±12%   -90.05%  (p=0.000 n=9+10)
RefreshRange/mixed-case/refresh_window=[0.00,0.00]-10         73.7 ±10%      17.0 ± 0%   -76.93%  (p=0.000 n=10+10)
RefreshRange/linear-keys/refresh_window=[0.00,50.00]-10       80.0 ± 0%      56.0 ± 0%   -30.00%  (p=0.000 n=10+10)
RefreshRange/random-keys/refresh_window=[0.00,50.00]-10       79.0 ± 0%      56.0 ± 0%   -29.11%  (p=0.000 n=9+10)
RefreshRange/random-keys/refresh_window=[95.00,99.00]-10      79.0 ± 0%      56.0 ± 0%   -29.11%  (p=0.000 n=10+10)
RefreshRange/mixed-case/refresh_window=[0.00,50.00]-10        79.0 ± 0%      56.0 ± 0%   -29.11%  (p=0.000 n=10+10)
RefreshRange/mixed-case/refresh_window=[0.00,75.00]-10        79.0 ± 0%      56.0 ± 0%   -29.11%  (p=0.000 n=10+10)
RefreshRange/mixed-case/refresh_window=[50.00,75.00]-10       79.0 ± 0%      56.0 ± 0%   -29.11%  (p=0.000 n=10+10)
RefreshRange/mixed-case/refresh_window=[95.00,99.00]-10       79.0 ± 0%      56.0 ± 0%   -29.11%  (p=0.000 n=10+10)
RefreshRange/linear-keys/refresh_window=[0.00,75.00]-10       80.0 ± 0%      58.0 ± 0%   -27.50%  (p=0.000 n=9+10)
RefreshRange/linear-keys/refresh_window=[0.00,95.00]-10       80.0 ± 0%      58.0 ± 0%   -27.50%  (p=0.002 n=8+10)
RefreshRange/linear-keys/refresh_window=[0.00,99.00]-10       79.7 ± 1%      58.0 ± 0%   -27.23%  (p=0.000 n=10+10)
RefreshRange/random-keys/refresh_window=[0.00,75.00]-10       79.0 ± 0%      58.0 ± 0%   -26.58%  (p=0.000 n=10+10)
RefreshRange/random-keys/refresh_window=[0.00,95.00]-10       79.0 ± 0%      58.0 ± 0%   -26.58%  (p=0.000 n=10+10)
RefreshRange/random-keys/refresh_window=[0.00,99.00]-10       79.0 ± 0%      58.0 ± 0%   -26.58%  (p=0.000 n=10+10)
RefreshRange/random-keys/refresh_window=[50.00,75.00]-10      79.0 ± 0%      58.0 ± 0%   -26.58%  (p=0.000 n=10+10)
RefreshRange/random-keys/refresh_window=[50.00,95.00]-10      79.0 ± 0%      58.0 ± 0%   -26.58%  (p=0.000 n=10+10)
RefreshRange/random-keys/refresh_window=[50.00,99.00]-10      79.0 ± 0%      58.0 ± 0%   -26.58%  (p=0.000 n=10+10)
RefreshRange/random-keys/refresh_window=[75.00,95.00]-10      79.0 ± 0%      58.0 ± 0%   -26.58%  (p=0.000 n=10+10)
RefreshRange/random-keys/refresh_window=[75.00,99.00]-10      79.0 ± 0%      58.0 ± 0%   -26.58%  (p=0.000 n=10+10)
RefreshRange/mixed-case/refresh_window=[0.00,95.00]-10        79.0 ± 0%      60.0 ± 0%   -24.05%  (p=0.000 n=10+10)
RefreshRange/mixed-case/refresh_window=[0.00,99.00]-10        79.0 ± 0%      60.0 ± 0%   -24.05%  (p=0.000 n=10+10)
RefreshRange/mixed-case/refresh_window=[50.00,95.00]-10       79.0 ± 0%      60.0 ± 0%   -24.05%  (p=0.000 n=10+10)
RefreshRange/mixed-case/refresh_window=[50.00,99.00]-10       79.0 ± 0%      60.0 ± 0%   -24.05%  (p=0.000 n=10+10)
RefreshRange/mixed-case/refresh_window=[75.00,95.00]-10       79.0 ± 0%      60.0 ± 0%   -24.05%  (p=0.000 n=10+10)
RefreshRange/mixed-case/refresh_window=[75.00,99.00]-10       79.0 ± 0%      60.0 ± 0%   -24.05%  (p=0.000 n=10+10)
```

----

Release note (performance improvement): transaction read refresh operations
performed during optimistic concurrency control's validation phase now use a
time-bound file filter when scanning the LSM tree. This allows these operations
to avoid scanning files that contain no keys written since the transaction
originally performed its reads.

74840: sql: assign sequence ownership created when create/alter table r=ajwerner a=chengxiong-ruan

  Fixes #69298

  Release note (bug fix): A sequence is created when a column is
  defined as `SERIAL` type and the `serial_normalization` session
  variable is set to `sql_sequence`. In this case, the sequence is
  owned by the column and the table where the column exists. The
  sequence should be dropped when the owner table/column is dropped,
  which is the postgres behavior. However, cockroach never set
  ownership information correctly but only dependecy relationship.
  So the sequence stays even the owner table/column does not exist
  anymore. This fix assigns correct ownership information to sequence
  descriptor and column descriptor so that cockroach can align with
  postgres behavior.


75556: jobs,sql,server: add structured job execution failure log to job info r=ajwerner a=ajwerner

This commit adds the structured information about job execution to the
crdb_internal table in addition to the flattened string array. It then
leverages this information in the admin server API.

Informs #69170

As an example of crdb_internal.jobs for a retrying reverting job, you'll
see:

```
> select json_array_elements(execution_events) from crdb_internal.jobs where job_id = 731484911676784641;
                                                                                                                                             json_array_elements
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  {"executionEndMicros": "1643301928919757", "executionStartMicros": "1643301927541336", "instanceId": 1, "status": "reverting", "truncatedError": "scan with start key /Table/54/1/13110: root: memory budget exceeded: 532480 bytes requested, 133939200 currently allocated, 134217728 bytes in budget"}
  {"executionEndMicros": "1643302029260612", "executionStartMicros": "1643302028024034", "instanceId": 1, "status": "reverting", "truncatedError": "scan with start key /Table/54/1/13110: root: memory budget exceeded: 532480 bytes requested, 133867520 currently allocated, 134217728 bytes in budget"}
  {"executionEndMicros": "1643302255024877", "executionStartMicros": "1643302253353781", "instanceId": 1, "status": "reverting", "truncatedError": "scan with start key /Table/54/1/13110: root: memory budget exceeded: 532480 bytes requested, 133898240 currently allocated, 134217728 bytes in budget"}
```

The setup to make this example was to run a cluster with a small
`--max-sql-memory=128MiB` and then create a table:

```sql
create table foo (i int primary key, j int default (unique_rowid()) unique, b string);
insert into foo(i, b) select i, repeat('a', 1<<12) from generate_series(1, 100000) as t(i);
set sql_safe_updates=false;
alter table foo add column bb string as (i::string || b) stored unique, drop column j; -- retries forever
```

Release note (sql change): The crdb_internal.jobs table now has a new column
`execution_events` which is a structured json form of `execution_errors`.

75577: Makefile,dev: include `TestTenantLogic` in `testlogic` r=ajwerner a=ajwerner

We didn't run these before when you ran `testlogic` and that's a bummer.

Release note: None

Co-authored-by: Andrew Werner <[email protected]>
Co-authored-by: Nathan VanBenschoten <[email protected]>
Co-authored-by: Chengxiong Ruan <[email protected]>
jocrl added a commit to jocrl/cockroach that referenced this issue May 10, 2022
Todo:
- Blocked on understanding how the endpoint fields are populated in a real use
  case. See comment in thread.

Fixes cockroachdb#69170

This commit adds a "Job errors" table to the Job Details page.

Currently this draft implementation shows both the message in the `error`
(string) field with the job start and end time as error start and end times,
and the errors in the `execution_failures` field.

Separately, this commit also
- Improves job and table stories
- Renames the active stmts/txn prop type to distinguish those types from their
  respective components

Release note (ui): This commit adds a "Job errors" table to the Job Details
page.
@kevin-v-ngo
Copy link
Author

Should this issue be closed?

@ericharmeling
Copy link
Contributor

@kevin-v-ngo Up to you. AFAICT, we still don't surface per-job error details or time of error on the jobs page.

@maryliag
Copy link
Contributor

It's still a valid issue. We want to show more information about jobs errors.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-console-general SQL Observability issues on the DB console spanning multiple areas. Includes Cockroach Cloud Console C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-jobs
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants