outliers: move processing off the hot path #81021

Closed
matthewtodd opened this issue May 5, 2022 · 0 comments · Fixed by #85350
Assignees
matthewtodd
Labels
A-sql-observability Related to observability of the SQL layer C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)

Comments

@matthewtodd
Contributor

matthewtodd commented May 5, 2022

Epic: CRDB-13544

Jira issue: CRDB-15395

@matthewtodd matthewtodd added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-sql-observability Related to observability of the SQL layer T-sql-observability labels May 5, 2022
@matthewtodd matthewtodd self-assigned this May 5, 2022
@jlinder jlinder added sync-me and removed sync-me labels May 20, 2022
craig bot pushed a commit that referenced this issue Jul 5, 2022
82473: outliers: latency quantile detector r=matthewtodd a=matthewtodd

Closes #79451.

Note that this detector comes disabled by default. To enable, use the
`sql.stats.outliers.experimental.latency_quantile_detection.enabled`
cluster setting.

This new detection "algorithm" is a simple, perhaps naïve, heuristic,
though it may be good enough to be useful: we consider a statement an
outlier if its execution latency is > p99 latency for its fingerprint,
so long as it's also greater than twice the median latency and greater
than an arbitrary user-defined threshold.[^1]
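
For illustration, a minimal Go sketch of that check; `latencySummary` and its fields are hypothetical stand-ins for the per-fingerprint quantile sketch, not the actual CockroachDB types:

```go
package outliers

import "time"

// latencySummary is a hypothetical stand-in for the per-fingerprint
// latency quantile sketch maintained by the detector.
type latencySummary struct {
	P50 time.Duration // median latency observed for this fingerprint
	P99 time.Duration // 99th-percentile latency observed for this fingerprint
}

// isOutlier applies the heuristic described above: flag an execution when
// its latency exceeds the fingerprint's p99, is more than twice the median,
// and is above the user-configured "interesting" threshold.
func isOutlier(latency time.Duration, s latencySummary, interestingThreshold time.Duration) bool {
	return latency > s.P99 &&
		latency > 2*s.P50 &&
		latency > interestingThreshold
}
```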

We expect the operation of this detector to be memory constrained. To
avoid OOM errors, we offer a cluster setting[^2] to cap the overall
memory used, along with metrics reporting the number of fingerprints
being tracked, memory usage, and "evictions," or fingerprint histories
being dropped due to memory pressure. If a high rate of evictions is
observed, it will be necessary to raise either one or both of the memory
limit and the "interesting" threshold settings to regain a steady state.
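
A rough model of the eviction behavior under memory pressure; the `registry` shape and the oldest-first policy below are assumptions for illustration, not the actual implementation:

```go
package outliers

// history is a hypothetical per-fingerprint latency history with an
// estimated memory footprint in bytes.
type history struct{ size int64 }

// registry is a simplified model of the fingerprint tracker: a map of
// histories plus an insertion-order queue consulted when evicting.
type registry struct {
	memoryLimit int64 // the configured cap
	memoryUsed  int64
	evictions   int64    // reported as a metric
	order       []uint64 // fingerprint IDs, oldest first
	histories   map[uint64]history
}

// evictUntilUnderLimit drops the oldest fingerprint histories until memory
// usage falls back under the cap, counting each drop as an "eviction".
func (r *registry) evictUntilUnderLimit() {
	for r.memoryUsed > r.memoryLimit && len(r.order) > 0 {
		oldest := r.order[0]
		r.order = r.order[1:]
		r.memoryUsed -= r.histories[oldest].size
		delete(r.histories, oldest)
		r.evictions++
	}
}
```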

Note also that this detector is not designed to be used concurrently by
multiple goroutines. Correctness currently relies on the global lock in
`outliers.Registry`, to be addressed next in #81021.

[^1]: This threshold is defined by a cluster setting,
`sql.stats.outliers.experimental.latency_quantile_detection.interesting_threshold`.
Without such a threshold, we would otherwise flag a statement that, say,
normally executes in 3ms taking 7ms; but 7ms is probably still fast
enough that there's nothing to be done.

[^2]: `sql.stats.outliers.experimental.latency_quantile_detection.memory_limit`

Release note: None

82741: obsservice: ui handler serves db console assets  r=sjbarag,abarganier a=dhartunian

This change enables the Observability Service to serve the DB Console UI
itself. Endpoints that CRDB can handle are proxied (`/_admin/`,
`/_status/`, `/ts/`), but the base URL returns a blank HTML page whose
JS assets load the DB Console.

In order to build with the UI code, the `--config=with_ui` flag must be
passed to Bazel.

This commit also adds a shortcut to the `dev` command to build
`obsservice`, which includes the `--config=with_ui` flag, just as we do
by default in `cockroach` builds.

Release note: None

83717: changefeedccl: infer WITH diff from changefeed expression r=[miretskiy] a=HonoreDB

Informs #83322. Specifically this does the "infer whether
or not to set diff" part of it.

This went through a more ambitious phase
(https://github.com/cockroachdb/cockroach/compare/master...HonoreDB:cockroach:cdc_expressions_infer_diff?expand=1)
where it also handled user-defined types, but I ended up
agreeing with wiser heads that this was adding too much complexity
for no known use case.

I also came really close to setting envelope=row as the default, but
envelope=row is not supported in all sinks, so again it felt too messy
to do without a little further conversation.

Release note (sql change): CREATE CHANGEFEED AS statements no longer need to include WITH DIFF when using cdc_prev().

83772: storage: remove experimental markers for MVCC range tombstones r=aliher1911 a=erikgrinaker

GC is still under development, but I think it's fine to remove the marker before GC works to let people start building with it.

---

This removes the experimental markers for MVCC range tombstones by
renaming functions, methods, and parameters to no longer include
`experimental`, and by removing the corresponding warning comments.

The changes are entirely mechanical.

Release note: None

83784: scexec: refactor schema changer job update r=postamar a=postamar

This PR rebases #83724 on top of a few commits which refactor the schema changer job update machinery.


Co-authored-by: Matthew Todd <[email protected]>
Co-authored-by: David Hartunian <[email protected]>
Co-authored-by: Aaron Zinger <[email protected]>
Co-authored-by: Erik Grinaker <[email protected]>
Co-authored-by: Marius Posta <[email protected]>
craig bot pushed a commit that referenced this issue Jul 6, 2022
83684: kvstreamer: reuse incomplete Get requests on resume batches r=yuzefovich a=yuzefovich

Previously, for all incomplete requests in a batch we'd allocate new Get
and Scan requests (since, due to a known issue #75452, the lifecycle of
the requests is not clearly defined at the moment, we're not allowed to
modify them). However, we can reuse the Get requests since they will
never be modified (i.e. they are either complete or incomplete, and,
unlike Scan requests, their start key is never shifted), so this commit
takes advantage of that observation.

Release note: None

83709: pkg/util/tracing: Add hidden tag group, make server responsible for sorting. r=benbardin a=benbardin

Release note: none

This moves all tags marked as "hidden" into a single tag group at the UI layer. This declutters the trace page a little bit and makes it easier to pick out more important information.

![Screen Shot 2022-07-01 at 12 37 07 PM](https://user-images.githubusercontent.com/261508/176937757-bf8ac920-9e28-4908-8de4-1fbc077fd2c7.png)


83834: outliers: extract a Registry interface. r=matthewtodd a=matthewtodd

This is a pure mechanical refactoring, preparing us for #81021, where
we'll move outlier processing off of the hot execution path. The idea is
that the outside world will continue to talk to us as a Registry, but
we'll now have a seam into which we can insert some asynchrony.
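
The seam looks roughly like the following sketch; the method set, argument types, and `lockingRegistry` name are illustrative, not the exact interface extracted here:

```go
package outliers

// Statement and Transaction stand in for the protocol buffer messages
// describing executions; their fields are omitted here.
type Statement struct{}
type Transaction struct{}

// Registry is what the rest of SQL execution talks to. Whether the
// observations are processed inline or handed off to a background
// goroutine becomes an implementation detail behind this interface.
type Registry interface {
	ObserveStatement(sessionID string, stmt *Statement)
	ObserveTransaction(sessionID string, txn *Transaction)
}

// lockingRegistry keeps today's behavior: it processes observations
// synchronously under a global mutex. An asynchronous implementation can
// later satisfy the same interface by buffering observations instead.
type lockingRegistry struct{}

func (r *lockingRegistry) ObserveStatement(sessionID string, stmt *Statement)   {}
func (r *lockingRegistry) ObserveTransaction(sessionID string, txn *Transaction) {}

var _ Registry = (*lockingRegistry)(nil)
```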

Release note: None

Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Ben Bardin <[email protected]>
Co-authored-by: Matthew Todd <[email protected]>
craig bot pushed a commit that referenced this issue Jul 7, 2022
83123: sql/stats: conversion of datums to and from quantile function values r=yuzefovich,rytaft,mgartner a=michae2

To predict histograms in statistics forecasts, we will use linear
regression over quantile functions. (Quantile functions are another
representation of histogram data, in a form more amenable to statistical
manipulation.)

The conversion of histograms to quantile functions will require
conversion of histogram bounds (datums) to quantile values (float64s).
And likewise, the inverse conversion from quantile functions back to
histograms will require the inverse conversion of float64 quantile
values back to datums. These conversions are a little different from our
usual SQL conversions in `eval.PerformCast`, so we add them to a new
quantile file in the `sql/stats` module.
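
For the integer case, the round trip is essentially a widening conversion one way and a rounded, clamped truncation back. A sketch under that assumption (the real package handles several types; apart from the `FromQuantileValue` idea, the names below are illustrative):

```go
package stats

import "math"

// toQuantileValue widens an INT histogram bound into the float64 space
// used by quantile functions (precision is lost above 2^53, which is
// acceptable for forecasting purposes).
func toQuantileValue(v int64) float64 {
	return float64(v)
}

// fromQuantileValue converts a float64 quantile value back into an INT
// bound, rounding and clamping so that regression over quantile functions
// cannot produce an out-of-range datum.
func fromQuantileValue(v float64) int64 {
	v = math.Round(v)
	if v >= math.MaxInt64 {
		return math.MaxInt64
	}
	if v <= math.MinInt64 {
		return math.MinInt64
	}
	return int64(v)
}
```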

This code was originally part of #77070 but has been pulled out to
simplify that PR. A few changes have been made:
- `histogramValue` has been renamed to `FromQuantileValue`.
- Support for `DECIMAL`, `TIME`, `TIMETZ`, and `INTERVAL` has been
  dropped. Clamping these types in `FromQuantileValue` was too complex
  for the first iteration of statistics forecasting. We expect the
  overwhelming majority of ascending keys to use `INT` or `TIMESTAMP`
  types.
- Bugs in `FLOAT4`, `TIMESTAMP` and `TIMESTAMPTZ` conversions have been
  fixed.
- We're now clamping timestamps to slightly tighter bounds to avoid the
  problems with infinite timestamps (see #41564).

Assists: #79872

Release note: None

83590: sql: fix and rename sql stats session transaction received time r=THardy98 a=THardy98

Resolves: #82894

Due to a change from #76792, implicit transactions can start before the
`SessionQueryReceived` session phase time is set by the sqlstats system.
In turn, this caused the `SessionTransactionReceived` (now renamed to
`SessionTransactionStarted`) session phase time to be recorded
incorrectly, causing extremely large transaction times on the UI. This
change fixes the mistake by using the actual transaction start time as
the `SessionTransactionStarted` session phase time, instead of
`SessionQueryReceived`.

Release note (bug fix): The `SessionTransactionReceived` session phase
time, now renamed to `SessionTransactionStarted`, is no longer recorded
incorrectly, so spuriously large transaction times no longer appear on
the UI.

83943: outliers: protocol buffer adjustments r=matthewtodd a=matthewtodd

This handful of commits slightly re-shapes the protocol buffer messages we use for tracking outliers, making some of the work in #81021 a little easier to express.

83961: cmd/dev: add generate execgen subcommand r=ajwerner a=yuzefovich

Release note: None

83975: streamingccl: unskip TestTenantStreaming r=miretskiy a=stevendanna

This makes a couple of changes aimed at unskipping
TestTenantStreaming:

- Fix a nil pointer access in our stream status verification
  function. We had changed the name of the key that this function was
  accessing. This NPE was hidden by another panic along the unclean
  shutdown path in the server.

- Lower various intervals so that this test doesn't take 90 seconds.

I've run this under stress for a few hundred iterations without error.

Release note: None

83979: streamingccl: small logging and tracing cleanups r=miretskiy a=stevendanna

- Add a tracing span to ingestion job cutover.
- Move a particularly noisy log message to VInfo(3).
- Prefer log.VInfo to `if log.V(n) {}` in cases where we aren't doing
  expensive argument construction.

Release note: None

Co-authored-by: Michael Erickson <[email protected]>
Co-authored-by: Thomas Hardy <[email protected]>
Co-authored-by: Matthew Todd <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Steven Danna <[email protected]>
surajr10 pushed a commit to surajr10/cockroach that referenced this issue Jul 12, 2022
craig bot pushed a commit that referenced this issue Jul 15, 2022
84406: sql: make setup of leaf txn for streamer more bullet-proof r=yuzefovich a=yuzefovich

This commit makes sure that we try to use the streamer API only if we
can actually create a non-nil `LeafTxn`. In some edge cases it appears
that all the previous checks were insufficient, and this commit should
take care of that. Note that I couldn't reproduce those edge cases
manually, so there is no regression test.

Fixes: #84239.

Release note: None

84455: outliers: prepare for asynchronous processing r=matthewtodd a=matthewtodd

Moving a few things around in advance of the real work of #81021.

84485: ui: remove link to stmt details on sessions details r=maryliag a=maryliag

Previously, when a statement was active, it would show
on the Sessions page with a link to view its details,
but since the statement was not yet saved/persisted,
clicking the link would crash the Statement Details
page. This commit removes that link.

Fixes #84462

Release note (ui change): Removal of `View Statement Details`
link inside the Sessions Details page.

84488: roachtest: lower descriptor lease duration in acceptance/version-upgrade r=srosenberg a=renatolabs

The `acceptance/version-upgrade` test uncovered a descriptor lease
leak that can lead to the test timing out due to waiting a full lease
duration (5 minutes by default), making it flaky.

Once the bug is fixed, we should be able to use the default duration
again.

Relates to #84382.

Release note: None.

Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Matthew Todd <[email protected]>
Co-authored-by: Marylia Gutierrez <[email protected]>
Co-authored-by: Renato Costa <[email protected]>
craig bot pushed a commit that referenced this issue Aug 11, 2022
63416: sql: emit point deletes during delete fastpath r=yuzefovich a=jordanlewis

Previously, the "deleteRange" SQL operator, which is meant to be a
fast-path for cases in which an entire range of keys can be deleted,
always did what it said: emitted DeleteRange KV operations. This
precludes a crucial optimization: sending point deletes when the list of
deleted keys is exactly known.

For example, a query like `DELETE FROM kv WHERE k = 10000` uses the
"fast path" delete, since it has a contiguous set of keys to delete, and
it doesn't need to know the values that were deleted. But, in this case,
the performance is actually worse if we use a DeleteRange KV operation
for various reasons (see #53939), because:

- ranged KV writes (DeleteRangeRequest) cannot be pipelined because an
  enumeration of the intents that they will leave cannot be known ahead
  of time. They must therefore perform evaluation and replication
  synchronously.
- ranged KV writes (DeleteRangeRequest) result in ranged intent
  resolution, which is less efficient (although this became less
  important since we re-enabled time-bound iterators).

The reason we couldn't previously emit point deletes in this case is
that SQL needs to know whether it deleted something or not. This means
we can't do a "blind put" of a deletion: we need to actually understand
whether there was something that we were "overwriting" with our delete.

This commit modifies the DeleteResponse to always return a boolean
indicating whether a key from the DeleteRequest was actually deleted.

Additionally, the deleteRange SQL operator detects when it can emit
single-key deletes, and does so.
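
The decision has roughly the following shape (a sketch with stand-in types; not the actual planner or KV API):

```go
package main

import "fmt"

// kvRequest is a simplified stand-in for the KV write requests involved.
type kvRequest struct {
	kind   string // "Delete" or "DeleteRange"
	key    string
	endKey string // set only for ranged deletes
}

// deleteSpan models a span the fast-path delete operator wants to delete;
// endKey is empty when the span is known to cover exactly one key.
type deleteSpan struct {
	key, endKey string
}

// requestsForFastPathDelete sketches the new decision: exact point keys
// become point Deletes (pipelineable, with point intent resolution, and
// the response's boolean tells SQL whether a row was actually deleted),
// while true ranges still become DeleteRange requests.
func requestsForFastPathDelete(spans []deleteSpan) []kvRequest {
	reqs := make([]kvRequest, 0, len(spans))
	for _, sp := range spans {
		if sp.endKey == "" {
			reqs = append(reqs, kvRequest{kind: "Delete", key: sp.key})
		} else {
			reqs = append(reqs, kvRequest{kind: "DeleteRange", key: sp.key, endKey: sp.endKey})
		}
	}
	return reqs
}

func main() {
	// DELETE FROM kv WHERE k = 10000 resolves to a single exact key,
	// so it now yields a point Delete rather than a DeleteRange.
	fmt.Println(requestsForFastPathDelete([]deleteSpan{{key: "/kv/10000"}}))
}
```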

Closes #53939.

Release note (performance improvement): point deletes in SQL are now
more efficient during concurrent workloads.

76233: kv: remove clock update on BatchResponse r=nvanbenschoten a=nvanbenschoten

Before this change, we were updating the local clock with each BatchResponse's WriteTimestamp. This was meant to handle cases where the batch request timestamp was forwarded during evaluation.

This was unnecessary for two reasons.

The first is that a BatchResponse can legitimately carry an operation timestamp that leads the local HLC clock on the leaseholder that evaluated the request. This has been true since #80706, which introduced the concept of a "local timestamp". This allowed us to remove the (broken) attempt at ensuring that the HLC on a leaseholder always leads the MVCC timestamp of all values in the leaseholder's keyspace (see the update to `pkg/kv/kvserver/uncertainty/doc.go` in that PR).

The second was that it was not even correct. The idea behind bumping the HLC on the response path was to ensure that if a batch request was forwarded to a newer timestamp during evaluation and then completed a write, that forwarded timestamp would be reflected in the leaseholder's HLC. However, this ignored the fact that any forwarded timestamp must have either come from an existing value in the range or from the leaseholder's clock. So if those didn't lead the clock, the derived timestamp wouldn't either. It also ignored the fact that the clock bump here was too late (post-latch release) and if it had actually been needed (it wasn't), it wouldn't have even ensured that the timestamp on any lease transfer led the maximum time of any response served by the outgoing leaseholder.

There are no mixed-version migration concerns of this change, because #80706 ensured that any future-time operation will still continue to use the synthetic bit until all nodes are running v22.2 or later.

85350: insights: ingester r=matthewtodd a=matthewtodd

Closes #81021.

Here we begin observing statements and transactions asynchronously, to
avoid slowing down the hot SQL execution path as much as possible.
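
A minimal sketch of that hand-off, assuming a buffered channel drained by a single background goroutine; the `Event` shape, `ingester` struct, and method names are illustrative rather than the actual `insights` package API:

```go
package insights

// Event is a stand-in for the statement and transaction observations
// handed off by the SQL execution path.
type Event struct {
	SessionID string
	// ... statement or transaction payload omitted ...
}

// ingester decouples observation from processing: the hot path only does a
// non-blocking channel send; a background goroutine does the real work.
type ingester struct {
	events  chan Event
	process func(Event) // the existing synchronous processing logic
}

func newIngester(process func(Event)) *ingester {
	return &ingester{events: make(chan Event, 1024), process: process}
}

// Observe runs on the hot path. If the buffer is full, the event is
// dropped rather than blocking query execution.
func (i *ingester) Observe(e Event) {
	select {
	case i.events <- e:
	default: // buffer full: shed load instead of slowing execution down
	}
}

// Start drains the buffer on its own goroutine until the channel is closed.
func (i *ingester) Start() {
	go func() {
		for e := range i.events {
			i.process(e)
		}
	}()
}
```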
    
Release note: None

85440: colmem: improve memory-limiting behavior of the accounting helpers r=yuzefovich a=yuzefovich

**colmem: introduce a helper method when no memory limit should be applied**

This commit is a pure mechanical change.

Release note: None

**colmem: move some logic of capacity-limiting into the accounting helper**

This commit moves the logic that was duplicated across each user of the
SetAccountingHelper into the helper itself. Clearly, this allows us to
de-duplicate some code, but it'll make it easier to refactor the code
which is done in the following commit.

Additionally, this commit makes a tiny change to make the resetting
behavior in the hash aggregator more precise.

Release note: None

**colmem: improve memory-limiting behavior of the accounting helpers**

This commit fixes an oversight in how we are allocating batches of the
"dynamic" capacity. We have two related ways for reallocating batches,
and both of them work by growing the capacity of the batch until the
memory limit is exceeded, and then the batch would be reused until the
end of the query execution. This is a reasonable heuristic under the
assumption that all tuples in the data stream are roughly equal in size,
but this might not be the case.

In particular, consider an example when 10k small rows of 1KiB are
followed by 10k large rows of 1MiB. According to our heuristic, we
happily grow the batch until 1024 in capacity, and then we do not shrink
the capacity of that batch, so once the large rows start appearing, we
put 1GiB worth of data into a single batch, significantly exceeding our
memory limit (usually 64MiB with the default `workmem` setting).

This commit introduces a new heuristic as follows:
- the first time a batch exceeds the memory limit, its capacity is memorized,
  and from now on that capacity will determine the upper bound on the
  capacities of the batches allocated through the helper;
- if at any point in time a batch exceeds the memory limit by at least a
  factor of two, then that batch is discarded, and the capacity will never
  exceed half of the capacity of the discarded batch;
- if the memory limit is not reached, then the behavior of the dynamic growth
  of the capacity provided by `Allocator.ResetMaybeReallocate` is still
  applicable (i.e. the capacities will grow exponentially until
  coldata.BatchSize()).
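
A simplified model of this heuristic, using plain integers in place of the actual allocator and `coldata.Batch` machinery (names here are illustrative):

```go
package colmem

// capacityHeuristic models the behavior described above: remember the
// capacity at which a batch first exceeded the memory limit and treat it
// as an upper bound; if a batch ever exceeds the limit by 2x, discard it
// and halve the bound; otherwise keep growing exponentially.
type capacityHeuristic struct {
	memoryLimit  int64
	maxBatchSize int // absolute ceiling, i.e. coldata.BatchSize()
	maxCapacity  int // 0 means no cap has been learned yet
}

// nextCapacity returns the capacity to use for the next batch given the
// current batch's capacity and its measured memory footprint in bytes.
func (h *capacityHeuristic) nextCapacity(curCapacity int, curFootprint int64) int {
	if curFootprint > 2*h.memoryLimit {
		// Over the limit by at least 2x: the batch is discarded and the
		// cap drops to half of its capacity.
		h.maxCapacity = curCapacity / 2
		if h.maxCapacity < 1 {
			h.maxCapacity = 1
		}
		return h.maxCapacity
	}
	if curFootprint > h.memoryLimit && h.maxCapacity == 0 {
		// First time over the limit: memorize this capacity as the cap.
		h.maxCapacity = curCapacity
	}
	// Otherwise: exponential growth, bounded by the learned cap and the
	// absolute batch-size ceiling.
	next := curCapacity * 2
	if h.maxCapacity > 0 && next > h.maxCapacity {
		next = h.maxCapacity
	}
	if next > h.maxBatchSize {
		next = h.maxBatchSize
	}
	return next
}
```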

Note that this heuristic does not have the ability to grow the maximum
capacity once it has been set, although it might make sense to do so (say,
if after shrinking the capacity, the next five times we see that the
batch is using less than half of the memory limit). This is a conscious
omission since I want this change to be backported, and never growing
seems like a safer choice. Thus, this improvement is left as a TODO.

Also, we still might create batches that are too large in memory
footprint in those places that don't use the SetAccountingHelper (e.g.
in the columnarizer) since we perform the memory limit check at the
batch granularity. However, this commit improves things there so that we
don't reuse such a batch and instead use half of its capacity on the
next iteration.

Fixes: #76464.

Release note (bug fix): CockroachDB now more precisely respects the
`distsql_workmem` setting which improves the stability of each node and
makes OOMs less likely.

**colmem: unexport Allocator.ResetMaybeReallocate**

This commit is a mechanical change to unexport
`Allocator.ResetMaybeReallocate` so that users are forced to use the
method of the same name from the helpers. This required splitting the
tests into two files.

Release note: None

85492: backupccl: remap all restored tables r=dt a=dt

This PR has a few changes, broken down into separate commits:
a) stop restoring tmp tables and remove the special-case code to synthesize their special schemas; these were previously restored only to be dropped so that restored jobs referencing them would not break, but we stopped restoring jobs.
b) synthesize type-change jobs during cluster restore; this goes with not restoring jobs.
c) fix some assumptions in tests/other code about what IDs restored tables have.
d) finally, always assign new IDs to all restored objects, even during cluster restore, removing the need to carefully move conflicting tables or other things around.

Commit-by-commit review recommended.


85930: jobs: make expiration use intended txn priority r=ajwerner a=rafiss

In aed014f these operations were supposed to be changed to use
MinUserPriority. However, they weren't using the appropriate txn, so it
didn't have the intended effect.

Release note: None

Co-authored-by: Jordan Lewis <[email protected]>
Co-authored-by: Nathan VanBenschoten <[email protected]>
Co-authored-by: Matthew Todd <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: David Taylor <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>
craig bot closed this as completed in 1533b8e Aug 11, 2022