
kvflowcontrol: implement kvflowcontrol.Dispatch #97766

Merged
merged 1 commit into cockroachdb:master from 230227.kvflowdispatch on Mar 7, 2023

Conversation

irfansharif
Contributor

Part of #95563. Dispatch is a concrete implementation of the kvflowcontrol.Dispatch interface. It's used to dispatch information about admitted raft log entries to the specific nodes where (i) said entries originated, (ii) flow tokens were deducted, and (iii) said tokens are waiting to be returned. This type is also used to read pending dispatches, which will be used in the raft transport layer when looking to piggyback information on traffic already bound to specific nodes. Since timely dispatching (read: piggybacking) is not guaranteed, we allow querying for all long-overdue dispatches.

Internally it's able to coalesce dispatches bound for a given node. If dispatching admission information for two log entries with the same <RangeID,StoreID,WorkPriority> triple, with log positions L1 and L2 where L1 < L2, we can simply dispatch the one with L2. We leave the integration of this type with the {Store,}WorkQueue (#97599) + raft transport to future PRs.
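
Roughly, the coalescing rule looks like the following (a minimal sketch with illustrative types, not the actual kvflowdispatch code):

```go
// dispatchKey and logPosition are stand-ins for the real
// <RangeID,StoreID,WorkPriority> key and raft log position types.
type dispatchKey struct {
	rangeID  int64 // roachpb.RangeID
	storeID  int32 // roachpb.StoreID
	priority int8  // admissionpb.WorkPriority
}

type logPosition struct {
	term, index uint64
}

// record notes an admitted log position for the given key, keeping only
// the highest position per triple; a later position subsumes earlier ones.
func record(pending map[dispatchKey]logPosition, key dispatchKey, pos logPosition) {
	if cur, ok := pending[key]; ok {
		if cur.term > pos.term || (cur.term == pos.term && cur.index >= pos.index) {
			return // an equal-or-later position is already pending
		}
	}
	pending[key] = pos
}
```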

Release note: None

@irfansharif irfansharif requested a review from a team as a code owner February 27, 2023 23:00
@cockroach-teamcity
Member

This change is Reviewable

@irfansharif irfansharif force-pushed the 230227.kvflowdispatch branch 2 times, most recently from 7380e06 to 7e86ec4 on February 28, 2023 14:10
Collaborator

@sumeerbhola sumeerbhola left a comment

:lgtm:

Reviewed 7 of 7 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @irfansharif)


pkg/kv/kvserver/kvflowcontrol/kvflowdispatch/kvflowdispatch.go line 42 at r1 (raw file):

}

type dispatches map[dispatchKey]kvflowcontrolpb.AdmittedRaftLogEntries

nit: is there a reason the value is AdmittedRaftLogEntries and not RaftLogPosition, given that AdmittedRaftLogEntries is the set of fields in dispatchKey + the fields in RaftLogPosition?
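
For reference, the suggested shape would be roughly the following (sketch; the eventual change may differ in detail):

```go
type dispatches map[dispatchKey]kvflowcontrolpb.RaftLogPosition
```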


pkg/kv/kvserver/kvflowcontrol/kvflowdispatch/testdata/multiple_nodes_priorities_stores line 40 at r1 (raw file):

----

# vim:ft=sh

nit: why do you need these vim (filetype?) directives?

Part of cockroachdb#95563. Dispatch is a concrete implementation of the
kvflowcontrol.Dispatch interface. It's used to dispatch information
about admitted raft log entries to the specific nodes where (i) said
entries originated, (ii) flow tokens were deducted, and (iii) said
tokens are waiting to be returned. This type is also used to read
pending dispatches, which
will be used in the raft transport layer when looking to piggyback
information on traffic already bound to specific nodes. Since timely
dispatching (read: piggybacking) is not guaranteed, we allow querying
for all long-overdue dispatches.

Internally it's able to coalesce dispatches bound for a given node. If
dispatching admission information for two log entries with the same
<RangeID,StoreID,WorkPriority> triple, with log positions L1 and L2
where L1 < L2, we can simply dispatch the one with L2. We leave the
integration of this type with the {Store,}WorkQueue (cockroachdb#97599) + raft
transport to future PRs.

Release note: None
@irfansharif irfansharif force-pushed the 230227.kvflowdispatch branch from 7e86ec4 to 1b01e63 on March 7, 2023 18:59
Contributor Author

@irfansharif irfansharif left a comment

TFTR!

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @sumeerbhola)


pkg/kv/kvserver/kvflowcontrol/kvflowdispatch/kvflowdispatch.go line 42 at r1 (raw file):

Previously, sumeerbhola wrote…

nit: is there a reason the value is AdmittedRaftLogEntries and not RaftLogPosition, given that AdmittedRaftLogEntries is the set of fields in dispatchKey + the fields in RaftLogPosition?

Done.


pkg/kv/kvserver/kvflowcontrol/kvflowdispatch/testdata/multiple_nodes_priorities_stores line 40 at r1 (raw file):

Previously, sumeerbhola wrote…

nit: why do you need these vim (filetype?) directives?

It makes for somewhat better syntax highlighting in editors, given we don't typically use filename extensions.

[screenshot: syntax highlighting of the testdata file in an editor]

@craig
Contributor

craig bot commented Mar 7, 2023

Build succeeded:

@craig craig bot merged commit cb36e3b into cockroachdb:master Mar 7, 2023
@irfansharif irfansharif deleted the 230227.kvflowdispatch branch March 7, 2023 20:36
irfansharif added a commit to irfansharif/cockroach that referenced this pull request Apr 7, 2023
In a subsequent commit, when integrating kvflowcontrol into the critical
path for replication traffic, we'll set up the return of flow tokens
from the receiver node back to the sender once log entries get
(asynchronously) admitted[^1]. So we need to intercept the exact points
at which the virtually enqueued work items get admitted, since it all
happens asynchronously[^2]. To that end we introduce the following
interface:

    // OnLogEntryAdmitted is used to observe the specific entries
    // (identified by rangeID + log position) that were admitted. Since
    // admission control for log entries is asynchronous/non-blocking,
    // this allows callers to do requisite post-admission
    // bookkeeping.
    type OnLogEntryAdmitted interface {
     AdmittedLogEntry(
       origin roachpb.NodeID, /* node where the entry originated */
       pri admissionpb.WorkPriority, /* admission priority of the entry */
       storeID roachpb.StoreID, /* store on which the entry was admitted */
       rangeID roachpb.RangeID, /* identifying range for the log entry */
       pos LogPosition, /* log position of the entry that was admitted*/
     )
    }

For now we pass in a no-op implementation in production code, but this
will change shortly.
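
A no-op implementation could be as simple as the following (sketch only;
the production type may be named differently):

    type noopOnLogEntryAdmitted struct{}

    func (noopOnLogEntryAdmitted) AdmittedLogEntry(
      origin roachpb.NodeID,
      pri admissionpb.WorkPriority,
      storeID roachpb.StoreID,
      rangeID roachpb.RangeID,
      pos LogPosition,
    ) {
      // Intentionally does nothing; post-admission bookkeeping gets
      // wired up in a later change.
    }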

Seeing as how the asynchronous admit interface is going to be the
primary one once we enable replication admission control by default,
for IO control, we no longer need the storeWriteDone interfaces and
corresponding types. They're being used by our current (and soon-to-be
legacy) above-raft IO admission control to inform granters of when the
write was actually done, post-admission. For above-raft IO control, at
admit-time we do not have sizing info for the writes, so by intercepting
these writes at write-done time we're able to make any outstanding token
adjustments in the granter.

To reflect this new world, we:
- Rename setAdmittedDoneModels to setLinearModels.
- Introduce a storeReplicatedWorkAdmittedInfo[^3]. It provides
  information about the size of replicated work once it's admitted
  (which happens asynchronously from the work itself). This lets us use
  the underlying linear models for L0 {writes,ingests} to deduct an
  appropriate number of tokens from the granter, for the admitted work
  size[^4] (a rough sketch of such a model follows this list).
- Rename the granterWithStoreWriteDone interface to
  granterWithStoreReplicatedWorkAdmitted. We'll still intercept the
  actual point of admission for some token adjustments, through the
  storeReplicatedWorkAdmittedLocked API shown below. There are two
  callstacks through which this API gets invoked, one where coord.mu
  is already held, and one where it isn't. We plumb this information
  through so the lock is acquired if not already held. The locking
  structure is unfortunate, but this was a minimally invasive diff.

   storeReplicatedWorkAdmittedLocked(
    originalTokens int64,
    admittedInfo storeReplicatedWorkAdmittedInfo,
   ) (additionalTokens int64)
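
As a rough illustration of the linear-model deduction mentioned in the
list above (field names here are illustrative, not the granter's actual
model type):

    type linearModel struct {
      multiplier float64 // tokens per admitted byte
      constant   int64   // per-request constant
    }

    func (m linearModel) tokens(admittedBytes int64) int64 {
      return int64(m.multiplier*float64(admittedBytes)) + m.constant
    }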

While here, we also export an admission.TestingReverseWorkPriorityDict.
There are at least three tests that have re-invented the wheel.

[^1]: This will happen through the kvflowcontrol.Dispatch interface
      introduced back in cockroachdb#97766, after integrating it with the RaftTransport
      layer.
[^2]: Introduced in cockroachdb#97599, for replicated write work.
[^3]: Identical to the previous StoreWorkDoneInfo.
[^4]: There's a peculiarity here in that at enqueuing-time we actually
      know the size of the write, so we could have deducted the right
      number of tokens upfront and avoided this post-admit granter token
      adjustment. We inherit this structure from earlier, and just leave
      a TODO for now.

Release note: None
irfansharif added a commit to irfansharif/cockroach that referenced this pull request Apr 27, 2023
irfansharif added a commit to irfansharif/cockroach that referenced this pull request Apr 28, 2023
irfansharif added a commit to irfansharif/cockroach that referenced this pull request May 7, 2023
irfansharif added a commit to irfansharif/cockroach that referenced this pull request May 11, 2023
craig bot pushed a commit that referenced this pull request May 11, 2023
101786: workload: introduce timeout for pre-warming connection pool r=sean- a=sean-

Interrupting target instances during prewarming shouldn't stall workload indefinitely: introduce a timeout for pre-warming connections. Connections will have between 15s and 5min to warm up before the context expires.
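
A generic sketch of the idea (not the workload tool's actual code): bound pre-warming with a context deadline so an interrupted target can't block the run forever.

```go
import (
	"context"
	"database/sql"
	"time"
)

// prewarmConn pings the database to establish a connection, giving up
// once the deadline expires instead of blocking indefinitely.
func prewarmConn(db *sql.DB, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return db.PingContext(ctx)
}
```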

Epic: none

101987: cli/sql: new option autocerts for TLS client cert auto-discovery r=rafiss a=knz

Fixes #101986.

See the release note below.
An additional benefit not mentioned in the release note is that
it simplifies switching from one tenant to another when using
shared-process multitenancy. For example, this becomes possible:

```
> CREATE TENANT foo;
> ALTER TENANT foo START SERVICE SHARED;
> \c cluster:foo root - - autocerts
```

Alternatively, this can also be used to quickly switch from a non-root
user in an app tenant to the root user in the system tenant:
```
> \c cluster:system root - - autocerts
```

This works because (currently) all tenant servers running side-by-side
use the same TLS CA to validate SQL client certs.

----

Release note (cli change): The `\connect` client-side command for the
SQL shell (included in `cockroach sql`, `cockroach demo`,
`cockroach-sql`) now recognizes an option `autocerts` as last
argument.

When provided, `\c` will now try to discover a TLS client
certificate and key in the same directory(ies) as used by the previous
connection URL.

This feature makes it easier to switch usernames when
TLS client/key files are available for both the previous and the new
username.

102382: c2c: deflake c2c/shutdown roachtests r=stevendanna a=msbutler

   c2c: deflake c2c/shutdown roachtests

    This patch addresses two roachtest failure modes:
    - Prevents roachtest failure if a query fails during a node shutdown.

    - Prevents the src cluster from returning a single node topology, which could
      cause the stream ingestion job to hang if the participating src node gets
    shut down. Longer term, automatic replanning will prevent this. Specifically,
    this patch changes the kv workload to split and scatter the kv table across the
    cluster before the c2c job begins.

    Fixes #101898
    Fixes #102111

    This patch also makes it easier to reproduce c2c roachtest failures by plumbing
    a random seed to several components of the roachtest driver.

    Release note: None


    c2c: rename completeStreamIngestion to applyCutoverTime

    Release note: none


    workload: add --scatter flag to kv workload

    The user can now run `./workload init kv --scatter ....` which scatters the kv
    table across the cluster after the initial data load. This flag is best used
    with `--splits`, `--max-block-bytes`, and `--insert-count`.

    Epic: none

    Release note: none

102819: admission: move CreateTime-sequencing below-raft r=irfansharif a=irfansharif

These are already reviewed commits from #98308. Part of #95563.

---

**admission: move CreateTime-sequencing below-raft**

We move kvflowsequencer.Sequencer and its use in kvflowhandle.Handle (above-raft) to admission.sequencer, now used by admission.StoreWorkQueue (below-raft). This variant appeared in an earlier revision of #97599 where we first introduced monotonically increasing CreateTimes for a given raft group.

In a subsequent commit, when integrating kvflowcontrol into the critical path for replication traffic, we'll observe that it's quite difficult to create sequencing CreateTimes[^1] above raft. This is because these sequence numbers are encoded as part of the raft proposal[^2], and at encode-time, we don't actually know what log position the proposal is going to end up in. It's hard to explicitly guarantee that a proposal with log-position P1 will get encoded before another with log position P2, where P1 < P2.

Naively sequencing CreateTimes at proposal-encode-time could result in over-admission. This is because of how we return flow tokens -- up to some log index[^3], and how we use these sequence numbers in below-raft WorkQueues. If P2 ends up with a lower sequence number/CreateTime, it would get admitted first, and when returning flow tokens by log position, in specifying up-to-P2, we'd return P1's flow tokens early despite it not having been admitted. So we'd over-admit at the sender. This is all within a <tenant,priority> pair.

[^1]: We use CreateTimes as "sequence numbers" in replication admission control. We want to assign each AC-queued work below-raft a "sequence number" for FIFO ordering within a <tenant,priority>. We ensure these timestamps are roughly monotonic with respect to log positions of replicated work by sequencing work in log position order.
[^2]: In kvflowcontrolpb.RaftAdmissionMeta.
[^3]: See kvflowcontrolpb.AdmittedRaftLogEntries.
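
Roughly, the sequencing idea is the following (sketch only, not the actual admission.sequencer API):

```go
// sequencer hands out monotonically increasing CreateTimes (in nanos) for
// a raft group, so that FIFO ordering within a <tenant,priority> below
// raft tracks log-position order.
type sequencer struct {
	maxCreateTime int64 // latest CreateTime handed out so far
}

func (s *sequencer) sequence(createTime int64) int64 {
	if createTime <= s.maxCreateTime {
		createTime = s.maxCreateTime + 1 // bump to preserve monotonicity
	}
	s.maxCreateTime = createTime
	return createTime
}
```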

---

**admission: add intercept points for when replicated work gets admitted**

In a subsequent commit, when integrating kvflowcontrol into the critical path for replication traffic, we'll set up the return of flow tokens from the receiver node back to the sender once log entries get (asynchronously) admitted[^4]. So we need to intercept the exact points at which the virtually enqueued work items get admitted, since it all happens asynchronously[^5]. To that end we introduce the following interface:
```go
    // OnLogEntryAdmitted is used to observe the specific entries
    // (identified by rangeID + log position) that were admitted. Since
    // admission control for log entries is asynchronous/non-blocking,
    // this allows callers to do requisite post-admission
    // bookkeeping.
    type OnLogEntryAdmitted interface {
     AdmittedLogEntry(
       origin roachpb.NodeID, /* node where the entry originated */
       pri admissionpb.WorkPriority, /* admission priority of the entry */
       storeID roachpb.StoreID, /* store on which the entry was admitted */
       rangeID roachpb.RangeID, /* identifying range for the log entry */
       pos LogPosition, /* log position of the entry that was admitted*/
     )
    }
```
For now we pass in a no-op implementation in production code, but this will change shortly.

Seeing as how the asynchronous admit interface is going to be the primary one once we enable replication admission control by default, for IO control, we no longer need the storeWriteDone interfaces and corresponding types. They're being used by our current (and soon-to-be legacy) above-raft IO admission control to inform granters of when the write was actually done, post-admission. For above-raft IO control, at admit-time we do not have sizing info for the writes, so by intercepting these writes at write-done time we're able to make any outstanding token adjustments in the granter.

To reflect this new world, we:
- Rename setAdmittedDoneModels to setLinearModels.
- Introduce a storeReplicatedWorkAdmittedInfo[^6]. It provides information about the size of replicated work once it's admitted (which happens asynchronously from the work itself). This lets us use the underlying linear models for L0 {writes,ingests} to deduct an appropriate number of tokens from the granter, for the admitted work size[^7].
- Rename the granterWithStoreWriteDone interface to granterWithStoreReplicatedWorkAdmitted. We'll still intercept the actual point of admission for some token adjustments, through the storeReplicatedWorkAdmittedLocked API shown below. There are two callstacks through which this API gets invoked, one where coord.mu is already held, and one where it isn't. We plumb this information through so the lock is acquired if not already held. The locking structure is unfortunate, but this was a minimally invasive diff.
```go
   storeReplicatedWorkAdmittedLocked(
    originalTokens int64,
    admittedInfo storeReplicatedWorkAdmittedInfo,
   ) (additionalTokens int64)
```
While here, we also export an admission.TestingReverseWorkPriorityDict. There are at least three tests that have re-invented the wheel.

[^4]: This will happen through the kvflowcontrol.Dispatch interface introduced back in #97766, after integrating it with the RaftTransport layer.
[^5]: Introduced in #97599, for replicated write work.
[^6]: Identical to the previous StoreWorkDoneInfo.
[^7]: There's a peculiarity here in that at enqueuing-time we actually know the size of the write, so we could have deducted the right number of tokens upfront and avoided this post-admit granter token adjustment. We inherit this structure from earlier, and just leave a TODO for now.


103116: generate-logic-test: fix incorrect timeout in logictests template r=rickystewart a=healthy-pod

In #102719, we changed the way we set `-test.timeout` but didn't update the logictests template. This code change updates the template.

Release note: None
Epic: none

Co-authored-by: Sean Chittenden <[email protected]>
Co-authored-by: Raphael 'kena' Poss <[email protected]>
Co-authored-by: Michael Butler <[email protected]>
Co-authored-by: irfan sharif <[email protected]>
Co-authored-by: healthy-pod <[email protected]>