server: meta-issue to organize part of the work in CRDB-26691 #98431
Labels: C-bug (Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.)
knz added the C-bug label on Mar 11, 2023.
knz changed the title from "server: meta-issue to organize the work in CRDB-23559" to "server: meta-issue to organize part of work in CRDB-23559" on Mar 11, 2023.
knz changed the title from "server: meta-issue to organize part of work in CRDB-23559" to "server: meta-issue to organize part of the work in CRDB-23559" on Mar 11, 2023.
craig bot pushed a commit that referenced this issue on Mar 16, 2023:
98440: upgrades,jobs: stub the auto config runner r=stevendanna,ajwerner a=knz

Epic: CRDB-23559
Informs #98431.

This change is part of a larger project to perform auto-configuration in newly created clusters. One of the components will be the "auto config runner", a job which receives configuration to apply from the environment, creates other jobs to apply this configuration, and stores the execution results.

This commit stubs the job, which means allocating a cluster version for it and instantiating it via a cluster migration. The job is otherwise idle at this point.

Release note: None

Co-authored-by: Raphael 'kena' Poss <[email protected]>
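To make the shape of such a stub concrete, here is a minimal, self-contained Go sketch of an idle runner job. It is an illustration only: the type name and the single-argument `Resume` signature are assumptions for this sketch, not CockroachDB's actual `jobs.Resumer` API (which also receives a job execution context).

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// autoConfigRunner is a schematic stand-in for the stubbed job: it is
// instantiated once per cluster and, at this stage of the work, simply
// idles until later commits give it configuration to apply.
type autoConfigRunner struct{}

// Resume mirrors the shape of a job resumer: the stub wakes up
// periodically but performs no work yet.
func (r *autoConfigRunner) Resume(ctx context.Context) error {
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(time.Minute):
			// Later commits hook task delivery in here.
		}
	}
}

func main() {
	// Simulate the job being shut down shortly after it starts.
	ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
	defer cancel()
	fmt.Println("runner exited:", (&autoConfigRunner{}).Resume(ctx))
}
```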
craig bot pushed a commit that referenced this issue on Apr 3, 2023:
98459: server,autoconfig: automatic configuration via config tasks r=adityamaru a=knz

Epic: CRDB-23559
Informs #98431. All commits but the last are from #98993.

This change introduces "auto config tasks", a mechanism through which configuration payloads ("tasks") can be injected into a running SQL service. This is driven via the "auto config runner" job that was introduced in the previous commit. The job listens for the arrival of new task definitions via a `Provider` interface. When new tasks are known, and previous tasks have completed, the runner creates a job for the first next task.

Release note: None

100476: server/drain: shut down SQL subsystems gracefully before releasing table leases r=JeffSwenson,rytaft a=knz

Needed for #99941 and #99958.
Epic: CRDB-23559
See individual commits for details.

100511: sqlccl: deflake TestGCTenantJobWaitsForProtectedTimestamps r=adityamaru,arulajmani a=knz

Fixes #94808

The tenant server must be shut down before the tenant record is removed; otherwise the tenant's SQL server will self-terminate by calling Stop() on its stopper, which in this case was shared with the parent cluster.

Release note: None

Co-authored-by: Raphael 'kena' Poss <[email protected]>
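The described control flow — wait for new task definitions, then run them in order, one at a time, skipping tasks already completed — can be sketched as follows. This is a schematic under assumed names: the `Task` struct, `Provider.ActiveTasks`, and the channel-based delivery are illustrative stand-ins for the real `Provider` interface and protobuf task definitions, and job creation is modeled as a direct function call.

```go
package main

import (
	"context"
	"fmt"
)

// Task is a hypothetical configuration payload; the real task
// definitions are delivered by the environment.
type Task struct {
	ID  int
	SQL string
}

// Provider lets the runner wait for the arrival of new task definitions.
type Provider interface {
	// ActiveTasks returns a channel that receives the current queue of
	// pending tasks whenever it changes.
	ActiveTasks() <-chan []Task
}

// runTasks is a schematic runner loop: when new tasks are known and
// the previous task has completed, it "creates a job" for the first
// next task (modeled here as a direct call to execute).
func runTasks(ctx context.Context, p Provider, execute func(Task) error) error {
	done := 0 // ID of the last completed task.
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case tasks, ok := <-p.ActiveTasks():
			if !ok {
				return nil
			}
			for _, t := range tasks {
				if t.ID <= done {
					continue // Already executed.
				}
				if err := execute(t); err != nil {
					return err
				}
				done = t.ID
			}
		}
	}
}

// chanProvider is a trivial Provider backed by a channel, for the demo.
type chanProvider chan []Task

func (c chanProvider) ActiveTasks() <-chan []Task { return c }

func main() {
	p := make(chanProvider, 1)
	p <- []Task{{1, "-- placeholder task 1"}, {2, "-- placeholder task 2"}}
	close(p)
	_ = runTasks(context.Background(), p, func(t Task) error {
		fmt.Println("executing task", t.ID)
		return nil
	})
}
```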
craig bot pushed a commit that referenced this issue on Apr 20, 2023:
98466: cli,server: static configuration profiles r=stevendanna a=knz

Epic: CRDB-23559
Informs #98431. Fixes #94856. (Based off #98459.) Supersedes #98380.

This change introduces a mechanism through which an operator can select a "configuration profile" via the command-line flag `--config-profile` or the env var `COCKROACH_CONFIG_PROFILE`. The SQL initialization defined by the profile is applied during server start-up. The profiles are (currently) hardcoded inside CockroachDB. The following profiles are predefined:

- `default`: no configuration.
- `multitenant+noapp`: no pre-defined `application` tenant, but with a predefined application tenant template that is used whenever a new tenant is defined. This config profile is meant for use for C2C replication target clusters.
- `multitenant+app+sharedservice`: shared-process multitenancy with a pre-defined `application` tenant, based off the same configuration as `multitenant+noapp`.

Release note: None

101907: multitenant: misc fixes related to tenant capabilities r=arulajmani a=knz

See individual commits for details. The last commit in particular probably addresses #99087.
Epic: CRDB-23559

101935: sql: add issue number to todo r=rharding6373 a=rharding6373

Epic: none
Informs: #101934
Release note: none

Co-authored-by: Raphael 'kena' Poss <[email protected]>
Co-authored-by: rharding6373 <[email protected]>
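As a rough illustration of the "hardcoded registry" idea, here is a minimal Go sketch of a profile table keyed by the `--config-profile` flag with the `COCKROACH_CONFIG_PROFILE` env var as fallback. The map contents are placeholders, not the real profile definitions, and the real server applies the statements through the auto config runner rather than printing them.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// profiles is a schematic stand-in for the hardcoded registry: each
// profile maps to the SQL initialization applied during server
// start-up. The statement lists below are placeholders.
var profiles = map[string][]string{
	"default":                       nil, // no configuration
	"multitenant+noapp":             {"-- define application tenant template (placeholder)"},
	"multitenant+app+sharedservice": {"-- define application tenant template (placeholder)", "-- create the application tenant (placeholder)"},
}

func main() {
	// Flag takes precedence; the env var is the default value.
	name := flag.String("config-profile", os.Getenv("COCKROACH_CONFIG_PROFILE"), "configuration profile to apply")
	flag.Parse()
	if *name == "" {
		*name = "default"
	}
	stmts, ok := profiles[*name]
	if !ok {
		fmt.Fprintf(os.Stderr, "unknown config profile: %q\n", *name)
		os.Exit(1)
	}
	for _, s := range stmts {
		fmt.Println("apply at startup:", s)
	}
}
```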
knz changed the title from "server: meta-issue to organize part of the work in CRDB-23559" to "server: meta-issue to organize part of the work in CRDB-26691" on Apr 21, 2023.
craig bot pushed a commit that referenced this issue on Jul 23, 2023:
106414: server: properly check existence of tenant record r=yuzefovich,stevendanna a=knz

PR extracted from #105441. Informs #83650. Part of #98431.
Epic: CRDB-26691

Prior to this patch, the API on the tenant info watcher would return an "undefined" metadata payload if called prior to the tenant record being created. This was most visible in the following scenario:

1. A new cluster starts. The watcher+rangefeed start successfully (tenant table empty).
2. A tenant client connects. At this time there is no metadata for its tenant ID, so the metadata payload is available but empty.
3. CREATE TENANT is executed for the new tenant.
4. Only then (later) does the rangefeed introduce the metadata into the cache.

This is insufficient for use by the KV tenant client RPCs: there we only want to accept incoming requests from tenant clients after we have actually seen metadata for them. This patch improves the situation by checking whether the tenant record is present before returning bogus data to the SQL tenant client. Simultaneously, we add error handling logic in the SQL tenant client to handle this case gracefully.

In a mixed-version deployment (with and without this patch applied), the following behaviors will occur if one starts the SQL tenant server without a tenant record defined:

- Unpatched server: late startup error in the client with "database 1 does not exist".
- Patched server: the client loops forever, waiting for the tenant record.

Behavior when the tenant record is created shortly *after* the SQL tenant server starts up:

- Unpatched server: inconsistent / non-deterministic behavior.
- Patched server: clean startup of the client after the tenant record appears.

Co-authored-by: Raphael 'kena' Poss <[email protected]>
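The patched client behavior ("loops forever, waiting for the tenant record") amounts to a retry loop that distinguishes "record not there yet" from genuine errors. A minimal sketch, where `errTenantRecordNotFound` and the `fetch` callback are assumed stand-ins for the structured error and the metadata lookup:

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errTenantRecordNotFound stands in for the error the patched KV
// server returns before the tenant record exists.
var errTenantRecordNotFound = errors.New("tenant record not yet created")

// waitForTenantMetadata is a schematic version of the patched client
// behavior: instead of failing startup, the SQL tenant client retries
// until the tenant record appears (or the context is canceled).
func waitForTenantMetadata(ctx context.Context, fetch func() error) error {
	for {
		err := fetch()
		if err == nil {
			return nil // Metadata present; startup can proceed.
		}
		if !errors.Is(err, errTenantRecordNotFound) {
			return err // Unrelated error: fail fast.
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(10 * time.Millisecond): // Retry after a short pause.
		}
	}
}

func main() {
	attempts := 0
	err := waitForTenantMetadata(context.Background(), func() error {
		attempts++
		if attempts < 3 {
			return errTenantRecordNotFound // Record is created "later".
		}
		return nil
	})
	fmt.Println("started after", attempts, "attempts, err =", err)
}
```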
craig bot pushed a commit that referenced this issue on Jul 26, 2023:
105441: server: notify tenant servers of metadata changes r=yuzefovich,stevendanna a=knz

Informs (towards fixing) #98431. Fixes #105721.
Epic: CRDB-26691
See the individual commits for details.

106325: sql/parser: parse CREATE PROCEDURE r=mgartner a=mgartner

#### sql/parser: parse CREATE PROCEDURE

`CREATE PROCEDURE` statements can now be parsed by the SQL parser. Currently, executing `CREATE PROCEDURE` will produce an unimplemented error.

Epic: CRDB-25388
Release note: None

#### sem/tree: rename function AST nodes

This commit renames UDF-related AST nodes with "function" in the name to use the more general term "routine". The term "routine" is inclusive of both UDFs and procedures (e.g., Postgres implements a `DROP ROUTINE` statement which drops both UDFs and procedures), which is fitting because we'll be using the same AST nodes for both `CREATE FUNCTION` and `CREATE PROCEDURE` statements.

Release note: None

#### sem/tree: rename udf.go to create_routine.go

Release note: None

107470: server, ccl: adjust more tests to work with test tenant r=yuzefovich a=yuzefovich

Informs: #76378.
Epic: CRDB-18499.

107507: kvserver: transfer lease when acquiring lease outside preferences r=erikgrinaker,andrewbaptist a=kvoli

First 2 commits are from #107505.

When a leaseholder is lost, any surviving replica may acquire the lease, even if it violates lease preferences. There are two main reasons for this: we need to elect a new Raft leader who will acquire the lease, which is agnostic to lease preferences, and there may not even be any surviving replicas that satisfy the lease preferences at all, so we don't want to keep the range unavailable while we try to figure this out (considering that e.g. network timeouts can delay this for many seconds).

However, after acquiring a lease, we rely on the replicate queue to transfer the lease back to a replica that conforms with the preferences, which can take several minutes. In multi-region clusters, this can cause severe latency degradation if the lease is acquired in a remote region.

This PR will detect lease preference violations when a replica acquires a new lease, and eagerly enqueue it in the replicate queue for transfer (if possible). If the first attempt fails, the replica will be sent to purgatory and retried soon after.

Additionally, two roachtests are added for lease preference conformance timing. The first test, `.../partial-first-preference-down`, takes down one of the preferred locality nodes (1/2). The second test, `.../full-first-preference-down`, takes down both of the preferred locality nodes (2/2).

Resolves #106100.
Epic: none
Release note (bug fix): When losing a leaseholder and using lease preferences, the lease can be acquired by any other replica (regardless of lease preferences) in order to restore availability as soon as possible. The new leaseholder will now immediately check if it violates the lease preferences, and attempt to transfer the lease to a replica that satisfies the preferences if possible.

107512: sql: fix crdb_internal.encode_key in some contexts r=yuzefovich a=yuzefovich

Previously, we forgot to set `CatalogBuiltins` for the internal planner which is used by `crdb_internal.encode_key` (which, in turn, is used to power `SHOW RANGE ... FOR ROW`), so evaluating it would lead to an error. For example, the internal planner is used in the backfill. This is now fixed.

Fixes: #106397.
Release note (bug fix): CockroachDB would previously return an error when using `SHOW RANGE ... FOR ROW ...` in the `CREATE TABLE ... AS ...` construct, and this is now fixed.

107537: workload/ycsb: default --read-modify-write-in-txn to false r=erikgrinaker a=nvanbenschoten

The `read-modify-write-in-txn` flag was added to ycsb in #103117. This commit changes the default of the flag from true to false. This makes the default configuration of ycsb more closely mirror the official ycsb implementation, to avoid confusion when external users run the workload and compare. We saw in #103117 and in #107517 that this improves throughput substantially. We expect to see the same in roachperf once this change is merged. After merging the PR, I will add a roachperf annotation.

Epic: None
Release note: None

107549: persistedsqlstats: add AOST clause to statement statistics size check r=maryliag a=THardy98

Epic: None

This change adds an AOST clause to the statement statistics size check. This helps mitigate contention on the `system.statement_statistics` table, which has caused the SQL stats compaction job to fail.

Release note: None

Co-authored-by: Raphael 'kena' Poss <[email protected]>
Co-authored-by: Marcus Gartner <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Austen McClernon <[email protected]>
Co-authored-by: Nathan VanBenschoten <[email protected]>
Co-authored-by: Thomas Hardy <[email protected]>
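The decision logic in 107507 boils down to: on lease acquisition, check preference conformance, and if it fails, enqueue the replica for a transfer instead of waiting for the periodic replicate-queue scan. A deliberately simplified Go sketch, using plain string tags in place of real locality tiers and replicate-queue plumbing:

```go
package main

import "fmt"

// replica is a simplified model; the real implementation works with
// store attributes, locality tiers, and the replicate queue.
type replica struct {
	store string
	tags  map[string]bool
}

// violatesPreferences reports whether a single, simplified lease
// preference is unsatisfied by the new leaseholder.
func violatesPreferences(leaseholder replica, preferredTag string) bool {
	return !leaseholder.tags[preferredTag]
}

// onLeaseAcquired mimics the new behavior: any replica may acquire the
// lease to restore availability, but if it violates the preferences it
// is immediately enqueued for a transfer. On failure, the real code
// retries via the replicate queue's purgatory.
func onLeaseAcquired(leaseholder replica, preferredTag string, enqueue func(replica)) {
	if violatesPreferences(leaseholder, preferredTag) {
		enqueue(leaseholder)
	}
}

func main() {
	r := replica{store: "s3", tags: map[string]bool{"region=us-east1": true}}
	onLeaseAcquired(r, "region=us-west1", func(rep replica) {
		fmt.Println("enqueueing", rep.store, "for lease transfer")
	})
}
```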
This issue covers 4 related threads of work in epic CRDB-26691:
Architectural overview / tech strategy
We will choose a common, shared architecture for all 4 threads of work, which will reduce the overall amount of work by sharing a big chunk of technology between them: we will reuse the `TenantSettings` RPC and its result payload for this. (This will add a tiny extra startup latency to SQL pods, but we have sign-off from the Serverless team on that; it will not incur cross-region latency since the retrieved data is already in RAM on each KV node.)
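To illustrate the strategy, here is a sketch of how a single `TenantSettings`-style stream could carry all four kinds of payload to a tenant, with the connector dispatching each kind to the matching in-memory cache. The `event` union and its `kind` strings are assumptions made for this sketch, not the real protobuf payloads:

```go
package main

import "fmt"

// event is a schematic union of the payload kinds the extended
// TenantSettings stream would carry.
type event struct {
	kind  string // "setting-override" | "capability" | "service-mode" | "injected-sql"
	value string
}

// applyEvents shows the client-side shape: one stream from the KV node
// delivers every payload kind, and the tenant connector dispatches
// each event to the matching in-memory cache.
func applyEvents(events <-chan event, caches map[string]func(string)) {
	for ev := range events {
		if apply, ok := caches[ev.kind]; ok {
			apply(ev.value)
		}
	}
}

func main() {
	ch := make(chan event, 2)
	ch <- event{"capability", "can_admin_split=true"}
	ch <- event{"service-mode", "shared"}
	close(ch)
	applyEvents(ch, map[string]func(string){
		"capability":   func(v string) { fmt.Println("capability update:", v) },
		"service-mode": func(v string) { fmt.Println("service mode update:", v) },
	})
}
```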
Work threads
The following list is the logical description of the work; the work items are actually shared across each group of tasks (see the summary of work at the bottom).
A. Group capabilities and setting overrides into a single data structure (this is an infra dependency for the work below; a minimal sketch follows this list)
B. Propagate capabilities to tenants
C. Propagate service mode to tenants
D. Propagate injected SQL
E. "Exec response" API - eventually needed for multiregion serverless and other purposes
Actual work (execution)
X1. Extract subset of #94856 that produces initial config for system tenant (D.1)
X2. Define common data structure and notification mechanism (A.1, A.2, C.2, D.2, E.1)
X3. Extend TenantSettings response payload to support more payload types (B.1, C.3, D.3)
X4. Extend TenantSettings RPC handler to propagate more data upon change notifications (B.2, C.4, D.4)
X5. Extend tenant connector to receive more data from TenantSettings (B.3, C.5, D.5)
X6. Advertise availability of caps tenant-side to team (B.5)
X7. Define new system tables and poll from them to refresh in-memory cache (D.6, E.4, E.5, E.6)
X8. Rebase #98380 to define injected SQL, close #94856 and related RFC (D.8)
X9. Rebase #96144, close #93145 and #83650. -- in progress
X10. Implement ExecResponse API, write-through to system table and filter injected SQL (E.2, E.3)
X11. Followup to #98380 to ping back storage cluster via ExecResponse API on job completion (E.7)
Jira issue: CRDB-25263
Epic: CRDB-26691