Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql,spanconfig: attribute usage costs to tenant-settable fields #77344

Open
irfansharif opened this issue Mar 3, 2022 · 5 comments
Open

sql,spanconfig: attribute usage costs to tenant-settable fields #77344

irfansharif opened this issue Mar 3, 2022 · 5 comments
Labels
A-multiregion Related to multi-region A-multitenancy Related to multi-tenancy A-zone-configs C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-kv KV Team

Comments

@irfansharif
Copy link
Contributor

irfansharif commented Mar 3, 2022

Is your feature request related to a problem? Please describe.

Secondary tenants can set zone configs now (#67679). Before rolling it out, we ought to consider what fields should be settable by them, and/or what fields are considered by KV. The full set of settable parameters are:

// SpanConfig holds the configuration that applies to a given keyspan. It is a
// superset of the fields found in zonepb.zone.proto.
message SpanConfig {

There are a few reasons to tread carefully here (non-exhaustive):

  • We currently cost tenants based on logical bytes. Given secondary tenant zone configs allow users to configure the replication factor, there's nothing preventing them from replicating data N-ways where N is the number of nodes in the host cluster. This could be expensive for the host cluster.
  • Zone configs with our current costing make it possible for tenants to bus large amounts of data for free. By reconfiguring where the data is located (through constraints), it's possible to induce substantial egress.
  • Setting low range_min_bytes could cause a large number of range split for the tenant. It's worth noting that this isn't limited by the mechanisms proposed in kvserver: limit the number of kv spans configurable by a tenant #70555.
  • Setting super low or super high GC TTLs could cause KV to either scan for garbage frequently, or keep a lot more garbage data around (again, for free).

Describe the solution you'd like

If we're able to cost+rate limit things precisely, some of the aforementioned concerns go away. We could also start off with ignoring all fields down in KV and selectively accepting some of them as we gain confidence. That would however present as non-conformant ranges for secondary tenants, which could be bad-enough UX to instead disallow setting zone configs entirely.

Describe alternatives you've considered

See above.

Additional context

#67679 also powers MR abstractions for secondary tenants. Whatever we do has implications for MR usability/billing etc. We could also explore what it would mean allowing only MR set configs (though we lack the plumbing necessary today to distinguish between MR-set configs and user-set ones). Even for MR-set configs, some of the costing considerations above still apply.

+cc @ajstorm, @ajwerner, @nvanbenschoten, @arulajmani.

Jira issue: CRDB-13537

Epic CRDB-18604

@irfansharif irfansharif added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-zone-configs A-multitenancy Related to multi-tenancy A-multiregion Related to multi-region labels Mar 3, 2022
@blathers-crl
Copy link

blathers-crl bot commented Mar 23, 2022

Hi @irfansharif, please add branch-* labels to identify which branch(es) this release-blocker affects.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@irfansharif irfansharif added the branch-release-22.1 Used to mark GA and release blockers, technical advisories, and bugs for 22.1 label Mar 23, 2022
@irfansharif
Copy link
Contributor Author

I'm going to mark this as a GA-blocker to ensure at the very least we ship with #78299. That PR does not fully address what this issue talks about (cost attribution for all the work you can induce through zone configs), but it does provide safeguards for 22.1. Once that lands, we can get rid of the GA-blocker label. Given we've layered PTS/tenant backups on top of zone configs, perhaps we want to explicitly allow setting the protected-timestamp fields? If so, we should also disable this by default:

// TODO(irfansharif): This should be a tenant-readonly setting, possible after
// the work for #73349 is completed.
var secondaryTenantZoneConfigsEnabled = settings.RegisterBoolSetting(
settings.TenantWritable,
secondaryTenantsZoneConfigsEnabledSettingName,
"allow secondary tenants to set zone configurations; does not affect the system tenant",
true,
)

irfansharif added a commit to irfansharif/cockroach that referenced this issue Mar 23, 2022
\cockroachdb#77344 outlines the various ways how without additional cost
attribution or resource limiting, tenant zone configs provides
a sufficient API to induce (costly) work in the host cluster. While
better attribution/limiting is underway, this PR introduces two cluster
settings to effectively switch off this machinery all throughout, and/or
enable it selectively for specific tenants. These settings will also
help us roll out this functionality more gradually as we build the
controls described in cockroachdb#77344.

 - spanconfig.tenant_reconciliation_job.enabled, a tenant-readonly
   setting that allows the host to disable the span config reconcilation
   job across all tenants. Defaults to false.
 - spanconfig.tenant_kvaccessor.enabled, a system only setting that
   rejects all KVAccessor requests issued by secondary tenants. Defaults
   to false.

Release note: None
@arulajmani
Copy link
Collaborator

arulajmani commented Mar 23, 2022

Once that lands, we can get rid of the GA-blocker label. Given we've layered PTS/tenant backups on top of zone configs, perhaps we want to explicitly allow setting the protected-timestamp fields?

Given we've layered PTS/tenant backups on top of span configs, doesn't this entail a bit more than what the PR does? We can't blanket turn of the KVAccessor for all secondary tenants and have PTS support for secondary tenants -- we might instead need to validate span configs being upserted all have default values for their fields bar the PTS field. All this assuming #70555 is fixed as well.

@exalate-issue-sync exalate-issue-sync bot added the T-kv KV Team label Mar 30, 2022
arulajmani added a commit to arulajmani/cockroach that referenced this issue Mar 31, 2022
Previously, `sql.zone_configs.allow_for_secondary_tenant.enabled` was a
regular old cluster setting that tenants could control. Now that we have
a notion of tenant read-only setting, this feels like a great candidate
for it. In addition to making this switch, this patch also changes this
setting's default value to be false. See the linked issue for the
motivation around why.

The change to switch this setting to be tenant read-only necessitates
changes in how we run logic tests. This is because logic tests that
run as secondary tenants can no longer change this cluster setting
in the test file. We introduce a new
`tenant-cluster-setting-override-opt` logic test directive with an
option for `allow-zone-configs-for-secondary-tenants` to get this
working. When configured, the host alters this cluster setting
during test setup. This can later be extended for other read-only
settings in the future.

References cockroachdb#77344

Release note (sql change): Changes the default value of
`sql.zone_configs.allow_for_secondary_tenant.enabled` to be false.
 Moreover, this setting is no longer settable by secondary tenants.
Instead, it's now a tenant read-only cluster setting.
arulajmani added a commit to arulajmani/cockroach that referenced this issue Mar 31, 2022
Previously, `sql.zone_configs.allow_for_secondary_tenant.enabled` was a
regular old cluster setting that tenants could control. Now that we have
a notion of tenant read-only setting, this feels like a great candidate
for it. In addition to making this switch, this patch also changes this
setting's default value to be false. See the linked issue for the
motivation around why.

The change to switch this setting to be tenant read-only necessitates
changes in how we run logic tests. This is because logic tests that
run as secondary tenants can no longer change this cluster setting
in the test file. We introduce a new
`tenant-cluster-setting-override-opt` logic test directive with an
option for `allow-zone-configs-for-secondary-tenants` to get this
working. When configured, the host alters this cluster setting
during test setup. This can later be extended for other read-only
settings in the future.

References cockroachdb#77344

Release note (sql change): Changes the default value of
`sql.zone_configs.allow_for_secondary_tenant.enabled` to be false.
 Moreover, this setting is no longer settable by secondary tenants.
Instead, it's now a tenant read-only cluster setting.
craig bot pushed a commit that referenced this issue Mar 31, 2022
79075: sysutil: explicitly chown file in TestGetFileACLInfo r=tbg,Xiang-Gu a=stevendanna

This test assumed that the GID of a newly created file would match the
GID of the creating process.

While this is a reasonable assumption on Linux, it isn't strictly true
across all unix-like operating systems and filesystem..

_The Linux Programming Interface_ summarises the situation on Linux:

> The group ID of the new file may be taken from either the effective
group ID of the process (equivalent to the System V default behavior)
or the group ID of the parent directory (the BSD behavior)...Which of
the two values is used as the new file's group ID is determined by
various factors, including the type of file system on which the new
file is created. [0]

On macOS, the default behavior appears to be the BSD behavior, leading
to this test failing under Bazel when the temporary directory's GID is
0 (wheel) instead of the user's primary group ID.

To solve this, we explicitly change the ownership of the file to a
known UID and GID.

[0] Michael Kerrisk. 2010. The Linux Programming Interface: A Linux
and UNIX System Programming Handbook (1st. ed.). No Starch Press,
USA. Page 291.

Release note: None

79096: dev: add `--no-rebuild-cockroach` flag to some commands that use Docker r=mari-crl a=rickystewart

Release note: None

79097: sql: disallow tenants from setting zone configs by default r=ajwerner a=arulajmani

Previously, `sql.zone_configs.allow_for_secondary_tenant.enabled` was a
regular old cluster setting that tenants could control. Now that we have
a notion of tenant read-only setting, this feels like a great candidate
for it. In addition to making this switch, this patch also changes this
setting's default value to be false. See the linked issue for the
motivation around why.

The change to switch this setting to be tenant read-only necessitates
changes in how we run logic tests. This is because logic tests that
run as secondary tenants can no longer change this cluster setting
in the test file. We introduce a new
`tenant-cluster-setting-override-opt` logic test directive with an
option for `allow-zone-configs-for-secondary-tenants` to get this
working. When configured, the host alters this cluster setting
during test setup. This can later be extended for other read-only
settings in the future.

References #77344


Release note (sql change): Changes the default value of
`sql.zone_configs.allow_for_secondary_tenant.enabled` to be false.
 Moreover, this setting is no longer settable by secondary tenants.
Instead, it's now a tenant read-only cluster setting.


79125: dev: fix panic with multiple targets and extra args r=ajwerner a=ajwerner

The 3rd parameter is in terms of the pre-sliced offsets, not the post. Before
this change, the added test would panic.

Release note: None

Co-authored-by: Steven Danna <[email protected]>
Co-authored-by: Ricky Stewart <[email protected]>
Co-authored-by: arulajmani <[email protected]>
Co-authored-by: Andrew Werner <[email protected]>
blathers-crl bot pushed a commit that referenced this issue Mar 31, 2022
Previously, `sql.zone_configs.allow_for_secondary_tenant.enabled` was a
regular old cluster setting that tenants could control. Now that we have
a notion of tenant read-only setting, this feels like a great candidate
for it. In addition to making this switch, this patch also changes this
setting's default value to be false. See the linked issue for the
motivation around why.

The change to switch this setting to be tenant read-only necessitates
changes in how we run logic tests. This is because logic tests that
run as secondary tenants can no longer change this cluster setting
in the test file. We introduce a new
`tenant-cluster-setting-override-opt` logic test directive with an
option for `allow-zone-configs-for-secondary-tenants` to get this
working. When configured, the host alters this cluster setting
during test setup. This can later be extended for other read-only
settings in the future.

References #77344

Release note (sql change): Changes the default value of
`sql.zone_configs.allow_for_secondary_tenant.enabled` to be false.
 Moreover, this setting is no longer settable by secondary tenants.
Instead, it's now a tenant read-only cluster setting.
@irfansharif irfansharif removed GA-blocker branch-release-22.1 Used to mark GA and release blockers, technical advisories, and bugs for 22.1 labels Apr 4, 2022
@irfansharif
Copy link
Contributor Author

Removing the GA-blocker label now that we've merged #79097. It disallows tenants directly modification zone config fields, gated by an operator controlled cluster setting. Protected timestamps that use span configs underneath the hood still work, and in environments where we don't care about tenant cost attribution, enabling the cluster setting will give secondary tenants full access to all fields. The cluster setting also allows us to enable the infrastructure on a tenant-by-tenant basis. The top-level issue talks about how use of zone configs could today induce work that isn't cost attributed back to the tenant. In environments where we care about that, we'll still want to do more work here. It's feasible also that we never expose zone configs directly in those environments, or only a subset (like replication factor).

@irfansharif irfansharif removed the T-kv KV Team label Apr 4, 2022
@irfansharif irfansharif changed the title sql,spanconfig: control what fields are settable by secondary tenants sql,spanconfig: attribute usage costs to tenant-settable fields Apr 4, 2022
@exalate-issue-sync exalate-issue-sync bot changed the title sql,spanconfig: attribute usage costs to tenant-settable fields sql,spanconfig: control what fields are settable by secondary tenants Apr 4, 2022
@exalate-issue-sync exalate-issue-sync bot added the T-kv KV Team label Apr 4, 2022
@exalate-issue-sync exalate-issue-sync bot changed the title sql,spanconfig: control what fields are settable by secondary tenants sql,spanconfig: attribute usage costs to tenant-settable fields Apr 4, 2022
arulajmani added a commit that referenced this issue Apr 4, 2022
Previously, `sql.zone_configs.allow_for_secondary_tenant.enabled` was a
regular old cluster setting that tenants could control. Now that we have
a notion of tenant read-only setting, this feels like a great candidate
for it. In addition to making this switch, this patch also changes this
setting's default value to be false. See the linked issue for the
motivation around why.

The change to switch this setting to be tenant read-only necessitates
changes in how we run logic tests. This is because logic tests that
run as secondary tenants can no longer change this cluster setting
in the test file. We introduce a new
`tenant-cluster-setting-override-opt` logic test directive with an
option for `allow-zone-configs-for-secondary-tenants` to get this
working. When configured, the host alters this cluster setting
during test setup. This can later be extended for other read-only
settings in the future.

References #77344

Release note (sql change): Changes the default value of
`sql.zone_configs.allow_for_secondary_tenant.enabled` to be false.
 Moreover, this setting is no longer settable by secondary tenants.
Instead, it's now a tenant read-only cluster setting.
irfansharif added a commit to irfansharif/cockroach that referenced this issue Apr 4, 2022
\cockroachdb#77344 outlines the various ways how without additional cost
attribution or resource limiting, tenant zone configs provides
a sufficient API to induce (costly) work in the host cluster. While
better attribution/limiting is underway, this PR introduces two cluster
settings to effectively switch off this machinery all throughout, and/or
enable it selectively for specific tenants. These settings will also
help us roll out this functionality more gradually as we build the
controls described in cockroachdb#77344.

 - spanconfig.tenant_reconciliation_job.enabled, a tenant-readonly
   setting that allows the host to disable the span config reconcilation
   job across all tenants. Defaults to false.
 - spanconfig.tenant_kvaccessor.enabled, a system only setting that
   rejects all KVAccessor requests issued by secondary tenants. Defaults
   to true, but acts as a global kill switch to reject all
   tenant reconciliation requests.

Release note: None
@irfansharif
Copy link
Contributor Author

+cc @knz's #85954 (for some reason the github cross linking thing didn't appear above). That RFC talks about whether the ability to set certain zone config fields or not can be modeled as "capabilities" that a tenant has/doesn't.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-multiregion Related to multi-region A-multitenancy Related to multi-tenancy A-zone-configs C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-kv KV Team
Projects
None yet
Development

No branches or pull requests

2 participants