
[Proposal] Optional TTL for MembershipTables #9164

Open

rkargMsft opened this issue Oct 8, 2024 · 0 comments

rkargMsft (Contributor) commented Oct 8, 2024

While there is a facility for cleaning up defunct silos in a cluster that is still running (via a periodic task), there is nothing to clean up defunct clusters that are no longer running at all (such as for blue/green deployments or ephemeral environments used for dev/test).
One of the challenges is that the periodic defunct silo cleanup relies on running Silos to perform the work, whereas defunct cluster cleanup must happen after there are no longer any running Silos from that cluster.

The proposal is to define the requirements for optional behavior that clustering providers can implement to achieve this cleanup.

Overview

Clustering providers can offer configuration to opt in to having the clustering provider handle defunct Silo cleanup, which is otherwise handled only by the periodic task controlled by DefunctSiloCleanupPeriod. The TTL applies either to individual MembershipEntries for a specific ServiceId/ClusterId or to the entire MembershipTable for a specific ServiceId/ClusterId, so that the underlying storage cleans up the data once the corresponding Silos have not been running for the TTL/expiry period.
Updating the IAmAliveTime must refresh the TTL for the entry or table. This is the mechanism for keeping it active.

[Proposed] Requirements

  • This is optional functionality for a clustering provider. If there is no applicable TTL/expiry functionality in the underlying storage technology then this may never be implemented for some clustering providers.
  • The existing DefunctSiloExpiration value is used as the TTL
  • TTL or expiration is used on the underlying storage to clean up unused MembershipTable data
    • This allows for the cleanup to happen after all Silos in a cluster have stopped
  • IAmAliveTime writes MUST update the TTL/expiry
    • Other writes MAY update the TTL/expiry
  • If all MembershipEntries for a specific ServiceId/ClusterId have expired (no writes within the TTL period), the entries are removed from the underlying storage without needing a running Silo to perform that work.
    • This can be implemented with an individual TTL/expiry on each MembershipEntry (as opposed to a single TTL for the entire MembershipTable). It achieves the same outcome of having the MembershipTable data cleaned up once all Silos in a cluster are stopped, and it proactively removes defunct silo entries instead of waiting for the periodic task to do that cleanup.

Questions

Where should this be configured?

Option 1: Generic cluster options

This is the easiest option for consumers to use, but it is potentially unclear in situations where the underlying provider doesn't support the optional TTL (either because the underlying storage has no appropriate TTL support or because it simply hasn't been implemented). That could be mitigated by a runtime exception, but this doesn't seem like the best user experience.
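As a sketch of what Option 1 might look like (`ClusterMembershipOptions` and `DefunctSiloExpiration` exist today; `UseStorageTimeToLive` is a hypothetical property, not an existing Orleans API):

```csharp
// Hypothetical sketch of Option 1 on the generic membership options.
siloBuilder.Configure<ClusterMembershipOptions>(options =>
{
    options.DefunctSiloExpiration = TimeSpan.FromHours(1); // doubles as the TTL
    options.UseStorageTimeToLive = true; // would have to fail at runtime if unsupported
});
```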

Option 2: Provider specific configuration

This is how the feature should be configured on or off. Only a single on/off value is needed, since DefunctSiloExpiration already provides the time span after which an entry is considered defunct.
This is more work for each clustering implementation, and it requires that each implementation meets the requirements above in a compliant way. However, it ensures that a consumer is only presented with the option when the underlying clustering provider supports this optional functionality.
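A sketch of the provider-specific shape, using Cassandra as the example (the flag name is illustrative, not part of the provider's current options):

```csharp
// Hypothetical sketch of Option 2 on a Cassandra-backed provider.
siloBuilder.UseCassandraClustering(options =>
{
    // A single switch suffices; the duration still comes from
    // ClusterMembershipOptions.DefunctSiloExpiration.
    options.UseStorageTimeToLive = true;
});
```

The provider only exposes the option because it can honor it, which is exactly the discoverability benefit described above.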

How to handle the configured value being different from existing storage

Some storage implementations don't allow blanket updates of TTL/expiry. For example, Cassandra has a default_time_to_live that can be applied to a table (and applies to all rows unless overridden on a write). The table can be altered to add a TTL, but that doesn't change any of the existing rows. If there is an existing table that doesn't have a TTL defined, there are several options for handling it:

  • Throw error if the table exists but has different TTL setting (either non-existent or different duration)
  • Throw error if the table exists but doesn't have TTL defined, but allow changing the period.
    • Old rows will keep the prior TTL until written to again
  • Allow any update and keep existing rows unchanged
    • This could lead to a user thinking that TTL was added but that doesn't apply to old rows
  • Allow any update and patch up existing rows with writes that reapply the TTL to those rows (see the sketch below)
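For that last option, a Cassandra-flavored sketch of the patch-up (assuming the DataStax C# driver and an illustrative membership schema; 5400 stands in for the configured DefunctSiloExpiration in seconds):

```csharp
// Sketch only: apply a new table default, then rewrite existing rows,
// which would otherwise keep their prior TTL (or none) until next written.
const int ttlSeconds = 5400; // derived from DefunctSiloExpiration

session.Execute(new SimpleStatement(
    $"ALTER TABLE membership WITH default_time_to_live = {ttlSeconds}"));

foreach (var row in session.Execute(new SimpleStatement(
    "SELECT service_id, cluster_id, silo_address, i_am_alive_time FROM membership")))
{
    // USING TTL on the rewrite stamps the new expiry onto the existing cells.
    session.Execute(new SimpleStatement(
        $"UPDATE membership USING TTL {ttlSeconds} SET i_am_alive_time = ? " +
        "WHERE service_id = ? AND cluster_id = ? AND silo_address = ?",
        row.GetValue<DateTimeOffset>("i_am_alive_time"),
        row.GetValue<string>("service_id"),
        row.GetValue<string>("cluster_id"),
        row.GetValue<string>("silo_address")));
}
```

A full-table scan like this would only be needed once, at migration time.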

Disable Periodic Defunct Silo Cleanup?

If a specific clustering provider implementation uses per-MembershipEntry TTL then the periodic Defunct Silo cleanup is redundant, as the TTL/expiry will take care of clearing old entries.
Nothing harmful happens if the periodic task continues to run, but it is unnecessary.

What could this look like?

Cassandra

This clustering provider has a single table for all MembershipEntries (across all ServiceId/ClusterId pairs). If a ClusterTTL is specified, the table can be created WITH default_time_to_live = 6000 (TTL in seconds, using whatever value is actually configured), which adds a TTL to each row of the table. IAmAliveTime writes refresh that TTL for the specific MembershipEntry.
If all the Silos for a specific cluster are stopped, then after the TTL period all of the entries will have expired and the MembershipTable will have been cleaned up.
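A minimal sketch of that flow with the DataStax C# driver (the schema is illustrative, and 6000 stands in for whatever TTL is actually configured):

```csharp
// Sketch: table creation with a per-row default TTL.
session.Execute(new SimpleStatement(@"
    CREATE TABLE IF NOT EXISTS membership (
        service_id text,
        cluster_id text,
        silo_address text,
        i_am_alive_time timestamp,
        PRIMARY KEY ((service_id, cluster_id), silo_address)
    ) WITH default_time_to_live = 6000"));

// Each IAmAliveTime write restarts the TTL clock for the cells it touches.
// Note: Cassandra TTLs are per-cell, so the refresh must rewrite any other
// columns whose expiry should slide along with it.
session.Execute(new SimpleStatement(
    "UPDATE membership SET i_am_alive_time = toTimestamp(now()) " +
    "WHERE service_id = ? AND cluster_id = ? AND silo_address = ?",
    serviceId, clusterId, siloAddress));
```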

RavenDB

This is not currently a public implementation, but it uses a single document to model the MembershipTable. An expiry (TTL) is applied to the MembershipTable document and is refreshed on writes (including IAmAliveTime updates).
If all Silos for a specific cluster are stopped, then after the TTL period the MembershipTable will expire and will have been cleaned up.
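Since the implementation isn't public, only a sketch of the shape is possible. RavenDB's document expiration uses the `@expires` metadata key (the Expiration feature must be enabled on the database); `MembershipDocument` and the surrounding names are hypothetical:

```csharp
// Sketch: sliding expiry on the single MembershipTable document.
using (var session = store.OpenSession())
{
    var doc = session.Load<MembershipDocument>("membership/" + clusterId);
    doc.Entries[siloAddress].IAmAliveTime = DateTime.UtcNow;

    // "@expires" is RavenDB's expiration metadata key; every write
    // (including IAmAliveTime updates) pushes it out by the TTL.
    session.Advanced.GetMetadataFor(doc)["@expires"] =
        DateTime.UtcNow.Add(defunctSiloExpiration);

    session.SaveChanges();
}
```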

dmorganMsft pushed a commit to dmorganMsft/orleans that referenced this issue Nov 9, 2024
This is an initial implementation of
dotnet#9164 that will work for
newly-created membership tables. It does not attempt to address
updating existing tables or rows.

See
https://docs.datastax.com/en/cql-oss/3.3/cql/cql_reference/cqlCreateTable.html#tabProp__cqlTableDefaultTTL
for documentation on `default_time_to_live` in Cassandra table creation.