[Proposal] Optional TTL for MembershipTables #9164
dmorganMsft pushed a commit to dmorganMsft/orleans that referenced this issue on Nov 9, 2024:
This is an initial implementation of dotnet#9164 that will work for newly-created membership tables. It does not attempt to address updating existing tables or rows. See https://docs.datastax.com/en/cql-oss/3.3/cql/cql_reference/cqlCreateTable.html#tabProp__cqlTableDefaultTTL for documentation for `default_time_to_live` in Cassandra table creation.
While there is a facility for cleaning up defunct silos in a cluster that is still running (via a periodic task), there is nothing to clean up defunct clusters that are no longer running (for example, after blue/green deployments or in ephemeral environments used for dev/test).
One of the challenges is that the periodic defunct silo cleanup relies on running Silos to perform the work, whereas a defunct cluster cleanup has to happen after there are no longer any running Silos from that cluster.
The proposal is to define the requirements for optional behavior that Clustering providers can implement to achieve this cleanup.
Overview
Clustering providers can offer configuration to opt in to having the clustering provider handle Defunct Silo cleanup, which is otherwise handled only by the periodic task controlled by DefunctSiloCleanupPeriod. The TTL applies either to individual MembershipEntries for a specific ServiceId/ClusterId or to the entire MembershipTable for a specific ServiceId/ClusterId, so that the underlying storage cleans it up once the corresponding Silos have not been running for the TTL/expiry period.
Updating the `IAmAliveTime` must refresh the TTL for the entry or table. This is the mechanism for keeping it active.
[Proposed] Requirements
Questions
Where should this be configured?
Option 1: Generic cluster options
This is the easiest for consumers to use, but it is possibly not the clearest in situations where an underlying provider doesn't support this optional TTL (either because the underlying storage doesn't have appropriate TTL support or because it simply hasn't been implemented). This could be mitigated by a runtime exception, but that doesn't seem like the best user experience.
Option 2: Provider specific configuration
This is how the feature should be configured on or off. Only a single value is needed for on/off as the DefunctSiloExpiration already provides the time span for when an entry should be considered defunct.
This is more work for each clustering implementation, and it requires that each implementation provides the feature in a way that complies with the requirements. However, it does ensure that a consumer will only be presented with such an option if the underlying clustering provider supports this optional functionality.
How to handle the configured value being different from existing storage
Some storage implementations don't allow blanket updates of TTL/expiry. For example, Cassandra has a `default_time_to_live` that can be applied to a table (that applies to all rows unless overridden on a write). The table can be altered to add a TTL, but that doesn't change any of the existing rows. If there is an existing table that doesn't have a TTL defined then there are several options on how to handle things:
Disable Periodic Defunct Silo Cleanup?
If a specific clustering provider implementation uses per-MembershipEntry TTL then the periodic Defunct Silo cleanup is redundant, as the TTL/expiry will take care of clearing old entries.
Nothing horrible will happen if the periodic task continues to get run, but it is unnecessary.
What could this look like?
Cassandra
This clustering provider has a single table for all MembershipEntries (all ServiceId/ClusterId). If a ClusterTTL is specified, then when the table is created, a `WITH default_time_to_live = 6000` clause (TTL in seconds) can be added (with whatever TTL is actually specified), which adds a TTL to each row of the table. The IAmAliveTime writes refresh that TTL for the specific MembershipEntry.
If all the Silos for a specific cluster are stopped, then after the TTL period all of the entries will be expired and the MembershipTable will have been cleaned up.
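A minimal CQL sketch of the Cassandra behaviour described above, for illustration only. The table name `membership_demo`, its columns, and the literal values are hypothetical and are not the Cassandra clustering provider's actual schema. The sketch also assumes a keyspace has already been selected with `USE`. Cassandra tracks TTL per written column, so the heartbeat here rewrites the whole row rather than a single column.

```sql
-- Hypothetical table; names and columns are assumptions, not the provider's schema.
CREATE TABLE IF NOT EXISTS membership_demo (
    service_id      text,
    cluster_id      text,
    silo_address    text,
    status          int,
    i_am_alive_time timestamp,
    PRIMARY KEY ((service_id, cluster_id), silo_address)
) WITH default_time_to_live = 6000;  -- TTL in seconds, the default for writes to this table

-- Heartbeat: re-writing the row restarts its TTL countdown.
-- The status value 3 is arbitrary and only fills the column.
INSERT INTO membership_demo
    (service_id, cluster_id, silo_address, status, i_am_alive_time)
VALUES
    ('my-service', 'blue', '10.0.0.5:11111', 3, toTimestamp(now()));

-- Seconds remaining before the heartbeat column expires.
SELECT silo_address, TTL(i_am_alive_time)
FROM membership_demo
WHERE service_id = 'my-service' AND cluster_id = 'blue';

-- Adding a default TTL to an existing table only affects later writes;
-- rows written before the change keep whatever TTL they already had (or none).
ALTER TABLE membership_demo WITH default_time_to_live = 6000;
```

Under these assumptions, once every Silo of the `blue` cluster stops writing heartbeats, its rows would expire after the TTL elapses without any Silo having to delete them.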
RavenDB
This is currently not a public implementation, but it uses a single document to model the MembershipTable. An expiry (TTL) is applied to the MembershipTable document that is updated on writes (including IAmAliveTime updates).
If all Silos for a specific cluster are stopped, then after the TTL period the MembershipTable will expire and will have been cleaned up.