
Commit

fix sentence structure
maleck13 committed Mar 12, 2024
1 parent 423190c commit fd0c3d1
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion rfcs/0008-distributed-dns.md
@@ -88,7 +88,7 @@ The general flow in the Kuadrant operator follows a single path, where it will a
### DNS Operator / controller

We will update the DNS Operator to leverage the external-dns provider logic (i.e. effectively use it as a library) and layer our multi-cluster changes on top of it. Our hope is to keep what we are doing as compatible with external-dns as possible so that we can contribute / propose changes back to external-dns. As part of this we will leverage the existing `plan` structure and logic that is responsible for "planning" and executing the required changes to a remote DNS zone. It is our intention to modify the plan code to allow a shared hostname to be reconciled across multiple clusters.
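
As a rough illustration of what using external-dns as a library could look like, the sketch below builds a plan from the endpoints read back from the provider zone and the endpoints declared by the DNSRecord, then applies the calculated changes. It assumes the current shape of external-dns's `plan`, `endpoint` and `provider` packages; the `buildAndApply` helper and its arguments are hypothetical placeholders, not part of this proposal.

```go
package main // hypothetical placement inside the DNSRecord controller

import (
	"context"

	"sigs.k8s.io/external-dns/endpoint"
	"sigs.k8s.io/external-dns/plan"
	"sigs.k8s.io/external-dns/provider"
)

// buildAndApply is a hypothetical helper: "current" is what was read back from the
// provider's zone, "desired" is what this cluster's DNSRecord spec declares.
func buildAndApply(ctx context.Context, prov provider.Provider, current, desired []*endpoint.Endpoint) error {
	dnsPlan := &plan.Plan{
		Policies: []plan.Policy{&plan.SyncPolicy{}},
		Current:  current,
		Desired:  desired,
		ManagedRecords: []string{
			endpoint.RecordTypeA,
			endpoint.RecordTypeCNAME,
			endpoint.RecordTypeTXT,
		},
	}
	// Calculate works out the Create/UpdateOld/UpdateNew/Delete sets needed to move
	// the zone from "current" to "desired".
	dnsPlan = dnsPlan.Calculate()
	return prov.ApplyChanges(ctx, dnsPlan.Changes)
}
```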
The DNSRecord created by the Kuadrant operator is reconciled by the DNSRecord controller as part of the DNS Operator component (the operator and controller terms can be used interchangeably). In the event of a change to the endpoints in the DNSRecord resource (including deletion and creation of the resource), the DNSRecord controller will first pull down the relevant records for that DNS name from the provider and store them in the DNSRecord status. Next, via the external-dns plan, it will update the affected endpoints in that record set and validate that the record set has no "dead ends" before writing the changes back to the DNS provider's zone. Once the remote write is successful, it will re-queue the DNSRecord for validation. The validation is the same as that performed before the write: it will build a plan, and if the plan based on the remote zone contains no changes, the validation is successful. The controller will then mark this in the status of the DNSRecord and will re-queue the validation for roughly 15 minutes later (an example value) as a stable-state verification. If validation is unsuccessful, it will re-queue the DNSRecord for validation rapidly (5 seconds, for example). With each re-queue after an unsuccessful validation, it will add a random amount of "jitter" time to increase the chance it moves out of sync with any other actors. Each time it re-queues, it will mark this in the status (see below). At any point, if there is a change to the on-cluster DNSRecord, this backoff and validation will be reset and started again.
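
A minimal sketch of the re-queue decision described above, assuming a controller-runtime based reconciler; the interval constants, the jitter bound and the `requeueResult` helper are illustrative examples rather than part of the design:

```go
package main // hypothetical placement inside the DNSRecord controller

import (
	"math/rand"
	"time"

	ctrl "sigs.k8s.io/controller-runtime"
)

const (
	stableRequeue = 15 * time.Minute // stable-state verification interval (example value)
	retryRequeue  = 5 * time.Second  // rapid re-validation after a failed check (example value)
	maxJitter     = 5 * time.Second  // upper bound on the random jitter (example value)
)

// requeueResult is a hypothetical helper: given whether the post-write validation
// plan still contained changes, pick how soon the DNSRecord should be reconciled again.
func requeueResult(planHasChanges bool) ctrl.Result {
	if !planHasChanges {
		// The remote zone matches the desired endpoints: verify again much later.
		return ctrl.Result{RequeueAfter: stableRequeue}
	}
	// Validation failed: retry quickly, adding random jitter so that clashing
	// clusters drift out of sync with each other.
	jitter := time.Duration(rand.Int63n(int64(maxJitter)))
	return ctrl.Result{RequeueAfter: retryRequeue + jitter}
}
```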

As each controller is responsible for part of the overall DNS record set for a given DNS name in a shared zone, potentially as values within a shared record, and as neither the zone nor the records are lockable, there will be scenarios where one controller overwrites part of a zone/record with a stale view in some providers. This can happen if two or more controllers attempt to update the remote zone at the same time: each controller may have read the zone before another cluster has executed its write, allowing the zone to be updated without the knowledge of other actors that have also done a read. During this type of timing clash, effectively the last controller to write will have its changes stored. When it does validation, its changes will still be present and it will revert to a stable cycle. The other controllers involved will see their changes missing in the remote zone during their validation check and so will pull and update the zone again (setting a new validation with a random jitter applied to increase the chances they don't clash again). Again, the last controller to write will see its changes. So with this we can predict a worst-case scenario of (num clashing controllers * (validation_loop + jitter)). However, with the jitter added, it is likely that this will be a shorter period of time, as multiple clusters will fall out of sync with each other and should resolve their state within the min-max requeue interval rather than only ever one at a time.
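
To make the worst case concrete with example numbers: with 3 clashing controllers, a 5 second validation loop and up to 5 seconds of jitter, the bound above gives 3 * (5s + 5s) = 30 seconds before the last controller's changes settle. In practice the jitter should spread the writes apart, so convergence will usually happen well inside that bound.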

