Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Configurable maxConcurrentReconciles and pollInterval with sensible defaults #140

Merged
merged 2 commits into from
Nov 14, 2022
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 8 additions & 6 deletions cmd/provider/main.go
Original file line number Diff line number Diff line change
Expand Up @@ -37,13 +37,14 @@ func main() {
var (
app = kingpin.New(filepath.Base(os.Args[0]), "AWS support for Crossplane.").DefaultEnvars()
debug = app.Flag("debug", "Run with debug logging.").Short('d').Bool()
syncPeriod = app.Flag("sync", "Controller manager sync period such as 300ms, 1.5h, or 2h45m").Short('s').Default("1h").Duration()
syncInterval = app.Flag("sync", "Sync interval controls how often all resources will be double checked for drift.").Short('s').Default("1h").Duration()
pollInterval = app.Flag("poll", "Poll interval controls how often an individual resource should be checked for drift.").Default("10m").Duration()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we increasing the poll interval from 1m to 10m?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the proposed value after my experiments in crossplane/upjet#116 (comment) with the motivation below:

In upjet based controllers a typical reconcile loop tooks longer than a native provider due to underlying implementation. If we go with a polling interval like 1m, we immediately cause the work queue to fill up especially when there are multiple instances of the same kind. I believe it is more reasonable to default to a higher value if a typical reconcile loop takes is in the order of ~10s (i.e. refresh/plan).

leaderElection = app.Flag("leader-election", "Use leader election for the controller manager.").Short('l').Default("false").OverrideDefaultFromEnvar("LEADER_ELECTION").Bool()
maxReconcileRate = app.Flag("max-reconcile-rate", "The global maximum rate per second at which resources may be checked for drift from the desired state.").Default("10").Int()
terraformVersion = app.Flag("terraform-version", "Terraform version.").Required().Envar("TERRAFORM_VERSION").String()
providerSource = app.Flag("terraform-provider-source", "Terraform provider source.").Required().Envar("TERRAFORM_PROVIDER_SOURCE").String()
providerVersion = app.Flag("terraform-provider-version", "Terraform provider version.").Required().Envar("TERRAFORM_PROVIDER_VERSION").String()
nativeProviderPath = app.Flag("terraform-native-provider-path", "Terraform native provider path for shared execution.").Default("").Envar("TERRAFORM_NATIVE_PROVIDER_PATH").String()
maxReconcileRate = app.Flag("max-reconcile-rate", "The global maximum rate per second at which resources may checked for drift from the desired state.").Default("10").Int()

namespace = app.Flag("namespace", "Namespace used to set as default scope in default secret store config.").Default("crossplane-system").Envar("POD_NAMESPACE").String()
enableExternalSecretStores = app.Flag("enable-external-secret-stores", "Enable support for ExternalSecretStores.").Default("false").Envar("ENABLE_EXTERNAL_SECRET_STORES").Bool()
Expand All @@ -60,15 +61,16 @@ func main() {
ctrl.SetLogger(zl)
}

log.Debug("Starting", "sync-period", syncPeriod.String())
log.Debug("Starting", "sync-interval", syncInterval.String(),
"poll-interval", pollInterval.String(), "max-reconcile-rate", *maxReconcileRate)

cfg, err := ctrl.GetConfig()
kingpin.FatalIfError(err, "Cannot get API server rest config")

mgr, err := ctrl.NewManager(ratelimiter.LimitRESTConfig(cfg, *maxReconcileRate), ctrl.Options{
LeaderElection: *leaderElection,
LeaderElectionID: "crossplane-leader-election-provider-aws",
SyncPeriod: syncPeriod,
SyncPeriod: syncInterval,
LeaderElectionResourceLock: resourcelock.LeasesResourceLock,
LeaseDuration: func() *time.Duration { d := 60 * time.Second; return &d }(),
RenewDeadline: func() *time.Duration { d := 50 * time.Second; return &d }(),
Expand All @@ -91,8 +93,8 @@ func main() {
Options: xpcontroller.Options{
Logger: log,
GlobalRateLimiter: ratelimiter.NewGlobal(*maxReconcileRate),
PollInterval: 1 * time.Minute,
MaxConcurrentReconciles: 1,
PollInterval: *pollInterval,
MaxConcurrentReconciles: *maxReconcileRate,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have separate parameters for the max reconcile rate (which is a global rate shared between all reconcilers) and the max concurrent reconciles (which is a count of workers consuming the workqueue of a controller)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ulucinar I would like to be consistent with other providers and @negz has a detailed description here on why it makes sense to use the same parameter for both.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the pointer @turkenh. I agree, we had better follow the convention suggested there.

Features: &feature.Flags{},
},
Provider: config.GetProvider(),
Expand Down