Configurable maxConcurrentReconciles and pollInterval with sensible defaults #140

Merged (2 commits) on Nov 14, 2022

@turkenh turkenh commented Nov 10, 2022

Description of your changes

This PR makes maxConcurrentReconciles and pollInterval configurable, using the defaults proposed after the experiments here.
It follows the same approach we used in the community providers: input arguments are passed through as internal configuration.

Fixes crossplane/upjet#116

I have:

  • Run make reviewable test to ensure this PR is ready for review.

How has this code been tested

See crossplane/upjet#116 (comment)

cmd/provider/main.go (outdated)
@@ -37,13 +37,14 @@ func main() {
 	var (
 		app = kingpin.New(filepath.Base(os.Args[0]), "AWS support for Crossplane.").DefaultEnvars()
 		debug = app.Flag("debug", "Run with debug logging.").Short('d').Bool()
-		syncPeriod = app.Flag("sync", "Controller manager sync period such as 300ms, 1.5h, or 2h45m").Short('s').Default("1h").Duration()
+		syncInterval = app.Flag("sync", "Sync interval controls how often all resources will be double checked for drift.").Short('s').Default("1h").Duration()
+		pollInterval = app.Flag("poll", "Poll interval controls how often an individual resource should be checked for drift.").Default("10m").Duration()
Collaborator

Why are we increasing the poll interval from 1m to 10m?

Contributor Author

This is the proposed value after my experiments in crossplane/upjet#116 (comment) with the motivation below:

In upjet-based controllers, a typical reconcile loop takes longer than in a native provider because of the underlying implementation. With a polling interval like 1m, the work queue fills up immediately, especially when there are multiple instances of the same kind. I believe it is more reasonable to default to a higher value when a typical reconcile loop takes on the order of ~10s (i.e. refresh/plan).
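The arithmetic behind this can be sketched roughly as follows. This is a back-of-the-envelope estimate, not code from the PR: `maxResourcesPerKind` is a hypothetical helper, the ~10s reconcile duration is the figure quoted above, and the worker count of 10 in the last call is purely illustrative.

```go
package main

import (
	"fmt"
	"time"
)

// maxResourcesPerKind is a hypothetical helper. It estimates how many
// resources of one kind a controller can poll without its work queue growing:
// each worker can finish pollInterval/reconcileDuration reconciles per poll
// period, so anything beyond workers times that number falls behind.
func maxResourcesPerKind(workers int, pollInterval, reconcileDuration time.Duration) int {
	if workers <= 0 || reconcileDuration <= 0 {
		return 0
	}
	return workers * int(pollInterval/reconcileDuration)
}

func main() {
	reconcile := 10 * time.Second // assumed typical refresh/plan duration

	// Previous hard-coded values: 1m poll interval, 1 worker.
	fmt.Println(maxResourcesPerKind(1, 1*time.Minute, reconcile)) // 6

	// Proposed default poll interval of 10m, still 1 worker.
	fmt.Println(maxResourcesPerKind(1, 10*time.Minute, reconcile)) // 60

	// 10m poll interval with an illustrative 10 workers.
	fmt.Println(maxResourcesPerKind(10, 10*time.Minute, reconcile)) // 600
}
```

Under these assumptions, a 1m poll interval with a single worker keeps up with only about six resources of a kind, which matches the queue build-up described above.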

-		PollInterval: 1 * time.Minute,
-		MaxConcurrentReconciles: 1,
+		PollInterval: *pollInterval,
+		MaxConcurrentReconciles: *maxReconcileRate,
Collaborator

Should we have separate parameters for the max reconcile rate (which is a global rate shared between all reconcilers) and the max concurrent reconciles (which is a count of workers consuming the workqueue of a controller)?

Contributor Author

@ulucinar I would like to be consistent with other providers; @negz has a detailed description here of why it makes sense to use the same parameter for both.

Collaborator

Thanks for the pointer @turkenh. I agree, we had better follow the convention suggested there.
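For readers following the thread, the convention agreed on here (one parameter driving both the global reconcile rate and the per-controller worker count) could look roughly like the sketch below. It is not the provider's code: it uses the standard library `flag` package instead of kingpin so that it runs on its own, `options` and `parseOptions` are hypothetical names, and the `max-reconcile-rate` flag name and its default of 10 are assumptions. Only the `poll` flag and its `10m` default come from the diff above.

```go
package main

import (
	"flag"
	"fmt"
	"time"
)

// options is a hypothetical stand-in for the controller options in the diff.
type options struct {
	PollInterval            time.Duration
	MaxConcurrentReconciles int // workers consuming one controller's work queue
	GlobalReconcileRate     int // reconciles per second shared by all controllers
}

// parseOptions reads the flags and feeds the single rate parameter into both
// the global rate and the per-controller concurrency.
func parseOptions(args []string) (options, error) {
	fs := flag.NewFlagSet("provider", flag.ContinueOnError)
	poll := fs.Duration("poll", 10*time.Minute,
		"How often an individual resource should be checked for drift.")
	// Flag name and default are assumptions made for this sketch.
	rate := fs.Int("max-reconcile-rate", 10,
		"Global reconcile rate; also used as the worker count per controller.")
	if err := fs.Parse(args); err != nil {
		return options{}, err
	}
	return options{
		PollInterval:            *poll,
		MaxConcurrentReconciles: *rate,
		GlobalReconcileRate:     *rate,
	}, nil
}

func main() {
	o, err := parseOptions([]string{"--poll=5m", "--max-reconcile-rate=4"})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", o)
}
```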

Signed-off-by: Hasan Turken <[email protected]>
Collaborator

@ulucinar ulucinar left a comment

Thanks @turkenh, lgtm.

Successfully merging this pull request may close these issues.

Upjet providers can't consume workqueue fast enough. Causes huge time-to-readiness delay