Skip to content
This repository has been archived by the owner on Dec 15, 2022. It is now read-only.

Add maximum concurrent reconciles command-line option to provider executable #4

Closed
wants to merge 3 commits into from

Conversation

ulucinar
Copy link
Collaborator

@ulucinar ulucinar commented Sep 17, 2021

Description of your changes

This PR proposes a change in provider-tf-azure that allows the maximum concurrent reconciles options for the controllers to be set as a command-line option. The default values stay unmodified at 1.
Depends on crossplane/terrajet#67.

I have:

  • Read and followed Crossplane's contribution process.
  • Run make reviewable to ensure this PR is ready for review.
  • Added backport release-x.y labels to auto-backport this PR if necessary.

How has this code been tested

Manually tested this in the scope of the scalability tests I'm running with crossplane/terrajet#55. I will provide more context and a detailed discussion there for the motivation behind this PR.

Signed-off-by: Alper Rifat Ulucinar <[email protected]>
@negz
Copy link
Member

negz commented Sep 22, 2021

I was actually planning to add a flag like this to all providers, and have done some work in crossplane/crossplane-runtime#293 to make plumbing flags like this down to all the controllers easier. You can see a WIP example of its usage at crossplane/crossplane@master...negz:performant.

One thing I was considering was perhaps not explicitly exposing MaxConcurrentReconciles as a flag, but rather exposing a similar --max-reconciles flag that would configure the global reconcile rate limiter, then derive an appropriate MaxConcurrentReconciles value from that. I see crossplane/terrajet#67 has already been merged, and it's fine to merge this if it unblocks us, but I'd like to make sure we don't release any providers with these new flags until we're agreed on a consistent set of performance tuning flags that we'll expose across all providers.

@ulucinar
Copy link
Collaborator Author

ulucinar commented Sep 22, 2021

Hi @negz,
I will take a look at the pointers you provided. It would be great to have a common configuration API across all providers to tweak their performance.
My observation with max. concurrent reconciles is that, at its default value of 1, we cannot utilize CPU or memory, with large (~40) workerqueue depth and long wait times. And increasing it to just 2 is enough to saturate the CPU (of a single e2-medium GCP node because provider-tf-azure is running with a single replica) with only 40 MRs of the same kind and no other configuration changes. Terraform-based providers are CPU-bound and in my setup when I tried a larger limit (>=3) for concurrent reconciles, synchronous operations (observations) started timing out making the system unstable (CPU saturated, observations failing to complete in their designated context deadlines, CPU cycles thus wasted -> unusable system).

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants