Review DCR's ASG Configuration #8345
Comments
Instance Types ASG Size Scaling Policy |
Something that I think is also worth adding to this discussion is how we might have scaling work in lockstep with Our current setup of code, not colocated, makes it seem as though there is no coupling between this AWS configuration and frontend's AWS configuration. I'd suggest there is strong coupling. If we scale up the We should probably consider that we are planning on being able to scale up separately across endpoints in |
Thanks for writing this up @JamieB-gu!
IIUC this app (and other Dotcom apps) are currently using simple scaling. AWS have essentially deprecated this style of scaling and are promoting their alternative solutions as a best practice. I would recommend experimenting with these alternative strategies (regardless of which metric you decide to base your scaling on) if you have some time.
|
Worth noting that even though we're paying for a |
@AshCorr Snap. I'm looking at configuring clustering |
Talking to myself - there is an issue that speaks to this and might be considered in conjunction with this as we can think of scaling as domain and traffic specific to those parts of the site. |
Hey @jamesgorrie, I've moved this into the triage column. What do you think the next step is for this piece of work? Do you think it should be prioritised as 'high impact'? |
Moving to the backlog - please note this isn't in our list for planning for the near future, please shout if you disagree though (and feel free to add to the list to be discussed in planning: https://docs.google.com/document/d/1-ls95KamOB-lvwKzTUfqpd3gSwcvgszC7fsMIOhXacM/edit) |
We'll be prioritising PM2/Vulnerabilities at the moment, and working out a way to surface this on a backlog (we'll try and trial a new approach to health for the new Q) |
Now that we're serving fronts traffic it might be a good time to review, to establish whether we're:
Suggestions derived from conversations with @alinaboghiu and @jacobwinch.
Tasks
Instance Types
We're currently using t4g.small in production:
dotcom-rendering/dotcom-rendering/scripts/deploy/riff-raff.yaml
Lines 16 to 19 in e234547
Do we want to consider other instance types? Frontend uses a different type, as mentioned in #7440.
ASG Size
We're currently using a minimum of 15 and a maximum of 60 instances in production:
dotcom-rendering/dotcom-rendering/cloudformation.yml
Lines 72 to 78 in e234547
Do these limits allow enough headroom for our additional traffic, scaling requirements, and RiffRaff deploys?
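One way to reason about deploy headroom is sketched below. It assumes that a RiffRaff autoscaling deploy temporarily doubles the number of running instances, so a deploy only fits when twice the current desired capacity is still within the ASG maximum. The helper names are hypothetical and exist only for this illustration; the 15 and 60 are the limits quoted above.

```typescript
// Hypothetical helpers for illustration; not part of dotcom-rendering or RiffRaff.
interface AsgLimits {
  min: number;
  max: number;
}

// Assumption: a deploy temporarily doubles the number of running instances,
// so it only fits if 2 x desired stays within the ASG maximum.
function canDeployAt(desired: number, limits: AsgLimits): boolean {
  return desired >= limits.min && desired * 2 <= limits.max;
}

// Largest desired capacity at which a doubling deploy still fits under max.
function maxDeployableCapacity(limits: AsgLimits): number {
  return Math.floor(limits.max / 2);
}

// Production limits quoted in this issue.
const prodLimits: AsgLimits = { min: 15, max: 60 };

console.log(maxDeployableCapacity(prodLimits)); // 30
console.log(canDeployAt(15, prodLimits)); // true
console.log(canDeployAt(31, prodLimits)); // false
```

Under that assumption, a deploy would only fit while we are running 30 instances or fewer, which is worth bearing in mind if the extra traffic pushes our steady-state capacity above that.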
Scaling Policy
We're currently scaling based on latency:
dotcom-rendering/dotcom-rendering/cloudformation.yml
Lines 312 to 331 in e234547
and scaling up by doubling our capacity every 10 minutes:
dotcom-rendering/dotcom-rendering/cloudformation.yml
Lines 304 to 310 in e234547
whilst scaling down by removing an instance once every 2 minutes:
dotcom-rendering/dotcom-rendering/cloudformation.yml
Lines 296 to 302 in e234547
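To get a feel for what these numbers mean in practice, here is a rough sketch of how long the current policy takes to move between the ASG limits. It assumes the first scaling action fires at time zero and each later action waits for the full interval described above; it ignores alarm evaluation periods and instance boot time, so real timings will be slower. The function names are hypothetical.

```typescript
// Hypothetical model of the current scaling policy; for illustration only.
interface Limits {
  min: number;
  max: number;
}

// Minutes until capacity reaches limits.max when it doubles on each action.
// Assumption: the first action happens at t=0, later ones after each interval.
function minutesToScaleUp(start: number, limits: Limits, intervalMinutes: number): number {
  if (start <= 0) throw new Error("start must be positive");
  let capacity = start;
  let minutes = 0;
  let firstAction = true;
  while (capacity < limits.max) {
    if (!firstAction) minutes += intervalMinutes;
    capacity = Math.min(capacity * 2, limits.max);
    firstAction = false;
  }
  return minutes;
}

// Minutes until capacity falls to limits.min when one instance is removed per action.
// Same assumption: first removal at t=0, later ones after each interval.
function minutesToScaleDown(start: number, limits: Limits, intervalMinutes: number): number {
  const removals = Math.max(0, start - limits.min);
  return removals === 0 ? 0 : (removals - 1) * intervalMinutes;
}

const limits: Limits = { min: 15, max: 60 }; // production limits quoted in this issue

console.log(minutesToScaleUp(15, limits, 10)); // 10
console.log(minutesToScaleDown(60, limits, 2)); // 88
```

In this simplified model we go from minimum to maximum in two scaling actions (15 to 30, then 30 to 60), but take roughly an hour and a half to come back down.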
Do we want to consider other scaling strategies? Apps-rendering, for example, via guardian/cdk, scales based on a target CPU utilisation:
dotcom-rendering/apps-rendering/cdk/lib/mobile-apps-rendering.ts
Lines 104 to 106 in e234547
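For comparison with our latency-based doubling, the sketch below shows the general idea behind scaling on a target CPU utilisation: capacity is adjusted roughly in proportion to how far the observed metric is from the target. This is a simplified proportional model, not AWS's actual target tracking algorithm, and the 40% target is a made-up value rather than the one apps-rendering uses.

```typescript
// Simplified proportional model of target tracking; an assumption for
// illustration, not AWS's real algorithm.
interface CapacityLimits {
  min: number;
  max: number;
}

function estimateDesiredCapacity(
  current: number,
  observedCpuPercent: number,
  targetCpuPercent: number,
  limits: CapacityLimits,
): number {
  if (targetCpuPercent <= 0) throw new Error("target must be positive");
  // Scale capacity by the ratio of observed to target utilisation, rounding up.
  const proportional = Math.ceil(current * (observedCpuPercent / targetCpuPercent));
  // Never go outside the ASG limits.
  return Math.min(limits.max, Math.max(limits.min, proportional));
}

const asgLimits: CapacityLimits = { min: 15, max: 60 }; // limits quoted in this issue
const hypotheticalTarget = 40; // made-up target CPU percentage

console.log(estimateDesiredCapacity(15, 80, hypotheticalTarget, asgLimits)); // 30
console.log(estimateDesiredCapacity(30, 10, hypotheticalTarget, asgLimits)); // 15
console.log(estimateDesiredCapacity(40, 90, hypotheticalTarget, asgLimits)); // 60
```

The difference from the current policy is that the size of each adjustment depends on how far the metric is from the target, rather than always doubling on the way up and removing one instance on the way down.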
@akash1810 recommends completing a test before the Christmas holidays and deciding what to do based on the results.