-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add trafficReplayerMaxUptime parameter and functionality #620
Add trafficReplayerMaxUptime parameter and functionality #620
Conversation
8941e89
to
73c1582
Compare
startPeriod: Duration.seconds(startupPeriodSeconds * 2) | ||
} | ||
}); | ||
maxUptimeContainer.addContainerDependencies({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a comment to say which way the dependency is running? Who depends upon whom.
What does "START" mean two lines below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
command: [ | ||
"CMD-SHELL", | ||
"UPTIME=$(awk '{print int($1)}' /proc/uptime); " + | ||
`test $UPTIME -gt ${startupPeriodSeconds} && ` + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this line needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sets the minimum time that a task should be up before considering it heatlhy. This is needed because we don't have healthchecks configured for our replayer.
We can reduce it lower than 30 seconds, and possibly remove it, but then we increase the risk of having a few second delay between the partitions being revoked on the existing task vs assigned to the new one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. That makes sense. If you could add that as a comment, that would be great.
That feature is probably something that should be handled elsewhere (too), and all of these sorts of values are subject to lots of additional negotiation. Keeping track of them with tagged comments and putting them in the front of peoples minds can go a long way to help our future selves.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
deployment/cdk/opensearch-service-migration/lib/service-stacks/migration-service-core.ts
Show resolved
Hide resolved
deployment/cdk/opensearch-service-migration/lib/service-stacks/migration-service-core.ts
Show resolved
Hide resolved
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #620 +/- ##
============================================
- Coverage 75.93% 75.88% -0.06%
+ Complexity 1496 1494 -2
============================================
Files 165 165
Lines 6362 6361 -1
Branches 573 573
============================================
- Hits 4831 4827 -4
- Misses 1150 1155 +5
+ Partials 381 379 -2
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved with a rec to add one more comment. Thanks!
Signed-off-by: Andre Kurait <[email protected]>
Signed-off-by: Andre Kurait <[email protected]>
Signed-off-by: Andre Kurait <[email protected]>
79427e5
to
23b74d8
Compare
Description
This change adds a new deployment parameter for the trafficReplayer
trafficReplayerMaxUptime
along with functionality and documentation. This allows a user to set a max uptime after which AWS will startup a new container and kill the old one.This achieves the functionality by adding a sidecar "MaxUptimeContainer" which succeeds a healthcheck after a given startup time and fails after a given maxUptime. This allows ECS to provision a new task and wait for the startup time before sending SIGTERM to the old task.
Shutdown Behavior: Note, shutdown logs for the existing replayer were not observed and replayer. Created task to follow up on SIGTERM behavior in MIGRATIONS-1698
trafficReplayerMaxUptime
added, no change to default behavior when parameter is not specifiedIssues Resolved
MIGRATIONS-1670
Is this a backport? If so, please add backport PR # and/or commits #
Testing
Deployed to AWS and verified restarting behavior. Note, an observation on shutdown behavior was made resulting in followup MIGRATIONS-1698
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.