Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add trafficReplayerMaxUptime parameter and functionality #620

Merged
merged 3 commits into from
May 1, 2024

Conversation

AndreKurait
Copy link
Member

@AndreKurait AndreKurait commented Apr 29, 2024

Description

This change adds a new deployment parameter for the trafficReplayer trafficReplayerMaxUptime along with functionality and documentation. This allows a user to set a max uptime after which AWS will startup a new container and kill the old one.

This achieves the functionality by adding a sidecar "MaxUptimeContainer" which succeeds a healthcheck after a given startup time and fails after a given maxUptime. This allows ECS to provision a new task and wait for the startup time before sending SIGTERM to the old task.

Shutdown Behavior: Note, shutdown logs for the existing replayer were not observed and replayer. Created task to follow up on SIGTERM behavior in MIGRATIONS-1698

  • Category: New Feature
  • Why these changes are required? This allows customers to have more controlled behavior of the traffic replayer by limiting edge case scenarios that can arise from a long running task like out of memory errors due to leaks and the time between startup and teardown increasing
  • What is the old behavior before changes and new behavior after changes? New parameter trafficReplayerMaxUptime added, no change to default behavior when parameter is not specified

Issues Resolved

MIGRATIONS-1670

Is this a backport? If so, please add backport PR # and/or commits #

Testing

Deployed to AWS and verified restarting behavior. Note, an observation on shutdown behavior was made resulting in followup MIGRATIONS-1698

Check List

  • New functionality includes testing
    • All tests pass, including unit test, integration test and doctest
  • New functionality has been documented
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@AndreKurait AndreKurait marked this pull request as ready for review April 29, 2024 15:48
startPeriod: Duration.seconds(startupPeriodSeconds * 2)
}
});
maxUptimeContainer.addContainerDependencies({
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a comment to say which way the dependency is running? Who depends upon whom.
What does "START" mean two lines below?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

command: [
"CMD-SHELL",
"UPTIME=$(awk '{print int($1)}' /proc/uptime); " +
`test $UPTIME -gt ${startupPeriodSeconds} && ` +
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this line needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sets the minimum time that a task should be up before considering it heatlhy. This is needed because we don't have healthchecks configured for our replayer.

We can reduce it lower than 30 seconds, and possibly remove it, but then we increase the risk of having a few second delay between the partitions being revoked on the existing task vs assigned to the new one

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. That makes sense. If you could add that as a comment, that would be great.
That feature is probably something that should be handled elsewhere (too), and all of these sorts of values are subject to lots of additional negotiation. Keeping track of them with tagged comments and putting them in the front of peoples minds can go a long way to help our future selves.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

deployment/cdk/opensearch-service-migration/options.md Outdated Show resolved Hide resolved
Copy link

codecov bot commented Apr 30, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 75.88%. Comparing base (0c6d19d) to head (57fe87a).
Report is 20 commits behind head on main.

❗ Current head 57fe87a differs from pull request most recent head 79427e5. Consider uploading reports for the commit 79427e5 to get more accurate results

Additional details and impacted files
@@             Coverage Diff              @@
##               main     #620      +/-   ##
============================================
- Coverage     75.93%   75.88%   -0.06%     
+ Complexity     1496     1494       -2     
============================================
  Files           165      165              
  Lines          6362     6361       -1     
  Branches        573      573              
============================================
- Hits           4831     4827       -4     
- Misses         1150     1155       +5     
+ Partials        381      379       -2     
Flag Coverage Δ
unittests 75.88% <ø> (-0.06%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@gregschohn gregschohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved with a rec to add one more comment. Thanks!

@AndreKurait AndreKurait force-pushed the RestartingReplayer branch from 79427e5 to 23b74d8 Compare May 1, 2024 14:49
@AndreKurait AndreKurait merged commit 842f53a into opensearch-project:main May 1, 2024
5 of 6 checks passed
@AndreKurait AndreKurait deleted the RestartingReplayer branch May 1, 2024 14:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants