-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SearchBackPressure Policy/Decider Generic Framework #2
base: main
Are you sure you want to change the base?
Conversation
…zon.com) Signed-off-by: CoderJeffrey <[email protected]>
Signed-off-by: CoderJeffrey <[email protected]>
….com) Signed-off-by: CoderJeffrey <[email protected]>
…amazon.com) Signed-off-by: CoderJeffrey <[email protected]>
Signed-off-by: CoderJeffrey <[email protected]>
…@amazon.com) Signed-off-by: CoderJeffrey <[email protected]>
…y Liu [email protected]) Signed-off-by: CoderJeffrey <[email protected]>
…@amazon.com) Signed-off-by: CoderJeffrey <[email protected]>
… Jeffrey Liu [email protected]) Signed-off-by: CoderJeffrey <[email protected]>
… (signed-off-by: Jeffrey Liu [email protected]) Signed-off-by: CoderJeffrey <[email protected]>
…[email protected]) Signed-off-by: CoderJeffrey <[email protected]>
…ed-off-by: Jeffrey Liu [email protected]) Signed-off-by: CoderJeffrey <[email protected]>
…f-by: Jeffrey Liu [email protected]) Signed-off-by: CoderJeffrey <[email protected]>
…e level (Signed-off-by: Jeffrey Liu [email protected]) Signed-off-by: CoderJeffrey <[email protected]>
…zon.com) Signed-off-by: CoderJeffrey <[email protected]>
…mazon.com) Signed-off-by: CoderJeffrey <[email protected]>
…@amazon.com) Signed-off-by: CoderJeffrey <[email protected]>
…: Jeffrey Liu [email protected]) Signed-off-by: CoderJeffrey <[email protected]>
…rchBP (Signed-off-by: Jeffrey Liu [email protected]) Signed-off-by: CoderJeffrey <[email protected]>
…[email protected]) Signed-off-by: CoderJeffrey <[email protected]>
…amazon.com) Signed-off-by: CoderJeffrey <[email protected]>
…@amazon.com) Signed-off-by: CoderJeffrey <[email protected]>
Signed-off-by: CoderJeffrey <[email protected]>
…Liu [email protected]) Signed-off-by: CoderJeffrey <[email protected]>
…[email protected]) Signed-off-by: CoderJeffrey <[email protected]>
…[email protected]) Signed-off-by: CoderJeffrey <[email protected]>
Signed-off-by: CoderJeffrey <[email protected]>
…iu [email protected]) Signed-off-by: CoderJeffrey <[email protected]>
…amazon.com) Signed-off-by: CoderJeffrey <[email protected]>
…[email protected]) Signed-off-by: CoderJeffrey <[email protected]>
…ffrey Liu [email protected]) Signed-off-by: CoderJeffrey <[email protected]>
…[email protected]) Signed-off-by: CoderJeffrey <[email protected]>
…[email protected]) Signed-off-by: CoderJeffrey <[email protected]>
…y Liu [email protected]) Signed-off-by: CoderJeffrey <[email protected]>
Signed-off-by: CoderJeffrey <[email protected]>
} | ||
|
||
public static class Constants { | ||
public static final String INCREASE_STR = "increase"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think keeping the var name as INCREASE
should be sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed to INCREASE
* serch-back-pressure-policy should be enabled "hour-breach-threshold": 30, // threshold for hourly | ||
* received unhealthy cluster level rca flow units, if above, then the below thresholds should be | ||
* modified "threshold_count": 2, // how many thresholds to be changed, in this case | ||
* search-heap-threshold and search-task-heap-threshold "search_task_heap_stepsize_in_percentage": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we don't need these percentages individually for all the heap related settings. We can keep a single value which implies the percentage increase in all of the heap related SBP settings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private static final String SEARCHBP_HEAP_STEPSIZE_IN_PERCENTAGE =
"searchbp-heap-stepsize-in-percentage";
is now the value to imply the percentage increase in all of heap related SBP settings. Comments (javadoc) alsol changed accordingly.
"hour-monitor-window-size-minutes"; | ||
private static final String HOUR_MONITOR_BUCKET_SIZE_MINUTES = | ||
"hour-monitor-bucket-size-minutes"; | ||
private static final String SEARCHBP_HEAP_STEPSIZE_IN_PERCENTAGE = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be redundant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed and now getMonitorWindowSizeMinutes() and getHourMonitorBucketSizeMinutes() return default values.
public int getHourMonitorWindowSizeMinutes() {
return DEFAULT_HOUR_MONITOR_WINDOW_SIZE_MINUTES;
}
public int getHourMonitorBucketSizeMinutes() {
return DEFAULT_HOUR_MONITOR_BUCKET_SIZE_MINUTES;
}
SearchBackPressurePolicy searchBackPressurePolicy; | ||
|
||
private int currentIteration = 0; | ||
private SearchBackPressureClusterRCA searchBackPressureClusterRCA; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This information is redundant since we are not using it anywhere, I would suggest to remove this member variable from the class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The currentIteration
makes sure new decision will be emitted only per period
private void record(HotResourceSummary issue) { | ||
LOG.info("SearchBackPressurePolicy#record()"); | ||
if (HEAP_SEARCHBP_SHARD_SIGNALS.contains(issue.getResource())) { | ||
LOG.info("Recording shard-level issue"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can refactor the nested if blocks into meaningful methods here. e,g; recordShardTaskIssue
} | ||
|
||
initialize(); | ||
LOG.info( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we logging this here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed.
private long coolOffPeriodInMillis; | ||
private String thresholdName; | ||
private String dimension; | ||
private String direction; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be an enum to limit the possible values for this var.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean an enum like this? but in the constructor, the field should still be a String?
public enum ThresholdName {
HEAP_THRESHOLD("searchbp-heap-threshold"),
// ...
private final String value;
ThresholdName(final String value) {
this.value = value;
}
@Override
public String toString() {
return value;
}
}
|
||
recordIssues(); | ||
|
||
if (shardHeapThresholdIsTooSmall()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shard and SearchTask related breach should be done separately to avoid a potential bug when both of them needs to be adjusted.
@Override | ||
public void next(SlidingWindowData e) { | ||
boolean pollFirstCondition; | ||
if (isMinSlidingWindow) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic can be refactored into a lambda function and have that lambda as a member variable.
* threshold for search back pressure service stats | ||
* currently, only consider the percentage of JVM Usage cancellation count compared to the total cancellation count | ||
*/ | ||
this.heapShardCancellationIncreaseMaxThreshold = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets discuss about these
Signed-off-by: CoderJeffrey <[email protected]>
eba2b69
to
bbcdfa1
Compare
…zon.com) Signed-off-by: CoderJeffrey <[email protected]>
… [email protected]) Signed-off-by: CoderJeffrey <[email protected]>
… Liu [email protected]) Signed-off-by: CoderJeffrey <[email protected]>
Signed-off-by: CoderJeffrey <[email protected]>
…com) Signed-off-by: CoderJeffrey <[email protected]>
Signed-off-by: CoderJeffrey <[email protected]>
…frey Liu [email protected]) Signed-off-by: CoderJeffrey <[email protected]>
…Jeffrey Liu [email protected]) Signed-off-by: CoderJeffrey <[email protected]>
…iu [email protected]) Signed-off-by: CoderJeffrey <[email protected]>
Signed-off-by: CoderJeffrey <[email protected]>
…azon.com) Signed-off-by: CoderJeffrey <[email protected]>
Signed-off-by: CoderJeffrey <[email protected]>
Signed-off-by: CoderJeffrey <[email protected]>
…@amazon.com) Signed-off-by: CoderJeffrey <[email protected]>
Signed-off-by: CoderJeffrey <[email protected]>
Signed-off-by: CoderJeffrey <[email protected]>
…y Liu [email protected]) Signed-off-by: CoderJeffrey <[email protected]>
…com) Signed-off-by: CoderJeffrey <[email protected]>
Signed-off-by: CoderJeffrey <[email protected]>
…[email protected]) Signed-off-by: CoderJeffrey <[email protected]>
…: Jeffrey Liu [email protected]) Signed-off-by: CoderJeffrey <[email protected]>
Signed-off-by: CoderJeffrey <[email protected]>
Is your feature request related to a problem? Please provide an existing Issue # , or describe.
Previous work related to this PR:
For Search Back Pressure Service, previous PR (opensearch-project/performance-analyzer#483) (opensearch-project#427) (opensearch-project#437) have established a pipeline using Collector to read SearchBackPressure Cancellation Stats, using Reader to consume the metrics and populate to the sqlitedb, using Cluster/Node level RCA to consume stats in sqlitedb and compare stats with thresholds to generate Healthy/Unhealthy Flow Unit.
Problem:
We nee a Policy/Decider framework to suggest actions based on cluster level RCAs to tune the thresholds for SearchBackPressure Service.
Describe the solution you are proposing
This PR added the Policy/Decider Framework that consumed the cluster level RCA and suggestion actions to downstream classes (e.g. Users).
Task Breakdown:
Policy
Class to determine when we need to suggest actions to user to auto-tune the thresholds for Search Back Pressure Service.Decider
Class to consume the cluster-level RCA and use the Policy to send the actions to downstream classes.PolicyConfig
file to read key thresholds/settings from Config files to improve extensibility forSearchBackPressureDecider
.Action
Class for actions that will be sent to downstream classes to suggest user to tune the thresholds.Describe alternatives you've considered
N/A
Additional context
Currently, the framework only suggest actions to tune heap related thresholds for Search Back Pressure Service.
Final policy logic to determine when to send out actions to users have not been decided yet. This PR mainly offers an extensible Decider/Policy framework.
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.