SearchBackPressure Policy/Decider Generic Framework #2

Open
wants to merge 77 commits into
base: main
77 commits
46bd63e
Remove log files and add DCO (Signed-off-by: Jeffrey Liu ujeffliu@ama…
CoderJeffrey Jun 22, 2023
92c3fc8
Remove extra files (Signed-off-by: Jeffrey Liu [email protected])
CoderJeffrey Jun 22, 2023
a47388a
Remove styling difference (Signed-off-by: Jeffrey Liu ujeffliu@amazon…
CoderJeffrey Jun 22, 2023
8b86501
Remove unnecessary file changes (Signed-off-by: Jeffrey Liu ujeffliu@…
CoderJeffrey Jun 22, 2023
c6549a9
Add RCA_Decider (Signed-off-by: Jeffrey Liu [email protected])
CoderJeffrey Jun 26, 2023
a296384
Extract Heap Usage from SQlitedb (Signed-off-by: Jeffrey Liu ujeffliu…
CoderJeffrey Jun 27, 2023
c44b928
Extract required searchbp metrics for deciders (signed-off-by: Jeffre…
CoderJeffrey Jun 27, 2023
c1e957d
Add SearchBackPressureRCA Metric (Signed-off-by: Jeffrey Liu ujeffliu…
CoderJeffrey Jun 27, 2023
55e5cdd
Use SearchBackPressureRCAMetrics to aggregate metrics (signed-off-by:…
CoderJeffrey Jun 28, 2023
2898060
Add the conf file extracted part for SearchBackPressureRcaConfig.java…
CoderJeffrey Jun 28, 2023
48c92fb
Add MinMaxSlidingWindow in OldGenRca (Signed-off-by: Jeffrey Liu ujef…
CoderJeffrey Jun 28, 2023
c84cf34
Rename SearchBackPressureClusterRCA and add it to AnalysisGraph (Sign…
CoderJeffrey Jun 28, 2023
08f6927
Add basic UTs for SearchBackPressureRCA cluster/node level (Signed-of…
CoderJeffrey Jun 28, 2023
31e8b49
Add unhealthy/healthy stats UTs for SearchBackPressureRCA cluster/nod…
CoderJeffrey Jun 29, 2023
4bfa1b2
Add healthy resource unit UT (Signed-off-by: Jeffrey Liu ujeffliu@ama…
CoderJeffrey Jun 29, 2023
13e2d48
Add UT s both shard/task level (Signed-off-by: Jeffrey Liu ujeffliu@a…
CoderJeffrey Jun 29, 2023
5e3aed7
Add a new SearchBp Resource Unit (Signed-off-by: Jeffrey Liu ujeffliu…
CoderJeffrey Jun 30, 2023
8d78c3b
Add UTs to test shard/task level resource include-ness (Signed-off-by…
CoderJeffrey Jun 30, 2023
9c5e832
Merge latest code change in main branch with Node/Cluster RCA for Sea…
CoderJeffrey Jul 3, 2023
55b8ec0
Remove styling changes for Version.java (Signed-off-by: Jeffrey Liu u…
CoderJeffrey Jul 3, 2023
12fe8a8
Add metadata to resourceSummary (Signed-off-by: Jeffrey Liu ujeffliu@…
CoderJeffrey Jul 6, 2023
1b7837d
Update to more general framework (Signed-off-by: Jeffrey Liu ujeffliu…
CoderJeffrey Jul 6, 2023
8b059a8
(Signed-off-by: Jeffrey Liu [email protected])
CoderJeffrey Jul 6, 2023
c49e771
Refactor the MinMaxSlidingWindow and bug fix (Signed-off-by: Jeffrey …
CoderJeffrey Jul 7, 2023
648e94d
Refactor Heap Stats Metrics Getter(Signed-off-by: Jeffrey Liu ujeffli…
CoderJeffrey Jul 7, 2023
ca65059
Refactor HeapUsed and HeapMax Getters (Signed-off-by: Jeffrey Liu uje…
CoderJeffrey Jul 7, 2023
4c69fb3
Refactor operate() (Signed-off-by: Jeffrey Liu [email protected])
CoderJeffrey Jul 7, 2023
cf92a61
Refactor operate() and remove dead comments (Signed-off-by: Jeffrey L…
CoderJeffrey Jul 7, 2023
8996edd
Add new ActionPojo for Searchbp (Signed-off-by: Jeffrey Liu ujeffliu…
CoderJeffrey Jul 10, 2023
2aab197
Add new ActionPojo for Searchbp#2 (Signed-off-by: Jeffrey Liu ujeffl…
CoderJeffrey Jul 10, 2023
e51da5a
Update SearchBackPressureAction (Signed-off-by: Jeffrey Liu ujeffliu@…
CoderJeffrey Jul 10, 2023
6eb937a
Add Searchbp Decider (Signed-off-by: Jeffrey Liu [email protected])
CoderJeffrey Jul 10, 2023
d317715
Add Searchbp Policy and Config (Signed-off-by: Jeffrey Liu ujeffliu@a…
CoderJeffrey Jul 10, 2023
8eab98a
Add dummy Searchbp Policy and Config in OpenSearchAnalysis Graph (Sig…
CoderJeffrey Jul 10, 2023
38d66e8
Update SearchBackpressure Policy (Signed-off-by: Jeffrey Liu ujeffliu…
CoderJeffrey Jul 11, 2023
a20d6f0
Update SearchBackpressure Policy (Signed-off-by: Jeffrey Liu ujeffliu…
CoderJeffrey Jul 11, 2023
87686ee
Merge branch 'main' into searchbp_policy
CoderJeffrey Jul 11, 2023
3f53461
Merged Main (Signed-off-by: Jeffrey Liu [email protected])
CoderJeffrey Jul 11, 2023
5f35a9e
Merge branch 'main' into searchbp_policy
CoderJeffrey Jul 11, 2023
da31bdc
Add new AlarmMonitor (Signed-off-by: Jeffrey Liu [email protected])
CoderJeffrey Jul 11, 2023
473270c
Workable decider (Signed-off-by: Jeffrey Liu [email protected])
CoderJeffrey Jul 11, 2023
59f47c8
Workable decider (Signed-off-by: Jeffrey Liu [email protected])
CoderJeffrey Jul 11, 2023
015593c
Workable pipeline (Signed-off-by: Jeffrey Liu [email protected])
CoderJeffrey Jul 11, 2023
c8dbb2c
Workable pipeline (Signed-off-by: Jeffrey Liu [email protected])
CoderJeffrey Jul 12, 2023
68f27c6
Update ActionPojo (Signed-off-by: Jeffrey Liu [email protected])
CoderJeffrey Jul 12, 2023
4e56863
Framework can read from config (Signed-off-by: Jeffrey Liu ujeffliu@a…
CoderJeffrey Jul 13, 2023
1e9de37
Add Policy Increase/Decrease Alarm (Signed-off-by: Jeffrey Liu ujeff…
CoderJeffrey Jul 13, 2023
84d9d65
Generic Framework can generate shard/task and increase/decrease actio…
CoderJeffrey Jul 14, 2023
1d9e821
Generic Framework can generate specific actions and read from config …
CoderJeffrey Jul 14, 2023
d6558ed
removed dead comment for SearchBpActionConfig.java (Signed-off-by: Je…
CoderJeffrey Jul 14, 2023
e3bef94
removed dead comment for action/polict (Signed-off-by: Jeffrey Liu uj…
CoderJeffrey Jul 14, 2023
b0c7f02
removed dead comment for action/policy (Signed-off-by: Jeffrey Liu uj…
CoderJeffrey Jul 14, 2023
b6398fd
Add increase/decrease direction for ActionPojo (Signed-off-by: Jeffr…
CoderJeffrey Jul 17, 2023
2d40516
remove trailing (Signed-off-by: Jeffrey Liu [email protected])
CoderJeffrey Jul 19, 2023
bbcdfa1
Restore to workable solution
CoderJeffrey Jul 19, 2023
11e2241
Restore to workable solution (Signed-off-by: Jeffrey Liu ujeffliu@ama…
CoderJeffrey Jul 19, 2023
2249395
remove hourly window size and bucket size (Signed-off-by: Jeffrey Liu…
CoderJeffrey Jul 19, 2023
d05f093
Add hourMonitorConfig to set up alarm monitor (Signed-off-by: Jeffrey…
CoderJeffrey Jul 19, 2023
f60a8d9
Remove unused counter (Signed-off-by: Jeffrey Liu [email protected])
CoderJeffrey Jul 20, 2023
c6551f0
change field description (Signed-off-by: Jeffrey Liu ujeffliu@amazon…
CoderJeffrey Jul 20, 2023
ed42e7d
refactor (Signed-off-by: Jeffrey Liu [email protected])
CoderJeffrey Jul 20, 2023
3709bf0
enum added for direction and shard/task dimension (Signed-off-by: Jef…
CoderJeffrey Jul 20, 2023
db55b03
Use enum for Dimension/Direction for Searchbp Action (Signed-off-by: …
CoderJeffrey Jul 21, 2023
f99424f
Add lambda function for next() in OldGenRCA (Signed-off-by: Jeffrey L…
CoderJeffrey Jul 21, 2023
521ffa8
Merged main (Signed-off-by: Jeffrey Liu [email protected])
CoderJeffrey Jul 21, 2023
e34759b
Merged main with build.gradle (Signed-off-by: Jeffrey Liu ujeffliu@am…
CoderJeffrey Jul 21, 2023
a00dff4
resolve nit#1 (Signed-off-by: Jeffrey Liu [email protected])
CoderJeffrey Jul 27, 2023
7befe7b
resolve nit#2 (Signed-off-by: Jeffrey Liu [email protected])
CoderJeffrey Jul 27, 2023
de4d289
refactor OpenSearchAnalysisGraph (Signed-off-by: Jeffrey Liu ujeffliu…
CoderJeffrey Jul 27, 2023
4635d6c
stream() refactor (Signed-off-by: Jeffrey Liu [email protected])
CoderJeffrey Jul 27, 2023
a2591cb
null check refactor (Signed-off-by: Jeffrey Liu [email protected])
CoderJeffrey Jul 27, 2023
84c6b3b
Change LOG to debug level for unnecessary info (Signed-off-by: Jeffre…
CoderJeffrey Jul 27, 2023
630608a
refactor shar/task issue (Signed-off-by: Jeffrey Liu ujeffliu@amazon.…
CoderJeffrey Jul 27, 2023
570d488
nit fix (Signed-off-by: Jeffrey Liu [email protected])
CoderJeffrey Jul 27, 2023
82d9033
Added JavaDoc for Searchbp Action (Signed-off-by: Jeffrey Liu ujeffli…
CoderJeffrey Jul 28, 2023
c751032
SearchBackPressureIssue Interface created for refactor (Signed-off-by…
CoderJeffrey Jul 28, 2023
1cf5fc6
Remove dead comment (Signed-off-by: Jeffrey Liu [email protected])
CoderJeffrey Aug 2, 2023
@@ -0,0 +1,308 @@
/*
* Copyright OpenSearch Contributors
* SPDX-License-Identifier: Apache-2.0
*/

package org.opensearch.performanceanalyzer.decisionmaker.actions;


import com.google.gson.Gson;
import com.google.gson.GsonBuilder;
import com.google.gson.annotations.SerializedName;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.opensearch.performanceanalyzer.AppContext;
import org.opensearch.performanceanalyzer.rca.store.rca.cluster.NodeKey;

public class SearchBackPressureAction extends SuppressibleAction {
private static final Logger LOG = LogManager.getLogger(SearchBackPressureAction.class);
public static final String NAME = "SearchBackPressureAction";
private static final ImpactVector NO_IMPACT = new ImpactVector();

/*
* Time to wait since last recommendation, before suggesting this action again
*/
private static final long DEFAULT_COOL_OFF_PERIOD_IN_MILLIS = TimeUnit.DAYS.toMillis(1);

/* Values read from config, per dimension type:
* canUpdate: whether the action should be emitted
* coolOffPeriodInMillis: how long to wait before the action may be emitted again
* thresholdName: the name of the threshold being tuned (e.g. node_duress.cpu_threshold, search_heap_threshold)
* dimension: whether the action is caused by shard-level or task-level search back pressure cancellation stats
* direction: whether the threshold should be increased or decreased
* stepSizeInPercentage: how much the threshold should change, in percent
*/
private boolean canUpdate;
private long coolOffPeriodInMillis;
private String thresholdName;

private SearchbpDimension dimension;
private SearchbpThresholdActionDirection direction;
private double stepSizeInPercentage;

public SearchBackPressureAction(
final AppContext appContext,
final boolean canUpdate,
final long coolOffPeriodInMillis,
final String thresholdName,
final SearchbpDimension dimension,
final SearchbpThresholdActionDirection direction,
final double stepSizeInPercentage) {
super(appContext);
this.canUpdate = canUpdate;
this.coolOffPeriodInMillis = coolOffPeriodInMillis;
this.thresholdName = thresholdName;
this.dimension = dimension;
this.direction = direction;
this.stepSizeInPercentage = stepSizeInPercentage;
}

@Override
public String name() {
return NAME;
}

@Override
public boolean canUpdate() {
return canUpdate;
}

@Override
public long coolOffPeriodInMillis() {
return coolOffPeriodInMillis;
}

@Override
public List<NodeKey> impactedNodes() {
// all nodes are impacted by this change
return appContext.getDataNodeInstances().stream()
.map(NodeKey::new)
.collect(Collectors.toList());
}

/* The Search Back Pressure decider/policy only tunes search back pressure thresholds
* (e.g. search_backpressure.search_task_heap_threshold), which do not map directly onto any existing
* dimension in the ImpactVector (e.g. CPU/HEAP). For now, Searchbp actions only adjust heap-related thresholds.
* The collator uses the ImpactVector dimensions to decide which actions are emitted to the Publisher and,
* eventually, which actions the downstream classes execute. If two actions are emitted at the same time,
* one increasing CPU and one decreasing it, the collator cancels them out.
* Searchbp actions tune a threshold only once at a time, so two actions that increase and decrease the
* search back pressure heap usage threshold cannot be emitted together.
* Therefore, Searchbp actions report no impact in the ImpactVector.
*/
@Override
public Map<NodeKey, ImpactVector> impact() {
Map<NodeKey, ImpactVector> impact = new HashMap<>();
for (NodeKey key : impactedNodes()) {
impact.put(key, NO_IMPACT);
}
return impact;
}

public String getThresholdName() {
return thresholdName;
}

public String getDimension() {
return dimension.toString();
}

public String getDirection() {
return direction.toString();
}

public double getStepSizeInPercentage() {
return stepSizeInPercentage;
}

@Override
public String summary() {
Summary summary =
new Summary(
thresholdName,
dimension.toString(),
direction.toString(),
stepSizeInPercentage,
coolOffPeriodInMillis,
canUpdate);
return summary.toJson();
}

public static final class Builder {
public static final boolean DEFAULT_CAN_UPDATE = true;

private final AppContext appContext;
private final String thresholdName;
private final SearchbpDimension dimension;
private final SearchbpThresholdActionDirection direction;
private boolean canUpdate;
private double stepSizeInPercentage;
private long coolOffPeriodInMillis;

private Builder(
final AppContext appContext,
final String thresholdName,
final SearchbpDimension dimension,
final SearchbpThresholdActionDirection direction,
final long coolOffPeriodInMillis) {
this.appContext = appContext;
this.thresholdName = thresholdName;
this.dimension = dimension;
this.direction = direction;
this.coolOffPeriodInMillis = coolOffPeriodInMillis;
this.canUpdate = DEFAULT_CAN_UPDATE;
}

public Builder stepSizeInPercentage(double stepSizeInPercentage) {
this.stepSizeInPercentage = stepSizeInPercentage;
return this;
}

public Builder coolOffPeriodInMillis(long coolOffPeriodInMillis) {
this.coolOffPeriodInMillis = coolOffPeriodInMillis;
return this;
}

public SearchBackPressureAction build() {
return new SearchBackPressureAction(
appContext,
canUpdate,
coolOffPeriodInMillis,
thresholdName,
dimension,
direction,
stepSizeInPercentage);
}
}

/* Static Summary class used to convert the Searchbp action POJO to a JSON object.
* Key fields included:
* 1. ThresholdName: name of the SearchBackPressure threshold to be tuned
* 2. Dimension of the action (Shard/Task)
* 3. Direction of the action (Increase/Decrease)
* 4. StepSizeInPercentage to change the threshold
* 5. CoolOffPeriodInMillis for the action
* 6. canUpdate (whether the action should be emitted)
*/
public static class Summary {
public static final String THRESHOLD_NAME = "thresholdName";
public static final String SEARCHBP_DIMENSION = "searchbpDimension";
public static final String DIRECTION = "direction";
public static final String STEP_SIZE_IN_PERCENTAGE = "stepSizeInPercentage";
public static final String COOL_OFF_PERIOD = "coolOffPeriodInMillis";
public static final String CAN_UPDATE = "canUpdate";

@SerializedName(value = THRESHOLD_NAME)
private String thresholdName;

@SerializedName(value = SEARCHBP_DIMENSION)
private String searchbpSettingDimension;

@SerializedName(value = DIRECTION)
private String direction;

@SerializedName(value = STEP_SIZE_IN_PERCENTAGE)
private double stepSizeInPercentage;

@SerializedName(value = COOL_OFF_PERIOD)
private long coolOffPeriodInMillis;

@SerializedName(value = CAN_UPDATE)
private boolean canUpdate;

public Summary(
String thresholdName,
String searchbpSettingDimension,
String direction,
double stepSizeInPercentage,
long coolOffPeriodInMillis,
boolean canUpdate) {
this.thresholdName = thresholdName;
this.searchbpSettingDimension = searchbpSettingDimension;
this.direction = direction;
this.stepSizeInPercentage = stepSizeInPercentage;
this.coolOffPeriodInMillis = coolOffPeriodInMillis;
this.canUpdate = canUpdate;
}

/*
* ThresholdName is the name of the setting to be modified
* e.g. node_duress.cpu_threshold, node_duress.search_heap_threshold
*/
public String getThresholdName() {
return thresholdName;
}

public String getSearchbpSettingDimension() {
return searchbpSettingDimension;
}

public String getDirection() {
return direction;
}

public double getStepSizeInPercentage() {
return stepSizeInPercentage;
}

public long getCoolOffPeriodInMillis() {
return coolOffPeriodInMillis;
}

public boolean getCanUpdate() {
return canUpdate;
}

public String toJson() {
Gson gson = new GsonBuilder().disableHtmlEscaping().create();
return gson.toJson(this);
}
}

// Enum indicating whether the threshold should be increased or decreased
public enum SearchbpThresholdActionDirection {
INCREASE(SearchbpThresholdActionDirection.Constants.INCREASE),
DECREASE(SearchbpThresholdActionDirection.Constants.DECREASE);

private final String value;

SearchbpThresholdActionDirection(String value) {
this.value = value;
}

@Override
public String toString() {
return value;
}

public static class Constants {
public static final String INCREASE = "increase";
public static final String DECREASE = "decrease";
}
}

// Enum indicating whether the action is caused by shard-level or task-level
// search back pressure cancellation
public enum SearchbpDimension {
SHARD(SearchbpDimension.Constants.SHARD),
TASK(SearchbpDimension.Constants.TASK);

private final String value;

SearchbpDimension(String value) {
this.value = value;
}

@Override
public String toString() {
return value;
}

public static class Constants {
public static final String SHARD = "shard";
public static final String TASK = "task";
}
}
}
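The action above carries only a threshold name, a direction, and a step size; how a downstream consumer applies them is not shown in this diff. The following is a minimal sketch of one possible interpretation. It assumes that thresholds are percentages in the range 0 to 100 and that `stepSizeInPercentage` is added or subtracted as percentage points. The class `ThresholdStepSketch` and its `apply` method are hypothetical and are not part of this PR.

```java
import java.util.Locale;

/** Hypothetical consumer-side helper; not part of the PR. */
public final class ThresholdStepSketch {
    private ThresholdStepSketch() {}

    /**
     * Returns the threshold after applying one step.
     * Assumption: thresholds are percentages in [0, 100] and the step is in percentage points.
     * The direction strings match SearchbpThresholdActionDirection.Constants ("increase"/"decrease").
     */
    public static double apply(
            double currentThreshold, String direction, double stepSizeInPercentage) {
        double next;
        switch (direction.toLowerCase(Locale.ROOT)) {
            case "increase":
                next = currentThreshold + stepSizeInPercentage;
                break;
            case "decrease":
                next = currentThreshold - stepSizeInPercentage;
                break;
            default:
                throw new IllegalArgumentException("Unknown direction: " + direction);
        }
        // Clamp so a step never pushes the threshold outside the valid percentage range.
        return Math.max(0.0, Math.min(100.0, next));
    }

    public static void main(String[] args) {
        System.out.println(apply(70.0, "increase", 5.0)); // 75.0
        System.out.println(apply(98.0, "increase", 5.0)); // 100.0 (clamped)
        System.out.println(apply(3.0, "decrease", 5.0)); // 0.0 (clamped)
    }
}
```

If the step is instead meant to be relative to the current value, the arithmetic would differ; the diff does not settle this either way.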
@@ -8,6 +8,7 @@

import org.opensearch.performanceanalyzer.decisionmaker.deciders.configs.jvm.OldGenDecisionPolicyConfig;
import org.opensearch.performanceanalyzer.decisionmaker.deciders.configs.jvm.young_gen.JvmGenTuningPolicyConfig;
import org.opensearch.performanceanalyzer.decisionmaker.deciders.configs.searchbackpressure.SearchBackPressurePolicyConfig;
import org.opensearch.performanceanalyzer.rca.framework.core.NestedConfig;
import org.opensearch.performanceanalyzer.rca.framework.core.RcaConf;

@@ -28,11 +29,14 @@ public class DeciderConfig {
private static final String OLD_GEN_DECISION_POLICY_CONFIG_NAME =
"old-gen-decision-policy-config";
private static final String JVM_GEN_TUNING_POLICY_CONFIG_NAME = "jvm-gen-tuning-policy-config";
private static final String SEARCH_BACK_PRESSURE_POLICY_CONFIG_NAME =
"search-back-pressure-policy-config";

private final CachePriorityOrderConfig cachePriorityOrderConfig;
private final WorkLoadTypeConfig workLoadTypeConfig;
private final OldGenDecisionPolicyConfig oldGenDecisionPolicyConfig;
private final JvmGenTuningPolicyConfig jvmGenTuningPolicyConfig;
private final SearchBackPressurePolicyConfig searchBackPressurePolicyConfig;

public DeciderConfig(final RcaConf rcaConf) {
cachePriorityOrderConfig =
@@ -51,6 +55,11 @@ public DeciderConfig(final RcaConf rcaConf) {
new NestedConfig(
JVM_GEN_TUNING_POLICY_CONFIG_NAME,
rcaConf.getDeciderConfigSettings()));
searchBackPressurePolicyConfig =
new SearchBackPressurePolicyConfig(
new NestedConfig(
SEARCH_BACK_PRESSURE_POLICY_CONFIG_NAME,
rcaConf.getDeciderConfigSettings()));
}

public CachePriorityOrderConfig getCachePriorityOrderConfig() {
@@ -68,4 +77,8 @@ public OldGenDecisionPolicyConfig getOldGenDecisionPolicyConfig() {
public JvmGenTuningPolicyConfig getJvmGenTuningPolicyConfig() {
return jvmGenTuningPolicyConfig;
}

public SearchBackPressurePolicyConfig getSearchBackPressurePolicyConfig() {
return searchBackPressurePolicyConfig;
}
}
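The `DeciderConfig` change wires a new named section, `search-back-pressure-policy-config`, out of the decider settings. The general pattern (look up a nested section by name, then read individual keys with a fallback default) can be sketched as below. This sketch uses plain `java.util.Map` objects as a stand-in and does not use the real `NestedConfig` or `RcaConf` APIs. The class `NestedSettingsSketch`, its methods, and the `enabled` key are hypothetical; only the section name comes from the diff.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for nested config lookup; does not use the real NestedConfig API. */
public final class NestedSettingsSketch {
    // Section name taken from the diff; everything else in this class is illustrative.
    public static final String SECTION = "search-back-pressure-policy-config";

    private NestedSettingsSketch() {}

    /** Returns the named sub-section, or an empty map when it is missing or not a map. */
    @SuppressWarnings("unchecked")
    public static Map<String, Object> section(Map<String, Object> parent, String name) {
        Object value = parent == null ? null : parent.get(name);
        return value instanceof Map ? (Map<String, Object>) value : new HashMap<>();
    }

    /** Reads a boolean key, falling back to the default when absent or of the wrong type. */
    public static boolean readBoolean(
            Map<String, Object> section, String key, boolean defaultValue) {
        Object value = section.get(key);
        return value instanceof Boolean ? (Boolean) value : defaultValue;
    }

    public static void main(String[] args) {
        Map<String, Object> policy = new HashMap<>();
        policy.put("enabled", false); // "enabled" is a hypothetical key
        Map<String, Object> deciders = new HashMap<>();
        deciders.put(SECTION, policy);

        // Section present: the configured value wins.
        System.out.println(readBoolean(section(deciders, SECTION), "enabled", true)); // false
        // Section absent: the default is used.
        System.out.println(readBoolean(section(new HashMap<>(), SECTION), "enabled", true)); // true
    }
}
```

Falling back to defaults when the section is absent keeps existing configuration files working after a new policy config is introduced.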