Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

renaming metrics #1224

Merged
merged 11 commits into from
Aug 22, 2023
Merged

renaming metrics #1224

merged 11 commits into from
Aug 22, 2023

Conversation

dhrubo-os
Copy link
Collaborator

@dhrubo-os dhrubo-os commented Aug 19, 2023

Description

[We renamed the node level metrics:

ml_node_executing_task_count --> ml_executing_task_count
ml_node_total_model_count --> ml_deployed_model_count
ml_node_total_failure_count. --> ml_failure_count
ml_node_total_circuit_breaker_trigger_count --> ml_circuit_breaker_trigger_count
ml_node_total_request_count --> ml_request_count
ml_node_jvm_heap_usage --> ml_jvm_heap_usage
]

Issues Resolved

[List any issues this PR will resolve]

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Dhrubo Saha <[email protected]>
Signed-off-by: Dhrubo Saha <[email protected]>
Signed-off-by: Dhrubo Saha <[email protected]>
@jackiehanyang
Copy link
Collaborator

are we removing all NODE keywords in the metric naming? like changing ML_NODE_TOTAL_MODEL_COUNT to ML_TOTAL_MODEL_COUNT. What's the reason for that?

@dhrubo-os
Copy link
Collaborator Author

are we removing all NODE keywords in the metric naming? like changing ML_NODE_TOTAL_MODEL_COUNT to ML_TOTAL_MODEL_COUNT. What's the reason for that?

Before we had only data note. But in model serving framework we also initiated ML Node. So having this ML_NODE prefix might confuse users.

@jackiehanyang
Copy link
Collaborator

are we removing all NODE keywords in the metric naming? like changing ML_NODE_TOTAL_MODEL_COUNT to ML_TOTAL_MODEL_COUNT. What's the reason for that?

Before we had only data note. But in model serving framework we also initiated ML Node. So having this ML_NODE prefix might confuse users.

LGTM, just could you check why all the builds are failing? Will approve after CI is passing

@dhrubo-os
Copy link
Collaborator Author

LGTM, just could you check why all the builds are failing? Will approve after CI is passing

It's failing here, we applied some node level vs cluster level distinction in the stats. Now it's treating all of them as Cluster level stats.

@@ -148,6 +149,10 @@ protected RestChannelConsumer prepareRequest(RestRequest request, NodeClient cli
}

MLStatsInput createMlStatsInputFromRequestParams(RestRequest request) {

Set<String> mlNodeStatNames = EnumSet.allOf(MLNodeLevelStat.class).stream()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we construct a new Set for every request?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can construct the new set in the class initialization. Let me do that.

@@ -392,9 +391,6 @@ private void indexRemoteModel(MLRegisterModelInput registerModelInput, MLTask ml
String taskId = mlTask.getTaskId();
FunctionName functionName = mlTask.getFunctionName();
try (ThreadContext.StoredContext context = client.threadPool().getThreadContext().stashContext()) {
mlStats.getStat(MLNodeLevelStat.ML_REQUEST_COUNT).increment();
mlStats.createCounterStatIfAbsent(functionName, REGISTER, ML_ACTION_REQUEST_COUNT).increment();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is to track how many register requests on function level. By removing this, can we still track that?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, because we are tracking this in the parent function registerMLModel

Signed-off-by: Dhrubo Saha <[email protected]>
@dhrubo-os dhrubo-os temporarily deployed to ml-commons-cicd-env August 22, 2023 01:02 — with GitHub Actions Inactive
@dhrubo-os dhrubo-os temporarily deployed to ml-commons-cicd-env August 22, 2023 01:02 — with GitHub Actions Inactive
@dhrubo-os dhrubo-os temporarily deployed to ml-commons-cicd-env August 22, 2023 01:02 — with GitHub Actions Inactive
@dhrubo-os dhrubo-os temporarily deployed to ml-commons-cicd-env August 22, 2023 01:02 — with GitHub Actions Inactive
@@ -11,7 +11,8 @@
*/
public enum MLNodeLevelStat {
ML_JVM_HEAP_USAGE,
ML_EXECUTING_TASK_COUNT,
ML_EXECUTING_TASK_COUNT, // How many tasks are executing currently. If any task starts, then it will be 1, if the task finished then it
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then it will be 1 -> then it will increase by 1

will get back to 0 -> will decrease by 1?

Signed-off-by: Dhrubo Saha <[email protected]>
@dhrubo-os dhrubo-os had a problem deploying to ml-commons-cicd-env August 22, 2023 01:11 — with GitHub Actions Failure
@dhrubo-os dhrubo-os temporarily deployed to ml-commons-cicd-env August 22, 2023 01:11 — with GitHub Actions Inactive
@dhrubo-os dhrubo-os temporarily deployed to ml-commons-cicd-env August 22, 2023 01:11 — with GitHub Actions Inactive
@@ -234,12 +233,6 @@ private void registerModel(MLRegisterModelInput registerModelInput, ActionListen
throw new IllegalArgumentException("URL can't match trusted url regex");
}
}
// mlStats.getStat(MLNodeLevelStat.ML_NODE_EXECUTING_TASK_COUNT).increment();
mlStats.getStat(MLNodeLevelStat.ML_NODE_TOTAL_REQUEST_COUNT).increment();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why remove this line?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are already counting this in the registerMLModel function in MLModelManager class

@dhrubo-os dhrubo-os temporarily deployed to ml-commons-cicd-env August 22, 2023 01:30 — with GitHub Actions Inactive
@dhrubo-os dhrubo-os temporarily deployed to ml-commons-cicd-env August 22, 2023 01:30 — with GitHub Actions Inactive
@dhrubo-os dhrubo-os temporarily deployed to ml-commons-cicd-env August 22, 2023 01:30 — with GitHub Actions Inactive
@dhrubo-os dhrubo-os had a problem deploying to ml-commons-cicd-env August 22, 2023 01:30 — with GitHub Actions Failure
@dhrubo-os dhrubo-os temporarily deployed to ml-commons-cicd-env August 22, 2023 01:49 — with GitHub Actions Inactive
@dhrubo-os dhrubo-os temporarily deployed to ml-commons-cicd-env August 22, 2023 01:49 — with GitHub Actions Inactive
@dhrubo-os dhrubo-os temporarily deployed to ml-commons-cicd-env August 22, 2023 01:49 — with GitHub Actions Inactive
@dhrubo-os dhrubo-os temporarily deployed to ml-commons-cicd-env August 22, 2023 01:49 — with GitHub Actions Inactive
@dhrubo-os dhrubo-os merged commit 86eb953 into opensearch-project:2.x Aug 22, 2023
opensearch-trigger-bot bot pushed a commit that referenced this pull request Aug 22, 2023
* renaming metrics

Signed-off-by: Dhrubo Saha <[email protected]>

* updating tests

Signed-off-by: Dhrubo Saha <[email protected]>

* updating test cases

Signed-off-by: Dhrubo Saha <[email protected]>

* removing the ML_NODE checking for node level stats

Signed-off-by: Dhrubo Saha <[email protected]>

* updating constructing new set

Signed-off-by: Dhrubo Saha <[email protected]>

* spotless Apply

Signed-off-by: Dhrubo Saha <[email protected]>

* updating ML_NODE_TOTAL_MODEL_COUNT to ML_DEPLOYED_MODEL_COUNT

Signed-off-by: Dhrubo Saha <[email protected]>

* fixing metrics count

Signed-off-by: Dhrubo Saha <[email protected]>

* spotless

Signed-off-by: Dhrubo Saha <[email protected]>

* fixing executing task

Signed-off-by: Dhrubo Saha <[email protected]>

* updating comment

Signed-off-by: Dhrubo Saha <[email protected]>

---------

Signed-off-by: Dhrubo Saha <[email protected]>
(cherry picked from commit 86eb953)
dhrubo-os added a commit that referenced this pull request Aug 22, 2023
* renaming metrics

Signed-off-by: Dhrubo Saha <[email protected]>

* updating tests

Signed-off-by: Dhrubo Saha <[email protected]>

* updating test cases

Signed-off-by: Dhrubo Saha <[email protected]>

* removing the ML_NODE checking for node level stats

Signed-off-by: Dhrubo Saha <[email protected]>

* updating constructing new set

Signed-off-by: Dhrubo Saha <[email protected]>

* spotless Apply

Signed-off-by: Dhrubo Saha <[email protected]>

* updating ML_NODE_TOTAL_MODEL_COUNT to ML_DEPLOYED_MODEL_COUNT

Signed-off-by: Dhrubo Saha <[email protected]>

* fixing metrics count

Signed-off-by: Dhrubo Saha <[email protected]>

* spotless

Signed-off-by: Dhrubo Saha <[email protected]>

* fixing executing task

Signed-off-by: Dhrubo Saha <[email protected]>

* updating comment

Signed-off-by: Dhrubo Saha <[email protected]>

---------

Signed-off-by: Dhrubo Saha <[email protected]>
(cherry picked from commit 86eb953)

Co-authored-by: Dhrubo Saha <[email protected]>
zane-neo pushed a commit to zane-neo/ml-commons that referenced this pull request Sep 1, 2023
* renaming metrics

Signed-off-by: Dhrubo Saha <[email protected]>

* updating tests

Signed-off-by: Dhrubo Saha <[email protected]>

* updating test cases

Signed-off-by: Dhrubo Saha <[email protected]>

* removing the ML_NODE checking for node level stats

Signed-off-by: Dhrubo Saha <[email protected]>

* updating constructing new set

Signed-off-by: Dhrubo Saha <[email protected]>

* spotless Apply

Signed-off-by: Dhrubo Saha <[email protected]>

* updating ML_NODE_TOTAL_MODEL_COUNT to ML_DEPLOYED_MODEL_COUNT

Signed-off-by: Dhrubo Saha <[email protected]>

* fixing metrics count

Signed-off-by: Dhrubo Saha <[email protected]>

* spotless

Signed-off-by: Dhrubo Saha <[email protected]>

* fixing executing task

Signed-off-by: Dhrubo Saha <[email protected]>

* updating comment

Signed-off-by: Dhrubo Saha <[email protected]>

---------

Signed-off-by: Dhrubo Saha <[email protected]>
zane-neo pushed a commit that referenced this pull request Sep 1, 2023
* renaming metrics

Signed-off-by: Dhrubo Saha <[email protected]>

* updating tests

Signed-off-by: Dhrubo Saha <[email protected]>

* updating test cases

Signed-off-by: Dhrubo Saha <[email protected]>

* removing the ML_NODE checking for node level stats

Signed-off-by: Dhrubo Saha <[email protected]>

* updating constructing new set

Signed-off-by: Dhrubo Saha <[email protected]>

* spotless Apply

Signed-off-by: Dhrubo Saha <[email protected]>

* updating ML_NODE_TOTAL_MODEL_COUNT to ML_DEPLOYED_MODEL_COUNT

Signed-off-by: Dhrubo Saha <[email protected]>

* fixing metrics count

Signed-off-by: Dhrubo Saha <[email protected]>

* spotless

Signed-off-by: Dhrubo Saha <[email protected]>

* fixing executing task

Signed-off-by: Dhrubo Saha <[email protected]>

* updating comment

Signed-off-by: Dhrubo Saha <[email protected]>

---------

Signed-off-by: Dhrubo Saha <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants