change task worker node to list; add target worker node to cache #656

Merged
merged 3 commits into from
Jan 3, 2023

Conversation

ylwu-amzn
Collaborator

Signed-off-by: Yaliang Wu [email protected]

Description

  1. An ML task may run on multiple nodes. Previously, all worker nodes were saved as a single comma-separated String in the ML task document, which is awkward to use. This PR changes the worker nodes field in MLTask to a List.
  2. Add the target worker nodes to the model cache.

Issues Resolved

[List any issues this PR will resolve]

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@ylwu-amzn ylwu-amzn requested a review from a team December 31, 2022 01:05
@codecov-commenter

codecov-commenter commented Dec 31, 2022

Codecov Report

Merging #656 (deb1c22) into 2.x (6fb7970) will increase coverage by 0.03%.
The diff coverage is 75.00%.

@@             Coverage Diff              @@
##                2.x     #656      +/-   ##
============================================
+ Coverage     84.58%   84.62%   +0.03%     
- Complexity      998     1001       +3     
============================================
  Files            93       93              
  Lines          3582     3597      +15     
  Branches        325      327       +2     
============================================
+ Hits           3030     3044      +14     
+ Misses          417      415       -2     
- Partials        135      138       +3     
| Flag | Coverage Δ |
| --- | --- |
| ml-commons | 84.62% <75.00%> (+0.03%) ⬆️ |

| Impacted Files | Coverage Δ |
| --- | --- |
| ...java/org/opensearch/ml/profile/MLModelProfile.java | 46.29% <40.00%> (-0.65%) ⬇️ |
| ...ain/java/org/opensearch/ml/model/MLModelCache.java | 86.36% <71.42%> (-1.78%) ⬇️ |
| ...va/org/opensearch/ml/model/MLModelCacheHelper.java | 95.08% <75.00%> (-0.69%) ⬇️ |
| ...earch/ml/action/load/TransportLoadModelAction.java | 90.09% <100.00%> (-0.10%) ⬇️ |
| ...h/ml/action/upload/TransportUploadModelAction.java | 98.63% <100.00%> (ø) |
| ...n/java/org/opensearch/ml/model/MLModelManager.java | 78.77% <100.00%> (ø) |
| ...va/org/opensearch/ml/task/MLPredictTaskRunner.java | 82.14% <100.00%> (ø) |
| ...pensearch/ml/task/MLTrainAndPredictTaskRunner.java | 78.26% <100.00%> (ø) |
| ...a/org/opensearch/ml/task/MLTrainingTaskRunner.java | 74.76% <100.00%> (ø) |
| ...ain/java/org/opensearch/ml/task/MLTaskManager.java | 67.01% <0.00%> (+2.61%) ⬆️ |

b4sjoo
b4sjoo previously approved these changes Dec 31, 2022
@ylwu-amzn ylwu-amzn mentioned this pull request Dec 31, 2022
5 tasks
Comment on lines +256 to +259
} else {
String[] nodes = parser.text().split(",");
workerNodes = Arrays.asList(nodes);
}
Collaborator

We can't go into this branch, right?

Collaborator Author

For old ML tasks, the worker nodes were saved as a String, for example "node1,node2,node3". This branch is for backward compatibility (BWC): when a user gets an old ML task, parsing goes into this branch.
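
The backward-compatible handling described in this reply can be sketched without the OpenSearch parser. This is only an illustration, not the code in this PR: `WorkerNodesBwc` and `parseWorkerNodes` are hypothetical names, and the raw value is assumed to arrive as either a `List` (new format) or a comma-separated `String` (old format).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of ml-commons.
final class WorkerNodesBwc {

    // Normalizes a raw worker-node value into a list of node IDs.
    // New task docs are assumed to store a list; old task docs a
    // comma-separated string such as "node1,node2,node3".
    static List<String> parseWorkerNodes(Object raw) {
        List<String> workerNodes = new ArrayList<>();
        if (raw instanceof List) {
            // New format: copy each element as a string.
            for (Object node : (List<?>) raw) {
                workerNodes.add(String.valueOf(node));
            }
        } else if (raw instanceof String) {
            // Old format (BWC branch): split the comma-separated string.
            workerNodes.addAll(Arrays.asList(((String) raw).split(",")));
        }
        return workerNodes;
    }
}
```

With this sketch, `parseWorkerNodes("node1,node2,node3")` and `parseWorkerNodes(List.of("node1", "node2", "node3"))` both yield the same three-element list, so callers never need to know which format the stored task used.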

workerNodes = ConcurrentHashMap.newKeySet();
modelInferenceDurationQueue = new ConcurrentLinkedQueue<>();
predictRequestDurationQueue = new ConcurrentLinkedQueue<>();
}

public void setTargetWorkerNodes(List<String> targetWorkerNodes) {
Collaborator

Here targetWorkerNodes is a private concurrent set, so I assume MLModelCache needs to be thread-safe. In that case, if setTargetWorkerNodes can be called by multiple threads, we need to protect it with synchronized; otherwise targetWorkerNodes could end up with wrong contents.

Collaborator Author

This method is called by the MLModelCacheHelper.initModelState method, which is already synchronized.
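
The concern and the answer in this thread can be illustrated with a minimal sketch. It assumes the setter replaces the set contents in two steps (clear, then add), which is a guess at the shape of the real method. `ModelCacheSketch` and `ModelCacheHelperSketch` are hypothetical stand-ins, not the actual MLModelCache and MLModelCacheHelper classes.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the per-model cache.
final class ModelCacheSketch {
    private final Set<String> targetWorkerNodes = ConcurrentHashMap.newKeySet();

    // Assumed shape: clear then add. Each call is thread-safe on its own,
    // but the pair is not atomic, so two unsynchronized callers could interleave.
    void setTargetWorkerNodes(List<String> nodes) {
        targetWorkerNodes.clear();
        targetWorkerNodes.addAll(nodes);
    }

    String[] getTargetWorkerNodes() {
        return targetWorkerNodes.toArray(new String[0]);
    }
}

// Hypothetical stand-in for the helper that owns all model caches.
final class ModelCacheHelperSketch {
    private final Map<String, ModelCacheSketch> caches = new ConcurrentHashMap<>();

    // synchronized serializes writers, so the clear + addAll inside the
    // setter cannot interleave between two callers of this method.
    synchronized void initModelState(String modelId, List<String> targetWorkerNodes) {
        ModelCacheSketch cache = caches.computeIfAbsent(modelId, id -> new ModelCacheSketch());
        cache.setTargetWorkerNodes(targetWorkerNodes);
    }

    String[] getTargetWorkerNodes(String modelId) {
        ModelCacheSketch cache = caches.get(modelId);
        return cache == null ? new String[0] : cache.getTargetWorkerNodes();
    }
}
```

Under these assumptions the guarantee holds only while every writer goes through the synchronized helper method. A second call path into the setter, or a reader running between the clear and the add, would still need separate handling.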

@ylwu-amzn ylwu-amzn merged commit d18fd43 into opensearch-project:2.x Jan 3, 2023
ylwu-amzn added a commit to ylwu-amzn/ml-commons that referenced this pull request Feb 17, 2023
…nsearch-project#656)

* change task worker node to list; add target worker node to cache

Signed-off-by: Yaliang Wu <[email protected]>

* fix target worker node field name

Signed-off-by: Yaliang Wu <[email protected]>

* support work nodes string in old tasks

Signed-off-by: Yaliang Wu <[email protected]>

Signed-off-by: Yaliang Wu <[email protected]>
ylwu-amzn added a commit to ylwu-amzn/ml-commons that referenced this pull request Mar 2, 2023
…nsearch-project#656)

ylwu-amzn added a commit that referenced this pull request Mar 2, 2023
… (#769)

Signed-off-by: Yaliang Wu <[email protected]>

4 participants