
Read function Name from pretrained model #1529

Merged
merged 11 commits into from
Nov 15, 2023

Conversation

Collaborator

@xinyual xinyual commented Oct 18, 2023

Description

Currently, if we register a pretrained model without a URL, the function name defaults to text_embedding, since text embedding used to be the only kind of pretrained model. Now that we also have sparse encoding, we need to read the function name from the pretrained model config.
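The fallback behavior discussed here can be sketched as follows. This is a minimal illustration, not the PR's actual code: FunctionName is a stand-in for ml-commons' org.opensearch.ml.common.FunctionName (reduced to two members), and resolve is a hypothetical helper showing the idea of reading model_task_type from the pretrained config and defaulting to TEXT_EMBEDDING when the field is absent.

```java
import java.util.Locale;
import java.util.Map;

// Stand-in for ml-commons' FunctionName enum; the real enum has more members.
enum FunctionName {
    TEXT_EMBEDDING, SPARSE_ENCODING;

    static FunctionName from(String value) {
        return valueOf(value.toUpperCase(Locale.ROOT));
    }
}

public class FunctionNameFromConfig {
    // Hypothetical helper: read the function name from the pretrained model's
    // config map, falling back to TEXT_EMBEDDING (the pre-PR default) when
    // model_task_type is missing.
    static FunctionName resolve(Map<String, Object> config) {
        Object taskType = config.get("model_task_type");
        return taskType == null
            ? FunctionName.TEXT_EMBEDDING
            : FunctionName.from((String) taskType);
    }

    public static void main(String[] args) {
        System.out.println(resolve(Map.of("model_task_type", "SPARSE_ENCODING")));
        System.out.println(resolve(Map.of())); // field absent -> default
    }
}
```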

Issues Resolved

[List any issues this PR will resolve]

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@xinyual xinyual temporarily deployed to ml-commons-cicd-env October 18, 2023 04:53 — with GitHub Actions Inactive
@xinyual xinyual had a problem deploying to ml-commons-cicd-env October 18, 2023 04:53 — with GitHub Actions Failure
@codecov

codecov bot commented Oct 18, 2023

Codecov Report

Attention: 98 lines in your changes are missing coverage. Please review.

Comparison is base (568bc7e) 79.42% compared to head (3c7216b) 79.54%.
Report is 2 commits behind head on main.

Files Patch % Lines
...rithms/metrics_correlation/MetricsCorrelation.java 0.00% 46 Missing ⚠️
...pensearch/ml/engine/algorithms/DLModelExecute.java 0.00% 10 Missing ⚠️
.../ml/engine/algorithms/clustering/RCFSummarize.java 78.94% 7 Missing and 1 partial ⚠️
...ain/java/org/opensearch/ml/engine/ModelHelper.java 78.26% 4 Missing and 1 partial ⚠️
..._embedding/HuggingfaceTextEmbeddingTranslator.java 33.33% 4 Missing ⚠️
...gine/algorithms/tokenize/SparseTokenizerModel.java 78.57% 2 Missing and 1 partial ⚠️
...l/engine/algorithms/ad/AnomalyDetectionLibSVM.java 87.50% 0 Missing and 2 partials ⚠️
...thms/anomalylocalization/AnomalyLocalizerImpl.java 96.66% 0 Missing and 2 partials ⚠️
...rics_correlation/MetricsCorrelationTranslator.java 0.00% 2 Missing ⚠️
...ine/algorithms/rcf/FixedInTimeRandomCutForest.java 93.33% 1 Missing and 1 partial ⚠️
... and 12 more
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1529      +/-   ##
============================================
+ Coverage     79.42%   79.54%   +0.11%     
- Complexity     3982     3987       +5     
============================================
  Files           390      390              
  Lines         16215    16277      +62     
  Branches       1751     1751              
============================================
+ Hits          12879    12947      +68     
+ Misses         2661     2655       -6     
  Partials        675      675              
Flag Coverage Δ
ml-commons 79.54% <80.51%> (+0.11%) ⬆️

Flags with carried forward coverage won't be shown.


@@ -87,7 +87,8 @@ public void downloadPrebuiltModelConfig(String taskId, MLRegisterModelInput regi
.url(modelZipFileUrl)
.deployModel(deployModel)
.modelNodeIds(modelNodeIds)
.modelGroupId(modelGroupId);
.modelGroupId(modelGroupId)
.functionName(FunctionName.from((String) config.get("model_task_type")));
Collaborator

Can't we get the function name from registerModelInput, the way we get the other inputs on lines 57-62?

Collaborator Author

@xinyual xinyual Oct 24, 2023

The registerModelInput comes from the request body JSON, so we could ask the customer to provide it. But if we want to keep the request convention of containing only "name, version, model_format" for our pretrained models, we can only read it from the pretrained config.

Collaborator

What if config.get("model_task_type") is null?

Collaborator Author

I think for each pretrained model, we should have that field.

Collaborator

I feel like maybe we should address this section.

Currently we default to text embedding, which doesn't seem right. What happens when we start adding different pretrained models, like the splade model we just added?

Collaborator

I feel like maybe we should address this section.

Currently we default to text embedding, which doesn't seem right. What happens when we start adding different pretrained models, like the splade model we just added?

Can we make the function name mandatory and, when it's null, throw an exception instead of falling back to a default value?

Collaborator

One solution could be: add function_name to our model listing and then read the function name from there.

Collaborator Author

I feel like maybe we should address this section.
Currently we default to text embedding, which doesn't seem right. What happens when we start adding different pretrained models, like the splade model we just added?

Can we make the function name mandatory and, when it's null, throw an exception instead of falling back to a default value?

No. When we register a pretrained model we don't provide a function name, so in some scenarios it would be null. We need a default value.

Collaborator Author

@xinyual xinyual Nov 13, 2023

One solution could be: add function_name to our model listing and then read the function name from there.

I think both work. The model listing and the pretrained config are both files we maintain, so it's just a choice of which file in our S3 bucket to read from.

Collaborator

I see what you mean now. We get this model_task_type from config.json like this. Yeah this should work.
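For reference, the pretrained model's config.json carries the task type along the following lines. This is an illustrative fragment assembled from the fields mentioned in this thread (the request body example and the model_task_type key), not a copy of the actual file.

```json
{
  "name": "amazon/neural-sparse/opensearch-neural-sparse-encoding-v1",
  "version": "1.0.0",
  "model_format": "TORCH_SCRIPT",
  "model_task_type": "SPARSE_ENCODING"
}
```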

@@ -609,7 +609,7 @@ private void uploadModel(MLRegisterModelInput registerModelInput, MLTask mlTask,

private void registerModelFromUrl(MLRegisterModelInput registerModelInput, MLTask mlTask, String modelVersion) {
String taskId = mlTask.getTaskId();
FunctionName functionName = mlTask.getFunctionName();
FunctionName functionName = registerModelInput.getFunctionName();
Collaborator

Any reason to change to registerModelInput? Will it break BWC?

Collaborator Author

If we don't provide a URL, the function name in the ML task is read from the request body, while the modelInput is generated from the config. If we kept using the function name from the ML task, the name would still be null and the default would be "text_embedding". I have tested locally with request bodies both with and without a URL; both worked for me.

Collaborator

MLTask must track the correct function name. Can you check whether the function name in MLTask is correct?

Collaborator Author

No. I have tested it again. If the request body is like:
{
"name": "amazon/neural-sparse/opensearch-neural-sparse-encoding-v1",
"version": "1.0.0",
"model_format": "TORCH_SCRIPT"
}

The function name inside the ml task would be text_embedding.

Collaborator

@ylwu-amzn ylwu-amzn Nov 10, 2023

I mean, after changing to registerModelInput.getFunctionName(), is the function name in MLTask now correct for both text embedding and sparse models?

Collaborator Author

Already done: I set the function name inside the MLTask and rewrite it to the ML index.

@dhrubo-os
Collaborator

There's a merge conflict.

Signed-off-by: xinyual <[email protected]>
@xinyual
Collaborator Author

xinyual commented Nov 14, 2023

There's a merge conflict.

I guess we can do it now. I have merged from the main branch.

Signed-off-by: xinyual <[email protected]>
@xinyual xinyual temporarily deployed to ml-commons-cicd-env November 14, 2023 23:49 — with GitHub Actions Inactive
@xinyual xinyual had a problem deploying to ml-commons-cicd-env November 14, 2023 23:49 — with GitHub Actions Failure
@xinyual xinyual had a problem deploying to ml-commons-cicd-env November 15, 2023 00:14 — with GitHub Actions Failure
@zane-neo zane-neo merged commit 4d53db5 into opensearch-project:main Nov 15, 2023
10 of 14 checks passed
@opensearch-trigger-bot
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.x 2.x
# Navigate to the new working tree
cd .worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-1529-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 4d53db5d987b1940102c0f1eba12295a2f1bd5ca
# Push it to GitHub
git push --set-upstream origin backport/backport-1529-to-2.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-1529-to-2.x.

xinyual added a commit to xinyual/ml-commons that referenced this pull request Nov 15, 2023
zane-neo pushed a commit that referenced this pull request Nov 15, 2023
austintlee pushed a commit to austintlee/ml-commons that referenced this pull request Feb 29, 2024
* read Function name from pretrained config

Signed-off-by: xinyual <[email protected]>

* rewrite mltask

Signed-off-by: xinyual <[email protected]>

* optimize import

Signed-off-by: xinyual <[email protected]>

* apply spotless

Signed-off-by: xinyual <[email protected]>

* add test for function name

Signed-off-by: xinyual <[email protected]>

* apply spotless

Signed-off-by: xinyual <[email protected]>

* maintain single import

Signed-off-by: xinyual <[email protected]>

* add more test

Signed-off-by: xinyual <[email protected]>

* apply spot less

Signed-off-by: xinyual <[email protected]>

* apply spot less

Signed-off-by: xinyual <[email protected]>

---------

Signed-off-by: xinyual <[email protected]>
4 participants