[Feature] Add Question Answering Model (old) #329

faradawn · 2023-11-01T04:26:26Z

Description

Added a script to trace a Question Answering models to TorchScript and Onnx format.

Issues Resolved

#304 which is [FEATURE] Trace Question Answering models to TorchScript and Onnx format.

Test

I created three tests to check if the traced model's answer matched the official model's answer.

Both the onnx and pt format passed.

== test 0, question: Who was Jim Henson?, context: Jim Henson was a nice puppet
    Official answer: a nice puppet
    Our answer: a nice puppet
=== test 1, question: Where do I live?, context: My name is Sarah and I live in London
    Official answer: London
    Our answer: London
=== test 2, question: What's my name?, context: My name is Clara and I live in Berkeley.
    Official answer: Clara
    Our answer: Clara

Check List

New functionality includes testing.
- All tests pass
New functionality has been documented.
- New functionality has javadoc added
Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: VincentZhao12 <[email protected]> Co-authored-by: VincentZhao12 <[email protected]>

…arch-project#311) * update cluster settings in developer guide Signed-off-by: kalyanr <[email protected]> * updated settings Signed-off-by: kalyanr <[email protected]> * update dev guide Signed-off-by: kalyanr <[email protected]> * fix grammar Signed-off-by: kalyan <[email protected]> --------- Signed-off-by: kalyanr <[email protected]> Signed-off-by: kalyan <[email protected]>

Signed-off-by: xinyual <[email protected]>

…base-v2 (v.1.0.0)(BOTH) (opensearch-project#321) * GitHub Actions Workflow: Update Model Upload History - sentence-transformers/paraphrase-mpnet-base-v2 (v.1.0.0)(BOTH) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> * GitHub Actions Workflow: Update CHANGELOG.md - sentence-transformers/paraphrase-mpnet-base-v2 (v.1.0.0)(BOTH) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> --------- Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: dhrubo-os <[email protected]>

…ject#322) * GitHub Actions Workflow: Update Pretrained Model Listing Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> * GitHub Actions Workflow: Update CHANGELOG.md - Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> --------- Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Signed-off-by: Dhrubo Saha <[email protected]> Co-authored-by: dhrubo-os <[email protected]> Co-authored-by: Dhrubo Saha <[email protected]>

Signed-off-by: kalyan <[email protected]>

Signed-off-by: faradawn <[email protected]>

dhrubo-os · 2023-11-02T02:54:49Z

opensearch_py_ml/ml_models/test_question_answering.py

@@ -0,0 +1,37 @@
+# Save our model as pt


Why do we need this file here?

It was a file that imports question_answering_model.py directly instead of going through the pip module. It was for testing convience for them to be in the same directory. We can remove this file!

Yeah that would be awesome. I would like you to present tomorrow about this model tracing in our office hour so that we can learn about this together.

dhrubo-os · 2023-11-02T02:58:10Z

tests/ml_models/test_question_answering.py

+    official_answer = tokenizer.decode(predict_answer_tokens)
+    return official_answer
+
+def test_onnx():


what are we asserting in the test?

We should assert if our answer string is equal to the official answer. I can add an assertion statement instead of printing the two outputs!

A question here. Would response be always same?

I think generative models have randomness. Here in the three runs and the outputs were the same.

If they are different, then one way is to just calculate the cosine similarity between expected output and the actual output. Usually they don't differ much and it will be having high value.

dhrubo-os · 2023-11-02T04:32:46Z

Tests are failing:

 =========================== short test summary info ============================
ERROR opensearch_py_ml/ml_models/test_question_answering.py - ModuleNotFoundE...
ERROR opensearch_py_ml/ml_models/test_question_answering.py
ERROR tests/ml_models/test_question_answering.py - KeyError: 'tokenizer_file'
ERROR tests/ml_models/test_question_answering.py - KeyError: 'tokenizer_file'

…h-project#330) * move test-search to the end Signed-off-by: kalyan <[email protected]> * fix lint Signed-off-by: kalyan <[email protected]> --------- Signed-off-by: kalyan <[email protected]>

* add ml-commons train support Signed-off-by: kalyanr <[email protected]> * update __all__ Signed-off-by: kalyanr <[email protected]> * fix test cases Signed-off-by: kalyanr <[email protected]> * sleep after bulk insert Signed-off-by: kalyanr <[email protected]> * fix formatting Signed-off-by: kalyanr <[email protected]> * remove unused imports Signed-off-by: kalyanr <[email protected]> * remove duplicate conftest Signed-off-by: kalyan <[email protected]> * delete duplicate conftest Signed-off-by: kalyan <[email protected]> * include pytest plugins Signed-off-by: kalyan <[email protected]> * revert pandas version Signed-off-by: Kalyan <[email protected]> * include license Signed-off-by: kalyan <[email protected]> * fix formatting Signed-off-by: kalyan <[email protected]> * fix imports order Signed-off-by: kalyan <[email protected]> * fix imports order Signed-off-by: kalyan <[email protected]> * lint fix Signed-off-by: kalyan <[email protected]> * update changelog Signed-off-by: kalyan <[email protected]> * revert testcases Signed-off-by: kalyan <[email protected]> * remove fixtures Signed-off-by: kalyan <[email protected]> * updated test cases Signed-off-by: kalyan <[email protected]> * lint fixes Signed-off-by: kalyan <[email protected]> * update fixture Signed-off-by: kalyan <[email protected]> * revert Signed-off-by: kalyan <[email protected]> * include train in MLCommons class as a func Signed-off-by: kalyan <[email protected]> * remove model train Signed-off-by: kalyan <[email protected]> * fix tests Signed-off-by: kalyan <[email protected]> * revert nox Signed-off-by: kalyan <[email protected]> * add tests to model_train Signed-off-by: kalyan <[email protected]> * fix lint Signed-off-by: kalyan <[email protected]> * fix lint Signed-off-by: kalyan <[email protected]> --------- Signed-off-by: kalyanr <[email protected]> Signed-off-by: kalyan <[email protected]> Signed-off-by: Kalyan <[email protected]>

* fix Signed-off-by: kalyan <[email protected]> * fix lint Signed-off-by: kalyan <[email protected]> * update changelog Signed-off-by: kalyan <[email protected]> * update changelog Signed-off-by: kalyan <[email protected]> --------- Signed-off-by: kalyan <[email protected]>

…functionality to support Model Access Control (opensearch-project#332) * init Signed-off-by: kalyan <[email protected]> * add search, delete and update Signed-off-by: kalyan <[email protected]> * add tests for register model group Signed-off-by: kalyan <[email protected]> * update cluster to 2.11 Signed-off-by: kalyan <[email protected]> * test skipif Signed-off-by: kalyan <[email protected]> * fix Signed-off-by: kalyan <[email protected]> * add tests Signed-off-by: kalyan <[email protected]> * update matrix Signed-off-by: kalyan <[email protected]> * cancel in progress Signed-off-by: kalyan <[email protected]> * update concurrency Signed-off-by: kalyan <[email protected]> * job level concurrency Signed-off-by: kalyan <[email protected]> * fix Signed-off-by: kalyan <[email protected]> * fix Signed-off-by: kalyan <[email protected]> * fix tests Signed-off-by: kalyan <[email protected]> * tests passing Signed-off-by: kalyan <[email protected]> * isort fix Signed-off-by: kalyan <[email protected]> * fix action Signed-off-by: kalyan <[email protected]> * fix action Signed-off-by: kalyan <[email protected]> * fix action Signed-off-by: kalyan <[email protected]> * fix Signed-off-by: kalyan <[email protected]> * update changelog Signed-off-by: kalyan <[email protected]> * fix os dockerfile Signed-off-by: kalyan <[email protected]> * test Signed-off-by: kalyanr <[email protected]> * pass opensearch version Signed-off-by: kalyanr <[email protected]> * fix Signed-off-by: kalyanr <[email protected]> * fix Signed-off-by: kalyanr <[email protected]> * fix Signed-off-by: kalyan <[email protected]> * update OS dockerfile Signed-off-by: kalyan <[email protected]> * fix failing tests Signed-off-by: kalyan <[email protected]> * update dockerfile for 2.11.0 Signed-off-by: kalyan <[email protected]> * remove disable warning Signed-off-by: kalyan <[email protected]> * fix upload model Signed-off-by: kalyan <[email protected]> * fix lint Signed-off-by: kalyan <[email protected]> * fix lint Signed-off-by: kalyan <[email protected]> * include reference Signed-off-by: kalyan <[email protected]> * pr fixes Signed-off-by: kalyan <[email protected]> * lint fix Signed-off-by: kalyan <[email protected]> * fix lint Signed-off-by: kalyan <[email protected]> * fix tests Signed-off-by: kalyan <[email protected]> * skip Signed-off-by: kalyan <[email protected]> * fix lint Signed-off-by: kalyan <[email protected]> * fix lint and increase coverage Signed-off-by: kalyan <[email protected]> * fix lint Signed-off-by: kalyan <[email protected]> * fix Signed-off-by: kalyan <[email protected]> * feedback fixes Signed-off-by: kalyan <[email protected]> * fix Signed-off-by: kalyan <[email protected]> * lint fix Signed-off-by: kalyan <[email protected]> * fix test cases Signed-off-by: kalyan <[email protected]> * pr feedback fixes Signed-off-by: kalyanr <[email protected]> * revert Signed-off-by: kalyanr <[email protected]> --------- Signed-off-by: kalyan <[email protected]> Signed-off-by: kalyanr <[email protected]>

* replace is_datetime_or_timedelta_dtype Signed-off-by: kalyan <[email protected]> * fix formatting Signed-off-by: kalyan <[email protected]> * fix import order Signed-off-by: kalyan <[email protected]> * update changelog Signed-off-by: kalyan <[email protected]> --------- Signed-off-by: kalyan <[email protected]> Signed-off-by: kalyanr <[email protected]>

…ch-py-ml into feature/summarization_model_conversation

Signed-off-by: faradawn <[email protected]>

faradawn · 2023-11-16T17:48:20Z

The question answering model is finished and unit tests are added:

Change to AutoModelForQuestionAnswering and AutoTokenizer to support different bert models.
Create basic unit tests.
Create a unit test to check the traced model's output.
- The question answering model will output a [start_index, end_index] to identify the answer segment within the input.
- Thus, I plan to check the [start_index, end_index] against that of the official model, instead of cosine similarity.
Added sign-off for the previous three commits.

The sign-off tests was still failing, because of two incorrect sign-offs 17 commits by old authors Vincent Zhao and xinyual.

Is there a way to resolve it? Thanks!

dhrubo-os · 2023-11-20T23:38:39Z

@faradawn DCO is missing.

dhrubo-os · 2023-11-20T23:39:47Z

Looks like you added some other commits in your PR. Please fix that. Let me know if you are facing any issue on this.

faradawn · 2023-11-23T16:56:05Z

Got it, Dhurbo! I plan to create a new branch from the current main, and commit files there, to solve the DCO issue!

faradawn · 2023-11-30T16:51:55Z

Created a new branch off main to solve the sign off issue. The new PR is here: #349.

VincentZhao12 and others added 8 commits October 5, 2023 10:50

fixed formatting of install header in README.md (opensearch-project#305)

7b0066a

Signed-off-by: VincentZhao12 <[email protected]> Co-authored-by: VincentZhao12 <[email protected]>

add doc model block (opensearch-project#319)

2f8ee19

Signed-off-by: xinyual <[email protected]>

limit pandas version to 2 (opensearch-project#313)

5ca5aa8

Signed-off-by: kalyan <[email protected]>

add question model, finished torchscript

9df65e7

Signed-off-by: faradawn <[email protected]>

add save_as_onnx and add tests

9315bdb

Signed-off-by: faradawn <[email protected]>

faradawn requested review from dhrubo-os, greaa-aws, ylwu-amzn, b4sjoo, jngz-es and rbhavna as code owners November 1, 2023 04:26

dhrubo-os reviewed Nov 2, 2023

View reviewed changes

rawwar and others added 9 commits November 2, 2023 16:25

reordering ml_commons tests : moved test_search to the end (opensearc…

da415f0

…h-project#330) * move test-search to the end Signed-off-by: kalyan <[email protected]> * fix lint Signed-off-by: kalyan <[email protected]> --------- Signed-off-by: kalyan <[email protected]>

Merge branch 'main' of https://github.com/opensearch-project/opensear…

52f91f6

…ch-py-ml into feature/summarization_model_conversation

added a first test for question_answering

b3eaae1

Signed-off-by: faradawn <[email protected]>

added tests to compare traced models output with official

25197ad

Signed-off-by: faradawn <[email protected]>

added default tracing sentence

e149375

Signed-off-by: faradawn <[email protected]>

faradawn force-pushed the feature/summarization_model_conversation branch from 72cb969 to e149375 Compare November 16, 2023 17:47

faradawn changed the title ~~[Feature] Added Question Answering Model~~ [Feature] Add Question Answering Model (old) Nov 30, 2023

faradawn closed this Nov 30, 2023

faradawn mentioned this pull request Nov 30, 2023

[Feature] Add New Question Answering Model #349

Open

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Feature] Add Question Answering Model (old) #329

[Feature] Add Question Answering Model (old) #329

faradawn commented Nov 1, 2023

dhrubo-os Nov 2, 2023

faradawn Nov 2, 2023 •

edited

Loading

dhrubo-os Nov 2, 2023

dhrubo-os Nov 2, 2023

faradawn Nov 2, 2023

rawwar Nov 2, 2023

faradawn Nov 2, 2023

rawwar Nov 2, 2023 •

edited

Loading

dhrubo-os commented Nov 2, 2023

faradawn commented Nov 16, 2023 •

edited

Loading

dhrubo-os commented Nov 20, 2023

dhrubo-os commented Nov 20, 2023

faradawn commented Nov 23, 2023

faradawn commented Nov 30, 2023

[Feature] Add Question Answering Model (old) #329

[Feature] Add Question Answering Model (old) #329

Conversation

faradawn commented Nov 1, 2023

Description

Issues Resolved

Test

Check List

dhrubo-os Nov 2, 2023

Choose a reason for hiding this comment

faradawn Nov 2, 2023 • edited Loading

Choose a reason for hiding this comment

dhrubo-os Nov 2, 2023

Choose a reason for hiding this comment

dhrubo-os Nov 2, 2023

Choose a reason for hiding this comment

faradawn Nov 2, 2023

Choose a reason for hiding this comment

rawwar Nov 2, 2023

Choose a reason for hiding this comment

faradawn Nov 2, 2023

Choose a reason for hiding this comment

rawwar Nov 2, 2023 • edited Loading

Choose a reason for hiding this comment

dhrubo-os commented Nov 2, 2023

faradawn commented Nov 16, 2023 • edited Loading

dhrubo-os commented Nov 20, 2023

dhrubo-os commented Nov 20, 2023

faradawn commented Nov 23, 2023

faradawn commented Nov 30, 2023

faradawn Nov 2, 2023 •

edited

Loading

rawwar Nov 2, 2023 •

edited

Loading

faradawn commented Nov 16, 2023 •

edited

Loading