Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] Add Question Answering Model (old) #329

Conversation

faradawn
Copy link

@faradawn faradawn commented Nov 1, 2023

Description

Added a script to trace a Question Answering models to TorchScript and Onnx format.

Issues Resolved

#304 which is [FEATURE] Trace Question Answering models to TorchScript and Onnx format.

Test

I created three tests to check if the traced model's answer matched the official model's answer.

Both the onnx and pt format passed.

== test 0, question: Who was Jim Henson?, context: Jim Henson was a nice puppet
    Official answer: a nice puppet
    Our answer: a nice puppet
=== test 1, question: Where do I live?, context: My name is Sarah and I live in London
    Official answer: London
    Our answer: London
=== test 2, question: What's my name?, context: My name is Clara and I live in Berkeley.
    Official answer: Clara
    Our answer: Clara

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

VincentZhao12 and others added 8 commits October 5, 2023 10:50
…arch-project#311)

* update cluster settings in developer guide

Signed-off-by: kalyanr <[email protected]>

* updated settings

Signed-off-by: kalyanr <[email protected]>

* update dev guide

Signed-off-by: kalyanr <[email protected]>

* fix grammar

Signed-off-by: kalyan <[email protected]>

---------

Signed-off-by: kalyanr <[email protected]>
Signed-off-by: kalyan <[email protected]>
…base-v2 (v.1.0.0)(BOTH) (opensearch-project#321)

* GitHub Actions Workflow: Update Model Upload History - sentence-transformers/paraphrase-mpnet-base-v2 (v.1.0.0)(BOTH)

Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* GitHub Actions Workflow: Update CHANGELOG.md - sentence-transformers/paraphrase-mpnet-base-v2 (v.1.0.0)(BOTH)

Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

---------

Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: dhrubo-os <[email protected]>
…ject#322)

* GitHub Actions Workflow: Update Pretrained Model Listing

Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* GitHub Actions Workflow: Update CHANGELOG.md -

Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

---------

Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Signed-off-by: Dhrubo Saha <[email protected]>
Co-authored-by: dhrubo-os <[email protected]>
Co-authored-by: Dhrubo Saha <[email protected]>
@@ -0,0 +1,37 @@
# Save our model as pt
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this file here?

Copy link
Author

@faradawn faradawn Nov 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was a file that imports question_answering_model.py directly instead of going through the pip module. It was for testing convience for them to be in the same directory. We can remove this file!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that would be awesome. I would like you to present tomorrow about this model tracing in our office hour so that we can learn about this together.

official_answer = tokenizer.decode(predict_answer_tokens)
return official_answer

def test_onnx():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what are we asserting in the test?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should assert if our answer string is equal to the official answer. I can add an assertion statement instead of printing the two outputs!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A question here. Would response be always same?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think generative models have randomness. Here in the three runs and the outputs were the same.

Copy link
Contributor

@rawwar rawwar Nov 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If they are different, then one way is to just calculate the cosine similarity between expected output and the actual output. Usually they don't differ much and it will be having high value.

@dhrubo-os
Copy link
Collaborator

Tests are failing:

 =========================== short test summary info ============================
ERROR opensearch_py_ml/ml_models/test_question_answering.py - ModuleNotFoundE...
ERROR opensearch_py_ml/ml_models/test_question_answering.py
ERROR tests/ml_models/test_question_answering.py - KeyError: 'tokenizer_file'
ERROR tests/ml_models/test_question_answering.py - KeyError: 'tokenizer_file'

rawwar and others added 9 commits November 2, 2023 16:25
…h-project#330)

* move test-search to the end

Signed-off-by: kalyan <[email protected]>

* fix lint

Signed-off-by: kalyan <[email protected]>

---------

Signed-off-by: kalyan <[email protected]>
* add ml-commons train support

Signed-off-by: kalyanr <[email protected]>

* update __all__

Signed-off-by: kalyanr <[email protected]>

* fix test cases

Signed-off-by: kalyanr <[email protected]>

* sleep after bulk insert

Signed-off-by: kalyanr <[email protected]>

* fix formatting

Signed-off-by: kalyanr <[email protected]>

* remove unused imports

Signed-off-by: kalyanr <[email protected]>

* remove duplicate conftest

Signed-off-by: kalyan <[email protected]>

* delete duplicate conftest

Signed-off-by: kalyan <[email protected]>

* include pytest plugins

Signed-off-by: kalyan <[email protected]>

* revert pandas version

Signed-off-by: Kalyan <[email protected]>

* include license

Signed-off-by: kalyan <[email protected]>

* fix formatting

Signed-off-by: kalyan <[email protected]>

* fix imports order

Signed-off-by: kalyan <[email protected]>

* fix imports order

Signed-off-by: kalyan <[email protected]>

* lint fix

Signed-off-by: kalyan <[email protected]>

* update changelog

Signed-off-by: kalyan <[email protected]>

* revert testcases

Signed-off-by: kalyan <[email protected]>

* remove fixtures

Signed-off-by: kalyan <[email protected]>

* updated test cases

Signed-off-by: kalyan <[email protected]>

* lint fixes

Signed-off-by: kalyan <[email protected]>

* update fixture

Signed-off-by: kalyan <[email protected]>

* revert

Signed-off-by: kalyan <[email protected]>

* include train in MLCommons class as  a func

Signed-off-by: kalyan <[email protected]>

* remove model train

Signed-off-by: kalyan <[email protected]>

* fix tests

Signed-off-by: kalyan <[email protected]>

* revert nox

Signed-off-by: kalyan <[email protected]>

* add tests to model_train

Signed-off-by: kalyan <[email protected]>

* fix lint

Signed-off-by: kalyan <[email protected]>

* fix lint

Signed-off-by: kalyan <[email protected]>

---------

Signed-off-by: kalyanr <[email protected]>
Signed-off-by: kalyan <[email protected]>
Signed-off-by: Kalyan <[email protected]>
* fix

Signed-off-by: kalyan <[email protected]>

* fix lint

Signed-off-by: kalyan <[email protected]>

* update changelog

Signed-off-by: kalyan <[email protected]>

* update changelog

Signed-off-by: kalyan <[email protected]>

---------

Signed-off-by: kalyan <[email protected]>
…functionality to support Model Access Control (opensearch-project#332)

* init

Signed-off-by: kalyan <[email protected]>

* add search, delete and update

Signed-off-by: kalyan <[email protected]>

* add tests for register model group

Signed-off-by: kalyan <[email protected]>

* update cluster to 2.11

Signed-off-by: kalyan <[email protected]>

* test skipif

Signed-off-by: kalyan <[email protected]>

* fix

Signed-off-by: kalyan <[email protected]>

* add tests

Signed-off-by: kalyan <[email protected]>

* update matrix

Signed-off-by: kalyan <[email protected]>

* cancel in progress

Signed-off-by: kalyan <[email protected]>

* update concurrency

Signed-off-by: kalyan <[email protected]>

* job level concurrency

Signed-off-by: kalyan <[email protected]>

* fix

Signed-off-by: kalyan <[email protected]>

* fix

Signed-off-by: kalyan <[email protected]>

* fix tests

Signed-off-by: kalyan <[email protected]>

* tests passing

Signed-off-by: kalyan <[email protected]>

* isort fix

Signed-off-by: kalyan <[email protected]>

* fix action

Signed-off-by: kalyan <[email protected]>

* fix action

Signed-off-by: kalyan <[email protected]>

* fix action

Signed-off-by: kalyan <[email protected]>

* fix

Signed-off-by: kalyan <[email protected]>

* update changelog

Signed-off-by: kalyan <[email protected]>

* fix os dockerfile

Signed-off-by: kalyan <[email protected]>

* test

Signed-off-by: kalyanr <[email protected]>

* pass opensearch version

Signed-off-by: kalyanr <[email protected]>

* fix

Signed-off-by: kalyanr <[email protected]>

* fix

Signed-off-by: kalyanr <[email protected]>

* fix

Signed-off-by: kalyan <[email protected]>

* update OS dockerfile

Signed-off-by: kalyan <[email protected]>

* fix failing tests

Signed-off-by: kalyan <[email protected]>

* update dockerfile for 2.11.0

Signed-off-by: kalyan <[email protected]>

* remove disable warning

Signed-off-by: kalyan <[email protected]>

* fix upload model

Signed-off-by: kalyan <[email protected]>

* fix lint

Signed-off-by: kalyan <[email protected]>

* fix lint

Signed-off-by: kalyan <[email protected]>

* include reference

Signed-off-by: kalyan <[email protected]>

* pr fixes

Signed-off-by: kalyan <[email protected]>

* lint fix

Signed-off-by: kalyan <[email protected]>

* fix lint

Signed-off-by: kalyan <[email protected]>

* fix tests

Signed-off-by: kalyan <[email protected]>

* skip

Signed-off-by: kalyan <[email protected]>

* fix lint

Signed-off-by: kalyan <[email protected]>

* fix lint and increase coverage

Signed-off-by: kalyan <[email protected]>

* fix lint

Signed-off-by: kalyan <[email protected]>

* fix

Signed-off-by: kalyan <[email protected]>

* feedback fixes

Signed-off-by: kalyan <[email protected]>

* fix

Signed-off-by: kalyan <[email protected]>

* lint fix

Signed-off-by: kalyan <[email protected]>

* fix test cases

Signed-off-by: kalyan <[email protected]>

* pr feedback fixes

Signed-off-by: kalyanr <[email protected]>

* revert

Signed-off-by: kalyanr <[email protected]>

---------

Signed-off-by: kalyan <[email protected]>
Signed-off-by: kalyanr <[email protected]>
* replace is_datetime_or_timedelta_dtype

Signed-off-by: kalyan <[email protected]>

* fix formatting

Signed-off-by: kalyan <[email protected]>

* fix import order

Signed-off-by: kalyan <[email protected]>

* update changelog

Signed-off-by: kalyan <[email protected]>

---------

Signed-off-by: kalyan <[email protected]>
Signed-off-by: kalyanr <[email protected]>
@faradawn faradawn force-pushed the feature/summarization_model_conversation branch from 72cb969 to e149375 Compare November 16, 2023 17:47
@faradawn
Copy link
Author

faradawn commented Nov 16, 2023

The question answering model is finished and unit tests are added:

  • Change to AutoModelForQuestionAnswering and AutoTokenizer to support different bert models.
  • Create basic unit tests.
  • Create a unit test to check the traced model's output.
    • The question answering model will output a [start_index, end_index] to identify the answer segment within the input.
    • Thus, I plan to check the [start_index, end_index] against that of the official model, instead of cosine similarity.
  • Added sign-off for the previous three commits.

The sign-off tests was still failing, because of two incorrect sign-offs 17 commits by old authors Vincent Zhao and xinyual.

Is there a way to resolve it? Thanks!

@dhrubo-os
Copy link
Collaborator

@faradawn DCO is missing.

@dhrubo-os
Copy link
Collaborator

Looks like you added some other commits in your PR. Please fix that. Let me know if you are facing any issue on this.

@faradawn
Copy link
Author

Got it, Dhurbo! I plan to create a new branch from the current main, and commit files there, to solve the DCO issue!

@faradawn faradawn changed the title [Feature] Added Question Answering Model [Feature] Add Question Answering Model (old) Nov 30, 2023
@faradawn
Copy link
Author

Created a new branch off main to solve the sign off issue. The new PR is here: #349.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants