
[FEATURE] Trace Question Answering models to TorchScript and Onnx format #304

Open

dhrubo-os opened this issue Sep 27, 2023 · 5 comments

Labels: enhancement (New feature or request), good first issue (Good for newcomers)

dhrubo-os (Collaborator) commented Sep 27, 2023

We are planning to add support for more models in ML-Commons: opensearch-project/ml-commons#1164

The target of this issue is to trace three popular pre-trained question answering models to TorchScript and ONNX format. In this repo we have already traced pre-trained sentence embedding models into TorchScript and ONNX.

We need to build a similar method to trace question answering models. Primarily we can target these models:

  1. distilbert-base-cased-distilled-squad
  2. distilbert-base-uncased-distilled-squad
  3. bert-large-uncased-whole-word-masking-finetuned-squad
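As a minimal sketch of the tracing step, the snippet below runs `torch.jit.trace` on a tiny stand-in module shaped like a QA head (start/end logits per token). In the real flow you would instead load one of the models above via `transformers.AutoModelForQuestionAnswering.from_pretrained(...)` and trace it the same way; the stand-in just keeps the example self-contained.

```python
import torch
import torch.nn as nn

# Stand-in for a Hugging Face QA model: real code would load e.g.
# AutoModelForQuestionAnswering.from_pretrained("distilbert-base-cased-distilled-squad")
# and trace it the same way (QA models emit start/end logits per token).
class TinyQAModel(nn.Module):
    def __init__(self, vocab_size=100, hidden=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.qa_outputs = nn.Linear(hidden, 2)  # start/end logits

    def forward(self, input_ids, attention_mask):
        logits = self.qa_outputs(self.embed(input_ids))
        start_logits, end_logits = logits.split(1, dim=-1)
        # Push padding positions to -inf-ish so they can't be the answer span.
        mask = (1.0 - attention_mask.float()).unsqueeze(-1) * -1e4
        return (start_logits + mask).squeeze(-1), (end_logits + mask).squeeze(-1)

model = TinyQAModel().eval()
input_ids = torch.randint(0, 100, (1, 12))
attention_mask = torch.ones(1, 12, dtype=torch.long)

# torch.jit.trace records the ops executed for the example inputs.
traced = torch.jit.trace(model, (input_ids, attention_mask))
traced.save("qa_model.pt")

reloaded = torch.jit.load("qa_model.pt")
start, end = reloaded(input_ids, attention_mask)
```

Note that tracing fixes the control flow seen for the example inputs; for real Hugging Face models, passing `return_dict=False` (or tracing with `strict=False`) is typically needed because their default dict outputs are not traceable.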

I created a feature branch: feature/summarization_model_conversation/. All development for this issue should be done in that branch.

@dhrubo-os added the enhancement (New feature or request) and good first issue (Good for newcomers) labels and removed the untriaged label on Sep 27, 2023
faradawn commented Oct 12, 2023

[2023-10-11] Hi, I would like to tackle this issue.

I plan to create a script for a DistilBERT model, similar to the existing Sentence Transformer one:

opensearch-py-ml/opensearch_py_ml/ml_models/sentencetransformermodel.py (existing)
opensearch-py-ml/opensearch_py_ml/ml_models/distilbertmodel.py (new)

Specifically, I plan to:

  1. Download a DistilBERT model.
  2. Create save_to_pt and to_onnx functions utilizing torch.jit.trace.
  3. In the future, create a train function if needed.

Should I push directly to the feature/summarization_model_conversation/ branch, given that the assignee of issue #303 is also pushing to it?

If there is anything I can do, please let me know.

[2023-10-12] I would like to select the "cased" model.

Among the three models, only the "cased" model answered my questions correctly.

Test 1

Context: I like hot drinks. Tea is hot. Coke is cold.
Question: Which one will I pick?

distilbert-base-cased-distilled-squad -> Tea.
distilbert-base-uncased-distilled-squad -> Coke.
bert-large-uncased-whole-word-masking-finetuned-squad -> Coke.

Test 2

Context: I live in Beijing, China. People in China speak Chinese. People in U.S. speak English. 
Question: What language do I speak?

distilbert-base-cased-distilled-squad -> Chinese.
distilbert-base-uncased-distilled-squad -> english.
bert-large-uncased-whole-word-masking-finetuned-squad -> english.

dhrubo-os (Collaborator, Author) commented

Sure, assigning it to you.

dhrubo-os (Collaborator, Author) commented

Maybe we can create a question answering model class and work in there? distilbertmodel is one type of model, so I don't think we should create a separate class just for distilbertmodel.

dhrubo-os (Collaborator, Author) commented

Yeah, you can raise the PR against the feature/summarization_model_conversation/ branch too.

faradawn commented

Hi @dhrubo-os,

Got it -- makes sense! I will 1) create a question_answering.py model class, and 2) raise the PR in that branch!

Thanks,
Faradawn
