
Generate using exported model and enable gemma2-2b in ExecuTorch #33707

Merged: 5 commits, Oct 11, 2024

Conversation

@guangy10 (Contributor) commented Sep 26, 2024

What does this PR do?

Adding generate support for the exported model.
Adding gemma2-2b to ExecuTorch with tests.
Adding an integration test for gemma-2b, which we've enabled already.
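For context, here is a minimal sketch of what "generate" against an exported model boils down to: a prefill pass over the prompt followed by a token-by-token greedy decode loop. This is an illustration only, not the helper added in this PR; the exported_forward shape contract and the greedy_generate name are assumptions for the sketch.

import torch

def greedy_generate(exported_forward, prompt_ids, max_new_tokens, eos_id=None):
    """Greedy decoding against an exported decoder (illustrative sketch).

    Assumed contract: exported_forward(input_ids, cache_position) -> logits, with the
    KV cache held as static buffers inside the exported program, so one token is fed
    per call (input_ids [1, 1], cache_position [1], logits [1, 1, vocab_size]).
    """
    generated = list(prompt_ids)
    # Prefill: run the prompt one position at a time to populate the static cache.
    for pos, tok in enumerate(prompt_ids):
        logits = exported_forward(
            torch.tensor([[tok]], dtype=torch.long),
            torch.tensor([pos], dtype=torch.long),
        )
    next_tok = int(torch.argmax(logits[:, -1, :], dim=-1))
    # Decode: append the argmax token and feed it back until EOS or the budget runs out.
    for _ in range(max_new_tokens):
        generated.append(next_tok)
        if eos_id is not None and next_tok == eos_id:
            break
        logits = exported_forward(
            torch.tensor([[next_tok]], dtype=torch.long),
            torch.tensor([len(generated) - 1], dtype=torch.long),
        )
        next_tok = int(torch.argmax(logits[:, -1, :], dim=-1))
    return generated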

Additional Test in ExecuTorch

Running gemma2-2b E2E:

cmake-out/examples/models/llama2/llama_main --tokenizer_path=tokenizer_gemma2.bin --model_path=gemma2.pte --prompt="My name is"
I 00:00:00.001356 executorch:cpuinfo_utils.cpp:62] Reading file /sys/devices/soc0/image_version
I 00:00:00.001425 executorch:cpuinfo_utils.cpp:78] Failed to open midr file /sys/devices/soc0/image_version
I 00:00:00.001431 executorch:cpuinfo_utils.cpp:158] Number of efficient cores 4
I 00:00:00.001434 executorch:main.cpp:65] Resetting threadpool with num threads = 6
I 00:00:00.005564 executorch:runner.cpp:65] Creating LLaMa runner: model_path=gemma2.pte, tokenizer_path=tokenizer_gemma2.bin
E 00:00:03.701808 executorch:tiktoken.cpp:79] invalid tiktoken line:
I 00:00:03.701845 executorch:runner.cpp:88] Failed to load tokenizer_gemma2.bin as a Tiktoken artifact, trying BPE tokenizer
I 00:00:03.767485 executorch:runner.cpp:94] Reading metadata from model
I 00:00:03.767512 executorch:runner.cpp:119] Metadata: get_vocab_size = 256000
I 00:00:03.767515 executorch:runner.cpp:119] Metadata: get_bos_id = 2
I 00:00:03.767517 executorch:runner.cpp:117] Methond use_sdpa_with_kv_cache not found, using the default value 0
I 00:00:03.767518 executorch:runner.cpp:119] Metadata: use_sdpa_with_kv_cache = 0
I 00:00:03.767520 executorch:runner.cpp:119] Metadata: get_n_eos = 1
I 00:00:03.767521 executorch:runner.cpp:117] Methond append_eos_to_prompt not found, using the default value 0
I 00:00:03.767522 executorch:runner.cpp:119] Metadata: append_eos_to_prompt = 0
I 00:00:03.767524 executorch:runner.cpp:119] Metadata: get_max_seq_len = 123
I 00:00:03.767525 executorch:runner.cpp:117] Methond enable_dynamic_shape not found, using the default value 0
I 00:00:03.767527 executorch:runner.cpp:119] Metadata: enable_dynamic_shape = 0
I 00:00:03.767529 executorch:runner.cpp:119] Metadata: use_kv_cache = 1
I 00:00:03.767575 executorch:runner.cpp:119] Metadata: get_n_bos = 1
I 00:00:03.767604 executorch:runner.cpp:167] RSS after loading model: 0.000000 MiB (0 if unsupported)
I 00:00:04.408489 executorch:runner.cpp:234] RSS after prompt prefill: 0.000000 MiB (0 if unsupported)
My name is Lale and I love to play with my dolls. I started to play with dolls at the age of two years old. My favorite activity is dancing. I would like to help people. I would like to travel to Spain. I like to be a vet. I love to help people. I would like to travel. I would like to be. I love to travel. I would like to be. I like to travel. I would like to be. I love to travel. I would like to be. I love to travel. I would like to be. I love to travel
I 00:00:23.188418 executorch:runner.cpp:246] RSS after finishing text generation: 0.000000 MiB (0 if unsupported)
PyTorchObserver {"prompt_tokens":4,"generated_tokens":118,"model_load_start_ms":1727310065089,"model_load_end_ms":1727310068851,"inference_start_ms":1727310068851,"inference_end_ms":1727310088272,"prompt_eval_end_ms":1727310069492,"first_token_ms":1727310069492,"aggregate_sampling_time_ms":180,"SCALING_FACTOR_UNITS_PER_SECOND":1000}
I 00:00:23.188436 executorch:stats.h:84] 	Prompt Tokens: 4    Generated Tokens: 118
I 00:00:23.188438 executorch:stats.h:90] 	Model Load Time:		3.762000 (seconds)
I 00:00:23.188440 executorch:stats.h:100] 	Total inference time:		19.421000 (seconds)		 Rate: 	6.075897 (tokens/second)
I 00:00:23.188442 executorch:stats.h:108] 		Prompt evaluation:	0.641000 (seconds)		 Rate: 	6.240250 (tokens/second)
I 00:00:23.188444 executorch:stats.h:119] 		Generated 118 tokens:	18.780000 (seconds)		 Rate: 	6.283280 (tokens/second)
I 00:00:23.188446 executorch:stats.h:127] 	Time to first generated token:	0.641000 (seconds)
I 00:00:23.188480 executorch:stats.h:134] 	Sampling time over 122 tokens:	0.180000 (seconds)

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@ArthurZucker
@gante
@amyeroberts
@qubvel

@guangy10 guangy10 marked this pull request as ready for review September 26, 2024 01:16
@guangy10 guangy10 mentioned this pull request Sep 26, 2024
@guangy10 (Contributor Author):

Why am I starting to get a fetch_tests - Unauthorized error on all the PRs I'm creating?

@LysandreJik (Member):

This seems to be happening more often. @ydshieh, would you know what might be happening? Otherwise I'll reach out to the CircleCI team; this is hindering work on the repo.

@LysandreJik (Member):

I've asked the CircleCI team @guangy10, very sorry for the inconvenience.

@ydshieh (Collaborator) commented Sep 27, 2024

Might be related to

Allow CI could be run on private forked repositories (e.g. new model additions) (#33594)

But it's not happening on all external contributors' PRs. Strange.

@ydshieh (Collaborator) commented Sep 27, 2024

Another thing to check

If you're following the fork instead of the upstream repo

A user who submits a pull request to your repository from a fork, but no pipeline is triggered with the pull request. This can happen when the user is following the project fork on their personal account rather than the project itself on CircleCI.

This will cause the jobs to trigger under the user's personal account. If the user is following a fork of the repository on CircleCI, we will only build on that fork and not the parent, so the parent’s PR will not get status updates. 

In these cases, the user unfollows their fork of the project on CircleCI. This will trigger their jobs to run under the organization when they submit pull requests. Those users can optionally follow the source project if they wish to see the pipelines.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@guangy10 (Contributor Author) commented Sep 27, 2024

Could you first try ..?

https://support.circleci.com/hc/en-us/articles/360048210711-How-to-Refresh-User-Permissions

Weird. It used to work fine on my old PRs. I tried "Refresh Permissions". Let's see if it can be unblocked.

@ydshieh It's getting worse after refreshing the permissions. check_circleci_user starts failing.

@guangy10 (Contributor Author):

(quoting the CircleCI guidance from the comment above)

"In these cases, the user unfollows their fork of the project on CircleCI. " I have no idea how to unfollow my fork on CircleCI. I don't even see if I'm following transformers on CircleCI. I created a CircleCI account with exact email and linked to my github, and I can't see any project I'm following..

@guangy10 (Contributor Author):

https://support.circleci.com/hc/en-us/articles/360008097173-Troubleshooting-why-pull-requests-are-not-triggering-jobs-on-my-organization

so please be sure "Build forked pull requests" is enabled in Project Settings > Advanced

@ydshieh Their wiki is so bad. I can't find this setting anywhere.

@ydshieh (Collaborator) commented Sep 28, 2024

Hmm, weird CircleCI issue ...

Could you check one last time:

Top-left (as in the image below): under organization, switch to your own org and follow transformers there.
Optionally, then switch to huggingface (if you can find it) and follow transformers there.

If it's still not working, I could try to push a commit.

[Screenshot 2024-09-28 115104]

@ydshieh (Collaborator) commented Sep 30, 2024

I pushed a commit to trigger the CI jobs. It runs now. I have to say I have no idea what's wrong on the CircleCI side with the permission issue, as other external contributors' PRs don't have it.

@guangy10 (Contributor Author):

Hi @ydshieh, thank you so much for debugging this issue together with me. Per the message from the CircleCI support team, it seems like the TRANSFORMERS_CONTEXT in .circleci/config.yml is causing the permission issue, and I was advised to remove it from config.yml. It seems it was newly added in #33594 last week. I'm still not sure why it only affects my PRs but works for other users, but I hope this message gives you more pointers to help resolve this issue on my fork and PRs. Thank you.

[Screenshot 2024-09-30 at 11:42:35 AM]

@guangy10 (Contributor Author):

Since @ydshieh pushed a commit to trigger the CI and all CIs are green, can I get a review on this PR? @amyeroberts @qubvel

@ArthurZucker (Collaborator) left a comment

Thanks! 🤗 Asked a question about the general direction, as I am wondering if there is a way for us to improve our generate to make it a bit more compatible!

src/transformers/integrations/executorch.py (review thread)
tests/models/gemma2/test_modeling_gemma2.py (review thread, outdated)
@ydshieh (Collaborator) commented Oct 1, 2024

Hi @guangy10, thank you for the message and for contacting the CircleCI team on your side too. I also suspected #33594 from the moment I first saw this issue. But as you mentioned, only a few (external) contributors face it, and the answer from CircleCI attached above is somewhat confusing (i.e. it doesn't explain what actually causes it).

I will create a new personal GitHub account and see if I can reproduce it and come up with a solution.

@ydshieh (Collaborator) commented Oct 1, 2024

Well, I am able to reproduce it now: #33850

@ydshieh (Collaborator) commented Oct 1, 2024

Opened a PR #33866 for this CircleCI issue

@guangy10 (Contributor Author) commented Oct 1, 2024

Opened a PR #33866 for this CircleCI issue

@ydshieh Thanks a lot. Really appreciate the quick fix! Let me rebase on top of the fix.

@HuggingFaceDocBuilderDev

Hey! 🤗 Thanks for your contribution to the transformers library!

Before merging this pull request, slow tests CI should be triggered. To enable this:

  • Add the run-slow label to the PR
  • When your PR is ready for merge and all reviewers' comments have been addressed, push an empty commit with the command [run-slow] followed by a comma separated list of all the models to be tested, i.e. [run_slow] model_to_test_1, model_to_test_2
    • If the pull request affects a lot of models, put at most 10 models in the commit message
  • A transformers maintainer will then approve the workflow to start the tests

(For maintainers) The documentation for slow tests CI on PRs is here.

@guangy10 (Contributor Author) commented Oct 1, 2024

  • Add the run-slow label to the PR

@ydshieh I don't see a way for me to add the run-slow label to the PR. Is there one?

@ydshieh (Collaborator) commented Oct 1, 2024

Are you able to see what is shown below?

[Screenshot 2024-10-01 200227]

@guangy10 (Contributor Author) commented Oct 1, 2024

Are you able to see what is shown below?

[Screenshot 2024-10-01 200227]

No, that's why I'm asking whether the message is directed at me or at the repo maintainers. I'm also wondering if we have something like pytorchbot that can do labeling for any user without write permission to the repo.

[Screenshot 2024-10-01 at 11:29:21 AM]

@ydshieh (Collaborator) commented Oct 1, 2024

Thanks for the feedback. We will improve this. For now, I just added it manually here.

@guangy10 (Contributor Author) commented Oct 2, 2024

@ArthurZucker @qubvel could you help review this PR? Once it's merged we can add such integration tests for all ExecuTorch-compatible models, testing not only the exportability but also generate.

@ydshieh (Collaborator) commented Oct 3, 2024

Don't forget to push an (empty) commit with the message [run_slow] gemma, gemma2. Thank you!

@ArthurZucker (Collaborator) left a comment

Thanks 🤗

src/transformers/integrations/executorch.py (review thread, outdated)
src/transformers/integrations/executorch.py (review thread)
tests/models/gemma/test_modeling_gemma.py (review thread)
@guangy10 (Contributor Author) commented Oct 3, 2024

Comments addressed.
Does it require a 2nd reviewer in order to merge?

@ArthurZucker (Collaborator):

Nope, just waiting for the CIs right now!


tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b", pad_token="</s>", padding_side="right")
EXPECTED_TEXT_COMPLETION = [
"Hello I am doing a project on the 1990s and I need to know what the most popular music was in the 1990s. I have looked on the internet and I have found a few things but I need more. I have found that the most popular music was rap",
@guangy10 (Contributor Author):

@ArthurZucker I see a test failure that may be related to this. I copied this over from another model with additional text added; feel free to truncate it to a shorter message and push a new commit.

Collaborator:

Ah yeah, I won't have time to update them tonight! But yeah, we can merge otherwise!

@guangy10 (Contributor Author):

@ArthurZucker Let me know if there is anything I can do to merge this PR

Collaborator:

Sorry, I lost track. I was wondering if you could update the expected values; otherwise we can merge and @ydshieh will take care of them!

Collaborator:

I will check this afternoon and merge

@guangy10 (Contributor Author):

Could you share your torch version ..?

Mine is a dev version 2.6.0.dev20241007. What is the torch version where the test fails?

attn_implementation = None

It doesn't look correct. If we set it to None, what is the default attention being used? If I recall correctly, SDPA is the only attention impl that supports StaticCache.

@guangy10 (Contributor Author):

Yeah, the torch version seems to be the issue. Let me dig into it and I will update here shortly.

@guangy10 (Contributor Author) commented Oct 10, 2024

@ydshieh Okay, I can confirm that exporting the gemma2 model requires torch==2.5.0 in order to work correctly.
I also verified that running the test on torch==2.4.1 or torch==2.0.0 yields the original prompt as output.

Here is the detailed package info:

pip list | grep torch
executorch                         0.5.0a0+f8cec53
executorchcoreml                   0.0.1
torch                              2.5.0
torchaudio                         2.5.0
torchvision                        0.20.0

RUN_SLOW=1 pytest tests/models/gemma2/test_modeling_gemma2.py -k test_export_static_cache -v

=========================================================================================== test session starts ===========================================================================================
platform darwin -- Python 3.10.13, pytest-7.2.0, pluggy-1.0.0 -- /Users/guangyang/miniconda3/envs/executorch/bin/python
cachedir: .pytest_cache
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/Users/guangyang/transformers/.hypothesis/examples')
rootdir: /Users/guangyang/transformers, configfile: pyproject.toml
plugins: cov-4.1.0, anyio-4.4.0, xdist-3.3.1, hypothesis-6.84.2
collected 389 items / 388 deselected / 1 selected

tests/models/gemma2/test_modeling_gemma2.py::Gemma2IntegrationTest::test_export_static_cache PASSED                                                                                                 [100%]

BTW, the model does require torchvision >= 0.19.0 and will fail with a lower version. It's already implied by requiring torch>=2.5.0.

@guangy10 (Contributor Author):

I updated the PR to bump up the required torch version to 2.5.0.

torch==2.5.0 will not be publicly available until Oct 17, 2024. We can either merge this PR as-is, in which case test_export_static_cache for gemma2 will be skipped in the CI until torch 2.5.0 is available in transformers, or we can defer gemma2 enablement until torch 2.5.0 is available in transformers by splitting the gemma2-related code out of this PR and merging the rest. Personally I would prefer the former, but let me know how you want to proceed. cc: @ArthurZucker @ydshieh
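For reference, a skip-on-version gate for the slow test could look roughly like the sketch below. The require_torch_2_5 helper name is hypothetical; the actual decorator used in transformers may differ.

import unittest

import torch
from packaging import version

def require_torch_2_5(test_case):
    # Hypothetical helper: skip the decorated test unless torch >= 2.5.0 is installed.
    if version.parse(torch.__version__).release < (2, 5, 0):
        return unittest.skip("gemma2 export requires torch >= 2.5.0")(test_case)
    return test_case

Applied to test_export_static_cache, this would mark the test as skipped on the torch 2.4.x CI runners and let it run once the pin moves to 2.5.0.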

Collaborator:

Great! We are still using Python 3.8 and torch 2.4.1. I plan to switch to Python 3.9 at the end of October. Before that, torch 2.5 won't be available in the system. We will think about that.

We can merge this PR as it is. Thank you for checking (it works indeed).

If we set it to None, what is the default attention being used?

Just the native implementation of attention (in the modeling files) using torch operators.

If I recall correctly, SDPA is the only attention impl that supports StaticCache.

The native implementation of attention also works with StaticCache I believe.
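For illustration, a small sketch (the model id and generation settings are only examples, not taken from this PR) showing that static-cache generation runs with either the eager (native) or SDPA attention implementation:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer("My name is", return_tensors="pt")

for impl in ("eager", "sdpa"):
    model = AutoModelForCausalLM.from_pretrained(model_id, attn_implementation=impl)
    # cache_implementation="static" makes generate allocate a StaticCache internally.
    out = model.generate(**inputs, cache_implementation="static", max_new_tokens=16, do_sample=False)
    print(impl, tokenizer.decode(out[0], skip_special_tokens=True))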

@guangy10 (Contributor Author):

Debug the gemma2 export issue and update the required torch version for it.

@ydshieh ydshieh merged commit 7d97cca into huggingface:main Oct 11, 2024
17 of 20 checks passed
NielsRogge pushed a commit to NielsRogge/transformers that referenced this pull request Oct 21, 2024
Generate using exported model and enable gemma2-2b in ExecuTorch (huggingface#33707)

* Generate using exported model and enable gemma2-2b in ExecuTorch

* [run_slow] gemma, gemma2

* truncate expected output message

* Bump required torch version to support gemma2 export

* [run_slow] gemma, gemma2

---------

Co-authored-by: Guang Yang <[email protected]>